Dynamic Hashing

The document describes how extendible hashing works by using a directory of pointers to buckets. It allows the number of buckets to be doubled by doubling just the directory and splitting only the bucket that overflowed. Entries are found by applying a hash function to the key and using the last bits of the result as a bucket number. When an entry is inserted into a full bucket, the bucket is split and its contents are redistributed, and the directory is doubled if needed to accommodate the new bucket.

Uploaded by

sreenu_pes

Directory of Pointers

• How else (as opposed to overflow pages) can we add a data record to a full bucket in a static hash file?
  • Reorganize the table (e.g., by doubling the number of buckets and redistributing the entries across the new set of buckets)
  • But reading and writing all pages is expensive!
• In contrast, we can use a directory of pointers to buckets
  • The number of buckets can be doubled by doubling just the directory and splitting "only" the bucket that overflowed
  • The trick lies in how the hash function is adjusted!
Extendible Hashing
• Extendible Hashing uses a directory of pointers to buckets
• The result of applying a hash function h is treated as a binary number, and the last d bits are interpreted as an offset into the directory
• d is referred to as the global depth of the hash file and is kept as part of the header of the file

GLOBAL DEPTH: 2

DIRECTORY      DATA PAGES
00  ->  Bucket A:  4* 12* 32* 16*
01  ->  Bucket B:  1* 5* 21*
10  ->  Bucket C:  10*
11  ->  Bucket D:  15* 7* 19*
Extendible Hashing: Searching for Entries
• To search for a data entry, apply a hash function h to the key and take the last d bits of its binary representation to get the bucket number
• Example: search for 5* (5 = 101 in binary; last 2 bits = 01, so Bucket B)

GLOBAL DEPTH: 2

DIRECTORY      DATA PAGES
00  ->  Bucket A:  4* 12* 32* 16*
01  ->  Bucket B:  1* 5* 21*
10  ->  Bucket C:  10*
11  ->  Bucket D:  15* 7* 19*
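
The lookup above can be sketched in a few lines of Python. This is only an illustration: it assumes h is the identity function on integer keys (as the slide examples implicitly do), and the names `directory_offset`, `directory`, and `search` are made up for this sketch.

```python
def directory_offset(hash_value: int, global_depth: int) -> int:
    """Return the last `global_depth` bits of hash_value as a directory offset."""
    return hash_value & ((1 << global_depth) - 1)


# Directory from the slide (global depth 2); assumed hash: h(key) = key.
directory = {
    0b00: [4, 12, 32, 16],  # Bucket A
    0b01: [1, 5, 21],       # Bucket B
    0b10: [10],             # Bucket C
    0b11: [15, 7, 19],      # Bucket D
}


def search(key: int, global_depth: int = 2) -> bool:
    """Look only in the one bucket that the directory offset selects."""
    bucket = directory[directory_offset(key, global_depth)]
    return key in bucket
```

For example, `search(5)` inspects only Bucket B, because 5 = 101 ends in 01.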
Extendible Hashing: Inserting Entries
• An entry can be inserted as follows:
  • Find the appropriate bucket (as in search)
  • Split the bucket if full and redistribute contents (including the new entry to be inserted) across the old bucket and its "split image"
  • Double the directory if necessary
  • Insert the given entry
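
The insertion steps above can be sketched as a small in-memory structure. This is a minimal sketch under stated assumptions, not a definitive implementation: the hash is assumed to be the identity on integer keys, each bucket is assumed to hold 4 entries (as in the diagrams), and the class and attribute names (`Bucket`, `ExtendibleHash`, `local_depth`) are hypothetical. The directory is doubled only when the bucket being split already uses all of the directory's bits.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of low-order bits this bucket depends on
        self.entries = []


class ExtendibleHash:
    def __init__(self, global_depth=2, capacity=4):
        self.global_depth = global_depth
        self.capacity = capacity        # assumed bucket size, matching the slides
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _offset(self, key):
        # Assumed hash: h(key) = key; keep the last `global_depth` bits.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._offset(key)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(key)
            return
        # Full bucket: double the directory only if the bucket already
        # uses every directory bit (local depth == global depth).
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split: create the split image and repoint half of the directory slots.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        # Redistribute the old contents plus the new entry.
        pending, bucket.entries = bucket.entries + [key], []
        for k in pending:
            self.insert(k)


eh = ExtendibleHash()
for k in [4, 12, 32, 16, 1, 5, 21, 10, 15, 7, 19]:  # initial state of the slides
    eh.insert(k)
for k in [13, 20, 9]:                                # the three worked insertions
    eh.insert(k)
```

With the data from the slides, inserting 20* doubles the directory, while inserting 9* splits Bucket B without doubling.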
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 13* (13 = 1101 in binary; last 2 bits = 01, so Bucket B)

GLOBAL DEPTH: 2

DIRECTORY      DATA PAGES
00  ->  Bucket A:  4* 12* 32* 16*
01  ->  Bucket B:  1* 5* 21* 13*
10  ->  Bucket C:  10*
11  ->  Bucket D:  15* 7* 19*
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 20* (20 = 10100 in binary; last 2 bits = 00, so Bucket A)

GLOBAL DEPTH: 2

DIRECTORY      DATA PAGES
00  ->  Bucket A:  4* 12* 32* 16*    FULL, hence, split and redistribute!
01  ->  Bucket B:  1* 5* 21* 13*
10  ->  Bucket C:  10*
11  ->  Bucket D:  15* 7* 19*
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 20* (20 = 10100)

GLOBAL DEPTH: 2

DIRECTORY      DATA PAGES
00  ->  Bucket A:   32* 16*
01  ->  Bucket B:   1* 5* 21* 13*
10  ->  Bucket C:   10*
11  ->  Bucket D:   15* 7* 19*
        Bucket A2:  4* 12* 20*    ("split image" of Bucket A)

Is this enough?
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 20* (20 = 10100)

GLOBAL DEPTH: 2

DIRECTORY      DATA PAGES
00  ->  Bucket A:   32* 16*
01  ->  Bucket B:   1* 5* 21* 13*
10  ->  Bucket C:   10*
11  ->  Bucket D:   15* 7* 19*
        Bucket A2:  4* 12* 20*    ("split image" of Bucket A)

Double the directory and increase the global depth
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 20*

GLOBAL DEPTH: 3

DIRECTORY      DATA PAGES
000  ->  Bucket A:   32* 16*
001  ->  Bucket B:   1* 5* 21* 13*
010  ->  Bucket C:   10*
011  ->  Bucket D:   15* 7* 19*
100  ->  Bucket A2:  4* 12* 20*    ("split image" of Bucket A)
101  ->  Bucket B
110  ->  Bucket C
111  ->  Bucket D

• The last two bits (00) indicate a data entry that belongs to one of these two buckets (A or A2)
• The third bit distinguishes between these two buckets!
• But is it always necessary to double the directory?
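
The role of the third bit can be checked with a short worked example. The helper name `split_by_bit` is hypothetical, and keys are assumed to hash to themselves as in the slides.

```python
def split_by_bit(entries, bit_index):
    """Partition entries on the bit at position bit_index (0 = least significant)."""
    stay = [k for k in entries if not (k >> bit_index) & 1]
    move = [k for k in entries if (k >> bit_index) & 1]
    return stay, move


# Contents of the overflowing Bucket A plus the new entry 20*.
# All five keys end in 00; the third bit (index 2) decides A versus A2.
bucket_a, bucket_a2 = split_by_bit([4, 12, 32, 16, 20], bit_index=2)
```

Here `bucket_a` keeps 32 and 16 (third bit 0), while `bucket_a2` receives 4, 12 and 20 (third bit 1), matching the diagram.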
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 9* (9 = 1001 in binary; last 3 bits = 001, so Bucket B)

GLOBAL DEPTH: 3

DIRECTORY      DATA PAGES
000  ->  Bucket A:   32* 16*
001  ->  Bucket B:   1* 5* 21* 13*    FULL, hence, split!
010  ->  Bucket C:   10*
011  ->  Bucket D:   15* 7* 19*
100  ->  Bucket A2:  4* 12* 20*       ("split image" of Bucket A)
101  ->  Bucket B
110  ->  Bucket C
111  ->  Bucket D
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 9* (9 = 1001)

GLOBAL DEPTH: 3

DIRECTORY      DATA PAGES
000  ->  Bucket A:   32* 16*
001  ->  Bucket B:   1* 9*
010  ->  Bucket C:   10*
011  ->  Bucket D:   15* 7* 19*
100  ->  Bucket A2:  4* 12* 20*     ("split image" of A)
101  ->  Bucket B2:  5* 21* 13*     ("split image" of B)
110  ->  Bucket C
111  ->  Bucket D

Almost there…
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 9* (9 = 1001)

GLOBAL DEPTH: 3

DIRECTORY      DATA PAGES
000  ->  Bucket A:   32* 16*
001  ->  Bucket B:   1* 9*
010  ->  Bucket C:   10*
011  ->  Bucket D:   15* 7* 19*
100  ->  Bucket A2:  4* 12* 20*     ("split image" of A)
101  ->  Bucket B2:  5* 21* 13*     ("split image" of B)
110  ->  Bucket C
111  ->  Bucket D

There was no need to double the directory! When NOT to double the directory?
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary and insert the given entry
• Example: insert 9* (9 = 1001)

GLOBAL DEPTH: 3

DIRECTORY   BUCKET      LOCAL DEPTH   DATA PAGE
000    ->   Bucket A    3             32* 16*
001    ->   Bucket B    3             1* 9*
010    ->   Bucket C    2             10*
011    ->   Bucket D    2             15* 7* 19*
100    ->   Bucket A2   3             4* 12* 20*     ("split image" of A)
101    ->   Bucket B2   3             5* 21* 13*     ("split image" of B)
110    ->   Bucket C
111    ->   Bucket D

If a bucket whose local depth equals the global depth is split, the directory must be doubled
Extendible Hashing: Inserting Entries
• Example: insert 9* (9 = 1001)

Repeat…

GLOBAL DEPTH: 3

DIRECTORY   BUCKET      LOCAL DEPTH   DATA PAGE
000    ->   Bucket A    3             32* 16*
001    ->   Bucket B    2             1* 5* 21* 13*    FULL, hence, split!
010    ->   Bucket C    2             10*
011    ->   Bucket D    2             15* 7* 19*
100    ->   Bucket A2   3             4* 12* 20*       ("split image" of Bucket A)
101    ->   Bucket B
110    ->   Bucket C
111    ->   Bucket D

Because the local depth (i.e., 2) is less than the global depth (i.e., 3), there is NO need to double the directory
Extendible Hashing: Inserting Entries
• Example: insert 9* (9 = 1001)

Repeat…

GLOBAL DEPTH: 3

DIRECTORY   BUCKET      LOCAL DEPTH   DATA PAGE
000    ->   Bucket A    3             32* 16*
001    ->   Bucket B    3             1* 9*
010    ->   Bucket C    2             10*
011    ->   Bucket D    2             15* 7* 19*
100    ->   Bucket A2   3             4* 12* 20*     ("split image" of A)
101    ->   Bucket B2   3             5* 21* 13*     ("split image" of B)
110    ->   Bucket C
111    ->   Bucket D
Extendible Hashing: Inserting Entries
• Example: insert 9* (9 = 1001)

Repeat…

GLOBAL DEPTH: 3

DIRECTORY   BUCKET      LOCAL DEPTH   DATA PAGE
000    ->   Bucket A    3             32* 16*
001    ->   Bucket B    3             1* 9*
010    ->   Bucket C    2             10*
011    ->   Bucket D    2             15* 7* 19*
100    ->   Bucket A2   3             4* 12* 20*     ("split image" of A)
101    ->   Bucket B2   3             5* 21* 13*     ("split image" of B)
110    ->   Bucket C
111    ->   Bucket D

FINAL STATE!
Extendible Hashing: Inserting Entries
• Example: insert 20* (20 = 10100)

Repeat…

GLOBAL DEPTH: 2

DIRECTORY   BUCKET      LOCAL DEPTH   DATA PAGE
00     ->   Bucket A    2             4* 12* 32* 16*    FULL, hence, split!
01     ->   Bucket B    2             1* 5* 21* 13*
10     ->   Bucket C    2             10*
11     ->   Bucket D    2             15* 7* 19*

Because the local depth and the global depth are both 2, we should double the directory!
Extendible Hashing: Inserting Entries
• Example: insert 20* (20 = 10100)

Repeat…

GLOBAL DEPTH: 2

DIRECTORY   BUCKET      LOCAL DEPTH   DATA PAGE
00     ->   Bucket A    2             32* 16*
01     ->   Bucket B    2             1* 5* 21* 13*
10     ->   Bucket C    2             10*
11     ->   Bucket D    2             15* 7* 19*
            Bucket A2   2             4* 12* 20*     ("split image" of Bucket A)

Is this enough?
Extendible Hashing: Inserting Entries
• Example: insert 20*

Repeat…

GLOBAL DEPTH: 3

DIRECTORY   BUCKET      LOCAL DEPTH   DATA PAGE
000    ->   Bucket A    2             32* 16*
001    ->   Bucket B    2             1* 5* 21* 13*
010    ->   Bucket C    2             10*
011    ->   Bucket D    2             15* 7* 19*
100    ->   Bucket A2   2             4* 12* 20*     ("split image" of Bucket A)
101    ->   Bucket B
110    ->   Bucket C
111    ->   Bucket D

Is this enough?
Extendible Hashing: Inserting Entries
• Example: insert 20*

Repeat…

GLOBAL DEPTH: 3

DIRECTORY   BUCKET      LOCAL DEPTH   DATA PAGE
000    ->   Bucket A    3             32* 16*
001    ->   Bucket B    2             1* 5* 21* 13*
010    ->   Bucket C    2             10*
011    ->   Bucket D    2             15* 7* 19*
100    ->   Bucket A2   3             4* 12* 20*     ("split image" of Bucket A)
101    ->   Bucket B
110    ->   Bucket C
111    ->   Bucket D

FINAL STATE!
Linear Hashing
• Another way of adapting gracefully to insertions and deletions (i.e., pursuing dynamic hashing) is to use Linear Hashing (LH)
• In contrast to Extendible Hashing, LH
  • Does not require a directory
  • Deals naturally with collisions
  • Offers a lot of flexibility w.r.t. the timing of bucket splits (allowing us to trade off longer overflow chains for higher average space utilization)
How Linear Hashing Works
• LH uses a family of hash functions h_0, h_1, h_2, ...
  • h_i(key) = h(key) mod (2^i N); N = initial number of buckets
  • h is some hash function (its range is not 0 to N-1)
  • If N = 2^(d_0) for some d_0, h_i consists of applying h and looking at the last d_i bits, where d_i = d_0 + i
  • h_(i+1) doubles the range of h_i (similar to directory doubling)
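
As a small illustration of this family of functions, the sketch below assumes the base hash h is the identity on integer keys and N = 4 (so d_0 = 2), as in the later examples. The function names are made up for this sketch.

```python
def h(key: int) -> int:
    """Assumed base hash: the identity on integer keys."""
    return key


def h_i(key: int, i: int, n: int = 4) -> int:
    """h_i(key) = h(key) mod (2^i * N), with N the initial number of buckets."""
    return h(key) % ((2 ** i) * n)


def last_bits(key: int, i: int, d0: int = 2) -> int:
    """Equivalent view when N = 2^d0: keep the last d_i = d0 + i bits of h(key)."""
    return h(key) & ((1 << (d0 + i)) - 1)
```

For instance, `h_i(44, 0)` is 0 while `h_i(44, 1)` is 4: moving from h_0 to h_1 doubles the range, so 44 can land in a bucket that h_0 could not address.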
How Linear Hashing Works (Cont'd)
• LH uses overflow pages, and chooses buckets to split in a round-robin fashion
• Splitting proceeds in "rounds"
  • A round ends when all N_R (for round R) initial buckets are split
  • Buckets 0 to Next-1 have been split; buckets Next to N_R - 1 are yet to be split
  • The current round number is referred to as Level

[Figure: the buckets that existed at the beginning of this round (this is the range of h_Level), divided at Next into the buckets already split in this round and those yet to be split, followed by the "split image" buckets created in this round]
Linear Hashing: Searching For Entries
• To find a bucket for data entry r, find h_Level(r):
  • If h_Level(r) is in the range Next to N_R - 1, r belongs there
  • Else, r could belong to bucket h_Level(r) or bucket h_Level(r) + N_R; must apply h_(Level+1)(r) to find out
• Example: search for 5*
  • Level = 0 → h0
  • 5* = 101 → 01

Level = 0, N = 4, Next = 0

h1     h0    PRIMARY PAGES
000    00    32* 44* 36*
001    01    9* 25* 5*
010    10    14* 18* 10* 30*
011    11    31* 35* 7* 11*

(5* denotes a data entry r with h(r) = 5; each row is a primary bucket page)
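
The search rule can be written as a short function. This is a sketch only: it assumes h(key) = key for integer keys, and `bucket_for` and its parameters are hypothetical names.

```python
def bucket_for(key: int, level: int, next_: int, n: int = 4) -> int:
    """Return the bucket number for key, given the current Level and Next."""
    n_r = n * (2 ** level)   # N_R: buckets that existed at the start of this round
    b = key % n_r            # h_Level(key)
    if b < next_:            # this bucket was already split in this round,
        b = key % (2 * n_r)  # so h_(Level+1) decides between b and b + N_R
    return b
```

With Level = 0 and Next = 0, `bucket_for(5, 0, 0)` is 1 (bucket 01). Once bucket 0 has been split (Next = 1), `bucket_for(44, 0, 1)` is 4, its split image.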
Linear Hashing: Inserting Entries
• Find bucket as in search
• If the bucket to insert the data entry into is full:
  • Add an overflow page and insert data entry
  • (Maybe) Split Next bucket and increment Next
• Some points to keep in mind:
  • Unlike Extendible Hashing, when an insert triggers a split, the bucket into which the data entry is inserted is not necessarily the bucket that is split
  • As in Static Hashing, an overflow page is added to store the newly inserted data entry
  • However, since the bucket to split is chosen in a round-robin fashion […]
Linear Hashing: Inserting Entries
• Example: insert 43*
  • Level = 0 → h0
  • 43* = 101011 → 11

Level = 0, N = 4, Next = 0

h1     h0    PRIMARY PAGES
000    00    32* 44* 36*
001    01    9* 25* 5*
010    10    14* 18* 10* 30*
011    11    31* 35* 7* 11*

Add an overflow page and insert data entry
Linear Hashing: Inserting Entries
• Example: insert 43*
  • Level = 0 → h0
  • 43* = 101011 → 11

Level = 0, N = 4, Next = 0

h1     h0    PRIMARY PAGES        OVERFLOW PAGES
000    00    32* 44* 36*
001    01    9* 25* 5*
010    10    14* 18* 10* 30*
011    11    31* 35* 7* 11*       43*

Split Next bucket and increment Next
Linear Hashing: Inserting Entries
• Example: insert 43*
  • Level = 0 → h0
  • 43* = 101011 → 11

Level = 0, N = 4, Next = 0

h1     h0    PRIMARY PAGES        OVERFLOW PAGES
000    00    32*
001    01    9* 25* 5*
010    10    14* 18* 10* 30*
011    11    31* 35* 7* 11*       43*
100    00    44* 36*

Almost there…


Linear Hashing: Inserting Entries
• Example: insert 43*
  • Level = 0 → h0
  • 43* = 101011 → 11

Level = 0, N = 4, Next = 1

h1     h0    PRIMARY PAGES        OVERFLOW PAGES
000    00    32*
001    01    9* 25* 5*
010    10    14* 18* 10* 30*
011    11    31* 35* 7* 11*       43*
100    00    44* 36*

FINAL STATE!
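
The insertion procedure can be sketched as follows. The assumptions are loud ones: h(key) = key, each page holds 4 entries, a bucket's primary page and overflow chain are modelled as a single Python list, and a split is triggered every time an insert needs a new overflow page (the slides only say "maybe", so other split policies are equally valid). The class name `LinearHash` and its attributes are hypothetical.

```python
class LinearHash:
    def __init__(self, n=4, capacity=4):
        self.n = n                 # N: initial number of buckets
        self.capacity = capacity   # assumed entries per page
        self.level = 0
        self.next = 0
        # One list per bucket: primary page followed by its overflow chain.
        self.buckets = [[] for _ in range(n)]

    def _address(self, key):
        n_r = self.n << self.level          # buckets at the start of this round
        b = key % n_r                       # h_Level
        if b < self.next:                   # already split in this round
            b = key % (2 * n_r)             # h_(Level+1)
        return b

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        # A new overflow page is needed when every existing page is full.
        needs_overflow_page = len(bucket) > 0 and len(bucket) % self.capacity == 0
        bucket.append(key)
        if needs_overflow_page:
            self._split()                   # assumed policy: always split here

    def _split(self):
        n_r = self.n << self.level
        old = self.buckets[self.next]
        # Redistribute the Next bucket with h_(Level+1); the split image
        # becomes bucket number Next + N_R, i.e. the new last bucket.
        self.buckets[self.next] = [k for k in old if k % (2 * n_r) == self.next]
        self.buckets.append([k for k in old if k % (2 * n_r) != self.next])
        self.next += 1
        if self.next == n_r:                # round finished
            self.level += 1
            self.next = 0


lh = LinearHash()
for k in [32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11]:  # initial state
    lh.insert(k)
lh.insert(43)   # overflows bucket 11, which triggers a split of bucket 00
```

Note that inserting 43* overflows bucket 11 but splits bucket 00, the bucket that Next points to.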


Linear Hashing: Inserting Entries
• Another example: insert 50*
  • Level = 0 → h0
  • 50* = 110010 → 10

Level = 0, N = 4, Next = 3

h1     h0    PRIMARY PAGES        OVERFLOW PAGES
000    00    32*
001    01    9* 25*
010    10    66* 18* 10* 34*
011    11    31* 35* 7* 11*       43*
100    00    44* 36*
101    01    5* 37* 29*
110    10    14* 30* 22*

Add an overflow page and insert data entry
Linear Hashing: Inserting Entries
• Another example: insert 50*
  • Level = 0 → h0
  • 50* = 110010 → 10

Level = 0, N = 4, Next = 3

h1     h0    PRIMARY PAGES        OVERFLOW PAGES
000    00    32*
001    01    9* 25*
010    10    66* 18* 10* 34*      50*
011    11    31* 35* 7* 11*       43*
100    00    44* 36*
101    01    5* 37* 29*
110    10    14* 30* 22*

Split Next bucket and increment Next
Linear Hashing: Inserting Entries
• Another example: insert 50*
  • Level = 0 → h0
  • 50* = 110010 → 10

Level = 0, Next = 3

h1     h0    PRIMARY PAGES        OVERFLOW PAGES
000    00    32*
001    01    9* 25*
010    10    66* 18* 10* 34*      50*
011    11    43* 35* 11*
100    00    44* 36*
101    01    5* 37* 29*
110    10    14* 30* 22*
111    11    31* 7*

Almost there…
Linear Hashing: Inserting Entries
• Another example: insert 50*
  • Level = 0 → h0
  • 50* = 110010 → 10

Level = 0, Next = 0

h1     h0    PRIMARY PAGES        OVERFLOW PAGES
000    00    32*
001    01    9* 25*
010    10    66* 18* 10* 34*      50*
011    11    43* 35* 11*
100    00    44* 36*
101    01    5* 37* 29*
110    10    14* 30* 22*
111    11    31* 7*

Almost there…
Linear Hashing: Inserting Entries
• Another example: insert 50*
  • Level = 0 → h0
  • 50* = 110010 → 10

Level = 1, Next = 0

h1     h0    PRIMARY PAGES        OVERFLOW PAGES
000    00    32*
001    01    9* 25*
010    10    66* 18* 10* 34*      50*
011    11    43* 35* 11*
100    00    44* 36*
101    01    5* 37* 29*
110    10    14* 30* 22*
111    11    31* 7*

FINAL STATE!
Linear Hashing: Deleting Entries
• Deletion is essentially the inverse of insertion
• If the last bucket in the file is empty, it can be removed and Next can be decremented
• If Next is zero and the last bucket becomes empty:
  • Next is made to point to bucket M/2 - 1 (where M is the current number of buckets)
  • Level is decremented
  • The empty bucket is removed
• The insertion examples can be worked out backwards as examples of deletions!
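
The bookkeeping for removing the empty last bucket can be sketched as a small function. It covers only the Level and Next updates stated above, not the removal of the bucket itself, and the name `undo_last_split` is hypothetical.

```python
def undo_last_split(level: int, next_: int, num_buckets: int):
    """Return (level, next) after the empty last bucket is removed.

    num_buckets is M, the number of buckets before the removal.
    """
    if next_ > 0:
        return level, next_ - 1
    # Next is zero: step back into the previous round.
    return level - 1, num_buckets // 2 - 1
```

Running the second insertion example backwards, `undo_last_split(1, 0, 8)` gives Level = 0 and Next = 3, the state before bucket 011 was split.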
