Unit5 File Organization

The document discusses various file organization techniques in DBMS like sequential file organization, heap file organization, hash file organization, clustered file organization, and B+ tree file organization. It also covers indexing techniques like primary indexing, secondary indexing, and multilevel indexing. The main objective of file organization is efficient storage and fast retrieval of records from the database. Indexing improves query performance by reducing disk accesses during search operations.


Unit-5: File Organization & Indexes in DBMS
Data on external storage:
• Data in a DBMS is stored on storage devices such as disks and tapes.
• The disk space manager is responsible for keeping track of available disk space.
• The file manager, which provides the abstraction of a file of records to higher levels of DBMS code, issues requests to the disk space manager to obtain and relinquish space on disk.
Storage device hierarchy
At the top we have primary storage, which consists of cache and main memory and provides very fast access to data. Then comes secondary storage, which consists of slower devices such as magnetic disks. Tertiary storage is the slowest class of storage devices; examples are optical disks and tapes.
File organization in DBMS
• File organization is the practice of storing files in a specific order.
• File organization describes the logical connections between the different records that make up a file, especially the methods for identifying and accessing any particular record.
• File structure describes the layout of logical control records, label and data blocks, and any other such blocks.
Purpose
The main objectives of file organization are:
• Optimal selection of records, i.e., records should be accessed as fast as possible.
• Any insert, update or delete transaction on records should be easy and quick, and should not harm other records.
• No duplicate records should be induced as a result of an insert, update or delete.
• Records should be stored efficiently so that the cost of storage is minimal.
The following are the types of file organization in
DBMS:
•Sequential File Organization
•Hash File Organization
•Heap File Organization
•Clustered File Organization
•B+ Tree File Organization
Sequential File Organization
• This method simply stores the records in files in sequential order, one after another, like a sequence of books on a bookshelf.
• To access a record, we must search through the sequence until we reach the desired record, which takes O(n) if the records are unsorted; if they are sorted, we can use binary search to access a record in O(log n).
• Depending on the ordering structure of the records, there are two ways to arrange them sequentially.
• Pile File Method
• Records are stored sequentially, one after another, and are appended at the end of the file in the same order in which we insert them into the table with SQL queries. Since the order of the records does not matter, each insertion is a simple O(1) append.
• Sorted Method
• As the name implies, the file in this method must always be kept in sorted order. The file is re-sorted using the primary key or another reference after each delete, insert, or update operation.
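The two access patterns above can be sketched as follows; the record layout (a list of dicts keyed by a hypothetical `id` field) is illustrative only.

```python
# Sketch of record access in a sequential file: a linear scan for an
# unsorted (pile) file, binary search for a sorted file.
from bisect import bisect_left

def scan_search(records, key):
    """O(n) scan of an unsorted pile file."""
    for rec in records:
        if rec["id"] == key:
            return rec
    return None

def sorted_search(records, key):
    """O(log n) binary search, assuming records are sorted by 'id'."""
    keys = [rec["id"] for rec in records]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i]
    return None

pile = [{"id": 7}, {"id": 2}, {"id": 9}]          # insertion order preserved
sorted_file = sorted(pile, key=lambda r: r["id"])  # kept sorted after updates
print(scan_search(pile, 9), sorted_search(sorted_file, 2))
```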
Heap File Organization in DBMS
• Data blocks are used in heap file organization.
• Records are inserted into the data blocks at the end of the file. This method does not call for any sorting or ordering.
• If a data block is full, the new record is stored in a new block. In this case, the new block does not necessarily have to be the next data block; it can be any block in memory. The task of managing and storing the new records falls to the DBMS.
• Since the records are not sorted and not stored in consecutive data blocks, searching for a record is a time-consuming process in this method. Update and delete operations also give poor performance, as the record must be searched for first, which is already a time-consuming operation. However, if the file size is small, these operations give one of the best performances compared to other methods, so this method is widely used for small files.
• This method requires memory optimization and cleanup, as it does not free the allocated data block after a record is deleted.
Hash File Organization in DBMS
• In this method, a hash function is used to compute the address of the data block in memory where the record is stored. The hash function is applied to certain columns of the record, known as hash columns, to compute the block address. These columns/fields can be either key or non-key attributes.
• The following diagram demonstrates hash file organization. As shown here, the records are stored in the database in no particular order, and the data blocks are not consecutive. The memory addresses are computed by applying the hash function to certain attributes of the records.
• Fetching a record is faster in this method, as the record can be accessed using the hash key column; there is no need to search through the entire file to fetch a record.
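A minimal sketch of the idea, with a hypothetical `emp_id` hash column and block count; a real DBMS maps block addresses to disk pages rather than a dict.

```python
# Hash file organization sketch: the block address of a record is computed
# by hashing a chosen hash column.
NUM_BLOCKS = 8
blocks = {addr: [] for addr in range(NUM_BLOCKS)}  # non-consecutive data blocks

def block_address(key):
    return hash(key) % NUM_BLOCKS   # hash column value -> block address

def insert(record, hash_column="emp_id"):
    blocks[block_address(record[hash_column])].append(record)

def fetch(key, hash_column="emp_id"):
    # Only one block is examined -- no scan of the whole file.
    for rec in blocks[block_address(key)]:
        if rec[hash_column] == key:
            return rec
    return None

insert({"emp_id": 42, "name": "A"})
print(fetch(42))  # {'emp_id': 42, 'name': 'A'}
```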
Cluster File Organization in DBMS
• In clustered file organization, records from multiple tables are combined into a single file based on a cluster key or hash cluster. These files store records of multiple tables in the same memory blocks, all joined on a single cluster key/hash key common to the tables.
B+ Tree File Organization in DBMS
• B+ tree file organization works with the key and index values of the records. It stores the records in a tree-like structure, which is why it is known as B+ tree file organization. The leaf nodes store the records, and the intermediate nodes contain pointers to the leaf nodes; these intermediate nodes do not store any records.
• The root node and intermediate nodes contain a key field and an index field. The key field is the primary key of a record, which can be used to uniquely identify it; the index field contains the pointer (address) to the leaf node where the actual record is stored.
Indexing and Hashing in
DBMS
Indexing in DBMS
• Indexing is a technique for improving database
performance by reducing the number of disk accesses
necessary when a query is run.
• An index is a form of data structure. It’s used to swiftly
identify and access data and information present in a
database table.
• An index is a small table with only two columns, which together form key-value pairs: copies of specific columns from the tabular data of the database (the keys) and references to where that data is stored (the values).
Structure of Index
We can create indices using some columns of the database.
• The search key is the index's first column; it contains a duplicate or copy of the table's candidate key or primary key. The primary key values are saved in sorted order so that the related data can be quickly accessed.
• The data reference is the index's second column. It contains a group of pointers that point to the disk block where the value of the specific key can be found.
Advantages of indexing
There are many advantages of indexing. Some of them are mentioned here.
1. Better performance of queries.
2. Fast searching in the database.
3. Fast retrieval of data.
4. Increased performance of SELECT queries.
Disadvantages of indexing
1. Indexing takes extra space.
2. Decreased performance of INSERT, DELETE and UPDATE queries.
Methods of Indexing
Ordered index:
• An ordered index is an index file whose index entries are sorted in the order of the search key.
Primary index:
• A primary index is an ordered index whose search key is also the sort key used for the sequential file.
Primary indexing in DBMS is further divided into two types.
• Dense Index: each record in the main table has exactly one entry in the index table.
• Sparse Index: when database tables are large, a dense index grows with them; the sparse index is the solution to this problem. In a sparse index, an index entry points to a group of records in the main table, so one index entry can point to more than one record of the main table.
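The contrast can be sketched as follows; the block size, key values, and record layout are illustrative assumptions.

```python
# Dense vs. sparse index over a sorted file, one block per 3 records.
records = [{"id": i} for i in (5, 10, 15, 20, 25, 30)]
BLOCK_SIZE = 3
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per record -> (block number, offset).
dense = {rec["id"]: (b, o) for b, blk in enumerate(blocks) for o, rec in enumerate(blk)}

# Sparse index: one entry per block (the first key of each block).
sparse = [(blk[0]["id"], b) for b, blk in enumerate(blocks)]

def sparse_lookup(key):
    # Find the last block whose first key <= search key, then scan that block.
    target = max((b for first, b in sparse if first <= key), default=0)
    for rec in blocks[target]:
        if rec["id"] == key:
            return rec
    return None

print(len(dense), len(sparse))  # 6 2 -- one entry per record vs. per block
```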
What is the clustered index?
In a clustered index, table records are sorted physically to
match the index.
What is the Secondary Index?
• A secondary index is an index whose search key specifies an order different from the sequential order of the file. It is also called a non-clustering index.
• A secondary index manages the index in multiple levels.
• Multi-level indexing is an advancement over the secondary index; in multi-level indexing we use more and more levels of index.
Multilevel indexes
• The purpose of multilevel indexing is to reduce the number of block accesses required to locate a record.
• More than one level of index files is maintained.
• Every level reduces the number of block accesses required by a factor of bfr (the blocking factor). This is called the fan-out of the multilevel index.
• The first level is an ordered file with a distinct value for each K(i).
• The second level is a primary index for the first level, with one entry per block of the first level.
• The third level is a primary index for the second level, and so on, until an index level fits in a single block. This is called the top index level.
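The per-level reduction by a factor of bfr can be checked with a short calculation; the entry count and blocking factor below are hypothetical.

```python
# How many index levels are needed until the top level fits in one block,
# assuming 100,000 first-level entries and a blocking factor bfr = 100.
from math import ceil

def index_levels(entries, bfr):
    levels, blocks = 0, entries
    while blocks > 1:
        blocks = ceil(blocks / bfr)  # each level has one entry per lower block
        levels += 1
    return levels

print(index_levels(100_000, 100))  # 100,000 -> 1,000 -> 10 -> 1 : 3 levels
```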
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• Tree data structure terminology:
• A tree is formed of nodes.
• Each node (except the root) has one parent and zero or more child nodes.
• A leaf node has no child nodes.
• A tree is unbalanced if leaf nodes occur at different levels.
• A nonleaf node is called an internal node.
• The subtree of a node consists of that node and all its descendant nodes.
Difference Between B-Tree and B+ Tree

B-Tree:
• Data is stored in leaf nodes as well as internal nodes.
• Searching is a bit slower, as data is stored in internal as well as leaf nodes.
• No redundant search keys are present.
• The deletion operation is complex.
• Leaf nodes cannot be linked together.

B+ Tree:
• Data is stored only in leaf nodes.
• Searching is faster, as the data is stored only in the leaf nodes.
• Redundant search keys may be present.
• The deletion operation is easy, as data can be deleted directly from the leaf nodes.
• Leaf nodes are linked together to form a linked list.
Motivation
• Tree-based data structures
– O(log N) access time (Find, Insert, Delete)
• Can we do better than this?
– If we consider the average case rather than the worst case, is there an O(1)-time approach with high probability?
– Hashing is such a data structure: it allows efficient insertion, deletion, and searching of keys in O(1) time on average.
• Numerous applications
– Symbol tables of variables in compilers
– Virtual-to-physical memory translation in operating systems
– String matching
Components of Hashing
• Hash table is an array of some fixed size, containing the items.
– Generally a search is performed on some part of the item, called the
key.
– The item could consist of a string or a number (that serves as the key)
and additional data members (for instance, a name that is part of a
large employee structure).
– The size of the table is TableSize.
• The hash function h(k) maps a search key k to some location in the hash table in the range [0..TableSize-1]. Different keys might be mapped (hashed) to the same location; this is called a collision.
General Idea

• Insertion: Compute the location in the hash table for the input item and
insert it into the table.
• Deletion: Compute the location in the hash table for the input item and
remove it from the table.
Issues in Hashing
• How to select a "good" hash function?
– It should be easy to compute.
– It should distribute the keys evenly among the hash table slots.
• How big should the hash table be?
• How to resolve collisions?
Hashing Function
• When the keys of the data items are integers, key % TableSize can be used as a hash function.
Tip: a good idea is to choose a prime as the table size. If the total number of items is N, then you might choose the first prime larger than N as the table size.

Example: items {18, 23, 26, 9, 7} with TableSize = 7 and hash function key % TableSize.
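The mapping in the example can be reproduced directly; note that TableSize = 7 is the first prime larger than the number of items (N = 5).

```python
# Map each item to its slot with the hash function key % TableSize.
TABLE_SIZE = 7
items = [18, 23, 26, 9, 7]
slots = {k: k % TABLE_SIZE for k in items}
print(slots)  # {18: 4, 23: 2, 26: 5, 9: 2, 7: 0} -- 23 and 9 collide at slot 2
```

The collision between 23 and 9 is exactly what the collision-resolution techniques in the following sections address.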
Hash-based indexes
• Hash-based indexes are best for equality selections; they cannot support range searches.
• Static and dynamic hashing techniques exist; the trade-offs are similar to ISAM vs. B+ trees.
• Recall the 3 alternatives for data entries k*:
1. The data record itself, with key value k
2. <k, rid of data record with search key value k>
3. <k, list of rids of data records with search key k>
The choice is orthogonal to the indexing technique.
Static Hashing
• In static hashing, for a given search key value the designed hash function always computes the same address.
• For example, if a mod(4) hash function is used, it generates only 4 values (0 to 3).
• The number of buckets provided remains unchanged at all times.
• Bucket address = h(K): the address of the desired data item, used for insertion, update and deletion operations.
Static Hashing
• The number of primary pages is fixed; they are allocated sequentially and never de-allocated; overflow pages are used if needed.
• A simple hash function (for N buckets): h(k) = k MOD N gives the bucket number where the data entry with key k belongs.
• h(key) maps each key to one of the buckets 0 .. N-1; each bucket consists of a primary bucket page plus any overflow pages.
Static hashing comes with the following disadvantages:
• It cannot work efficiently with databases that need to scale.
• It is not a good option for large databases.
• Bucket overflow occurs if there is more data and less memory.
Open Hashing (Separate Chaining)
• Collisions are resolved by using a list of elements to store objects with the same hash value together.
• Suppose you wish to store the set of numbers {36, 18, 72, 43, 6, 10, 5, 15} in a hash table of size 8.
• Assume that we have a hash function H such that H(x) = x % 8.
• Mapping the given data with this hash function gives the corresponding values {4, 2, 0, 3, 6, 2, 5, 7}.
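A minimal separate-chaining table for the numbers above can be sketched as follows: each slot holds a list (chain) of all keys that hash to it.

```python
# Separate chaining: collisions land in the same slot's chain.
TABLE_SIZE = 8
table = [[] for _ in range(TABLE_SIZE)]

def insert(key):
    table[key % TABLE_SIZE].append(key)

def contains(key):
    return key in table[key % TABLE_SIZE]  # only one chain is scanned

for k in (36, 18, 72, 43, 6, 10, 5, 15):
    insert(k)

print(table[2])  # [18, 10] -- both 18 and 10 hash to slot 2
```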
Closed Hashing (Open Addressing)
• This collision resolution technique requires a hash table of fixed, known size. During insertion, if a collision is encountered, alternative cells are tried until an empty bucket is found. These techniques require the size of the hash table to be larger than the number of objects to be stored (a load factor < 1 is ideal).
• There are various methods for finding these empty buckets:
a. Linear probing
b. Quadratic probing
c. Double hashing
Collision Resolution: Open Addressing
• If a collision happens at one hash table cell, look for some other cell in the table that is free.
• Problems with open addressing:
– When a collision occurs, we need a good way to look for a free cell in the hash table.
– The size of the hash table should be larger than the total number of items (normally about two times the total number of items).
• General strategy for looking for a free cell:
hi(x) = (hash(x) + f(i)) mod TableSize
where f(0) = 0 and f is the collision resolution strategy.
– The basic idea: if a collision occurs, try h1(x), h2(x), … to find the first free cell.
– Open addressing depends on the collision resolution strategy f.
Linear Probing
• In linear probing, f is a linear function of i, typically f(i) = i.
• This means trying cells sequentially (with wraparound) in search of an empty cell.
• For example: TableSize = 10, hash(x) = x mod 10, f(i) = i,
hi(x) = (hash(x) + f(i)) mod TableSize
Insert keys: 89, 18, 49, 58, 69
– h0(89) = 89 mod 10 = 9 → slot 9
– h0(18) = 18 mod 10 = 8 → slot 8
– h0(49) = 9 (occupied); h1(49) = (9 + 1) mod 10 = 0 → slot 0
– h0(58) = 8; h1(58) = 9; h2(58) = 0 (all occupied); h3(58) = (8 + 3) mod 10 = 1 → slot 1
– h0(69) = 9; h1(69) = 0; h2(69) = 1 (all occupied); h3(69) = (9 + 3) mod 10 = 2 → slot 2
• Final table: 0: 49, 1: 58, 2: 69, 8: 18, 9: 89 (slots 3-7 empty).
• Question: if f(i) = 2*i, into which location should we put 49?
• As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large.
• In the worst case, even if the table is relatively empty, blocks of occupied cells start forming.
• In the average case, the access time is still O(1).
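The insertion sequence above can be sketched directly, reproducing the final table for the keys 89, 18, 49, 58, 69.

```python
# Linear probing with f(i) = i: probe successive slots until one is free.
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key):
    for i in range(TABLE_SIZE):                    # f(i) = i
        slot = (key % TABLE_SIZE + i) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full")

for k in (89, 18, 49, 58, 69):
    insert(k)

print(table)  # [49, 58, 69, None, None, None, None, None, 18, 89]
```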
Quadratic Probing
• In quadratic probing, f is a quadratic function of i, typically f(i) = i².
• For example: TableSize = 10, hash(x) = x mod 10, f(i) = i²,
hi(x) = (hash(x) + f(i)) mod TableSize
Insert keys: 89, 18, 49, 58, 69
– h0(89) = 89 mod 10 = 9 → slot 9
– h0(18) = 18 mod 10 = 8 → slot 8
– h0(49) = 9 (occupied); h1(49) = (9 + 1²) mod 10 = 0 → slot 0
– h0(58) = 8; h1(58) = (8 + 1²) mod 10 = 9 (both occupied); h2(58) = (8 + 2²) mod 10 = 2 → slot 2 (by linear probing, this needed 3 probes)
– h0(69) = 9; h1(69) = (9 + 1²) mod 10 = 0 (both occupied); h2(69) = (9 + 2²) mod 10 = 3 → slot 3 (by linear probing, this needed 3 probes)
• Final table: 0: 49, 2: 58, 3: 69, 8: 18, 9: 89.
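The same insertions with quadratic probing differ from the linear-probing sketch only in the probe function.

```python
# Quadratic probing with f(i) = i**2.
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key):
    for i in range(TABLE_SIZE):                        # f(i) = i**2
        slot = (key % TABLE_SIZE + i * i) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot probed")

for k in (89, 18, 49, 58, 69):
    insert(k)

print(table)  # [49, None, 58, 69, None, None, None, None, 18, 89]
```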
Double Hashing
• In double hashing, there are two hash functions. The second hash function provides an offset value in case the first function causes a collision.
• The following function is an example of double hashing:
(firstHash(key) + i * secondHash(key)) % tableSize
• In the computation above, the value of i keeps incrementing (the offset keeps increasing) until an empty slot is found.
• The final hashing function looks like:
H(x, i) = (H1(x) + i*H2(x)) % N
• Typically, for H1(x) = x % N, a good H2 is H2(x) = P - (x % P), where P is a prime number smaller than N.
• A good H2 is a function that never evaluates to zero and ensures that all the cells of the table are effectively traversed.
Double Hashing
• For double hashing, one popular choice is f(i) = i * hash2(x).
• This formula says that we apply a second hash function to x and probe at distances hash2(x), 2·hash2(x), 3·hash2(x), and so on.
• The choice of hash2(x) is essential:
– The function must never evaluate to zero.
– It is important to make sure all cells can be probed.
– A function such as hash2(x) = R − (x mod R), with R a prime smaller than TableSize, works well.
– TableSize needs to be prime.
• The cost of double hashing is the computation of a second hash function.
Examples for Double Hashing
• For example: TableSize = 10, hash(x) = x mod 10, f(i) = i*hash2(x), hash2(x) = R − (x mod R), R = 7,
hi(x) = (hash(x) + f(i)) mod TableSize
Insert keys: 89, 18, 49, 58, 69
– h0(89) = 89 mod 10 = 9 → slot 9
– h0(18) = 18 mod 10 = 8 → slot 8
– h0(49) = 9 (occupied); hash2(49) = 7 − (49 mod 7) = 7; h1(49) = (9 + 1·7) mod 10 = 6 → slot 6
– h0(58) = 8 (occupied); hash2(58) = 7 − (58 mod 7) = 5; h1(58) = (8 + 1·5) mod 10 = 3 → slot 3 (quadratic probing needed 2 probes, linear probing 3 probes)
– h0(69) = 9 (occupied); hash2(69) = 7 − (69 mod 7) = 1; h1(69) = (9 + 1·1) mod 10 = 0 → slot 0 (quadratic probing needed 2 probes, linear probing 3 probes)
• Final table: 0: 69, 3: 58, 6: 49, 8: 18, 9: 89.
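The same insertions with double hashing can be sketched by swapping in f(i) = i * hash2(x).

```python
# Double hashing: the second hash function hash2(x) = R - (x % R), R = 7,
# provides the probe step size.
TABLE_SIZE, R = 10, 7
table = [None] * TABLE_SIZE

def hash2(key):
    return R - (key % R)                # never evaluates to zero

def insert(key):
    for i in range(TABLE_SIZE):         # f(i) = i * hash2(key)
        slot = (key % TABLE_SIZE + i * hash2(key)) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot probed")

for k in (89, 18, 49, 58, 69):
    insert(k)

print(table)  # [69, None, None, 58, None, None, 49, None, 18, 89]
```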
Dynamic Hashing (Extendible Hashing)
• The drawback of static hashing is that it does not expand or shrink its size based on the requirements of the database.
• Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand.
• Generally, the hash function in dynamic hashing is designed to produce a large number of values, of which only a few are used at the initial stages.
Basic Working of Extendible Hashing: Organization
• A prefix of the entire hash value is taken as the hash index.
• Only a portion of the hash value is used for computing bucket addresses.
• Every hash index has a depth value representing how many bits are used for computing bucket addresses.
• These bits can address 2^n buckets.
• When all these bits are consumed, i.e., when all the buckets are full, the depth value is incremented and twice the number of buckets is allocated.
• Tackling the overflow condition during data insertion: while inserting data into the buckets, it may happen that a bucket overflows. In such cases, we need an appropriate procedure to avoid mishandling of data. First, check whether the local depth is less than or equal to the global depth, then choose one of the cases below.
• Case 1: If the local depth of the overflowing bucket is equal to the global depth, then directory expansion as well as a bucket split needs to be performed. Then increment the global depth and the local depth value by 1, and assign appropriate pointers. Directory expansion doubles the number of directories present in the hash structure.
• Case 2: If the local depth is less than the global depth, then only a bucket split takes place. Then increment only the local depth value by 1, and assign appropriate pointers.
Example based on Extendible Hashing: now let us consider a prominent example, hashing the following elements: 16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26.
Bucket size: 3 (assumed).
Hash function: suppose the global depth is X; then the hash function returns the X least significant bits (LSBs).

Solution: First, calculate the binary form of each of the given numbers.
16 - 10000
4 - 00100
6 - 00110
22 - 10110
24 - 11000
10 - 01010
31 - 11111
7 - 00111
9 - 01001
20 - 10100
26 - 11010
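The binary forms and directory ids above can be computed directly: with global depth X, the hash function keeps the X least significant bits.

```python
# Extract the X least significant bits of a key, as used by the directory.
def lsb(key, depth):
    return format(key, "b")[-depth:].zfill(depth)  # X LSBs as a bit string

keys = [16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26]
print([format(k, "05b") for k in keys])  # the binary forms listed above
print([lsb(k, 1) for k in keys[:4]])     # at depth 1: ['0', '0', '0', '0']
```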
• Initially, the global depth and local depth are always 1. Thus, the hashing frame looks like this:
• Inserting 16: the binary format of 16 is 10000 and the global depth is 1. The hash function returns the 1 LSB of 10000, which is 0. Hence, 16 is mapped to the directory with id = 0.
• Inserting 4 and 6: both 4 (100) and 6 (110) have 0 as their LSB. Hence, they are hashed as follows:
• Inserting 22: the binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by directory 0 is already full; hence, an overflow occurs.
As directed by Case 1, since local depth = global depth, the bucket splits and directory expansion takes place, followed by rehashing of the numbers present in the overflowing bucket. Since the global depth is incremented by 1, the global depth is now 2. Hence, 16, 4, 6, 22 are rehashed w.r.t. their 2 LSBs [16 (10000), 4 (100), 6 (110), 22 (10110)].

Notice that the bucket that was not overflowing has remained untouched. But since the number of directories has doubled, we now have two directories, 01 and 11, pointing to the same bucket. This is because the local depth of that bucket has remained 1, and any bucket having a local depth less than the global depth is pointed to by more than one directory.
Inserting 24 and 10: 24 (11000) and 10 (01010) are hashed into the buckets of directories 00 and 10. Here we encounter no overflow condition.
Inserting 31, 7, 9: all of these elements [31 (11111), 7 (111), 9 (1001)] have either 01 or 11 as their 2 LSBs. Hence, they are mapped to the bucket pointed to by 01 and 11. We do not encounter any overflow condition here.
Inserting 20: insertion of data element 20 (10100) again causes an overflow problem. 20 is inserted into the bucket pointed to by 00. As directed by Case 1, since the local depth of the bucket = global depth, directory expansion (doubling) takes place along with a bucket split. Elements present in the overflowing bucket are rehashed with the new global depth. Now the new hash table looks like this:
Inserting 26: the global depth is 3, so the 3 LSBs of 26 (11010) are considered; 26 therefore fits in the bucket pointed to by directory 010. The bucket overflows, and, as directed by Case 2, since the local depth of the bucket < global depth (2 < 3), the directories are not doubled; only the bucket is split and its elements are rehashed.
Finally, the output of hashing the given list of numbers is obtained. Hashing of the 11 numbers is thus completed.

Key Observations:
• A bucket will have more than one pointer pointing to it if its local depth is less than the global depth.
• When an overflow condition occurs in a bucket, all the entries in the bucket are rehashed with a new local depth.
• If the local depth of the overflowing bucket is equal to the global depth, the directory is doubled as well; otherwise only the bucket is split.
• The size of a bucket cannot be changed after the data insertion process begins.
Comparison of indexing and hashing
• Ordered (tree-based) indexes support both equality and range queries; hash-based indexes are best for equality selections and cannot support range searches.