Unit5 File Organization

The document discusses various file organization techniques in DBMS like sequential file organization, heap file organization, hash file organization, clustered file organization, and B+ tree file organization. It also covers indexing techniques like primary indexing, secondary indexing, and multilevel indexing. The main objective of file organization is efficient storage and fast retrieval of records from the database. Indexing improves query performance by reducing disk accesses during search operations.


Unit-5: File Organization & Indexes in DBMS
Data on external storage:
• Data in a DBMS is stored on storage devices such as disks and tapes.
• The disk space manager is responsible for keeping track of available disk space.
• The file manager, which provides the abstraction of a file of records to higher levels of DBMS code, issues requests to the disk space manager to obtain and relinquish space on disk.
Storage device hierarchy
At the top we have primary storage, which consists of cache and main memory and provides very fast access to data. Then comes secondary storage, which consists of slower devices such as magnetic disks. Tertiary storage is the slowest class of storage devices; examples are optical disks and tapes.
File organization in DBMS
• File organization is the practice of storing files in a specific order.
• File organization describes the logical connections between the different records that make up a file, especially the methods for identifying and accessing any particular record.
• File structure describes the layout of logical control records, label and data blocks, and any other such blocks.
Purpose
The main objectives of file organization are:
• Optimal selection of records, i.e., records should be accessed as fast as possible.
• Any insert, update or delete transaction on records should be easy and quick, and should not harm other records.
• No duplicate records should be induced as a result of an insert, update or delete.
• Records should be stored efficiently so that the cost of storage is minimal.
The following are the types of file organization in
DBMS:
•Sequential File Organization
•Hash File Organization
•Heap File Organization
•Clustered File Organization
•B+ Tree File Organization
Sequential File Organization
• This method simply stores the records in files in sequential order, one after another, like a sequence of books on a bookshelf.
• To access a record, we must search through the sequence until we reach the desired record, which takes O(n) if the records are unsorted; if they are sorted, we can use binary search to access a record in O(log n).
• Depending on the ordering structure of the records, there are two ways to arrange them sequentially.
• Pile File Method
• Records are stored sequentially, one after another, and are appended at the end of the file in the same order in which we insert them into the table with SQL queries. Since the order of the records does not matter, each insertion is a simple O(1) append.
• Sorted Method
• As the name implies, the file in this method must always be kept in sorted order. The file is re-sorted using the primary key or another reference after each delete, insert, or update operation.
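The two access patterns above can be sketched as follows; the record layout (a list of dicts keyed by a hypothetical `id` field) is illustrative only.

```python
# Sketch of record access in a sequential file: a linear scan for an
# unsorted (pile) file, binary search for a sorted file.
from bisect import bisect_left

def scan_search(records, key):
    """O(n) scan of an unsorted pile file."""
    for rec in records:
        if rec["id"] == key:
            return rec
    return None

def sorted_search(records, key):
    """O(log n) binary search, assuming records are sorted by 'id'."""
    keys = [rec["id"] for rec in records]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i]
    return None

pile = [{"id": 7}, {"id": 2}, {"id": 9}]          # insertion order preserved
sorted_file = sorted(pile, key=lambda r: r["id"])  # kept sorted after updates
print(scan_search(pile, 9), sorted_search(sorted_file, 2))
```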
Heap File Organization in DBMS
• Data blocks are used in heap file organization.
• Records are inserted into the data blocks at the end of the file. This method does not call for any sorting or ordering.
• If a data block is full, the new record is stored in a new block. In this case, the new block does not necessarily have to be the next data block; it can be any block in memory. The task of managing and storing the new records falls to the DBMS.
• Since the records are not sorted and not stored in consecutive data blocks, searching for a record is a time-consuming process in this method. Update and delete operations also give poor performance, as the record must be searched for first, which is already a time-consuming operation. However, if the file size is small, these operations give one of the best performances compared to other methods, so this method is widely used for small files.
• This method requires memory optimization and cleanup, as it does not free the allocated data block after a record is deleted.
Hash File Organization in DBMS
• In this method, a hash function is used to compute the address of the data block in memory where the record is stored. The hash function is applied to certain columns of the record, known as hash columns, to compute the block address. These columns/fields can be either key or non-key attributes.
• The following diagram demonstrates hash file organization. As shown here, the records are stored in the database in no particular order, and the data blocks are not consecutive. The memory addresses are computed by applying the hash function to certain attributes of the records.
• Fetching a record is faster in this method, as the record can be accessed using the hash key column; there is no need to search through the entire file to fetch a record.
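A minimal sketch of the idea, with a hypothetical `emp_id` hash column and block count; a real DBMS maps block addresses to disk pages rather than a dict.

```python
# Hash file organization sketch: the block address of a record is computed
# by hashing a chosen hash column.
NUM_BLOCKS = 8
blocks = {addr: [] for addr in range(NUM_BLOCKS)}  # non-consecutive data blocks

def block_address(key):
    return hash(key) % NUM_BLOCKS   # hash column value -> block address

def insert(record, hash_column="emp_id"):
    blocks[block_address(record[hash_column])].append(record)

def fetch(key, hash_column="emp_id"):
    # Only one block is examined -- no scan of the whole file.
    for rec in blocks[block_address(key)]:
        if rec[hash_column] == key:
            return rec
    return None

insert({"emp_id": 42, "name": "A"})
print(fetch(42))  # {'emp_id': 42, 'name': 'A'}
```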
Cluster File Organization in DBMS
• In clustered file organization, records from multiple tables are combined into a single file based on a cluster key or hash cluster. These files store records of multiple tables in the same memory blocks, all joined on a single cluster key/hash key common to the tables.
B+ Tree File Organization in DBMS
• B+ tree file organization works with the key and index values of the records. It stores the records in a tree-like structure, which is why it is known as B+ tree file organization. The leaf nodes store the records, and the intermediate nodes contain pointers to the leaf nodes; these intermediate nodes do not store any records.
• The root node and intermediate nodes contain a key field and an index field. The key field is the primary key of a record, which can be used to uniquely identify it; the index field contains the pointer (address) to the leaf node where the actual record is stored.
Indexing and Hashing in
DBMS
Indexing in DBMS
• Indexing is a technique for improving database
performance by reducing the number of disk accesses
necessary when a query is run.
• An index is a form of data structure. It’s used to swiftly
identify and access data and information present in a
database table.
• An index is a small table with only two columns, which together form key-value pairs: copies of specific columns from the tabular data of the database (the keys) and references to where that data is stored (the values).
Structure of Index
We can create indices using some columns of the database.
• The search key is the index's first column; it contains a duplicate or copy of the table's candidate key or primary key. The primary key values are saved in sorted order so that the related data can be quickly accessed.
• The data reference is the index's second column. It contains a group of pointers that point to the disk block where the value of the specific key can be found.
Advantages of indexing
There are many advantages of indexing. Some of them are mentioned here.
1. Better performance of queries.
2. Fast searching in the database.
3. Fast retrieval of data.
4. Increased performance of SELECT queries.
Disadvantages of indexing
1. Indexing takes extra space.
2. Decreased performance of INSERT, DELETE and UPDATE queries.
Methods of Indexing
Ordered index:
• An ordered index is an index file whose index entries are sorted in the order of the search key.
Primary index:
• A primary index is an ordered index whose search key is also the sort key used for the sequential file.
Primary indexing in DBMS is further divided into two types.
• Dense Index: each record in the main table has exactly one entry in the index table.
• Sparse Index: when database tables are large, a dense index grows with them; the sparse index is the solution to this problem. In a sparse index, an index entry points to a group of records in the main table, so one index entry can point to more than one record of the main table.
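The contrast can be sketched as follows; the block size, key values, and record layout are illustrative assumptions.

```python
# Dense vs. sparse index over a sorted file, one block per 3 records.
records = [{"id": i} for i in (5, 10, 15, 20, 25, 30)]
BLOCK_SIZE = 3
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per record -> (block number, offset).
dense = {rec["id"]: (b, o) for b, blk in enumerate(blocks) for o, rec in enumerate(blk)}

# Sparse index: one entry per block (the first key of each block).
sparse = [(blk[0]["id"], b) for b, blk in enumerate(blocks)]

def sparse_lookup(key):
    # Find the last block whose first key <= search key, then scan that block.
    target = max((b for first, b in sparse if first <= key), default=0)
    for rec in blocks[target]:
        if rec["id"] == key:
            return rec
    return None

print(len(dense), len(sparse))  # 6 2 -- one entry per record vs. per block
```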
What is the clustered index?
In a clustered index, table records are sorted physically to
match the index.
What is the Secondary Index?
• A secondary index is an index whose search key specifies an order different from the sequential order of the file. It is also called a non-clustering index.
• A secondary index manages the index in multiple levels.
• Multi-level indexing is an advancement over the secondary index; in multi-level indexing we use more and more levels of index.
Multilevel indexes
• The purpose of multilevel indexing is to reduce the number of block accesses required to locate a record.
• More than one level of index files is maintained.
• Every level reduces the number of block accesses required by a factor of bfr (the blocking factor). This is called the fan-out of the multilevel index.
• The first level is an ordered file with a distinct value for each K(i).
• The second level is a primary index for the first level, with one entry per block of the first level.
• The third level is a primary index for the second level, and so on, until an index level fits in a single block. This is called the top index level.
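The per-level reduction by a factor of bfr can be checked with a short calculation; the entry count and blocking factor below are hypothetical.

```python
# How many index levels are needed until the top level fits in one block,
# assuming 100,000 first-level entries and a blocking factor bfr = 100.
from math import ceil

def index_levels(entries, bfr):
    levels, blocks = 0, entries
    while blocks > 1:
        blocks = ceil(blocks / bfr)  # each level has one entry per lower block
        levels += 1
    return levels

print(index_levels(100_000, 100))  # 100,000 -> 1,000 -> 10 -> 1 : 3 levels
```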
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• Tree data structure terminology:
• A tree is formed of nodes.
• Each node (except the root) has one parent and zero or more child nodes.
• A leaf node has no child nodes.
• A tree is unbalanced if leaf nodes occur at different levels.
• A nonleaf node is called an internal node.
• The subtree of a node consists of that node and all its descendant nodes.
Difference Between B-Tree and B+ Tree

B-Tree:
• Data is stored in leaf nodes as well as internal nodes.
• Searching is a bit slower, as data is stored in internal as well as leaf nodes.
• No redundant search keys are present.
• The deletion operation is complex.
• Leaf nodes cannot be linked together.

B+ Tree:
• Data is stored only in leaf nodes.
• Searching is faster, as the data is stored only in the leaf nodes.
• Redundant search keys may be present.
• The deletion operation is easy, as data can be deleted directly from the leaf nodes.
• Leaf nodes are linked together to form a linked list.
Motivation
• Tree-based data structures
– O(log N) access time (Find, Insert, Delete)
• Can we do better than this?
– If we consider the average case rather than the worst case, is there an O(1)-time approach with high probability?
– Hashing is such a data structure: it allows efficient insertion, deletion, and searching of keys in O(1) time on average.
• Numerous applications
– Symbol tables of variables in compilers
– Virtual-to-physical memory translation in operating systems
– String matching
Components of Hashing
• Hash table is an array of some fixed size, containing the items.
– Generally a search is performed on some part of the item, called the
key.
– The item could consist of a string or a number (that serves as the key)
and additional data members (for instance, a name that is part of a
large employee structure).
– The size of the table is TableSize.
• The hash function h(k) maps a search key k to some location in the hash table in the range [0..TableSize-1]. Different keys might be mapped (hashed) to the same location; this is called a collision.
General Idea

• Insertion: Compute the location in the hash table for the input item and
insert it into the table.
• Deletion: Compute the location in the hash table for the input item and
remove it from the table.
Issues in Hashing
• How to select a "good" hash function?
– It should be easy to compute.
– It should distribute the keys evenly among the hash table slots.
• How big should the hash table be?
• How to resolve collisions?
Hashing Function
• When the keys of the data items are integers, key % TableSize can be used as a hash function.
Tip: a good idea is to choose a prime as the table size. If the total number of items is N, then you might choose the first prime larger than N as the table size.

Example: items {18, 23, 26, 9, 7} with TableSize = 7 and hash function key % TableSize.
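The mapping in the example can be reproduced directly; note that TableSize = 7 is the first prime larger than the number of items (N = 5).

```python
# Map each item to its slot with the hash function key % TableSize.
TABLE_SIZE = 7
items = [18, 23, 26, 9, 7]
slots = {k: k % TABLE_SIZE for k in items}
print(slots)  # {18: 4, 23: 2, 26: 5, 9: 2, 7: 0} -- 23 and 9 collide at slot 2
```

The collision between 23 and 9 is exactly what the collision-resolution techniques in the following sections address.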
Hash-based indexes
• Hash-based indexes are best for equality selections; they cannot support range searches.
• Static and dynamic hashing techniques exist; the trade-offs are similar to ISAM vs. B+ trees.
• Recall the 3 alternatives for data entries k*:
1. The data record itself, with key value k
2. <k, rid of data record with search key value k>
3. <k, list of rids of data records with search key k>
The choice is orthogonal to the indexing technique.
Static Hashing
• In static hashing, for a given search key value the designed hash function always computes the same address.
• For example, if a mod(4) hash function is used, it generates only 4 values (0 to 3).
• The number of buckets provided remains unchanged at all times.
• Bucket address = h(K): the address of the desired data item, used for insertion, update and deletion operations.
Static Hashing
• The number of primary pages is fixed; they are allocated sequentially and never de-allocated; overflow pages are used if needed.
• A simple hash function (for N buckets): h(k) = k MOD N gives the bucket number where the data entry with key k belongs.
• h(key) maps each key to one of the buckets 0 .. N-1; each bucket consists of a primary bucket page plus any overflow pages.
Static hashing comes with the following disadvantages:
• It cannot work efficiently with databases that need to scale.
• It is not a good option for large databases.
• Bucket overflow occurs if there is more data and less memory.
Open Hashing (Separate Chaining)
• Collisions are resolved by using a list of elements to store objects with the same hash value together.
• Suppose you wish to store the set of numbers {36, 18, 72, 43, 6, 10, 5, 15} in a hash table of size 8.
• Assume that we have a hash function H such that H(x) = x % 8.
• Mapping the given data with this hash function gives the corresponding values {4, 2, 0, 3, 6, 2, 5, 7}.
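A minimal separate-chaining table for the numbers above can be sketched as follows: each slot holds a list (chain) of all keys that hash to it.

```python
# Separate chaining: collisions land in the same slot's chain.
TABLE_SIZE = 8
table = [[] for _ in range(TABLE_SIZE)]

def insert(key):
    table[key % TABLE_SIZE].append(key)

def contains(key):
    return key in table[key % TABLE_SIZE]  # only one chain is scanned

for k in (36, 18, 72, 43, 6, 10, 5, 15):
    insert(k)

print(table[2])  # [18, 10] -- both 18 and 10 hash to slot 2
```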
Closed Hashing (Open Addressing)
• This collision resolution technique requires a hash table of fixed, known size. During insertion, if a collision is encountered, alternative cells are tried until an empty bucket is found. These techniques require the size of the hash table to be larger than the number of objects to be stored (a load factor < 1 is ideal).
• There are various methods for finding these empty buckets:
a. Linear probing
b. Quadratic probing
c. Double hashing
Collision Resolution: Open Addressing
• If a collision happens at one hash table cell, look for some other cell in the table that is free.
• Problems with open addressing:
– When a collision occurs, we need a good way to look for a free cell in the hash table.
– The size of the hash table should be larger than the total number of items (normally about two times the total number of items).
• General strategy for looking for a free cell:
hi(x) = (hash(x) + f(i)) mod TableSize
where f(0) = 0 and f is the collision resolution strategy.
– The basic idea: if a collision occurs, try h1(x), h2(x), … to find the first free cell.
– Open addressing depends on the collision resolution strategy f.
Linear Probing
• In linear probing, f is a linear function of i, typically f(i) = i.
• This means trying cells sequentially (with wraparound) in search of an empty cell.
• For example: TableSize = 10, hash(x) = x mod 10, f(i) = i,
hi(x) = (hash(x) + f(i)) mod TableSize
Insert keys: 89, 18, 49, 58, 69
– h0(89) = 89 mod 10 = 9 → slot 9
– h0(18) = 18 mod 10 = 8 → slot 8
– h0(49) = 9 (occupied); h1(49) = (9 + 1) mod 10 = 0 → slot 0
– h0(58) = 8; h1(58) = 9; h2(58) = 0 (all occupied); h3(58) = (8 + 3) mod 10 = 1 → slot 1
– h0(69) = 9; h1(69) = 0; h2(69) = 1 (all occupied); h3(69) = (9 + 3) mod 10 = 2 → slot 2
• Final table: 0: 49, 1: 58, 2: 69, 8: 18, 9: 89 (slots 3-7 empty).
• Question: if f(i) = 2*i, into which location should we put 49?
• As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large.
• In the worst case, even if the table is relatively empty, blocks of occupied cells start forming.
• In the average case, the access time is still O(1).
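The insertion sequence above can be sketched directly, reproducing the final table for the keys 89, 18, 49, 58, 69.

```python
# Linear probing with f(i) = i: probe successive slots until one is free.
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key):
    for i in range(TABLE_SIZE):                    # f(i) = i
        slot = (key % TABLE_SIZE + i) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full")

for k in (89, 18, 49, 58, 69):
    insert(k)

print(table)  # [49, 58, 69, None, None, None, None, None, 18, 89]
```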
Quadratic Probing
• In quadratic probing, f is a quadratic function of i, typically f(i) = i².
• For example: TableSize = 10, hash(x) = x mod 10, f(i) = i²,
hi(x) = (hash(x) + f(i)) mod TableSize
Insert keys: 89, 18, 49, 58, 69
– h0(89) = 89 mod 10 = 9 → slot 9
– h0(18) = 18 mod 10 = 8 → slot 8
– h0(49) = 9 (occupied); h1(49) = (9 + 1²) mod 10 = 0 → slot 0
– h0(58) = 8; h1(58) = (8 + 1²) mod 10 = 9 (both occupied); h2(58) = (8 + 2²) mod 10 = 2 → slot 2 (by linear probing, this needed 3 probes)
– h0(69) = 9; h1(69) = (9 + 1²) mod 10 = 0 (both occupied); h2(69) = (9 + 2²) mod 10 = 3 → slot 3 (by linear probing, this needed 3 probes)
• Final table: 0: 49, 2: 58, 3: 69, 8: 18, 9: 89.
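The same insertions with quadratic probing differ from the linear-probing sketch only in the probe function.

```python
# Quadratic probing with f(i) = i**2.
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key):
    for i in range(TABLE_SIZE):                        # f(i) = i**2
        slot = (key % TABLE_SIZE + i * i) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot probed")

for k in (89, 18, 49, 58, 69):
    insert(k)

print(table)  # [49, None, 58, 69, None, None, None, None, 18, 89]
```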
Double Hashing
• In double hashing, there are two hash functions. The second hash function provides an offset value in case the first function causes a collision.
• The following function is an example of double hashing:
(firstHash(key) + i * secondHash(key)) % tableSize
• In the computation above, the value of i keeps incrementing (the offset keeps increasing) until an empty slot is found.
• The final hashing function looks like:
H(x, i) = (H1(x) + i*H2(x)) % N
• Typically, for H1(x) = x % N, a good H2 is H2(x) = P - (x % P), where P is a prime number smaller than N.
• A good H2 is a function that never evaluates to zero and ensures that all the cells of the table are effectively traversed.
Double Hashing
• For double hashing, one popular choice is f(i) = i * hash2(x).
• This formula says that we apply a second hash function to x and probe at distances hash2(x), 2·hash2(x), 3·hash2(x), and so on.
• The choice of hash2(x) is essential:
– The function must never evaluate to zero.
– It is important to make sure all cells can be probed.
– A function such as hash2(x) = R − (x mod R), with R a prime smaller than TableSize, works well.
– TableSize needs to be prime.
• The cost of double hashing is the computation of a second hash function.
Examples for Double Hashing
• For example: TableSize = 10, hash(x) = x mod 10, f(i) = i*hash2(x), hash2(x) = R − (x mod R), R = 7,
hi(x) = (hash(x) + f(i)) mod TableSize
Insert keys: 89, 18, 49, 58, 69
– h0(89) = 89 mod 10 = 9 → slot 9
– h0(18) = 18 mod 10 = 8 → slot 8
– h0(49) = 9 (occupied); hash2(49) = 7 − (49 mod 7) = 7; h1(49) = (9 + 1·7) mod 10 = 6 → slot 6
– h0(58) = 8 (occupied); hash2(58) = 7 − (58 mod 7) = 5; h1(58) = (8 + 1·5) mod 10 = 3 → slot 3 (quadratic probing needed 2 probes, linear probing 3 probes)
– h0(69) = 9 (occupied); hash2(69) = 7 − (69 mod 7) = 1; h1(69) = (9 + 1·1) mod 10 = 0 → slot 0 (quadratic probing needed 2 probes, linear probing 3 probes)
• Final table: 0: 69, 3: 58, 6: 49, 8: 18, 9: 89.
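The same insertions with double hashing can be sketched by swapping in f(i) = i * hash2(x).

```python
# Double hashing: the second hash function hash2(x) = R - (x % R), R = 7,
# provides the probe step size.
TABLE_SIZE, R = 10, 7
table = [None] * TABLE_SIZE

def hash2(key):
    return R - (key % R)                # never evaluates to zero

def insert(key):
    for i in range(TABLE_SIZE):         # f(i) = i * hash2(key)
        slot = (key % TABLE_SIZE + i * hash2(key)) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot probed")

for k in (89, 18, 49, 58, 69):
    insert(k)

print(table)  # [69, None, None, 58, None, None, 49, None, 18, 89]
```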
Dynamic Hashing (Extendible Hashing)
• The drawback of static hashing is that it does not expand or shrink its size based on the requirements of the database.
• Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand.
• Generally, the hash function in dynamic hashing is designed to produce a large number of values, of which only a few are used at the initial stages.
Basic Working of Extendible Hashing: Organization
• A prefix of the entire hash value is taken as the hash index.
• Only a portion of the hash value is used for computing bucket addresses.
• Every hash index has a depth value representing how many bits are used for computing bucket addresses.
• These bits can address 2^n buckets.
• When all these bits are consumed, i.e., when all the buckets are full, the depth value is incremented and twice the number of buckets is allocated.
• Tackling the overflow condition during data insertion: while inserting data into the buckets, it may happen that a bucket overflows. In such cases, we need an appropriate procedure to avoid mishandling of data. First, check whether the local depth is less than or equal to the global depth, then choose one of the cases below.
• Case 1: If the local depth of the overflowing bucket is equal to the global depth, then directory expansion as well as a bucket split needs to be performed. Then increment the global depth and the local depth value by 1, and assign appropriate pointers. Directory expansion doubles the number of directories present in the hash structure.
• Case 2: If the local depth is less than the global depth, then only a bucket split takes place. Then increment only the local depth value by 1, and assign appropriate pointers.
Example based on Extendible Hashing: now let us consider a prominent example, hashing the following elements: 16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26.
Bucket size: 3 (assumed).
Hash function: suppose the global depth is X; then the hash function returns the X least significant bits (LSBs).

Solution: First, calculate the binary form of each of the given numbers.
16 - 10000
4 - 00100
6 - 00110
22 - 10110
24 - 11000
10 - 01010
31 - 11111
7 - 00111
9 - 01001
20 - 10100
26 - 11010
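The binary forms and directory ids above can be computed directly: with global depth X, the hash function keeps the X least significant bits.

```python
# Extract the X least significant bits of a key, as used by the directory.
def lsb(key, depth):
    return format(key, "b")[-depth:].zfill(depth)  # X LSBs as a bit string

keys = [16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26]
print([format(k, "05b") for k in keys])  # the binary forms listed above
print([lsb(k, 1) for k in keys[:4]])     # at depth 1: ['0', '0', '0', '0']
```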
• Initially, the global depth and local depth are always 1. Thus, the hashing frame looks like this:
• Inserting 16: the binary format of 16 is 10000 and the global depth is 1. The hash function returns the 1 LSB of 10000, which is 0. Hence, 16 is mapped to the directory with id = 0.
• Inserting 4 and 6: both 4 (100) and 6 (110) have 0 as their LSB. Hence, they are hashed as follows:
• Inserting 22: the binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by directory 0 is already full; hence, an overflow occurs.
As directed by Case 1, since local depth = global depth, the bucket splits and directory expansion takes place, followed by rehashing of the numbers present in the overflowing bucket. Since the global depth is incremented by 1, the global depth is now 2. Hence, 16, 4, 6, 22 are rehashed w.r.t. their 2 LSBs [16 (10000), 4 (100), 6 (110), 22 (10110)].

Notice that the bucket that was not overflowing has remained untouched. But since the number of directories has doubled, we now have two directories, 01 and 11, pointing to the same bucket. This is because the local depth of that bucket has remained 1, and any bucket having a local depth less than the global depth is pointed to by more than one directory.
Inserting 24 and 10: 24 (11000) and 10 (01010) are hashed into the buckets of directories 00 and 10. Here we encounter no overflow condition.
Inserting 31, 7, 9: all of these elements [31 (11111), 7 (111), 9 (1001)] have either 01 or 11 as their 2 LSBs. Hence, they are mapped to the bucket pointed to by 01 and 11. We do not encounter any overflow condition here.
Inserting 20: insertion of data element 20 (10100) again causes an overflow problem. 20 is inserted into the bucket pointed to by 00. As directed by Case 1, since the local depth of the bucket = global depth, directory expansion (doubling) takes place along with a bucket split. Elements present in the overflowing bucket are rehashed with the new global depth. Now the new hash table looks like this:
Inserting 26: the global depth is 3, so the 3 LSBs of 26 (11010) are considered; 26 therefore fits in the bucket pointed to by directory 010. The bucket overflows, and, as directed by Case 2, since the local depth of the bucket < global depth (2 < 3), the directories are not doubled; only the bucket is split and its elements are rehashed.
Finally, the output of hashing the given list of numbers is obtained. Hashing of the 11 numbers is thus completed.

Key Observations:
• A bucket will have more than one pointer pointing to it if its local depth is less than the global depth.
• When an overflow condition occurs in a bucket, all the entries in the bucket are rehashed with a new local depth.
• If the local depth of the overflowing bucket is equal to the global depth, the directory is doubled as well; otherwise only the bucket is split.
• The size of a bucket cannot be changed after the data insertion process begins.
Comparison of indexing and hashing
• Ordered (tree-based) indexes support both equality and range queries; hash-based indexes are best for equality selections and cannot support range searches.