UNIT V Imp Questions
Note that each index file contains more entries, and each key serves as
a separator for the contents of the pages pointed to by the pointers to its
left and right.
B+ tree
The B+ tree is derived from the ISAM tree, but is fully dynamic with
respect to updates:
o Search performance depends only on the height of the B+ tree.
o There are no overflow pages; the B+ tree remains balanced.
o The B+ tree offers efficient insert/delete procedures, so the underlying
data file can grow/shrink dynamically.
o B+ tree nodes (except the root page) are guaranteed to have a minimum
occupancy of 50%.
UNIT-V (5 Marks)
1. Explain the insertion and deletion operations in B+ trees with an
example.
Steps for insertion in B+ Tree
Every element is inserted into a leaf node, so first go to the appropriate
leaf node.
If the leaf has room, insert the key into it in increasing order. If the
insertion would cause an overflow, handle it with the cases below, which
maintain the B+ Tree properties.
Properties for insertion in B+ Tree (m is the order of the tree)
Case 1: Overflow in leaf node
Split the leaf node into two nodes.
The first node contains ceil((m-1)/2) values.
The second node contains the remaining values.
Copy the smallest search key value from the second node to the parent
node (right-biased).
Illustration (figure not reproduced): inserting 8 into a B+ Tree of order 5.
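As a minimal sketch of the Case 1 leaf split, assuming a leaf is just a sorted Python list and a tree of order m holds at most m - 1 keys per leaf. The function name insert_into_leaf and the leaf contents [5, 10, 15, 20] are hypothetical and are not taken from the illustration.

```python
import bisect
from math import ceil


def insert_into_leaf(keys, key, m):
    """Insert key into a sorted leaf of a B+ tree of order m.

    Returns (left, right, separator). When no split is needed,
    right and separator are None.
    """
    keys = list(keys)
    bisect.insort(keys, key)          # keep the leaf in increasing order
    if len(keys) <= m - 1:            # a leaf holds at most m - 1 keys
        return keys, None, None
    cut = ceil((m - 1) / 2)           # size of the first node
    left, right = keys[:cut], keys[cut:]
    # Right-biased: the smallest key of the second node is COPIED up,
    # so it stays in the leaf as well.
    return left, right, right[0]


left, right, separator = insert_into_leaf([5, 10, 15, 20], 8, m=5)
print(left, right, separator)  # [5, 8] [10, 15, 20] 10
```

Note that the separator 10 still appears in the right leaf: in a leaf split the key is copied, not moved.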
Case 2: Overflow in non-leaf node
Split the non-leaf node into two nodes.
The first node contains ceil(m/2)-1 keys.
Move the smallest of the remaining keys up to the parent.
The second node contains the rest of the keys.
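The Case 2 split can be sketched in the same way. This is a minimal sketch that assumes an overflowing non-leaf node of order m holds m keys and m + 1 child pointers; the function name split_internal and the sample keys and child labels are hypothetical.

```python
from math import ceil


def split_internal(keys, children, m):
    """Split an overflowing non-leaf node (m keys, m + 1 children).

    Returns ((left_keys, left_children), up_key,
             (right_keys, right_children)).
    """
    cut = ceil(m / 2) - 1                     # number of keys in the first node
    left_keys = keys[:cut]
    up_key = keys[cut]                        # MOVED to the parent, kept in neither half
    right_keys = keys[cut + 1:]
    left_children = children[:cut + 1]        # a node with k keys has k + 1 children
    right_children = children[cut + 1:]
    return (left_keys, left_children), up_key, (right_keys, right_children)


left, up_key, right = split_internal(
    [10, 20, 30, 40, 50], ["c0", "c1", "c2", "c3", "c4", "c5"], m=5
)
print(left)    # ([10, 20], ['c0', 'c1', 'c2'])
print(up_key)  # 30
print(right)   # ([40, 50], ['c3', 'c4', 'c5'])
```

Unlike the leaf case, the key sent to the parent is moved rather than copied, so it no longer appears in either half.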
Let's break down some of these elements to further understand how
hash-based indexing works in practice:
Buckets
In hash-based indexing, the data space is divided into a fixed number of slots
known as "buckets." A bucket usually contains a single page (also known as a
block), but it may have additional pages linked in a chain if the primary page
becomes full. This is known as overflow.
Hash Function
The hash function is a mapping function that takes the search key as an input
and returns the bucket number where the record should be located. Hash functions
aim to distribute records uniformly across buckets to minimize the number of
collisions (two different keys hashing to the same bucket).
Disk I/O Efficiency
Hash-based indexing is particularly efficient in terms of disk I/O for
equality searches. Given a search key, the hash function quickly identifies the
bucket (and thereby the disk page) where the desired record is located. When the
bucket has no long overflow chain, this often requires only one or two disk I/Os,
making the retrieval process very fast.
Insert Operations
When a new record is inserted into the dataset, its search key is hashed to find
the appropriate bucket. If the primary page of the bucket is full, an additional
overflow page is allocated and linked to the primary page. The new record is then
stored on this overflow page.
Search Operations
To find a record with a specific search key, the hash function is applied to the
search key to identify the bucket. All pages (primary and overflow) in that bucket
are then examined to find the desired record.
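The insert and search steps above can be sketched as follows. This is a minimal in-memory sketch, not a disk-based implementation: the class names Page and HashIndex, the use of integer keys, and the hash function key mod num_buckets are all assumptions made for illustration. Each page visited during a search stands in for one disk I/O.

```python
class Page:
    """One page of a bucket; overflow links to the next page in the chain."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []      # list of (key, value) pairs
        self.overflow = None


class HashIndex:
    def __init__(self, num_buckets, page_capacity):
        self.num_buckets = num_buckets
        self.page_capacity = page_capacity
        # One primary page per bucket.
        self.buckets = [Page(page_capacity) for _ in range(num_buckets)]

    def _primary_page(self, key):
        return self.buckets[key % self.num_buckets]   # assumed hash function

    def insert(self, key, value):
        page = self._primary_page(key)
        # Walk the chain until a page with free space is found,
        # allocating a new overflow page when the chain is exhausted.
        while len(page.records) >= page.capacity:
            if page.overflow is None:
                page.overflow = Page(self.page_capacity)
            page = page.overflow
        page.records.append((key, value))

    def search(self, key):
        """Return (matching values, number of pages examined)."""
        page, pages_read, matches = self._primary_page(key), 0, []
        while page is not None:       # primary page, then overflow pages
            pages_read += 1
            matches.extend(v for k, v in page.records if k == key)
            page = page.overflow
        return matches, pages_read


index = HashIndex(num_buckets=3, page_capacity=2)
for k in (3, 6, 9, 4):
    index.insert(k, f"rec{k}")

print(index.search(4))  # (['rec4'], 1)
print(index.search(9))  # (['rec9'], 2)
```

Keys 3, 6 and 9 all hash to bucket 0, so the third one lands on an overflow page and costs a second page read to find.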
Limitations
Hash-based indexing is not suitable for range queries or when the search key is
not known. In such cases, a full scan of all pages is required, which is
resource-intensive.
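To see why a range query degenerates into a full scan, here is a minimal sketch that hashes hypothetical (age, name) records on age. Neighbouring ages land in unrelated buckets, so every bucket has to be examined. The function names, the records, and the choice of four buckets are assumptions for illustration only.

```python
def build_buckets(records, num_buckets):
    """Place (age, name) records into buckets using age mod num_buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for age, name in records:
        buckets[age % num_buckets].append((age, name))
    return buckets


def range_query(buckets, low, high):
    """Return (sorted matches, buckets scanned) for low <= age <= high."""
    matches, scanned = [], 0
    for bucket in buckets:            # the hash gives no ordering, so scan all
        scanned += 1
        matches.extend(r for r in bucket if low <= r[0] <= high)
    return sorted(matches), scanned


buckets = build_buckets([(28, "Alice"), (35, "Bob"), (40, "Carol")], num_buckets=4)
print(range_query(buckets, 20, 30))  # ([(28, 'Alice')], 4)
```

An equality search would touch one bucket; the range query above touches all four even though only one record qualifies.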
Hash-Based Indexing Example
Let's consider a simple example using employee names as the search key.
Employee Records
| Name  | Age | Salary |
|-------|-----|--------|
| Alice | 28  | 50000  |
| Bob   | 35  | 60000  |
| Carol | 40  | 70000  |
Hash Function: H(x) = (ASCII value of the first letter of the name) mod 3
Alice: 65 mod 3 = 2
Bob: 66 mod 3 = 0
Carol: 67 mod 3 = 1
Buckets:
Bucket 0: Bob
Bucket 1: Carol
Bucket 2: Alice
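The bucket assignment above can be reproduced with a few lines of Python. This is a minimal sketch: the function name h and the dictionary-of-lists layout are assumptions for illustration, while the hash function and the records come from the example.

```python
def h(name, num_buckets=3):
    """ASCII value of the first letter of the name, mod the bucket count."""
    return ord(name[0]) % num_buckets


employees = [
    ("Alice", 28, 50000),
    ("Bob", 35, 60000),
    ("Carol", 40, 70000),
]

buckets = {i: [] for i in range(3)}
for record in employees:
    buckets[h(record[0])].append(record)

for number in sorted(buckets):
    names = ", ".join(name for name, _, _ in buckets[number])
    print(f"Bucket {number}: {names}")
# Bucket 0: Bob
# Bucket 1: Carol
# Bucket 2: Alice
```

Because only the first letter is hashed, any two names starting with the same letter would collide in the same bucket, which is the kind of poor distribution the hash function is supposed to avoid.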
Pros of Hash-Based Indexing
Extremely fast for exact match queries.
Well-suited for equality comparisons.
Cons of Hash-Based Indexing
Not suitable for range queries (e.g., "SELECT * FROM table WHERE age
BETWEEN 20 AND 30").
Performance can be severely affected by poor hash functions or a large number
of collisions.