Storage and Indexing
[Figure: CLUSTERED vs. UNCLUSTERED index; in both panels, data entries (Index File) point into the (Data file).]
Index Classification (Contd.)
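• Dense vs. Sparse: If there is at least one data entry per search
key value, the index is dense; otherwise it is sparse.
Sparse indexes are smaller.
[Figure: dense vs. sparse index example; sample data record: Ashby, 25, 3000.]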
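To make the dense/sparse distinction concrete, here is a minimal sketch over a small sorted file. Only the record "Ashby, 25, 3000" comes from the slide; every other record, the two-records-per-page layout, and the names `pages`, `dense`, `sparse`, and `lookup_sparse` are assumptions made up for illustration.

```python
from bisect import bisect_right

# Data file sorted by name, two records per page.
# Only the Ashby record is from the slide; the rest is made-up data.
pages = [
    [("Ashby", 25, 3000), ("Baker", 31, 4000)],
    [("Chen", 40, 5200), ("Diaz", 28, 3900)],
    [("Evans", 52, 6100), ("Ford", 45, 5600)],
]

# Dense index: one (key, page_no) data entry per search key value.
dense = [(rec[0], p) for p, page in enumerate(pages) for rec in page]

# Sparse index: one data entry per page (the first key on that page).
sparse = [(page[0][0], p) for p, page in enumerate(pages)]


def lookup_sparse(name):
    """Find the last sparse entry with key <= name, then scan that page."""
    keys = [k for k, _ in sparse]
    i = bisect_right(keys, name) - 1
    if i < 0:
        return None  # name sorts before the first page
    page_no = sparse[i][1]
    for rec in pages[page_no]:
        if rec[0] == name:
            return rec
    return None


print(len(dense), len(sparse))
print(lookup_sparse("Diaz"))
```

The sparse lookup only works because the file is sorted on the search key, so the page that could hold a key can be found from the first key of each page.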
Index Classification (Contd.)
• Composite Search Keys: Search on a combination of fields.
– Equality query: Every field value is equal to a constant
value. E.g. wrt <sal,age> index:
• age=20 and sal=75
[Figure: Examples of composite key indexes using lexicographic order.
Data records (name, age, sal): bob 12 10; cal 11 80; joe 12 20
<age, sal> index entries: 11,80; 12,10; 12,20; 13,75
<age> index entries: 11; 12; 12; 13]
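The lexicographic ordering of a composite key can be sketched with sorted tuples. The records bob, cal, and joe and the key values come from the figure; the name "sue" for the (13, 75) entry is an assumption, as are the helper names `equality` and `age_prefix`. The prefix lookup assumes integer ages.

```python
from bisect import bisect_left, bisect_right

# (name, age, sal); "sue" is an assumed name for the (13, 75) entry.
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# <age, sal> index: data entries sorted lexicographically on (age, sal).
age_sal = sorted((age, sal, name) for name, age, sal in records)
keys = [(a, s) for a, s, _ in age_sal]


def equality(age, sal):
    """Equality query: every field of the key is bound to a constant."""
    lo = bisect_left(keys, (age, sal))
    hi = bisect_right(keys, (age, sal))
    return [age_sal[i][2] for i in range(lo, hi)]


def age_prefix(age):
    """All entries with this age form one contiguous run in <age, sal> order."""
    lo = bisect_left(keys, (age,))
    hi = bisect_left(keys, (age + 1,))
    return [age_sal[i][2] for i in range(lo, hi)]


print(keys)
print(equality(12, 20))
print(age_prefix(12))
```

A lookup on `sal` alone would not map to a contiguous run in this ordering, which is why the field order of a composite key matters.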
[Figure: index node layout: P0 K1 P1 K2 P2 ... Km Pm]
[Figure: example tree.
Root: 40
Index entries: 20 33 | 51 63
Data entries: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*]
B+ Tree: The Most Widely Used Index
• Insert/delete at log_F N cost; keep tree height-balanced.
(F = fanout, N = # leaf pages)
• Minimum 50% occupancy (except for root).
Each node contains d <= m <= 2d entries.
The parameter d is called the order of the tree.
[Figure: B+ tree structure with three labeled parts: Root, Index Entries, Data Entries.]
Example B+ Tree
• Search begins at root, and key comparisons
direct it to a leaf.
• Search for 5*, 15*, all data entries >=
24* ...
[Figure: example B+ tree.
Root: 13 17 24 30
Data entries: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
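The searches listed on this slide can be traced with a small sketch of the example tree. The grouping of data entries into leaves is inferred from the root keys, and the names `find_leaf`, `search`, and `range_from` are illustrative assumptions, not part of the slides.

```python
from bisect import bisect_right

# Example tree from the slide: one root node above five leaves.
# Leaf boundaries are inferred from the root keys 13, 17, 24, 30.
root_keys = [13, 17, 24, 30]
leaves = [
    [2, 3, 5, 7],
    [14, 16],
    [19, 20, 22],
    [24, 27, 29],
    [33, 34, 38, 39],
]


def find_leaf(key):
    """Key comparisons at the root pick the child: child i holds keys in [K_i, K_i+1)."""
    return bisect_right(root_keys, key)


def search(key):
    """Equality search: descend to the leaf, then look inside it."""
    return key in leaves[find_leaf(key)]


def range_from(key):
    """All data entries >= key: find the starting leaf, then scan leaves in order."""
    i = find_leaf(key)
    out = [k for k in leaves[i] if k >= key]
    for leaf in leaves[i + 1:]:
        out.extend(leaf)
    return out


print(search(5), search(15))
print(range_from(24))
```

Searching for 15* reaches the leaf holding 14* and 16* and finds nothing, which shows that a search always ends at exactly one leaf whether or not the key exists.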
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133
• Typical capacities:
– Height 4: 133^4 = 312,900,721 records
– Height 3: 133^3 = 2,352,637 records
• Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages = 1 Mbyte
– Level 3 = 17,689 pages = 133 MBytes
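The capacity figures follow from powers of the average fanout. A quick check, taking the fanout of 133 and the 8 Kbyte page size from the slide as given; the function names are illustrative assumptions.

```python
FANOUT = 133            # average fanout, as stated on the slide
PAGE_BYTES = 8 * 1024   # 8 Kbytes per page, as stated on the slide


def capacity(height):
    """Number of records reachable from a tree of the given height."""
    return FANOUT ** height


def pages_at_level(level):
    """Number of pages at a level, counting the root as level 1."""
    return FANOUT ** (level - 1)


for h in (3, 4):
    print(f"height {h}: {capacity(h):,} records")
for lvl in (1, 2, 3):
    n = pages_at_level(lvl)
    print(f"level {lvl}: {n:,} pages, {n * PAGE_BYTES:,} bytes")
```

The byte counts printed for each level are exact products; the Mbyte figures on the slide are rounded.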
Inserting a Data Entry into a B+ Tree
• Find correct leaf L.
• Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key.
• Insert index entry pointing to L2 into parent of L.
• This can happen recursively
– To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
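The contrast between copying up (leaf split) and pushing up (index split) can be sketched as two small functions. This is a sketch under simple assumptions: nodes are plain Python lists, the split point is the middle position, and the names `split_leaf` and `split_index` are hypothetical.

```python
def split_leaf(entries):
    """Split an overflowing leaf.

    The middle key is COPIED up: it goes to the parent and also stays
    as the first entry of the new right leaf.
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right[0], right


def split_index(keys, children):
    """Split an overflowing index node.

    The middle key is PUSHED up: it goes to the parent and appears in
    neither half. A node with m keys has m + 1 child pointers.
    """
    mid = len(keys) // 2
    up = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, up, right


# Leaf with 2d + 1 = 5 entries (d = 2).
print(split_leaf([20, 25, 30, 40, 50]))

# Index node with 5 keys and 6 pointers.
print(split_index(["K1", "K2", "K3", "K4", "K5"],
                  ["P0", "P1", "P2", "P3", "P4", "P5"]))
```

A leaf must keep the middle key because leaves hold the actual data entries; an index key only directs the search, so one copy in the parent is enough.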
Insertion in a B+ Tree
Insert (K, P)
• Find leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split node, insert in parent:
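[Figure: a node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node (keys K1 K2; pointers P0 P1 P2) and a right node (keys K4 K5; pointers P3 P4 P5); the entry (K3, pointer to the right node) goes to the parent.]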
• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only
Insertion in a B+ Tree
Insert K=19
[Figure: B+ tree with root key 80; leaf entries 10 15 18 20 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
After insertion
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
Now insert 25
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
After insertion
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
But now have to split!
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
After the split
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
Another B+ Tree
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
Now Insert 12
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
Need to split leaf
[Figure: B+ tree with root key 80; leaf entries shown: 10 12 15 18 19 20 25 30 40 50.]
Insertion in a B+ Tree
Need to split branch
[Figure: B+ tree with root key 80; leaf entries shown: 10 12 15 18 19 20 25 30 40 50.]
Insertion in a B+ Tree
After split
[Figure: B+ tree with root keys 30 80; leaf entries shown: 10 12 15 18 19 20 25 30 40 50.]
Deleting a Data Entry from a B+ Tree
• Start at root, find leaf L where entry belongs.
• Remove the entry.
– If L is at least half-full, done!
– If L has only d-1 entries,
• Try to re-distribute, borrowing from sibling
(adjacent node with same parent as L).
• If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L
or sibling) from parent of L.
• Merge could propagate to root, decreasing height.
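The leaf-level part of the deletion steps above can be sketched as follows. This is a simplified illustration, not a full B+ tree: it assumes order d = 2, models the leaves under one parent as a list of lists, and leaves out the parent key updates and the upward propagation of merges. The function name `delete_from_leaf` and the leaf grouping in the usage lines are assumptions.

```python
D = 2  # order d of the tree (assumed); a non-root leaf needs at least D entries


def delete_from_leaf(leaves, i, key):
    """Delete key from leaves[i], where leaves are siblings under one parent.

    Returns "ok", "redistributed", or "merged". Parent keys are not modeled.
    """
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= D or len(leaves) == 1:
        return "ok"  # still at least half-full, or a lone leaf with no sibling

    # Try to redistribute: borrow one entry from an adjacent sibling
    # that has more than the minimum.
    for j in (i - 1, i + 1):
        if 0 <= j < len(leaves) and len(leaves[j]) > D:
            if j < i:
                leaf.insert(0, leaves[j].pop())   # take largest from left sibling
            else:
                leaf.append(leaves[j].pop(0))     # take smallest from right sibling
            return "redistributed"

    # Redistribution failed: merge with an adjacent sibling.
    j = i - 1 if i > 0 else i + 1
    lo, hi = min(i, j), max(i, j)
    leaves[lo].extend(leaves[hi])
    del leaves[hi]  # the parent entry pointing to this leaf would be deleted too
    return "merged"


leaves = [[10, 15, 18, 19], [20, 25], [30, 40, 50]]
print(delete_from_leaf(leaves, 2, 30), leaves)
print(delete_from_leaf(leaves, 1, 25), leaves)
print(delete_from_leaf(leaves, 2, 40), leaves)
```

In a full implementation, the "merged" case continues by deleting the separator entry in the parent, which may underflow in turn and repeat the same choice one level up.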
Deletion from a B+ Tree
Delete 30
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
After deleting 30
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 25 40 50 60 65 80 85 90. Annotation on an index key: "May change to 40, or not".]
Deletion from a B+ Tree
Now delete 25
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 25 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
After deleting 25
Need to rebalance: rotate
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
Now delete 40
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
After deleting 40
Rotation not possible; need to merge nodes
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 50 60 65 80 85 90.]
Deletion from a B+ Tree
Final tree
[Figure: B+ tree with root key 80; leaf entries 10 15 18 19 20 50 60 65 80 85 90.]