
Multi-level Indexes Using B-Trees and B+-Trees

Husam A. Halim
2020

Hussam A. Halim, Computer Science Dept., 2020
Outline
• B+-Tree Motivation
• Definition of B+-Trees
• Insertion into a B+-Tree
• Deletion from a B+-Tree
• B+-Tree vs. B-Tree
• Determining the Size of a B+-Tree

Objectives
• Learn about B+-Trees.
• Discover how to insert and delete items in a B+-Tree.
• Explore the differences between B+-Trees and B-Trees.
• Learn how to organize data in a B+-Tree.
• Learn how to compute the size of a B+-Tree.

B+-Tree Motivation
• The B+-Tree is the most widely used of several index structures that remain efficient despite insertion and deletion of data.
• It uses the idea of a balanced tree, in which every path from the root of the tree to a leaf has the same length.
• All leaf nodes are at the same level.

Definition of B+-Trees
• The structure of an internal node:
⟨P1, K1, P2, K2, …, Pq-1, Kq-1, Pq⟩
– q ≤ p (p is the order of the tree).
– Pi is a pointer to another node.

• The structure of a leaf node:
⟨⟨K1, Pr1⟩, ⟨K2, Pr2⟩, …, ⟨Kq-1, Prq-1⟩, Pnext⟩
– Pnext is a pointer to the next leaf node.
– Pri is a data pointer to the record whose key value is equal to Ki (or to the data file block containing that record).
B+ Tree node Structure

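[Figure: a node holding keys A and B with three child pointers.
Left pointer: values <= A. Middle pointer: values > A && <= B. Right pointer: values > B.]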

Definition of B+-Trees
• Each node is kept between half-full and completely full.

• A B+-tree of order P:
– Root has between 2 and P pointers (unless it’s a leaf).
– Internal nodes have between ⌊(P-1)/2⌋ and P-1 keys, and #pointers = #keys + 1.
– Leaf nodes have between ⌈(P-1)/2⌉ and P-1 keys.

• Search-key values are kept in sorted order.

B+-Tree: An example

A B+-Tree of order p = 4
• Internal nodes : ⌊(P-1)/2⌋ ≤ #keys ≤ P-1 ⇒ 1 ≤ #keys ≤ 3
• Leaf nodes : ⌈(P-1)/2⌉ ≤ #keys ≤ P-1 ⇒ 2 ≤ #keys ≤ 3
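
The bounds above can be checked with a few lines of Python. This is only a sketch: the function name `key_bounds` is made up for illustration, and the root is left out because it follows its own rule (between 2 and P pointers unless it is a leaf).

```python
import math

def key_bounds(p: int) -> dict:
    """Return (min_keys, max_keys) for non-root nodes of a B+-tree of order p."""
    max_keys = p - 1
    return {
        "internal": (math.floor((p - 1) / 2), max_keys),  # floor for internal nodes
        "leaf": (math.ceil((p - 1) / 2), max_keys),       # ceiling for leaves
    }

print(key_bounds(4))  # {'internal': (1, 3), 'leaf': (2, 3)}
```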
B+-Tree : Insertion
• Find the correct leaf L and place the key in L in sorted order.

Case 1 : Leaf not full -> done!

Case 2 : Leaf overflow
• Split L into L and a new leaf L2:
Left leaf L : records with keys <= middle key.
Right leaf L2 : records with keys > middle key.
• Copy the middle key up and insert a pointer to L2 into the parent of L.
• If the parent node overflows, go to case 3.

Case 3 : Non-leaf overflow
• Same as case 2: split the node evenly, but push up (not copy) the middle key.
• If the parent node overflows, repeat case 3. If the root itself splits, a new root is created and the tree grows by one level.
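
The difference between copying up and pushing up is easiest to see on plain lists. The sketch below is illustrative only: `split_leaf` and `split_internal` are hypothetical helper names, and taking index `(len(keys) - 1) // 2` as the middle key is an assumption chosen to match the rule "keys <= middle key stay left".

```python
def split_leaf(keys):
    """Split an overflowing leaf. The middle key is COPIED up:
    it stays in the left leaf and is also returned as the separator."""
    mid = (len(keys) - 1) // 2
    left, right = keys[:mid + 1], keys[mid + 1:]
    return left, keys[mid], right

def split_internal(keys, children):
    """Split an overflowing non-leaf. The middle key is PUSHED up:
    it is returned as the separator and appears in neither half."""
    mid = (len(keys) - 1) // 2
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, keys[mid], right

print(split_leaf([1, 5, 8]))
# ([1, 5], 5, [8])
print(split_internal([3, 5, 8], ["c0", "c1", "c2", "c3"]))
# (([3], ['c0', 'c1']), 5, ([8], ['c2', 'c3']))
```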
Example : A B+ Tree of order P = 3. Insert : 8 5 1 7 3 12 9 6 15
=> Max # of keys in a node = P – 1 = 2

Each tree is drawn one level per line, root first. Each [ ] is one node, and | separates the children of different parents.

Insert 8, 5 : Case 1 : Leaf not full
[5 8]

Insert 1 : Case 2 : Leaf overflow (new level)
[5]
[1 5] [8]

Insert 7 : Case 1 : Leaf not full
[5]
[1 5] [7 8]

Insert 3 : Case 2 : Leaf overflow (split)
[3 5]
[1 3] [5] [7 8]

Insert 12 : Case 2 : Leaf overflow, then Case 3 : Non-leaf overflow (new level)
[5]
[3] [8]
[1 3] [5] | [7 8] [12]

Insert 9 : Case 1 : Leaf not full
[5]
[3] [8]
[1 3] [5] | [7 8] [9 12]

Insert 6 : Case 2 : Leaf overflow
[5]
[3] [7 8]
[1 3] [5] | [6 7] [8] [9 12]

Insert 15 : Case 2 : Leaf overflow, then Case 3 : Non-leaf overflow
[5 8]
[3] [7] [12]
[1 3] [5] | [6 7] [8] | [9 12] [15]
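
The insertion sequence above can be replayed with a small in-memory sketch. The `Node` and `BPlusTree` classes below are hypothetical names written for this example, not part of the slides. They assume that equal keys go to the left child and that the middle key sits at index `(len(keys) - 1) // 2`. Data pointers and leaf links are omitted.

```python
import bisect

class Node:
    def __init__(self, leaf, keys=None, children=None):
        self.leaf = leaf
        self.keys = keys or []
        self.children = children or []      # stays empty in leaves

class BPlusTree:
    def __init__(self, order):
        self.max_keys = order - 1
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                            # root overflowed: add a new level
            sep, right = split
            self.root = Node(False, [sep], [self.root, right])

    def _insert(self, node, key):
        if node.leaf:
            bisect.insort(node.keys, key)    # keep the leaf sorted
        else:
            i = bisect.bisect_left(node.keys, key)   # keys <= K_i go left
            split = self._insert(node.children[i], key)
            if split:
                sep, right = split
                node.keys.insert(i, sep)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.max_keys:
            return None                      # Case 1: no overflow
        mid = (len(node.keys) - 1) // 2
        sep = node.keys[mid]
        if node.leaf:                        # Case 2: copy the middle key up
            right = Node(True, node.keys[mid + 1:])
            node.keys = node.keys[:mid + 1]
        else:                                # Case 3: push the middle key up
            right = Node(False, node.keys[mid + 1:], node.children[mid + 1:])
            node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return sep, right

    def levels(self):
        """Return the keys of every node, one list per level, root first."""
        out, level = [], [self.root]
        while level:
            out.append([n.keys for n in level])
            level = [c for n in level for c in n.children]
        return out

tree = BPlusTree(order=3)
for k in [8, 5, 1, 7, 3, 12, 9, 6, 15]:
    tree.insert(k)
for level in tree.levels():
    print(level)
# [[5, 8]]
# [[3], [7], [12]]
# [[1, 3], [5], [6, 7], [8], [9, 12], [15]]
```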

B+-Tree : Deletion
• Find the correct leaf L and delete the entry with key k.

Case 1 : L is still at least half-full
• Make changes to the parent (if needed). Done!

Case 2 : L underflows and has a proper sibling (one with enough keys to spare)
• Re-distribute: borrow from the sibling and update the parent.

Case 3 : L underflows and has no proper sibling
• Merge L with its sibling and delete the entry pointing to L from the parent of L.

Case 4 : A non-leaf underflows
• Try to re-distribute; if that fails, merge it with its sibling.
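
The choice between Case 2 and Case 3 for a leaf can be sketched on two adjacent key lists. `rebalance_leaves` is a hypothetical helper. It assumes the two leaves are neighbours under the same parent, that redistribution splits the keys as evenly as possible, and that the parent separator is the largest key of the left leaf.

```python
def rebalance_leaves(left, right, min_keys):
    """Rebalance two adjacent leaves after a deletion left one of them underfull.

    Returns (action, leaves, separator); separator is the new parent key
    between the two leaves, or None when the leaves were merged.
    """
    total = left + right                       # both sorted, all of left < all of right
    if len(total) >= 2 * min_keys:             # Case 2: the sibling can spare keys
        half = (len(total) + 1) // 2
        left, right = total[:half], total[half:]
        return "redistribute", [left, right], left[-1]
    return "merge", [total], None              # Case 3: the parent loses one entry

print(rebalance_leaves([8, 9], [], min_keys=1))
# ('redistribute', [[8], [9]], 8)
print(rebalance_leaves([8], [], min_keys=1))
# ('merge', [[8]], None)
```

A merge removes an entry from the parent, so the parent may underflow in turn; that is where Case 4 takes over.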
Example : A B+ Tree of order P = 3. Delete : 6 12 9 8
=> Min # of keys in a leaf = ⌈(P-1)/2⌉ = 1
=> Min # of keys in a non-leaf = ⌊(P-1)/2⌋ = 1

Each tree is drawn one level per line, root first. Each [ ] is one node, and | separates the children of different parents.

Initial tree
[7]
[1 6] [9]
[1] [5 6] [7] | [8 9] [12]

Delete 6 : Case 1 : not underflow (parent key 6 becomes 5)
[7]
[1 5] [9]
[1] [5] [7] | [8 9] [12]

Delete 12 : Case 2 : underflow (re-distribute)
[7]
[1 5] [8]
[1] [5] [7] | [8] [9]

Delete 9 : Case 3 : underflow (merge), then Case 4 : non-leaf is underflow (re-distribute)
[5]
[1] [7]
[1] [5] | [7] [8]

Delete 8 : Case 3 : underflow (merge), then Case 4 : non-leaf is underflow (merge)
[1 5]
[1] [5] [7]

B+-Trees vs. B-Trees
• In B+-Trees, data pointers are stored only at the leaf nodes, whereas in B-Trees data pointers are stored in all nodes.
• In B+-Trees, the leaf nodes are linked together to provide ordered access to the records on the search field.
• B+-Trees usually have fewer levels than a B-Tree on the same data: their internal nodes hold only keys and tree pointers, with no data pointers, so more entries fit in each node.
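
The ordered access given by the leaf links can be sketched as a range scan. The `Leaf` class and `range_scan` function are hypothetical names for this example. For simplicity the scan starts at the leftmost leaf, whereas a real index would first descend from the root to the leaf containing the lower bound.

```python
class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf        # plays the role of the Pnext pointer

def range_scan(start_leaf, lo, hi):
    """Yield every key in [lo, hi] by following the leaf links."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return               # keys are sorted, so nothing further can match
            if k >= lo:
                yield k
        leaf = leaf.next

# Build a small chain of linked leaves, right to left.
head = None
for keys in reversed([[1, 3], [5], [6, 7], [8], [9, 12], [15]]):
    head = Leaf(keys, head)

print(list(range_scan(head, 5, 9)))  # [5, 6, 7, 8, 9]
```

A B-tree has no such chain, so the same scan would need an in-order traversal that moves up and down between levels.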
[Figure: a B+-Tree of order P = 4]

Determining the size of B+-tree
• The order of a non-leaf node is determined by the maximum number of child pointers and keys that fit in one block :
(P-1) * key size + (P * block pointer size) ≤ block size

• The order of a leaf node is determined by the maximum number of keys and record pointers, plus one block pointer, that fit in one block :
Pleaf * key size + Pleaf * record pointer size + block pointer size ≤ block size

• The height of a tree with branching factor m is no more than:
⌈log_m(# of leaf pages)⌉

Example
• Suppose we have a B+-tree index of order P where :
Key field V = 9 bytes
Block size B = 512 bytes
Record pointer Pr = 7 bytes
Block pointer Pb = 6 bytes
– To find the max #keys an internal node can hold:
(P * Pb) + ((P – 1) * V) ≤ B
(P * 6) + ((P − 1) * 9) ≤ 512
(15 * P) ≤ 521
A block can hold up to P = 34 pointers (and 33 key values).
– We can also calculate the max #keys that a leaf can hold :
(Pleaf * (Pr + V)) + Pb ≤ B
(Pleaf * (7 + 9)) + 6 ≤ 512
(16 * Pleaf) ≤ 506
Each leaf node can hold up to Pleaf = 31 key values.
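
Both inequalities can be solved with integer division. The function names below are made up for illustration, and the arguments are the byte sizes given above.

```python
def internal_order(block, key, block_ptr):
    """Largest P with P*block_ptr + (P - 1)*key <= block."""
    return (block + key) // (key + block_ptr)

def leaf_order(block, key, record_ptr, block_ptr):
    """Largest Pleaf with Pleaf*(key + record_ptr) + block_ptr <= block."""
    return (block - block_ptr) // (key + record_ptr)

print(internal_order(block=512, key=9, block_ptr=6))            # 34
print(leaf_order(block=512, key=9, record_ptr=7, block_ptr=6))  # 31
```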
Example
• Suppose we have a B-tree index of order P where :
Key field V = 9 bytes
Block size B = 512 bytes
Record pointer Pr = 7 bytes
Block pointer Pb = 6 bytes

– To find the max #keys a node can hold:
(P * Pb) + ((P – 1) * (V + Pr)) ≤ B
(P * 6) + ((P − 1) * (9 + 7)) ≤ 512
(22 * P) ≤ 528

So a block can hold up to P = 24 pointers (and 23 key values). This is fewer than the 34 pointers of the B+-tree, so the B+-tree has a larger branching factor and more entries in each internal node than the corresponding B-tree.
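
The same calculation for a B-tree node differs only in that every key carries a record pointer. `btree_order` is a hypothetical helper name.

```python
def btree_order(block, key, record_ptr, block_ptr):
    """Largest P with P*block_ptr + (P - 1)*(key + record_ptr) <= block."""
    entry = key + record_ptr                  # each key is stored with its data pointer
    return (block + entry) // (block_ptr + entry)

print(btree_order(block=512, key=9, record_ptr=7, block_ptr=6))  # 24
```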

Example
• Suppose that we construct a B+-tree on the field in the
previous example. To calculate the approx. number of entries
in the B+-tree, we assume that each node is 69% full.
– On the avg., each internal node will have P * 0.69 = 34 * 0.69 or
approx. 23 pointers (and 22 key values).
– Each leaf node, on the average, will hold Pleaf * 0.69 = 0.69 * 31 or
approx. 21 data record pointers.
– We can start at the root and see how many values and pointers
can exist :
Level     Nodes   Key entries   Pointers
Root          1            22         23
Level 1      23           506        529
Level 2     529        11,638     12,167
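
The per-level counts can be reproduced with a short loop. This is a sketch with a hypothetical function name. It truncates the average fan-out to a whole number as the slide does, and it adds one step that is not on the slide: it assumes every Level 2 pointer leads to one leaf holding the average number of record pointers.

```python
def bplus_capacity(p, p_leaf, fill=0.69, internal_levels=3):
    fanout = int(p * fill)            # 34 * 0.69 -> 23 pointers per internal node
    per_leaf = int(p_leaf * fill)     # 31 * 0.69 -> 21 record pointers per leaf
    rows, nodes = [], 1
    for level in range(internal_levels):
        # (level, nodes, key entries, pointers)
        rows.append((level, nodes, nodes * (fanout - 1), nodes * fanout))
        nodes *= fanout               # each pointer leads to one node on the next level
    return rows, nodes, nodes * per_leaf

rows, leaves, records = bplus_capacity(p=34, p_leaf=31)
for row in rows:
    print(row)
# (0, 1, 22, 23)
# (1, 23, 506, 529)
# (2, 529, 11638, 12167)
print(leaves, records)  # 12167 255507
```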
Example
• Again, suppose that we construct a B-tree on the same field as in the previous example, and again assume that each node is 69% full to calculate the approx. number of entries in the B-tree.
– On the avg., each node will have P * 0.69 = 24 * 0.69 or approx. 16 pointers (and 15 key values).
– A B-tree has no separate leaf layout: every node, leaves included, stores a data record pointer with each of its approx. 15 key values.
– The number of nodes, keys, and pointers :
Level     Nodes   Key entries   Pointers
Root          1            15         16
Level 1      16           240        256
Level 2     256         3,840      4,096
Level 3   4,096        61,440
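
Because a B-tree stores an entry with every key at every level, its capacity is the sum of the key entries over all levels rather than the leaf level alone. A minimal sketch, with a hypothetical function name and the same truncation of the average fan-out as above:

```python
def btree_entries(p, fill=0.69, levels=4):
    fanout = int(p * fill)            # 24 * 0.69 -> 16 pointers per node
    keys_per_node = fanout - 1        # 15 key values per node
    nodes, total = 1, 0
    for _ in range(levels):
        total += nodes * keys_per_node    # every level contributes entries
        nodes *= fanout
    return total

print(btree_entries(p=24))  # 65535
```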
Example
• Suppose we have a data file with following parameters :
– Number of records = 2,000,000
– Record (sizes in bytes) = emp(SSN(40), Name(12), Dept(5), Age(5))
– Block size = 1000 bytes
– block pointer = 10 bytes
We want to construct a B+-Tree index on the field SSN. How large
would it be ? (links between leaves are not taken into account).

– Index entries per leaf (Pleaf) = block_size / (key_size + block_pointer_size) = 1000 / (40 + 10) = 1000 / 50 = 20

– # of leaf blocks = ⌈#records / Pleaf⌉ = ⌈2,000,000 / 20⌉ = 100,000

Example (cont.)
– Branching factor (P) :
(P * pointer_size) + ((P – 1) * key_size) ≤ block_size
(P * 10) + ((P − 1) * 40) ≤ 1000
(50 * P) ≤ 1040
P = ⌊1040 / 50⌋ = 20

– #blocks in the level above the leaves = ⌈#blocks_in_lower_level / P⌉ = ⌈100,000 / 20⌉ = 5,000
– #blocks in the next level up = ⌈5,000 / 20⌉ = 250
– #blocks in the next level up = ⌈250 / 20⌉ = 13
– #blocks in the next level up (the root) = ⌈13 / 20⌉ = 1

– # of levels above the leaves = ⌈log_20(100,000)⌉ = 4
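
The level-by-level division can be written as a loop that stops when a single root block remains. `index_levels` is a hypothetical name. The total block count at the end is not stated on the slide; it is simply the sum of the levels computed here.

```python
def index_levels(n_records, entries_per_leaf, fanout):
    """Return the number of blocks on each level, leaf level first."""
    ceil_div = lambda a, b: -(-a // b)            # integer ceiling division
    levels = [ceil_div(n_records, entries_per_leaf)]
    while levels[-1] > 1:
        levels.append(ceil_div(levels[-1], fanout))
    return levels

levels = index_levels(n_records=2_000_000, entries_per_leaf=20, fanout=20)
print(levels)           # [100000, 5000, 250, 13, 1]
print(len(levels) - 1)  # 4 levels above the leaves
print(sum(levels))      # 105264 blocks in total
```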

Thanks
