B+-Trees: Adapted From Mike Franklin

This document describes B+ trees, which are a data structure used to store indexed data in databases. B+ trees allow for efficient searching, insertion, and deletion operations that take logarithmic time. They maintain a balanced tree structure where internal nodes can have a variable number of child nodes between a minimum and maximum threshold. The document provides examples of operations like searching, inserting, deleting on a sample B+ tree to demonstrate how the tree structure is updated. It also discusses properties of B+ trees like order, fill factor, and how they are implemented in practice.

Uploaded by

Rupali Misri
Copyright
© Attribution Non-Commercial (BY-NC)

B+-Trees

Adapted from Mike Franklin

Example Tree Index


Index entries: <search key value, page id>. They direct the search for data entries in the leaves.
Example where each node can hold 2 entries:

Root:    [40]
Index:   [20 33]   [51 63]
Leaves:  [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]

ISAM
Indexed Sequential Access Method. Similar to what we discussed in the last class.

Root:                [40]
Index pages:         [20 33]   [51 63]
Primary leaf pages:  [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]
Overflow pages:      23*, 48*, 41*, 42*
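A minimal Python sketch of how an ISAM insert might behave on the structure above. It assumes the static index is flattened into one sorted list of separators, that each primary leaf holds at most 2 entries, and that a full leaf sends new entries to its own overflow list. The names `isam_insert`, `separators`, `primary`, and `overflow` are illustrative and do not come from the slides.

```python
from bisect import bisect_right

LEAF_CAPACITY = 2                       # assumed: 2 entries per primary leaf page
separators = [20, 33, 40, 51, 63]       # index keys from the figure, flattened (assumption)
primary = [[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]]
overflow = [[] for _ in primary]        # one overflow chain per primary leaf


def isam_insert(key):
    """Insert key; the index never changes, full leaves spill to overflow."""
    i = bisect_right(separators, key)   # which primary leaf covers this key
    if len(primary[i]) < LEAF_CAPACITY:
        primary[i].append(key)
        primary[i].sort()
    else:
        overflow[i].append(key)         # static structure: no split, just chain
    return i


for k in (23, 48, 41, 42):
    isam_insert(k)

print(overflow)
```

Because the index pages are never modified, long overflow chains can build up under popular key ranges, which is the weakness the B+ tree addresses.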

Example B+ Tree
Search begins at the root, and key comparisons direct it to a leaf. Search for 5*, 15*, all data entries >= 24* ...

Root:    [13 17 24 30]
Leaves:  [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

The search for 15* ends at leaf [14* 16*], which does not contain it, so we know 15* is not in the tree!
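The three searches on this slide can be sketched in Python as follows. This is a simplified model that assumes a two-level tree (one root over a list of leaves) stored in plain lists; `find_leaf`, `search`, and `range_from` are illustrative names, not part of the slides.

```python
from bisect import bisect_right

root_keys = [13, 17, 24, 30]
leaves = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]


def find_leaf(key):
    """Index of the leaf that must hold key, chosen by comparing with root keys."""
    return bisect_right(root_keys, key)


def search(key):
    """Equality search: descend to one leaf and look only there."""
    return key in leaves[find_leaf(key)]


def range_from(low):
    """All entries >= low: find the starting leaf, then scan leaves left to right."""
    start = find_leaf(low)
    return [k for leaf in leaves[start:] for k in leaf if k >= low]


print(search(5), search(15))
print(range_from(24))
```

The range scan relies on the leaves being kept in key order, which is why only the first leaf has to be located through the index.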

B+ Tree - Properties
Balanced: every path from the root to a leaf has the same length.
Every node except the root must be at least half full.
Order: the minimum number of keys/pointers in a non-leaf node.
Fanout of a node: the number of pointers out of the node.
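The occupancy rule can be written as a small check. This sketch assumes the common convention that a node of order d holds between d and 2d keys (the root may hold as few as 1) and that a non-leaf node with k keys has k + 1 child pointers; the function names are illustrative.

```python
def occupancy_ok(num_keys, order, is_root=False):
    """True if a node with num_keys keys satisfies the half-full rule (assumed max 2*order)."""
    low = 1 if is_root else order
    return low <= num_keys <= 2 * order


def fanout(num_keys):
    """Number of child pointers out of a non-leaf node with num_keys keys."""
    return num_keys + 1


# With order 2, nodes hold between 2 and 4 keys.
print(occupancy_ok(2, order=2), occupancy_ok(1, order=2), occupancy_ok(1, order=2, is_root=True))
print(fanout(4))
```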

B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
Average fanout = 133

Typical capacities:
Height 3: 133^3 = 2,352,637 entries
Height 4: 133^4 = 312,900,721 entries

Can often hold top levels in buffer pool:
Level 1 = 1 page = 8 Kbytes
Level 2 = 133 pages = about 1 MByte
Level 3 = 17,689 pages = about 138 MBytes
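The capacity figures follow directly from the average fanout and page size. A short Python sketch of the arithmetic, taking the fanout of 133 and the 8 KB page size from the slide as given (the function names are illustrative):

```python
FANOUT = 133      # average fanout quoted on the slide
PAGE_KB = 8       # page size quoted on the slide


def entries_at_height(height):
    """Entries reachable through a tree with `height` levels of fanout FANOUT."""
    return FANOUT ** height


def pages_at_level(level):
    """Number of pages on a level, with the root as level 1."""
    return FANOUT ** (level - 1)


def level_size_kb(level):
    """Buffer-pool space needed to cache one whole level, in KB."""
    return pages_at_level(level) * PAGE_KB


for h in (3, 4):
    print(f"height {h}: {entries_at_height(h):,} entries")
for lvl in (1, 2, 3):
    print(f"level {lvl}: {pages_at_level(lvl):,} pages, {level_size_kb(lvl):,} KB")
```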

B+ Trees: Summary
Searching:
log_d(n), where d is the order and n is the number of entries

Insertion:
Find the leaf to insert into
If full, split the node, and adjust the index accordingly
Similar cost as searching

Deletion:
Find the leaf node
Delete
May not remain half-full; must adjust the index accordingly
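To make the logarithmic search cost concrete, the sketch below counts how many levels are needed before the tree can reach a given number of entries. It uses the average fanout as the base of the logarithm, which is an assumption on top of the slide's log_d(n), and it uses integer multiplication rather than a floating-point logarithm to avoid rounding at exact powers. `levels_needed` is an illustrative name.

```python
def levels_needed(n_entries, fanout):
    """Smallest number of levels h such that fanout**h >= n_entries."""
    levels, reach = 1, fanout
    while reach < n_entries:
        reach *= fanout
        levels += 1
    return levels


# With fanout 133, a million entries are reachable in 3 page reads from the root.
print(levels_needed(1_000_000, 133))
print(levels_needed(100_000_000, 133))
```

Each level costs one page access, and an insert or delete starts with the same descent, which is why all three operations have similar cost.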

Insert 23*
Before:
Root:    [13 17 24 30]
Leaves:  [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

No splitting required.

After:
Root:    [13 17 24 30]
Leaves:  [2* 3* 5* 7*] [14* 16*] [19* 20* 22* 23*] [24* 27* 29*] [33* 34* 38* 39*]
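A minimal Python sketch of the easy insertion case, assuming a two-level tree held in plain lists and a leaf capacity of 4 entries. `insert_no_split` is an illustrative helper that refuses to insert when the target leaf is full instead of splitting it.

```python
from bisect import bisect_right, insort

LEAF_CAPACITY = 4   # assumed: each leaf holds at most 4 entries
root_keys = [13, 17, 24, 30]
leaves = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]


def insert_no_split(key):
    """Insert key into its leaf if there is room; return False if a split would be needed."""
    leaf = leaves[bisect_right(root_keys, key)]
    if len(leaf) >= LEAF_CAPACITY:
        return False
    insort(leaf, key)   # keep the leaf sorted
    return True


print(insert_no_split(23), leaves[2])
print(insert_no_split(8))
```

Inserting 23* succeeds because its leaf has a free slot, while 8* belongs in the first leaf, which is already full.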

Example B+ Tree - Inserting 8*


Before:
Root:    [13 17 24 30]
Leaves:  [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

After:
Root:    [17]
Index:   [5 13]   [24 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

Notice that the root was split, leading to an increase in height. In this example, we could avoid the split by re-distributing entries; however, this is usually not done in practice.

Data vs. Index Page Split


(from previous example of inserting 8*)
Observe how minimum occupancy is guaranteed in both leaf and index page splits. Note the difference between copy-up and push-up; be sure you understand the reasons for this.

Data Page Split
Overfull leaf:  [2* 3* 5* 7* 8*]
After split:    [2* 3*]   [5* 7* 8*]
Entry to be inserted in parent node: 5. (Note that 5 is copied up and continues to appear in the leaf.)

Index Page Split
Overfull node:  [5 13 17 24 30]
After split:    [5 13]   [24 30]
Entry to be inserted in parent node: 17. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)
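The difference between copy-up and push-up can be sketched in a few lines of Python. This is a simplified model that splits key lists only and leaves out child pointers; `split_leaf` and `split_index` are illustrative names.

```python
def split_leaf(entries):
    """Split an overfull leaf; the first key of the right half is COPIED up."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]      # separator still appears in the right leaf


def split_index(keys):
    """Split an overfull index node; the middle key is PUSHED up."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]   # separator is removed from both halves


print(split_leaf([2, 3, 5, 7, 8]))
print(split_index([5, 13, 17, 24, 30]))
```

A leaf must keep every data entry, so its separator is only copied. An index key exists purely to direct searches, so once it moves to the parent there is no reason to keep it in the child.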

Delete 19*
Before:
Root:    [17]
Index:   [5 13]   [24 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

After:
Root:    [17]
Index:   [5 13]   [24 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

Delete 20* ...


Before:
Root:    [17]
Index:   [5 13]   [24 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

After:
Root:    [17]
Index:   [5 13]   [27 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

Delete 19* and 20* ...


Deleting 19* is easy. Deleting 20* is done with re-distribution: 24* moves over from the right sibling, and the new lowest key of that sibling (27) is copied up into the parent. Further deleting 24* results in more drastic changes.
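A minimal Python sketch of leaf deletion with re-distribution, assuming a minimum leaf occupancy of 2 and borrowing only from the right sibling for brevity. The function name and its parameters are illustrative, not taken from the slides.

```python
MIN_LEAF = 2   # assumed minimum number of entries in a leaf


def delete_with_redistribution(parent_keys, sep_idx, leaf, right_sibling, key):
    """Delete key from leaf; if it underflows, borrow the sibling's lowest entry."""
    leaf.remove(key)
    if len(leaf) < MIN_LEAF and len(right_sibling) > MIN_LEAF:
        leaf.append(right_sibling.pop(0))          # move one entry across
        parent_keys[sep_idx] = right_sibling[0]    # copy up the sibling's new lowest key


parent = [24, 30]
leaf, sibling = [20, 22], [24, 27, 29]
delete_with_redistribution(parent, 0, leaf, sibling, 20)
print(parent, leaf, sibling)
```

If the sibling were already at minimum occupancy, the condition would fail and the leaf would stay underfull, which is the situation that forces a merge.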

Delete 24* ...


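Before:
Root:    [17]
Index:   [5 13]   [27 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

After removing 24*:
Root:    [17]
Index:   [5 13]   [27 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [22*] [27* 29*] [33* 34* 38* 39*]

No redistribution from neighbors possible: leaf [22*] is underfull, and its sibling [27* 29*] is already at minimum occupancy.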

Deleting 24*
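Must merge. Observe the "toss" of an index entry (first diagram) and the "pull down" of an index entry (second diagram).

Leaf merge: [22*] and [27* 29*] are merged, and index entry 27 is tossed.
Index:   [30]
Leaves:  [22* 27* 29*] [33* 34* 38* 39*]

Index merge: [5 13] and [30] are merged, and 17 is pulled down from the root.
Root:    [5 13 17 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]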
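The two merge steps can be sketched in Python as follows. The model works on key lists only and omits child pointers; `merge_leaves` and `merge_index` are illustrative names.

```python
def merge_leaves(parent_keys, sep_idx, left, right):
    """Merge two sibling leaves; the separator in the parent is tossed."""
    del parent_keys[sep_idx]            # the key is no longer needed anywhere
    return left + right


def merge_index(parent_keys, sep_idx, left_keys, right_keys):
    """Merge two sibling index nodes; the separator is pulled down between them."""
    sep = parent_keys.pop(sep_idx)
    return left_keys + [sep] + right_keys


right_index = [27, 30]
merged_leaf = merge_leaves(right_index, 0, [22], [27, 29])
print(merged_leaf, right_index)

root = [17]
new_root = merge_index(root, 0, [5, 13], right_index)
print(new_root, root)
```

After the second step the old root has no keys left, so the merged node becomes the new root and the tree loses one level.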

Example of Non-leaf Redistribution


The tree is shown below during deletion of 24*. (What could be a possible initial tree?) In contrast to the previous example, we can re-distribute an entry from the left child of the root to the right child.

Root:    [22]
Index:   [5 13 17 20]   [30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]

After Re-distribution
Intuitively, entries are re-distributed by "pushing through" the splitting entry in the parent node. It suffices to re-distribute the index entry with key 20; we've re-distributed 17 as well for illustration.

Root:    [17]
Index:   [5 13]   [20 22 30]
Leaves:  [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
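"Pushing through" the parent can be sketched as a rotation: the parent's separator moves down into the right node, and the left node's largest key moves up to replace it. This Python sketch handles keys only and assumes the matching child pointers would be moved alongside them; `redistribute_right` is an illustrative name.

```python
def redistribute_right(parent_keys, sep_idx, left_keys, right_keys, count):
    """Move `count` keys from the left index node to the right one through the parent."""
    for _ in range(count):
        right_keys.insert(0, parent_keys[sep_idx])   # separator drops into right node
        parent_keys[sep_idx] = left_keys.pop()       # left node's largest key rises


root, left, right = [22], [5, 13, 17, 20], [30]
redistribute_right(root, 0, left, right, count=2)
print(root, left, right)
```

With `count=1` only key 20 moves through the parent, which is already enough to bring the right node back to minimum occupancy.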

Primary vs Secondary Index


Note: We were assuming the data items were in sorted order.
This is called a primary index.

Secondary index:
Built on an attribute that the file is not sorted on.

A Secondary B+-Tree index


Root:    [17]
Index:   [13]   [20 22 30]
Leaves:  [14 16] [17 18] [20 21] [22 27 29] [33 34 38 39]
Data records (in file order, not sorted on the indexed attribute): 2* 16* 5* 39*

Can have many different indexes on the same file.
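A small Python sketch of the idea, using a sorted list of (key, position) pairs as a stand-in for the leaf level of a secondary B+ tree. The file contents, the `name` attribute, and `lookup_by_name` are all hypothetical and only serve the illustration.

```python
from bisect import bisect_left

# Hypothetical data file, stored sorted on its first field (id).
data_file = [(2, "pat"), (5, "lee"), (16, "ann"), (39, "bob")]

# Secondary index on name: sorted (name, position-in-file) pairs.
secondary = sorted((name, pos) for pos, (_id, name) in enumerate(data_file))
secondary_keys = [name for name, _pos in secondary]


def lookup_by_name(name):
    """Find a record through the secondary index, or None if absent."""
    i = bisect_left(secondary_keys, name)
    if i < len(secondary) and secondary[i][0] == name:
        return data_file[secondary[i][1]]   # follow the pointer into the file
    return None


print(secondary)
print(lookup_by_name("lee"))
```

The index entries are in name order while the records they point to are scattered through the file, and any number of such indexes can be built over the same file.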

More
Hash-based Indexes
Static Hashing
Extendible Hashing
Read on your own.

Linear Hashing

Grid-files, R-Trees, etc.
