B+ Tree
Definition: A B+-tree is a balanced tree in which every path from the root of the tree to a leaf is of the
same length, and each nonleaf node of the tree has between ⌈n/2⌉ and n children, where n is fixed for a
particular tree. It contains index pages and data pages. Every leaf must be at least 50% full. For
example, if n is taken as the maximum number of keys per node and n = 4, then each node holds between 2
and 4 keys, and an index page has up to 4 + 1 = 5 pointers.
B+-tree Structure
A B+-tree is a generalization of a binary search tree (BST). The main difference is that nodes of a B+tree
will point to many children nodes rather than being limited to only two. Since the goal is to minimize disk
accesses whenever we are trying to locate records, we want to make the height of the multiway search tree
as small as possible. This goal is achieved by giving each node a large number of branches.
A B+-tree of order m is a tree where each internal node contains up to m branches (children nodes) and
thus stores up to m-1 search key values -- in a BST, only one key value is needed since there are just two
children nodes that an internal node can have. m is also known as the branching factor or the fanout of the
tree.
   1. The B+-tree stores records (or pointers to actual records) only at the leaf nodes, which are all
      found at the same level in the tree, so the tree is always height balanced.
   2. All internal nodes, except the root, have between Ceiling(m/2) and m children.
   3. The root is either a leaf or has at least two children.
   4. Internal nodes store search key values, and are used only as placeholders to guide the search. The
      number of search key values in each internal node is one less than the number of its non-empty
      children, and these keys partition the keys in the children in the fashion of a search tree. The keys
      are stored in non-decreasing order (i.e. sorted in lexicographical order).
   5. Depending on the size of a record as compared to the size of a key, a leaf node in a B+-tree of
      order m may store more or less than m records. Typically this is based on the size of a disk block,
      the size of a record pointer, etcetera. The leaf pages must store enough records to remain at least
      half full.
   6. The leaf nodes of a B+-tree are linked together to form a linked list. This is done so that the
      records can be retrieved sequentially without accessing the B+-tree index. This also supports fast
      processing of range-search queries.
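
Properties 1, 4 and 6 can be illustrated with a minimal in-memory sketch. The class names (Leaf, Internal),
the field names and the sample keys below are assumptions made for illustration only; a real index would
store these nodes in disk pages.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int] = field(default_factory=list)      # sorted search keys
    records: List[str] = field(default_factory=list)   # records[i] belongs to keys[i]
    next_leaf: Optional["Leaf"] = None                  # right sibling in the leaf chain


@dataclass
class Internal:
    keys: List[int] = field(default_factory=list)         # separators only, no records
    children: List[object] = field(default_factory=list)  # one more child than keys


def range_scan(start_leaf: Leaf, low: int, high: int) -> List[str]:
    """Collect records with low <= key <= high by walking the leaf chain only."""
    out = []
    leaf = start_leaf
    while leaf is not None:
        for k, r in zip(leaf.keys, leaf.records):
            if k > high:
                return out
            if k >= low:
                out.append(r)
        leaf = leaf.next_leaf
    return out


# Hypothetical tree: three linked leaves under a root with separators 20 and 40.
c = Leaf([40, 50], ["r40", "r50"])
b = Leaf([20, 30], ["r20", "r30"], next_leaf=c)
a = Leaf([5, 10], ["r5", "r10"], next_leaf=b)
root = Internal(keys=[20, 40], children=[a, b, c])

print(range_scan(a, 10, 40))  # ['r10', 'r20', 'r30', 'r40']
```

Once the first leaf of the range is known, the scan never touches an internal node, which is the point of
linking the leaves.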
[Figure not reproduced: example of a B+-tree with four keys (n = 4)]
B+-tree operations
To understand the B+-tree operations more clearly, assume, without loss of generality, that there is a table
whose primary key (PK) is a single attribute and that it has a B+-tree index organized on the PK attribute of the
table.
    Searching for records that satisfy a simple condition
To retrieve records, queries are written with conditions that describe the values that the desired records
must have. The most basic search on a table retrieves a single record given its PK value K.
  Search in a B+-tree is an alternating two-step process, beginning with the root node of the B+-tree. Say
that the search is for the record with key value K -- there can only be one record because we assume that
the index is built on the PK attribute of the table.
    1. Perform a binary search on the search key values in the current node -- recall that the search key
       values in a node are sorted and that the search starts with the root of the tree. We want to find the
       key Ki such that Ki ≤ K < Ki+1 (here Ki+1 denotes the key that follows Ki in the node).
    2. If the current node is an internal node, follow the proper branch associated with the key Ki by
       loading the disk page corresponding to the node and repeat the search process at that node.
    3. If the current node is a leaf, then:
              a. If K=Ki, then the record exists in the table and we can return the record associated with Ki
              b. Otherwise, K is not found among the search key values at the leaf, we report that there is
                 no record in the table with the value K.
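
The search steps above can be sketched as follows. This is a minimal in-memory version, not a definitive
implementation: the Node class, the search function and the sample tree are hypothetical, and following a
child reference stands in for loading the corresponding disk page.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys            # sorted search key values
        self.children = children    # child nodes (internal nodes only)
        self.records = records      # records aligned with keys (leaves only)

    @property
    def is_leaf(self):
        return self.children is None


def search(root, k):
    """Return the record with key k, or None if no such record exists."""
    node = root
    while not node.is_leaf:
        # bisect_right counts the separators <= k, which is the index of the
        # child covering Ki <= k < Ki+1 (step 1 and step 2).
        node = node.children[bisect_right(node.keys, k)]
    # Step 3: binary search inside the leaf, then test for equality.
    i = bisect_left(node.keys, k)
    if i < len(node.keys) and node.keys[i] == k:
        return node.records[i]
    return None


# Hypothetical two-level tree used only to exercise the function.
leaves = [
    Node([5, 10, 15], records=["r5", "r10", "r15"]),
    Node([20, 30], records=["r20", "r30"]),
    Node([40, 50], records=["r40", "r50"]),
]
root = Node([20, 40], children=leaves)

print(search(root, 15))  # r15
print(search(root, 45))  # None
```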
Example:
  Searching does not change the structure of a B+-tree, so we simply compare the search key with the
  keys in the tree and return the result.
  For example: find the values 45 and 15 in the tree below.
  Result:
          1. For the value 45: not found.
          2. For the value 15: found; return the location given by the pointer stored with the key.
     Inserting into a B+-tree
Insertion in a B+-tree is similar to inserting into other search trees: a new record is always inserted at
one of the leaf nodes. The added complexity is that an insertion can overflow a leaf node that is already
full. When such an overflow occurs, a brand new leaf node is added to the B+-tree at the same level as the
other leaf nodes. The steps to insert into a B+-tree are:
   1. Follow the path that is traversed as if a Search is being performed on the key of the new record to
      be inserted.
   2. The leaf page L that is reached is the node where the new record is to be indexed.
   3. If L is not full then an index entry is created that includes the search key value of the new row and a
      reference to where the new row is in the data file. We are done; this is the easy case!
   4. If L is full, then a new leaf node Lnew is introduced to the B+-tree as a right sibling of L. The keys
       in L along with an index entry for the new record are distributed evenly among L and Lnew.
       Lnew is inserted in the linked list of leaf nodes just to the right of L. We must now link Lnew to
       the tree, and since Lnew is to be a sibling of L, it will be pointed to by the parent of L. The
       smallest key value of Lnew is copied and inserted into the parent of L -- which will also be the
       parent of Lnew. This entire step is commonly referred to as a split of a leaf node.
           a. If the parent P of L is full, then it is split in turn. However, this split of an internal node is
               a bit different. The search key values of P and the newly inserted key must still be distributed
               evenly among P and the new page introduced as a sibling of P. In this split, however,
               the middle key is moved to the node above. Note that, unlike splitting a leaf node, where
               the middle key is copied and inserted into the parent, when you split an internal node the
               middle key is removed from the node being split and inserted into the parent node. This
               splitting of nodes may continue upwards in the tree.
           b. When a key is added to a full root, the root splits in two and the middle key is
               promoted into a new root node. This is the only way for a B+-tree to increase in height --
               when a split cascades the entire height of the tree from the leaf to the root.
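
The insertion steps can be sketched in a few dozen lines. This is an illustrative version under stated
assumptions, not the procedure used to draw the figures in these notes: every node holds at most MAX_KEYS
keys, only keys are stored (record pointers are left out), keys are unique, and the names Node, insert and
leaf_keys are made up for the example.

```python
from bisect import bisect_left, bisect_right

MAX_KEYS = 4  # assumed capacity of every node, chosen only for illustration


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children  # None for a leaf
        self.next = None          # right sibling in the leaf chain


def _insert(node, key):
    """Insert key below node; return (separator, new_right_node) if node split."""
    if node.children is None:                       # steps 2 and 3: reached leaf L
        node.keys.insert(bisect_left(node.keys, key), key)
        if len(node.keys) <= MAX_KEYS:
            return None
        mid = len(node.keys) // 2                   # step 4: split the leaf
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        right.next, node.next = node.next, right    # link Lnew to the right of L
        return right.keys[0], right                 # smallest key of Lnew is COPIED up
    i = bisect_right(node.keys, key)                # step 1: follow the search path
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2                       # step 4a: split the internal node
    up = node.keys[mid]                             # middle key is MOVED up
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])               # step 4b: tree grows by one level


def leaf_keys(root):
    """Walk down to the leftmost leaf, then read all keys along the leaf chain."""
    node = root
    while node.children is not None:
        node = node.children[0]
    out = []
    while node is not None:
        out.extend(node.keys)
        node = node.next
    return out


root = Node()
for ch in "CNGAHEKQMFWLTZDPRXYS":
    root = insert(root, ch)
print("".join(leaf_keys(root)))  # ACDEFGHKLMNPQRSTWXYZ
```

Reading the leaf chain returns the keys in sorted order regardless of the insertion order, which is a
convenient sanity check for any split logic.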
Example: Creating a B+ Tree of the following items:
          CNGAHEKQMFWLTZDPRXYS
Steps 1 to 12 build the tree by inserting the keys in the order listed above. The step-by-step tree
diagrams are not reproduced here; step 9 was shown in three sub-steps (9.1 to 9.3), and steps 11 and 12
showed only the right sub-tree.
   Deletion
   Deletion from a B+-tree must likewise maintain the property that all nodes are at least
half full. The added complexity is that a deletion can underflow a leaf node that has only the minimum
number of entries allowed. When such underflow situations take place, adjacent sibling nodes are
examined; if one of them has more than the minimum entries required then some of its entries are taken
from it to prevent a node from underflowing. Otherwise, if both adjacent sibling nodes are also at their
minimum, then two of these nodes are merged into a single node. The steps to delete from a B+-tree are:
   1. Perform the search process on the key of the record to be deleted. This search will end at a leaf L.
   2. If the leaf L contains more than the minimum number of elements (more than m/2 - 1), then the
       index entry for the record to be removed can be safely deleted from the leaf with no further action.
   3. If the leaf contains the minimum number of entries, then the deleted entry is replaced with another
       entry that can take its place while maintaining the correct order. To find such entries, we inspect
       the two sibling leaf nodes Lleft and Lright adjacent to L -- at most one of these may not exist.
           a. If one of these leaf nodes has more than the minimum number of entries, then enough
               records are transferred from this sibling so that both nodes have the same number of
               records. This is a heuristic and is done to delay a future underflow as long as possible;
               otherwise, only one entry need be transferred. The placeholder key value of the parent node
               may need to be revised.
           b. If both Lleft and Lright have only the minimum number of entries, then L gives its records to
               one of its siblings and it is removed from the tree. The new leaf will contain no more than
               the maximum number of entries allowed. This merge process combines two subtrees of the
               parent, so the separating entry at the parent needs to be removed -- this may in turn cause
               the parent node to underflow; such an underflow is handled the same way as an
               underflow of a leaf node.
           c. If the last two children of the root merge together into one node, then this merged node
               becomes the new root and the tree loses a level.
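
The leaf-level cases (steps 2, 3a and 3b) can be sketched on a two-level tree. Everything here is an
assumption made for illustration: the tree is represented as a list of separator keys plus a list of
leaves, MIN_KEYS is fixed at 2, the sample keys are invented, and underflow of the parent itself is not
handled.

```python
from bisect import bisect_right

MIN_KEYS = 2  # assumed minimum leaf occupancy (half of a 4-key leaf)


def _refresh(seps, leaves, j):
    """Set separator j to the smallest key of leaves[j + 1], if that separator exists."""
    if 0 <= j < len(seps):
        seps[j] = leaves[j + 1][0]


def delete(seps, leaves, key):
    """Delete key in place; seps[i] separates leaves[i] and leaves[i + 1]."""
    i = bisect_right(seps, key)              # step 1: leaf that would hold the key
    leaf = leaves[i]
    if key not in leaf:
        return "not found"
    leaf.remove(key)
    if len(leaf) >= MIN_KEYS or len(leaves) == 1:
        _refresh(seps, leaves, i - 1)        # step 2: no underflow, revise placeholder
        return "deleted"
    for j in (i - 1, i + 1):                 # step 3a: try left, then right sibling
        if 0 <= j < len(leaves) and len(leaves[j]) > MIN_KEYS:
            lo, hi = min(i, j), max(i, j)
            pool = leaves[lo] + leaves[hi]
            half = len(pool) // 2            # even out the two leaves
            leaves[lo], leaves[hi] = pool[:half], pool[half:]
            _refresh(seps, leaves, lo - 1)
            _refresh(seps, leaves, lo)
            return "redistributed"
    lo = i - 1 if i > 0 else i               # step 3b: merge with a sibling
    leaves[lo] = leaves[lo] + leaves[lo + 1]
    del leaves[lo + 1]
    del seps[lo]                             # separating entry leaves the parent
    _refresh(seps, leaves, lo - 1)
    return "merged"


# Hypothetical tree: three separators over four leaves.
seps = [20, 40, 60]
leaves = [[5, 10, 15], [20, 25, 30], [40, 50], [60, 70]]

print(delete(seps, leaves, 20), seps)  # deleted [25, 40, 60]
print(delete(seps, leaves, 50), seps)  # merged [25, 60]
print(delete(seps, leaves, 70), seps)  # redistributed [25, 40]
```

If a merge leaves the separator list empty, the single remaining leaf plays the role of the new root,
which corresponds to step 3c.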
Example:
1: delete 70 from the following tree
             Result:
2: delete 25 from the tree below; note that 25 also appears in the index page.
    Result: 25 is replaced with 28 in the index page.
3: delete 60 from the tree below
             Result: delete 60 from the index page and combine the remaining index pages.
Speed in B+ Tree Index
   •   In processing a query, we traverse a path from the root to a leaf node. If there are K search key
       values in the file, this path is no longer than ⌈log_⌈n/2⌉(K)⌉, where n is the maximum number of
       pointers in any given node.
   •   This means that the path is not long, even in large files. For a 4 KB disk block with a search-key
       size of 12 bytes and a disk pointer of 8 bytes, n is around 200. Even with a more conservative
       n = 100, a look-up among 1 million search-key values accesses at most ⌈log_50(1,000,000)⌉ = 4
       nodes. Since the root is usually in the buffer, typically only 3 or fewer disk reads are needed.
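
The arithmetic behind these figures can be checked with a short calculation. The variable and function
names are chosen for illustration, and 4 KB is taken to mean 4096 bytes.

```python
import math

block_size = 4096    # bytes in a 4 KB disk block
key_size = 12        # bytes per search key
pointer_size = 8     # bytes per disk pointer

# Each (key, pointer) pair takes 20 bytes, so a block holds about 200 of them.
n = block_size // (key_size + pointer_size)
print(n)  # 204


def max_path_length(num_keys, n):
    """Upper bound on nodes visited: ceil(log base ceil(n/2) of num_keys)."""
    return math.ceil(math.log(num_keys) / math.log(math.ceil(n / 2)))


print(max_path_length(1_000_000, 100))  # 4
print(max_path_length(1_000_000, n))    # 3
```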
B-Tree Index Files
Definition
   • Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates redundant
       storage of search keys.
   • Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for each
       search key in a nonleaf node must be included.
   • In a nonleaf node, the pointers Bi are the bucket or file record pointers.
Advantages of B-Tree indices:
  • May use fewer tree nodes than a corresponding B+-Tree.
  • Sometimes possible to find search-key value before reaching leaf node.
Disadvantages of B-Tree indices:
   • Only a small fraction of all search-key values are found early.
   • Non-leaf nodes are larger, so fan-out is reduced. Thus B-Trees typically have greater depth than
      the corresponding B+-Tree.
  •   Insertion and deletion more complicated than in B+-Trees
  •   Implementation is harder than B+-Trees.
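
The reduced fan-out can be made concrete with a rough calculation. It assumes the same 4096-byte block,
12-byte key and 8-byte pointer as in the B+-tree discussion, and that each B-tree nonleaf entry carries one
extra 8-byte record pointer; the exact layout of a real node will differ.

```python
BLOCK_SIZE = 4096     # bytes per disk block (assumed)
KEY_SIZE = 12         # bytes per search key (assumed)
POINTER_SIZE = 8      # bytes per pointer (assumed)

# B+-tree nonleaf entry: key + child pointer.
bplus_fanout = BLOCK_SIZE // (KEY_SIZE + POINTER_SIZE)

# B-tree nonleaf entry: key + child pointer + record (or bucket) pointer.
btree_fanout = BLOCK_SIZE // (KEY_SIZE + 2 * POINTER_SIZE)

print(bplus_fanout, btree_fanout)  # 204 146
```

Under these assumptions a B-tree nonleaf node holds roughly 30% fewer entries per block than a B+-tree
nonleaf node.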
                           B+ TREE INSERTION AND DELETION
Adding Records to a B+ Tree
Deleting Keys from a B+ tree