Index Structures
II
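It is not sufficient simply to scatter the records that represent tuples of a relation among various
blocks. That arrangement is adequate for a query that reads the whole relation, such as
SELECT * FROM R;
but for a query that asks only for the tuples with a given value, such as
SELECT * FROM R WHERE x = 5;
we would have to examine every block of R to find the few matching tuples. Instead, we want a
structure that leads us directly to them.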
An index is any data structure that takes the value of one or more fields and finds the records
with that value “quickly.” In particular, an index lets us find a record without having to look at
more than a small fraction of all possible records.
The field(s) on whose values the index is based is called the search key, or just “key” if the
index is understood.
Indexes help to speed up queries that specify values for one or more attributes.
Fig: An index takes a value for some field(s) and finds records with the matching value
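To make the difference concrete, here is a minimal Python sketch (not from the source) that answers the query on x = 5 in two ways: by scanning every block, and by following an index that maps each value of x to (block number, slot) pointers. The block size of 4 records, the record layout as dictionaries, and all function names are assumptions chosen for illustration; the index is a plain hash map, standing in for whatever structure a real system would use.

```python
from collections import defaultdict

BLOCK_SIZE = 4  # records per block (an assumption for this toy example)

def make_blocks(records, block_size=BLOCK_SIZE):
    """Lay the records of relation R out in fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def scan(blocks, field, value):
    """SELECT * FROM R WHERE field = value, answered by reading every block."""
    result = []
    for block in blocks:
        result.extend(r for r in block if r[field] == value)
    return result, len(blocks)          # blocks read = all of them

def build_index(blocks, field):
    """Map each value of the search key to (block number, slot) pointers."""
    index = defaultdict(list)
    for b, block in enumerate(blocks):
        for s, record in enumerate(block):
            index[record[field]].append((b, s))
    return index

def index_lookup(blocks, index, value):
    """Follow the index pointers; only blocks holding a match are read."""
    pointers = index.get(value, [])
    blocks_read = len({b for b, _ in pointers})
    return [blocks[b][s] for b, s in pointers], blocks_read
```

With 40 records where x = id % 10, the scan reads all 10 blocks while the index lookup reads only the 4 blocks that actually contain a record with x = 5.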
Basics of Index Structures
Storage structures consist of files, which are similar to the files used by operating systems. A data
file may be used to store a relation, for example. The data file may have one or more index files.
Each index file associates values of the search key with pointers to data-file records that have that
value for the attribute(s) of the search key.
Indexes are of two types:
Dense: There is an entry in the index file for every record of the data file.
Sparse: Only some of the data records are represented in the index, often one index entry per block
of the data file.
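Indexes can also be primary or secondary:
Primary: A primary index determines the location of the records of the data file.
Secondary: A secondary index does not determine the location of the records of the data file; it
only points to records whose placement is decided by something else, such as the primary index.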
For example, it is common to create a primary index on the primary key of a relation and to create
secondary indexes on some of the other attributes.
Sequential files and Dense index
A sequential file is created by sorting the tuples of a relation by their primary
key. The tuples are then distributed among blocks, in this order.
If the records are sorted, we can build on them a dense index: a sequence
of blocks holding only the keys of the records and pointers to the records
themselves.
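A minimal Python sketch of a dense index on a sequential file, under these assumptions (none of which come from the source): the data file is a list of blocks, each block a list of (key, payload) tuples already sorted by key, and a "pointer" is a (block number, slot) pair. Because the index keys are sorted, Python's standard `bisect` module can binary-search them.

```python
import bisect

def build_dense_index(data_blocks):
    """Return parallel lists: every key in the file, and a (block, slot) pointer to its record."""
    keys, pointers = [], []
    for b, block in enumerate(data_blocks):
        for s, (key, _payload) in enumerate(block):
            keys.append(key)
            pointers.append((b, s))
    return keys, pointers

def dense_lookup(keys, pointers, data_blocks, key):
    """Binary-search the dense index, then follow the pointer to the data block."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        b, s = pointers[i]
        return data_blocks[b][s]
    return None  # a dense index lists every key, so a miss means no such record
```

One design point worth noting: since every key appears in a dense index, a lookup can decide that a key is absent without reading any data block at all.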
Fig: A sparse index on a sequential data file (index file and data file)
Searching using Sparse index
To find the record with search-key value K, we search the sparse index for
the largest key less than or equal to K.
Since the index file is sorted by key, a binary search can locate this entry. We
follow the associated pointer to a data block.
Now, we must search this block for the record with key K . Of course the
block must have enough format information that the records and their contents
can be identified.
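The search procedure above can be sketched in Python as follows. This is a minimal illustration, assuming the data file is a list of blocks of (key, payload) tuples sorted by key, and that the sparse index holds the first key of each block, so that position i in the index points to data block i. `bisect_right(keys, K) - 1` gives the position of the largest index key less than or equal to K.

```python
import bisect

def build_sparse_index(data_blocks):
    """One entry per data block: the first (smallest) key in that block."""
    return [block[0][0] for block in data_blocks]

def sparse_lookup(sparse_keys, data_blocks, key):
    """Find the largest index key <= key, then search the one block it points to."""
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every key in the file
    for k, payload in data_blocks[i]:  # search within the chosen block
        if k == key:
            return (k, payload)
    return None
```

Unlike the dense case, a sparse index cannot tell us that a key is absent; we must read the candidate data block to find out.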
Multiple Levels of Index
An index file can cover many blocks. Even if we use binary search to find the
desired index entry, we may still need many disk I/Os to get to the record
we want (binary search over b index blocks reads about log2 b of them). By putting an
index on the index, we can make the use of the first level of index more efficient.
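A minimal two-level sketch in Python, assuming (for illustration only) that the first level is a sparse index with one key per data block, that its entries are packed into index blocks of `entries_per_index_block` entries, and that the second level is a sparse index holding the first key of each first-level index block. A lookup then touches one second-level entry, one first-level index block, and one data block, instead of binary-searching the whole first level.

```python
import bisect

def build_levels(data_blocks, entries_per_index_block):
    """Return (level2, level1_blocks) for a sorted file of (key, payload) blocks."""
    level1 = [block[0][0] for block in data_blocks]   # sparse: first key per data block
    level1_blocks = [level1[i:i + entries_per_index_block]
                     for i in range(0, len(level1), entries_per_index_block)]
    level2 = [blk[0] for blk in level1_blocks]        # sparse index on the index
    return level2, level1_blocks

def two_level_lookup(level2, level1_blocks, data_blocks, key, entries_per_index_block):
    j = bisect.bisect_right(level2, key) - 1          # which first-level index block
    if j < 0:
        return None
    idx_block = level1_blocks[j]
    i = bisect.bisect_right(idx_block, key) - 1       # which entry inside that block
    data_block = data_blocks[j * entries_per_index_block + i]
    for k, payload in data_block:                     # search the single data block
        if k == key:
            return (k, payload)
    return None
```

Repeating this construction (an index on the index on the index) is exactly what a B-tree does automatically.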
•The first column indicates the type of occurrence, i.e., its marking, if any.
•The second and third columns are together the pointer to the occurrence.
•The third column indicates the document, and the second column gives the
number of the word in the document.
B-Trees
While one or two levels of index are often very helpful in speeding up queries,
there is a more general structure that is commonly used in commercial systems.
This family of data structures is called B-trees, and the particular variant that is
most often used is known as a B+ tree.
B-trees automatically maintain as many levels of index as is appropriate for the size of
the file being indexed.
B-trees manage the space on the blocks they use so that every block is between half
used and completely full.
The Structure of B-Trees
A B-tree organizes its blocks into a tree that is balanced, meaning that all paths
from the root to a leaf have the same length.
Typically, there are three layers in a B-tree: the root, an intermediate layer, and
leaves, but any number of layers is possible.
There is a parameter ‘n’ associated with each B-tree index, and this parameter
determines the layout of all blocks of the B-tree.
Each block will have space for n search-key values and n + 1 pointers.
Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let
pointers be 8 bytes. If there is no header information kept on the blocks, then we
want to find the largest integer value of n such that 4n + 8(n + 1) ≤ 4096. Simplifying
gives 12n + 8 ≤ 4096, so n ≤ 340.67, and the largest such integer is n = 340.
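The same arithmetic as a small Python function, so it can be reused for other block, key, and pointer sizes. The `header_size` parameter is a hypothetical addition for blocks that do reserve header space; with it set to 0 the function reproduces the example above.

```python
def max_keys_per_block(block_size, key_size, pointer_size, header_size=0):
    """Largest n with key_size*n + pointer_size*(n + 1) + header_size <= block_size."""
    # Rearranged: n <= (block_size - header_size - pointer_size) / (key_size + pointer_size)
    return (block_size - header_size - pointer_size) // (key_size + pointer_size)

n = max_keys_per_block(4096, 4, 8)
print(n)                     # 340
print(4 * n + 8 * (n + 1))   # 4088 bytes used; one more key would need 4100 > 4096
```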
There are several important rules about what can appear in the blocks of a B-tree:
1. The keys in leaf nodes are copies of keys from the data file. These keys are distributed among
the leaves in sorted order, from left to right.
2. At the root, there are at least two used pointers. All pointers point to B-tree blocks at the level
below.
3. At a leaf, the last pointer points to the next leaf block to the right, i.e., to the block with the
next higher keys. Among the other n pointers in a leaf block, at least ⌊(n + 1)/2⌋ of these
pointers are used and point to data records; unused pointers are null and do not point
anywhere. The i-th pointer, if it is used, points to a record with the i-th key.
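The next-leaf pointer in rule 3 is what makes range queries cheap: once the first qualifying leaf is found, the scan simply walks rightward along the chain of leaves. A minimal Python sketch, assuming a hypothetical `Leaf` class whose `records[i]` stands in for the record pointed to by the i-th pointer, and assuming the caller already has the starting leaf (in a real B-tree it would be reached by descending from the root).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Leaf:
    keys: List[int]                  # up to n search-key values, in sorted order
    records: List[str]               # records[i] is what the i-th pointer points to
    next: Optional["Leaf"] = None    # last pointer: the next leaf to the right

def range_scan(start: Leaf, lo: int, hi: int):
    """Yield records whose key k satisfies lo <= k <= hi, following next-leaf pointers."""
    leaf = start
    while leaf is not None:
        for k, rec in zip(leaf.keys, leaf.records):
            if k > hi:
                return               # keys only increase from here on
            if k >= lo:
                yield rec
        leaf = leaf.next
```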
Properties of B-Trees
B-Tree of Order m has the following properties:
Property #1 - All leaf nodes must be at the same level.
Property #2 - All nodes except the root must have at least ⌈m/2⌉ - 1 keys and at most
m - 1 keys.
Property #3 - All non-leaf nodes except the root (i.e. all internal nodes) must have at least
⌈m/2⌉ children.
Property #4 - If the root node is a non-leaf node, then it must have at least 2 children.
Property #5 - A non-leaf node with n - 1 keys must have n children.
Property #6 - All the key values in a node must be in ascending order.
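One way to make these properties concrete is a checker that walks a tree and tests each one. The Python sketch below assumes a hypothetical `Node` class holding a sorted key list and a child list (empty for a leaf), and reads the bound [m/2] as ⌈m/2⌉. It checks only the six listed properties, not the ordering of keys across different nodes.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list => leaf

def check_btree(root, m):
    """Return True if the tree rooted at root satisfies Properties #1-#6 for order m."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        k = len(node.keys)
        if k > m - 1:                                   # Property #2 (upper bound)
            return False
        if not is_root and k < math.ceil(m / 2) - 1:    # Property #2 (lower bound)
            return False
        if node.keys != sorted(node.keys):              # Property #6
            return False
        if not node.children:
            leaf_depths.add(depth)                      # Property #1 checked at the end
            return True
        c = len(node.children)
        if c != k + 1:                                  # Property #5
            return False
        if is_root and c < 2:                           # Property #4
            return False
        if not is_root and c < math.ceil(m / 2):        # Property #3
            return False
        return all(visit(ch, depth + 1, False) for ch in node.children)

    return visit(root, 0, True) and len(leaf_depths) == 1
```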
Insertion Operation in B-Tree
In a B-Tree, a new key is always inserted into a leaf node. The insertion operation is performed
as follows:
Step 1 - Check whether tree is Empty.
Step 2 - If tree is Empty, then create a new node with new key value and insert it into the tree as a root
node.
Step 3 - If tree is Not Empty, then find the suitable leaf node to which the new key value is added using
Binary Search Tree logic.
Step 4 - If that leaf node has empty position, add the new key value to that leaf node in ascending order
of key value within the node.
Step 5 - If that leaf node is already full, split it and send its middle value up to the parent node.
Repeat this at each ancestor until the value sent upward fits into a node without overflowing.
Step 6 - If the splitting is performed at the root node, then the middle value becomes the new root
node of the tree and the height of the tree increases by one.
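The steps above can be sketched in Python as follows. This is a minimal in-memory illustration, not a disk-based implementation: `Node` and `BTree` are hypothetical names, order m means at most m - 1 keys per node, and a full node is handled by first adding the new key (giving m keys) and then splitting around the middle one, which is equivalent to Step 5.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty => leaf

class BTree:
    """B-tree of order m: every node holds at most m - 1 keys."""

    def __init__(self, m):
        assert m >= 3
        self.m = m
        self.root = None                     # Step 1: the tree starts empty

    def insert(self, key):
        if self.root is None:                # Step 2: empty tree -> new root node
            self.root = Node([key])
            return
        split = self._insert(self.root, key)
        if split is not None:                # Step 6: root split -> new root, height + 1
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node, key):
        """Insert below node; return (middle, new_right_node) if node had to split."""
        i = bisect.bisect_left(node.keys, key)
        if node.children:                    # Step 3: descend as in a BST
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            middle, right = split            # Step 5: the child sent its middle value up
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
        else:
            node.keys.insert(i, key)         # Step 4: keep keys in ascending order
        if len(node.keys) < self.m:          # still at most m - 1 keys: done
            return None
        return self._split(node)

    def _split(self, node):
        """Split an overfull node (m keys) around its middle key."""
        mid = len(node.keys) // 2
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return middle, right

    def inorder(self):
        """Return all keys in sorted order (useful for checking the tree)."""
        out = []
        def walk(n):
            if not n.children:
                out.extend(n.keys)
                return
            for i, k in enumerate(n.keys):
                walk(n.children[i])
                out.append(k)
            walk(n.children[-1])
        if self.root is not None:
            walk(self.root)
        return out
```

For example, inserting 1 through 7 into a tree of order 3 leaves 4 at the root with children holding 2 and 6; the root has split twice along the way, so the height has grown to three levels.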
Deletion Operation in B-Tree
Case 1 − If the key to be deleted is in a leaf node and the deletion does not
violate the minimum key property, just delete the key.
Case 2 − If the key to be deleted is in a leaf node but the deletion violates the
minimum key property, borrow a key from either its left sibling or its right sibling
(through the parent: the parent's separating key moves down into the node, and the
sibling's nearest key moves up to replace it). If both siblings have exactly the minimum
number of keys, merge the node with one of them, together with the separating key from the parent.
Case 3 − If the key to be deleted is in an internal node, it is replaced by its in-order
predecessor (the largest key in the left subtree) or its in-order successor (the smallest
key in the right subtree), depending on which child has more keys. But
if both child nodes have the minimum number of keys, they are merged together.
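A minimal Python sketch of these cases for an in-memory B-tree of order m (minimum ⌈m/2⌉ - 1 keys per non-root node). Assumptions, not from the source: the hypothetical `Node` class below, and a bottom-up style in which a key is first removed and the parent then repairs any child that fell below the minimum, by borrowing through the parent or by merging. In Case 3, when both children are at the minimum, the replacement step leaves one child short and the repair merges the two, matching the outcome described above.

```python
import bisect
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list => leaf

def delete(root, key, m):
    """Delete key from the B-tree of order m rooted at root; return the (possibly new) root."""
    min_keys = math.ceil(m / 2) - 1
    _delete(root, key, min_keys)
    if not root.keys and root.children:   # a merge emptied the root: height shrinks by one
        return root.children[0]
    return root

def _delete(node, key, min_keys):
    i = bisect.bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                 # leaf
        if found:
            node.keys.pop(i)              # Case 1; an underflow is repaired by the parent
        return
    if found:                             # Case 3: key sits in an internal node
        left, right = node.children[i], node.children[i + 1]
        if len(right.keys) > len(left.keys):
            node.keys[i] = _min_key(right)            # in-order successor
            _delete(right, node.keys[i], min_keys)
            i += 1                                    # the right child may now underflow
        else:
            node.keys[i] = _max_key(left)             # in-order predecessor
            _delete(left, node.keys[i], min_keys)
    else:
        _delete(node.children[i], key, min_keys)
    if len(node.children[i].keys) < min_keys:
        _rebalance(node, i, min_keys)     # Case 2 (or the merge in Case 3)

def _rebalance(parent, i, min_keys):
    """Fix parent.children[i], which has fallen below min_keys."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > min_keys:      # borrow from left sibling
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > min_keys:  # borrow from right sibling
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    elif left is not None:                                  # merge child into left sibling
        left.keys += [parent.keys.pop(i - 1)] + child.keys
        left.children += child.children
        parent.children.pop(i)
    else:                                                   # merge right sibling into child
        child.keys += [parent.keys.pop(i)] + right.keys
        child.children += right.children
        parent.children.pop(i + 1)

def _min_key(node):
    while node.children:
        node = node.children[0]
    return node.keys[0]

def _max_key(node):
    while node.children:
        node = node.children[-1]
    return node.keys[-1]
```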