Indexing - II
Secondary Index
Multi-Level Index
B+-Tree
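Index Structure
[Figure: a search key value S enters a location mechanism, which finds the index entry for S; the index entries point to data blocks]
Location mechanism: facilitates finding the index entry for S
Search for key K: find the index entry with the largest key ≤ K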
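A minimal Python sketch of the lookup rule for a sparse index, assuming the index is held in memory as two parallel sorted lists (`index_keys`, `index_ptrs`). These names and the sample values are hypothetical. Binary search (`bisect_right`) plays the role of the location mechanism and returns the entry with the largest key not exceeding K.

```python
from bisect import bisect_right

def find_block(index_keys, index_ptrs, k):
    """Return the pointer of the index entry with the largest key <= k,
    or None if k is smaller than every key in the index."""
    i = bisect_right(index_keys, k) - 1  # position of the last key <= k
    if i < 0:
        return None
    return index_ptrs[i]

# Hypothetical sparse index: first key of each data block -> block number.
index_keys = [10, 30, 50, 70]
index_ptrs = [0, 1, 2, 3]

print(find_block(index_keys, index_ptrs, 40))  # 1: key 40, if present, is in block 1
```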
Sparse vs. Dense Index
Dense index: one index entry for each data record
Unclustered index must be dense
Clustered index need not be dense
Sparse index: one index entry for each block of the data file
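To make the size difference concrete, here is a small sketch that builds both kinds of index over a toy data file, assuming the file is a list of blocks of keys already sorted on the search key. The function names and data are hypothetical.

```python
def dense_index(blocks):
    """One (key, (block_no, slot)) entry per data record."""
    return [(key, (b, s))
            for b, block in enumerate(blocks)
            for s, key in enumerate(block)]

def sparse_index(blocks):
    """One (first_key, block_no) entry per data block.
    Only meaningful when the data file is sorted on the key (clustered)."""
    return [(block[0], b) for b, block in enumerate(blocks) if block]

# Hypothetical data file sorted on Id, two records per block.
blocks = [[10, 20], [30, 40], [50, 60]]

print(len(dense_index(blocks)))   # 6 entries, one per record
print(sparse_index(blocks))       # [(10, 0), (30, 1), (50, 2)]
```

The sparse version depends on the sort order to reach records that have no entry of their own, which is why an unclustered index has to be dense.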
[Figure: Sparse vs. dense index. Data file with columns Id, Name, Dept, sorted on Id; a sparse, clustered index sorted on Id; a dense, unclustered index sorted on Name]
Clustered vs. Unclustered Index
[Figure: clustered vs. unclustered index, with index entries for keys 10 to 60 pointing into data blocks]
Pointers in one index block may refer to multiple data blocks
This results in more disk I/Os
For an unclustered index, this problem is unavoidable
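A rough way to see the extra I/Os is to count block reads while following record pointers, under the simplifying assumption that only one data block is buffered at a time. The pointer lists below are hypothetical and describe the same four matching records, stored together (clustered) or scattered (unclustered).

```python
def ios_following_pointers(pointers):
    """Count data-block reads when record pointers (block_no, slot) are followed
    in order, assuming only one data block is buffered at a time: a new read is
    needed whenever the next record lives in a different block."""
    ios, current = 0, None
    for block, _slot in pointers:
        if block != current:
            ios += 1
            current = block
    return ios

# Hypothetical pointers for the same four matching records.
clustered = [(0, 0), (0, 1), (1, 0), (1, 1)]    # records stored together
unclustered = [(0, 0), (3, 1), (1, 0), (3, 0)]  # records scattered

print(ios_following_pointers(clustered))    # 2
print(ios_following_pointers(unclustered))  # 4
```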
Using a ‘bucket file’ between the index file and the data file
Single entry <k,p> for each value ‘k’, where p points to the location in the bucket file that holds the pointers to all records with value ‘k’
Avoids wasting space by storing the same value ‘k’ multiple times
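A minimal sketch of the bucket-file scheme, assuming the index and bucket file fit in memory as Python lists. The function names and records are hypothetical. Each distinct key gets exactly one `<k, p>` entry, and `p` selects the bucket holding every matching record pointer.

```python
from bisect import bisect_left
from collections import defaultdict

def build_bucket_index(records):
    """records: (key, record_pointer) pairs in any order.
    Returns (index, bucket_file): index holds one (k, p) entry per distinct
    key k, and bucket_file[p] lists the pointers of all records with key k."""
    groups = defaultdict(list)
    for key, ptr in records:
        groups[key].append(ptr)
    index, bucket_file = [], []
    for key in sorted(groups):
        index.append((key, len(bucket_file)))  # single <k, p> entry
        bucket_file.append(groups[key])
    return index, bucket_file

def lookup(index, bucket_file, key):
    """Binary-search the index, then follow p into the bucket file."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return []
    return bucket_file[index[i][1]]

# Hypothetical records: pointers r1..r5, three of which share key 20.
records = [(20, "r1"), (10, "r2"), (20, "r3"), (50, "r4"), (20, "r5")]
index, bucket_file = build_bucket_index(records)
print(index)                            # [(10, 0), (20, 1), (50, 2)]
print(lookup(index, bucket_file, 20))   # ['r1', 'r3', 'r5']
```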
Definition of Bucket
[Figure: index file, buckets, and data file]
[Figure: index blocks 0, 1, … pointing to data blocks 0, 1, …]
CIS552
Multi-level indexes
When an index is so large that even binary search on it takes too many disk I/Os:
Define a second-level index: an index on the index
This can be repeated, giving a multi-level index structure
Second- and higher-level indexes must be sparse
The second-level index in the previous example would take only 100 blocks (400KB)
A search then involves 6 disk I/Os plus a search within the block
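A sketch of a two-level sparse index and the blocks one lookup reads. It assumes the level-2 index fits in a single block, so a lookup reads one level-2 block, one level-1 block and one data block. In a larger file, such as the 100-block second level above, level 2 would itself be searched or indexed again. All names, the block size and the data are hypothetical.

```python
from bisect import bisect_right

def pack(entries, per_block):
    """Split a list of index entries into blocks of at most per_block entries."""
    return [entries[i:i + per_block] for i in range(0, len(entries), per_block)]

def floor_ptr(entries, k):
    """Pointer of the entry with the largest key <= k, or None."""
    keys = [key for key, _ in entries]
    i = bisect_right(keys, k) - 1
    return entries[i][1] if i >= 0 else None

def build_two_level(data_blocks, per_block):
    # Level 1: sparse index, one (first key, data block no) entry per data block.
    level1 = pack([(blk[0], b) for b, blk in enumerate(data_blocks)], per_block)
    # Level 2: sparse index on level 1, one entry per level-1 block.
    level2 = [(blk[0][0], b) for b, blk in enumerate(level1)]
    return level1, level2

def search(data_blocks, level1, level2, k):
    """Return (found, blocks_read). Assumes level 2 fits in a single block."""
    reads = 1                       # read the level-2 block
    b1 = floor_ptr(level2, k)
    if b1 is None:
        return False, reads
    reads += 1                      # read one level-1 block
    b = floor_ptr(level1[b1], k)
    reads += 1                      # read one data block
    return k in data_blocks[b], reads

# Hypothetical file: 20 data blocks holding keys 0..39, 4 index entries per block.
data_blocks = [[i, i + 1] for i in range(0, 40, 2)]
level1, level2 = build_two_level(data_blocks, per_block=4)
print(len(level1), len(level2))                 # 5 5
print(search(data_blocks, level1, level2, 25))  # (True, 3)
```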
[Figure: A two-level primary index]
Estimating Costs
For simplicity we estimate the cost of an operation by
counting the number of blocks that are read or
written to disk.
We ignore the possibility of blocked access which
could significantly lower the cost of I/O.
We assume that each relation is stored in a separate
file with B blocks and R records per block.
Equality searches – records with a given value in an attribute
Range searches – records with an attribute value in a given range
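Under the cost model above (count block reads, a file of B blocks with R records per block), the two kinds of search can be costed roughly as follows. This is a simplified sketch, not a full cost model. It assumes an unsorted file must be scanned completely in the worst case and that a sorted file is binary-searched over its blocks. The function names are hypothetical.

```python
import math

def full_scan_cost(B):
    """Unsorted file: in the worst case every one of the B blocks is read."""
    return B

def binary_search_cost(B):
    """File sorted on the search attribute: worst-case blocks read by a
    binary search over B blocks, i.e. floor(log2 B) + 1."""
    return B.bit_length()

def sorted_range_cost(B, R, matches):
    """Sorted file, range search: binary search to the first match, then read
    the further blocks holding the matching records (R records per block).
    Simplification: assumes the matches start at the beginning of a block."""
    extra_blocks = max(math.ceil(matches / R) - 1, 0)
    return binary_search_cost(B) + extra_blocks

print(full_scan_cost(1000))              # 1000
print(binary_search_cost(1000))          # 10
print(sorted_range_cost(1000, 20, 100))  # 14
```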
B+-Tree Index
A B+-tree is a rooted tree satisfying the following properties:
o All paths from the root to a leaf are of the same length
o Each node that is not a root or a leaf (a non-leaf node) has between ⌈n/2⌉ and n children
o A leaf node has between ⌈(n–1)/2⌉ and n–1 values
o Special cases:
   o If the root is not a leaf, it has at least 2 children.
   o If the root is a leaf, it can have between 0 and n–1 values.
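The occupancy rules can be written down directly. The sketch below assumes n is the maximum number of pointers per node, as in the properties above. The helper names are hypothetical.

```python
import math

def bplus_bounds(n):
    """Occupancy limits for a B+-tree whose nodes hold up to n pointers."""
    return {
        "internal_children": (math.ceil(n / 2), n),
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        "nonleaf_root_min_children": 2,
    }

def node_ok(n, is_leaf, is_root, count):
    """count = children for a non-leaf node, values for a leaf."""
    lo_i, hi_i = bplus_bounds(n)["internal_children"]
    lo_l, hi_l = bplus_bounds(n)["leaf_values"]
    if is_leaf:
        # A root that is also a leaf may hold anywhere from 0 to n-1 values.
        return (0 if is_root else lo_l) <= count <= hi_l
    # A non-leaf root needs only 2 children; other internal nodes need ceil(n/2).
    return (2 if is_root else lo_i) <= count <= hi_i

print(bplus_bounds(5))
# {'internal_children': (3, 5), 'leaf_values': (2, 4), 'nonleaf_root_min_children': 2}
```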
[Figure: B+-tree structure, with index entries (direct search) in the upper levels and data entries (the "sequence set") in the leaves]
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
Example B+-tree:
Root: [13 17 24 30]
Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
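A search in this example tree descends from the root, choosing the child whose key range covers the search key. This is a sketch that assumes nodes are plain dicts and that keys greater than or equal to a separator live in the subtree to its right. The helpers `leaf` and `inner` are hypothetical.

```python
from bisect import bisect_right

def leaf(*keys):
    return {"keys": list(keys)}

def inner(keys, children):
    return {"keys": keys, "children": children}

# The example tree: root [13 17 24 30] with five leaves.
root = inner([13, 17, 24, 30], [
    leaf(2, 3, 5, 7), leaf(14, 16), leaf(19, 20, 22),
    leaf(24, 27, 29), leaf(33, 34, 38, 39),
])

def search(node, k):
    """Descend from the root to the leaf that would hold k.
    Convention assumed: keys >= a separator live in the subtree to its right."""
    while "children" in node:
        node = node["children"][bisect_right(node["keys"], k)]
    return k in node["keys"], node["keys"]

print(search(root, 24))  # (True, [24, 27, 29])
print(search(root, 15))  # (False, [14, 16])
```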
Tree after 8* is added:
Root: [17]
Internal nodes: [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
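The change between the two trees can be reproduced with two split rules. Each rule is sketched here on key lists only, assuming nodes hold at most four keys, as in the example. A leaf split copies its middle key up. An internal split pushes its middle key up. The function names are hypothetical.

```python
def split_leaf(entries, new_key):
    """Insert new_key into a full leaf and split it in two.
    The first key of the right half is COPIED up: it stays in the leaf and a
    copy becomes the new separator in the parent."""
    entries = sorted(entries + [new_key])
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

def split_internal(keys, new_key):
    """Insert a separator into a full internal node and split it.
    The middle key is PUSHED up: it moves to the parent and appears in neither
    half. (Child pointers, omitted here, are divided along with the keys.)"""
    keys = sorted(keys + [new_key])
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

# Adding 8* to the full leaf [2* 3* 5* 7*] of the first example tree:
print(split_leaf([2, 3, 5, 7], 8))          # ([2, 3], [5, 7, 8], 5)
# Separator 5 then overflows the full root [13 17 24 30]:
print(split_internal([13, 17, 24, 30], 5))  # ([5, 13], [24, 30], 17)
```

The asymmetry is deliberate: the leaf keeps 5* because every data entry must stay in the leaf level, while 17 leaves both internal halves because it only routes searches.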
8,5,1,7,3,12,9,6
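Deleting a Data Entry from a B+ Tree
Tree with 19* and 20* removed:
Root: [17]
Internal nodes: [5 13] [27 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
Must merge: the index node [30] is merged with its sibling [5 13], pulling down the root key 17.
Root after the merge: [5 13 17 30]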
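The merge step can be sketched on key lists as well. The scenario is hypothetical. Assume nodes hold at most four keys, as in the example trees. Suppose 24* is deleted, so leaf [22*] falls below the two-entry minimum, and its sibling [27* 29*] has no entry to spare. Merging the leaves removes separator 27 from the parent. The parent [30] then underflows and merges with [5 13], pulling 17 down. The result matches the root shown after the merge. The function names are hypothetical.

```python
def merge_leaves(left, right):
    """Merge two sibling leaves; the parent's separator for `right` is
    deleted (nothing is pulled down for leaves)."""
    return left + right

def merge_internal(left_keys, separator, right_keys):
    """Merge two sibling internal nodes; the separator between them is
    PULLED DOWN from the parent into the merged node."""
    return left_keys + [separator] + right_keys

# Hypothetical: 24* is deleted, leaving [22*] below the 2-entry minimum,
# and its sibling [27* 29*] has no entry to spare.
print(merge_leaves([22], [27, 29]))        # [22, 27, 29]
# The parent [27 30] loses separator 27 and becomes [30], which underflows;
# merging it with [5 13] pulls down the root key 17:
print(merge_internal([5, 13], 17, [30]))   # [5, 13, 17, 30]
```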