Indexing and Hashing
An index entry has the form (search key, pointer).
Index files are typically much smaller than the original file.
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order.
Hash indices: search keys are distributed uniformly across buckets using a hash function.
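To make the two kinds concrete, here is a minimal Python sketch. The sample records, the bucket count of 4, and the helper names ordered_lookup and hash_lookup are assumptions made for illustration; they do not come from the slides.

```python
import bisect

# Hypothetical data file: list position acts as the record pointer.
records = [(345, "tomson"), (123, "smith"), (567, "smith"),
           (234, "jones"), (456, "stevens")]

# Ordered index: (search key, pointer) entries kept in sorted key order.
ordered_index = sorted((ssn, pos) for pos, (ssn, _) in enumerate(records))
ordered_keys = [key for key, _ in ordered_index]

def ordered_lookup(key):
    """Binary-search the sorted keys; return the record or None."""
    i = bisect.bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return records[ordered_index[i][1]]
    return None

# Hash index: a hash function assigns every search key to one bucket.
NUM_BUCKETS = 4  # assumed bucket count
buckets = [[] for _ in range(NUM_BUCKETS)]
for pos, (ssn, _) in enumerate(records):
    buckets[hash(ssn) % NUM_BUCKETS].append((ssn, pos))

def hash_lookup(key):
    """Scan only the one bucket the key hashes to; return the record or None."""
    for k, pos in buckets[hash(key) % NUM_BUCKETS]:
        if k == key:
            return records[pos]
    return None
```

Both lookups return the same record for an existing key. Only the ordered index keeps neighbouring keys next to each other, which is what range queries rely on.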
Main concepts
Search keys are sorted in the index file and point to the actual records.
Primary vs. secondary indices.
Clustering (typically sparse) vs. non-clustering (dense) indices.
Primary key index: built on the primary key (no duplicates).
[Figure: primary-key index on Ssn with entries 123, 234, 345, 456, 567, pointing into the STUDENT file]
[Figure: Address-index, an index on the Address column of the STUDENT file, where duplicate values occur]
STUDENT
Ssn  Name     Address
123  smith    main str
234  jones    forbes ave
345  tomson   main str
456  stevens  forbes ave
567  smith    forbes ave
Postings lists
[Figure: index on Address over the STUDENT file, with one postings list per value: "forbes ave" lists records 234, 456, 567; "main str" lists records 123, 345]
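A postings list can be sketched in a few lines of Python. The rows below are the STUDENT rows from the slides. Using the Ssn as the record pointer and the name address_index are assumptions for illustration; a real system would store record addresses.

```python
from collections import defaultdict

# STUDENT rows as they appear on the slides: (Ssn, Name, Address)
students = [
    (123, "smith", "main str"),
    (234, "jones", "forbes ave"),
    (345, "tomson", "main str"),
    (456, "stevens", "forbes ave"),
    (567, "smith", "forbes ave"),
]

# One postings list per distinct Address value.
postings = defaultdict(list)
for ssn, _name, address in students:
    postings[address].append(ssn)  # assumption: the Ssn stands in for a record pointer

# Keep the search keys in sorted order, as an ordered index would.
address_index = dict(sorted(postings.items()))
```

Each duplicate value is stored once in the index, and its list grows by one pointer per matching record.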
[Figure: sparse index on Ssn with two entries, >=123 and >=456, pointing into the STUDENT file stored in Ssn order (123 smith, 234 jones, 345 tomson, 456 stevens, 567 smith)]
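A lookup through a sparse index such as the one with entries >=123 and >=456 could be sketched as follows. The block size of 3 records and the function name sparse_lookup are assumptions; the slides do not give the block size.

```python
import bisect

# STUDENT file sorted on Ssn and cut into blocks (block size 3 is assumed).
blocks = [
    [(123, "smith"), (234, "jones"), (345, "tomson")],
    [(456, "stevens"), (567, "smith")],
]

# Sparse index: one entry per block, holding the first key in that block.
sparse_keys = [block[0][0] for block in blocks]

def sparse_lookup(ssn):
    """Pick the last index entry with key <= ssn, then scan that block."""
    i = bisect.bisect_right(sparse_keys, ssn) - 1
    if i < 0:
        return None  # smaller than every key in the file
    for key, name in blocks[i]:
        if key == ssn:
            return (key, name)
    return None
```

The index holds only two keys for five records, which works because the file itself is sorted on the search key.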
STUDENT (records not stored in Ssn order)
Ssn  Name     Address
345  tomson   main str
234  jones    forbes ave
567  smith    forbes ave
456  stevens  forbes ave
123  smith    main str
ISAM
What if the index is too large to search sequentially?
[Figure: ISAM. A top-level index (entries 123 and 3,423) points to a second-level index (entries 123 and 456, read as >=123 and >=456), whose entries point to blocks of the STUDENT file (123 smith, 234 jones, 345 tomson, 456 stevens, 567 smith)]
Multilevel Index
If the primary index does not fit in memory, access becomes expensive.
If the index is too large, store it on disk and keep an index on the index.
Usually there are two levels of indices, with one first-level entry per disk block (why?).
What about insertions/deletions?
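A two-level lookup can be sketched as below. The key values, the fan-out of 4 entries per index block, and the names build_level and lookup are assumptions for illustration only.

```python
import bisect

def build_level(keys, fanout):
    """Group sorted keys into index blocks of at most `fanout` entries."""
    return [keys[i:i + fanout] for i in range(0, len(keys), fanout)]

# Hypothetical sorted search keys of the data file.
data_keys = list(range(100, 1000, 50))
FANOUT = 4  # assumed number of entries per index block

inner_blocks = build_level(data_keys, FANOUT)   # inner index, kept on disk
outer = [block[0] for block in inner_blocks]    # outer index: one entry per inner block

def lookup(key):
    """Return (inner block number, slot) for key, or None if it is absent."""
    b = bisect.bisect_right(outer, key) - 1     # search the small outer index first
    if b < 0:
        return None
    block = inner_blocks[b]                     # read exactly one inner index block
    s = bisect.bisect_left(block, key)
    if s < len(block) and block[s] == key:
        return (b, s)
    return None
```

Only the outer index has to stay in memory; each lookup then reads a single inner index block before going to the data file.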
Index Update: Deletion
If the deleted record was the only record in the file with its particular search-key value, the search key is also deleted from the index.
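The deletion rule can be illustrated with a small sketch. Representing the index as a Python dict from search key to a list of record ids, and the name delete_record, are assumptions for illustration.

```python
# Hypothetical index: search key -> list of record ids
address_index = {
    "forbes ave": [234, 456, 567],
    "main str": [123, 345],
}

def delete_record(index, key, record_id):
    """Remove one record's pointer; drop the key once no record is left."""
    pointers = index[key]
    pointers.remove(record_id)
    if not pointers:
        del index[key]  # last record with this search-key value is gone

delete_record(address_index, "main str", 123)  # "main str" still has record 345
delete_record(address_index, "main str", 345)  # last record: the key leaves the index
```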
[Figure: the same two-level index (top level 123 and 3,423; second level 123 and 456, read as >=123 and >=456) over the STUDENT file, with a new record "124; peterson; fifth ave." to be inserted]
[Figure: the same index (top level 123 and 3,423; second level 123 and 456) and STUDENT file, now with overflows attached to the data file]
Problems?
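One way to explore the question is a small sketch of a static index with overflow chains. The class IsamFile, the block size of 3, and the extra keys 125 and 126 are assumptions for illustration; only the key 124 comes from the slides.

```python
import bisect

class IsamFile:
    """Static index over fixed-size blocks; extra records go to overflow chains."""

    def __init__(self, sorted_keys, block_size):
        self.block_size = block_size
        self.blocks = [list(sorted_keys[i:i + block_size])
                       for i in range(0, len(sorted_keys), block_size)]
        self.index = [block[0] for block in self.blocks]  # built once, never updated
        self.overflow = [[] for _ in self.blocks]

    def _block_no(self, key):
        return max(bisect.bisect_right(self.index, key) - 1, 0)

    def insert(self, key):
        b = self._block_no(key)
        if len(self.blocks[b]) < self.block_size:
            bisect.insort(self.blocks[b], key)
        else:
            self.overflow[b].append(key)  # block is full: chain the record

    def search(self, key):
        """Return (found, number of overflow entries examined)."""
        b = self._block_no(key)
        if key in self.blocks[b]:
            return True, 0
        for n, k in enumerate(self.overflow[b], start=1):
            if k == key:
                return True, n
        return False, len(self.overflow[b])

student_file = IsamFile([123, 234, 345, 456, 567], block_size=3)
for ssn in (124, 125, 126):  # 124 as on the slide; 125 and 126 are made up
    student_file.insert(ssn)
```

In this sketch the index never changes, so every insertion into a full block lengthens its overflow chain, and searches in that block slow down accordingly.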
B-trees
E.g., B-tree of order 3:
[Figure: B-tree of order 3 with branch labels <6, >6, <9, >9; keys shown: 1, 7, 9, 13]
[Figure: a search in the same B-tree descends from the root and takes H steps; branch labels <6, >6, <9, >9; keys shown: 1, 3, 7, 9, 13]
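A search that counts its steps can be sketched as follows. The Node class and the exact tree layout (root keys 6 and 9 over the nodes [1, 3], [7], and [13]) are assumptions reconstructed from the branch labels in the figure.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys stored in this node
        self.children = children or []    # empty list for a leaf

def search(node, key, steps=1):
    """Return (found, number of nodes visited)."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True, steps                # in a B-tree a key may sit in any node
    if not node.children:
        return False, steps               # reached a leaf without finding the key
    return search(node.children[i], key, steps + 1)

# Assumed layout of the order-3 example tree.
root = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
```

For this tree search(root, 9) stops at the root after one step, while search(root, 7) needs two; no search visits more nodes than the tree has levels.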
[Figure: B+-tree version of the example, with branch labels <6, >=6, <9, >=9; keys shown: 1, 6, 9, 13]
Queries on B+-Trees
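A minimal sketch of two common queries, a point lookup and a range scan, is given below. The classes Leaf and Inner, the next links between leaves, and the example tree (root keys 6 and 9 over the leaves [1, 3], [6, 7], [9, 13]) are assumptions for illustration.

```python
import bisect

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # assumed link to the next leaf in key order

class Inner:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def find_leaf(node, key):
    """Descend to the leaf that would hold key (keys >= separator go right)."""
    while isinstance(node, Inner):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def contains(root, key):
    return key in find_leaf(root, key).keys

def range_query(root, lo, hi):
    """Collect all keys k with lo <= k <= hi by walking the leaf level."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

a, b, c = Leaf([1, 3]), Leaf([6, 7]), Leaf([9, 13])
a.next, b.next = b, c
root = Inner([6, 9], [a, b, c])
```

Every query ends in a leaf, because in a B+-tree the inner nodes hold only separator keys.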
B+ tree insertion
Find the leaf node in which the search-key value would appear.
If the search-key value is already there in the leaf node, the record is added to the file and, if necessary, a pointer is inserted into the bucket.
If the search-key value is not there, add the record to the main file and create a bucket if necessary. Then:
If there is room in the leaf node, insert the (key-value, pointer) pair in the leaf node.
Otherwise, split the node (along with the new (key-value, pointer) entry).
Splitting a node:
Take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.
Let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split. If the parent is full, split it and propagate the split further up.
The splitting of nodes proceeds upwards until a node that is not full is found. In the worst case the root node may be split, increasing the height of the tree by 1.
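The leaf split described above can be sketched as a single function. The name split_leaf and the use of strings as stand-in pointers are assumptions for illustration.

```python
import math

def split_leaf(pairs):
    """Split n sorted (key, pointer) pairs, new pair included, into two leaves.

    Returns (original, new, k), where k is the least key in the new node;
    (k, new) is what gets inserted into the parent.
    """
    n = len(pairs)
    cut = math.ceil(n / 2)        # the first ceil(n/2) pairs stay in the original node
    original, new = pairs[:cut], pairs[cut:]
    k = new[0][0]
    return original, new, k

# Example: a leaf holding 6 and 7 receives 8 (pointers are placeholder strings).
original, new, k = split_leaf([(6, "r6"), (7, "r7"), (8, "r8")])
```

With n = 3 this rule leaves two pairs in the original node and one in the new node, so k is 8. A convention that keeps fewer pairs in the original node would send a different key to the parent.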
/* ATTENTION:
a split at the LEAF level is handled by COPYING
the middle key upstairs;
a split at a higher level is handled by PUSHING
the middle key upstairs */
INSERTION OF KEY K
find the correct leaf node L;
insert search-key value K into L such that the keys are in
order;
if (L overflows) {
    split L;
    insert (i.e., COPY) smallest search-key value
    of new node to parent node P;
    if (P overflows) {
        repeat the B-tree split procedure recursively;
        /* Notice: the B-TREE split; NOT the B+-tree */
    }
}
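The pseudocode can be turned into a small runnable sketch. The Node class, the limit of 2 keys per node (an order-3 tree), and the starting tree are assumptions for illustration. The leaf split point follows the ⌈n/2⌉ rule from the insertion text, so the key that moves up may differ from a figure that splits the leaf elsewhere.

```python
import bisect
import math

MAX_KEYS = 2  # assumption: order-3 tree, at most 2 keys per node

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf

    @property
    def is_leaf(self):
        return self.children is None

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])  # root was split: height grows by 1

def _insert(node, key):
    """Insert into a subtree; return (separator, new right node) if node split."""
    if node.is_leaf:
        if key in node.keys:
            return None  # duplicates would go to a bucket; ignored in this sketch
        bisect.insort(node.keys, key)
        if len(node.keys) <= MAX_KEYS:
            return None
        cut = math.ceil(len(node.keys) / 2)
        right = Node(node.keys[cut:])
        node.keys = node.keys[:cut]
        return right.keys[0], right  # COPY: the separator also stays in the new leaf
    i = bisect.bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]
    new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new_right  # PUSH: the separator leaves this level entirely

def leaf_keys(node):
    """All keys at the leaf level, left to right."""
    if node.is_leaf:
        return list(node.keys)
    return [k for child in node.children for k in leaf_keys(child)]

# Assumed starting tree: root keys 6 and 9 over leaves [1, 3], [6, 7], [9, 13].
root = Node([6, 9], [Node([1, 3]), Node([6, 7]), Node([9, 13])])
root = insert(root, 8)
```

Inserting 8 overflows the leaf [6, 7] and then the old root, so the tree gains a level. The separator copied out of the leaf still appears at the leaf level, while the one pushed out of the old root appears only in the new root.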
DBMS: Rajeev Wankar
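E.g., insert 8
[Figure: B+-tree before the insertion, with root key 6 and branch labels <6, >=6, <9, >=9; surviving leaf keys: 3, 13]
E.g., insert 8: COPY middle upstairs
[Figure: the same tree while the leaf is split; root key 6; surviving leaf keys: 1, 13]
E.g., insert 8
[Figure: tree after the leaf split, with the middle key 7 shown twice; surviving keys: 6, 6, 7, 7, 13]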
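Non-leaf overflow: just PUSH the middle
E.g., insert 8: COPY middle upstairs
[Figure: the leaf split copies its middle key upstairs; root key 6; branch labels <6, >=6, <9, >=9; surviving leaf keys: 1, 13]
E.g., insert 8: FINAL TREE
[Figure: final tree with top-level branches <7 and >=7; below them, branch labels <6, >=6 and <9, >=9 with key 9 shown; surviving leaf keys: 1, 13]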