CS2606: Data Structures and Object-Oriented Development Chapter 10: Indexing

The document discusses indexing for efficiently storing and searching large files. It covers linear indexing and tree indexing, specifically 2-3 trees and B-trees. B-trees extend 2-3 trees: they are always balanced, keep similar values together on a disk page, and guarantee that every node is filled to at least a minimum percentage. B-trees support efficient insertion, deletion, and range searches. The most common implementation is the B+-tree, in which internal nodes store only keys and leaf nodes store records.


Course notes

CS2606: Data Structures and Object-Oriented Development

Chapter 10: Indexing


Department of Computer Science
Virginia Tech
Spring 2008
(The following notes were derived from Cliff Shaffer’s textbook and notes)
Indexing
Goals:
– Store large files
– Support multiple search keys
– Support efficient insert, delete, and range queries
Terms (1)
Entry-sequenced file: Records are ordered by time of insertion.
– Searching requires a sequential search.
Index file: An organized file that stores pointers to the actual records.
– Could be organized with a tree or other data structure.
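A minimal sketch of the two terms, assuming a Python list stands in for the disk file and list positions stand in for record pointers. `EntrySequencedFile` and `build_index` are hypothetical names, and the dict used as the index is just one possible organization; the slides leave that choice open.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str]  # (key, payload): a hypothetical record layout


class EntrySequencedFile:
    """Records are kept in arrival order, like appending to a disk file."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def append(self, record: Record) -> int:
        """Add a record at the end and return its position (its 'pointer')."""
        self.records.append(record)
        return len(self.records) - 1

    def sequential_search(self, key: int) -> Optional[Record]:
        """With no index, every record may have to be examined."""
        for record in self.records:
            if record[0] == key:
                return record
        return None


def build_index(data: EntrySequencedFile) -> Dict[int, int]:
    """A separate index maps each key to the position of its record."""
    return {key: pos for pos, (key, _) in enumerate(data.records)}


data = EntrySequencedFile()
for rec in [(42, "Ada"), (7, "Grace"), (19, "Edsger")]:
    data.append(rec)

index = build_index(data)
print(data.sequential_search(7))  # (7, 'Grace')
print(data.records[index[19]])    # (19, 'Edsger')
```

The data file itself never has to be reordered; only the index is organized for searching.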
Terms (2)
Primary Key: A unique identifier for each record. May be inconvenient for search.
Secondary Key: An alternate search key, often not unique for each record. Often used as the search key.
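One common arrangement, assumed here rather than stated on the slide, is a secondary index that maps each secondary key to the primary keys of the matching records; the records are then fetched by primary key. The employee data and the names `dept_index` and `find_by_dept` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical records, keyed by a unique primary key (an employee ID).
records: Dict[int, Dict[str, str]] = {
    1001: {"name": "Ada", "dept": "Math"},
    1002: {"name": "Grace", "dept": "Navy"},
    1003: {"name": "Alan", "dept": "Math"},
}

# Secondary index: department -> primary keys. One dept matches many records.
dept_index: Dict[str, List[int]] = defaultdict(list)
for primary_key, rec in records.items():
    dept_index[rec["dept"]].append(primary_key)


def find_by_dept(dept: str) -> List[str]:
    """Search on the secondary key, then fetch each record by primary key."""
    return [records[pk]["name"] for pk in dept_index.get(dept, [])]


print(find_by_dept("Math"))  # ['Ada', 'Alan']
```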
Linear Indexing
Linear index: An index file organized as a simple sequence of key/record-pointer pairs, with the key values in sorted order.
Linear indexing is good for searching variable-length records, because each index entry has a fixed length even when the records do not.
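A sketch of a linear index, assuming the record pointers are byte offsets into a file of variable-length records (the offsets below are made up). Because the pairs are sorted, an exact-match lookup is a binary search, and all keys in a range sit in one contiguous slice. `LinearIndex` is a hypothetical name.

```python
import bisect
from typing import List, Optional, Tuple


class LinearIndex:
    """A sorted sequence of (key, record_pointer) pairs."""

    def __init__(self, pairs: List[Tuple[int, int]]) -> None:
        self.pairs = sorted(pairs)
        self.keys = [key for key, _ in self.pairs]  # parallel list for bisect

    def lookup(self, key: int) -> Optional[int]:
        """Binary search for an exact key and return its record pointer."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.pairs[i][1]
        return None

    def range_query(self, lo: int, hi: int) -> List[int]:
        """Pointers for all keys in [lo, hi]; sorting makes them contiguous."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return [ptr for _, ptr in self.pairs[i:j]]


# Pointers are byte offsets of variable-length records in a data file.
idx = LinearIndex([(37, 0), (5, 120), (63, 185), (21, 300)])
print(idx.lookup(63))           # 185
print(idx.range_query(10, 40))  # [300, 0]
```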
Linear Indexing (2)
If the index is too large to fit in main memory, a second-level index might be used.
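A minimal two-level sketch, assuming the first-level index is cut into fixed-size blocks that stand in for disk pages, and the second level records only the first key of each block so it is small enough to keep in memory. A lookup searches the second level, then reads and searches one block. `build_two_level`, `two_level_lookup`, and `block_size` are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Pair = Tuple[int, int]  # (key, record pointer)


def build_two_level(pairs: List[Pair],
                    block_size: int) -> Tuple[List[int], List[List[Pair]]]:
    """Split the sorted first-level index into blocks and build a second-level
    index holding the first key of each block."""
    pairs = sorted(pairs)
    blocks = [pairs[i:i + block_size] for i in range(0, len(pairs), block_size)]
    second_level = [block[0][0] for block in blocks]  # kept in main memory
    return second_level, blocks


def two_level_lookup(second_level: List[int], blocks: List[List[Pair]],
                     key: int) -> Optional[int]:
    # Find the last block whose first key is <= key.
    b = bisect.bisect_right(second_level, key) - 1
    if b < 0:
        return None  # key is smaller than every indexed key
    block = blocks[b]  # on disk, this would be a single page read
    keys = [k for k, _ in block]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return block[i][1]
    return None


pairs = [(k, k * 10) for k in range(0, 100, 3)]  # keys 0, 3, ..., 99
second_level, blocks = build_two_level(pairs, block_size=4)
print(second_level)  # [0, 12, 24, 36, 48, 60, 72, 84, 96]
print(two_level_lookup(second_level, blocks, 27))  # 270
```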
Tree Indexing (1)
A linear index is poor for insertion/deletion, because entries must be shifted to keep the index in sorted order.
A tree index can efficiently support all desired operations:
– Insert/delete
– Multiple search keys (multiple indices)
– Key range search
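To make the insertion cost of a linear index concrete, here is a small sketch assuming the index is a sorted Python list of keys (pointers omitted). Every entry after the insertion point must move one slot; for a key landing at a random position that is about half the index, and on disk it means rewriting every page after that point. `insert_sorted` is a hypothetical helper.

```python
import bisect
from typing import List


def insert_sorted(keys: List[int], key: int) -> int:
    """Insert key into a sorted list and return how many entries had to move."""
    pos = bisect.bisect_left(keys, key)
    moved = len(keys) - pos  # every entry after pos shifts one slot right
    keys.insert(pos, key)
    return moved


index = list(range(10, 1010, 10))  # 100 sorted keys: 10, 20, ..., 1000
print(insert_sorted(index, 5))     # 100: every entry shifts
print(insert_sorted(index, 2000))  # 0: appending at the end is cheap
```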
Tree Indexing (2)
Difficulties when storing a tree index on disk:
– The tree must be balanced.
– Each path from root to leaf should cover few disk pages.
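2-3 Tree (1)
A 2-3 Tree has the following properties:
1. A node contains one or two keys.
2. Every internal node has either two children (if it contains one key) or three children (if it contains two keys).
3. All leaves are at the same level in the tree, so the tree is always height balanced.
The 2-3 Tree has a search tree property analogous to the BST: for a node with keys L and R, values in the left subtree are less than L, values in the center subtree are greater than or equal to L and less than R, and values in the right subtree are greater than or equal to R.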
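The search property translates directly into a descent that chooses one of up to three children per node. This is a minimal sketch; the `Node23` class, the `search23` and `leaf` helpers, and the example tree are hypothetical, chosen only so that all leaves sit at the same level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node23:
    keys: List[int]  # one or two keys, in sorted order
    children: List["Node23"] = field(default_factory=list)  # empty for a leaf


def search23(node: Optional[Node23], key: int) -> bool:
    """BST-like descent that picks the left, center, or right subtree."""
    while node is not None:
        if key in node.keys:
            return True
        if not node.children:
            return False  # reached a leaf without finding the key
        if key < node.keys[0]:
            node = node.children[0]  # first child: less than L
        elif len(node.keys) == 1 or key < node.keys[1]:
            node = node.children[1]  # second child: >= L (and < R if R exists)
        else:
            node = node.children[2]  # third child: >= R
    return False


def leaf(*keys: int) -> Node23:
    return Node23(list(keys))


root = Node23([18, 33], [
    Node23([12], [leaf(10), leaf(15)]),
    Node23([23, 30], [leaf(20, 21), leaf(24), leaf(31)]),
    Node23([48], [leaf(45, 47), leaf(50, 52)]),
])
print(search23(root, 24))  # True
print(search23(root, 25))  # False
```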
2-3 Tree (2)
The advantage of the 2-3 Tree over the BST is that it can be updated at low cost while remaining balanced.
2-3 Tree Insertion (1)
2-3 Tree Insertion (2)
2-3 Tree Insertion (3)
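A minimal sketch of 2-3 Tree insertion under these assumptions: a new key always goes into a leaf; a node that ends up with three keys splits into two one-key nodes and promotes its middle key to its parent; and if the root splits, a new root is created. Growth therefore happens only at the root, so all leaves stay at the same level. The names `Node`, `insert`, `inorder`, and `leaf_depths` are hypothetical.

```python
from typing import List, Optional, Set, Tuple


class Node:
    def __init__(self, keys: List[int], children: Optional[List["Node"]] = None):
        self.keys = keys                # sorted; 1 or 2 keys (3 only during a split)
        self.children = children or []  # empty list for a leaf


def _insert(node: Node, key: int) -> Optional[Tuple[int, Node]]:
    """Insert key below node. If node overflows, split it and return
    (promoted_key, new_right_sibling) for the parent to absorb."""
    if not node.children:
        node.keys = sorted(node.keys + [key])
    else:
        # Child i holds keys >= keys[i-1] and < keys[i], as in search.
        i = sum(1 for k in node.keys if key >= k)
        split = _insert(node.children[i], key)
        if split is None:
            return None
        promoted, right = split
        node.keys.insert(i, promoted)
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2:
        return None
    # Overflow (three keys): keep the smallest, promote the middle one,
    # and move the largest, with the two rightmost children, to a new node.
    right = Node(node.keys[2:], node.children[2:])
    promoted = node.keys[1]
    node.keys, node.children = node.keys[:1], node.children[:2]
    return promoted, right


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node([key])
    split = _insert(root, key)
    if split is None:
        return root
    promoted, right = split
    return Node([promoted], [root, right])  # the tree grows at the root


def inorder(node: Node) -> List[int]:
    if not node.children:
        return list(node.keys)
    out: List[int] = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


def leaf_depths(node: Node, depth: int = 0) -> Set[int]:
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


root = None
for k in [10, 20, 30, 40, 50, 60, 70]:
    root = insert(root, k)
print(root.keys)          # [40]
print(inorder(root))      # [10, 20, 30, 40, 50, 60, 70]
print(leaf_depths(root))  # {2}
```

Inserting keys in increasing order, the worst case for a plain BST, still yields a tree of height 2 here, with every leaf at depth 2.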
B-Trees (1)
The B-Tree is an extension of the 2-3 Tree.
The B-Tree is now the standard file organization for applications requiring insertion, deletion, and key range searches.
B-Trees (2)
1. B-Trees are always balanced.
2. B-Trees keep similar-valued records together on a disk page, which takes advantage of locality of reference.
3. B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation.
B-Tree Definition
A B-Tree of order m has these properties:
– The root is either a leaf or has at least two children.
– Each node, except for the root and the leaves, has between ⌈m/2⌉ and m children.
– All leaves are at the same level in the tree, so the tree is always height balanced.
The size of a B-Tree node is usually chosen to match the size of a disk block.
– A B-Tree node could have hundreds of children.
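The three structural rules can be checked mechanically. This sketch assumes a node holds a sorted key list plus a child list (empty for a leaf), and that an internal node with n children holds n − 1 keys; leaf capacity is not checked because the definition above does not fix it. `BNode`, `is_valid_btree`, and the example tree are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BNode:
    keys: List[int]
    children: List["BNode"] = field(default_factory=list)  # empty for a leaf


def is_valid_btree(root: BNode, m: int) -> bool:
    """Check the structural rules of a B-Tree of order m."""
    leaf_levels: Set[int] = set()

    def visit(node: BNode, depth: int, is_root: bool) -> bool:
        if not node.children:
            leaf_levels.add(depth)
            return True
        n = len(node.children)
        if len(node.keys) != n - 1:
            return False  # an internal node with n children holds n - 1 keys
        if n > m:
            return False
        if n < (2 if is_root else math.ceil(m / 2)):
            return False  # root needs 2 children; others need ceil(m/2)
        return all(visit(c, depth + 1, False) for c in node.children)

    # All leaves must be at the same level.
    return visit(root, 0, True) and len(leaf_levels) == 1


def leaf(*keys: int) -> BNode:
    return BNode(list(keys))


tree = BNode([50, 100], [
    BNode([20], [leaf(10), leaf(30)]),
    BNode([70, 80], [leaf(60), leaf(75), leaf(90)]),
    BNode([120, 130], [leaf(110), leaf(125), leaf(140)]),
])
print(is_valid_btree(tree, 4))  # True
print(is_valid_btree(tree, 5))  # False: BNode([20]) has 2 < ceil(5/2) = 3 children
```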
B-Tree Search (1)
Search in a B-Tree is a generalization of search in a 2-3 Tree.
1. Do a binary search on the keys in the current node. If the search key is found, return the record. If the current node is a leaf node and the key is not found, report an unsuccessful search.
2. Otherwise, follow the proper branch and repeat the process.
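The two steps above map onto a short loop. This sketch assumes each node stores a record alongside each key (as in a plain B-Tree, unlike the B+-Tree) and uses Python's `bisect` for the in-node binary search. `BNode`, `btree_search`, and the example records are hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BNode:
    keys: List[int]    # sorted keys in this node
    values: List[str]  # record stored with each key
    children: List["BNode"] = field(default_factory=list)  # len(keys) + 1 if internal


def btree_search(node: BNode, key: int) -> Optional[str]:
    while True:
        # Step 1: binary search on the keys in the current node.
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]  # found: return the record
        if not node.children:
            return None  # leaf and not found: unsuccessful search
        # Step 2: follow the branch between keys[i-1] and keys[i], and repeat.
        node = node.children[i]


root = BNode([30], ["r30"], [
    BNode([10, 20], ["r10", "r20"]),
    BNode([40, 50, 60], ["r40", "r50", "r60"]),
])
print(btree_search(root, 50))  # r50
print(btree_search(root, 35))  # None
```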
B+-Trees
The most commonly implemented form of the B-Tree is the B+-Tree.
Internal nodes of the B+-Tree do not store records; they store only key values to guide the search.
Leaf nodes store records or pointers to records.
A leaf node may store more or fewer records than an internal node stores keys.
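A sketch of search in a B+-Tree, assuming (beyond what this slide states) that each leaf links to its right sibling, so a key range search descends once and then walks along the leaves. It also assumes a key equal to a guide key is found in the right subtree. `Leaf`, `Internal`, `find_leaf`, `bplus_search`, and `bplus_range` are hypothetical names.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    records: List[str]
    next: Optional["Leaf"] = None  # assumed link to the right sibling leaf


@dataclass
class Internal:
    keys: List[int]  # guide keys only, no records
    children: List[Union["Internal", Leaf]] = field(default_factory=list)


def find_leaf(node: Union[Internal, Leaf], key: int) -> Leaf:
    # Child i covers keys >= keys[i-1] and < keys[i].
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node


def bplus_search(root: Union[Internal, Leaf], key: int) -> Optional[str]:
    leaf = find_leaf(root, key)  # every search ends at a leaf
    i = bisect.bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.records[i]
    return None


def bplus_range(root: Union[Internal, Leaf], lo: int, hi: int) -> List[str]:
    """Descend once to the leaf for lo, then follow the leaf links."""
    out: List[str] = []
    leaf: Optional[Leaf] = find_leaf(root, lo)
    while leaf is not None:
        for k, rec in zip(leaf.keys, leaf.records):
            if k > hi:
                return out
            if k >= lo:
                out.append(rec)
        leaf = leaf.next
    return out


leaves = [Leaf([5, 10], ["r5", "r10"]),
          Leaf([15, 20, 25], ["r15", "r20", "r25"]),
          Leaf([30, 40], ["r30", "r40"])]
leaves[0].next, leaves[1].next = leaves[1], leaves[2]
root = Internal([15, 30], leaves)
print(bplus_search(root, 20))    # r20
print(bplus_range(root, 8, 30))  # ['r10', 'r15', 'r20', 'r25', 'r30']
```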
B+-Tree Example
B+-Tree Insertion
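A minimal B+-Tree insertion sketch under stated assumptions: the capacities `MAX_LEAF` and `MAX_CHILDREN` are hypothetical and deliberately tiny so that splits happen quickly, and leaves are linked left to right. When a leaf overflows it splits in half and a *copy* of the right half's first key goes up as the new separator, because the record must stay in a leaf. When an internal node overflows, its middle guide key *moves* up. A root split adds a new level.

```python
import bisect
from typing import List, Optional, Tuple, Union

MAX_LEAF = 3      # hypothetical leaf capacity (records per leaf)
MAX_CHILDREN = 3  # hypothetical order of the internal nodes


class Leaf:
    def __init__(self, keys: Optional[List[int]] = None,
                 records: Optional[List[str]] = None) -> None:
        self.keys = keys or []
        self.records = records or []
        self.next: Optional["Leaf"] = None  # right-sibling link for range scans


class Internal:
    def __init__(self, keys: List[int], children: list) -> None:
        self.keys = keys          # guide keys only
        self.children = children  # len(keys) + 1 subtrees


Node = Union[Leaf, Internal]


def _insert(node: Node, key: int, rec: str) -> Optional[Tuple[int, Node]]:
    """Insert below node; on overflow return (separator, new right node)."""
    if isinstance(node, Leaf):
        i = bisect.bisect_left(node.keys, key)
        node.keys.insert(i, key)
        node.records.insert(i, rec)
        if len(node.keys) <= MAX_LEAF:
            return None
        mid = len(node.keys) // 2
        right = Leaf(node.keys[mid:], node.records[mid:])
        node.keys, node.records = node.keys[:mid], node.records[:mid]
        right.next, node.next = node.next, right  # splice into the leaf chain
        return right.keys[0], right  # leaf split: separator is COPIED up
    i = bisect.bisect_right(node.keys, key)
    split = _insert(node.children[i], key, rec)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.children) <= MAX_CHILDREN:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]  # internal split: middle key MOVES up
    right = Internal(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right


def insert(root: Node, key: int, rec: str) -> Node:
    split = _insert(root, key, rec)
    if split is None:
        return root
    sep, right = split
    return Internal([sep], [root, right])  # root split: tree gets one level taller


def leftmost_leaf(node: Node) -> Leaf:
    while isinstance(node, Internal):
        node = node.children[0]
    return node


def all_keys(root: Node) -> List[int]:
    """Walk the leaf chain from left to right."""
    out: List[int] = []
    leaf: Optional[Leaf] = leftmost_leaf(root)
    while leaf is not None:
        out.extend(leaf.keys)
        leaf = leaf.next
    return out


root: Node = Leaf()
for k in range(1, 11):
    root = insert(root, k, f"r{k}")
print(root.keys)       # [5]
print(all_keys(root))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```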
B+-Tree Deletion (1)
B+-Tree Deletion (2)
B+-Tree Deletion (3)
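Deletion in a B+-Tree removes the record from its leaf; if the leaf then falls below the minimum occupancy, it either borrows from a sibling or merges with it. This simplified sketch works on a single leaf and its right sibling only, represented as sorted key lists, with a hypothetical minimum of 2 keys (half of a capacity of 4). It reports which repair happened; the matching update of the parent's separator key is noted in comments but not performed.

```python
from typing import List, Tuple

MIN_LEAF = 2  # hypothetical minimum occupancy (half of a capacity of 4)


def delete_from_leaf(leaf: List[int], right_sibling: List[int],
                     key: int) -> Tuple[str, List[int], List[int]]:
    """Delete key from leaf, then repair underflow using the right sibling.

    Returns (action, leaf, right_sibling). Keys are assumed unique.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= MIN_LEAF:
        return "ok", leaf, right_sibling
    if len(right_sibling) > MIN_LEAF:
        # Borrow: the sibling can spare its smallest key.
        # The parent's separator would then become right_sibling[1].
        return "borrow", leaf + right_sibling[:1], right_sibling[1:]
    # Merge: both leaves are at the minimum, so they fit in one node.
    # The parent would lose one child pointer and one separator key.
    return "merge", leaf + right_sibling, []


print(delete_from_leaf([10, 20, 30], [40, 50], 20))  # ('ok', [10, 30], [40, 50])
print(delete_from_leaf([10, 20], [40, 50, 60], 10))  # ('borrow', [20, 40], [50, 60])
print(delete_from_leaf([10, 20], [40, 50], 20))      # ('merge', [10, 40, 50], [])
```

A merge can leave the parent itself underfull, in which case the same borrow-or-merge repair repeats one level up.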
B-Tree Space Analysis (1)
B+-Tree nodes (other than the root) are always at least half full.
The B*-Tree splits two pages into three, and combines three pages into two. In this way, nodes are always at least 2/3 full.
The asymptotic cost of search, insertion, and deletion of nodes from B-Trees is Θ(log n).
– The base of the log is the (average) branching factor of the tree.
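To see how the base of the log matters, this sketch counts the levels needed to reach a given number of records for several average branching factors. It is a simplification that treats every level, leaves included, as having the same fan-out. The fan-outs 50, 67, and 100 are illustrative readings of "half full", "2/3 full", and "full" for order 100, and `levels_needed` is a hypothetical helper that uses integer arithmetic to avoid floating-point log rounding.

```python
def levels_needed(n_records: int, fanout: int) -> int:
    """Smallest height h with fanout**h >= n_records."""
    h, capacity = 1, fanout
    while capacity < n_records:
        capacity *= fanout
        h += 1
    return h


n = 1_000_000
for label, fanout in [("half full", 50), ("2/3 full (B*-Tree)", 67), ("full", 100)]:
    print(f"{label:>20}: {levels_needed(n, fanout)} levels")
```

Even at half occupancy the height stays small, which is why fuller nodes matter mostly for space and only modestly for the number of disk fetches.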
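B-Tree Space Analysis (2)
Example: Consider a B+-Tree of order 100 with leaf nodes containing up to 100 records.
1 level B+-tree: at most 100 records
2 level B+-tree: at least 100 records (2 leaves of 50), at most 10,000 (100 leaves of 100)
3 level B+-tree: at least 5,000 records, at most 1,000,000
4 level B+-tree: at least 250,000 records, at most 100,000,000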
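The bounds in this example follow from two rules: the root of a multi-level tree has at least 2 children, and every other node is at least half full. This sketch computes them; `record_bounds` is a hypothetical helper, and the defaults encode the example's order 100 and leaf capacity 100.

```python
def record_bounds(levels: int, order: int = 100, leaf_capacity: int = 100):
    """(min, max) records in a B+-Tree of the given height, assuming every
    non-root node is at least half full and the root has >= 2 children."""
    if levels == 1:
        return 0, leaf_capacity  # a single leaf (the root) may even be empty
    half_children = (order + 1) // 2      # ceil(order / 2)
    half_leaf = (leaf_capacity + 1) // 2  # ceil(leaf_capacity / 2)
    min_records = 2 * half_children ** (levels - 2) * half_leaf
    max_records = order ** (levels - 1) * leaf_capacity
    return min_records, max_records


for h in range(1, 5):
    lo, hi = record_bounds(h)
    print(f"{h} level B+-tree: {lo:,} to {hi:,} records")
```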

Ways to reduce the number of disk fetches:
– Keep the upper levels in memory.
– Manage B+-Tree pages with a buffer pool.
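A minimal buffer-pool sketch, assuming a least-recently-used (LRU) replacement policy; the slides do not name a policy. Because every search passes through the root and upper internal pages, those pages are used constantly and tend to stay in the pool, which gives the "keep the upper levels in memory" effect without special handling. `BufferPool`, `read_page`, and the page-numbering scheme in the usage are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class BufferPool:
    """LRU cache of disk pages that counts how many reads reach the disk."""

    def __init__(self, capacity: int, read_page: Callable[[int], bytes]) -> None:
        self.capacity = capacity
        self.read_page = read_page  # stand-in for a real disk read
        self.pages: "OrderedDict[int, bytes]" = OrderedDict()
        self.disk_reads = 0

    def get(self, page_id: int) -> bytes:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.disk_reads += 1
        data = self.read_page(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


# Five searches in a 3-level tree: root page 0, an internal page, a leaf page.
pool = BufferPool(capacity=10, read_page=lambda pid: b"")
for leaf in [5, 17, 5, 250, 17]:
    for page in (0, 1 + leaf // 100, 1000 + leaf):
        pool.get(page)
print(pool.disk_reads)  # 6 disk reads instead of 15
```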
