CS2606: Data Structures and Object-Oriented Development Chapter 10: Indexing
The document discusses indexing for efficiently storing and searching large files. It covers linear indexing and tree indexing, specifically 2-3 trees and B-trees. B-trees extend 2-3 trees: they keep similar-valued records together on a disk page, guarantee that every node is filled to at least a minimum percentage, and are always balanced. B-trees support efficient insertion, deletion, and range searches. The most common implementation is the B+-tree, where internal nodes store only keys and leaf nodes store records or pointers to records.
Course notes
CS2606: Data Structures and
Object-Oriented Development
Chapter 10: Indexing
Department of Computer Science
Virginia Tech
Spring 2008
(The following notes were derived from Cliff Shaffer's textbook and notes)

Indexing
Goals:
– Store large files
– Support multiple search keys
– Support efficient insert, delete, and range queries

Terms (1)
Entry sequenced file: Order records by time of insertion.
– Search with sequential search
Index file: Organized, stores pointers to actual records.
– Could be organized with a tree or other data structure.

Terms (2)
Primary Key: A unique identifier for records. May be inconvenient for search.
Secondary Key: An alternate search key, often not unique for each record. Often used as the search key.

Linear Indexing
Linear index: Index file organized as a simple sequence of key/record pointer pairs, with key values in sorted order.
Linear indexing is good for searching variable-length records.

Linear Indexing (2)
If the index is too large to fit in main memory, a second-level index might be used.

Tree Indexing (1)
Linear index is poor for insertion/deletion.
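The linear index described above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the class name `LinearIndex` and the use of integer byte offsets as record pointers are assumptions made for the example. It shows why lookups and range queries are cheap (binary search) while insertion is costly (every later entry is shifted).

```python
import bisect


class LinearIndex:
    """Sorted sequence of key/record-pointer pairs (hypothetical sketch).

    Pointers are assumed to be byte offsets into the record file.
    """

    def __init__(self):
        self._keys = []      # key values, kept in sorted order
        self._pointers = []  # _pointers[i] belongs to _keys[i]

    def insert(self, key, pointer):
        # Locating the slot is O(log n), but list.insert shifts every
        # later entry, so the whole operation is O(n).
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._pointers.insert(pos, pointer)

    def search(self, key):
        # Binary search over the sorted keys: O(log n).
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._pointers[pos]
        return None

    def range_search(self, low, high):
        # All entries with low <= key <= high, found with two binary searches.
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[start:stop], self._pointers[start:stop]))
```

For example, after inserting the pairs (42, 300), (7, 0), and (19, 120), `search(19)` returns 120 and `range_search(7, 20)` returns `[(7, 0), (19, 120)]`.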
Tree index can efficiently support all desired operations:
– Insert/delete
– Multiple search keys (multiple indices)
– Key range search

Tree Indexing (2)
Difficulties when storing tree index on disk:
– Tree must be balanced.
– Each path from root to leaf should cover few disk pages.

2-3 Tree (1)
A 2-3 Tree has the following properties:
1. A node contains one or two keys.
2. Every internal node has either two children (if it contains one key) or three children (if it contains two keys).
3. All leaves are at the same level in the tree, so the tree is always height balanced.
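A node layout consistent with these three properties might look like the sketch below. The names `Node23`, `search`, and `leaf_depths` are hypothetical, and the lookup assumes distinct keys ordered as in a BST: keys in the left subtree are smaller than the first key, and so on.

```python
class Node23:
    """A 2-3 tree node: one or two keys, and (if internal) keys + 1 children."""

    def __init__(self, keys, children=None):
        assert 1 <= len(keys) <= 2                    # property 1
        self.keys = list(keys)
        self.children = list(children or [])
        # Property 2: an internal node has exactly one more child than keys.
        assert not self.children or len(self.children) == len(self.keys) + 1


def search(node, key):
    """Return True if key is stored somewhere in the subtree rooted at node."""
    while True:
        if key in node.keys:
            return True
        if not node.children:          # reached a leaf without finding key
            return False
        # Child index = number of keys in this node smaller than the search key.
        branch = sum(1 for k in node.keys if key > k)
        node = node.children[branch]


def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur; property 3 requires exactly one."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))
```

With a root holding keys 18 and 33 and three leaf children holding [12], [23, 30], and [48], `search(root, 30)` succeeds, `search(root, 20)` fails, and `leaf_depths(root)` contains a single depth.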
The 2-3 Tree has a search tree property analogous to the BST.

2-3 Tree (2)
The advantage of the 2-3 Tree over the BST is that it can be kept height balanced under updates at low cost.

2-3 Tree Insertion (1)
2-3 Tree Insertion (2)
2-3 Tree Insertion (3)

B-Trees (1)
The B-Tree is an extension of the 2-3 Tree.
The B-Tree is now the standard file organization for applications requiring insertion, deletion, and key range searches.

B-Trees (2)
1. B-Trees are always balanced.
2. B-Trees keep similar-valued records together on a disk page, which takes advantage of locality of reference.
3. B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation.

B-Tree Definition
A B-Tree of order m has these properties:
– The root is either a leaf or has at least two children.
– Each node, except for the root and the leaves, has between ⌈m/2⌉ and m children.
– All leaves are at the same level in the tree, so the tree is always height balanced.
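The three properties can be checked mechanically. The sketch below is illustrative only: `BNode` and `check_btree` are hypothetical names, the lower bound on children is taken as ⌈m/2⌉, and the root is additionally assumed to have at most m children. Key ordering inside nodes is not verified here.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class BNode:
    keys: list
    children: list = field(default_factory=list)  # empty list means leaf


def check_btree(root, m):
    """Return True if the tree's shape satisfies the order-m B-Tree properties."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        if not node.children:                 # leaf: just record its depth
            leaf_depths.add(depth)
            return True
        # Root needs at least 2 children; other internal nodes ceil(m/2).
        low = 2 if is_root else ceil(m / 2)
        if not (low <= len(node.children) <= m):
            return False
        return all(visit(c, depth + 1, False) for c in node.children)

    # All leaves must sit at the same depth.
    return visit(root, 0, True) and len(leaf_depths) == 1
```

For instance, with m = 4 a root with two leaf children passes, while a tree whose leaves sit at different depths, or one with a non-root internal node that has a single child, is rejected.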
The size of a B-Tree node is usually selected to match the size of a disk block.
– A B-Tree node could have hundreds of children.

B-Tree Search (1)
Search in a B-Tree is a generalization of search in a 2-3 Tree.
1. Do binary search on keys in current node. If search key is found, then return record. If current node is a leaf node and key is not found, then report an unsuccessful search.
2. Otherwise, follow the proper branch and repeat the process.

B+-Trees
The most commonly implemented form of the B-Tree is the B+-Tree.
Internal nodes of the B+-Tree do not store records, only key values to guide the search.
Leaf nodes store records or pointers to records.
A leaf node may store more or fewer records than an internal node stores keys.

B+-Tree Example
B+-Tree Insertion
B+-Tree Deletion (1)
B+-Tree Deletion (2)
B+-Tree Deletion (3)

B-Tree Space Analysis (1)
B+-Tree nodes are always at least half full.
The B*-Tree splits two pages into three, and combines three pages into two. In this way, nodes are always at least 2/3 full.
Asymptotic cost of search, insertion, and deletion of records from B-Trees is Θ(log n).
– Base of the log is the (average) branching factor of the tree.

B-Tree Space Analysis (2)
Example: Consider a B+-Tree of order 100 with leaf nodes containing 100 records.
1 level B+-tree:
2 level B+-tree:
3 level B+-tree:
4 level B+-tree:
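The per-level figures for this example did not survive in these notes. As a rough sketch, they can be computed under the following assumptions, none of which are stated in the notes themselves: a one-level tree is a single root leaf holding anywhere from 0 to 100 records, the root of a taller tree has at least 2 children, and every other node (including leaves) is at least half full. The function name `record_bounds` is hypothetical.

```python
from math import ceil


def record_bounds(levels, order=100, leaf_capacity=100):
    """(min, max) number of records in a B+-tree with the given number of levels."""
    if levels == 1:
        # Assumed: a lone root leaf may be empty or completely full.
        return (0, leaf_capacity)
    min_fanout = ceil(order / 2)        # fewest children of a non-root internal node
    min_leaf = ceil(leaf_capacity / 2)  # fewest records in a non-root leaf
    # Fewest leaves: root has 2 children, each lower internal level has min_fanout.
    min_leaves = 2 * min_fanout ** (levels - 2)
    # Most leaves: every internal node has `order` children.
    max_leaves = order ** (levels - 1)
    return (min_leaves * min_leaf, max_leaves * leaf_capacity)


for levels in range(1, 5):
    low, high = record_bounds(levels)
    print(f"{levels} level B+-tree: min {low:,} records, max {high:,} records")
# 1 level B+-tree: min 0 records, max 100 records
# 2 level B+-tree: min 100 records, max 10,000 records
# 3 level B+-tree: min 5,000 records, max 1,000,000 records
# 4 level B+-tree: min 250,000 records, max 100,000,000 records
```

Under these assumptions a four-level tree already indexes between a quarter of a million and a hundred million records, which is why the base of the logarithm matters so much in practice.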
Ways to reduce the number of disk fetches:
– Keep the upper levels in memory.
– Manage B+-Tree pages with a buffer pool.
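A buffer pool for tree pages could be sketched as follows. The notes do not name a replacement policy; least-recently-used (LRU) is assumed here, and `BufferPool`, `fetch`, and the `read_page` callback are hypothetical names. Because the root is touched on every search, an LRU pool tends to keep the upper levels in memory automatically.

```python
from collections import OrderedDict


class BufferPool:
    """Fixed-size page cache with LRU replacement (assumed policy)."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # callback: page_id -> page contents
        self.pages = OrderedDict()   # oldest entry = least recently used
        self.disk_reads = 0

    def fetch(self, page_id):
        if page_id in self.pages:
            # Hit: mark the page as most recently used, no disk access.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Miss: read from disk and evict the LRU page if the pool is full.
        self.disk_reads += 1
        page = self.read_page(page_id)
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return page


# Three searches in a two-level tree, each visiting the root and then one leaf.
pool = BufferPool(capacity=2, read_page=lambda page_id: f"contents of {page_id}")
for path in (["root", "leafA"], ["root", "leafB"], ["root", "leafA"]):
    for page_id in path:
        pool.fetch(page_id)
print(pool.disk_reads)  # 4 disk reads for 6 page accesses
```

In this small run the root is read from disk only once, even though the pool holds just two pages.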