The document discusses indexing in data structures, highlighting the limitations of hash tables and the advantages of using index techniques like B trees and B+ trees for efficient data operations. It explains the properties and insertion/deletion processes of 2-3 trees and B+-trees, emphasizing their efficiency in handling large datasets stored on disks. Additionally, it touches on advanced tree structures such as tries and balanced trees, along with a homework assignment related to the topics covered.
Data Structure and Algorithm Analysis

Chapter 10: Indexing


Slides by: Tristan Wenzheng Xu
Lecturer: Yuhao Yi

College of Computer Science


Sichuan University
Application limitations of hashing
 Hashing provides excellent performance for insert,
search, and delete, i.e.,
 Θ(1) time complexity on average
 But hashing has some application limitations:
1. It does not support duplicate keys
2. It provides only exact-match search, not range search
• E.g., search for the students whose height is
between 1.7m and 1.75m
3. It does not support efficiently searching for the record
with the minimum or maximum key
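The range-search limitation can be illustrated with a minimal sketch. The student names and heights are made-up data, and the sorted list only stands in for an ordered index; it is not the index structure introduced later in these slides.

```python
import bisect

# Hypothetical data: student name -> height in metres.
heights = {"Ann": 1.68, "Bo": 1.72, "Cy": 1.74, "Di": 1.80}

# A hash table (Python dict) answers exact-match queries directly.
assert heights["Bo"] == 1.72

def range_scan(table, lo, hi):
    """Range query on a hash table: every entry has to be examined."""
    return sorted(name for name, h in table.items() if lo <= h <= hi)

# An ordered structure (a sorted list, as a stand-in for an index)
# locates both ends of the range by binary search.
ordered = sorted((h, name) for name, h in heights.items())

def range_sorted(pairs, lo, hi):
    """Range query on sorted (height, name) pairs using binary search."""
    left = bisect.bisect_left(pairs, (lo, ""))
    right = bisect.bisect_right(pairs, (hi, chr(0x10FFFF)))
    return sorted(name for _, name in pairs[left:right])

print(range_scan(heights, 1.70, 1.75))
print(range_sorted(ordered, 1.70, 1.75))
# Minimum and maximum keys sit at the two ends of the ordered list.
print(ordered[0], ordered[-1])
```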
What is an index?
 An index provides the following operations:
 Efficient insert (with duplicate keys): Θ(log n)
 Efficient exact-match search: Θ(log n)
 Efficient range search: the time depends on the size of the range,
but is usually much less than Θ(n)
 Efficient minimum / maximum search: Θ(log n)
 Efficient delete: Θ(log n)
 An index is designed for a large collection of records
stored on disk, where a disk access is much
slower than a memory access
Index techniques have many similarities
with the BST
 A binary search tree (BST) is a special binary tree in which,
 for each node with value K:
 The values of the nodes in its left subtree are < K
 The values of the nodes in its right subtree are ≥ K
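The BST property above can be sketched as follows. The class and function names and the sample keys are illustrative assumptions; duplicates go to the right subtree, matching the "≥ K" rule.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Insert key and return the (possibly new) subtree root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)    # keys < K go left
    else:
        root.right = bst_insert(root.right, key)  # keys >= K go right
    return root

def bst_contains(root, key):
    """Exact-match search: follow one path from the root."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def inorder(root):
    """In-order traversal visits the keys in ascending order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)

root = None
for k in [37, 24, 42, 7, 32, 42]:   # hypothetical keys, with one duplicate
    root = bst_insert(root, k)
print(inorder(root))
```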

Index techniques
 Two different techniques:
 B trees
 B+ trees
 Why not adopt a binary search tree (BST) as an index?
 A BST may not be balanced
• E.g., one subtree has many nodes while the
other has only a few, giving poor performance
 The depth of a balanced BST is still large
• A search needs about log₂ n comparisons, possibly log₂ n
disk accesses, and a disk access is
very time-consuming
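A rough calculation shows why the branching factor matters. The function name and the choice of n = 1,000,000 are assumptions for illustration; the sketch counts levels in a perfectly full tree, one node visit (and possibly one disk access) per level.

```python
def levels_needed(n, fanout):
    """Smallest h such that fanout**h >= n (integer arithmetic only)."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        h += 1
    return h

n = 1_000_000
print(levels_needed(n, 2))     # binary branching
print(levels_needed(n, 100))   # branching factor 100
```

With these assumptions a binary tree needs 20 levels, while a tree with branching factor 100 needs 3.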
B tree
 A B tree is a balanced tree with the following
properties:
 The root is either a leaf or has at least two
children.
 Each node, except for the root and the leaves,
has between ⌈m/2⌉ and m children.
• m is usually very large, e.g., m=100
 All leaves are at the same level in the tree, so
the tree is always height balanced
B tree example: 2-3 tree, i.e., m = 3
 Each internal node in a 2-3 tree has 2 or 3 children
 A node contains one or two keys
 All leaves are at the same level
 The 2-3 Tree has a property analogous to the BST:
 left subtree < 1st key;
 1st key ≤ mid subtree < 2nd key;
 right subtree ≥ 2nd key
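The ordering property above drives the search. Below is a minimal sketch; the class name and the sample tree are assumptions, with the tree chosen only to be consistent with the keys mentioned on the following slides, since the original figure is not part of this text.

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = keys                  # one or two keys, ascending
        self.children = children or []    # empty list for a leaf

def search23(node, key):
    """Return True if key is stored anywhere in the 2-3 tree."""
    while True:
        if key in node.keys:
            return True
        if not node.children:             # reached a leaf without a match
            return False
        # Child index = number of keys that are <= the search key:
        # left if key < 1st key, mid if 1st <= key < 2nd, right otherwise.
        i = sum(1 for k in node.keys if key >= k)
        node = node.children[i]

# Assumed example tree: every leaf is at depth 2.
tree = Node23([18, 33], [
    Node23([12], [Node23([10]), Node23([15])]),
    Node23([23, 30], [Node23([20, 21]), Node23([24]), Node23([31])]),
    Node23([48], [Node23([45, 47]), Node23([50, 52])]),
])

print(search23(tree, 21), search23(tree, 19))
```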
2-3 Tree
 The advantage of the 2-3 Tree over the BST is
that it can be updated at low cost, e.g., insert 14
2-3 Tree Insertion, insert 55
 Split the node that holds keys 50 and 52, and
 Promote the median of 50, 52, 55, i.e., 52, to its parent
2-3 Tree Insertion, insert 19

 Split the node that holds 20, 21 and promote the median, 20, to the node that holds 23, 30
 Then split the node that holds 23, 30 and promote 23 to the root, which holds 18, 33
2-3 Tree Insertion, insert 19 (continued)
 Split the root, which holds 18, 33, because 23 was promoted into it,
and
 promote 23 by creating a new root
 The tree height increases by 1
 But all leaves are still at the same level
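The split-and-promote steps on the insertion slides can be sketched in code. This is a minimal illustration, not a reference implementation: the names are assumptions, and the starting tree is an assumed reconstruction consistent with the keys named on the slides.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])

    def is_leaf(self):
        return not self.children

def _insert(node, key):
    """Insert below node; return None, or (median, new_right) if node split."""
    if node.is_leaf():
        node.keys.append(key)
        node.keys.sort()
    else:
        i = sum(1 for k in node.keys if key >= k)   # which child to descend into
        split = _insert(node.children[i], key)
        if split is None:
            return None
        median, right = split                       # child split: absorb its median
        node.keys.insert(i, median)
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2:
        return None
    # Overflow (three keys): keep the smallest, promote the median,
    # and move the largest into a new right sibling.
    median = node.keys[1]
    right = Node(node.keys[2:], node.children[2:])
    node.keys, node.children = node.keys[:1], node.children[:2]
    return median, right

def insert(root, key):
    """Insert key and return the root (a new one if the old root split)."""
    split = _insert(root, key)
    if split is None:
        return root
    median, right = split
    return Node([median], [root, right])            # tree grows by one level

def inorder(node):
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur (one value if balanced)."""
    if node.is_leaf():
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))

# Assumed starting tree, consistent with the keys on the slides.
root = Node([18, 33], [
    Node([12], [Node([10]), Node([15])]),
    Node([23, 30], [Node([20, 21]), Node([24]), Node([31])]),
    Node([48], [Node([45, 47]), Node([50, 52])]),
])
for key in (14, 55, 19):
    root = insert(root, key)
print(root.keys, leaf_depths(root))
```

Inserting 14 fits into an existing leaf, 55 splits one leaf and promotes 52, and 19 cascades up to the root, which leaves a new root holding 23 and all leaves one level deeper.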
Node deletion in B tree
 The process of node deletion is similar to that
in a BST
 But it is more complicated
B+-Trees
 The most commonly implemented form of the B-tree
is the B+-tree.
 Internal nodes of the B+-tree do not store records,
only key values to guide the search.
 Leaf nodes store records or pointers to records.
 A leaf node can store no more than m+1 records
 A B+-tree supports Θ(1) time to move to the
previous or next record of a given record

B+-Tree Example with order m=4
 Each internal node should have from
⌈m/2⌉ = 2 to m = 4 children
 A leaf has no more than m+1 = 5 records,
but at least ⌈(m+1)/2⌉ = 3 records
 Nodes at the same level are linked in
order
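The linked leaves are what make range search efficient: descend once to the leaf for the lower bound, then follow the links. Below is a minimal sketch assuming distinct keys; the class names and the sample tree (order m = 4) are assumptions chosen to be consistent with the keys on the deletion slides.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys      # 3 to 5 records for m = 4
        self.next = None      # link to the next leaf in key order

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # guide keys only, no records
        self.children = children

def find_leaf(node, key):
    """Descend from the root to the leaf that may contain key."""
    while isinstance(node, Internal):
        i = sum(1 for k in node.keys if key >= k)
        node = node.children[i]
    return node

def range_search(root, lo, hi):
    """Return all keys k with lo <= k <= hi, in ascending order."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next          # follow the leaf-level link
    return out

# Assumed example tree.
leaves = [Leaf([10, 12, 15]), Leaf([18, 19, 20, 21, 22]), Leaf([23, 30, 31]),
          Leaf([33, 45, 47]), Leaf([48, 50, 52])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([33], [Internal([18, 23], leaves[:3]),
                       Internal([48], leaves[3:])])

print(range_search(root, 20, 33))
```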

B+-Tree Insertion

 Insert 55
 Similar to insertion in a B tree
B+-Tree Deletion (1)-delete 18
 Just remove key 18 from its leaf node
B+-Tree Deletion (2)-delete 12
 Borrow one record, 18, from its sibling so that
the leaf again has at least 3 records
B+-Tree Deletion (3)-delete 33
 The node holding 33,45,47 cannot borrow from its
siblings, so it merges with its sibling node 48,50,52
 Node 48 now has one less child, so it borrows one child from
its sibling node holding 18,23, and the guide keys are modified
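The three deletion cases can be sketched at the leaf level only. This is a simplification under stated assumptions: leaves are plain sorted lists, any adjacent leaf counts as a sibling, and the parent-level work (updating guide keys, rebalancing internal nodes) is left out. The function names and the starting leaves are illustrative.

```python
MIN_RECORDS = 3   # ceil((m + 1) / 2) for order m = 4

def fresh_leaves():
    """Assumed leaf level of the example tree, left to right."""
    return [[10, 12, 15], [18, 19, 20, 21, 22], [23, 30, 31],
            [33, 45, 47], [48, 50, 52]]

def delete_record(leaves, idx, key):
    """Delete key from leaves[idx]; return which case applied."""
    leaf = leaves[idx]
    leaf.remove(key)
    if len(leaf) >= MIN_RECORDS:
        return "removed"                        # case 1: leaf is still full enough
    # Case 2: borrow from a neighbour that has a record to spare.
    if idx + 1 < len(leaves) and len(leaves[idx + 1]) > MIN_RECORDS:
        leaf.append(leaves[idx + 1].pop(0))     # smallest record of right neighbour
        return "borrowed"
    if idx > 0 and len(leaves[idx - 1]) > MIN_RECORDS:
        leaf.insert(0, leaves[idx - 1].pop())   # largest record of left neighbour
        return "borrowed"
    # Case 3: no neighbour can lend, so merge two leaves into one.
    if idx + 1 < len(leaves):
        leaf.extend(leaves.pop(idx + 1))
    else:
        leaves[idx - 1].extend(leaves.pop(idx))
    return "merged"

# Each example starts again from the same leaves.
for idx, key in ((1, 18), (0, 12), (3, 33)):
    leaves = fresh_leaves()
    print(key, delete_record(leaves, idx, key), leaves)
```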
B-Tree Space Analysis (1)
 Asymptotic cost of search, insertion, and deletion
of nodes from B-Trees is Θ(log n).
 Base of the log is the (average) branching
factor of the tree.
 Example: Consider a B+-Tree of order 100 with leaf nodes
containing up to 100 records.
 1 level B+-tree: Min: 0 records. Max: 100 records.
 2 levels: Min: 2 leaves of 50 (100 records). Max: 100
leaves of 100 (10,000 records).
 3 levels: Min: 2 x 50 leaves of 50 (5,000 records).
Max: 100^3 = 1,000,000 records.
 4 levels: Min: 250,000 records (2 x 50 x 50 x 50). Max:
100^4 = 100 million records.
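The figures in this example follow from the occupancy rules and can be reproduced with a short calculation. The function and parameter names are assumptions; the sketch takes a root with at least 2 children, other internal nodes with 50 to 100 children, and leaves with 50 to 100 records.

```python
def bplus_capacity(levels, order=100, leaf_cap=100):
    """Return (min_records, max_records) for a B+-tree with the given levels."""
    if levels == 1:
        return 0, leaf_cap                   # a single leaf acting as the root
    half_fanout = (order + 1) // 2           # ceil(order / 2)
    half_leaf = (leaf_cap + 1) // 2          # ceil(leaf_cap / 2)
    # Minimum: the root has 2 children; every other internal level is half full.
    min_leaves = 2 * half_fanout ** (levels - 2)
    # Maximum: every internal node has `order` children.
    max_leaves = order ** (levels - 1)
    return min_leaves * half_leaf, max_leaves * leaf_cap

for levels in range(1, 5):
    print(levels, bplus_capacity(levels))
```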
Advanced Tree Structures

 13.1 Tries
 Object space decomposition in a regular BST
 Key space decomposition in tries
 Balanced Trees
 AVL Tree: based on the BST, with rotations during
insertion and deletion (2 kinds of rotations)
 Splay Tree: rotation operations during insertion,
deletion, and searching (3 kinds of rotations)
Advanced Tree Structures II

 Spatial data structures
 more than one key
 in a plane
 k-d tree
 PR quadtree
Homework 6
 7.2, 7.3, 7.4
 7.11, 7.19

 Deadline: Dec. 24
