08 - Search Trees
Programming
SEARCH TREES
Goal
This lecture presents an ADT suited to searching, the search tree,
together with its operations and visiting techniques.
Prerequisites
Lectures:
◦ Introduction to ADT
◦ Trees
Outline
Binary search trees
Other types of Trees:
◦ HBT
◦ B tree
◦ Variants of B tree
Binary search trees
Binary search trees (BSTs) are binary trees in which each node stores an element x of the set.
In particular, for each node:
◦ all items stored in its left subtree have a key < x.key
◦ all items stored in its right subtree have a key > x.key
[Figure: two example BSTs over the keys 5, 7, 10, 12, 14, 15, 18, one rooted at 10 and one rooted at 15.]
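The ordering property can be checked mechanically. The sketch below is only an illustration: the `struct node` layout with an integer key and the names `is_bst` and `is_bst_in_range` are assumptions, since the lecture does not fix a representation. Each key must lie strictly between the bounds inherited from its ancestors.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical node layout; the lecture does not define one. */
struct node {
    int key;
    struct node *left, *right;
};

/* Returns 1 if every key in the subtree rooted at t lies strictly
   between lo and hi, checked recursively; 0 otherwise. */
static int is_bst_in_range(const struct node *t, long long lo, long long hi) {
    if (t == NULL)
        return 1;                      /* an empty tree is a BST */
    if (t->key <= lo || t->key >= hi)
        return 0;                      /* key violates an ancestor's bound */
    return is_bst_in_range(t->left, lo, t->key) &&
           is_bst_in_range(t->right, t->key, hi);
}

/* Entry point: the bounds start just outside the range of int. */
int is_bst(const struct node *root) {
    return is_bst_in_range(root, (long long)INT_MIN - 1,
                           (long long)INT_MAX + 1);
}
```

Checking only each node against its two children would not be enough: a key deep in the left subtree could still exceed the root's key, which is why the bounds are passed down.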
BST as ADT
The most significant operations that can be defined on BSTs are the same as those already introduced for
dictionaries: search, insertion, and deletion.
Search
The ordering property of a BST simplifies search operations: to check whether x belongs to the
set, x.key is compared with the key of the element r stored in the root.
You can proceed as follows:
◦ if x.key < r.key: x can only be found in the left subtree of the root
◦ if x.key = r.key: x coincides with the root
◦ if x.key > r.key: x can only be found in the right subtree of the root
◦ the test is then repeated iteratively on the chosen subtree
Searching for a given element requires at most as many comparisons as the height of the tree.
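The iterative procedure described above can be sketched as a loop that also counts the nodes it examines, which makes the bound on the number of comparisons visible. The node layout, the function name `search_iter`, and the `visited` output parameter are assumptions made for this illustration.

```c
#include <stddef.h>

/* Hypothetical node layout; the lecture does not define one. */
struct node { int key; struct node *left, *right; };

/* Iterative search. Returns the node holding key, or NULL if absent.
   If visited is not NULL, *visited receives the number of nodes examined. */
struct node *search_iter(struct node *r, int key, int *visited) {
    int count = 0;
    while (r != NULL) {
        count++;
        if (key == r->key)
            break;                                /* found */
        r = (key < r->key) ? r->left : r->right;  /* descend one level */
    }
    if (visited != NULL)
        *visited = count;
    return r;
}
```

Since each iteration moves one level down, the count can never exceed the number of levels of the tree.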
Search
The minimum is always in the leftmost node of the BST.
The maximum is always in the rightmost node of the BST.
        10
       /  \
      5    14
       \   / \
        7 12  18
             /
            15
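A minimal sketch of the two lookups, assuming a hypothetical `struct node` with an integer key (the names `bst_min` and `bst_max` are not from the lecture): following left links until none remains reaches the minimum, and symmetrically for the maximum.

```c
#include <stddef.h>

/* Hypothetical node layout; the lecture does not define one. */
struct node { int key; struct node *left, *right; };

/* Leftmost node of the tree, or NULL for an empty tree. */
struct node *bst_min(struct node *t) {
    if (t == NULL)
        return NULL;
    while (t->left != NULL)
        t = t->left;
    return t;
}

/* Rightmost node of the tree, or NULL for an empty tree. */
struct node *bst_max(struct node *t) {
    if (t == NULL)
        return NULL;
    while (t->right != NULL)
        t = t->right;
    return t;
}
```

Note that the leftmost node is not necessarily a leaf: in the example tree the minimum, 5, still has a right child.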
Search
A possible implementation in C could be the following:
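#include <stddef.h>

/* Assumes a node type with an integer key (not defined in the slides). */
struct node { int key; struct node *left, *right; };

/* Returns the node holding key, or NULL if key is not in the tree. */
struct node *search(struct node *r, int key) {
    if (r == NULL)
        return NULL;                    /* empty subtree: key not found */
    if (key < r->key)
        return search(r->left, key);
    if (key > r->key)
        return search(r->right, key);
    return r;                           /* key == r->key */
}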
Insertion
Insertion relies on the search() function.
If the key to be inserted does not yet exist, a new node is created and attached as a child of the
last node visited by search().
Example
Inserting element x with key 10:
              12
            /    \
           6      14
          / \       \
         5   11      15
        /   /
       4   8
      /   /
     3   7
◦ compare x with t.key = 12: x < t.key, go left
◦ compare x with t.key = 6: x > t.key, go right
◦ compare x with t.key = 11: x < t.key, go left
◦ compare x with t.key = 8: x > t.key, go right
◦ t = NIL: the new node with key 10 is attached as the right child of 8
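One possible way to code the insertion, given as a sketch rather than as the lecture's implementation: the node layout and the name `bst_insert` are assumptions, and the search is folded into the loop so that the link where it stopped is available for attaching the new node.

```c
#include <stdlib.h>

/* Hypothetical node layout; the lecture does not define one. */
struct node { int key; struct node *left, *right; };

/* Inserts key into the BST rooted at *root.
   Returns the new node, the existing node if key is already present,
   or NULL if memory allocation fails. */
struct node *bst_insert(struct node **root, int key) {
    struct node **link = root;            /* link currently being followed */
    while (*link != NULL) {
        if (key == (*link)->key)
            return *link;                 /* duplicate: nothing to insert */
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->key = key;
    n->left = n->right = NULL;
    *link = n;                            /* attach where the search stopped */
    return n;
}
```

Building the example tree by inserting 12, 6, 14, 5, 11, 15, 4, 8, 3, 7 in that order and then inserting 10 places the new node as the right child of 8, as in the trace above.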
Unbalanced tree
Inserting keys in an arbitrary order can make the tree unbalanced.
In particular, if the insertions are made in ascending or descending key order, the tree
degenerates into a list.
Unbalanced tree
Insertion order: 7 6 5 4 3 2 1
As a result, different records can have very different access times.
[Figure: the resulting tree is a chain from 7 down to 1.]
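The effect of the insertion order can be observed by measuring the height of the resulting tree. The following sketch uses hypothetical helpers (`insert`, `height`, `free_tree`, `height_after_inserts`) and an assumed node layout; height is counted in levels, so a single node has height 1.

```c
#include <stdlib.h>

/* Hypothetical node layout; the lecture does not define one. */
struct node { int key; struct node *left, *right; };

/* Recursive insertion; duplicate keys are ignored. */
static struct node *insert(struct node *t, int key) {
    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t == NULL)
            abort();                  /* sketch only: no recovery on failure */
        t->key = key;
        t->left = t->right = NULL;
    } else if (key < t->key) {
        t->left = insert(t->left, key);
    } else if (key > t->key) {
        t->right = insert(t->right, key);
    }
    return t;
}

/* Height in levels: 0 for an empty tree, 1 for a single node. */
static int height(const struct node *t) {
    if (t == NULL)
        return 0;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

static void free_tree(struct node *t) {
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}

/* Inserts keys[0..n-1] in the given order and returns the resulting height. */
int height_after_inserts(const int *keys, int n) {
    struct node *root = NULL;
    for (int i = 0; i < n; i++)
        root = insert(root, keys[i]);
    int h = height(root);
    free_tree(root);
    return h;
}
```

With the keys 7 6 5 4 3 2 1 the height equals the number of keys, while the order 4 2 6 1 3 5 7 stores the same keys in only three levels.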
Unbalanced tree: Complexity
In the worst case, a completely unbalanced tree of n records (i.e., a linear list), the
number of comparisons required for each search is O(n).
This complexity can only be reduced by ensuring that the tree is always fully balanced.
In this case, in fact, the number of comparisons is at most equal to the height of the tree,
namely O(log2(n + 1)).
[Figure: the degenerate tree, a chain from 7 down to 1.]
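The logarithmic bound follows from counting nodes level by level. As a short worked derivation, assuming the height h is counted in levels (a single node has height 1) and every level is full:

```latex
n = \sum_{i=0}^{h-1} 2^{i} = 2^{h} - 1
\quad\Longrightarrow\quad
h = \log_2 (n + 1)
```

For n = 7 this gives h = 3, against 7 comparisons in the degenerate list; for n = 1,048,575 it gives h = 20.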
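Unbalanced tree: Cost
Balancing a generic tree can be very expensive, since moving one element can require
moving all the others.
[Figure: an unbalanced tree with root E (children C and G; B and D under C; A under B) and a balanced tree with root D holding the same keys (children B and E; A and C under B; G under E).]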
Unbalanced tree: Cost
The time saved in search operations thanks to a balanced tree must be greater than the
overhead needed to maintain the balance.
In practice, reorganizing the tree after each insertion to keep it balanced is worthwhile
only if insertions are much less frequent than searches.
It can be proved that, assuming all key values are equally probable, for large values
of n the average complexity of a search when no re-balancing is done is
O(1.4 log2 n).
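To give a feel for the overhead, here is one simple way to rebalance a whole tree. It is an assumption of this sketch, not a method described in the lecture: the nodes are collected in key order and relinked around the median, so every node is touched, in line with the remark that balancing can involve moving all the elements. The node layout and the names `rebalance`, `count_nodes`, `collect`, and `relink` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical node layout; the lecture does not define one. */
struct node { int key; struct node *left, *right; };

static size_t count_nodes(const struct node *t) {
    return t == NULL ? 0 : 1 + count_nodes(t->left) + count_nodes(t->right);
}

/* In-order traversal: stores node pointers in ascending key order. */
static void collect(struct node *t, struct node **out, size_t *i) {
    if (t == NULL)
        return;
    collect(t->left, out, i);
    out[(*i)++] = t;
    collect(t->right, out, i);
}

/* Relinks a[lo..hi-1] into a balanced subtree rooted at the median. */
static struct node *relink(struct node **a, size_t lo, size_t hi) {
    if (lo >= hi)
        return NULL;
    size_t mid = lo + (hi - lo) / 2;
    a[mid]->left = relink(a, lo, mid);
    a[mid]->right = relink(a, mid + 1, hi);
    return a[mid];
}

/* Returns the root of the rebalanced tree; on allocation failure the
   tree is left untouched and the original root is returned. */
struct node *rebalance(struct node *root) {
    size_t n = count_nodes(root), i = 0;
    if (n == 0)
        return NULL;
    struct node **a = malloc(n * sizeof *a);
    if (a == NULL)
        return root;
    collect(root, a, &i);
    root = relink(a, 0, n);
    free(a);
    return root;
}
```

The whole procedure costs time and temporary space proportional to n, which is the overhead to weigh against the searches it speeds up.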
Deletion
There are three cases:
◦ the element to be deleted has no children
◦ the element to be deleted has one child
◦ the element to be deleted has two children.
Deletion
The element to be deleted has no children:
◦ no problem: it can be deleted immediately (for example, the leaf 7).
        10
       /  \
      5    14
       \   / \
        7 12  18
             /
            15
Deletion
The element to be deleted has one child:
◦ delete the element and replace it with its only child. In practice, this means storing the
pointer to the only child in the parent node.
        10
       /  \
      5    14
       \   / \
        7 12  18
             /
            15
Deletion
The element to be deleted has two children:
◦ the element is replaced either by the next lower element (located in the rightmost node of the
left subtree) or by the next higher element (located in the leftmost node of the right subtree)
        10
       /  \
      5    14
       \   / \
        7 12  18
             /
            15
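Deletion
Example: deleting 10, which has two children.
◦ replacing 10 with the next lower element, 7:
        7
       / \
      5   14
         /  \
        12   18
            /
           15
◦ replacing 10 with the next higher element, 12:
        12
       /  \
      5    14
       \     \
        7     18
             /
            15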
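The three cases can be combined in a single recursive routine. This is a sketch under stated assumptions: the node layout, the helper `new_node`, and the name `bst_delete` are hypothetical, nodes are assumed to be allocated with malloc(), and the two-children case arbitrarily uses the next higher element (the next lower one would work equally well).

```c
#include <stdlib.h>

/* Hypothetical node layout; the lecture does not define one. */
struct node { int key; struct node *left, *right; };

/* Helper for building small trees; returns NULL on allocation failure. */
struct node *new_node(int key, struct node *left, struct node *right) {
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->key = key;
        n->left = left;
        n->right = right;
    }
    return n;
}

/* Removes key from the subtree rooted at t and returns the new subtree root. */
struct node *bst_delete(struct node *t, int key) {
    if (t == NULL)
        return NULL;                            /* key not present */
    if (key < t->key) {
        t->left = bst_delete(t->left, key);
    } else if (key > t->key) {
        t->right = bst_delete(t->right, key);
    } else if (t->left != NULL && t->right != NULL) {
        /* Two children: copy the next higher key (leftmost node of the
           right subtree), then delete that node from the right subtree. */
        struct node *s = t->right;
        while (s->left != NULL)
            s = s->left;
        t->key = s->key;
        t->right = bst_delete(t->right, s->key);
    } else {
        /* No children or one child: the parent's link receives the only
           child, or NULL if there is none. */
        struct node *child = (t->left != NULL) ? t->left : t->right;
        free(t);
        return child;
    }
    return t;
}
```

The node that supplies the replacement key has no left child by construction, so the recursive call in the two-children case always ends in one of the two simpler cases.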
Other types of trees
As previously noted, rebalancing a BST can involve moving all nodes of the tree.
Other types of trees have been defined in the literature, such as:
◦ M-Way Search Tree
◦ B-Tree
In these solutions, the operations needed for rebalancing are "local", i.e., limited to the
path from the node concerned to the root.
As a result, for these trees, search, insertion, and deletion all have complexity
at most O(log n).
Examples
M-way search tree
A binary search tree has one value in each node and two subtrees. This notion generalizes easily
to an M-way search tree, which has up to (M-1) values per node and up to M subtrees. M is called
the degree of the tree. A binary search tree, therefore, has degree 2.
In an M-way search tree, a node can have anywhere from 1 to (M-1) values, and the number of
(non-empty) subtrees can range from 0 (for a leaf) to 1 + (the number of values).
Examples
M-way search tree
For example, here is a 3-way search tree:
              [10 44]
            /    |    \
       [3 7]   [22]   [55 70]
                      /    |
                  [50]   [66 68]
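Search in an M-way search tree generalizes the binary case: within a node, the values are scanned to find either the key or the subtree that may contain it. The sketch below assumes a hypothetical `struct mnode` with a fixed degree `M` set to 3 to match the example; neither the layout nor the name `mway_search` comes from the lecture.

```c
#include <stddef.h>

#define M 3   /* degree of the tree; 3 matches the example */

/* Hypothetical node layout for an M-way search tree. */
struct mnode {
    int nkeys;                 /* number of values stored: 1 .. M-1 */
    int keys[M - 1];           /* values in ascending order */
    struct mnode *child[M];    /* child[i] holds the keys between
                                  keys[i-1] and keys[i]; NULL if empty */
};

/* Returns the node containing key, or NULL if key is not in the tree. */
const struct mnode *mway_search(const struct mnode *t, int key) {
    while (t != NULL) {
        int i = 0;
        while (i < t->nkeys && key > t->keys[i])
            i++;                        /* first value not smaller than key */
        if (i < t->nkeys && key == t->keys[i])
            return t;                   /* found in this node */
        t = t->child[i];                /* descend into the i-th subtree */
    }
    return NULL;
}
```

With M = 2 each node holds a single value and the scan reduces to the usual left/right decision of a binary search tree.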
Examples
B-tree
B-trees are perfectly height-balanced M-way search trees. They can be considered a
generalization of binary search trees in that they can have a variable number of subtrees at
each node.
Since the number of values in a node may vary within a predefined range, nodes are not
necessarily full, meaning B-trees can waste some space.
Because a node can hold a variable number of values, B-trees are well suited to systems that read
large blocks of data; they are also commonly used in databases.
The time complexity for searching a B-tree is O(log n).
Examples
B-tree
For example, here is a 3-way B-tree:
                  [50]
               /        \
           [10]          [66]
          /    \        /    \
     [3 7]  [22 44]  [55]  [68 70]
Examples
B-tree
And here is a 5-way B-tree (each node other than the root must contain between 2 and 4
values):
              [10 50]
            /    |     \
     [3 7]   [22 44]   [55 66 68 70]
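The range "between 2 and 4 values" for the 5-way case is consistent with the commonly used definition in which a non-root node of a B-tree of degree M has between ceil(M/2) and M subtrees. The lecture does not state this formula, so the sketch below should be read as an assumption; the function names are hypothetical.

```c
/* Occupancy bounds for a non-root node of a B-tree of degree m, assuming
   the common definition: between ceil(m/2) and m subtrees per node. */

/* Minimum number of values: ceil(m/2) - 1. */
int btree_min_values(int m) {
    return (m + 1) / 2 - 1;
}

/* Maximum number of values: m - 1. */
int btree_max_values(int m) {
    return m - 1;
}
```

For m = 5 the bounds are 2 and 4, as in the example; for m = 3 they are 1 and 2, which matches the nodes of the 3-way B-tree shown earlier.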
Credits
These slides are a reworking of material created by Prof. P. Prinetto for the course
Algoritmo e Programmazione, A.Y. 2020/2021, at the Politecnico di Torino.