Unit 4
NOTES
Subject Name: Data Structures & Analysis of Algorithms Subject Code: KCA-205 Semester: II
UNIT-4
Basic terminology used with Trees, Binary Trees, Binary Tree Representation: Array Representation and
Pointer (Linked List) Representation, Binary Search Tree, Complete Binary Tree, Extended Binary
Trees, Tree Traversal algorithms: Inorder, Preorder and Postorder,
Constructing Binary Tree from given Tree Traversal, Operation of Insertion, Deletion, Searching &
Modification of data in Binary Search Tree. Threaded Binary trees, Huffman coding using Binary Tree,
AVL Tree and B Tree.
[Figure: two example binary trees (a) and (b) with nodes labelled A to E, together with their array representations A B C and A B C D E]
In the array representation, the elements of the binary tree are placed in the array according to the
numbers assigned to the nodes. The array is indexed from 1. The main drawback of the array
representation is that it wastes memory when there are many missing elements.
A binary tree with n elements may require an array of size up to 2^n when positions are indexed
from 1. If array positions are indexed from 0, the required size reduces to 2^n - 1. Right-skewed
binary trees waste the maximum amount of space. The array representation of the following
right-skewed binary tree is shown as follows.
[Figure: a right-skewed binary tree A-B-C-D; in an array of 16 positions indexed 0 to 15, the nodes A, B, C and D occupy positions 1, 3, 7 and 15]
Linked list Representation: The most popular way to represent a binary tree is by using links or
pointers. The node structure used in this representation consists of two pointers and an element for each
node. The node structure is given as:
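The node structure itself was not reproduced in these notes; the following is a minimal sketch in Python (the names `TreeNode`, `data`, `left` and `right` are illustrative, not from the notes):

```python
class TreeNode:
    """A binary tree node: one element plus two child pointers."""
    def __init__(self, data):
        self.data = data    # the element stored in this node
        self.left = None    # pointer to the left subtree (None if absent)
        self.right = None   # pointer to the right subtree (None if absent)

# Linking three nodes by hand builds a small tree rooted at 'A'.
root = TreeNode('A')
root.left = TreeNode('B')
root.right = TreeNode('C')
```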
Binary tree traversals: There are four ways to traverse a binary tree. They are
(a) Pre order (b) In order ( c ) Post order (d) Level order
The first three traversals are performed using a recursive approach and are implemented easily with the
linked list scheme. In each of them, the left sub tree is visited before the right sub tree; the difference
among them is the position at which the node itself is visited.
(a) Pre order:
i) Visit Root node. ii) Visit Left sub tree. iii) Visit Right Sub tree.
(b) In order:
i) Visit Left sub tree. ii) Visit Root node. iii) Visit Right Sub tree.
(c) Post order:
i) Visit Left sub tree. ii) Visit Right Sub tree. iii) Visit Root node.
The following is an example binary tree with its pre order, in order, post order and level order traversals:

          20
        /    \
      15      25
     /  \    /  \
   12    18 22    32

pre order is   : 20 15 12 18 25 22 32
in order is    : 12 15 18 20 22 25 32
post order is  : 12 18 15 22 32 25 20
level order is : 20 15 25 12 18 22 32
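The four traversals of this example tree can be sketched in Python, the first three recursively and level order with a queue (function names are illustrative):

```python
from collections import deque

class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def preorder(n):
    if n is None:
        return []
    return [n.data] + preorder(n.left) + preorder(n.right)    # root, left, right

def inorder(n):
    if n is None:
        return []
    return inorder(n.left) + [n.data] + inorder(n.right)      # left, root, right

def postorder(n):
    if n is None:
        return []
    return postorder(n.left) + postorder(n.right) + [n.data]  # left, right, root

def levelorder(root):
    out, q = [], deque([root])
    while q:                         # visit nodes level by level, left to right
        n = q.popleft()
        out.append(n.data)
        if n.left:
            q.append(n.left)
        if n.right:
            q.append(n.right)
    return out

# The example tree: 20 is the root, 15 and 25 its children, and so on.
root = Node(20, Node(15, Node(12), Node(18)), Node(25, Node(22), Node(32)))
print(preorder(root))    # -> [20, 15, 12, 18, 25, 22, 32]
print(inorder(root))     # -> [12, 15, 18, 20, 22, 25, 32]
```

Running all four functions on this tree reproduces the four sequences listed above.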
Tree Traversal
Traversal is a process to visit all the nodes of a tree, and it may print their values too. Because all nodes
are connected via edges (links), we always start from the root (head) node; that is, we cannot randomly
access a node in a tree. There are three ways which we use to traverse a tree −
In-order Traversal
Pre-order Traversal
Post-order Traversal
Generally, we traverse a tree to search or locate a given item or key in the tree or to print all the values it
contains.
In-order Traversal
In this traversal method, the left subtree is visited first, then the root and later the right sub-tree. We
should always remember that every node may represent a subtree itself.
If a binary search tree is traversed in-order, the output will produce its key values in ascending order.
We start from A, and following in-order traversal, we move to its left subtree B. B is also traversed in-
order. The process goes on until all the nodes are visited. For the tree with root A, left child B (with
children D and E) and right child C (with children F and G), the output of in-order traversal will be −
D → B → E → A → F → C → G
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse the left subtree.
Step 2 − Visit the root node.
Step 3 − Recursively traverse the right subtree.
Pre-order Traversal
In this traversal method, the root node is visited first, then the left subtree and finally the right subtree.
We start from A, and following pre-order traversal, we first visit A itself and then move to its left
subtree B. B is also traversed pre-order. The process goes on until all the nodes are visited. The output of
pre-order traversal of this tree will be −
A→B→D→E→C→F→G
Algorithm
Until all nodes are traversed −
Step 1 − Visit the root node.
Step 2 − Recursively traverse the left subtree.
Step 3 − Recursively traverse the right subtree.
Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First we traverse the left subtree,
then the right subtree and finally the root node.
We start from A, and following Post-order traversal, we first visit the left subtree B. B is also traversed
post-order. The process goes on until all the nodes are visited. The output of post-order traversal of this
tree will be −
D→E→B→F→G→C→A
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse the left subtree.
Step 2 − Recursively traverse the right subtree.
Step 3 − Visit the root node.
Tree Construction
We are given the Inorder and Preorder traversals of a binary tree. The goal is to construct a tree
from given traversals.
Inorder traversal − In this type of tree traversal, a left subtree is visited first, followed by the
node and right subtree in the end.
Inorder (tree root)
Traverse left subtree of node pointed by root, call inorder ( root→left )
Visit the root
Traverse right subtree of node pointed by root, call inorder ( root→right )
Preorder traversal − In this type of tree traversal, the node is visited first, followed by the left
subtree, and the right subtree in the end.
Preorder (tree root)
Visit the root
Traverse left subtree of node pointed by root, call preorder ( root→left )
Traverse right subtree of node pointed by root, call preorder ( root→right )
Inorder
2-3-4-5-6-8-10
Preorder
4-3-2-5-8-6-10
Now we’ll construct a tree for the following preorder and inorder traversals.
Inorder
2-3-4-5-6-8-10
Preorder
5-3-2-4-8-6-10
As we know that preorder visits the root node first, the first value always represents
the root of the tree. From the above sequence, 5 is the root of the tree.
Preorder
5 -3-2-4-8-6-10
From the above inorder traversal, we know that a node’s left subtree is traversed before the node,
followed by its right subtree. Therefore, all values to the left of 5 in the inorder sequence belong to
its left subtree and all values to the right belong to its right subtree.
Inorder
2-3-4 ← 5 → 6-8-10
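The construction rule above (the first preorder value is the root; the inorder sequence splits into left and right subtrees around it) can be sketched in Python, representing the tree as nested tuples for brevity (function names are illustrative):

```python
def build(pre, ino):
    """Reconstruct a binary tree (as nested tuples) from preorder + inorder."""
    if not pre:
        return None
    root = pre[0]                         # preorder visits the root first
    i = ino.index(root)                   # split the inorder sequence around it
    left = build(pre[1:i + 1], ino[:i])   # inorder values left of the root
    right = build(pre[i + 1:], ino[i + 1:])
    return (root, left, right)

def preorder_of(t):
    return [t[0]] + preorder_of(t[1]) + preorder_of(t[2]) if t else []

def inorder_of(t):
    return inorder_of(t[1]) + [t[0]] + inorder_of(t[2]) if t else []

tree = build([5, 3, 2, 4, 8, 6, 10], [2, 3, 4, 5, 6, 8, 10])
print(tree[0])                   # -> 5  (the root)
print(tree[1][0], tree[2][0])    # -> 3 8  (children of the root)
```

Re-deriving the two traversals from the built tree returns exactly the input sequences, which confirms the reconstruction.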
[Figure: two binary trees (a) and (b). Tree (a) is rooted at 20 with children 15 and 25; 15 has children 12 and 18, and 25 has the child 22. Tree (b) is rooted at 30 with children 5 and 40; 5 has the child 2.]
The number inside a node is the element key. The tree (a) is not a binary search tree because the right
sub tree of element 25 violates property 4: 22 is smaller than its parent 25. The tree (b) is not a
binary search tree either.
When the property that all keys are distinct is removed, property 2 is replaced by "smaller or equal" and
property 3 by "larger or equal". The resulting tree is called a binary search tree with duplicates.
Applications: (i) Histogramming (ii) Best-fit bin packing (iii) Crossing distribution.
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: If
A is a parent node of B then the key of node A is ordered with respect to the key of node B with the
same ordering applying across the heap. A heap can be classified further as either a "max heap" or a
"min heap". In a max heap, the keys of parent nodes are always greater than or equal to those of the
children and the highest key is in the root node. In a min heap, the keys of parent nodes are less than or
equal to those of the children and the lowest key is in the root node. Heaps are crucial in several
efficient graph algorithms such as Dijkstra's algorithm, and in the sorting algorithm heap sort. A
common implementation of a heap is the binary heap, in which the tree is a complete binary tree
(see figure).
In a heap, the highest (or lowest) priority element is always stored at the root, hence the name heap. A
heap is not a sorted structure and can be regarded as partially ordered. As visible from the heap diagram,
there is no particular relationship among nodes on any given level, even among the siblings. When a
heap is a complete binary tree, it has the smallest possible height: a heap with N nodes always has
height O(log N). A heap is a useful data structure when you need to repeatedly remove the object with
the highest (or lowest) priority.
Note that, as shown in the graphic, there is no implied ordering between siblings or cousins and
no implied sequence for an in-order traversal (as there would be in, e.g., a binary search tree). The heap
relation mentioned above applies only between nodes and their parents, grandparents, etc. The
maximum number of children each node can have depends on the type of heap, but in many types it is
at most two, which is known as a binary heap.
The heap is one maximally efficient implementation of an abstract data type called a priority
queue, and in fact priority queues are often referred to as "heaps", regardless of how they may be
implemented. Note that despite the similarity of the name "heap" to "stack" and "queue", the latter
two are abstract data types, while a heap is a specific data structure, and "priority queue" is the proper
term for the abstract data type.
A heap data structure should not be confused with the heap which is a common name for the
pool of memory from which dynamically allocated memory is allocated. The term was originally
used only for the data structure.
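As a quick illustration of the heap property, Python's standard `heapq` module maintains a binary min-heap inside a plain list (the children of index i sit at 2i+1 and 2i+2); a max-heap can be simulated by negating the keys:

```python
import heapq

keys = [20, 5, 15, 22, 40, 25]

# Min-heap: heapify is O(n); the smallest key is always at index 0 (the root).
h = list(keys)
heapq.heapify(h)
print(h[0])                               # -> 5

# Removing the root repeatedly yields the keys in ascending order (heap sort).
print([heapq.heappop(h) for _ in keys])   # -> [5, 15, 20, 22, 25, 40]

# Max-heap via negation: the largest original key is at the root.
g = [-k for k in keys]
heapq.heapify(g)
print(-g[0])                              # -> 40
```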
[Figure: a 7-way search tree. The root is [10,80]; its children include [5], [20,30,40,50,60,70] and [82,84,86,88]; lower-level nodes include [2,3,4], [32,36] and [90,92,94,96,98].]
3.9.1.2 Insertion: To insert an element with a given key, first search the tree. If the tree does not contain
the key, then insert it. To insert the key 31, the search begins at the root, goes to the middle child of the
root, then the third child of this node, then the first child of this node, which is an external node. Since
the node [32,36] can hold up to 6 elements, the new element is inserted as the first element of this node.
Another example: to insert an element with key 65, the search terminates at the sixth child of the middle
child of the root. A new node is created and the element is inserted there.
3.9.1.3 Deletion: To delete an element with a given key, first search for it. If it is present, then delete it.
(i) If the deleted element is 20 in the figure above, search for it. The search ends at the first element of
the middle child of the root. Since its children c0 and c1 are external, it can be deleted easily, resulting
in the node [30,40,50,60,70].
(ii) To delete 84, the search ends at the second element in the third child of the root. Since its children
c1 and c2 are external, it can be deleted easily, resulting in [82,86,88].
(iii) To delete the element with key 5, more work is to be done. It has a non-null c0, while c1 is an
external node. The largest key in c0 is brought to the deleted element's place.
(iv) To delete the element with key 10, the root takes either the largest key in its c0 or the smallest in its
c1. Suppose 5 of c0 is brought to the root; then 4, in the c0 of the deleted node 5, is brought to 5's old
place.
3.9.1.4 Height: An m-way search tree of height h may have as few as h elements and as many as
m^h - 1. The upper bound is obtained when every node at levels 1 through h-1 has exactly m children
and the nodes at level h have no children. Since each of these nodes has m-1 elements, the number of
elements is m^h - 1. A 200-way search tree of height 5 can hold 200^5 - 1 = 32 * 10^10 - 1 elements
but might hold as few as 5 only.
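These two bounds can be checked numerically; a small Python sketch (the function name is illustrative):

```python
def mway_bounds(m, h):
    """Fewest and most elements in an m-way search tree of height h."""
    # Fewest: one element per node along a single root-to-leaf path -> h.
    # Most: every node has m children and m-1 elements -> m**h - 1 in total.
    return h, m ** h - 1

lo, hi = mway_bounds(200, 5)
print(lo, hi)   # -> 5 319999999999
```

The upper value agrees with the 32 * 10^10 - 1 figure quoted above.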
3.9.2 B – trees: A B – tree of order m is an m – way search tree. If the B – tree is not empty,
the corresponding extended tree must satisfy the following properties:
(i) The root has at least two children.
(ii) All internal nodes other than the root have at least [m/2] children.
(iii) All external nodes are at the same level.
The 7-way search tree shown earlier is not a B-tree of order 7 because:
(i) all the external nodes are not on the same level;
(ii) some of the internal nodes have only two (node [5]) or three (node [32,36]) children,
which does not satisfy the 2nd property, i.e. at least [7/2] = 4 children.
The following is a B-tree of order 7:
[Figure: root [10,80] with children [2,4,6], [20,30,40,50,60,70] and [82,84,86,88]]
In a B-tree of order 2, all the internal nodes have exactly two children. This requirement, coupled with
all external nodes being on the same level, results in full binary trees.
In a B-tree of order 3, internal nodes have either two or three children. It is also called a 2-3 tree.
In a B-tree of order 4, internal nodes have two, three or four children. These are referred to as 2-3-4
trees, or simply 2-4 trees. The following is a 2-3 tree; it becomes a 2-3-4 tree when 14 and 16 are added
to the left child of 20.
3.9.2.1 Height of a B-tree: Let T be a B-tree of order m and height h. Let d = [m/2] and let n be the
number of elements in T. Then:
(a) 2d^(h-1) - 1 ≤ n ≤ m^h - 1
(b) log_m (n+1) ≤ h ≤ log_d ((n+1)/2) + 1
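Bound (a) can be evaluated directly for concrete orders and heights; a Python sketch (the function name is illustrative, and [m/2] is taken as the ceiling, matching the order-7 example where [7/2] = 4):

```python
import math

def btree_size_bounds(m, h):
    """Fewest/most elements in a B-tree of order m, height h, with d = ceil(m/2)."""
    d = math.ceil(m / 2)
    return 2 * d ** (h - 1) - 1, m ** h - 1

# An order-7 B-tree (d = 4) of height 3 holds between 31 and 342 elements.
print(btree_size_bounds(7, 3))   # -> (31, 342)
```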
3.9.2.1 Searching: Searching for an element in a B-tree uses the same algorithm as for an m-way search
tree. Searching in a B-tree of height h takes at most h disk accesses, because at most one node on each
level of the search path needs to be read.
3.9.2.2 Insertion: To insert an element, first search for an element with the same key. If such an element
is found, the insertion fails because duplicates are not allowed. When the search is unsuccessful, insert
the new element into the last internal node encountered on the search path. To insert an element with
key 3 into Fig 2, the search terminates at the second child of the left child of the root. It can be inserted
into the [2,4,6] node, giving the node [2,3,4,6], since this node can hold up to 6 elements. The number of
disk accesses for this is 3: two accesses for reading the root and then its left child, and one more for
writing out the modified node after the insertion. It can be shown as follows.
To insert an element with key 25 into the B-tree of order 7 (Fig 2), the element goes into the middle
child of the root, i.e. [20,30,40,50,60,70], but this node is full. When an element goes to a full node, the
overfull node needs to be split as follows.
Let the overfull node be P = [20,25,30,40,50,60,70]. Let it have m elements and m+1 children. It can
be denoted as m, c0, (e1, c1), (e2, c2), ......, (em, cm), where the ei's indicate elements and the ci's
represent children pointers. The node is split around element ed, where d = [m/2].
The elements to the left of ed remain in P and those to the right move into a new node Q, so that both
P and Q retain the required minimum number of children.
The element ed is moved to the parent of P. The formats of P and Q are
P: d-1, c0, (e1,c1), ......, (ed-1,cd-1)
Q: m-d, cd, (ed+1,cd+1), ......, (em,cm).
In this case, the overfull node is 7, 0, (20,0), (25,0), (30,0), (40,0), (50,0), (60,0), (70,0). It is split
around d = 4, which yields P = 3, 0, (20,0), (25,0), (30,0) and Q = 3, 0, (50,0), (60,0), (70,0). The
element e4 = 40 is moved to P's parent, which here is the root. It can be shown as follows. The number
of disk accesses required is 5: two for searching for the proper position in the tree, two for writing out
the split nodes and one for writing out the modified root.
To insert the element with key 44 into the B-tree of order 3 of Fig 3 (c), the element goes to the [35,40]
node. Since this node is full, the overfull node [35,40,44] can be represented as 3, 0, (35,0), (40,0),
(44,0). It is split around d = [3/2] = 2, which yields
P = 1, 0, (35,0) and Q = 1, 0, (44,0). The element with key 40 moves to P's parent
A = [50,60]. The resulting overfull node is 3, P, (40,Q), (50,C), (60,D), where C and D are pointers to
the nodes [55] and [60]. The overfull node A is split to create a new node B. The new A and B are
A: 1, P, (40,Q) and B: 1, C, (60,D).
Before the insertion, the root format is R: 2, S, (30,A), (80,T), where S and T are the first and third sub
trees of the root. After the insertion, the overfull node is R: 3, S, (30,A), (50,B), (80,T). This node is
split around d = [3/2] = 2, which yields R: 1, S, (30,A) and U: 1, B, (80,T). The element 50 is moved
to R's parent. Since R has no parent, a new root is created with the format 1, R, (50,U). The resulting
tree is shown below.
The total number of disk accesses is 10: 3 accesses for reading [30,80], [50,60] and [35,40], six disk
accesses for writing out the 3 pairs of split nodes, and one for writing out the new root.
When an insertion causes s nodes to split, the number of disk accesses is h (for reading the nodes on
the search path) + 2s (to write out the two split parts of each node that is split) + 1 (to write out the
new root). The total number of disk accesses is h + 2s + 1, which is at most 3h + 1.
3.9.2.3 Deletion: Deletion is first divided into 2 cases: (1) the element to be deleted is in a node whose
children are external nodes (i.e. a leaf); (2) the element is to be deleted from a non-leaf. Case (2) is
transformed into case (1) by replacing the element with either the largest element in its left neighboring
sub tree or the smallest element in its right neighboring sub tree.
(i) To delete the element with key 80 in Fig 3 (a), the suitable replacement is either the largest
element, 70, in its left sub tree or the smallest element, 82, in its right sub tree.
(ii) To delete the element with key 80 in Fig 3 (c), the replacing element is again either 70 or 82. If 82
is selected, the problem of deleting 82 from the leaf remains.
Case (1) in turn falls into 2 sub-cases. The first is deleting an element from a leaf that contains more
than the minimum number of elements (1 if the leaf is also the root, and [m/2] - 1 otherwise); this
requires simply writing out the modified node.
The worst case for deletion from a B-tree of height h is when merging takes place at levels h, h-1, ......,
3 and an element is borrowed from a nearest sibling at level 2; this takes 3h disk accesses.
Note: when the element size is large relative to the size of a key, the following node structure is used:
s, c0, (k1,c1,p1), (k2,c2,p2), ......, (ks,cs,ps)
where s is the number of elements in the node, the ki's are element keys, the pi's are the disk locations
of the corresponding elements and the ci's are children pointers.
Exercise 1: Draw the B-tree of order 7 resulting from inserting the following keys into an empty tree T:
4,40,23,50,11,34,62,78,66,22,90,59,25,72,64,77,39 & 12.
Step 1: Since it is a B-tree of order 7, the maximum number of elements a node can contain is 6.
4 11 23 34 40 50
Step 2: The next element to be inserted is 62, but the node is full: the maximum number of children an
internal node may have is 7 and the minimum is [7/2] = 4.
The overfull node is P = [4,11,23,34,40,50,62]. It is split around e4 = 34. The elements to the left
remain in P and those to the right go into a new node Q. The element e4 goes to the node's parent.
34
4 11 23 40 50 62
Step 3: The next element, 78, is inserted into the root's right child.
34
4 11 23 40 50 62 78
Step 4: The next element inserted is 66, it goes into root right child.
34
4 11 23 40 50 62 66 78
Step 5: The next elements, 22 and 90, go into the root's left child and right child respectively. Now the
root's right child is full; inserting any further element into it requires a split.
34
4 11 22 23 40 50 62 66 78 90
Step 6: The next element, 59, goes into the root's right child, making it overfull. This node needs to
be split.
Let C = [40,50,59,62,66,78,90]. It is split around e4 = 62, leaving C = [40,50,59] and D = [66,78,90].
The element 62 moves to the parent. Now the root is [34,62], and its children are P, C and D.
34 62
4 11 22 23 40 50 59 66 78 90
Step 7: The elements 25, 72, 64, 77, 39 and 12 are inserted: 25 and 12 go into the root's first sub tree,
39 into the root's middle sub tree, and 64, 72 and 77 into the root's third sub tree.
34 62
4 11 12 22 23 25 39 40 50 59 64 66 72 77 78 90
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes
to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding
characters. The most frequent character gets the smallest code and the least frequent character gets the
largest code.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit
sequences) are assigned in such a way that the code assigned to one character is never a prefix of the
code assigned to any other character. This is how Huffman coding makes sure that there is no ambiguity
when decoding the generated bitstream.
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, and
let their corresponding variable-length codes be 00, 01, 0 and 1. This coding leads to ambiguity because
the code assigned to c is a prefix of the codes assigned to a and b. If the compressed bit stream is 0001,
the decompressed output may be “cccd” or “ccb” or “acd” or “ab”.
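This ambiguity can be checked mechanically. A small Python sketch (the function name is illustrative) enumerates every possible decoding of the bit stream 0001 under these codes; it finds a fifth reading, "cad", in addition to the four listed above:

```python
def decodings(bits, code):
    """Return every way to split `bits` into codewords of `code`."""
    if not bits:
        return [""]              # empty stream: one (empty) decoding
    out = []
    for ch, word in code.items():
        if bits.startswith(word):          # try each codeword as the next symbol
            out += [ch + rest for rest in decodings(bits[len(word):], code)]
    return out

ambiguous = {"a": "00", "b": "01", "c": "0", "d": "1"}
print(sorted(decodings("0001", ambiguous)))   # -> ['ab', 'acd', 'cad', 'ccb', 'cccd']
```

With a genuine prefix code, the same function always returns at most one decoding.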
There are mainly two major parts in Huffman Coding
1. Build a Huffman Tree from input characters.
2. Traverse the Huffman Tree and assign codes to characters.
Steps to build Huffman Tree
Input is an array of unique characters along with their frequency of occurrences and output is
Huffman Tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes (Min
Heap is used as a priority queue. The value of frequency field is used to compare two
nodes in min heap. Initially, the least frequent character is at root)
2. Extract two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies.
Make the first extracted node as its left child and the other extracted node as its right child.
Add this node to the min heap.
4. Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the
root node and the tree is complete.
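The four steps above can be sketched with Python's `heapq` as the min-heap. A tie-breaking counter is added so the heap never compares tree nodes directly; internal nodes are represented as (left, right) tuples, and the names are illustrative:

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree with a min-heap and return a {char: code} map."""
    tie = count()        # tie-breaker: equal frequencies never compare nodes
    heap = [(f, next(tie), ch) for ch, f in freq.items()]   # one leaf per char
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two least-frequent nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):           # internal node
            walk(node[0], code + "0")         # left edge  -> bit 0
            walk(node[1], code + "1")         # right edge -> bit 1
        else:                                 # leaf: record the character's code
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

freq = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
codes = huffman_codes(freq)
print(codes["f"])   # -> 0   (the most frequent character gets the shortest code)
```

For this input the total encoded length, sum of frequency times code length, comes to 224 bits, and the resulting codes are prefix-free.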
Let us understand the algorithm with an example:
character Frequency
a 5
b 9
c 12
d 13
e 16
f 45
Step 1: Build a min heap that contains 6 nodes, where each node represents the root of a tree with a
single node.
Step 2: Extract the two minimum-frequency nodes from the min heap. Add a new internal node with
frequency 5 + 9 = 14.
Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single element each, and
one heap node is the root of a tree with 3 nodes.
character Frequency
c 12
d 13
Internal Node 14
e 16
f 45
Step 3: Extract the two minimum-frequency nodes from the heap. Add a new internal node with
frequency 12 + 13 = 25.
Now the min heap contains 4 nodes, where 2 nodes are roots of trees with a single element each, and
two heap nodes are roots of trees with more than one node.
character Frequency
Internal Node 14
e 16
Internal Node 25
f 45
Step 4: Extract two minimum frequency nodes. Add a new internal node with frequency 14 + 16 = 30