unit 4 DS
Tree
Other data structures such as arrays, linked lists, stacks, and queues are linear data structures that store
data sequentially. In a linear data structure, the time needed for operations such as searching grows in
proportion to the size of the data, which is often too slow for the large data sets handled today.
Because a tree is a non-linear data structure, tree data structures allow quicker and easier access to the
data.
Trees are used to represent a wide range of structures, such as file systems, organization charts, family
trees, and data structures such as heaps, tries, and search trees. The hierarchical structure of a tree
allows for efficient searching, sorting, and insertion operations, making them a fundamental data
structure in computer science.
Tree Terminologies
In data structures, a tree is a non-linear data structure that represents a hierarchy. Every node except
the root has exactly one parent node, and each node has zero or more child nodes. Here are some common
terminologies related to trees in data structures:
Node: A single element in a tree, which contains data and references to its child nodes and (except for
the root node) its parent node. The last nodes of each path, which do not contain any link/pointer to
child nodes, are called leaf nodes or external nodes.
Edge: The link that connects a parent node to one of its child nodes. Edges are also known as branches.
A tree with 'n' nodes (vertices) has 'n-1' edges, because every node except the root is joined to its
parent by exactly one edge.
(Figure: Nodes and edges of a tree)
Leaf: A node that has no child nodes. Leaves are the last nodes on each path of a tree. Like a real tree,
a tree data structure has a root, branches, and finally leaves.
Siblings: Nodes that have the same parent are called siblings. In other words, in a rooted tree, siblings
are connected to the same node and are at the same distance from the root.
Cousins: The nodes belonging to the same level with different parent nodes.
Depth: The depth of a node is the number of edges on the path from the root node to that node, so the
depth of the root node is 0. The number of edges on the longest path from the root node to a leaf node
is known as the "Depth of the Tree".
Height: The height of a node is the number of edges on the longest path from that node down to a leaf
node in its subtree. The height of a leaf node is 0, and the height of the tree is the height of its root.
Degree of a Node: The degree of a node is the total number of branches of that node.
Forest: A collection of disjoint trees is called a forest. You can create a forest by deleting the root of
a tree, which leaves each of its subtrees as a separate tree.
Binary Tree: A tree where each node has at most two child nodes.
Binary Search Tree (BST): A binary tree in which every key in the left subtree of a node is less than the
node's key and every key in its right subtree is greater than the node's key (if duplicates are allowed,
they are consistently placed on one side).
AVL Tree: A self-balancing binary search tree. The height of a tree is defined as the number of edges on
the longest path from the root to a leaf node. In an AVL tree, the difference in height between the left
and right subtrees of any node, called the balance factor of that node, is at most one. If the balance
factor of a node becomes greater than 1 or less than -1, the tree is unbalanced, and the AVL tree
performs a rotation operation to rebalance it.
Red-Black Tree: A self-balancing binary search tree in which every node is colored red or black. It keeps
its left and right subtrees roughly balanced by ensuring that every path from the root to a leaf node
contains the same number of black nodes and that no red node has a red child.
Applications of Tree:
File Systems: The file system of a computer is often represented as a tree. Each folder or directory is a
node in the tree, and files are the leaves.
XML Parsing: Trees are used to parse and process XML documents. An XML document can be thought
of as a tree, with elements as nodes and attributes as properties of the nodes.
Database Indexing: Many databases use trees to index their data. The B-tree and its variations are
commonly used for this purpose.
Compiler Design: The syntax of programming languages is often defined using a tree structure called a
parse tree. This is used by compilers to understand the structure of the code and generate machine
code from it.
Artificial Intelligence: Decision trees are often used in artificial intelligence to make decisions based on
a series of criteria.
Real-Time Applications of Tree:
Databases use tree data structure for indexing.
Tree data structure is used in file directory management.
DNS uses tree data structure.
Game trees are used to represent the possible moves in games such as chess.
Decision-based algorithms in machine learning use tree algorithms.
Advantages of Tree:
Efficient searching: Search trees are particularly efficient for searching and retrieving data. In a
balanced search tree, the time complexity of searching is O(log n), which means that it stays fast even
for very large data sets.
Flexible size: Trees can grow or shrink dynamically depending on the number of nodes that are added
or removed. This makes them particularly useful for applications where the data size may change over
time.
Easy to traverse: Traversing a tree is a simple operation, and it can be done in several different ways
depending on the requirements of the application. This makes it easy to retrieve and process data
from a tree structure.
Easy to maintain: Trees are easy to maintain because they enforce a strict hierarchy and relationship
between nodes. This makes it easy to add, remove, or modify nodes while usually affecting only a small,
local part of the tree structure.
Natural organization: Trees have a natural hierarchical organization that can be used to represent
many types of relationships. This makes them particularly useful for representing things like file
systems, organizational structures, and taxonomies.
Fast insertion and deletion: In a balanced search tree, inserting and deleting nodes can be done in
O(log n) time, which means that it stays fast even for very large trees.
Disadvantages of Tree:
Memory overhead: Trees can require a significant amount of memory to store, especially if they are
very large. This can be a problem for applications that have limited memory resources.
Imbalanced trees: If a tree is not balanced, it can result in uneven search times. This can be a problem
in applications where speed is critical.
Complexity: Trees can be complex data structures, and they can be difficult to understand and
implement correctly. This can be a problem for developers who are not familiar with them.
Limited flexibility: While trees are flexible in terms of size and structure, they are not as flexible as
other data structures like hash tables. This can be a problem in applications where the data size may
change frequently.
Inefficient for certain operations: While trees are very efficient for searching and retrieving data, they
are not as efficient for other operations like sorting or grouping. For these types of operations, other
data structures may be more appropriate.
Array Representation: One common way to represent a binary tree as an array is to use a level-order
(breadth-first) traversal and fill the array level by level. The root node is placed at the first index of
the array (index 0), and its left and right children are placed at the second and third indices,
respectively. Then the next level of the tree is filled in from left to right, with each node's children
placed in consecutive indices. If a node has no child, the corresponding index in the array is marked as
empty (e.g., by using a special value like NULL or -1).
I’ll use the most convenient one, where we traverse each level starting from the root node, from left to
right, and mark each node with the index it would occupy. In this way a binary tree can also be
implemented using the array data structure: if p is the index of the parent element, then the left child
is stored at index 2p + 1 and the right child at index 2p + 2.
And now we can simply make an array of length 7 and store these elements at their
corresponding indices.
We have another method to represent binary trees called the linked representation of binary
trees. Don’t confuse this with the linked lists you have studied: linked lists are linear data structures,
whereas here each node holds links to its left and right children.
You can see how closely this representation resembles a real tree node, unlike the array
representation, where all the nodes are flattened into a one-dimensional array. Now we can easily
transform the whole tree into its linked representation, which looks just as we would imagine the tree
in real life.
Kinds of Binary Trees:
A binary tree is a tree data structure in which each node has at most two children, referred to as the
left child and the right child. There are several kinds of binary trees; some of the most common ones
are described below.
Binary Search Tree:
A binary search tree (BST) is a type of data structure used in computer science and programming.
It is a tree structure where each node has at most two children, and the left subtree of a node
contains only values less than the node's value, while the right subtree of a node contains only
values greater than the node's value.
The key feature of a binary search tree is that it allows efficient searching, insertion, and
deletion operations, with an average time complexity of O(log n), where n is the number of nodes
in the tree. This is because, in a reasonably balanced tree, each comparison discards about half of the
remaining search space. The following are the properties of a binary search tree:
Having discussed all the properties, you must now tell me whether the above binary tree is a binary
search tree or not. The answer is no. Since the left subtree of the root node has an element that is
greater than the root node, violating the 3rd property, it is not a binary search tree.
Searching means finding or locating a specific element or node in a data structure. In a binary search
tree, searching for a node is easy because the elements are stored in a specific order. The steps for
searching for a node in a binary search tree are as follows:
1. First, compare the element to be searched with the root element of the tree.
2. If root is matched with the target element, then return the node's location.
3. If it is not matched, then check whether the item is less than the root element, if it is smaller than
the root element, then move to the left subtree.
4. If it is larger than the root element, then move to the right subtree.
5. Repeat the above procedure recursively until the match is found.
6. If the element is not found or not present in the tree, then return NULL.
In a binary search tree, we must delete a node in a way that does not violate the BST property. There
are three possible situations when deleting a node from a BST:
When the node to be deleted is a leaf node
It is the simplest case to delete a node in BST. Here, we replace the leaf node with NULL and simply free
the allocated space.
We can see the process of deleting a leaf node from a BST in the image below. Suppose we have to delete
node 90: since it is a leaf node, it is replaced with NULL and its allocated space is freed.
When the node to be deleted has only one child
In this case, we replace the target node with its child and then delete the child node. That is, after the
target node is replaced with its child node, the child node holds the value to be deleted, so we simply
replace the child node with NULL and free its allocated space.
We can see the process of deleting a node with one child from a BST in the image below. Suppose we have
to delete node 79: since it has only one child, it is replaced with its child 55.
The replaced node 79 is now a leaf node, which can be deleted easily.
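When the node to be deleted has two children
This case is more complex than the other two. The steps to be followed are:
1. Find the inorder successor of the node to be deleted.
2. Replace the node with its inorder successor, so that the node to be deleted moves down to a leaf of
the tree.
3. Once the node is at a leaf, replace it with NULL and free the allocated space.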
The inorder successor is needed when the right subtree of the node is not empty. We obtain it by
finding the minimum element in the right subtree of the node.
We can see the process of deleting a node with two children from a BST in the image below. Suppose we
have to delete node 45, which is the root node. Since it has two children, it is replaced with its inorder
successor. Now node 45 is at a leaf of the tree, so it can be deleted easily.
Insertion in Binary Search tree
A new key in a BST is always inserted at a leaf. To insert an element, we start searching from the root
node: if the key to be inserted is less than the current node, we search for an empty location in the left
subtree; otherwise, we search for an empty location in the right subtree, and insert the data there.
Insertion in a BST is similar to searching, as we must always maintain the rule that the left subtree is
smaller than the root and the right subtree is larger than the root.
Now, let's see the process of inserting a node into BST using an example.
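The complexity of the Binary Search Tree ('n' is the number of nodes in the given tree):
1. Time Complexity
Operation    Best case    Average case    Worst case
Insertion    O(log n)     O(log n)        O(n)
Deletion     O(log n)     O(log n)        O(n)
Search       O(log n)     O(log n)        O(n)
The best case corresponds to a balanced tree of height about log n; the worst case occurs when the tree
degenerates into a chain (for example, when keys are inserted in sorted order).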
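Strictly Binary Tree:
A strictly binary tree (also called a full binary tree) is a binary tree in which every node has either
zero or two children.
o Every non-leaf node in a strictly binary tree has exactly two children.
o Leaf nodes need not all be at the same level; if they are, the tree is a perfect binary tree.
o A strictly binary tree of height h has at most (2^(h+1)) - 1 nodes, a bound reached exactly when the
tree is perfect.
o A strictly binary tree with L leaf nodes has exactly 2L - 1 nodes, so its number of nodes n is always
odd, and its height lies between ceil(log2(n + 1)) - 1 and (n - 1)/2.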
Strictly binary trees are commonly used in computer science and data structures. They provide
an efficient way to store and retrieve data and can be used for a variety of applications, such as
representing hierarchical data structures or searching for data in sorted order.
Complete Binary Tree:
A complete binary tree has the following properties:
o Every level of the tree, except possibly the last one, is completely filled with nodes.
o All nodes on the last level are as far left as possible.
In other words, a complete binary tree is a binary tree where all levels except the last are
completely filled, and in the last level, nodes are filled from left to right without any gaps. This
means that no node can have a right child without also having a left child, and at most one node
(on the second-to-last level) can have only a left child.
Extended Binary Tree:
An extended binary tree is obtained by replacing every empty subtree of a binary tree with a special
node. Here the circles represent the internal nodes and the boxes represent the external nodes.
Properties of Extended Binary Tree
o The nodes from the original tree are internal nodes and the special nodes are external
nodes.
o All external nodes are leaf nodes and the internal nodes are non-leaf nodes.
Every internal node has exactly two children and every external node is a leaf, so the result is a
strictly (full) binary tree.
Applications of Threaded Binary Tree:
Expression evaluation: Threaded binary trees can be used to evaluate arithmetic expressions in a way
that avoids recursion or an explicit stack, since the threads let the tree be traversed iteratively. The
tree can be constructed from the input expression and then traversed in-order or pre-order to perform
the evaluation.
Database indexing: In a database, threaded binary trees can be used to index data based on a specific
field (e.g. last name). The tree can be constructed with the indexed values as keys, and then traversed
in-order to retrieve the data in sorted order.
Symbol table management: In a compiler or interpreter, threaded binary trees can be used to store
and manage symbol tables for variables and functions. The tree can be constructed with the symbols
as keys, and then traversed in-order or pre-order to perform various operations on the symbol table.
Disk-based data structures: Threaded binary trees can be used in disk-based data structures (e.g. B-
trees) to improve performance. By threading the tree, it can be traversed in a way that minimizes disk
seeks and improves locality of reference.
Navigation of hierarchical data: In certain applications, threaded binary trees can be used to navigate
hierarchical data structures, such as file systems or web site directories. The tree can be constructed
from the hierarchical data, and then traversed in-order or pre-order to efficiently access the data in a
specific order.
AVL Tree:
You can see that none of the nodes above has a balance factor of more than 1 or less than -1, so this
balanced tree can be considered an AVL tree. If a tree is not an AVL tree, we have to perform rotations
according to the situation, after which it becomes an AVL tree. Rotation is therefore essential for
maintaining AVL trees, and there are four types:
o LL Rotation: The name LL comes from the new element having been inserted into the left subtree of
the left child of the unbalanced node. In this rotation technique, you simply rotate the tree once in
the clockwise direction (a right rotation) around the unbalanced node, as shown below:
o RR Rotation: The name RR comes from the new element having been inserted into the right subtree
of the right child of the unbalanced node. In this rotation technique, you simply rotate the tree once
in the anticlockwise direction (a left rotation) around the unbalanced node, as shown below:
o LR Rotation: The method you will follow now to make this tree an AVL tree again is called the
LR rotation. The name LR comes from the new element having been inserted into the right subtree of
the left child of the unbalanced node. This case needs two steps: first rotate the left subtree in the
anticlockwise direction, and then rotate the whole tree in the clockwise direction. Follow the two steps
illustrated below:
o RL Rotation: The method you will follow now to make this tree an AVL tree again is called the
RL rotation. The name RL comes from the new element having been inserted into the left subtree of
the right child of the unbalanced node. We follow the mirror image of the technique above: first
rotate the right subtree in the clockwise direction, and then the whole tree in the
anticlockwise direction. Follow the two steps illustrated below:
B Tree:
When it comes to storing and searching very large amounts of data, traditional binary search trees can
become impractical: they can grow tall, and when the data lives on disk, each level visited may require a
separate slow disk access. B-Trees (sometimes called Balanced Trees) are self-balancing trees that were
specifically designed to overcome these limitations, and they can handle massive amounts of data
efficiently.
B-tree is a special type of self-balancing search tree in which each node can contain more than
one key and can have more than two children. It is a generalized form of the binary search tree.
It is also known as a height-balanced m-way tree.
(Figure: B-tree)
Each node in a B-Tree can contain multiple keys, which allows the tree to have a larger branching
factor and thus a smaller height. This smaller height leads to fewer disk I/O operations, which results in
faster search and insertion operations. B-Trees are particularly well suited for storage systems
that have slow, block-based data access, such as hard drives, flash memory, and CD-ROMs.
Applications of B-Trees:
It is used in large databases to access data stored on the disk
Searching for data in a data set can be achieved in significantly less time using the B-Tree
With the indexing feature, multilevel indexing can be achieved.
Most of the servers also use the B-tree approach.
B-Trees are used in CAD systems to organize and search geometric data.
B-Trees are also used in other areas such as natural language processing, computer networks, and
cryptography.
Advantages of B-Trees:
B-Trees have a guaranteed time complexity of O(log n) for basic operations like insertion, deletion,
and searching, which makes them suitable for large data sets and real-time applications.
B-Trees are self-balancing.
High-concurrency and high-throughput.
Efficient storage utilization.
Disadvantages of B-Trees:
B-Trees are based on disk-based data structures and can have a high disk usage.
Not the best for all cases.
Slow in comparison to other data structures.
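Inorder Traversal: In this traversal method, the left subtree is visited first, then the root node, and
finally the right subtree. The algorithm of Inorder tree traversal is:
#include <stdio.h>
#include <stdlib.h>
/* A binary tree node has data, pointer to left child and a pointer to right child */
struct node {
    int data;
    struct node* left;
    struct node* right;
};
/* Helper function that allocates a new node with the given data and NULL left and right pointers. */
struct node* newNode(int data)
{
    struct node* node = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;
    return (node);
}
/* Given a binary tree, print its nodes in inorder: left subtree, root, right subtree */
void printInorder(struct node* node)
{
    if (node == NULL)
        return;
    printInorder(node->left);     /* first the left subtree */
    printf("%d ", node->data);    /* then the root */
    printInorder(node->right);    /* finally the right subtree */
}
/* Driver code */
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);
    // Function call
    printf("\nInorder traversal of binary tree is \n");
    printInorder(root);
    getchar();
    return 0;
}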
Preorder Traversal: In this traversal method, the root node is visited first, then the left subtree and finally
the right subtree. The algorithm of Preorder tree traversal is:
#include <stdio.h>
#include <stdlib.h>
/* A binary tree node has data, pointer to left child and a pointer to right child */
struct node {
    int data;
    struct node* left;
    struct node* right;
};
/* Helper function that allocates a new node with the given data and NULL left and right pointers. */
struct node* newNode(int data)
{
    struct node* node
        = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;
    return (node);
}
/* Given a binary tree, print its nodes in preorder: root, left subtree, right subtree */
void printPreorder(struct node* node)
{
    if (node == NULL)
        return;
    printf("%d ", node->data);    /* visit the root first */
    printPreorder(node->left);    /* then the left subtree */
    printPreorder(node->right);   /* finally the right subtree */
}
/* Driver code */
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);
    // Function call
    printf("\nPreorder traversal of binary tree is \n");
    printPreorder(root);
    getchar();
    return 0;
}
Postorder Traversal: In this traversal method, the root node is visited last, hence the name. First we
traverse the left subtree, then the right subtree and finally the root node. The algorithm is as follows:
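#include <stdio.h>
#include <stdlib.h>
/* A binary tree node has data, pointer to left child and a pointer to right child */
struct node {
    int data;
    struct node *left, *right;
};
/* Helper function that allocates a new node with the given data and NULL left and right pointers. */
struct node* newNode(int data)
{
    struct node* node = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = node->right = NULL;
    return (node);
}
/* Given a binary tree, print its nodes in postorder: left subtree, right subtree, root */
void printPostorder(struct node* node)
{
    if (node == NULL) return;
    printPostorder(node->left);
    printPostorder(node->right);
    printf("%d ", node->data);    /* the root is visited last */
}
/* Driver code */
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);
    printf("\nPostorder traversal of binary tree is \n");
    printPostorder(root);
    getchar();
    return 0;
}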
Huffman Coding:
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to
input characters, where the length of each code depends on the frequency of the corresponding
character: more frequent characters get shorter codes. Huffman codes are prefix-free, which means that
no code is a prefix of any other code. Any prefix-free binary code can be visualized as a binary tree with
the encoded characters stored at the leaves.
A Huffman tree (or Huffman coding tree) is a full binary tree in which each leaf corresponds to a letter
of the given alphabet.
The Huffman tree is the binary tree with the minimum external path weight, that is, the minimum sum of
weighted path lengths (frequency × depth of each leaf) for the given set of leaves. So the goal is to
construct a tree with the minimum external path weight.
Steps to build Huffman Tree
Input is an array of unique characters along with their frequency of occurrences and output is Huffman
Tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as
a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the
least frequent character is at root)
2. Extract two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the
first extracted node its left child and the other extracted node its right child. Add this node to the
min heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and
the tree is complete.
Letter      z    k    m    c    u    d    l    e
Frequency   2    7    24   32   37   42   42   120
Huffman code
Letter   Freq   Code     Bits
e        120    0        1
d        42     101      3
l        42     110      3
u        37     100      3
c        32     1110     4
m        24     11111    5
k        7      111101   6
z        2      111100   6