DS-Unit III
Tree
In a linear data structure, data is organized in sequential order. In a non-linear data structure,
the elements are not arranged in a single sequence, and one element may be linked to several others.
A tree is a very popular non-linear data structure used in a wide range of applications. A tree
data structure can be defined as follows...
Tree is a non-linear data structure which organizes data in a hierarchical structure, and this
is a recursive definition: every child of a node is itself the root of a smaller tree (a subtree).
Terminology
In a tree data structure, we use the following terminology...
1. Root
In a tree data structure, the first (topmost) node is called
the Root Node. Every non-empty tree has exactly one root
node, which is the origin of the tree data structure.
We never have multiple root nodes in a tree.
2. Edge
In a tree data structure, the connecting
link between any two nodes is called
an EDGE. A tree with 'N' nodes has
exactly 'N-1' edges, because every node
except the root is joined to its parent by one edge.
3. Parent
In a tree data structure, a node which is the
immediate predecessor of another node is called
a PARENT NODE. In simple words, a
node which has a branch from it to any
other node is called a parent node. A parent
node can also be defined as "a node
which has a child / children".
4. Child
In a tree data structure, a node which is an
immediate descendant of another node is called a CHILD
node. In simple words, a node which has a
link from its parent node is called a child
node. In a tree, any parent node can have any
number of child nodes. In a tree, all the
nodes except the root are child nodes.
5. Siblings
In a tree data structure, nodes which belong to same Parent are called as SIBLINGS. In simple
words, the nodes with the same parent are called Sibling nodes.
6. Leaf
In a tree data structure, a node which does not have a child is called a LEAF node. In simple
words, a leaf is a node with no child.
Leaf nodes are also called External Nodes or 'Terminal' nodes.
7. Internal Nodes
In a tree data structure, a node which has at least one child is called an INTERNAL node. In
simple words, an internal node is a node with at least one child.
In a tree data structure, nodes other than leaf nodes are called Internal Nodes. The root node
is also said to be an Internal Node if the tree has more than one node. Internal nodes are also
called 'Non-Terminal' nodes.
8. Degree
In a tree data structure, the total number
of children of a node is called
the DEGREE of that node. In simple
words, the degree of a node is the total
number of children it has. The highest
degree of a node among all the nodes in
a tree is called the 'Degree of Tree'.
9. Level
In a tree data structure, the root node is said to be at
Level 0, the children of the root node are at Level 1,
the children of the nodes which are at Level 1 are
at Level 2, and so on. In simple words, in a tree
each step from top to bottom is called a Level;
the Level count starts at '0' and is incremented by one
at each level (step).
10. Height
In a tree data structure, the total number of edges on the longest path from a particular node
down to a leaf node is called the HEIGHT of that node. In a tree, the height of the root node is
said to be the height of the tree. In a tree, the height of every leaf node is '0'.
11. Depth
In a tree data structure, the total number of edges from the root node to a particular node is called
the DEPTH of that node. In a tree, the total number of edges from the root node to a leaf node in the
longest path is said to be the depth of the tree. In simple words, the highest depth of any leaf node
in a tree is said to be the depth of that tree. In a tree, the depth of the root node is '0'.
12. Path
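In a tree data structure, the sequence of nodes and edges from one node to another node is called
the PATH between those two nodes. In these notes, the length of a path is the total number of nodes
in that path. In the example below, the path A - B - E - J has length 4. (Many textbooks count edges
instead of nodes, which would give this path a length of 3.)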
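The terms above can be checked on a small tree in code. The following is a minimal sketch, not part of the original notes: the class TreeNode, the helper functions and the shape of the tree are all assumptions made for illustration (the tree is hypothetical and merely contains the path A - B - E - J).

```python
class TreeNode:
    """A general tree node that knows its parent and its children."""

    def __init__(self, label):
        self.label = label
        self.parent = None      # None for the root node
        self.children = []      # empty list for a leaf node

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def degree(node):
    """Degree = number of children of the node."""
    return len(node.children)


def height(node):
    """Height = edges on the longest path from the node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def depth(node):
    """Depth = edges on the path from the root down to the node."""
    edges = 0
    while node.parent is not None:
        node = node.parent
        edges += 1
    return edges


# Hypothetical tree: A has children B and C, B has children D and E, E has child J.
a = TreeNode("A")
b = a.add_child(TreeNode("B"))
c = a.add_child(TreeNode("C"))
d = b.add_child(TreeNode("D"))
e = b.add_child(TreeNode("E"))
j = e.add_child(TreeNode("J"))

print(degree(a), height(a), depth(j))   # 2 3 3
print(height(j), depth(a))              # 0 0
```

Here D and E are siblings, C, D and J are leaf nodes, and A, B and E are internal nodes.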
Binary Tree
In a normal tree, every node can have any number of children. A binary tree is a special type of
tree data structure in which every node can have a maximum of 2 children. One is known as a
left child and the other is known as right child.
A tree in which every node can have a maximum of two children is called Binary Tree.
In a binary tree, every node can have either 0 children or 1 child or 2 children but not more than
2 children.
Example
There are different types of binary trees and they are...
In above figure, a normal binary tree is converted into full binary tree by adding dummy nodes
(In pink colour).
1. Array Representation
In array representation of a binary tree, we use one-dimensional array (1-D Array) to represent a
binary tree.
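To represent a binary tree of depth 'n' using array representation, we need a one-dimensional array
with room for up to 2^(n+1) - 1 elements, one slot for every position in a full binary tree of that depth.
2. Linked List Representation
We use a double linked list to represent a binary tree. In a double linked list, every node consists
of three fields: the first field stores the left child address, the second stores the actual data and the
third stores the right child address.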
The above example of the binary tree represented using Linked list representation is shown as
follows...
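Both representations can be sketched in a few lines. This is only an illustration under stated assumptions: the names Node and to_array and the example tree are hypothetical, the root is taken to be at depth 0, and the array uses the common 0-based rule that the children of the element at index i live at indexes 2i + 1 and 2i + 2.

```python
class Node:
    """Linked representation: left child link, data, right child link."""

    def __init__(self, data, left=None, right=None):
        self.left = left
        self.data = data
        self.right = right


def to_array(root, depth):
    """Store a linked binary tree of the given depth in a 1-D array.

    Unused positions hold None. The depth must be at least the real
    depth of the tree, otherwise the array is too small.
    """
    slots = [None] * (2 ** (depth + 1) - 1)

    def place(node, index):
        if node is None:
            return
        slots[index] = node.data
        place(node.left, 2 * index + 1)    # left child position
        place(node.right, 2 * index + 2)   # right child position

    place(root, 0)
    return slots


# Hypothetical tree: A has children B and C, B has only a left child D.
root = Node("A", Node("B", Node("D")), Node("C"))
print(to_array(root, 2))   # ['A', 'B', 'C', 'D', None, None, None]
```

The None entries correspond to the dummy positions that a tree which is not full leaves empty in the array.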
Binary Tree Traversals
When we want to display a binary tree, we need to follow some order in which all the nodes of that
binary tree are displayed. In any binary tree, the displaying order of nodes depends on the traversal
method.
The displaying (or visiting) order of nodes in a binary tree is called Binary Tree Traversal.
1. In-order Traversal
In this traversal method, the left subtree is visited first, then the root and later the right sub-tree.
We should always remember that every node may represent a subtree itself.
If the binary tree is a binary search tree, an in-order traversal produces the key values in ascending
sorted order. For an arbitrary binary tree, the output is not sorted in general.
D→B→E→A→F→C→G
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Visit root node.
Step 3 − Recursively traverse right subtree.
2. Pre-order Traversal
In this traversal method, the root node is visited first, then the left subtree and finally the right
subtree.
We start from A, and following pre-order traversal, we
first visit A itself and then move to its left
subtree B. B is also traversed pre-order. The process
goes on until all the nodes are visited. The output of pre-
order traversal of this tree will be −
A→B→D→E→C→F→G
Algorithm
Until all nodes are traversed −
Step 1 − Visit root node.
Step 2 − Recursively traverse left subtree.
Step 3 − Recursively traverse right subtree.
3. Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First we traverse the left
subtree, then the right subtree and finally the root node.
D→E→B→F→G→C→A
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Recursively traverse right subtree.
Step 3 − Visit root node.
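The three traversal algorithms above translate almost line for line into recursive functions. The sketch below is an illustration only: the names Node, inorder, preorder and postorder are assumptions, and the tree shape is inferred from the three outputs listed in these notes, since the figure itself is not reproduced here.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def inorder(node):
    """Left subtree, then root, then right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.data] + inorder(node.right)


def preorder(node):
    """Root, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.data] + preorder(node.left) + preorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then root."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.data]


# Tree inferred from the outputs above: A has children B and C,
# B has children D and E, C has children F and G.
root = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", Node("F"), Node("G")))

print("→".join(inorder(root)))     # D→B→E→A→F→C→G
print("→".join(preorder(root)))    # A→B→D→E→C→F→G
print("→".join(postorder(root)))   # D→E→B→F→G→C→A
```

Building a new list at every call keeps the sketch short; a version that appends to one shared list would avoid the repeated copying on large trees.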
More On Binary Trees
Copy Binary Tree
Given a binary tree whose root node address is given by the pointer value ROOT, this
algorithm generates a copy of the tree and returns the address of its root node. NEW is a
temporary pointer variable.
PROCEDURE COPY(ROOT)
If ROOT = NULL
then return (NULL).
5. [FINISH]
return (NEW).
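A recursive copy can be sketched as follows. This is one common way to do it, written under assumptions, and not necessarily the exact steps of the procedure above: the names Node and copy_tree are hypothetical, and the local variable new plays the role of the temporary pointer NEW.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def copy_tree(root):
    """Return the root of a new tree with the same shape and data."""
    if root is None:              # empty tree: nothing to copy
        return None
    new = Node(root.data)         # create a node for the copy
    new.left = copy_tree(root.left)     # copy the left subtree
    new.right = copy_tree(root.right)   # copy the right subtree
    return new


original = Node("A", Node("B", Node("D")), Node("C"))
duplicate = copy_tree(original)

print(duplicate is original)                     # False
print(duplicate.data, duplicate.left.left.data)  # A D
```

Because every node is created afresh, later changes to the copy do not affect the original tree.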
Determine if Two Trees are Identical
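One common way to decide whether two binary trees are identical is to compare them node by node: two empty trees are identical, an empty and a non-empty tree are not, and two non-empty trees are identical when their root data match and their left and right subtrees are identical in turn. The sketch below illustrates that idea; the names Node and identical are assumptions.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def identical(a, b):
    """True when both trees have the same shape and the same data."""
    if a is None and b is None:     # both empty
        return True
    if a is None or b is None:      # only one is empty
        return False
    return (a.data == b.data
            and identical(a.left, b.left)
            and identical(a.right, b.right))


t1 = Node(1, Node(2), Node(3))
t2 = Node(1, Node(2), Node(3))
t3 = Node(1, Node(3), Node(2))   # same data, children swapped

print(identical(t1, t2))   # True
print(identical(t1, t3))   # False
```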
In a threaded binary tree, pointers that would otherwise be null are made to reference the next
node in an inorder traversal; these pointers are called threads.
We need to know whether a pointer is an actual link or a thread, so we keep a boolean flag for
each pointer.
Why do we need Threaded Binary Tree?
Binary trees have a lot of wasted space: the leaf nodes each have 2 null pointers. We can use
these pointers to help us in inorder traversals.
Threaded binary tree makes the tree traversal faster since we do not need stack or recursion
for traversal
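The idea can be illustrated with a right-threaded tree, in which only null right pointers are turned into threads. This is a sketch under assumptions: the names TNode, add_threads, leftmost and inorder_threaded are hypothetical, and the helper that installs the threads uses recursion once for convenience. The traversal itself needs neither a stack nor recursion.

```python
class TNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right
        self.right_is_thread = False   # boolean: link or thread?


def add_threads(root):
    """Point every null right pointer at the node's inorder successor."""
    previous = None

    def visit(node):
        nonlocal previous
        if node is None:
            return
        visit(node.left)
        if previous is not None and previous.right is None:
            previous.right = node          # thread to the successor
            previous.right_is_thread = True
        previous = node
        visit(node.right)

    visit(root)


def leftmost(node):
    while node is not None and node.left is not None:
        node = node.left
    return node


def inorder_threaded(root):
    """Inorder traversal that follows threads instead of using a stack."""
    result = []
    node = leftmost(root)
    while node is not None:
        result.append(node.data)
        if node.right_is_thread:
            node = node.right              # jump straight to the successor
        else:
            node = leftmost(node.right)    # descend into the right subtree
    return result


root = TNode("A",
             TNode("B", TNode("D"), TNode("E")),
             TNode("C", TNode("F"), TNode("G")))
add_threads(root)
print(inorder_threaded(root))   # ['D', 'B', 'E', 'A', 'F', 'C', 'G']
```

The last node in inorder (G here) keeps a null right pointer, which is what ends the loop.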
Types of threaded binary trees:
Now, if we look at the first child-next sibling representation of the tree closely, we will see
that it forms a binary tree. To see this better, we can rotate every next-sibling edge 45
degrees clockwise. After that we get the following binary tree:
1
/
2
/\
5 3
\ \
6 4
/
7
/
8
\
9
In the above figure, we can see that the mapping among the vertices (A, B, C, D, E) is represented
by the adjacency matrix, which is also shown in the figure.
The adjacency matrices for directed and undirected graphs differ. In a directed
graph, an entry Aij will be 1 only when there is an edge directed from Vi to Vj.
A directed graph and its adjacency matrix representation is shown in the following figure.
The representation of a weighted directed graph is different. Instead of filling the entry with 1, the
non-zero entries of the adjacency matrix hold the weights of the respective edges.
The weighted directed graph along with the adjacency matrix representation is shown in the
following figure.
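The three matrix variants described above can be built with one small function. This is a minimal sketch: the function name adjacency_matrix, the numbering of vertices from 0 and the example edges are assumptions for illustration and are not taken from the figures.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix.

    Each edge is (u, v) or (u, v, weight). Unweighted edges are stored
    as 1, weighted edges as their weight, and missing edges as 0.
    """
    matrix = [[0] * n for _ in range(n)]
    for u, v, *rest in edges:
        value = rest[0] if rest else 1
        matrix[u][v] = value
        if not directed:
            matrix[v][u] = value   # undirected: the matrix is symmetric
    return matrix


# Hypothetical undirected graph on vertices 0, 1, 2.
print(adjacency_matrix(3, [(0, 1), (0, 2), (1, 2)]))
# [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

# Hypothetical weighted directed graph: 0 -> 1 (weight 4), 1 -> 2 (weight 7).
print(adjacency_matrix(3, [(0, 1, 4), (1, 2, 7)], directed=True))
# [[0, 4, 0], [0, 0, 7], [0, 0, 0]]
```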
Linked Representation
In the linked representation, an adjacency list is used to store the Graph into the computer's
memory.
Consider the undirected graph shown in the following figure and check the adjacency list
representation.
An adjacency list is maintained for each node present in the graph. Each entry in the list stores the
value of an adjacent node and a pointer to the next adjacent node. The pointer field of the last
entry in each list is set to NULL. In an undirected graph, the sum of the lengths of the adjacency
lists is equal to twice the number of edges, because every edge appears in the lists of both of its
endpoints.
Consider the directed graph shown in the following figure and check the adjacency list
representation of the graph.
In a directed graph, the sum of lengths of all the adjacency lists is equal to the number of edges
present in the graph.
In the case of a weighted directed graph, each list node contains an extra field that holds the weight
of the corresponding edge. The adjacency list representation of a weighted directed graph is shown in
the following figure.
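The two statements about the sum of the list lengths can be verified on a small example. In this sketch Python lists stand in for the linked lists with NULL-terminated pointers described above, which is a simplification; the function name adjacency_list and the example graph are assumptions.

```python
def adjacency_list(vertices, edges, directed=False):
    """Map every vertex to the list of its adjacent vertices."""
    adjacent = {v: [] for v in vertices}
    for u, v in edges:
        adjacent[u].append(v)
        if not directed:
            adjacent[v].append(u)   # undirected: store the edge both ways
    return adjacent


vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

undirected = adjacency_list(vertices, edges)
directed = adjacency_list(vertices, edges, directed=True)

print(undirected["A"])                              # ['B', 'C']
print(sum(len(lst) for lst in undirected.values())) # 8, twice the 4 edges
print(sum(len(lst) for lst in directed.values()))   # 4, the number of edges
```

For a weighted graph, each list entry could be stored as a (vertex, weight) pair instead of a bare vertex.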
Graph Traversal
Graph traversal is a technique used for searching for a vertex in a graph. Graph traversal is also
used to decide the order in which vertices are visited in the search process. A graph traversal finds
the edges to be used in the search process without creating loops. That means using graph traversal
we visit all the vertices of the graph without getting into a looping path.
There are two graph traversal techniques and they are as follows...
1. DFS (Depth First Search)
2. BFS (Breadth First Search)
DFS (Depth First Search)
DFS traversal of a connected graph produces a spanning tree as the final result. A spanning tree is
a connected subgraph that contains every vertex of the graph and has no loops. We use a Stack data
structure with a maximum size equal to the total number of vertices in the graph to implement DFS
traversal.
Backtracking is coming back to the vertex from which we reached the current vertex.
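A stack-based DFS that also records the spanning tree edges might look like the sketch below. The function name dfs, the dictionary-of-lists graph format and the example graph are assumptions for illustration; popping the stack corresponds to backtracking.

```python
def dfs(adjacent, start):
    """Return (visit order, spanning tree edges) for a DFS from start."""
    order = [start]
    seen = {start}
    tree_edges = []
    stack = [start]
    while stack:
        current = stack[-1]
        # Look for an unvisited neighbour of the vertex on top of the stack.
        unvisited = next((v for v in adjacent[current] if v not in seen), None)
        if unvisited is None:
            stack.pop()                     # dead end: backtrack
        else:
            seen.add(unvisited)
            order.append(unvisited)
            tree_edges.append((current, unvisited))
            stack.append(unvisited)
    return order, tree_edges


# Hypothetical undirected graph given as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
order, tree_edges = dfs(graph, "A")
print(order)        # ['A', 'B', 'D', 'C']
print(tree_edges)   # [('A', 'B'), ('B', 'D'), ('D', 'C')]
```

The stack never holds more vertices than the graph has, and the result has one tree edge fewer than the number of vertices reached.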
Example
BFS (Breadth First Search)
BFS traversal of a connected graph produces a spanning tree as the final result. A spanning tree is
a connected subgraph that contains every vertex of the graph and has no loops. We use a Queue data
structure with a maximum size equal to the total number of vertices in the graph to implement BFS
traversal.
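A queue-based BFS can be sketched in the same style. The function name bfs, the graph format and the example graph are assumptions for illustration; collections.deque from the Python standard library serves as the queue.

```python
from collections import deque


def bfs(adjacent, start):
    """Return (visit order, spanning tree edges) for a BFS from start."""
    order = [start]
    seen = {start}
    tree_edges = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacent[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                tree_edges.append((current, neighbour))
                queue.append(neighbour)
    return order, tree_edges


# Hypothetical undirected graph given as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
order, tree_edges = bfs(graph, "A")
print(order)        # ['A', 'B', 'C', 'D']
print(tree_edges)   # [('A', 'B'), ('A', 'C'), ('B', 'D')]
```

Vertices are visited in order of their distance from the start vertex: first A, then its neighbours B and C, then D.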
The connected components of a graph can be found using either a depth-first search (DFS), or a
breadth-first search (BFS). We start at an arbitrary vertex, and visit every vertex adjacent to it
recursively, adding them to the first component. Once this search has finished, we have visited
all of the vertices in the first connected component, so we choose another unvisited vertex (if
any) and perform the same search starting from it, adding the vertices we find to the second
component. This process continues until all vertices have been visited, at which point we know
the number of connected components in the graph, and which vertices they contain.
Algorithm
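The procedure described above can be sketched as a loop that starts a new search from every vertex that has not been visited yet. This is an illustration under assumptions: the function name connected_components and the example graph are hypothetical, and an explicit stack is used in place of recursion.

```python
def connected_components(adjacent):
    """Return a list of components, each a list of vertices."""
    seen = set()
    components = []
    for start in adjacent:
        if start in seen:
            continue                 # already part of an earlier component
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adjacent[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Hypothetical undirected graph with three components.
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
components = connected_components(graph)
print(len(components))   # 3
print(components)        # [[1, 2, 3], [4, 5], [6]]
```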
By this definition, we can draw a conclusion that every connected and undirected Graph G has
at least one spanning tree. A disconnected graph does not have any spanning tree, as it cannot
be spanned to all its vertices.
We found three spanning trees from one complete graph. A complete undirected graph has
$n^{n-2}$ spanning trees, where n is the number of nodes; this is the maximum for any graph with
n nodes. In the example addressed above, n is 3, hence $3^{3-2} = 3$ spanning trees are possible.
All possible spanning trees of graph G, have the same number of edges and vertices.
Removing one edge from the spanning tree will make the graph disconnected, i.e. the
spanning tree is minimally connected.
Adding one edge to the spanning tree will create a circuit or loop, i.e. the spanning tree
is maximally acyclic.
Minimum Spanning Tree (MST)
In a weighted graph, a minimum spanning tree is a spanning tree whose total weight is less than or
equal to that of every other spanning tree of the same graph. In real-world situations, this weight
can be measured as distance, congestion, traffic load or any arbitrary value assigned to the edges.
Minimum Spanning Tree Algorithms
We shall learn about the two most important spanning tree algorithms here:
Kruskal's Algorithm
Prim's Algorithm
Kruskal's algorithm builds the spanning tree by adding edges one by one into a growing
spanning tree. It follows the greedy approach: in each iteration it finds the edge with the least
weight that does not form a cycle and adds it to the growing spanning tree.
Below are the steps for finding an MST using Kruskal's algorithm:
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If
cycle is not formed, include this edge. Else, discard it.
3. Repeat step#2 until there are (V-1) edges in the spanning tree.
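The three steps above can be sketched as follows. The cycle check here uses a union-find (disjoint set) structure, which is one common choice and an assumption of this sketch, since the notes do not say how the check is done. The function name kruskal, the (weight, u, v) edge format and the example graph are also hypothetical.

```python
def kruskal(num_vertices, edges):
    """Return the MST edges as (u, v, weight) for a connected graph.

    Vertices are numbered 0 .. num_vertices - 1 and every edge is given
    as (weight, u, v).
    """
    parent = list(range(num_vertices))   # union-find forest

    def find(x):
        # Follow parent links to the representative of x's set.
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):      # step 1: non-decreasing weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # step 2: no cycle is formed
            parent[root_u] = root_v
            mst.append((u, v, weight))
            if len(mst) == num_vertices - 1:   # step 3: V-1 edges chosen
                break
    return mst


# Hypothetical graph with 4 vertices and 5 weighted edges.
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
print(mst)                          # [(0, 1, 1), (1, 3, 2), (1, 2, 3)]
print(sum(w for _, _, w in mst))    # 6
```

Two endpoints that already share a representative are already connected in the growing forest, so adding the edge between them would close a cycle.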
To understand Kruskal's algorithm let us consider the following example −
Step 1 - Remove all loops and Parallel Edges
Remove all loops and parallel edges from the given graph.
In case of parallel edges, keep the one which has the least cost associated and remove all others.
The next cost is 3, and the associated edges are A,C and C,D. We add them again.
The next cost in the table is 4, and we observe that adding it will create a circuit in the graph,
so we ignore it. In the process we shall ignore/avoid all edges that create a circuit.
We observe that edges with cost 5 and 6 also create circuits. We ignore them and move on.
Now we are left with only one node to be added. Between the two least cost edges available 7
and 8, we shall add the edge with cost 7.
By adding edge S,A we have included all the nodes of the graph and we now have minimum
cost spanning tree.
Prim's Spanning Tree Algorithm
Prim's algorithm for finding the minimum cost spanning tree uses the greedy approach, as Kruskal's
algorithm does. Prim's algorithm shares a similarity with the shortest path first algorithms.
Prim's algorithm, in contrast with Kruskal's algorithm, treats the nodes as a single tree and
keeps on adding new nodes to the spanning tree from the given graph.
To contrast with Kruskal's algorithm and to understand Prim's algorithm better, we shall use the
same example −
Remove all loops and parallel edges from the given graph. In case of parallel edges, keep the
one which has the least cost associated and remove all others.
In this case, we choose node S as the root node of Prim's spanning tree. This node is arbitrarily
chosen, so any node can be the root node. One may wonder why any node can be a root node.
The answer is that the spanning tree includes all the nodes of the graph, and because the graph is
connected, every node has at least one edge which joins it to the rest of the tree.
Step 3 - Check outgoing edges and select the one with less cost
After choosing the root node S, we see that S,A and S,C are two edges with weight 7 and 8,
respectively. We choose the edge S,A as its weight is lower than the other's.
Now, the tree S-7-A is treated as one node and we check for all edges going out from it. We
select the one which has the lowest cost and include it in the tree.
After this step, S-7-A-3-C tree is formed. Now we'll again treat it as a node and will check all
the edges again. However, we will choose only the least cost edge. In this case, C-3-D is the
new edge, which is less than other edges' cost 8, 6, 4, etc.
After adding node D to the spanning tree, we now have two edges going out of it having the
same cost, i.e. D-2-T and D-2-B. Thus, we can add either one. But the next step will again yield
edge 2 as the least cost. Hence, we are showing a spanning tree with both edges included.
We may find that the output spanning tree of the same graph using the two different algorithms is
the same.
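Prim's approach of growing one tree from a root node can be sketched with a priority queue of the edges that leave the tree. The use of heapq from the Python standard library, the function name prim, the adjacency format and the example graph are all assumptions for illustration; the graph is hypothetical and is not the S, A, B, C, D, T example from the figures.

```python
import heapq


def prim(adjacent, root):
    """Return the MST edges as (u, v, weight), grown from root.

    adjacent maps each vertex to a list of (neighbour, weight) pairs.
    """
    in_tree = {root}
    mst = []
    # Candidate edges going out of the tree, ordered by weight.
    heap = [(weight, root, v) for v, weight in adjacent[root]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adjacent):
        weight, u, v = heapq.heappop(heap)   # least cost outgoing edge
        if v in in_tree:
            continue                         # would create a circuit
        in_tree.add(v)
        mst.append((u, v, weight))
        for neighbour, w in adjacent[v]:
            if neighbour not in in_tree:
                heapq.heappush(heap, (w, v, neighbour))
    return mst


# Hypothetical undirected graph with 4 vertices and 5 weighted edges.
graph = {
    0: [(1, 1), (2, 4)],
    1: [(0, 1), (2, 3), (3, 2)],
    2: [(0, 4), (1, 3), (3, 5)],
    3: [(1, 2), (2, 5)],
}
mst = prim(graph, 0)
print(mst)                          # [(0, 1, 1), (1, 3, 2), (1, 2, 3)]
print(sum(w for _, _, w in mst))    # 6
```

Any vertex can be passed as the root; the total weight of the result is the same.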
Transitive closure of a Graph
The transitive closure is the reachability matrix of a graph: it records whether vertex v can be
reached from vertex u. Given a graph, we have to determine, for all vertex pairs (u, v), whether v
is reachable from u.
The final matrix is of Boolean type. When there is a value 1 for vertex u to vertex v, it means
that there is at least one path from u to v.
Output:
1111
0111
0011
0001
Algorithm
Begin
   copy the adjacency matrix into another matrix named transMat
   for each vertex k in the graph, do
      for each vertex i in the graph, do
         for each vertex j in the graph, do
            transMat[i, j] := transMat[i, j] OR (transMat[i, k] AND transMat[k, j])
         done
      done
   done
   Display the transMat
End
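The pseudocode above can be written out as a short function. This is a sketch under assumptions: the function name transitive_closure is hypothetical, and because the notes do not give the input graph, the example uses an assumed four-vertex chain 0 → 1 → 2 → 3 in which every vertex is also marked as reaching itself. That input happens to produce the output matrix listed above.

```python
def transitive_closure(adjacency):
    """Return the reachability matrix of a graph given as a 0/1 matrix."""
    n = len(adjacency)
    trans_mat = [row[:] for row in adjacency]   # work on a copy
    for k in range(n):            # k: allowed intermediate vertex
        for i in range(n):
            for j in range(n):
                # i reaches j directly, or through k.
                trans_mat[i][j] = trans_mat[i][j] or (
                    trans_mat[i][k] and trans_mat[k][j])
    return trans_mat


# Assumed input: chain 0 -> 1 -> 2 -> 3 with 1s on the diagonal.
graph = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
for row in transitive_closure(graph):
    print("".join(str(value) for value in row))
# 1111
# 0111
# 0011
# 0001
```

The vertex k must be the outermost loop, so that all paths through vertices 0 to k are known before vertex k + 1 is considered as an intermediate.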