data structure ii UNIT NOTES
A tree is recursively defined as a set of one or more nodes where one node is designated as the root
of the tree and all the remaining nodes can be partitioned into non-empty sets each of which is a
sub-tree of the root.
Figure 9.1 shows a tree where node A is the root node; nodes B, C, and D are children of the root
node and form sub-trees of the tree rooted at node A.
Root node The root node R is the topmost node in the tree. If R = NULL, then it means the tree is
empty.
Sub-trees If the root node R is not NULL, then the trees T1, T2, and T3 are called the sub-trees of R.
Leaf node A node that has no children is called the leaf node or the terminal node.
Path A sequence of consecutive edges is called a path. For example, in Fig. 9.1, the path from the
root node A to node I is given as: A, D, and I.
Ancestor node An ancestor of a node is any predecessor node on the path from root to that node.
The root node does not have any ancestors. In the tree given in Fig. 9.1, nodes A, C, and G are the
ancestors of node K.
Descendant node A descendant node is any successor node on any path from the node to a leaf
node. Leaf nodes do not have any descendants. In the tree given in Fig. 9.1, nodes C, G, J, and K are
among the descendants of node A.
Level number Every node in the tree is assigned a level number in such a way that the root node is
at level 0, children of the root node are at level number 1. Thus, every node is at one level higher
than its parent. So, all child nodes have a level number given by parent’s level number + 1.
Degree Degree of a node is equal to the number of children that a node has. The degree of a leaf
node is zero.
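The terms above can be illustrated with a minimal Python sketch. The TreeNode class, the helper names, and the sample tree are assumptions made for illustration; the sample tree only mirrors the relationships mentioned in the text (A is the root, the path A, D, I, and the ancestors A, C, G of K) and is not a reproduction of Fig. 9.1.

```python
class TreeNode:
    """Hypothetical general-tree node: a label plus a list of children."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def degree(node):
    # Degree = number of children of the node.
    return len(node.children)

def is_leaf(node):
    # A leaf (terminal) node has degree zero.
    return degree(node) == 0

def levels(root, level=0, out=None):
    """Return {label: level number}; the root is at level 0."""
    if out is None:
        out = {}
    if root is None:          # empty tree
        return out
    out[root.label] = level
    for child in root.children:
        levels(child, level + 1, out)   # child level = parent level + 1
    return out

def ancestors(root, target, path=()):
    """Return the labels on the path from the root down to (not including) target."""
    if root is None:
        return None
    if root.label == target:
        return list(path)
    for child in root.children:
        found = ancestors(child, target, path + (root.label,))
        if found is not None:
            return found
    return None

# Assumed sample tree (not Fig. 9.1 itself).
tree = TreeNode("A", [
    TreeNode("B"),
    TreeNode("C", [TreeNode("G", [TreeNode("J"), TreeNode("K")])]),
    TreeNode("D", [TreeNode("I")]),
])

print(degree(tree))            # 3
print(levels(tree)["K"])       # 3
print(ancestors(tree, "K"))    # ['A', 'C', 'G']
```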
A binary tree is a data structure that is defined as a collection of elements called nodes.
In a binary tree, the topmost element is called the root node, and each node has 0, 1, or at
the most 2 children.
A node that has zero children is called a leaf node or a terminal node.
Every node contains a data element, a left pointer which points to the left child, and a right
pointer which points to the right child.
The root element is pointed to by a 'root' pointer. If root = NULL, then it means the tree is
empty.
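A node of this kind can be sketched in Python as follows. The class name Node and its field names are assumptions chosen for illustration, and None plays the role of NULL.

```python
class Node:
    """Hypothetical binary tree node: data, left pointer, right pointer."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left      # left child, or None
        self.right = right    # right child, or None

def is_leaf(node):
    # A leaf (terminal) node has zero children.
    return node.left is None and node.right is None

empty_root = None                          # root = NULL means the tree is empty
root = Node("A", Node("B"), Node("C"))     # root A with two children

print(is_leaf(root))        # False
print(is_leaf(root.left))   # True
```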
TRAVERSING A BINARY TREE
Traversing a binary tree is the process of visiting each node in the tree exactly once in a systematic
way.
Unlike linear data structures in which the elements are traversed sequentially, a tree is a nonlinear
data structure in which the elements can be traversed in many different ways.
The traversal algorithms differ in the order in which the nodes are visited. In this section, we will
discuss these algorithms.
Pre-order Traversal
To traverse a non-empty binary tree in pre-order, the following operations are performed
recursively at each node: visit the root node, traverse the left sub-tree, and then traverse the
right sub-tree.
For example, for a tree whose root node A has left child B and right child C, the pre-order traversal
is given as A, B, and C. Root node first, the left sub-tree next, and then the right sub-tree.
In this algorithm, the left sub-tree is always traversed before the right sub-tree.
The word ‘pre’ in the pre-order specifies that the root node is accessed prior to any other nodes in
the left and right sub-trees.
In-order Traversal
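To traverse a non-empty binary tree in in-order, the following operations are performed recursively
at each node: traverse the left sub-tree, visit the root node, and then traverse the right sub-tree.
For example, for a tree whose root node A has left child B and right child C, the in-order traversal
is given as B, A, and C. Left sub-tree first, the root node next, and then the right sub-tree.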
In this algorithm, the left sub-tree is always traversed before the root node and the right sub-tree.
The word ‘in’ in the in-order specifies that the root node is accessed in between the left and the right
sub-trees.
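Post-order Traversal
To traverse a non-empty binary tree in post-order, the following operations are performed
recursively at each node: traverse the left sub-tree, traverse the right sub-tree, and then visit the
root node.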
Consider the tree given in Fig. 9.18. The post-order traversal of the tree is given as B, C, and A.
Left sub-tree first, the right sub-tree next, and finally the root node.
In this algorithm, the left sub-tree is always traversed before the right sub-tree and the root node.
The word ‘post’ in the post-order specifies that the root node is accessed after the left and the right
sub-trees.
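The three traversal orders can be compared side by side in a short Python sketch. The Node class and function names are assumptions for illustration; the sample tree (root A, left child B, right child C) is the smallest tree that produces the traversal results quoted in this section.

```python
class Node:
    """Hypothetical binary tree node."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def pre_order(node):
    # Root first, then the left sub-tree, then the right sub-tree.
    if node is None:
        return []
    return [node.data] + pre_order(node.left) + pre_order(node.right)

def in_order(node):
    # Left sub-tree first, then the root, then the right sub-tree.
    if node is None:
        return []
    return in_order(node.left) + [node.data] + in_order(node.right)

def post_order(node):
    # Left sub-tree first, then the right sub-tree, and finally the root.
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.data]

root = Node("A", Node("B"), Node("C"))

print(pre_order(root))    # ['A', 'B', 'C']
print(in_order(root))     # ['B', 'A', 'C']
print(post_order(root))   # ['B', 'C', 'A']
```

Each function visits every node exactly once; only the position of the root visit relative to the two recursive calls changes.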
Representation of Binary Trees in the Memory
In the computer’s memory, a binary tree can be maintained either by using a linked
representation or by using a sequential representation.
Linked representation of binary trees In the linked representation of a binary tree, every node will
have three parts: the data element, a pointer to the left node, and a pointer to the right node.
Every binary tree has a pointer ROOT, which points to the root element (topmost element) of the
tree.
The left position is used to point to the left child of the node or to store the address of the left child
of the node.
Finally, the right position is used to point to the right child of the node or to store the address of the
right child of the node.
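Sequential representation of binary trees The sequential representation stores the tree in a
one-dimensional array. Though it is the simplest technique for memory representation, it is
inefficient as it requires a lot of memory space.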
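The memory cost of an array-based layout can be seen in a small Python sketch. It assumes one common convention that the notes do not spell out: the root is stored at index 1, and the children of the node at index k are stored at indices 2k and 2k + 1 (index 0 is left unused). The Node class and function names are hypothetical.

```python
class Node:
    """Hypothetical binary tree node."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def height(node):
    # Number of nodes on the longest root-to-leaf path (0 for an empty tree).
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def to_array(root):
    """Lay the tree out in a list: root at index 1, children of k at 2k and 2k+1."""
    size = 2 ** height(root) - 1        # slots needed for a full tree of this height
    slots = [None] * (size + 1)         # index 0 is unused
    def place(node, k):
        if node is None:
            return
        slots[k] = node.data
        place(node.left, 2 * k)
        place(node.right, 2 * k + 1)
    place(root, 1)
    return slots

# A right-skewed tree with only three nodes still needs seven slots.
skewed = Node(1, None, Node(2, None, Node(3)))
print(to_array(skewed))   # [None, 1, None, 2, None, None, None, 3]
```

Only three of the seven usable slots hold data here, which is the kind of waste that makes this layout inefficient for sparse or skewed trees.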
In a binary search tree, all the nodes in the left sub-tree have a value less than that of the root node.
Correspondingly, all the nodes in the right sub-tree have a value either equal to or greater than the
root node.
Since the nodes in a binary search tree are ordered, the time needed to search an element in the
tree is greatly reduced.
Whenever we search for an element, we do not need to traverse the entire tree.
For example, in the given tree, if we have to search for 29, then we know that we have to scan only
the left sub-tree. If the value is present in the tree, it will only be in the left sub-tree, as 29 is smaller
than 39 (the root node’s value).
The left sub-tree has a root node with the value 27. Since 29 is greater than 27, we will move to the
right sub-tree, where we will find the element.
Thus, the average running time of a search operation is O(log2 n), as at every step, we eliminate
roughly half of the remaining tree from the search process.
Due to their efficiency in searching elements, binary search trees are widely used in dictionary
problems where the code always inserts and searches the elements that are indexed by some key
value.
Binary search trees also speed up the insertion and deletion operations.
The tree has a speed advantage when the data in the structure changes rapidly.
However, in the worst case, a binary search tree will take O(n) time to search for an element.
The left sub-tree of a node N contains values that are less than N’s value.
The right sub-tree of a node N contains values that are greater than N’s value.
Both the left and the right sub-trees also satisfy these properties and, thus, are binary
search trees.
The search function is used to find whether a given value is present in the tree or not.
The searching process begins at the root node.
The function first checks if the binary search tree is empty. If it is empty, then the value we
are searching for is not present in the tree.
So, the search algorithm terminates by displaying an appropriate message.
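The search procedure can be sketched in Python as below. The Node class and the search function are assumptions for illustration. The values 39, 27, and 29 come from the example in these notes; the other values (12 and 45) are made up to complete a small tree.

```python
class Node:
    """Hypothetical binary search tree node."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def search(root, value):
    """Return (found, visited), where visited lists the node values examined."""
    visited = []
    node = root                      # searching begins at the root node
    while node is not None:          # an empty (sub-)tree means the value is absent
        visited.append(node.data)
        if value == node.data:
            return True, visited
        # Smaller values can only be in the left sub-tree, others in the right.
        node = node.left if value < node.data else node.right
    return False, visited

root = Node(39, Node(27, Node(12), Node(29)), Node(45))

print(search(root, 29))    # (True, [39, 27, 29])
print(search(root, 100))   # (False, [39, 45])
print(search(None, 29))    # (False, [])
```

The visited list shows that only one branch is followed at each step, so the rest of the tree is never examined.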
The insert function is used to add a new node with a given value at the correct position in the binary
search tree.
Adding the node at the correct position means that the new node should not violate the properties
of the binary search tree.
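A minimal recursive insertion can be sketched in Python as follows. The Node class and function names are assumptions, and equal values are sent to the right sub-tree, which is one possible convention. The insertion order used in the example is arbitrary.

```python
class Node:
    """Hypothetical binary search tree node."""
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def insert(root, value):
    """Insert value and return the (possibly new) root of the tree."""
    if root is None:
        return Node(value)            # the empty spot is the correct position
    if value < root.data:
        root.left = insert(root.left, value)
    else:                             # equal or greater values go to the right
        root.right = insert(root.right, value)
    return root

def in_order(node):
    # In-order traversal of a binary search tree yields the values in sorted order.
    if node is None:
        return []
    return in_order(node.left) + [node.data] + in_order(node.right)

root = None
for v in [60, 40, 72, 30, 50, 65, 80, 69]:
    root = insert(root, v)

print(in_order(root))   # [30, 40, 50, 60, 65, 69, 72, 80]
```

Checking that the in-order traversal comes out sorted is a quick way to confirm that no insertion violated the binary search tree properties.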
When a node is deleted from the binary search tree, utmost care should be taken that the properties
of the binary search tree are not violated and nodes are not lost in the process.
In order to determine the height of a binary search tree, we calculate the height of the left sub-tree
and the right sub-tree, and add 1 to whichever height is greater.
For example, if the height of the left sub-tree is greater than that of the right sub-tree, then 1 is
added to the height of the left sub-tree; otherwise, 1 is added to the height of the right sub-tree.
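The height calculation can be written as a short recursive Python function. This sketch assumes that an empty tree has height 0, so a single node has height 1; the Node class and the sample values are illustrative.

```python
class Node:
    """Hypothetical binary search tree node."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def height(node):
    """Height of the tree rooted at node (0 for an empty tree)."""
    if node is None:
        return 0
    left_height = height(node.left)
    right_height = height(node.right)
    # Add 1 (for the current node) to the greater of the two sub-tree heights.
    if left_height > right_height:
        return left_height + 1
    return right_height + 1

root = Node(39, Node(27, Node(12), Node(29)), Node(45))
chain = Node(1, None, Node(2, None, Node(3, None, Node(4))))

print(height(root))    # 3
print(height(chain))   # 4
```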
The empty (NULL) child fields of the nodes can be utilized in such a way that the empty left child
field of a node points to its in-order predecessor in the in-order traversal.
Similarly, the empty right child field of a node can be used to point to its in-order successor.
Such a type of binary tree is known as a one-way threaded binary tree.
A field that holds the address of its in-order successor is known as a thread.
In-order :- 30 40 50 60 65 69 72 80
APPLICATIONS OF TREES
Trees are used to store simple as well as complex data. Here simple means an integer value,
character value and complex data means a structure or a record.
Trees are often used for implementing other types of data structures like hash tables, sets,
and maps.
A self-balancing tree, the red-black tree, is used in kernel scheduling to preempt tasks on
massively multiprocessor computer operating systems. (We will study red-black trees in the next
chapter.)
B-trees, another variation of trees, are prominently used to store tree structures on disk. They
are used to index a large number of records. (We will study B-Trees in Chapter 11.)
B-trees are also used for secondary indexes in databases, where the index facilitates a select
operation to answer some range criteria.
Trees are an important data structure used for compiler construction.
Trees are also used in database design.
Trees are used in file system directories.
Trees are also widely used for information storage and retrieval in symbol tables.
REPRESENTATION OF GRAPHS
There are three common ways of storing graphs in the computer’s memory: the adjacency matrix,
the adjacency list, and the adjacency multi-list.
Adjacency Matrix Representation
An adjacency matrix is used to represent which nodes are adjacent to one another.
By definition, two nodes are said to be adjacent if there is an edge connecting them.
In a directed graph G, if node v is adjacent to node u, then there is definitely an edge from u to
v.
For any graph G having n nodes, the adjacency matrix will have the dimension of n × n. In an
adjacency matrix, the rows and columns are labelled by graph vertices.
An entry aij in the adjacency matrix will contain 1, if vertices vi and vj are adjacent to each
other.
However, if the nodes are not adjacent, aij will be set to zero.
It is summarized in Fig. 13.13. Since an adjacency matrix contains only 0s and 1s, it is called a bit
matrix or a Boolean matrix.
The entries in the matrix depend on the ordering of the nodes in G. Therefore, a change in the
order of nodes will result in a different adjacency matrix.
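A small Python sketch makes both points concrete: how the matrix is filled, and how reordering the nodes changes it. The function name and the three-node directed graph are assumptions chosen for illustration.

```python
def adjacency_matrix(nodes, edges):
    """Build an n x n bit matrix; entry [i][j] is 1 if there is an edge from nodes[i] to nodes[j]."""
    index = {v: i for i, v in enumerate(nodes)}   # row/column position of each node
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
    return matrix

# Hypothetical directed graph: A -> B, B -> C, C -> A.
edges = [("A", "B"), ("B", "C"), ("C", "A")]

m1 = adjacency_matrix(["A", "B", "C"], edges)
m2 = adjacency_matrix(["C", "B", "A"], edges)   # same graph, different node order

print(m1)   # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(m2)   # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Both matrices describe the same graph; only the labelling of rows and columns differs.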
Adjacency List Representation
An adjacency list is another way in which graphs can be represented in the computer’s memory.
This structure consists of a list of all nodes in G.
Furthermore, every node is in turn linked to its own list that contains the names of all other
nodes that are adjacent to it.
It is easy to follow and clearly shows the adjacent nodes of a particular node.
It is often used for storing graphs that have a small-to-moderate number of edges. That
is, an adjacency list is preferred for representing sparse graphs in the computer’s
memory; otherwise, an adjacency matrix is a good choice.
Adding new nodes in G is easy and straightforward when G is represented using an
adjacency list. Adding new nodes in an adjacency matrix is a difficult task, as the size of
the matrix needs to be changed and existing nodes may have to be reordered.
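An adjacency list can be sketched in Python with a dictionary that maps each node to the list of its adjacent nodes. The function name and the sample directed graph are assumptions for illustration; a linked-list implementation would follow the same idea.

```python
def adjacency_list(nodes, edges):
    """Map every node to the list of nodes adjacent to it (directed edges u -> v)."""
    adj = {v: [] for v in nodes}     # one entry per node in G
    for u, v in edges:
        adj[u].append(v)
    return adj

# Hypothetical directed graph.
adj = adjacency_list(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
print(adj)        # {'A': ['B', 'C'], 'B': ['C'], 'C': []}

# Adding a new node only needs a new entry; nothing existing is resized or reordered.
adj["D"] = []
adj["C"].append("D")
print(adj["C"])   # ['D']
```

Storage grows with the number of nodes plus the number of edges, which is why this layout suits sparse graphs.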
Adjacency Multi-list Representation
Graphs can also be represented using multi-lists which can be said to be modified
version of adjacency lists.
Adjacency multi-list is an edge-based rather than a vertex-based representation of
graphs.
A multi-list representation basically consists of two parts—a directory of nodes’
information and a set of linked lists storing information about edges.
While there is a single entry for each node in the node directory, every edge node, on the
other hand, appears in two adjacency lists (one for the vertex at each end of the edge).
For example, the directory entry for node i points to the adjacency list for node i.
This means that the edge nodes are shared among several lists. In a multi-list representation,
the information about an edge (vi , vj ) of an undirected graph can be stored using the
following attributes:
M: A single bit field to indicate whether the edge has been examined or not.
vi : A vertex in the graph that is connected to vertex vj by an edge.
vj : A vertex in the graph that is connected to vertex vi by an edge.
Link i for vi : A link that points to another node that has an edge incident on vi .
Link j for vj : A link that points to another node that has an edge incident on vj .
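The attributes listed above can be turned into a minimal Python sketch. The EdgeNode class, the field names (marked stands in for the bit field M), and the build and traversal helpers are assumptions for illustration. The sketch assumes an undirected graph without self-loops whose vertices are numbered from 0.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EdgeNode:
    """Hypothetical multi-list node storing one undirected edge (vi, vj)."""
    vi: int
    vj: int
    marked: bool = False                   # M: has the edge been examined?
    link_i: Optional["EdgeNode"] = None    # next edge node incident on vi
    link_j: Optional["EdgeNode"] = None    # next edge node incident on vj

def build_multilist(num_vertices, edges):
    """Return the node directory: directory[v] is the first edge node incident on v."""
    directory = [None] * num_vertices
    for vi, vj in edges:
        # The new edge node goes to the front of both endpoint lists.
        node = EdgeNode(vi, vj, link_i=directory[vi], link_j=directory[vj])
        directory[vi] = node
        directory[vj] = node
    return directory

def incident_edges(directory, v):
    """Walk the list for vertex v, following whichever link belongs to v."""
    result = []
    node = directory[v]
    while node is not None:
        result.append((node.vi, node.vj))
        node = node.link_i if node.vi == v else node.link_j
    return result

directory = build_multilist(3, [(0, 1), (0, 2), (1, 2)])

print(incident_edges(directory, 0))   # [(0, 2), (0, 1)]
print(incident_edges(directory, 1))   # [(1, 2), (0, 1)]
print(incident_edges(directory, 2))   # [(1, 2), (0, 2)]
```

Each edge is stored once but is reachable from the lists of both of its end vertices, which is what makes the representation edge-based.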
APPLICATIONS OF GRAPHS
In circuit networks where points of connection are drawn as vertices and component wires
become the edges of the graph.
In transport networks where stations are drawn as vertices and routes become the edges of
the graph.
In maps that draw cities/states/regions as vertices and adjacency relations as edges.
In program flow analysis where procedures or modules are treated as vertices and calls to
these procedures are drawn as edges of the graph.
Once we have a graph of a particular concept, it can be easily used for finding shortest
paths, project planning, etc.
In flowcharts or control-flow graphs, the statements and conditions in a program are
represented as nodes and the flow of control is represented by the edges.
In state transition diagrams, the nodes are used to represent states and the edges represent
legal moves from one state to the other.
Graphs are also used to draw activity network diagrams. These diagrams are extensively
used as a project management tool to represent the interdependent relationships between
groups, steps, and tasks that have a significant impact on the project.
An Activity Network Diagram (AND), also known as an Arrow Diagram or a PERT (Program
Evaluation and Review Technique) chart, is used to identify time sequences of events which are
pivotal to objectives.
It is also helpful when a project has multiple activities which need simultaneous
management. ANDs help the project development team to create a realistic project schedule
by drawing graphs that exhibit: