
Data Structure - Unit 5 - B.Tech 3rd

The document defines various graph terminology and types of graphs. It provides examples and definitions of: 1) Null graphs which have no edges and only isolated vertices. Trivial graphs which have only one vertex. 2) Non-directed and directed graphs based on whether the edges have a defined direction. 3) Connected graphs where there is at least one path between every pair of vertices, and disconnected graphs where at least one pair of vertices has no path between them. 4) Regular graphs where all vertices have the same degree.

Uploaded by

Mangalam Mishra
Copyright
© All Rights Reserved

UNIT - V

Graph
Definition
A graph G can be defined as an ordered pair G = (V, E), where V(G) represents the set of
vertices and E(G) represents the set of edges which are used to connect these vertices.
A graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B), (B,C), (C,E), (E,D),
(D,B), (D,A)) is shown in the following figure.
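As an illustration, the example graph above can be written down as an edge list and turned into an "adjacency sets" form. In that form, each vertex maps to the set of its neighbours. This is a minimal Python sketch. The names `VERTICES`, `EDGES` and `build_graph` are illustrative choices and do not come from the notes.

```python
# Vertices and edges of the example graph G(V, E) described above.
VERTICES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("D", "B"), ("D", "A")]


def build_graph(vertices, edges):
    """Return a dict that maps every vertex to the set of its neighbours."""
    graph = {v: set() for v in vertices}
    for u, v in edges:
        # The graph is undirected, so each edge is recorded at both ends.
        graph[u].add(v)
        graph[v].add(u)
    return graph


G = build_graph(VERTICES, EDGES)
print(sorted(G["D"]))  # ['A', 'B', 'E']
```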

Terminology and Representation


Path
A path can be defined as the sequence of nodes that are followed in order to reach
some terminal node V from the initial node U.

Closed Path
A path is called a closed path if the initial node is the same as the terminal node,
that is, if V0 = VN.

Simple Path
If all the nodes of a path P are distinct, then P is called a simple path. If all the
nodes are distinct with the exception V0 = VN, then P is called a closed simple path.

Cycle
A cycle can be defined as a closed path which has no repeated edges or vertices except
the first and last vertices.
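The path definitions above can be checked mechanically on a sequence of vertices V0, V1, ..., VN. The following is a minimal sketch. The helper names (`is_path`, `is_closed`, `is_simple`, `is_cycle`) are hypothetical, and the graph is assumed to be undirected and stored as adjacency sets.

```python
def build_graph(edges):
    """Adjacency sets for an undirected graph given as an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def is_path(graph, seq):
    """Every consecutive pair of nodes in seq is joined by an edge."""
    return len(seq) > 0 and all(b in graph.get(a, set()) for a, b in zip(seq, seq[1:]))


def is_closed(seq):
    """Closed path: the initial node V0 equals the terminal node VN."""
    return len(seq) > 1 and seq[0] == seq[-1]


def is_simple(seq):
    """All nodes distinct, except that V0 may equal VN in a closed path."""
    nodes = seq[:-1] if is_closed(seq) else seq
    return len(set(nodes)) == len(nodes)


def is_cycle(graph, seq):
    """Closed simple path that also uses no edge twice."""
    if not (is_path(graph, seq) and is_closed(seq) and is_simple(seq)):
        return False
    used = [frozenset(pair) for pair in zip(seq, seq[1:])]
    return len(set(used)) == len(used)


# The example graph G(V, E) from the definition above.
G = build_graph([("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("D", "B"), ("D", "A")])
print(is_cycle(G, ["A", "B", "D", "A"]))  # True
print(is_cycle(G, ["A", "B", "A"]))       # False: edge A-B is used twice
```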

Connected Graph
A connected graph is one in which some path exists between every two vertices (u, v)
in V. A connected graph with more than one vertex has no isolated nodes.

Complete Graph
A complete graph is one in which every node is connected with all other nodes. A
complete graph contains n(n-1)/2 edges, where n is the number of nodes in the graph.

Weighted Graph
In a weighted graph, each edge is assigned some data such as length or weight.
The weight of an edge e is written w(e); here it is taken to be a positive (+) value
indicating the cost of traversing the edge.

Digraph
A digraph is a directed graph in which each edge of the graph is associated with some
direction, and traversal can be done only in the specified direction.

Loop
An edge whose two end points are the same vertex is called a loop (or self-loop).

Adjacent Nodes
If two nodes u and v are connected via an edge e, then the nodes u and v are called
neighbours or adjacent nodes.

Degree of the Node


The degree of a node is the number of edges that are connected to that node. A node
with degree 0 is called an isolated node.
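Degrees and isolated nodes follow directly from an edge list. This sketch reuses the example graph G(V, E) and adds a hypothetical vertex F that has no edges, so that one isolated node appears. Counting a self-loop twice is the usual convention. It is an assumption here, because the notes do not say how loops are counted.

```python
def degrees(vertices, edges):
    """Number of edges connected to each vertex of an undirected graph."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # so a self-loop (u == v) adds 2, the usual convention
    return deg


def isolated_nodes(vertices, edges):
    """Vertices whose degree is 0."""
    deg = degrees(vertices, edges)
    return [v for v in vertices if deg[v] == 0]


# The example graph above plus a hypothetical vertex F that has no edges.
VERTICES = ["A", "B", "C", "D", "E", "F"]
EDGES = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("D", "B"), ("D", "A")]
print(degrees(VERTICES, EDGES))         # {'A': 2, 'B': 3, 'C': 2, 'D': 3, 'E': 2, 'F': 0}
print(isolated_nodes(VERTICES, EDGES))  # ['F']
```

The degrees always add up to twice the number of edges, because every edge has two ends.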

Graphs & Multi-graphs


The graph data structure is used to store data required in computation to solve
many computer programming problems.
Graphs are used to address real-world problems in which the problem area is
represented as a network, such as telephone networks, circuit networks, etc.
In computer science, the graph is an abstract data type used to implement the
undirected and directed graph notions from graph theory in mathematics.
A graph data structure is made up of a finite and potentially mutable set of
vertices (also known as nodes or points), together with a set of unordered pairs of
vertices for an undirected graph or a set of ordered pairs for a directed graph.
These pairs are known as edges, links, or lines; in a directed graph they are
also known as arrows or arcs.
The vertices could be internal graph elements or external items represented by
integer indices or references.
So, depending upon how these vertices and edges are arranged, there are different
types of graphs, such as:

o Null Graph

o Trivial Graph

o Non-directed Graph

o Directed Graph

o Connected Graph

o Disconnected Graph

o Regular Graph

o Complete Graph

o Cycle Graph

o Cyclic Graph

o Acyclic Graph

o Finite Graph

o Infinite Graph

o Bipartite Graph

o Planar Graph

o Simple Graph

o Multi Graph

o Pseudo Graph

o Euler Graph

o Hamiltonian Graph

Null Graph
The term "null graph" refers here to a graph with an empty edge set. (Some texts use
the name for the order-zero graph, which has no vertices at all.)
In other words, a null graph has no edges, and every vertex in it is an isolated
vertex.

The image displayed above is a null graph because it has zero edges between its
three vertices.

Trivial Graph
A graph is called a trivial graph if it has only one vertex present in it.
The trivial graph is the smallest possible graph that contains a vertex: it has only
one vertex.

The above is an example of a trivial graph, having only a single vertex named vertex A.

Non-Directed Graph
A graph is called a non-directed graph if all the edges present between any graph
nodes are non-directed.
By non-directed edges, we mean edges for which we cannot determine a starting node
and an ending node.
All the edges of a graph need to be non-directed to call it a non-directed graph; none
of its edges has a direction.
The graph that is displayed above is an example of a non-directed graph.
It has four vertices, named vertex A, vertex B, vertex C, and vertex D, and exactly
four edges between these vertices.
None of the edges between the different nodes of the graph is directed, which means
the edges don't have any specific direction.
For example, the edge between vertex A and vertex B doesn't have any direction,
so we cannot determine whether this edge starts from vertex A or from vertex B.
Similarly, we can't determine the ending vertex of this edge.

Directed Graph
Another name for the directed graphs is digraphs.
A graph is called a directed graph or digraph if all the edges present between any
vertices or nodes of the graph are directed or have a defined direction.
By directed edges, we mean edges that have a direction, which determines the node at
which each edge starts and the node at which it ends.
All the edges of a graph need to be directed to call it a directed graph or digraph.
All the edges of a directed graph or digraph have a direction: each starts from
one vertex and ends at another.
The graph that is displayed above is an example of a directed graph.
It has four vertices, named vertex A, vertex B, vertex C, and vertex D, and exactly
four edges between these vertices. All the edges of the graph are directed (each
points towards one of its end vertices), which means the edges have a specific
direction assigned to them.
For example, consider the edge between vertex D and vertex A.
Its arrowhead points towards vertex A, which means this edge starts from vertex D
and ends at vertex A.

Connected Graph
For a graph to be labelled as a connected graph, there must be at least one
path between every pair of the graph's vertices.
In other words, if we start from any vertex, we should be able to move to every other
vertex present in that graph.
The graph shown above is an example of a connected graph because, starting from any
one of its vertices, we can move to any of the remaining vertices of the graph.
For example, if we begin from vertex B and traverse to vertex H, there are
various paths. One of them is
vertex B -> vertex C -> vertex D -> vertex F -> vertex E -> vertex H.
Similarly, there are other paths from vertex B to vertex H, and there is at least one
path between every pair of nodes of the graph.
In other words, all the vertices or nodes of the graph are connected to each other via
an edge or a sequence of edges.

Disconnected Graph
A graph is said to be a disconnected graph if there does not exist any path
between at least one pair of vertices.
In other words, if we start from some vertex of the graph and there is not even a
single path to some other vertex, then the graph is disconnected.
The graph shown above is a disconnected graph, because at least one pair of its
vertices has no path between them.
For example, no path exists if we want to traverse from vertex A to vertex G.
In other words, the vertices of the graph are not all connected to each other via an
edge or a sequence of edges, so they cannot all be traversed from one another.
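Whether a graph is connected or disconnected can be tested by starting at any one vertex and collecting every vertex reachable from it. The graph is connected exactly when that collection contains every vertex. The sketch below assumes an undirected graph stored as adjacency sets. The two small example graphs are hypothetical, because the figures are not reproduced here.

```python
from collections import deque


def is_connected(graph):
    """True if every vertex can be reached from one (arbitrary) start vertex."""
    if not graph:
        return True  # treating the empty graph as connected is a convention
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(graph)


# Hypothetical graphs (the figures are not reproduced here).
connected = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
disconnected = {"A": {"B"}, "B": {"A"}, "G": set()}
print(is_connected(connected), is_connected(disconnected))  # True False
```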

Regular Graph
For a graph to be called regular, it should satisfy one primary condition: all
graph vertices should have the same degree.
By the degree of a vertex, we mean the number of edges connected to that vertex.
If all the graph nodes have the same degree value, then the graph is called
a regular graph.
If all the vertices of a graph have the degree value of 6, then the graph is called a
6-regular graph.
If all the vertices in a graph are of degree 'k', then it is called a "k-regular graph".
The graphs that are displayed above are regular graphs.
In graph 1, there are three vertices named vertex A, vertex B, and vertex C, and every
vertex in graph 1 has degree 2.
The degree of each vertex is calculated by counting the number of edges
connected to that particular vertex.
For vertex A in graph 1, there are two edges associated with vertex A, one from
vertex B and another from vertex C. Thus, the degree of vertex A of graph 1 is 2.
Similarly, exactly two edges are associated with each of the other vertices, vertex B
and vertex C, so the degree of vertex B and of vertex C is also 2.
As all three nodes of the graph have the same degree, 2, this graph is called a
2-regular graph.
Similarly, the second graph shown above has four vertices named vertex E, vertex F,
vertex G, and vertex H.
Each of these four vertices has degree 2, because exactly two edges are associated
with every vertex of the graph.
As all the nodes of this graph have the same degree, 2, this graph is also called
a regular (2-regular) graph.
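A k-regular graph can be recognised by collecting the set of all vertex degrees. If that set contains exactly one value k, the graph is k-regular. This is a minimal sketch for simple graphs stored as adjacency sets, with no self-loops. The second graph's vertices are assumed to be named E, F, G and H. With four vertices of degree 2 in a simple graph, that graph must be the 4-cycle.

```python
def regular_degree(graph):
    """Return k if every vertex has degree k (a k-regular graph), else None."""
    degs = {len(neighbours) for neighbours in graph.values()}
    return degs.pop() if len(degs) == 1 else None


# Graph 1: the triangle on A, B, C.  Graph 2: the 4-cycle E-F-G-H-E.
graph1 = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
graph2 = {"E": {"F", "H"}, "F": {"E", "G"}, "G": {"F", "H"}, "H": {"G", "E"}}
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}  # degrees 1, 2, 1: not regular
print(regular_degree(graph1), regular_degree(graph2), regular_degree(path))  # 2 2 None
```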

Complete Graph
A graph is said to be a complete graph if there exists an edge between every pair of
its vertices.
In other words, every vertex is connected to all the other vertices of the graph.
Two graphs, named K3 and K4, are shown in the above image, and both of them are
complete graphs.
Graph K3 has three vertices, and each vertex has an edge to each of the other two
vertices. Similarly, graph K4 has four nodes named vertex E, vertex F, vertex G, and
vertex H.
For example, vertex F has three edges connecting it to the three remaining vertices
of the graph.
Likewise, each of the other three remaining vertices has three edges associated
with it.
As every vertex of this graph has a separate edge to every other vertex, it is called
a complete graph.
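The complete graph K_n, and the edge count n(n-1)/2 stated earlier, can be checked with a short sketch. Here `complete_graph`, `is_complete` and `edge_count` are illustrative helper names, and simple graphs stored as adjacency sets are assumed.

```python
def complete_graph(vertices):
    """K_n: every vertex is adjacent to every other vertex."""
    return {v: set(vertices) - {v} for v in vertices}


def is_complete(graph):
    """Simple-graph check: each vertex is adjacent to all the others."""
    everyone = set(graph)
    return all(graph[v] == everyone - {v} for v in graph)


def edge_count(graph):
    """Each undirected edge appears in two neighbour sets."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2


K4 = complete_graph(["E", "F", "G", "H"])
n = len(K4)
print(is_complete(K4), edge_count(K4), n * (n - 1) // 2)  # True 6 6
```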

Cycle Graph
If a graph has three or more vertices and its edges form a single cycle through all of
them, then the graph is called a cycle graph.
In a cycle graph, the degree of every vertex is 2.

There are three graphs shown in the above image, and all of them are examples
of cycle graphs, because each of them has more than two nodes and every vertex in
each of these graphs has degree exactly 2.

Cyclic Graph
For a graph to be called a cyclic graph, it should consist of at least one cycle. If a
graph has a minimum of one cycle present, it is called a cyclic graph.

The graph shown in the image has two cycles present, satisfying the required
condition for a graph to be cyclic, thus making it a cyclic graph.

Acyclic Graph
A graph is called an acyclic graph if no cycles are present in it; an acyclic graph
is the complete opposite of a cyclic graph.

The graph shown in the above image is acyclic because it has zero cycles present
in it.
That means if we begin traversing the graph from vertex B (or from any other vertex),
there is no path that returns to vertex B without repeating an edge.
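One common way to decide whether an undirected graph is cyclic or acyclic is union-find (disjoint sets). Every vertex starts in its own set, and the edges are processed one at a time. An edge whose two end points already lie in the same set closes a cycle. The technique is a standard choice and does not come from the notes. The example edge lists are hypothetical.

```python
def has_cycle(vertices, edges):
    """Union-find cycle test for an undirected graph given as an edge list."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return True  # u and v were already connected: this edge closes a cycle
        parent[root_u] = root_v  # merge the two sets
    return False


tree = [("A", "B"), ("B", "C"), ("B", "D")]                # hypothetical acyclic graph
cyclic = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]  # contains A-B-C-A
print(has_cycle("ABCD", tree), has_cycle("ABCD", cyclic))  # False True
```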

Finite Graph
If the number of vertices and the number of edges present in a graph are both
finite, then that graph is called a finite graph.
The graph shown in the above image is a finite graph.
It has four vertices, named vertex A, vertex B, vertex C, and vertex D, and four
edges; as both the number of vertices and the number of edges of this graph are
finite, it is called a finite graph.

Infinite Graph
If the number of vertices and the number of edges in a graph are infinite, that is,
the vertices and edges of the graph cannot be counted to a finite total, then that
graph is called an infinite graph.

As we can see in the above image, the number of vertices in the graph and the
number of edges in the graph are infinite, so this graph is called an infinite graph.

Bipartite Graph
For a graph to be a Bipartite graph, it needs to satisfy some basic
preconditions. These conditions are:
o All the vertices of the graph should be divided into two distinct sets of
vertices X and Y.
o The vertices present in set X should only be connected to the vertices
present in set Y, and vice versa. That means a vertex should not be
connected to any other vertex that is present in the same set.
o The two sets should be disjoint, which means they should not have any
vertex in common.

The graph shown in the above image is divided into two sets of vertices, named set X
and set Y. The contents of these sets are:
Set X = {vertex A, vertex B, vertex C, vertex D}
Set Y = {vertex P, vertex Q, vertex R}
Vertex A of set X is connected to vertex Q of set Y, and vertex B is also connected
to vertex Q.
Vertex C of set X is connected to two vertices of set Y, vertex P and vertex R.
Vertex D of set X is connected to vertex Q of set Y.
Similarly, all the vertices present in set Y are connected only to vertices from
set X, and set X and set Y have no elements in common.
The graph shown in the above image satisfies all the conditions for the Bipartite
graph, and thus it is a Bipartite graph.
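The bipartite conditions can be tested by 2-colouring the vertices with a breadth-first search. Each vertex is given the opposite colour to the vertex it was discovered from. If an edge ever joins two vertices of the same colour, the graph cannot be split into X and Y. This is a sketch of that standard technique, which the notes do not describe. The edge list is read from the description of the example above.

```python
from collections import deque


def bipartition(graph):
    """Return (X, Y) if the graph is bipartite, otherwise None (BFS 2-colouring)."""
    colour = {}
    for start in graph:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]  # neighbours go into the other set
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return None  # an edge inside one set: not bipartite
    X = {v for v, c in colour.items() if c == 0}
    Y = {v for v, c in colour.items() if c == 1}
    return X, Y


# Edges of the example described above.
edges = [("A", "Q"), ("B", "Q"), ("C", "P"), ("C", "R"), ("D", "Q")]
graph = {v: set() for v in "ABCDPQR"}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)
X, Y = bipartition(graph)
print(sorted(X), sorted(Y))  # ['A', 'B', 'C', 'D'] ['P', 'Q', 'R']
```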

Planar Graph
A graph is called a planar graph if it can be drawn in a single plane in such a way
that no two edges cross each other (edges may meet only at their end vertices).
The graph shown in the above image can be drawn in a single plane without any two
edges intersecting. Thus it is a planar graph.

Simple Graph
A graph is said to be a simple graph if it has no self-loops and no parallel edges.

We have three vertices and three edges for the graph that is shown in the above
image. This graph has no self-loops and no parallel edges; therefore, it is called
a simple graph.

Multi Graph
A graph is said to be a multigraph if it has no self-loops, but parallel edges are
present in the graph.
If there is more than one edge between the same two vertices, then that pair of
vertices is said to have parallel edges.
We have three vertices and three edges for the graph that is shown in the above
image.
There are no self-loops, but two edges connect vertex A and vertex E of the graph.
In other words, if two vertices of a graph are connected by more than one edge, the
graph has parallel edges, which makes it a multigraph.

Pseudo Graph
If a graph has no parallel edges, but self-loops are present in it, it is
called a pseudo graph.
A self-loop is an edge that starts from one of the graph's vertices and ends on the
same vertex.

The graph shown in the above image has vertex A, vertex B and vertex E.
There are four edges in this graph; three edges are associated with vertex A, and
among these three edges, one is a self-loop.
None of the four edges is a parallel edge. Since the graph shown above has a self-loop
and no parallel edge, it is a pseudo graph.
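The simple, multi and pseudo graph definitions above differ only in whether self-loops and parallel edges appear. An edge list can therefore be classified with a small sketch like the one below. The function name `classify` and the three example edge lists are hypothetical. The pseudo graph example is only one edge list that fits the description, because the figure is not available. A graph with both loops and parallel edges matches none of the three definitions, so it gets its own label.

```python
from collections import Counter


def classify(edges):
    """Classify an undirected edge list using the definitions above."""
    has_loop = any(u == v for u, v in edges)
    # frozenset ignores direction, so (A, E) and (E, A) count as the same pair.
    pair_counts = Counter(frozenset((u, v)) for u, v in edges if u != v)
    has_parallel = any(count > 1 for count in pair_counts.values())
    if has_loop and has_parallel:
        return "loops and parallel edges"  # not one of the three types above
    if has_loop:
        return "pseudo graph"
    if has_parallel:
        return "multigraph"
    return "simple graph"


print(classify([("A", "B"), ("B", "C"), ("C", "A")]))              # simple graph
print(classify([("A", "E"), ("E", "A"), ("A", "B")]))              # multigraph
print(classify([("A", "A"), ("A", "B"), ("A", "E"), ("B", "E")]))  # pseudo graph
```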

Euler Graph
If a graph is connected and all the vertices present in it have an even degree, then
the graph is known as an Euler graph.
By the degree of a vertex, we mean the number of edges that are associated with that
vertex.
So, for a connected graph to be an Euler graph, every vertex in the graph should be
associated with an even number of edges.

In the graph shown in the above image, we have five vertices named vertex A,
vertex B, vertex C, vertex D and vertex E.
All the vertices except vertex C have a degree of 2, which means each of them is
associated with two edges.
At the same time, vertex C is associated with four edges, so its degree is 4.
The degrees of vertex C and the other vertices, 4 and 2 respectively, are both
even. Therefore, the graph displayed above is an Euler graph.
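The even-degree test is easy to code. The sketch below also requires the graph to be connected, because even degrees alone do not guarantee an Euler graph: two separate triangles have every degree even. A simple graph stored as adjacency sets is assumed. The "bowtie" graph is a hypothetical graph that matches the description above, with vertex C of degree 4 and the other four vertices of degree 2.

```python
from collections import deque


def is_euler_graph(graph):
    """Connected, and every vertex has an even degree (simple graph assumed)."""
    if not graph:
        return False
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        for w in graph[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    connected = len(seen) == len(graph)
    return connected and all(len(nbrs) % 2 == 0 for nbrs in graph.values())


# Hypothetical graph matching the description: two triangles sharing vertex C.
bowtie = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D", "E"},
          "D": {"C", "E"}, "E": {"C", "D"}}
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}  # A and C have odd degree
print(is_euler_graph(bowtie), is_euler_graph(path))  # True False
```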

Hamiltonian Graph
Suppose a connected graph contains a closed walk that visits every vertex of the
graph exactly once (except the starting vertex, where the walk also ends) without
repeating any edge. Such a graph is called a Hamiltonian graph, and such a walk is
called a Hamiltonian circuit. The Hamiltonian circuit is also known as a Hamiltonian
Cycle.
A Hamiltonian path visits every vertex of the graph exactly once but does not need to
return to its starting vertex; a Hamiltonian path whose last vertex is joined back to
its first vertex forms a Hamiltonian circuit.
Every graph that contains a Hamiltonian circuit also contains a Hamiltonian path,
but the converse is not true.
There may exist more than one Hamiltonian path and Hamiltonian circuit in a
graph.
The graph shown in the above image contains the closed path ABCDEFA, which
starts from vertex A and traverses all the other vertices without visiting any vertex
twice, except vertex A, where the path ends.
Therefore, the graph shown in the above image is a Hamiltonian graph.
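Finding a Hamiltonian circuit is hard in general, but checking a proposed one such as ABCDEFA is straightforward. The check has three parts: the walk is closed, every vertex appears exactly once before returning to the start, and every step uses an edge. The helper name is hypothetical. The hexagon graph A-B-C-D-E-F-A is an assumed stand-in for the missing figure.

```python
def is_hamiltonian_circuit(graph, walk):
    """Check that walk (e.g. list("ABCDEFA")) is a Hamiltonian circuit of graph."""
    if len(graph) < 3 or len(walk) != len(graph) + 1 or walk[0] != walk[-1]:
        return False
    if set(walk[:-1]) != set(graph):
        return False  # some vertex is missing or visited twice
    # Every consecutive pair of vertices in the walk must be joined by an edge.
    return all(b in graph[a] for a, b in zip(walk, walk[1:]))


# Hypothetical hexagon graph A-B-C-D-E-F-A.
hexagon = {v: set() for v in "ABCDEF"}
for u, v in zip("ABCDEF", "BCDEFA"):
    hexagon[u].add(v)
    hexagon[v].add(u)
print(is_hamiltonian_circuit(hexagon, list("ABCDEFA")))  # True
print(is_hamiltonian_circuit(hexagon, list("ACBDEFA")))  # False: A-C is not an edge
```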

Directed Graph
In a directed graph, each edge is an ordered pair. An edge represents a specific path
from some vertex A to another vertex B.
Node A is called the initial node, while node B is called the terminal node.
A directed graph is shown in the following figure.

Sequential Representations of Graph
In this article, we will discuss the ways to represent a graph. By graph
representation, we simply mean the technique used to store a graph
in the computer's memory.
A graph is a data structure that consists of a set of vertices (called nodes) and a set of
edges. There are two ways to store graphs in the computer's memory:

o Sequential representation (or, Adjacency matrix representation)

o Linked list representation (or, Adjacency list representation)


In sequential representation, an adjacency matrix is used to store the graph,
whereas in linked list representation, an adjacency list is used to store
the graph.
In this tutorial, we will discuss each one of them in detail.
Now, let's start discussing the ways of representing a graph in the data structure.

Sequential representation
In sequential representation, an adjacency matrix is used to represent the mapping
between the vertices and edges of the graph.
We can use an adjacency matrix to represent an undirected graph, a directed graph,
a weighted directed graph, and a weighted undirected graph.
If adj[i][j] = w, it means that there is an edge from vertex i to vertex j with
weight w.
An entry Aij in the adjacency matrix representation of an undirected graph G will
be 1 if an edge exists between Vi and Vj.
If an undirected graph G consists of n vertices, then the adjacency matrix of that
graph is n x n, and the matrix A = [aij] can be defined as:
aij = 1 {if there is an edge between Vi and Vj}
aij = 0 {otherwise}
It means that, in an adjacency matrix, 0 represents that there is no edge between
two vertices, whereas 1 represents the existence of an edge between them.
If there is no self-loop present in the graph, the diagonal entries of the adjacency
matrix will be 0.
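The definition of aij translates directly into code. The following sketch builds the n x n 0/1 matrix from an edge list, using the example graph G(V, E) from the beginning of this unit. The function name and the `directed` flag are illustrative choices. With `directed=False`, each edge is written in both directions, so the matrix is symmetric. With `directed=True`, only A[i][j] is set for an edge from Vi to Vj.

```python
def adjacency_matrix(vertices, edges, directed=False):
    """n x n 0/1 matrix; row i, column j is 1 when there is an edge Vi -> Vj."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    A = [[0] * n for _ in range(n)]  # separate row lists, not n copies of one row
    for u, v in edges:
        A[index[u]][index[v]] = 1
        if not directed:
            A[index[v]][index[u]] = 1  # undirected: the matrix is symmetric
    return A


VERTICES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("D", "B"), ("D", "A")]
for row in adjacency_matrix(VERTICES, EDGES):
    print(row)
```

Because this graph has no self-loops, every diagonal entry A[i][i] stays 0.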

Adjacency matrices
Adjacency matrix for an undirected graph
Now, let's see the adjacency matrix representation of an undirected graph.

In the above figure, the image shows the mapping among the vertices (A, B, C, D, E),
and this mapping is represented using the adjacency matrix.
The adjacency matrices of a directed and an undirected graph differ. In a directed
graph, an entry Aij will be 1 only when there is an edge directed from Vi to Vj.

Adjacency matrix for a directed graph


In a directed graph, edges represent a specific path from one vertex to another
vertex.
If there is an edge from vertex A to vertex B, then node A is the initial node, while
node B is the terminal node.
Consider the directed graph below and try to construct its adjacency matrix.
In the above graph, there is no self-loop, so the diagonal entries of the
adjacency matrix are 0.

Properties of the adjacency matrix


Some of the properties of the adjacency matrix are listed as follows:
o An adjacency matrix is a matrix of rows and columns used to represent a
simple labeled graph, with the number 0 or 1 in position (Vi, Vj)
according to whether or not Vi and Vj are adjacent.
o For a directed graph, if there is an edge from vertex i (Vi) to vertex j (Vj),
then the value of A[Vi][Vj] = 1; otherwise the value will be 0.
o For an undirected graph, if there is an edge between vertex i (Vi) and
vertex j (Vj), then A[Vi][Vj] = 1 and A[Vj][Vi] = 1; otherwise the value
will be 0.

Adjacency matrix for a weighted directed graph
It is similar to the adjacency matrix representation of a directed graph, except that
instead of using '1' for the existence of an edge, we use the weight associated with
the edge.
The weights on the graph edges are stored as the entries of the adjacency matrix.
We can understand it with the help of an example.
Consider the below graph and its adjacency matrix representation.
In the representation, we can see that the weight associated with each edge is
stored as an entry in the adjacency matrix.
In the above image, we can see that the adjacency matrix representation of the
weighted directed graph is different from the other representations.
This is because, in this representation, the entries that would be 1 are replaced by
the actual weights assigned to the edges.
An adjacency matrix is easy to implement and follow. It is a good choice when the
graph is dense, that is, when the number of edges is large.
Although an adjacency matrix is convenient, it consumes more space: a graph with n
vertices always needs n x n entries, even if the graph is sparse.
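The weighted directed case differs from the 0/1 matrix only in what gets stored. This is a minimal sketch with hypothetical edges and weights, because the figure's values are not available. Following the notes, 0 marks "no edge". That is ambiguous if a zero-weight edge is possible, so the `no_edge` parameter lets a caller use something like `math.inf` instead. The example also shows the space cost: 16 entries are needed for only 4 edges.

```python
def weighted_matrix(vertices, weighted_edges, no_edge=0):
    """Weighted directed adjacency matrix: entry [i][j] is w for an edge Vi -> Vj."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    W = [[no_edge] * n for _ in range(n)]  # n * n entries, however few edges exist
    for u, v, w in weighted_edges:
        W[index[u]][index[v]] = w
    return W


# Hypothetical weighted directed edges (the figure's weights are not reproduced).
edges = [("A", "B", 4), ("B", "C", 2), ("C", "A", 7), ("A", "D", 5)]
W = weighted_matrix(["A", "B", "C", "D"], edges)
print(W[0])  # [0, 4, 0, 5]
```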

Graph Traversal - BFS


Graph traversal is a technique used for searching a vertex in a graph.
Graph traversal also decides the order in which the vertices are visited during the
search process.
A graph traversal finds the edges to be used in the search process without
creating loops.
That means using graph traversal we visit all the vertices of the graph without
getting into a looping path.
There are two graph traversal techniques, and they are as follows...

1. BFS (Breadth First Search)

2. DFS (Depth First Search)

BFS (Breadth First Search)


BFS traversal of a graph produces a spanning tree as the final result. A spanning
tree is a graph without loops.

We use a Queue data structure with a maximum size equal to the total number of
vertices in the graph to implement BFS traversal.

We use the following steps to implement BFS traversal...

o Step 1 - Define a Queue of size total number of vertices in the graph.

o Step 2 - Select any vertex as the starting point for traversal. Visit that vertex
and insert it into the Queue.

o Step 3 - Visit all the non-visited adjacent vertices of the vertex which is at
the front of the Queue and insert them into the Queue.

o Step 4 - When there is no new vertex to be visited from the vertex which is
at the front of the Queue, delete that vertex.

o Step 5 - Repeat steps 3 and 4 until the queue becomes empty.

o Step 6 - When the queue becomes empty, produce the final spanning tree
by removing unused edges from the graph.
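The BFS steps above can be sketched in Python as follows. `collections.deque` stands in for the Queue. It is not given a fixed size, but each vertex is inserted at most once, so it never holds more than n items. The function returns the visit order and the tree edges. The tree edges are the edges kept by Step 6. Neighbours are visited in sorted order only to make the output repeatable, and the graph is the example G(V, E) from the start of the unit.

```python
from collections import deque


def bfs(graph, start):
    """BFS following Steps 1-6 above; returns the visit order and tree edges."""
    visited = {start}
    order = [start]
    tree_edges = []
    queue = deque([start])  # Steps 1-2: each vertex enters once, so at most n items
    while queue:  # Step 5
        front = queue[0]
        # Step 3: visit every non-visited neighbour of the front vertex.
        for w in sorted(graph[front]):  # sorted only to make the order repeatable
            if w not in visited:
                visited.add(w)
                order.append(w)
                tree_edges.append((front, w))
                queue.append(w)
        queue.popleft()  # Step 4: nothing new left, so delete the front vertex
    return order, tree_edges  # Step 6: tree_edges are the spanning tree's edges


G = {"A": {"B", "D"}, "B": {"A", "C", "D"}, "C": {"B", "E"},
     "D": {"A", "B", "E"}, "E": {"C", "D"}}
print(bfs(G, "A"))
# (['A', 'B', 'D', 'C', 'E'], [('A', 'B'), ('A', 'D'), ('B', 'C'), ('D', 'E')])
```

The spanning tree has 4 edges for 5 vertices, as expected of a spanning tree.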

Example
DFS (Depth First Search)
DFS traversal of a graph produces a spanning tree as the final result. A spanning tree is
a graph without loops. We use a Stack data structure with a maximum size equal to the
total number of vertices in the graph to implement DFS traversal.

We use the following steps to implement DFS traversal...

Step 1 - Define a Stack of size total number of vertices in the graph.

Step 2 - Select any vertex as the starting point for traversal. Visit that vertex and push
it on to the Stack.

Step 3 - Visit any one of the non-visited adjacent vertices of the vertex which is at the
top of the stack and push it on to the stack.

Step 4 - Repeat step 3 until there is no new vertex to be visited from the vertex which
is at the top of the stack.

Step 5 - When there is no new vertex to visit, use backtracking and pop one vertex
from the stack.

Step 6 - Repeat steps 3, 4 and 5 until the stack becomes empty.

Step 7 - When the stack becomes empty, produce the final spanning tree by removing
unused edges from the graph.
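Similarly, the DFS steps can be sketched with an ordinary Python list used as the Stack. Each vertex is pushed at most once, so the stack never exceeds n items. The sketch assumes the same example graph G(V, E) stored as adjacency sets. Among the non-visited neighbours, it picks the alphabetically smallest. That choice is arbitrary and only keeps the output repeatable.

```python
def dfs(graph, start):
    """DFS with an explicit stack, following Steps 1-7 above."""
    visited = {start}
    order = [start]
    tree_edges = []
    stack = [start]  # Steps 1-2
    while stack:  # Step 6
        top = stack[-1]
        # Step 3: pick one non-visited neighbour of the top vertex, if any.
        nxt = next((w for w in sorted(graph[top]) if w not in visited), None)
        if nxt is None:
            stack.pop()  # Step 5: nothing new to visit, so backtrack
        else:
            visited.add(nxt)
            order.append(nxt)
            tree_edges.append((top, nxt))
            stack.append(nxt)
    return order, tree_edges  # Step 7: tree_edges form the spanning tree


G = {"A": {"B", "D"}, "B": {"A", "C", "D"}, "C": {"B", "E"},
     "D": {"A", "B", "E"}, "E": {"C", "D"}}
print(dfs(G, "A"))
# (['A', 'B', 'C', 'E', 'D'], [('A', 'B'), ('B', 'C'), ('C', 'E'), ('E', 'D')])
```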

Example
Connected Component

Definition
A connected component or simply component of an undirected graph is a subgraph in
which each pair of nodes is connected with each other via a path.
Let’s try to simplify it further, though. A set of nodes forms a connected component in an
undirected graph if any node from the set of nodes can reach any other node by
traversing edges. The main point here is reachability.
In connected components, all the nodes are always reachable from each other.

One Connected Component


In this example, the given undirected graph has one connected component:

Let’s name this graph G1(V, E). Here V = {V1,V2,V3,V4,V5,V6} denotes the vertex set
and E = {E1,E2,E3,E4,E5,E6,E7} denotes the edge set of G1. The graph G1 has one
connected component, let’s name it C1, which contains all the vertices of G1. Now let’s
check whether the set C1 satisfies the definition or not.
According to the definition, the vertices in the set C1 should reach one another via a
path. We’re choosing two random vertices V1 and V6:

V6 is reachable from V1 via:
E4 E7 or E3 E5 E7 or E1 E2 E6 E7
V1 is reachable from V6 via:
E7 E4 or E7 E5 E3 or E7 E6 E2 E1
The vertices V1 and V6 satisfy the definition, and we could do the same with the other
vertex pairs in C1 as well.

More Than One Connected Component
In this example, the undirected graph has three connected components:

Let’s name this graph G2(V, E), where
V = {V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12}, and
E = {E1,E2,E3,E4,E5,E6,E7,E8,E9,E10,E11}.
The graph G2 has 3 connected components: C1 = {V1,V2,V3,V4,V5,V6}, C2 =
{V7,V8,V9} and C3 = {V10,V11,V12}.
Now, let’s see whether the connected components C1, C2, and C3 satisfy the definition
or not. We’ll randomly pick a pair of vertices from each of C1, C2, and C3.
From the set C1, let’s pick the vertices V4 and V6.

V6 is reachable from V4 via:
E2 E6 E7 or E1 E4 E7 or E1 E3 E5 E7
V4 is reachable from V6 via:
E7 E6 E2 or E7 E4 E1 or E7 E5 E3 E1

Now let’s pick the vertices V8 and V9 from the set C2.

V9 is reachable from V8 via: E9 E8
V8 is reachable from V9 via: E8 E9
Finally, let’s pick the vertices V11 and V12 from the set C3.
V11 is reachable from V12 via: E11 E10
V12 is reachable from V11 via: E10 E11
So, from these simple demonstrations, it is clear that C1, C2, and C3 follow the
connected component definition.
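Connected components can be found by repeatedly taking a vertex that has not been seen yet and collecting everything reachable from it. Each such sweep yields one component. This sketch uses an explicit stack and adjacency sets. The edges given for G2 are hypothetical, because the figure is not reproduced. Only the 11-edge count and the vertex sets of C1, C2 and C3 are taken from the text.

```python
def connected_components(graph):
    """Split an undirected graph (dict of neighbour sets) into its components."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            component.add(u)
            for w in graph[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


# Hypothetical edges for G2 (11 edges, like E1 ... E11).
edge_list = [
    ("V1", "V2"), ("V2", "V3"), ("V1", "V4"), ("V3", "V5"), ("V4", "V5"),
    ("V5", "V6"), ("V2", "V6"),        # C1
    ("V7", "V8"), ("V7", "V9"),        # C2
    ("V10", "V11"), ("V10", "V12"),    # C3
]
G2 = {f"V{i}": set() for i in range(1, 13)}
for u, v in edge_list:
    G2[u].add(v)
    G2[v].add(u)
for c in connected_components(G2):
    print(sorted(c, key=lambda name: int(name[1:])))
```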

Spanning tree
Now, we will discuss the spanning tree and the minimum spanning tree. But before
moving directly towards the spanning tree, let's first see a brief description of the graph
and its types.

Graph
A graph can be defined as a group of vertices and edges to connect these vertices. The
types of graphs are given as follows -
o Undirected graph: An undirected graph is a graph in which all the edges do not
point to any particular direction, i.e., they are not unidirectional; they are
bidirectional. It can also be defined as a graph with a set of V vertices and a set
of E edges, each edge connecting two different vertices.
o Connected graph: A connected graph is a graph in which a path always exists
from a vertex to any other vertex. A graph is connected if we can reach any
vertex from any other vertex by following edges in either direction.
o Directed graph: Directed graphs are also known as digraphs. A graph is a
directed graph (or digraph) if all the edges present between any vertices or
nodes of the graph are directed or have a defined direction.
Now, let's move towards the topic spanning tree.

What is a spanning tree?


A spanning tree can be defined as a subgraph of an undirected connected graph. It
includes all the vertices along with the least possible number of edges. If any vertex is
missed, it is not a spanning tree. A spanning tree is a subset of the graph that does not
have cycles, and it also cannot be disconnected.
A spanning tree consists of (n-1) edges, where 'n' is the number of vertices (or nodes).
Edges of the spanning tree may or may not have weights assigned to them. All the
possible spanning trees created from the given graph G have the same number of
vertices as G, and the number of edges in each spanning tree is equal to the number
of vertices in G minus 1.
A complete undirected graph has n^(n-2) spanning trees, where n is the number of
vertices in the graph. For example, if n = 5, the maximum number of possible
spanning trees is 5^(5-2) = 5^3 = 125.
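The count n^(n-2) (Cayley's formula) can be confirmed for small n by brute force. The sketch tries every set of n-1 edges of K_n and counts those without a cycle. On n vertices, n-1 edges with no cycle always form a spanning tree. The helper names are illustrative, and the approach is only practical for very small n.

```python
from itertools import combinations


def find(parent, x):
    """Root of x's set in a simple union-find structure."""
    while parent[x] != x:
        x = parent[x]
    return x


def count_spanning_trees_complete(n):
    """Brute force: count (n-1)-edge subsets of K_n that contain no cycle."""
    edges = list(combinations(range(n), 2))  # all n(n-1)/2 edges of K_n
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        acyclic = True
        for u, v in subset:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:  # u and v already connected: this edge closes a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            count += 1
    return count


for n in range(2, 6):
    print(n, count_spanning_trees_complete(n), n ** (n - 2))
```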
Applications of the spanning tree
Basically, a spanning tree is used to find a minimal set of edges that connects all the
nodes of the graph. Some of the common applications of the spanning tree are listed as follows -

o Cluster Analysis

o Civil network planning

o Computer network routing protocol


Now, let's understand the spanning tree with the help of an example.

Example of spanning tree


Suppose the graph is -

As discussed above, a spanning tree contains the same number of vertices as the
graph, the number of vertices in the above graph is 5; therefore, the spanning tree will
contain 5 vertices. The edges in the spanning tree will be equal to the number of
vertices in the graph minus 1. So, there will be 4 edges in the spanning tree.
Some of the possible spanning trees that can be created from the above graph are shown in the figure.
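One way to build a spanning tree is breadth-first search. BFS keeps exactly the edge through which each vertex is first reached, and because every vertex is reached once, the result has n-1 edges and no cycle. The figure for this example is missing, so the sketch below assumes the five-vertex graph from the beginning of this unit: vertices A to E with edges (A,B), (B,C), (C,E), (E,D), (D,B), (D,A). The function name `bfs_spanning_tree` is hypothetical.

```python
from collections import deque


def bfs_spanning_tree(adj, root):
    """Return the edges of a spanning tree found by breadth-first search."""
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in visited:
                # The first edge that reaches v becomes a tree edge.
                visited.add(v)
                tree_edges.append((u, v))
                queue.append(v)
    if len(visited) != len(adj):
        raise ValueError("graph is disconnected: no spanning tree exists")
    return tree_edges


edges = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("D", "B"), ("D", "A")]
adj = {v: [] for v in "ABCDE"}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

tree = bfs_spanning_tree(adj, "A")
print(tree)  # [('A', 'B'), ('A', 'D'), ('B', 'C'), ('D', 'E')]
```

If you start from a different root or use depth-first search, you get a different but equally valid spanning tree, with the same 5 vertices and 4 edges.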
Properties of spanning-tree
Some of the properties of the spanning tree are given as follows -

o There can be more than one spanning tree of a connected graph G.

o A spanning tree does not have any cycles or loops.

o A spanning tree is minimally connected, so removing any one edge from the tree will disconnect it.

o A spanning tree is maximally acyclic, so adding any one edge to the tree will create a cycle (loop).

o A complete graph can have a maximum of $n^{n-2}$ spanning trees.

o A spanning tree has n-1 edges, where 'n' is the number of nodes.

o A spanning tree is obtained from a connected graph by removing exactly (e-n+1) edges, where 'e' is the number of edges and 'n' is the number of vertices. For a complete graph, e = n(n-1)/2.
So, a spanning tree is a subgraph of a connected graph G, and a disconnected graph has no spanning tree.
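Several of these properties combine into a simple test: a set of edges is a spanning tree of a graph on n vertices if it uses only edges of the graph, has exactly n-1 edges, and contains no cycle. The sketch below detects cycles with a union-find (disjoint-set) structure, which is a choice made here and not something the notes prescribe. The function name is hypothetical, and the example graph is the five-vertex graph from the beginning of this unit.

```python
def is_spanning_tree(vertices, graph_edges, tree_edges):
    """Return True if tree_edges form a spanning tree of the undirected graph."""
    graph_set = {frozenset(e) for e in graph_edges}
    # Every tree edge must be an edge of the original graph.
    if any(frozenset(e) not in graph_set for e in tree_edges):
        return False
    # Property: a spanning tree on n vertices has exactly n - 1 edges.
    if len(tree_edges) != len(vertices) - 1:
        return False
    # Union-find: an edge whose ends are already connected would close a cycle.
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in tree_edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v
    # n - 1 acyclic edges on n vertices must also connect every vertex.
    return True


vertices = "ABCDE"
graph_edges = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("D", "B"), ("D", "A")]
print(is_spanning_tree(vertices, graph_edges, [("A", "B"), ("A", "D"), ("B", "C"), ("D", "E")]))  # True
print(is_spanning_tree(vertices, graph_edges, [("A", "B"), ("B", "D"), ("D", "A"), ("C", "E")]))  # False
```

The second candidate has the right number of edges but contains the cycle A-B-D, so it leaves C and E cut off from the rest.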

Minimum Cost Spanning tree


A minimum spanning tree can be defined as the spanning tree in which the sum of the weights of the edges is minimum. The weight of a spanning tree is the sum of the weights given to its edges. In the real world, this weight can represent distance, traffic load, congestion, or any other value.

Example of minimum spanning tree


Let's understand the minimum spanning tree with the help of an example.
The sum of the edge weights of the above graph is 16. Some of the possible spanning trees created from the above graph are shown in the figure.

So, the minimum spanning tree selected from the above spanning trees for the given weighted graph is the one with the least total edge weight, as shown in the figure.

Applications of minimum spanning tree


The applications of the minimum spanning tree are given as follows -

o Minimum spanning tree can be used to design water-supply networks, telecommunication networks, and electrical grids.

o It can be used to find paths in the map.

Algorithms for Minimum spanning tree


A minimum spanning tree can be found from a weighted graph by using the algorithms
given below -

o Prim's Algorithm

o Kruskal's Algorithm
Let's see a brief description of both of the algorithms listed above.
Prim's algorithm - It is a greedy algorithm that starts with an empty spanning tree and grows it one vertex at a time. It is used to find the minimum spanning tree of a graph. This algorithm finds the subset of edges that includes every vertex of the graph such that the sum of the weights of the edges is minimized.
To learn more about Prim's algorithm, see the link below -
https://www.javatpoint.com/prim-algorithm
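A minimal sketch of Prim's algorithm follows. It uses a binary heap (Python's `heapq`) to repeatedly pick the cheapest edge that leaves the current tree. The weighted example from the notes is not available, so the graph is the five-vertex graph from the beginning of this unit with hypothetical weights AB=4, BC=2, CE=1, ED=3, DB=5, DA=6. The function name `prim_mst` is also hypothetical.

```python
import heapq


def prim_mst(adj, start):
    """Grow a tree from `start`, always adding the cheapest edge that joins
    a tree vertex to a vertex not yet in the tree."""
    in_tree = {start}
    mst, total = [], 0
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was already added through a cheaper edge
        in_tree.add(v)
        mst.append((u, v, w))
        total += w
        for nxt, nw in adj[v]:
            if nxt not in in_tree:
                heapq.heappush(heap, (nw, v, nxt))
    return mst, total


# Hypothetical weights on the graph A-E used earlier in this unit.
weighted_edges = [("A", "B", 4), ("B", "C", 2), ("C", "E", 1),
                  ("E", "D", 3), ("D", "B", 5), ("D", "A", 6)]
adj = {v: [] for v in "ABCDE"}
for u, v, w in weighted_edges:
    adj[u].append((v, w))
    adj[v].append((u, w))

print(prim_mst(adj, "A"))
# ([('A', 'B', 4), ('B', 'C', 2), ('C', 'E', 1), ('E', 'D', 3)], 10)
```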
Kruskal's algorithm - This algorithm is also used to find the minimum spanning tree of a connected weighted graph. Kruskal's algorithm also follows the greedy approach, which makes the locally optimal choice at every stage instead of focusing on a global optimum. For the minimum spanning tree problem, these local choices still produce a globally minimum tree.
To learn more about Kruskal's algorithm, see the link below -
https://www.javatpoint.com/kruskal-algorithm
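Kruskal's greedy rule can be sketched as follows: sort all edges by weight, then accept each edge unless it would join two vertices that are already connected. A union-find structure is used here to make that check. The graph is the five-vertex graph from the beginning of this unit with hypothetical weights AB=4, BC=2, CE=1, ED=3, DB=5, DA=6, and the function name `kruskal_mst` is hypothetical.

```python
def kruskal_mst(vertices, weighted_edges):
    """Scan edges in increasing weight order, keeping each edge that does not form a cycle."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # u and v are in different trees: safe to join
            parent[root_u] = root_v
            mst.append((u, v, w))
            total += w
            if len(mst) == len(vertices) - 1:
                break  # a spanning tree has n - 1 edges
    return mst, total


vertices = "ABCDE"
weighted_edges = [("A", "B", 4), ("B", "C", 2), ("C", "E", 1),
                  ("E", "D", 3), ("D", "B", 5), ("D", "A", 6)]
print(kruskal_mst(vertices, weighted_edges))
# ([('C', 'E', 1), ('B', 'C', 2), ('E', 'D', 3), ('A', 'B', 4)], 10)
```

Prim's algorithm grows a single tree from one vertex. Kruskal's algorithm instead merges a forest of small trees, and it reaches the same total weight here.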
Here, we have discussed spanning tree and minimum spanning tree along with their
properties, examples, and applications.

File Structures:
Physical Storage Media
File Organization
A file is a collection of records. Using the primary key, we can access the records. The type and frequency of access are determined by the type of file organization used for a given set of records.

File organization is a logical relationship among various records. This method defines how file records are mapped onto disk blocks.

File organization is used to describe the way in which the records are stored in
terms of blocks, and the blocks are placed on the storage medium.

The first approach to mapping the database to files is to use several files and store records of only one fixed length in any given file. An alternative approach is to structure our files so that they can hold records of multiple lengths.
Files of fixed-length records are easier to implement than files of variable-length records.

Objective of file organization


It provides an optimal selection of records, i.e., records can be selected as fast as possible.

Insert, delete, and update transactions on the records should be quick and easy.

Insert, update, or delete operations should not introduce duplicate records.

Records should be stored efficiently to minimize the cost of storage.

Types of file organization:


File organization includes various methods, each with pros and cons in terms of access or selection. The programmer chooses the best-suited file organization method according to their requirements.
Types of file organization are as follows:

Organization of records into Blocks


There are different ways of storing data in the database. Storing data in files is one of
them. A user can store the data in files in an organized manner. These files are
organized logically as a sequence of records and reside permanently on disks. Each file
is divided into fixed-length storage units known as Blocks. These blocks are the units of
storage allocation as well as data transfer. Although the default block size in the
database is 4 to 8 kilobytes, many databases allow specifying the size at the time of
creating the database instance.
Usually, the record size is smaller than the block size, but for large data items such as images, the size can vary. For quick data access, one complete record should reside in a single block. It should not be split across two blocks. In an RDBMS, the size of tuples varies across relations, so our files must be able to hold records of multiple lengths. In file organization, there are two possible ways of representing the records:
o Fixed-length records
o Variable-length records
Let's discuss this in detail.

Fixed-Length Records
With fixed-length records, a fixed length is set and every record is stored in a slot of that length. If the block size is not a multiple of the record size, some records cross block boundaries and get divided across more than one block. This fixed size causes the following two problems:
1. When subparts of a record are stored in more than one block, reading or writing the record requires access to all the blocks containing those subparts.
2. It is difficult to delete a record in such a file organization, because the space freed by the deleted record must either be filled by another record (or part of one) or be marked as deleted.
A common solution to these problems is to allocate a certain number of bytes at the beginning of the file, known as the File Header. The file header carries a variety of information about the file, such as the address of the first deleted record. The address of the second deleted record is stored in the first deleted record, and so on, forming a chain similar to pointers (a free list). Insertion and deletion are easy with fixed-length records because the space freed by a deleted record is exactly the space required to insert a new record. However, this approach does not work for storing records of variable lengths.
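The free-list idea can be sketched in memory as follows. The file header holds the slot number of the first deleted record, and each deleted slot holds the slot number of the next one. The record layout (4-byte id, 20-byte name, 8-byte salary), the class name `FixedLengthFile` and the sample data are all hypothetical, and a real DBMS would write whole blocks to disk instead of using a `bytearray`.

```python
import struct

# Hypothetical fixed-length layout: 4-byte id, 20-byte name, 8-byte salary (32 bytes).
RECORD = struct.Struct("<i20sd")
NO_SLOT = -1  # marks the end of the free list


class FixedLengthFile:
    """In-memory sketch of a file of fixed-length records with a free list."""

    def __init__(self):
        self.free_head = NO_SLOT  # file header: slot number of the first deleted record
        self.data = bytearray()   # the record slots, one after another

    def _offset(self, slot):
        # Fixed length makes addressing trivial: slot i starts at i * record size.
        return slot * RECORD.size

    def insert(self, rec_id, name, salary):
        packed = RECORD.pack(rec_id, name.encode(), salary)
        if self.free_head == NO_SLOT:
            slot = len(self.data) // RECORD.size
            self.data += packed  # no free slot: append at the end of the file
        else:
            slot = self.free_head
            off = self._offset(slot)
            # A deleted slot stores the slot number of the next deleted slot.
            (self.free_head,) = struct.unpack_from("<i", self.data, off)
            self.data[off:off + RECORD.size] = packed  # the freed space fits exactly
        return slot

    def delete(self, slot):
        # Push the slot onto the free list (no check for double deletion).
        struct.pack_into("<i", self.data, self._offset(slot), self.free_head)
        self.free_head = slot

    def read(self, slot):
        rec_id, name, salary = RECORD.unpack_from(self.data, self._offset(slot))
        return rec_id, name.rstrip(b"\0").decode(), salary


f = FixedLengthFile()
for rec in [(1, "Asha", 50000.0), (2, "Ravi", 42000.0), (3, "Meena", 61000.0)]:
    f.insert(*rec)
f.delete(1)                             # slot 1 joins the free list
print(f.insert(4, "Kiran", 39000.0))    # 1: the freed slot is reused
print(f.read(1))                        # (4, 'Kiran', 39000.0)
print(len(f.data) // RECORD.size)       # 3: the file did not grow
```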

Variable-Length Records
Variable-length records are records that vary in size, so storing them requires space of varying sizes. Variable-length records arise in the database system in the following ways:
1. Storage of multiple record types in a file.
2. Record types that allow repeating fields, such as multisets or arrays.
3. Record types that allow variable lengths for one or more fields.
In variable-length records, there exist the following two problems:
1. Defining the way of representing a single record so as to extract the individual
attributes easily.
2. Defining the way of storing variable-length records within a block so as to
extract that record in a block easily.
Thus, the representation of a variable-length record can be divided into two parts:
1. An initial part of the record that stores the values of fixed-length attributes, such as numeric values, dates, and fixed-length character attributes.
2. The data for variable-length attributes, such as the varchar type. Each such attribute is represented in the initial part of the record by an (offset, length) pair, and its actual value is stored after the initial part. The offset refers to the place within the record where that attribute's data begins, and the length refers to the length of the variable-size attribute. Thus, the initial part stores fixed-size information about every attribute, whether it is a fixed-length or a variable-length attribute.
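The two-part layout can be sketched with Python's `struct` module. The fixed-size initial part holds the id, the salary, and an (offset, length) pair for each varchar attribute, and the variable-length bytes follow it. The attribute set (id, name, dept, salary) and the function names are hypothetical, and the null bitmap that textbook layouts often add is left out.

```python
import struct

# Hypothetical layout of the fixed-length initial part:
#   id (int32), name (offset, length), dept (offset, length), salary (float64)
FIXED = struct.Struct("<iHHHHd")


def encode_record(emp_id, name, dept, salary):
    name_b, dept_b = name.encode(), dept.encode()
    name_off = FIXED.size  # variable-length data starts right after the initial part
    dept_off = name_off + len(name_b)
    initial = FIXED.pack(emp_id, name_off, len(name_b), dept_off, len(dept_b), salary)
    return initial + name_b + dept_b


def decode_record(buf):
    emp_id, n_off, n_len, d_off, d_len, salary = FIXED.unpack_from(buf, 0)
    # Each (offset, length) pair says exactly where the attribute's bytes are.
    name = buf[n_off:n_off + n_len].decode()
    dept = buf[d_off:d_off + d_len].decode()
    return emp_id, name, dept, salary


rec = encode_record(7, "Srinivas", "CS", 55000.0)
print(len(rec))            # 30: 20-byte initial part + 8 + 2 bytes of data
print(decode_record(rec))  # (7, 'Srinivas', 'CS', 55000.0)
```

Because the initial part always has the same size, any attribute can be located without scanning the variable-length data.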

Sequential File Organization


This method is the easiest method for file organization. In this method, files are stored
sequentially. This method can be implemented in two ways:

1. Pile File Method:


o It is quite a simple method. In this method, we store the records in a sequence, i.e., one after another, in the order in which they are inserted into the tables.
o When a record is updated or deleted, it is first searched for in the memory blocks. When it is found, it is marked for deletion, and (for an update) the new record is inserted.

Insertion of the new record:


Suppose we have records R1, R3, and so on up to R9 and R8 in a sequence. Here, a record is simply a row in a table. If we want to insert a new record R2 into the sequence, it will be placed at the end of the file.
2. Sorted File Method:
o In this method, a new record is always inserted at the end of the file, and then the sequence is sorted in ascending or descending order. Records are sorted on the primary key or on any other key.

o When a record is modified, the record is updated and the file is sorted again, so that the updated record is placed in the right position.

Insertion of the new record:


Suppose there is a preexisting sorted sequence of records R1, R3, and so on up to R6 and R7. If a new record R2 has to be inserted into the sequence, it will be inserted at the end of the file, and then the sequence will be sorted.
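The two insertion strategies can be contrasted in a minimal sketch that treats a file as a Python list of (id, label) tuples, with the records from the examples above. The function names are hypothetical. The sorted-file version follows the notes literally (append, then re-sort), although a real system would more likely use binary search to find the insertion point.

```python
from operator import itemgetter


def pile_insert(records, new_record):
    """Pile file: the new record is simply placed at the end of the file."""
    records.append(new_record)


def sorted_insert(records, new_record, key):
    """Sorted file: insert at the end, then re-sort the sequence on the key."""
    records.append(new_record)
    records.sort(key=key)


pile = [(1, "R1"), (3, "R3"), (9, "R9"), (8, "R8")]
pile_insert(pile, (2, "R2"))
print([name for _, name in pile])         # ['R1', 'R3', 'R9', 'R8', 'R2']

sorted_file = [(1, "R1"), (3, "R3"), (6, "R6"), (7, "R7")]
sorted_insert(sorted_file, (2, "R2"), key=itemgetter(0))  # sort on the record id
print([name for _, name in sorted_file])  # ['R1', 'R2', 'R3', 'R6', 'R7']
```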
Pros of sequential file organization
o It is a fast and efficient method for huge amounts of data.

o In this method, files can easily be stored on cheaper storage media like magnetic tapes.

o It is simple in design and requires little effort to store the data.

o This method is used when most of the records have to be accessed like grade
calculation of a student, generating the salary slip, etc.

o This method is used for report generation or statistical calculations.

Cons of sequential file organization


o It wastes time, because we cannot jump directly to a particular record and instead have to move through the records sequentially.

o The sorted file method takes more time and space for sorting the records.

Indexing and Hashing


Indexing
We know that data is stored in the form of records. Every record has a key field, which
helps it to be recognized uniquely.

Indexing is a data structure technique to efficiently retrieve records from the database
files based on some attributes on which the indexing has been done. Indexing in
database systems is similar to what we see in books.

Indexing is defined based on its indexing attributes. Indexing can be of the following types:
o Primary Index: A primary index is defined on an ordered data file. The data file is ordered on a key field, which is generally the primary key of the relation.
o Secondary Index: A secondary index may be generated from a field which is a candidate key and has a unique value in every record, or from a non-key field with duplicate values.
o Clustering Index: A clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Ordered Indexing is of two types:
o Dense Index
o Sparse Index
Dense Index
In dense index, there is an index record for every search key value in the database.
This makes searching faster but requires more space to store index records itself. Index
records contain search key value and a pointer to the actual record on the disk.

Sparse Index
In a sparse index, index records are not created for every search key. Typically, there is one index record per data block. An index record here contains a search key and an actual pointer to the data on the disk. To search for a record, we first follow the index record to reach the actual location of the data. If the data we are looking for is not at the location the index leads to, the system starts a sequential search until the desired data is found.
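The difference between the two kinds of ordered index can be sketched over a small sorted data file split into blocks of three records. The dense index has an entry for every key. The sparse index has only the first key of each block, so a lookup takes the largest indexed key that is less than or equal to the search key and then scans that block sequentially. The block contents and function names are hypothetical.

```python
import bisect

# Hypothetical sorted data file: three blocks of three (key, value) records.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Dense index: one entry per search-key value -> (block number, position).
dense = {key: (b, i) for b, block in enumerate(blocks) for i, (key, _) in enumerate(block)}

# Sparse index: one entry per block, holding the block's first key.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    loc = dense.get(key)
    if loc is None:
        return None
    b, i = loc
    return blocks[b][i][1]


def sparse_lookup(key):
    # Largest indexed key <= search key, then a sequential search in that block.
    pos = bisect.bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None


print(dense_lookup(50), sparse_lookup(50))  # e e
print(len(dense), len(sparse_keys))         # 9 3
```

The output shows the trade-off described above. Both indexes find the record, but the sparse index needs three entries where the dense index needs nine, and in exchange it does a short scan inside the block.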
Multilevel Index
Index records comprise search-key values and data pointers. A multilevel index is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. It is highly desirable to keep the index records in main memory to speed up search operations. If a single-level index is used, a large index cannot be kept in memory, which leads to multiple disk accesses.

Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which
can easily be accommodated anywhere in the main memory.

Hashing
In a huge database structure, it is very inefficient to search all the index values and
reach the desired data. Hashing technique is used to calculate the direct location of a
data record on the disk without using index structure.
In this technique, data is stored at the data blocks whose address is generated by using
the hashing function. The memory location where these records are stored is known as
data bucket or data blocks.
Here, the hash function can use any column value to generate the address. Most of the time, the hash function uses the primary key to generate the address of the data block. A hash function can range from a simple mathematical function to a complex one. We can even use the primary key itself as the address of the data block. In that case, each row is stored in the data block whose address is the same as its primary key.

The above diagram shows data block addresses that are the same as the primary key values. The hash function can also be a simple mathematical function like exponential, mod, cos, sin, etc. Suppose we use a mod (5) hash function to determine the address of the data block. In this case, applying mod (5) to the primary keys generates 3, 3, 1, 4, and 2 respectively, and the records are stored at those data block addresses.
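The mod (5) example can be sketched as follows. The diagram's primary keys are not available, so the keys 103, 108, 111, 114 and 117 are hypothetical, chosen only because they produce the same addresses 3, 3, 1, 4, 2. Two keys land in bucket 3, so each bucket holds a list of records. A search computes one address and examines only that bucket.

```python
NUM_BUCKETS = 5


def bucket_address(key):
    """The mod (5) hash function: the data block address is key mod 5."""
    return key % NUM_BUCKETS


keys = [103, 108, 111, 114, 117]  # hypothetical primary keys
buckets = {addr: [] for addr in range(NUM_BUCKETS)}
for key in keys:
    buckets[bucket_address(key)].append(key)


def contains(key):
    # Only one bucket is examined; no index structure is consulted.
    return key in buckets[bucket_address(key)]


print([bucket_address(k) for k in keys])  # [3, 3, 1, 4, 2]
print(buckets)  # {0: [], 1: [111], 2: [117], 3: [103, 108], 4: [114]}
print(contains(108), contains(110))       # True False
```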
Primary indices & Secondary indices
Primary Index
o If the index is created on the basis of the primary key of the table, it is known as primary indexing. These primary keys are unique to each record, so index entries have a 1:1 relation with the records.

o As primary keys are stored in sorted order, the searching operation is quite efficient.

o The primary index can be classified into two types: Dense index and Sparse
index.
Secondary Index
o In sparse indexing, as the size of the table grows, the size of the mapping also grows.
o These mappings are usually kept in primary memory so that address fetches are faster.
o The actual data is then searched in secondary memory based on the address obtained from the mapping.
o If the mapping size grows then fetching the address itself becomes slower. In this
case, the sparse index will not be efficient.
o To overcome this problem, secondary indexing is introduced.
o In secondary indexing, to reduce the size of mapping, another level of indexing is
introduced.
o In this method, a large range of column values is selected initially so that the mapping size of the first level stays small.
o Then each range is further divided into smaller ranges.
o The mapping of the first level is stored in the primary memory, so that address
fetch is faster.
o The mapping of the second level and actual data are stored in the secondary
memory (hard disk).
For example:

o If you want to find the record of roll 111 in the diagram, the search looks for the highest entry that is smaller than or equal to 111 in the first-level index. It gets 100 at this level.

o Then, in the second index level, it again finds the highest entry that is smaller than or equal to 111 and gets 110. Using the address 110, it goes to the data block and searches each record until it finds 111.

o This is how a search is performed in this method. Inserting, updating, and deleting are also done in the same manner.
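The roll-111 walk-through can be sketched as a two-level lookup. Each level answers the question "highest entry smaller than or equal to the key", and a sequential scan finishes the search inside the data block. The data (rolls 0 to 299 in blocks of ten, first level split at 0, 100 and 200) and all names are hypothetical stand-ins for the missing diagram.

```python
import bisect


def floor_entry(entries, key):
    """Return the value of the entry with the highest key <= key, or None."""
    keys = [k for k, _ in entries]
    pos = bisect.bisect_right(keys, key) - 1
    return entries[pos][1] if pos >= 0 else None


# Hypothetical data: rolls 0..299 in data blocks of 10 consecutive rolls,
# each data block addressed by its first roll number.
data_blocks = {start: [(roll, f"student-{roll}") for roll in range(start, start + 10)]
               for start in range(0, 300, 10)}

# Second level (kept on disk): for each range of 100 rolls, one entry per data block.
second_level = {base: [(start, start) for start in range(base, base + 100, 10)]
                for base in (0, 100, 200)}

# First level (small enough for main memory): one entry per range of 100 rolls.
first_level = [(base, base) for base in (0, 100, 200)]


def lookup(roll):
    base = floor_entry(first_level, roll)               # 111 -> 100
    if base is None:
        return None
    block_addr = floor_entry(second_level[base], roll)  # 111 -> 110
    for r, record in data_blocks[block_addr]:           # sequential scan in the block
        if r == roll:
            return record
    return None


print(lookup(111))  # student-111
```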

B+ Tree Index Files


B+ Tree
o The B+ tree is a balanced multiway search tree, meaning each node can have many children, not just two. It follows a multi-level index format.

o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf nodes remain at the same height.

o In the B+ tree, the leaf nodes are connected in a linked list. Therefore, a B+ tree can support random access as well as sequential access.

Structure of B+ Tree
o In the B+ tree, every leaf node is at an equal distance from the root node. The B+ tree is of order n, where n is fixed for every B+ tree.

o It contains internal nodes and leaf nodes.

Internal node
o An internal node of the B+ tree, except the root node, contains at least n/2 child pointers.

o At most, an internal node of the tree contains n pointers.

Leaf node
o A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values.

o At most, a leaf node contains n record pointers and n key values.

o Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.

Searching a record in B+ Tree


Suppose we have to search for 55 in the B+ tree structure below. First, we fetch the intermediary node, which directs us to the leaf node that can contain a record for 55. In the intermediary node, we find the branch between the keys 50 and 75, which leads us to the third leaf node. There, the DBMS performs a sequential search to find 55.
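The search can be sketched with a hypothetical two-level tree: one intermediary node with keys (25, 50, 75) above four linked leaves. At each internal node, binary search picks the child whose key range covers the search key, and the chosen leaf is scanned. The leaf links also give sequential (range) access. Record pointers are omitted, and the class and function names are hypothetical.

```python
import bisect


class Leaf:
    def __init__(self, keys):
        self.keys = keys  # sorted keys (record pointers omitted in this sketch)
        self.next = None  # link to the next leaf, for sequential access


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys
        self.children = children  # len(children) == len(keys) + 1


def search(node, key):
    """Descend from the root to the leaf that could hold key, then scan that leaf."""
    while isinstance(node, Internal):
        # children[i] covers keys in [keys[i-1], keys[i]).
        node = node.children[bisect.bisect_right(node.keys, key)]
    return key in node.keys, node


def range_scan(root, low, high):
    """Follow the leaf links to read all keys in [low, high] in order."""
    _, leaf = search(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


# Hypothetical tree: one intermediary node (25, 50, 75) above four leaves.
leaves = [Leaf([5, 15, 20]), Leaf([25, 30, 45]), Leaf([50, 55, 65, 70]), Leaf([75, 80, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([25, 50, 75], leaves)

found, leaf = search(root, 55)
print(found, leaf is leaves[2])  # True True: 55 is in the third leaf
print(range_scan(root, 45, 80))  # [45, 50, 55, 65, 70, 75, 80]
```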

B+ Tree Insertion
Suppose we want to insert a record 60 into the structure below. It belongs in the 3rd leaf node, after 55. The tree is balanced, and this leaf node is already full, so we cannot simply insert 60 there.
In this case, we have to split the leaf node so that the record can be inserted into the tree without affecting the fill factor, balance, and order.
With 60 included, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and the key in the intermediate node that points to it is 50. We split the leaf node in the middle so that the balance of the tree is not altered, grouping (50, 55) and (60, 65, 70) into 2 leaf nodes.
If these two are to be leaf nodes, the intermediate node cannot branch only from 50. It must have 60 added to it, and then it can hold a pointer to the new leaf node.

This is how we can insert an entry when there is overflow. In a normal scenario, it is
very easy to find the node where it fits and then place it in that leaf node.
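The overflow case can be sketched as follows. The full leaf (50, 55, 65, 70) receives 60, is split in the middle into (50, 55) and (60, 65, 70), and the first key of the new right leaf (60) is copied up into the intermediary node. The leaf capacity of 4, the parent keys (25, 50, 75) and the function names are hypothetical, and overflow of the parent itself is not handled.

```python
import bisect

MAX_KEYS = 4  # hypothetical leaf capacity


def insert_into_leaf(leaf_keys, key):
    """Insert key into a sorted leaf; on overflow, split the leaf in the middle.

    Returns (left, right, separator); right and separator are None when no split occurs.
    """
    keys = leaf_keys[:]
    bisect.insort(keys, key)
    if len(keys) <= MAX_KEYS:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # In a B+ tree, the first key of the new right leaf is copied up to the parent.
    return left, right, right[0]


def add_separator(parent_keys, separator):
    """Add the separator to the intermediary node (parent overflow not handled)."""
    keys = parent_keys[:]
    bisect.insort(keys, separator)
    return keys


left, right, sep = insert_into_leaf([50, 55, 65, 70], 60)
print(left, right, sep)                  # [50, 55] [60, 65, 70] 60
print(add_separator([25, 50, 75], sep))  # [25, 50, 60, 75]
```

In a full implementation, the same split-and-promote step repeats upward whenever the intermediary node also overflows.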

B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from the intermediate node as well as from the 4th leaf node. Once it is removed from the intermediate node, the tree no longer satisfies the rules of the B+ tree, so we need to modify it to keep it balanced.
After deleting 60 from the above B+ tree and re-arranging the nodes, the tree appears as follows:
Indexing and Hashing Comparison
1. Indexing :

Indexing, as the name suggests, is a technique or mechanism generally used to speed up access to data.
An index is basically a type of data structure that is used to locate and access data in a database table quickly.
Indexes can easily be created using one or more columns of a database table.

2. Hashing :

Hashing, as the name suggests, is a technique or mechanism that uses hash functions with search keys as parameters to generate the address of a data record.

It calculates the direct location of a data record on disk without using an index structure.

A good hash function uses a one-way hashing algorithm, so the hash cannot be converted back into the original key.

In simple words, it is the process of converting a given key into another value known as a hash value, or simply a hash.

Difference between Indexing and Hashing in DBMS:

| Indexing | Hashing |
|---|---|
| It is a technique that allows records to be retrieved quickly from a database file. | It is a technique that finds the location of desired data on disk without using an index structure. |
| It is generally used to optimize or increase the performance of a database simply by minimizing the number of disk accesses required when a query is processed. | It is generally used to index and retrieve items in a database, as it is faster to search for a specific item using a shorter hashed key rather than its original value. |
| It offers faster search and retrieval of data to users, helps to reduce table space, makes it possible to quickly retrieve or fetch data, can be used for sorting, etc. | It is faster than searching arrays and lists, provides a more flexible and reliable method of data retrieval than other data structures, can be used for comparing two files for equality, etc. |
| Its main purpose is to provide a basis for both rapid random lookups and efficient access of ordered records. | Its main purpose is to use a mathematical function to organize data into easily searchable buckets. |
| It is not considered best for large databases; it is good for small databases. | It is considered best for large databases. |
| Types of indexing include ordered indexing, primary indexing, secondary indexing, and clustered indexing. | Types of hashing include static and dynamic hashing. |
| It uses data references that hold the address of a disk block. | It uses mathematical functions known as hash functions to calculate the direct location of records on disk. |
| It is important because it protects the files and documents of large business organizations and optimizes database performance. | It is important because it ensures the data integrity of files and messages; it takes variable-length strings or messages and compresses and converts them into fixed-length values. |

===========XXXXXXXXXXXXXXXX===========
