Unit 3
NOTES
Subject Name: Data Structures & Analysis of Algorithms Subject Code: KCA-205 Semester: II
UNIT-3
Insertion Sort, Selection Sort, Bubble Sort, Heap Sort, Comparison of Sorting Algorithms, Sorting
in Linear Time: Counting Sort and Bucket Sort.
Terminology used with Graph, Data Structure for Graph Representations: Adjacency Matrices,
Adjacency List, Adjacency. Graph Traversal: Depth First Search and Breadth First Search,
Connected Component.
INSERTION-SORT(A)
Example
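A minimal Python version of the INSERTION-SORT procedure (0-indexed, whereas the pseudocode convention in these notes is 1-indexed):

```python
def insertion_sort(A):
    """Sort list A in place, in ascending order."""
    for j in range(1, len(A)):
        key = A[j]
        # Shift elements of the sorted prefix A[0..j-1] that are
        # greater than key one position to the right.
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return A
```

For example, `insertion_sort([5, 2, 4, 6, 1, 3])` grows a sorted prefix one element at a time until the whole list is in order.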
KIET Group of Institutions, Ghaziabad
Department of Computer Applications
(An ISO – 9001: 2015 Certified & ‘A’ Grade accredited Institution by NAAC)
Bubble Sort
• Bubble sort is similar to selection sort in the sense that it repeatedly finds the
largest/smallest value in the unprocessed portion of the array and puts it in its final place.
• However, finding the largest value is not done by selection this time.
• Compares adjacent items and exchanges them if they are out of order.
• In one pass, the largest value has been “bubbled” to its proper position.
• “Bubble” the largest value to the end using pair-wise comparisons and swapping
while (!isSorted) {
    isSorted = true;
    for (i = 0; i < length - 1; i++) {
        if (S[i] > S[i+1]) {      // adjacent items out of order
            temp = S[i];
            S[i] = S[i+1];
            S[i+1] = temp;        // swap them
            isSorted = false;
        }
    }
    length--;                    // largest value has bubbled to the end
}
Selection Sort
• Selection sort is a sorting algorithm which works as follows:
– Find the smallest value in the list and swap it with the value in the first position
– Repeat the steps above for the remainder of the list (starting at the second position)
Algorithm
for (i = 0; i < N - 1; i++) {
    min = i;                         // assume the minimum is at position i
    for (j = i + 1; j < N; j++) {
        if (numbers[j] < numbers[min])
            min = j;                 // remember the smallest value seen so far
    }
    T = numbers[min];                // swap the minimum into position i
    numbers[min] = numbers[i];
    numbers[i] = T;
}
• We can ignore N as well, since N² grows more rapidly than N, making our algorithm O(N²).
Heap Sort
The (binary) heap data structure is an array object that we can view as a nearly complete binary
tree. Each node of the tree corresponds to an element of the array. The tree is completely filled
on all levels except possibly the lowest, which is filled from the left up to a point. The root of the
tree is A[1], and given the index i of a node, we can easily compute the indices of its parent, left
child, and right child: PARENT(i) = ⌊i/2⌋, LEFT(i) = 2i, and RIGHT(i) = 2i + 1. The LEFT procedure
can compute 2i in one instruction by shifting the binary representation of i left by one bit position.
Similarly, the RIGHT procedure can quickly compute 2i + 1 by shifting the binary representation
of i left by one bit position and then adding in a 1 as the low-order bit. The PARENT procedure
can compute ⌊i/2⌋ by shifting i right one bit position. Good implementations of heapsort often
implement these procedures as "macros" or "inline" procedures.
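These index computations can be sketched directly; for a 1-indexed heap stored in an array:

```python
def parent(i):
    return i >> 1        # shift i right one bit: floor(i/2)

def left(i):
    return i << 1        # shift i left one bit: 2*i

def right(i):
    return (i << 1) | 1  # shift left, then set the low-order bit: 2*i + 1
```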
MAX-HEAPIFY lets the value at A[i] "float down" in the max-heap so that the subtree rooted at
index i obeys the max-heap property.
At each step, the largest of the elements A[i], A[LEFT(i)], and A[RIGHT(i)] is determined, and its
index is stored in largest. If A[i] is largest, then the subtree rooted at node i is already a max-heap
and the procedure terminates. Otherwise, one of the two children has the largest element, and
A[i] is swapped with A[largest], which causes node i and its children to satisfy the max-heap
property. The node indexed by largest, however, now has the original value A[i], and thus the
subtree rooted at largest might violate the max-heap property. Consequently,
we call MAX-HEAPIFY recursively on that subtree.
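A Python sketch of MAX-HEAPIFY as described above. The array carries a dummy entry at index 0 so that the 1-indexed parent/child arithmetic carries over unchanged:

```python
def max_heapify(A, i, heap_size):
    """Float A[i] down so the subtree rooted at i obeys the
    max-heap property (A is 1-indexed via a dummy A[0])."""
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        # One of the children holds the largest element: swap it up,
        # then repair the subtree that received the old A[i].
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)
```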
Building a heap
Example input array: A = ⟨4, 1, 3, 2, 16, 9, 10, 14, 8, 7⟩
We can derive a tighter bound by observing that the time for MAX-HEAPIFY to run at a node
varies with the height of the node in the tree, and the heights of most nodes are small. Our
tighter analysis relies on the properties that an n-element heap has height ⌊lg n⌋ and at most
⌈n/2^(h+1)⌉ nodes of any height h.
The total cost of BUILD-MAX-HEAP is therefore bounded by T(n) = O(n).
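BUILD-MAX-HEAP simply calls MAX-HEAPIFY on every non-leaf node, from ⌊n/2⌋ down to the root. A self-contained Python sketch (MAX-HEAPIFY is repeated here so the block runs on its own), applied to the example array above:

```python
def max_heapify(A, i, heap_size):
    l, r, largest = 2 * i, 2 * i + 1, i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)

def build_max_heap(A, n):
    # Leaves (indices n//2 + 1 .. n) are trivially max-heaps already,
    # so only the first n//2 nodes need to be heapified.
    for i in range(n // 2, 0, -1):
        max_heapify(A, i, n)

# The example array, with a dummy A[0] for 1-indexing:
A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(A, 10)
# A[1:] is now the max-heap [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```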
The Heap Sort Algorithm
The HEAPSORT procedure takes time O(n log n), since the call to BUILD-MAX-HEAP takes time
O(n) and each of the n − 1 calls to MAX-HEAPIFY takes time O(log n).
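The whole procedure can be sketched compactly in Python, 0-indexed, with an iterative sift-down standing in for MAX-HEAPIFY:

```python
def heapsort(A):
    """In-place heapsort: build a max-heap, then repeatedly move
    the maximum to the end of the shrinking heap."""
    n = len(A)

    def sift_down(i, size):
        # Iterative MAX-HEAPIFY for a 0-indexed heap A[0..size-1].
        while True:
            l, r, largest = 2 * i + 1, 2 * i + 2, i
            if l < size and A[l] > A[largest]:
                largest = l
            if r < size and A[r] > A[largest]:
                largest = r
            if largest == i:
                return
            A[i], A[largest] = A[largest], A[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):  # BUILD-MAX-HEAP: O(n)
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # n - 1 extractions, O(log n) each
        A[0], A[end] = A[end], A[0]      # move current max to its final slot
        sift_down(0, end)
    return A
```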
Review of Sorting
So far we have seen a number of algorithms for sorting a list of numbers in ascending
order. Recall that an in-place sorting algorithm is one that uses no additional array storage
(however, we allow Quicksort to be called in-place even though it needs a stack of size O(log
n) for keeping track of the recursion). A sorting algorithm is stable if duplicate elements remain
in the same relative position after sorting.
Slow Algorithms: Include Bubble Sort, Insertion Sort, and Selection Sort. These are all simple
Θ(n²) in-place sorting algorithms. Bubble Sort and Insertion Sort can be implemented as stable
algorithms, but Selection Sort cannot (without significant modifications).
Mergesort: Mergesort is a stable Θ(n log n) sorting algorithm. The downside is that Mergesort
is the only algorithm of the three that requires additional array storage, implying that it is not an
in-place algorithm.
Quicksort: Widely regarded as the fastest of the fast algorithms. This algorithm is O(n log n) in
the expected case, and O(n²) in the worst case. The probability that the algorithm takes
asymptotically longer (assuming that the pivot is chosen randomly) is extremely small for large
n. It is an (almost) in-place sorting algorithm but is not stable.
Heapsort: Heapsort is based on a nice data structure, called a heap, which is a fast priority queue.
Elements can be inserted into a heap in O(log n) time, and the largest item can be extracted in
O(log n) time. (It is also easy to set up a heap for extracting the smallest item.) If you only want
to extract the k largest values, a heap allows you to do this in O(n + k log n) time. It is an in-
place algorithm, but it is not stable.
Decision Tree Argument: In order to prove lower bounds, we need an abstract way of modeling
“any possible” comparison-based sorting algorithm. We model such algorithms in terms of an
abstract model called a decision tree. In a comparison-based sorting algorithm, only comparisons
between the keys are used to determine the action of the algorithm.
Let us assume that n = 3, and let us build a decision tree for Selection Sort. Recall that the algorithm
consists of two phases. The first finds the smallest element of the entire list and swaps it with
the first element. The second finds the smaller of the remaining two items and swaps it with the
second element. Here is the decision tree (in outline form). The first comparison is between a1
and a2. The possible results are:
a1 <= a2: Then a1 is the current minimum. Next we compare a1 with a3 whose results might
be either:
a1 <=a3: Then we know that a1 is the minimum overall, and the elements remain in their
original positions. Then we pass to phase 2 and compare a2 with a3. The possible
results are:
a2 <=a3: Final output is a1; a2; a3.
a2 > a3: These two are swapped and the final output is a1; a3; a2.
a1 > a3: Then we know that a3 is the overall minimum, and it is swapped
with a1. Then we pass to phase 2 and compare a2 with a1 (which is now in the third
position of the array) yielding either:
a2 <=a1: Final output is a3; a2; a1.
a2 > a1: These two are swapped and the final output is a3; a1; a2.
a1 > a2: Then a2 is the current minimum. Next we compare a2 with a3 whose results might be
either:
a2 <=a3: Then we know that a2 is the minimum overall. We swap a2 with a1, and then pass to
phase 2, and compare the remaining items a1 and a3. The possible results are:
a1 <=a3: Final output is a2; a1; a3.
a1 > a3: These two are swapped and the final output is a2; a3; a1.
a2 > a3: Then we know that a3 is the overall minimum, and it is swapped
with a1. We pass to phase 2 and compare a2 with a1 (which is now in the third position
of the array) yielding either:
a2<= a1: Final output is a3; a2; a1.
a2 > a1: These two are swapped and the final output is a3; a1; a2.
The final decision tree is shown below.
Using Decision Trees for Analyzing Sorting: Consider any sorting algorithm. Let T(n) be
the maximum number of comparisons that this algorithm makes on any input of size n. Notice
that the running time of the algorithm must be at least as large as T(n), since we are not counting
data movement or other computations at all. The algorithm defines a decision tree. Observe that
the height of the decision tree is exactly equal to T(n), because any path from the root to a leaf
corresponds to a sequence of comparisons made by the algorithm.
The length of the longest path from the root of a decision tree to any of its reachable leaves
represents the worst-case number of comparisons that the corresponding sorting algorithm
performs. Consequently, the worst-case number of comparisons for a given comparison sort
algorithm equals the height of its decision tree. A lower bound on the heights of all decision trees
in which each permutation appears as a reachable leaf is therefore a lower bound on the running
time of any comparison sort algorithm. The following theorem establishes such a lower bound.
Theorem: Any comparison-based sorting algorithm has worst-case running time Ω(n log n).
From the preceding discussion, it suffices to determine the height of a decision tree in which each
permutation appears as a reachable leaf. Consider a decision tree of height h with l reachable
leaves corresponding to a comparison sort on n elements. Because each of the n! permutations
of the input appears as some leaf, we have n! ≤ l. Since a binary tree of height h has no more
than 2^h leaves, we have
n! ≤ l ≤ 2^h,
which, by taking logarithms, implies
h ≥ lg(n!) (since the lg function is monotonically increasing)
= Ω(n lg n) (by Stirling’s approximation)
Counting sort
Counting sort assumes that each of the n input elements is an integer in the range 1
to k, for some integer k. Whenever k = O(n), the sort runs in linear time.
Fundamental concept: determine, for every input element x, the number of elements less than
x. This enables one to determine x’s position in the sorted array.
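A Python sketch of COUNTING-SORT, with comments mapping each loop to the pseudocode line numbers referenced in the analysis below:

```python
def counting_sort(A, k):
    """Stable sort of A, whose elements are integers in 1..k."""
    n = len(A)
    C = [0] * (k + 1)                  # lines 1-2: clear the counts, time Θ(k)
    for j in range(n):                 # lines 3-4: C[x] = #elements equal to x, Θ(n)
        C[A[j]] += 1
    for i in range(1, k + 1):          # lines 6-7: C[x] = #elements <= x, Θ(k)
        C[i] += C[i - 1]
    B = [0] * n                        # output array
    for j in range(n - 1, -1, -1):     # lines 9-11: place each element at its
        B[C[A[j]] - 1] = A[j]          # final position, scanning right to left
        C[A[j]] -= 1                   # so that equal keys stay in order (stable), Θ(n)
    return B
```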
The for loop of lines 1–2 takes time ϴ(k), the for loop of lines 3–4 takes time ϴ(n), the for loop
of lines 6–7 takes time ϴ(k), and the for loop of lines 9–11 takes time ϴ(n). Thus, the overall
time is ϴ(k+n). In practice, we usually use counting sort when we have k = O(n), in which case
the running time is ϴ(n). Counting sort beats the lower bound of Ω(n lg n) proved above
because it is not a comparison sort.
Fig.: The operation of COUNTING-SORT on an input array A[1..8], where each element of A is a
positive integer no larger than k = 6.
Example-2
The Counting sort is a stable sort, i.e., multiple keys with the same value are placed in the
sorted array in the same order that they appear in the input array. However, if line 9 of the
algorithm is changed to

for j ← 1 to n

then the stability no longer holds. Notice that the correctness argument in the CLR does not
depend on the order in which array A[1 . . n] is processed. The algorithm is correct no matter
what order is used. In particular, the modified algorithm still places the elements with value k in
positions c[k − 1] + 1 through c[k], but in reverse order of their appearance in A[1 . . n].
Bucket Sort
The idea of bucket sort is to divide the interval [0, 1) into n equal-sized subintervals, or buckets,
and then distribute the n input numbers into the buckets. Since the inputs are uniformly
distributed over [0, 1), we don’t expect many numbers to fall into each bucket. To produce the
output, we simply sort the numbers in each bucket and then go through the buckets in order,
listing the elements in each. Our code for bucket sort assumes that the input is an n-element
array A and that each element A[i ] in the array satisfies 0 ≤ A[i ] < 1. The code requires an auxiliary
array B[0 . . n − 1] of linked lists (buckets) and assumes that there is a mechanism for maintaining
such lists.
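A Python sketch of this scheme, using Python lists as the buckets:

```python
def bucket_sort(A):
    """Sort A, where each A[i] satisfies 0 <= A[i] < 1 and the inputs
    are assumed uniformly distributed over [0, 1)."""
    n = len(A)
    B = [[] for _ in range(n)]     # n empty buckets
    for x in A:
        B[int(n * x)].append(x)    # bucket i holds values in [i/n, (i+1)/n)
    for bucket in B:
        bucket.sort()              # expected O(1) elements per bucket
    # Concatenate the buckets in order to produce the output.
    return [x for bucket in B for x in bucket]
```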
Example -1
The operation of BUCKET-SORT. (a) The input array A[1 . . 10]. (b) The array B[0 . . 9] of sorted
lists (buckets) after line 5 of the algorithm. Bucket i holds values in the half-open interval [i/10,
(i + 1)/10). The sorted output consists of a concatenation in order of the lists B[0], B[1], . . . , B[9].
In Facebook, every user, photo, group, and page is represented as a node, and every relationship
is an edge from one node to another. Whether you post a photo, join a group,
like a page, etc., a new edge is created for that relationship.
All of Facebook is, then, a collection of these nodes and edges. This is because Facebook uses a
graph data structure to store its data.
A Graph G(V, E) is a data structure that is defined by a set of Vertices (V) and a set of Edges
(E): G = {V, E}.
A vertex (v), or node, is an indivisible point, represented by the lettered components on the
example graph below. In the example graph, the collection of vertices is
V = {0, 1, 2, 3}
Space Complexity, written Θ(G), represents how much memory is needed to hold a
given graph.
Adjacency Complexity, written O(G), is how long it takes to find all the vertices adjacent to
a given vertex v.
When you’re representing a graph structure G, the vertices, V, are very straightforward to store
since they are a set and can be represented directly as such. For instance, for a graph of vertices:
V = {0,1,2,3,4}
Graph Representation
Things get a little more interesting when you start storing the Edges, E. Here there are two common
structures that you can use to represent and navigate the edge set:
Adjacency Matrix
Adjacency List
We’re going to take a look at a simple graph and step through each representation of it. We will
assess each one according to its Space Complexity and Adjacency Complexity.
Graphs are commonly represented in two ways:
1. Adjacency Matrix
An adjacency matrix is a 2D array of V × V entries, in which each row and each column represents
a vertex. If the value of an element a[i][j] is 1, there is an edge connecting vertex i and
vertex j.
The adjacency matrix for the graph we created above is
Since it is an undirected graph, for edge (0,2), we also need to mark edge (2,0); making the
adjacency matrix symmetric about the diagonal.
Edge lookup (checking if an edge exists between vertex A and vertex B) is extremely fast in
the adjacency matrix representation, but we have to reserve space for every possible link between
all vertices (V × V), so it requires more space.
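As a concrete sketch, here is how such a matrix can be built in Python for an undirected graph on V = {0, 1, 2, 3}; the edge list here is illustrative, since the figure's exact edge set is not reproduced in these notes:

```python
V = 4                                      # number of vertices
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]   # assumed undirected edge set

# V x V matrix of 0s; matrix[i][j] == 1 means there is an edge between i and j.
matrix = [[0] * V for _ in range(V)]
for i, j in edges:
    matrix[i][j] = 1
    matrix[j][i] = 1   # undirected: mark (j, i) too, keeping the matrix symmetric
```

Edge lookup is then a single O(1) indexing operation, e.g. `matrix[0][2]`.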
Pros
Highly interpretable. It's symmetrical unless it's a directed graph, and you can neatly store edge
values in each matrix entry.
Decent adjacency complexity of O(G) = |V|. In order to find all vertices adjacent to v, we
need to scan its whole row of the adjacency matrix.
Cons
Terrible space complexity of Θ(G) = |V|². Here we are storing every possible vertex
permutation of length 2, including each vertex paired with itself. This is more than double the
maximum number of possible edges, (|V| choose 2).
2. Adjacency List
An adjacency list is efficient in terms of storage because we only need to store the values for the
edges. For a graph with millions of vertices, this can mean a lot of saved space.
Pros:
Space complexity of Θ(G) = |V| + 2|E|. We have a list for every vertex, and in total these lists
store 2|E| elements, since each edge appears in the lists of both of its endpoints.
Great adjacency complexity. For a given vertex v, O(G) is equal to d(v), the degree of v.
When looking for all adjacent neighbours this is in fact the best possible value.
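A Python sketch of an adjacency list for an assumed undirected graph on four vertices (the edge set is illustrative):

```python
from collections import defaultdict

def build_adjacency_list(edges):
    """Map each vertex to the list of its neighbours."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)   # each edge appears in both endpoints' lists,
        adj[v].append(u)   # so total storage is |V| lists holding 2|E| entries
    return adj

adj = build_adjacency_list([(0, 1), (0, 2), (0, 3), (1, 2)])
# The neighbours of a vertex v are read off in O(d(v)) time, e.g. adj[0].
```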
Optimal Representation
Choosing the optimal data structure to represent a given graph G is actually dependent on the
density of edges within G. This can be roughly summarized as follows.
If |E| ≈ |V|, i.e. there are about as many edges as there are vertices, then G is
considered Sparse and an adjacency list is preferred; if |E| ≈ |V|², then G is considered Dense
and an adjacency matrix is preferred.
Graph Operations
The depth first search (DFS) algorithm starts with the initial node of the graph G and goes
deeper and deeper until we find the goal node or a node which has no children. The algorithm
then backtracks from the dead end towards the most recent node that is not yet completely
explored.
The data structure used in DFS is a stack. The process is similar to the BFS algorithm.
In DFS, the edges that lead to an unvisited node are called discovery edges, while the edges that
lead to an already visited node are called back edges.
Algorithm
1. STACK : H
Pop the top element of the stack, i.e. H, print it and push all the neighbours of H onto the stack
that are in ready state.
1. Print H
2. STACK : A
Pop the top element of the stack i.e. A, print it and push all the neighbours of A onto the stack that
are in ready state.
1. Print A
2. Stack : B, D
Pop the top element of the stack i.e. D, print it and push all the neighbours of D onto the stack that
are in ready state.
1. Print D
2. Stack : B, F
Pop the top element of the stack i.e. F, print it and push all the neighbours of F onto the stack that
are in ready state.
1. Print F
2. Stack : B
Pop the top of the stack i.e. B and push all the neighbours
1. Print B
2. Stack : C
Pop the top of the stack i.e. C and push all the neighbours.
1. Print C
2. Stack : E, G
Pop the top of the stack i.e. G and push all its neighbours.
1. Print G
2. Stack : E
Pop the top of the stack i.e. E and push all its neighbours.
1. Print E
2. Stack :
Hence, the stack now becomes empty and all the nodes of the graph have been traversed.
1. H → A → D → F → B → C → G → E
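The stack discipline above can be sketched in Python. The graph here is an assumed small example rather than the figure from the notes, since the exact visiting order depends on the order in which neighbours are pushed:

```python
def dfs(graph, start):
    """Iterative DFS with an explicit stack; each popped node is
    recorded once and its unvisited neighbours are pushed."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()          # pop the top element of the stack
        if node in visited:
            continue                # already printed: skip it
        visited.add(node)
        order.append(node)          # "print" the node
        for nb in graph[node]:      # push neighbours still in ready state
            if nb not in visited:
                stack.append(nb)
    return order

# Assumed example graph (adjacency lists):
graph = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
```

`dfs(graph, 'A')` visits every node reachable from A exactly once, backtracking via the stack whenever it hits a dead end.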
Breadth first search is a graph traversal algorithm that starts traversing the graph from the root
node and explores all the neighbouring nodes. Then it selects the nearest node and explores all
the unexplored nodes. The algorithm follows the same process for each of the nearest nodes until
it finds the goal.
The algorithm of breadth first search is given below. The algorithm starts by examining the node
A and all of its neighbours. In the next step, the neighbours of the nearest node of A are explored,
and the process continues in the further steps. The algorithm explores all neighbours of all the
nodes and ensures that each node is visited exactly once and no node is visited twice.
Algorithm
Example
Consider the graph G shown in the following image, calculate the minimum path p from node A
to node E. Given that each edge has a length of 1.
Solution:
The minimum path P can be found by applying the breadth first search algorithm, beginning at
node A and ending at E. The algorithm uses two queues,
namely QUEUE1 and QUEUE2. QUEUE1 holds all the nodes that are to be processed,
while QUEUE2 holds all the nodes that are processed and deleted from QUEUE1.
1. Initially:
1. QUEUE1 = {A}
2. QUEUE2 = {NULL}
2. Delete the node A from QUEUE1 and insert all its neighbours. Insert node A into QUEUE2.
1. QUEUE1 = {B, D}
2. QUEUE2 = {A}
3. Delete the node B from QUEUE1 and insert all its neighbours. Insert node B into QUEUE2.
1. QUEUE1 = {D, C, F}
2. QUEUE2 = {A, B}
4. Delete the node D from QUEUE1 and insert all its neighbours. Since F, its only remaining
neighbour, has already been inserted, we will not insert it again. Insert node D into QUEUE2.
1. QUEUE1 = {C, F}
2. QUEUE2 = { A, B, D}
5. Delete the node C from QUEUE1 and insert all its neighbours. Add node C to QUEUE2.
1. QUEUE1 = {F, E, G}
2. QUEUE2 = {A, B, D, C}
6. Remove F from QUEUE1 and add all its neighbours. Since all of its neighbours have already
been added, we will not add them again. Add node F to QUEUE2.
1. QUEUE1 = {E, G}
2. QUEUE2 = {A, B, D, C, F}
7. Remove E from QUEUE1; all of E's neighbours have already been added to QUEUE1, therefore
we will not add them again.
1. QUEUE1 = {G}
2. QUEUE2 = {A, B, D, C, F, E}
8. Remove G from QUEUE1; all of G's neighbours have already been added to QUEUE1, therefore
we will not add them again. All the nodes are visited.
1. QUEUE1 = {}
2. QUEUE2 = {A, B, D, C, F, E, G}
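A Python sketch of this two-queue bookkeeping. The edge lists below are reconstructed from the steps above (the original figure is not reproduced) and yield the same processing order:

```python
from collections import deque

def bfs(graph, start):
    """BFS with the two-queue bookkeeping used above: QUEUE1 holds
    nodes waiting to be processed, QUEUE2 the processed order."""
    queue1 = deque([start])
    queue2 = []
    seen = {start}                  # nodes already placed in QUEUE1
    while queue1:
        node = queue1.popleft()     # delete the node from QUEUE1...
        queue2.append(node)         # ...and insert it into QUEUE2
        for nb in graph[node]:
            if nb not in seen:      # never insert a node twice
                seen.add(nb)
                queue1.append(nb)
    return queue2

# Edges reconstructed from the worked example above:
graph = {
    'A': ['B', 'D'], 'B': ['A', 'C', 'F'], 'C': ['B', 'E', 'G'],
    'D': ['A', 'F'], 'E': ['C', 'F'], 'F': ['B', 'D', 'E'], 'G': ['C'],
}
```

`bfs(graph, 'A')` reproduces the final QUEUE2 contents, A, B, D, C, F, E, G.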