Data Structures - Unit I
Topics:
1. Definition of Data Structures
2. Classification of Data Structures
3. Abstract Data Type(ADT)
4. Analysis of Algorithms
5. Recursion - Examples
6. Analysis of Recursive Algorithms
7. Sorting:
- Quick Sort
- Merge Sort
- Selection Sort
- Radix Sort
8. Comparison of Sorting Algorithms
Data structure:
In the modern world, data and information are essential. Data is a collection of facts or values in a particular format, and information is the processed data.
If the data is not organized effectively, it is very difficult to perform any task on a large amount of data; if it is organized effectively, then any operation can be performed on it easily.
If the data is stored in a well-organized way on storage media and in the computer's memory, it can be accessed quickly for processing, which reduces latency and gives the user a fast response.
A data structure is a particular way of organizing a large amount of data more efficiently in a
computer so that any operation on that data becomes easy.
In other words, Data structure is a way of collecting and organizing data in such a way that
we can perform operations on these data in an effective way.
Data structures are about arranging data elements in terms of some relationship, for better performance, organization and storage.
The logical or mathematical model for a particular organization of data is termed as a data
structure.
Data Structure is useful in representing the real world data and its operations in the
computer program.
Based on how the data is organized, data structures are divided into two types: primitive data structures and non-primitive data structures.
Primitive data structures are those which are the predefined way of storing data by the
system. And the set of operations that can be performed on these data are also predefined.
Most of the programming languages have built-in support for the primitive data structures.
Non-primitive data structures are more complicated data structures and they are derived
from primitive data structures.
Non-primitive data structures are used to store large and connected data. Some examples of non-primitive data structures are: Linked List, Tree, Graph, Stack and Queue.
The non-primitive data structures are subcategorized into two types: linear data structures and non-linear data structures.
If a data structure is organizing the data in sequential order then that data structure is called
a linear data structure.
Some of the examples are Arrays, Linked Lists, Stacks and Queues.
If a data structure organizes the data in a random or hierarchical order rather than sequentially, it is called a non-linear data structure.
Introduction to Algorithms
● An Algorithm is a finite set of instructions or logic, written in order, to accomplish a
certain predefined task.
● An Algorithm is independent of the programming language. An Algorithm is the core
logic to solve a given problem.
● An Algorithm is generally expressed as a flowchart or as an informal high-level description called pseudocode.
● Algorithm can be defined as “a sequence of steps to be performed for getting the
desired output for a given input.”
● Before attempting to write an algorithm, one should find out what the expected
inputs and outputs are for the given problem.
An algorithm is said to be efficient and fast, if it takes less time to execute and consumes
less memory space.
For any algorithm to execute it requires memory for the following purposes:
1. Memory required to store the program instructions. (Also called as Instruction space)
2. Memory required to store constants and variables. (Also called as Data space )
3. Memory that is to be dynamically allocated. Memory that is required for storing data
between functions. (Also called as Environment space)
An algorithm may require inputs, variables, and constants of different data types.
For calculating the space complexity, we need to know the amount of memory used by
variables of different data types, which generally varies for different operating systems.
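The code being analyzed is not reproduced in these notes; it was presumably a small function along the following lines, which takes two int parameters x and y and stores their sum in a local variable z (the 2-byte figure used below assumes a C-style int):
def add(x, y):
    # x, y and z are the only variables used, so the memory needed
    # does not depend on the values passed in.
    z = x + y
    return z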
The above code takes two inputs x and y of type int as formal parameters.
In the code, another local variable z of type int is used for storing the sum of x and y.
The int data type takes 2 bytes of memory, so the total space complexity is 3 (number of
variables) * 2 (Size of each variable) = 6 bytes.
The space requirement of this algorithm is fixed for any input given to the algorithm, hence it is called constant space complexity.
Let us understand the calculation of space complexity with one more example:
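Again, the code itself is not shown here; a sketch consistent with the description below (an array a of n ints plus variables x, n and i, presumably summing the array) is:
def array_sum(a, n):
    x = 0                  # x, n and i each take one int's worth of memory
    for i in range(n):
        x = x + a[i]       # the array a[] takes n ints' worth of memory
    return x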
In the above code, 2 * n bytes of memory (size of int data type is 2) is required by the array
a[ ] and 2 bytes of memory for each variable of x, n and i.
Hence the total space requirement for the above code would be (2 * n + 6).
If the space complexity of the program increases linearly with the size of the input n, it is called linear space complexity.
Similarly, when the memory requirement of the algorithm grows quadratically with the size of the input, it is called quadratic space complexity.
Similarly, when the memory requirement grows cubically with the size of the input, it is called cubic space complexity, and so on.
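For instance, a hypothetical function that allocates an n x n table needs memory proportional to n * n, which is an example of quadratic space complexity:
def build_table(n):
    # The table holds n * n entries, so the space used grows quadratically with n.
    table = [[0] * n for _ in range(n)]
    return table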
The total time required by the algorithm to run till its completion depends on the number of
(machine) instructions the algorithm executes during its running time.
The actual time taken differs from machine to machine as it depends on several factors like
CPU speed, memory access speed, OS, processor etc.
So, we will take the number of operations executed in the algorithm as the time complexity.
Thus, the amount of time taken and the number of elementary operations performed by the
algorithm differ by at most a constant factor.
Examples:
1.
Algorithm sum(A, n):             Time taken by each operation
    s = 0                        - 1 unit
    i = 0                        - 1 unit
    while (i < n):               - n+1 units
        s = s + A[i]             - n units
        i++                      - n units
    return s                     - 1 unit
Total: 3n + 4 units, i.e., O(n)
2. Algorithm add(A, B, n):
    for i in range(0, n):             - n units
        for j in range(0, n):         - n*n units
            C[i,j] = A[i,j] + B[i,j]  - n*n units
Total: n + 2n², i.e., O(n²)
3. Algorithm multiply(A, B, n):
    for i in range(0, n):                        - n
        for j in range(0, n):                    - n*n
            C[i,j] = 0                           - n*n
            for k in range(0, n):                - n*n*n
                C[i,j] = C[i,j] + A[i,k] * B[k,j]  - n*n*n
Total: n + 2n² + 2n³ ~ O(n³)
5. i = n
   while (i >= 1):
       i = i / 2
The loop divides i by 2 in each iteration, so it executes about log2(n) times, i.e., O(log n).
Ex1:
if a > b:
    return True
else:
    return False
Ex2:
def get_first(data):
    return data[0]

if __name__ == '__main__':
    data = [1, 2, 9, 8, 3, 4, 7, 6, 5]
    print(get_first(data))
Ex1:
for index in range(0, len(data), 3):
    print(data[index])
Ex2:
def binary_search(data, value):
    n = len(data)
    left = 0
    right = n - 1
    while left <= right:
        middle = (left + right) // 2
        if value < data[middle]:
            right = middle - 1
        elif value > data[middle]:
            left = middle + 1
        else:
            return middle
    raise ValueError('Value is not in the list')

if __name__ == '__main__':
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(binary_search(data, 8))
Ex1:
for value in data:
    print(value)
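The linear_search function used in the next block is not defined in these notes; a minimal sketch of it could be:
def linear_search(data, value):
    # Compare each element with the target until it is found: O(n) in the worst case.
    for index in range(len(data)):
        if data[index] == value:
            return index
    raise ValueError('Value is not in the list')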
if __name__ == '__main__':
    data = [1, 2, 9, 8, 3, 4, 7, 6, 5]
    print(linear_search(data, 7))
For example: for each value in the data1 (O(n)) use the binary search (O(log n)) to search the
same value in data2.
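A short sketch of this pattern, reusing the binary_search function from above (here data1 is any list and data2 is assumed to be a sorted list):
data1 = [9, 3, 7, 1, 5]
data2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for value in data1:                                  # O(n) iterations
    try:
        print(value, binary_search(data2, value))    # each lookup is O(log n)
    except ValueError:
        print(value, 'not found')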
For example:
for x in data:
    for y in data:
        print(x, y)
Asymptotic Notations
Asymptotic notations are a set of mathematical tools used in computer science to describe
and analyze the growth rate of functions. These notations allow us to simplify the
mathematical representation of the functions, making it easier to compare the relative
growth rates of different functions. Some of the most commonly used asymptotic notations are big O, little o, big Theta, big Omega, and little omega.
1. Big O notation (O(n)): This notation represents an upper bound on the growth rate of
a function. If f(n) = O(g(n)), it means that there exists a constant c > 0 and a constant
n0 such that for all n >= n0, f(n) <= c * g(n). This means that the function f grows at
most as fast as g, as n approaches infinity. For example, the function f(n) = n^2 is
O(n^2).
2. Little o notation (o(n)): This notation represents a strict upper bound on the growth rate of a function. If f(n) = o(g(n)), it means that for every constant c > 0 there exists a constant n0 such that for all n >= n0, f(n) < c * g(n). This means that the function f grows strictly slower than g, as n approaches infinity. For example, the function f(n) = n^2 is o(n^3).
3. Big Theta notation (Θ(n)): This notation represents a tight bound on the growth rate of a function. If f(n) = Θ(g(n)), it means that there exist constants c1, c2 > 0 and a constant n0 such that for all n >= n0, c1 * g(n) <= f(n) <= c2 * g(n). This means that the function f grows at the same rate as g, as n approaches infinity. For example, the function f(n) = n^2 is Θ(n^2).
4. Big Omega notation (Ω): Big-Omega (Ω) notation gives a lower bound for a function f(n) to within a constant factor. We write f(n) = Ω(g(n)) if there are positive constants n0 and c such that, to the right of n0, f(n) always lies on or above c*g(n). Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c*g(n) ≤ f(n) for all n ≥ n0 }
5. Little omega notation (ω(n)): This notation represents a strict lower bound on the growth rate of a function. If f(n) = ω(g(n)), it means that for every constant c > 0 there exists a constant n0 such that for all n >= n0, f(n) > c * g(n). This means that the function f grows strictly faster than g, as n approaches infinity. For example, the function f(n) = n^2 is ω(n).
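As a quick check of the Big O definition, take f(n) = 2n² + 5: choosing c = 3 and n0 = 3 gives 2n² + 5 ≤ 3n² for all n ≥ 3 (since n² ≥ 5 whenever n ≥ 3), so f(n) = O(n²).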
General Properties:
1. If f(n) is O(g(n)) then a*f(n) is also O(g(n)) ; where a is a constant.
Example:
f(n) = 2n²+5 is O(n²)
then 7*f(n) = 7(2n²+5)
= 14n²+35 is also O(n²)
3. Transitive Properties :
If f(n) is O(g(n)) and g(n) is O(h(n)) then f(n) = O(h(n)) .
Example: if f(n) = n , g(n) = n² and h(n)=n³
n is O(n²) and n² is O(n³) then n is O(n³)
Similarly this property satisfies for both Θ and Ω notation. We can say
If f(n) is Θ(g(n)) and g(n) is Θ(h(n)) then f(n) = Θ(h(n)) .
If f(n) is Ω (g(n)) and g(n) is Ω (h(n)) then f(n) = Ω (h(n))
4. Symmetric Properties :
If f(n) is Θ(g(n)) then g(n) is Θ(f(n)) .
Example: f(n) = n² and g(n) = n² then f(n) = Θ(n²) and g(n) = Θ(n²)
This property only satisfies for Θ notation.
Recursion:
In programming, recursion is a technique where a function calls itself with different input
parameters until a specific condition is met.
Here are some key concepts related to recursion :
Base case: The base case is the simplest version of the problem that can be solved without
further recursion. It serves as the stopping condition for the recursive algorithm. Without a
base case, the recursion will continue indefinitely and lead to a stack overflow error.
Tail recursion:
● Tail recursion is a technique used in recursive functions where the last operation
performed in the recursive function is the recursive call itself.
● In other words, the function call is made at the end of the function, and there is no
pending operation after the call.
● In a traditional recursive function, each recursive call adds a new stack frame to the
call stack, which can lead to stack overflow errors if the recursion depth is too deep.
● However, in a tail-recursive function, the compiler can optimize the code so that the
function does not create a new stack frame for each recursive call, but instead reuses
the existing stack frame.
● This optimization is possible because the function's final operation is the recursive
call, so there is no need to keep the previous stack frames in memory.
● By reusing the same stack frame, the function can avoid stack overflow errors and
improve its performance.
Here is the tail-recursive version of the function that computes the factorial of a number:
def factorial(n):
    return factorial_tail(n, 1)
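The helper factorial_tail is referred to but not defined above; a minimal sketch consistent with the description that follows is (note that CPython does not actually perform tail-call optimization, so in Python this remains a conceptual illustration):
def factorial_tail(n, acc):
    # Base case: when n reaches 0, acc already holds the final result.
    if n == 0:
        return acc
    # The recursive call is the last operation performed (a tail call).
    return factorial_tail(n - 1, acc * n)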
In the tail-recursive version, the function factorial_tail takes two arguments: n, which
represents the number to compute the factorial of, and acc, which represents the
accumulator that holds the intermediate result of the computation. The function starts with
an initial value of acc=1 and multiplies it by n in each recursive call. When n reaches 0, the function returns acc, which holds the final result.
The initial call: The program starts by calling the recursive function with some initial input.
Base case: The recursive function checks if the input meets some base case condition, which
is a simple case that can be solved directly without recursion. The function returns a result if
the input meets the base case condition.
Recursive case: If the input does not meet the base case condition, the function breaks down
the problem into smaller subproblems, which are solved by calling the function recursively
with the subproblems as input.
Backtracking: Once the function has solved the subproblems, it combines their results to
solve the original problem and returns the result to the caller.
The Call Stack:The call stack is a data structure used by the computer to keep track of the
order in which function calls are made and their corresponding variables and parameters.
In recursive programs, the call stack plays an important role in keeping track of the state of
the program as it executes the recursive function calls.
When a recursive function is called, a new frame is added to the call stack to keep track of
the function's local variables and parameters.
The function then proceeds to call itself recursively with a smaller subproblem.
Each recursive call adds a new frame to the call stack, pushing the previous frames down.
As the function calls continue, the call stack grows taller and taller until a base case is
reached.
At this point, the recursive function returns a value, and the computer begins to unwind the
call stack, popping the frames off the stack one by one.
As each frame is popped off the stack, the function returns to the state it was in before the
recursive call, with its original variables and parameters.
def factorial(n):
    if n == 1:
        return 1
    return n * factorial(n - 1)
result = factorial(5)
print(result)
The initial call: factorial(5) is called, and a new frame is added to the call stack.
Recursive call 1: factorial(4) is called, and a new frame is added to the call stack.
Recursive call 2: factorial(3) is called, and a new frame is added to the call stack.
Recursive call 3: factorial(2) is called, and a new frame is added to the call stack.
Recursive call 4: factorial(1) is called, and a new frame is added to the call stack.
Base case: factorial(1) returns 1, and its frame is popped off the call stack.
Backtracking 1: factorial(2) returns 2*1=2, and its frame is popped off the call stack.
Backtracking 2: factorial(3) returns 3*2=6, and its frame is popped off the call stack.
Backtracking 3: factorial(4) returns 4*6=24, and its frame is popped off the call stack.
Backtracking 4: factorial(5) returns 5*24=120, and its frame is popped off the call stack.
So the final result is 120, and the call stack is empty.
T(n): denotes the number of comparisons incurred by an algorithm on an input of size 'n'.
Let T(n) represent the time it takes to perform a linear search on an array of size n.
If the element is found at the first position (i.e., the best-case scenario), it would take 1
comparison to find the element. Therefore, T(1) = 1.( Best Case)
In the worst-case scenario, you may have to search through the entire array. If the element is
not present, you will perform n comparisons. Therefore, T(n) = n.
In the average case, you can assume that the element you are searching for is equally likely
to be at any position in the array. So, on average, you will search half of the array, which
would be (n/2) comparisons.
T(n) = T(n-1) + 1
This recurrence relation states that the time it takes to perform a linear search on an array of
size n is equal to the time it takes to perform a linear search on an array of size (n-1) plus
one comparison.
The base cases for this recurrence relation are T(1) = 1 (best-case) and T(n) = n
(worst-case).
Solving this recurrence relation will give you the average-case time complexity of linear
search.
Let T(n) represent the number of comparisons or steps required to perform a binary search
on a sorted array of size n.
The binary search algorithm divides the array into two halves and checks the middle
element to determine whether the target value is in the left or right half of the array. This
leads to a recurrence relation:
T(n) = T(n/2) + 1
T(n) represents the total number of comparisons or steps required to perform a binary
search on an array of size n.
T(n/2) represents the number of comparisons required to perform a binary search on one
half of the array (either the left or right half).
1 represents the initial comparison to check the middle element.
The base case for this recurrence relation is T(1) = 1 because when you have only one
element, you've already found the target element (or determined that it's not present) with
a single comparison.
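Expanding this recurrence in the same style as the solutions below:
T(n) = T(n/2) + 1
     = T(n/4) + 2
     . . . . k times
     = T(n/2^k) + k
Assuming n/2^k = 1, i.e., k = log2(n), T(n) = T(1) + log2(n) ~ O(log n).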
T(n) represents the total number of comparisons or steps required to perform a factorial
search on a list of size n.
T(n-1) represents the number of comparisons required to perform a factorial search on the
list of size n-1, as you eliminate some permutations.
1 represents the comparison needed to check the current permutation with the target
permutation.
The base case for this recurrence relation is T(1) = 1 because, when you have only one
permutation left in the list, you've already found the target permutation with a single
comparison.
Sol:
Find T(n-1) and substitute it in the equation: T(n-1) = T(n-2) + 1, which when substituted into T(n) gives
T(n) = T(n-2) + 2
.
.
Repeating this substitution k times gives
T(n) = T(n-k) + k
Assuming n - k = 1 (the base case), k = n - 1, so T(n) = T(1) + (n - 1) = n, i.e., O(n).
Sol:
T(n) = n + T(n-1)
     = n + (n-1) + T(n-2)            [substitute T(n-1) = (n-1) + T(n-2)]
     = n + (n-1) + (n-2) + T(n-3)    [substitute T(n-2) = (n-2) + T(n-3)]
     . . . . k times
     = n + (n-1) + (n-2) + ... + (n-k+1) + T(n-k)
Assuming n - k = 1, T(n) = n + (n-1) + ... + 2 + T(1) ~ n(n+1)/2, i.e., O(n²).
Sol:
T(n) = T(n-1) + log(n)
     = T(n-2) + log(n-1) + log(n)             [substitute T(n-1) = T(n-2) + log(n-1)]
     = T(n-3) + log(n-2) + log(n-1) + log(n)  [substitute T(n-2) = T(n-3) + log(n-2)]
     . . . . k times
     = T(n-k) + log(n-(k-1)) + log(n-(k-2)) + ... + log(n)
     = T(0) + log(1) + log(2) + ... + log(n)  {Assume n-k = 0 => n = k}
     = 1 + log(1*2*3*...*(n-1)*n)
     = 1 + log(n!)
     ~ O(n log(n))
Sol:
T(n) = 2*T(n-1) + 1
     = 2*(2*T(n-2) + 1) + 1                         [substitute T(n-1) = 2*T(n-2) + 1]
     = 2²*T(n-2) + 2 + 1
     . . . . k times
     = 2^k * T(n-k) + 2^(k-1) + 2^(k-2) + ... + 2² + 2 + 1
     = 2^n * T(0) + 1 + 2 + ... + 2^(n-1)           {Assume n-k = 0 => n = k}
     = 2^n * 1 + 2^n - 1                            [since 1 + 2 + 2² + ... + 2^(k-1) = 2^k - 1]
     = 2^n + 2^n - 1
     = 2^(n+1) - 1
     ~ O(2^n)
5. T(n) = 1            if n = 1
        = n * T(n-1)   if n > 1
Sol:
T(n) = n * T(n-1)
     = n * (n-1) * T(n-2)
     . . . . n-1 steps
     = n * (n-1) * (n-2) * ... * (n-(n-2)) * T(n-(n-1))
     = n * (n-1) * (n-2) * ... * 2 * T(1)
     = n * (n-1) * (n-2) * ... * 2 * 1
     = n!, i.e., O(n!)
Let's see how the master theorem can also be applied to dividing functions.
1. T(n) = 8 * T(n/2) + n²
Sol:
a = 8, b = 2, f(n) = n²
T(n) = n^(log2 8) * u(n) = n³ * u(n)
h(n) = f(n) / n^(log2 8) = n² / n³ = 1/n, so u(n) = O(1) and T(n) ~ O(n³)
2. T(n) = T(n/2) + c
Sol:
a = 1, b = 2, f(n) = c
T(n) = n^(log2 1) * u(n) = n⁰ * u(n)
h(n) = c / n^(log2 1) = c / n⁰ = c / 1 = c
Since h(n) = c * (log2 n)⁰ [multiply by the term (log2 n)⁰], u(n) = (log2 n)^(0+1) / (0+1) = log2 n, so T(n) ~ O(log2 n)
Sorting:
- Merge Sort
- Quick Sort
- Selection Sort
- Radix Sort
Internal Sort
Sort algorithms that use main memory exclusively during the sort are called internal sorting
algorithms. This kind of algorithm assumes high-speed random access to all memory.
External Sort
Sorting algorithms that use external memory, such as tape or disk, during the sort come
under this category.
The phrase "external sorting" refers to a group of sorting algorithms that can process
enormous volumes of data. When the data being sorted must live in the slower external
memory because it will not fit in the computer's primary memory (often RAM), external
sorting is necessary(drive).
A hybrid sort-merge approach is frequently used for external sorting. During the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. During the merging phase, the sorted sub-files are merged into a single, larger file.
Example:
The external merge sort method divides the data into sorted chunks small enough to fit in RAM and then merges them. The file is first divided into runs of a manageable size so that each run fits into main memory. Each run is then sorted in main memory using merge sort. Once each run is sorted, the resulting runs are combined into gradually larger runs.
The process of combining two or more sorted lists or files into a third sorted list or file is
called merging.
Merge sort is a divide and conquer algorithm, it divides input array in two halves, calls itself
for the two halves and then merges the two sorted halves.
The merge(arr, low, mid, high) function is the key process: it assumes that arr[low..mid] and arr[mid+1..high] are sorted and merges the two sorted sub-arrays into one.
The following diagram shows the complete merge sort process for an example array {38, 27,
43, 3, 9, 82, 10}.
We can see that the array is recursively divided into two halves till the size becomes 1. Once the size becomes 1, the merge process comes into action and starts merging arrays back till the complete array is merged.
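The full function definitions are not included in these notes; a minimal sketch consistent with the calls mergeSort(arr, low, high) and merge(arr, low, mid, high) described above could be:
def merge(arr, low, mid, high):
    # Merge the sorted sub-arrays arr[low..mid] and arr[mid+1..high] into one.
    left = arr[low:mid + 1]
    right = arr[mid + 1:high + 1]
    i = j = 0
    k = low
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    while i < len(left):        # copy any remaining elements of the left half
        arr[k] = left[i]
        i += 1
        k += 1
    while j < len(right):       # copy any remaining elements of the right half
        arr[k] = right[j]
        j += 1
        k += 1

def mergeSort(arr, low, high):
    if low < high:
        mid = (low + high) // 2
        mergeSort(arr, low, mid)         # sort the left half
        mergeSort(arr, mid + 1, high)    # sort the right half
        merge(arr, low, mid, high)       # merge the two sorted halves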
arr = [38, 27, 43, 3, 9, 82, 10]
n = len(arr)
mergeSort(arr, 0, n - 1)
print("\n\nSorted array is")
for i in range(n):
    print("%d" % arr[i], end=" ")
Advantages:
1. Suitable for Large List
2. Merge Operation can be improved with linked list
3. Uses External Sorting
4. Stable: Keeps relative ordering of the elements
Disadvantages:
1. Requires additional O(n) memory for the merging step
2. For small lists, it is slower than simpler algorithms because of the recursion overhead
The recurrence relation for mergesort (in all cases) can be expressed as follows:
T(n) = 2 * T(n/2) + O(n)
T(n) represents the time it takes to sort an array of size 'n' using mergesort.
The term "2 * T(n/2)" represents the time required to recursively sort the two halves of the
array (each of size n/2) using mergesort. Since mergesort operates on two halves
independently, you can consider the time taken to sort each half separately.
The term "O(n)" represents the time required to merge the two sorted halves of the array
back together. The merging step takes linear time because it involves comparing and
combining elements from the two halves while preserving their order.
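Expanding this recurrence in the same style as the earlier examples (writing the merge cost as c*n):
T(n) = 2*T(n/2) + c*n
     = 4*T(n/4) + 2*c*n
     . . . . k times
     = 2^k * T(n/2^k) + k*c*n
Assuming n/2^k = 1, i.e., k = log2(n), T(n) = n*T(1) + c*n*log2(n) ~ O(n log n).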
QuickSort
Like merge sort, quick sort is a divide and conquer algorithm, where it picks an element as
pivot and partitions the given array around the picked pivot element.
There are many different versions of quick sort that pick pivot in different ways:
1. Always pick first element as pivot
2. Always pick last element as pivot
3. Pick a random element as pivot.
4. Pick median as pivot
The base case of the recursion is arrays of size zero or one, which are in order by definition,
so they never need to be sorted.
The quickSort() function partitions the list around a pivot and then recursively sorts the two parts.
The partition() is a key process, where given an array and an element x of array as pivot,
put x at its correct position in sorted array and put all smaller elements (smaller than x)
before x, and put all greater elements (greater than x) after x.
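The partitioning scheme traced in the example below (first element as pivot, with pointers L and H scanning from the two ends) can be sketched as:
def partition(data, low, high):
    pivot = data[low]          # first element taken as pivot
    left = low + 1
    right = high
    while True:
        while left <= right and data[left] <= pivot:    # move L right past smaller elements
            left += 1
        while left <= right and data[right] >= pivot:   # move H left past larger elements
            right -= 1
        if left > right:                                # pointers have crossed
            break
        data[left], data[right] = data[right], data[left]
    data[low], data[right] = data[right], data[low]     # place pivot in its final position
    return right

def quickSort(data, low, high):
    if low < high:
        p = partition(data, low, high)
        quickSort(data, low, p - 1)
        quickSort(data, p + 1, high)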
data = [5, 4, 2, 1, 3]
size = len(data)
quickSort(data, 0, size - 1)
Example:
Input: 5 4 2 1 3 (indices 0-4). P (pivot) = 5, the first element; L starts at index 1 and H at index 4.
4 < 5, so increment L
2 < 5, so increment L
1 < 5, so increment L
3 < 5, so increment L; L has now crossed H, so the pivot is swapped with the element at H:
3 4 2 1 5 (5 is now in its final position)
Partition the left part 3 4 2 1 with P = 3: L stops at 4 (4 > 3) and H stops at 1 (1 < 3), so they are swapped:
3 1 2 4 5
L and H then cross, so the pivot 3 is swapped with the element at H (2):
2 1 3 4 5 (3 is now in its final position)
Partition the left part 2 1 with P = 2: L and H cross immediately, and the pivot is swapped with 1:
1 2 3 4 5 (the array is sorted)
Recursive Tree:
Quicksort Advantages:
1. It is an in-place algorithm since it just requires a modest auxiliary stack.
2. Sorting n objects takes only O(n log n) time on average.
Quicksort Disadvantages:
1. It is a recursive process.
2. In the worst-case scenario, it takes quadratic (i.e., n²) time.
Best case: T(n) = 2 * T(n/2) + O(n)
1. T(n) represents the time it takes to sort an array of size 'n' using quicksort in the best case.
2. The term "2 * T(n/2)" represents the time required to recursively sort the two nearly
equal subarrays, each of size 'n/2'.
3. The term "O(n)" represents the time required for the partitioning step, where the pivot
ideally divides the array into two nearly equal subarrays.
Worst case: T(n) = T(n-1) + O(n)
1. T(n) represents the time it takes to sort an array of size 'n' using quicksort in the worst case.
2. The term "T(n-1)" represents the time required to sort the subarray of size 'n-1' that remains after the pivot element is placed in its final position.
3. The term "O(n)" represents the time required for the partitioning step.
The recursive relation for quicksort in the average case is typically more complex because it
depends on the probability distribution of pivot choices and how well-balanced the
partitions are during the sorting process. However, on average, quicksort can be analyzed
using the following probabilistic recursive relation:
T(n) = n + (1/n) * Σ (T(i) + T(n-i-1)), where the sum runs over i = 0 to n-1
T(n) represents the average time it takes to sort an array of size 'n' using quicksort.
The term "n" represents the time required for the partitioning step, where the pivot divides
the array into two subarrays.
Σ(T(i) + T(n-i-1)) is the sum of average times for sorting the left and right subarrays after
partitioning, considering all possible choices of the pivot.
Randomized Algorithm: A randomized version of quicksort picks the pivot uniformly at random, which makes the worst case unlikely for any particular input.
Radix Sort:
In radix sort, the given list of numbers is sorted based on the digits of the individual numbers. Sorting is performed from the least significant digit to the most significant digit.
The radix sort algorithm requires a number of passes equal to the number of digits in the largest number in the list.
Consider a list of numbers: 99,125,186, 34,67,12,43,78. Here, the biggest number is 186
and the number of digits in 186 is 3, so the number of passes required to sort all the
numbers is 3.
def countingSort(arr, exp):
    n = len(arr)
    output = [0] * n      # output array for this digit
    count = [0] * 10      # count of each digit 0-9

    # Store the count of occurrences of the digit at place value exp
    for i in range(n):
        index = (arr[i] // exp) % 10
        count[index] += 1

    # Change count[i] so that count[i] now contains actual position of this digit in
    # the output array
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Build the output array, scanning the input from right to left to keep the sort stable
    i = n - 1
    while i >= 0:
        index = (arr[i] // exp) % 10
        output[count[index] - 1] = arr[i]
        count[index] -= 1
        i -= 1

    # Copying the output array to arr[], so that arr now contains sorted numbers
    for i in range(0, len(arr)):
        arr[i] = output[i]

def radixSort(arr):
    max1 = max(arr)   # the number of digits in max1 decides the number of passes

    # Do counting sort for every digit. Note that instead of passing digit number, exp is
    # passed. exp is 10^i where i is current digit number
    exp = 1
    while max1 / exp >= 1:
        countingSort(arr, exp)
        exp *= 10

# Driver code
arr = [170, 45, 75, 90, 802, 24, 2, 66]
# Function Call
radixSort(arr)
print(arr)
Auxiliary Space:
Radix sort also has a space complexity of O(n + b), where n is the number of elements
and b is the base of the number system. This space complexity comes from the need to
create buckets for each digit value and to copy the elements back to the original array
after each digit has been sorted.
Input:
Input Array (indices 0-4): 001 453 246 123 089
The maximum number is 453 and the number of digits in the maximum number is 3, so we need 3 passes to sort the elements.
Pass 1 (unit place):
First create a count array which contains the frequency of each unit place value.
Scan the input from left to right:
- The first element is 001 and the unit place value is 1. Store 1 at index '1' in the count array.
- The second element is 453 and the unit place value is 3. Store 1 at index '3'.
- The third element is 246 and the unit place value is 6. Store 1 at index '6'.
- The fourth element is 123 and the unit place value is 3. Add 1 to the value at index '3'.
- The fifth element is 089 and the unit place value is 9. Store 1 at index '9'.
Count array (indices 0-9):            0 1 0 2 0 0 1 0 0 1
Cumulative count array (indices 0-9): 0 1 1 3 3 3 4 4 4 5
Now scan the input from right to left to place the elements according to their unit place values. For each element, go to its digit's index in the cumulative count array, decrement it by 1, and use the decremented value as the index into the output array.
- 089 (unit digit 9): decrement index '9' (5 -> 4) and store 089 at index 4 of the output array.
  Output array: _ _ _ _ 089          Cumulative count: 0 1 1 3 3 3 4 4 4 4
- 123 (unit digit 3): decrement index '3' (3 -> 2) and store 123 at index 2.
  Output array: _ _ 123 _ 089        Cumulative count: 0 1 1 2 3 3 4 4 4 4
- 246 (unit digit 6): decrement index '6' (4 -> 3) and store 246 at index 3.
  Output array: _ _ 123 246 089      Cumulative count: 0 1 1 2 3 3 3 4 4 4
- 453 (unit digit 3): decrement index '3' (2 -> 1) and store 453 at index 1.
  Output array: _ 453 123 246 089    Cumulative count: 0 1 1 1 3 3 3 4 4 4
- 001 (unit digit 1): decrement index '1' (1 -> 0) and store 001 at index 0.
  Output array: 001 453 123 246 089  Cumulative count: 0 0 1 1 3 3 3 4 4 4
After the first pass the elements are sorted on their unit place values.
Pass 2 (tens place):
Input Array (after Pass 1, indices 0-4): 001 453 123 246 089
Scan the input from left to right:
- The first element is 001 and the tens place value is 0. Store 1 at index '0' in the count array.
- The second element is 453 and the tens place value is 5. Store 1 at index '5'.
- The third element is 123 and the tens place value is 2. Store 1 at index '2'.
- The fourth element is 246 and the tens place value is 4. Store 1 at index '4'.
- The fifth element is 089 and the tens place value is 8. Store 1 at index '8'.
Count array (indices 0-9):            1 0 1 0 1 1 0 0 1 0
Cumulative count array (indices 0-9): 1 1 2 2 3 4 4 5 5 5
Now scan the input from right to left to place the elements according to their tens place values:
- 089 (tens digit 8): decrement index '8' (5 -> 4) and store 089 at index 4 of the output array.
  Output array: _ _ _ _ 089          Cumulative count: 1 1 2 2 3 4 4 4 5 5
- 246 (tens digit 4): decrement index '4' (3 -> 2) and store 246 at index 2.
  Output array: _ _ 246 _ 089        Cumulative count: 1 1 2 2 2 4 4 4 4 5
- 123 (tens digit 2): decrement index '2' (2 -> 1) and store 123 at index 1.
  Output array: _ 123 246 _ 089      Cumulative count: 1 1 1 2 2 4 4 4 4 5
- 453 (tens digit 5): decrement index '5' (4 -> 3) and store 453 at index 3.
  Output array: _ 123 246 453 089    Cumulative count: 1 1 1 2 2 3 4 4 4 5
- 001 (tens digit 0): decrement index '0' (1 -> 0) and store 001 at index 0.
  Output array: 001 123 246 453 089  Cumulative count: 0 1 1 2 2 3 4 4 4 5
After the second pass the elements are sorted on their tens place values.
Pass 3 (hundreds place):
Input Array (after Pass 2, indices 0-4): 001 123 246 453 089
The hundreds place values are 0, 1, 2, 4 and 0.
Count array (indices 0-9):            2 1 1 0 1 0 0 0 0 0
Cumulative count array (indices 0-9): 2 3 4 4 5 5 5 5 5 5
Scanning the input from right to left and placing each element as in the earlier passes gives the final result.
Sorted Array (indices 0-4): 001 089 123 246 453
Selection Sort
Selection sort process can be done in two ways, one is the largest element method and the
other is smallest element method.
The working procedure for selection sort largest element method is as follows:
1. Let us consider an array of n elements (i.e., a[n]) to be sorted.
2. In the first step, the largest element in the list is searched. Once the largest element is
found, it is exchanged with the element which is placed at the last position. This
completes the first pass.
3. In the next step, it searches for the second largest element in the remaining list and exchanges it with the element in the second-to-last position. This is done in the second pass.
4. This process is repeated for n - 1 passes to sort all the elements.
Let us consider an example of array numbers "80 10 50 20 40", and sort the array from
lowest number to greatest number using selection sort by the largest element
Pass - 1 :
( 80 10 50 20 40 ) -> ( 40 10 50 20 80 ) // Largest in 80 10 50 20 40 is 80 and it is exchanged with the element in the last position.
After completion of Pass - 1, the largest element is moved to the end of the array.
Now, Pass - 2 can find the next largest element without considering the last position element.
Pass - 2 :
( 40 10 50 20 80 ) -> ( 40 10 20 50 80 ) // Largest in 40 10 50 20 is 50 and it is replaced with
next last position of the array.
After completion of Pass - 2 the second largest element is moved to the second last position
of the array.
Now, Pass - 3 can find the next largest element with out considering the last two position
elements because they are already sorted.
Pass - 3 :
( 40 10 20 50 80 ) -> ( 20 10 40 50 80 ) // Largest in 40 10 20 is 40 and it is replaced with
next last position of the array.
After completion of Pass - 3 the third largest element is moved to the third last position of
the array.
Now, Pass - 4 can find the next largest element without considering the last three position elements because they are already sorted.
Pass - 4 :
( 20 10 40 50 80 ) -> ( 10 20 40 50 80 ) // Largest in 20 10 is 20 and it is replaced with next
last position of the array.
After completion of Pass - 4 all the elements of the array are sorted. So, the result is 10 20
40 50 80.
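A short sketch of the largest-element method described above (the function name is illustrative):
def selection_sort_largest(a):
    n = len(a)
    for last in range(n - 1, 0, -1):
        # Find the index of the largest element in the unsorted part a[0..last].
        max_index = 0
        for i in range(1, last + 1):
            if a[i] > a[max_index]:
                max_index = i
        # Exchange it with the element in the last unsorted position.
        a[max_index], a[last] = a[last], a[max_index]
    return a

print(selection_sort_largest([80, 10, 50, 20, 40]))   # [10, 20, 40, 50, 80]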
Selection sort is a simple comparison-based sorting algorithm that repeatedly selects the
minimum (or maximum) element from an unsorted portion of the array and moves it to its
correct position in the sorted portion. The recursive relation for selection sort is not
commonly used because selection sort is typically implemented as an iterative algorithm.
However, you can express its behavior in terms of a recursive relation:
T(n) = T(n-1) + O(n)
T(n) represents the time it takes to sort an array of size 'n' using selection sort in the worst case.
The term "T(n-1)" represents the time required to sort an array of size 'n-1', as you first find
the minimum element and place it in the correct position.
The term "O(n)" represents the time required to find the minimum element among the
remaining unsorted elements.
This recursive relation indicates that selection sort has a worst-case time complexity of
O(n^2) because, in each iteration, it needs to compare and potentially swap elements,
resulting in a nested loop structure.
However, it's important to note that selection sort is not typically implemented using
recursion due to its inefficiency compared to more efficient sorting algorithms like quicksort,
mergesort, or even insertion sort. In practice, selection sort is often implemented using
iterative loops for sorting small datasets where simplicity may be more important than
efficiency.
Stable sorting algorithms: A sorting algorithm is stable if it does not change the relative order of elements with equal values.
Stable sorting algorithms: Bubble, Insertion, Merge, Radix (when implemented with a stable counting sort, as above)
Non-stable sorting algorithms: Selection, Quick, Heap
Quicksort: Quicksort has an average-case time complexity of O(n log n), where 'n' is
the number of elements to be sorted. However, in the worst case (when the pivot
choice is poor), it can degrade to O(n^2). The worst-case behavior can be mitigated
with good pivot selection strategies.
Mergesort: Mergesort always has a time complexity of O(n log n), regardless of the
input data. This makes it a reliable choice when a guaranteed worst-case performance
is needed.
Radix Sort: Radix sort has a time complexity of O(k * n), where 'n' is the number of elements and 'k' is the number of digits in the maximum value. It is efficient when 'k' is relatively small compared to 'n'. However, it may not be suitable for sorting data with a wide range of values or floating-point numbers.
Selection Sort: Selection sort has a time complexity of O(n^2) in all cases. It is not
efficient for large datasets
Space Complexity:
Quicksort: Quicksort typically has a space complexity of O(log n)[best Case] and
O(n)[Worst Case] due to the recursion stack.
Selection Sort: Selection sort has a space complexity of O(1) because it sorts the array
in-place without requiring additional memory.
Mergesort: Mergesort is suitable for all types of data and is especially useful when
stability and guaranteed worst-case performance are important. It does not have the
best constant factors, so it may be less efficient for small datasets.
Radix Sort: Radix sort is ideal for sorting integers or fixed-length strings where 'k' is
small compared to 'n'. It's not suitable for sorting data with a wide range of values or
floating-point numbers.
Selection Sort: Selection sort is not recommended for sorting large datasets, and it is
not particularly suited for any specific type of data. It is mainly used for small datasets
where simplicity is preferred.