Insertion Sort Vs Merge Sort in Matlab

IEEE format paper about performance and worst time case complexity comparison of Insertion sort vs Merge Sort using Matlab

Uploaded by

kevindar

Sorting Algorithm Comparison and Runtime Analysis on Matlab, Merge Sort and Insertion Sort


Kevin Darmawan
1806148744
Department of Electrical Engineering,
Computer Engineering
Universitas Indonesia
Jakarta, Indonesia
[email protected]

Abstract—This paper compares two algorithms of different types, recursive and iterative, represented by merge sort and insertion sort respectively, using Matlab to simulate the real-world task of sorting large amounts of data. Both algorithms are compared in terms of runtime and its relation to their time complexity, and the impact on performance, stability, and usability of these algorithms is discussed.

Keywords—algorithms, sorting algorithms, insertion sort, merge sort

I. INTRODUCTION

Sorting is the process of rearranging a set of data elements into a definite order; it can be accomplished by a mathematical approach or by computation. Sorting algorithms have a long history and a wide range of uses, from basic sorting tasks to complex search engine algorithms, and they are used throughout modern technology. Most programs choose a sorting algorithm according to their particular purpose so that it runs as fast as possible even in the worst situation. A fast and efficient algorithm that suits a particular use case can save a significant amount of time, especially when sorting thousands to millions of data items.

A considerable number of factors affect the performance of a sorting algorithm and may be considered when choosing one. These factors range from code complexity, which determines the algorithm's time complexity, to effective memory usage and even the computer hardware. It is practically impossible for one algorithm to cover every performance weakness; therefore, different algorithms are used for different constraints.

Htwe Htwe Aung [5] in 2019 analyzed and compared the efficiency of well-known sorting algorithms. Aung split the algorithms by type, iterative and recursive, and compared them according to time complexity: $O(n^2)$, which includes bubble, selection, and insertion sort, and $O(n \log n)$, which includes heap, merge, and quick sort. The results showed that the algorithms with $O(n \log n)$ time complexity are significantly faster than those with $O(n^2)$ time complexity.

Vignesh R. and Tribikram Pradhan [4] in 2016 created a modified merge sort algorithm with $O(n)$ best-case and $O(n \log n)$ worst-case time complexity. They found that their algorithm is faster than the normal merge sort but is also unstable and slows down for very large values of n; it may be improved by reducing the number of iterations and increasing the number of pivots for stability.

Song Qin [2] in 2008 evaluated the $O(n \log n)$ time complexity of the merge sort algorithm theoretically and empirically and compared it with insertion sort, which has quadratic time complexity. The results showed that merge sort is slightly faster than insertion sort when n is small and becomes rapidly faster than insertion sort for very large n.

In this paper we compare two sorting algorithms with different types of operation, recursive and iterative, to analyze the efficiency of the two in handling very large n. The fastest of the two groups tested in [5] are merge sort and insertion sort; they are therefore taken as samples of the recursive and iterative types of sorting algorithm, and also because of their stability [2] and room for improvement [4]. Matlab is used to simulate the sorting algorithms in order to present a more realistic scenario for real-life usage, which also gives a more realistic comparison and efficiency analysis.

This paper is organized as follows. Section II discusses the basic theory of the sorting algorithms used, merge sort and insertion sort; it also contains the pseudocode for both algorithms and the worst-case analysis (O) of the two. Section III presents the runtime comparison of both algorithms, including the comparison graph and table, and relates the pseudocode analysis to the actual runtime of the algorithms, breaking down the algorithms to find a more efficient one. The final section draws the conclusions and outlines future improvements to the paper.

II. SORTING ALGORITHM THEORY

A sorting algorithm rearranges the elements of an array into a new order, in this case ascending.

A. Merge Sort algorithm

Merge sort is a divide-and-conquer sorting algorithm: it divides a problem into smaller subproblems, solves them, and then combines the solutions of the subproblems to solve the main one. Divide and conquer involves three steps:

Divide: divide the array into two parts; if the number of elements is odd, the extra element goes to the first part. Keep dividing until reaching the base case.

Conquer: sort the two base-case subarrays.

Combine: combine the sorted subarrays, creating one sorted array from two subarrays; keep combining until the main sorted array is assembled.

Figure 1 illustrates the merge sort algorithm as a binary tree.

To determine the worst-case running time of the merge sort algorithm, we use the binary tree in Figure 1. Dividing the array in half repeatedly can be represented by the base-2 logarithm lg n, so the tree has at most lg n + 1 levels, and each level of the tree adds cn time, because merging the subarrays on one level handles all n elements of the original array. Therefore, the running time of the algorithm is:
$$T(n) = cn \lg n + cn \tag{1}$$

By the hierarchy of growth rates, we ignore the constant factor and the lower-order term and get:

$$T(n) = O(n \lg n) \tag{2}$$

Because merge sort always divides the array into two halves and merges the two halves in linear time regardless of the input order, the worst-case, best-case, and average-case time complexities are all the same, $O(n \lg n)$.
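The divide, conquer, and combine steps and the cost argument behind (1) can be checked with a small sketch. The following is an illustrative Python version, not the Matlab code used in the experiment; the function name `merge_sort` and the write counter are assumptions introduced here. It counts how many elements are written while merging, which for n a power of two is exactly n per level over lg n merging levels.

```python
import math


def merge_sort(values):
    """Return (sorted copy of values, number of element writes made while merging)."""
    n = len(values)
    if n <= 1:                      # base case: a single element is already sorted
        return list(values), 0
    mid = (n + 1) // 2              # divide: an odd leftover element goes to the first half
    left, left_writes = merge_sort(values[:mid])
    right, right_writes = merge_sort(values[mid:])

    merged = []                     # combine: merge the two sorted halves
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # <= keeps equal elements in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])         # copy whatever remains of either half
    merged.extend(right[j:])
    return merged, left_writes + right_writes + n


if __name__ == "__main__":
    for n in (8, 1024):
        data = list(range(n, 0, -1))            # reverse-ordered input
        result, writes = merge_sort(data)
        assert result == sorted(data)
        print(n, writes, n * int(math.log2(n)))  # the last two numbers agree
```

The count depends only on n and not on the order of the input, which is the reason the best, average, and worst cases coincide. The extra cn term in (1) corresponds to the level of single-element base cases, which this sketch does not charge for.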
Figure 1 – Merge sort binary tree

Array A of 6 elements is divided into two subarrays of 3 elements, which are divided again until we have subarrays of 1 element. These are then sorted and combined, then sorted again with the next subarray and combined, until the final sorted array is assembled.

Merge Sort Algorithm pseudocode

Require: data array A
MERGE-SORT(A, left, right)
    if left < right
        mid ← left + floor((right - left) / 2)
        MERGE-SORT(A, left, mid)
        MERGE-SORT(A, mid + 1, right)
        MERGE(A, left, mid + 1, right)
end func

MERGE(A, l, h, ub)    // A[l..h-1] and A[h..ub] are sorted; T is a temporary array
    j ← 0
    lb ← l
    mid ← h - 1
    n ← ub - lb + 1
    while (l <= mid && h <= ub)
        if (A[l] <= A[h])    // <= keeps equal elements in order (stable)
            T[j++] ← A[l++]
        else
            T[j++] ← A[h++]
    while (l <= mid)
        T[j++] ← A[l++]
    while (h <= ub)
        T[j++] ← A[h++]
    for (j = 0; j < n; j++)
        A[lb + j] ← T[j]
end func

B. Insertion sort algorithm

The insertion sort algorithm works similarly to the way we sort playing cards in our hand. The way insertion sort works is illustrated in Figure 2.

Figure 2 – Insertion sort illustration

We assume that the first element of the array is already sorted, so its position does not change, and store the second element in the variable key. The key is then compared with the element before it, which the first time is the first element. If key is less than that element, key is inserted before it; otherwise key stays after it. Next, key is set to the following element of the array and compared with the elements before it in the same way, and the process repeats until the entire array is sorted.

Insertion Sort Algorithm pseudocode

Require: array A of size n
1. for j ← 2 to n
2.     key ← A[j]
3.     // insert A[j] into the sorted sequence A[1..j-1]
4.     i ← j-1
5.     while i > 0 and A[i] > key
6.         A[i+1] ← A[i]
7.         i ← i-1
8.     A[i+1] ← key
9. end func

To determine the worst-case time complexity, we assume that the given array is in reverse order, which means the program must compare every element A[j] with every element that comes before it, starting from A[j-1], and it does so n-1 times until all elements have been sorted. The worst-case running time can therefore be written as the following equation [3], where $c_i$ is the cost of line $i$ of the pseudocode and $t_j$ is the number of times the while-loop test is executed for that value of $j$:

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1) \tag{1}$$

When all elements are in reverse order, $t_j = j$ and the sums become:

$$\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1 \tag{2}$$

$$\sum_{j=2}^{n} (j-1) = \frac{n(n-1)}{2} \tag{3}$$

Substituting (2) and (3) into (1), we get:

$$T(n) = \left(\frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2}\right) n^2 + \left(c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8\right) n - (c_2 + c_4 + c_5 + c_8) \tag{4}$$

The worst-case running time can thus be expressed as $an^2 + bn + c$ for constants a, b, and c; by the hierarchy of growth rates, the worst-case time complexity is:

$$T(n) = an^2 + bn + c = O(n^2) \tag{5}$$

III. EXPERIMENT AND ANALYSIS

The experiment measures the runtime of each sorting algorithm. We then compare the runtimes of both algorithms and analyze the results against the worst-case running time derived from the pseudocode.

A. Machine Specification and Conditioning

The machine used for the experiment is a laptop with the following specification:
1) CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz (12 CPUs), up to 4.10GHz
2) RAM: DDR4 8GB single channel
3) GPU: GTX 1050ti 4GB mobile
4) OS: Windows 10 Home
5) Matlab version: R2016a

For testing, the algorithms are written in Matlab, which is used both to create the random datasets and to implement the sorting functions, and they are run and timed within the Matlab environment. The machine is kept in the same condition for all tests to ensure valid results: Internet connection off, no background programs, Windows Update and other updates off, high performance mode with overboost on, and external cooling at the maximum.

B. Experiment

The experiment is conducted by creating 5 arrays as variables in Matlab, each consisting of integers ranging from 1 to 100, with the size n of the array being 100, 1000, 10 000, 100 000, and 1 000 000. The 5 arrays are then processed by Matlab functions that implement the sorting algorithms given above; the time is recorded and plotted against the increase of n. Below are the results of the test:

TABLE I. ALGORITHM RUNTIME COMPARISON

Elements (n)    Merge sort (s)    Insertion sort (s)
100             0.008             0.003
1000            0.014             0.027
10 000          0.066             2.739
100 000         0.686             231.64
1 000 000       7.986             > 5000

[Graph: RUNTIME TO DATASET SIZE. x-axis: n (100 to 1 000 000); y-axis: time (s); series: merge sort, insertion sort]

C. Analysis

According to the graph of runtime against dataset size, insertion sort has a major weakness when it comes to very large datasets. The insertion sort runtime is still below the merge sort runtime for the smallest dataset (n = 100). As the dataset size increases further, insertion sort runs slower than merge sort and takes a major leap in time for arrays larger than 100 000 elements, with a runtime of roughly 1.5 hours. Although the values are limited to the range 1 to 100, the runtime still grows in line with the worst-case time complexity of the algorithm, and the insertion sort process went from bad to worse.

The runtime of the merge sort algorithm increases almost linearly as the array gets larger, which agrees with the time complexity of merge sort, $O(n \lg n)$. The runtime of insertion sort, whose worst-case time complexity is $O(n^2)$, grows drastically as the array size increases; the graph starts to skew at $10^5$ elements and continues to steepen quadratically. The major difference comes from how the algorithms work: the recursive merge sort does not need to go through all data multiple times; instead it breaks them into manageable subarrays, while the iterative insertion sort goes through each element and moves it to its correct position. Therefore we can say that overall merge sort, of the recursive type, performs better than insertion sort, of the iterative type.

Admittedly, the experiment was not fully finished, because the process made the system unstable and it is better to avoid damage to the computer. One reason the runs took so long is that Matlab is not truly a compiler. It does not produce executable machine code as compilers do; instead it interprets and optimizes the code the user enters, and it is designed to solve many kinds of mathematical problems. This paper is written for the purpose of comparing and analyzing sorting algorithms, and also to show the effect of both algorithms when they are used to solve a real-world problem with Matlab. Although simpler tools such as Excel are available, the goal is to implement algorithms, and since algorithms are not always code but steps to solve problems, I feel Matlab is best suited for the task.

D. Conclusion and future improvement

This comparison concludes that each algorithm has its own constraints: merge sort is fast and stable for sorting very large datasets, while insertion sort slows down heavily beyond a certain point. The runtime of the algorithms is closely tied to their worst-case time complexity and may become even worse with an unsuitable implementation, such as the choice of language and compiler.

Future improvements can be made to enhance the performance of the program, such as minimizing loops and jumps in the implementation. When performance and speed are the priority, other programming languages offer more lightweight execution and better performance while still being based on the same pseudocode.

REFERENCES

[1] V. S. Adamchik, Sorting. Carnegie Mellon University, [online document], 2009. Available: Carnegie Mellon University Computer Science Online, https://www.cs.cmu.edu/~adamchik/15-121/lectures/Sorting%20Algorithms/sorting.html [Accessed: Mar. 22, 2020].
[2] S. Qin, Merge Sort Algorithm. Florida Institute of Technology, [online document], 2008. Available: Semantic Scholar, https://www.semanticscholar.org/paper/Merge-Sort-Algorithm-Qin/6804987ab63d1879aa55ba68224dced142ce8774 [Accessed: Mar. 22, 2020].
[3] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. Cambridge, MA: MIT Press, 2009.
[4] R. Vignesh and P. Tribikram, Merge Sort Enhanced in Place Sorting Algorithm. Manipal Institute of Technology, [online document], 2016. Available: researchgate.net, https://www.researchgate.net/publication/312963714_Merge_sort_enhanced_in_place_sorting_algorithm [Accessed: Mar. 22, 2020].
[5] H. Aung, "Analysis and Comparative of Sorting Algorithms". Journal of Trend in Scientific Research and Development, vol. 3, issue 5, August 2019. [online serial]. Available: https://www.academia.edu/40250937/Analysis_and_Comparative_of_Sorting_Algorithms [Accessed: Mar. 22, 2020].
[6] Parewa Labs, "Merge Sort". [online]. Available: https://www.programiz.com/dsa/merge-sort [Accessed: Mar. 22, 2020].
[7] "Merge Sort Algorithm". [online]. Available: https://www.studytonight.com/data-structures/merge-sort [Accessed: Mar. 22, 2020].
[8] Parewa Labs, "Insertion Sort". [online]. Available: https://www.programiz.com/dsa/insertion-sort [Accessed: Mar. 22, 2020].
[9] "Insertion Sort Algorithm". [online]. Available: https://www.studytonight.com/data-structures/insertion-sort [Accessed: Mar. 22, 2020].
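
The insertion sort pseudocode of Section II-B and its quadratic worst case can also be sketched in code. The following is an illustrative Python version, not the Matlab code used in the experiment; the function name `insertion_sort` and the shift counter are assumptions introduced here. It counts how often an element is shifted one place to the right, which for a strictly decreasing input of n elements is n(n-1)/2, the sum that appears in the worst-case analysis. The small timing loop mirrors the setup of Section III-B (random integers from 1 to 100) at reduced sizes; absolute times depend on the machine.

```python
import random
import time


def insertion_sort(values):
    """Return (sorted copy of values, number of element shifts performed)."""
    a = list(values)
    shifts = 0
    for j in range(1, len(a)):          # a[0..j-1] is already sorted
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:    # shift larger elements one place right
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = key                  # drop key into the gap
    return a, shifts


if __name__ == "__main__":
    # Worst case: a strictly decreasing input needs n(n-1)/2 shifts.
    for n in (10, 100, 1000):
        result, shifts = insertion_sort(range(n, 0, -1))
        assert result == list(range(1, n + 1))
        assert shifts == n * (n - 1) // 2

    # Rough timing on random integers from 1 to 100.
    random.seed(0)
    for n in (100, 1000):
        data = [random.randint(1, 100) for _ in range(n)]
        start = time.perf_counter()
        insertion_sort(data)
        print(n, time.perf_counter() - start)
```

Because the number of shifts equals the number of out-of-order pairs in the input, a tenfold increase in n raises the worst-case work about a hundredfold, which matches the jump between the 10 000 and 100 000 rows of Table I.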
