FPGA Based Hardware Accelerator For Sorting Data
2021 9th International Japan-Africa Conference on Electronics, Communications, and Computations (JAC-ECC) | 978-1-6654-8292-9/21/$31.00 ©2021 IEEE | DOI: 10.1109/JAC-ECC54461.2021.9691432
Abstract: Sorting data is one of the most important operations in data processing, and fast processing is urgently needed for real-time data access. Hardware accelerators are therefore used to speed up data processing. In this paper, we present FPGA-based hardware accelerators for data sorting using the bubble, selection, insertion and merge sorting algorithms. Further, we provide a fair comparison between them in terms of execution time and area. Our implementation results show that, for small data sets, merge sort is the best sorting algorithm in terms of execution time. Therefore, it can be used as a parallel cooperative system with a CPU for high-speed data processing.

Index Terms: Sorting Algorithms, FPGA, Hardware Accelerator.

I. INTRODUCTION

Sorting is a method for reordering a set of data in ascending or descending order. Sorting is an important operation because it is used in many systems such as image processing, binary search, real-time embedded applications, numerous computing systems and video compression. Indeed, much computer science and engineering research has focused on determining the proper algorithm for sorting a set of data [1]. There is therefore a need to optimize the sorting process to make searching, insertion and deletion easier. Further, the need arises to use hardware accelerators instead of software processing to achieve better time performance.

Finding a faster and more efficient platform for running any sorting algorithm is essential for high-speed data processing. In this case, FPGA-based hardware implementations can outperform general-purpose computers in terms of high-speed data processing [2]. Therefore, it is essential to examine the existing sorting algorithms and to focus on the most hardware-friendly algorithm, that is, the one that performs fastest and most efficiently when implemented in hardware.

Most of the literature focuses on software implementations of sorting algorithms, and little work focuses on hardware implementations. Our goal is to design hardware accelerators for various sorting algorithms using FPGAs, in order to take advantage of the programmability and speed of hardware implementation. Therefore, in this paper, we focus on hardware implementations of bubble, selection, insertion and merge sort and provide a clear comparison between them in terms of execution time and area.

The rest of this paper is organized in four sections. Section II briefly presents the most popular sorting algorithms: bubble, selection, insertion and merge sort. Section III describes the literature on implementing and optimizing these algorithms. In Section IV we present our experiments and results. Section V concludes the paper.

II. SORTING ALGORITHMS

A. Bubble sort

Bubble sort is frequently used to introduce the concept of data sorting [3]. It is an algorithm that compares adjacent elements and swaps their positions if they are not in the intended order. The order can be ascending or descending. For a data set of n elements, at most n − 1 swaps are required in the first round to move the maximum or minimum value to its final position. In the next round there are n − 1 unsorted elements in the data set and, consequently, at most n − 2 swaps are required to move the second maximum or minimum value to its position, and so on. Summing (n − 1) + (n − 2) + ... + 1, bubble sort therefore needs n(n − 1)/2 comparisons to sort a data set of size n.

B. Selection sort

Selection sort arranges elements in their proper positions by repeatedly finding a misplaced element and placing it in its final position [4]. The element with the smallest value is chosen and swapped with the first element. The smallest value among the remaining elements is then identified and exchanged with the second element, and so on until all the elements are in their correct positions.

C. Insertion sort

The insertion sort algorithm starts by sorting the first two elements in the data set, elements 1 and 2. The third element is then checked so that it can be inserted into its proper place. If element 3 is less than both elements 1 and 2, these two elements are shifted by one position. If element 3 is less than element 2 but not less than element 1, only element 2 moves to the position of element 3, and its place is taken by element 3. If element 3 is not less than either of them, it remains in its current place. Each remaining element in the list is inserted into its proper location in the same way until the end of the list is reached [5].
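The three algorithms of Sections II-A to II-C can be illustrated in software. The following is a minimal Python sketch for ascending order, not the Verilog design evaluated in this paper; the function names and the comparison counter are assumptions introduced here for illustration. The counter in the bubble sort lets the n(n − 1)/2 comparison count be checked directly.

```python
def bubble_sort(data):
    """Return (sorted copy, number of comparisons) using bubble sort."""
    a = list(data)
    n = len(a)
    comparisons = 0
    for round_idx in range(n - 1):
        # After each round the largest remaining value is in its final place,
        # so the next round compares one pair fewer.
        for j in range(n - 1 - round_idx):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons


def selection_sort(data):
    """Return a sorted copy: pick the smallest remaining value each round."""
    a = list(data)
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(data):
    """Return a sorted copy: shift larger elements right, then insert."""
    a = list(data)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift one position to make room
            j -= 1
        a[j + 1] = key
    return a


if __name__ == "__main__":
    sample = [5, 1, 4, 2, 8, 3, 7, 6]
    ordered, count = bubble_sort(sample)
    print(ordered, count)            # count equals 8 * 7 / 2 = 28
    print(selection_sort(sample))
    print(insertion_sort(sample))
```

For the 8-element sample, the bubble sort performs 28 comparisons, which matches n(n − 1)/2 for n = 8.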
Authorized licensed use limited to: UNIVERSITAT POLITECNICA DE CATALUNYA. Downloaded on December 11,2023 at 16:11:57 UTC from IEEE Xplore. Restrictions apply.
D. Merge sort

Merge sort was developed by John von Neumann [6] and is one of the first sorting algorithms used on a computer. It works by dividing a data set into two halves and then repeatedly halving the resulting data sets until they reach the smallest possible size, which is two elements. The small data sets are sorted and then merged into progressively larger sorted sets until the target data set is obtained. The algorithm is designed to be recursive.

III. OVERVIEW OF SORTING ALGORITHMS IMPLEMENTATION

There is a lot of research on sorting algorithms. Most of it focuses on software implementations of different sorting algorithms [1] [7]. Consequently, the focus has been on the time complexity of each algorithm, which depends on the number of comparisons that occur during sorting. In hardware implementations, the number of comparisons affects the circuit area. The number of comparisons that can be performed in parallel is very important when the focus is on execution time. Further, pipelining should be applied when the focus is on the throughput of the sorting accelerator. In this section, we give an overview of the literature on hardware implementations of sorting algorithms.

High-level synthesis (HLS) hardware implementations of several sorting algorithms were proposed in [8]. HLS is used to develop and build FPGA applications using a familiar programming language such as C++, C, or MATLAB, without requiring knowledge of the target hardware architecture. Selection sort is shown to be faster than the other sorting algorithms for N < 64; otherwise, insertion sort is the more efficient option. Verilog and HLS implementations differ in logic resources and performance: HLS requires more than twice as many flip-flops and look-up tables, while Verilog has a lower latency than HLS [9].

Bubble sort has been implemented in hardware with two architectures, serial and parallel bubble sort [10]. Serial bubble sort is the traditional bubble sort, while parallel bubble sort divides the swapping operation into odd and even phases. In the odd phase, each odd element in the list is sorted with its adjacent element, and in the even phase each even element is sorted with its adjacent element. The two phases are repeated n/2 times. The advantage of this algorithm over serial bubble sort is that it parallelizes the sorting process. The implementations were FPGA based, and both were built as combinational architectures. It is shown that serial bubble sort requires less memory than parallel bubble sort. However, parallel bubble sort is faster than serial bubble sort.

In [11], another comparison is made between serial and parallel bubble sort in sequential architectures. It shows that the parallel implementation of the bubble sort algorithm is almost 10 times faster than the serial implementation for 20 different data inputs.

Sequential FPGA implementations of the bubble, selection and insertion sort algorithms are compared in [12], and the results show that the insertion sort algorithm operates much faster than the other algorithms.

Batcher [13] proposed two architectures for merge sort: Odd-Even and Bitonic merge sort. Both require the number of inputs (M) to be a power of two.

A. Odd-Even merge sort

The sorted sequence is generated through a series of parallel merging units, from OE-2s, OE-4s, OE-8s ... to OE-M, as in Fig. 1. The architecture is parallel and suitable for pipelined design. The M-input merging unit receives an odd-indexed and an even-indexed sequence, each containing M/2 samples.

Fig. 1. The architecture of 8-input Odd-Even merge sort.

B. Bitonic merge sort

A Bitonic sequence is made up of an ascending and a descending sequence, as seen in Fig. 2. Bitonic sort is widely used because of its regular structure. The M-input merging unit receives an ascending and a descending sequence, each containing M/2 samples.

Fig. 2. The architecture of 8-input Bitonic merge sort.

In [14], the parallel bubble sort, Odd-Even and Bitonic merge sorting algorithms are implemented on an FPGA as sequential implementations. It is concluded that Odd-Even merge sort is the fastest, but Bitonic merge sort provides a regular structure. Further, Odd-Even uses less area than Bitonic merge sort.

Magesh et al. [15] implemented bubble, Odd-Even and Bitonic merge sort and proposed a pipelined Bitonic merge sort using an FPGA. They noticed that the previously existing system is non-pipelined, so its execution time and delay are the same. Therefore, they introduced pipelining, with which the execution time and delay differ. Their proposed architecture was shown to give a reduced delay compared to the other algorithms, and an area lower than Bitonic merge sort but larger than Odd-Even merge sort.

In [16], a comparison of five merge sorting algorithms is presented, namely serial, parallel, Bitonic, Odd-Even and the modified merge sort. The sorting algorithms are implemented on an FPGA in pipelined architectures and compared in terms of resource utilization, delay and area. It is observed that serial and parallel merge have the highest resource utilization compared to Bitonic, Odd-Even and modified merge. Further, the delay of parallel merge is much lower than that of serial merge. Furthermore, the Odd-Even and modified merge have a very close value of area used while Bitonic merge has a slightly

IV. EXPERIMENTS AND RESULTS

We have implemented, synthesized and simulated the different sorting algorithms on an FPGA of the Virtex-5 family using Verilog HDL. All algorithms are designed with 4, 8, 16 and 32 data inputs, each 8 bits wide. The main metrics we focus on are area (number of slices) and execution time in ns, which is obtained by dividing the number of clock cycles required to finish the sorting process by the maximum operating frequency.

We have implemented bubble sort, parallel bubble sort, selection sort, insertion sort, Odd-Even merge sort and Bitonic merge sort in a sequential structure. The implementation results are shown in Table II and are presented in terms of area and execution time. They show that Odd-Even merge sort and Bitonic merge sort give better performance in terms of execution time, while the selection and insertion sort algorithms are better in terms of area.

TABLE II
RESULTS OF NON-PIPELINED ALGORITHMS
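The execution time (ET) metric used in Section IV, the number of clock cycles divided by the maximum frequency, can be checked with a short calculation. The Python sketch below is only an illustration; the function name execution_time_ns is an assumption introduced here. It recomputes two ET entries of Table IV from their cycle counts and maximum frequencies.

```python
def execution_time_ns(clock_cycles, max_freq_mhz):
    """Execution time in ns for a given cycle count and clock frequency in MHz.

    One cycle at f MHz lasts 1000 / f nanoseconds, so the total time is
    clock_cycles * 1000 / f.
    """
    return clock_cycles * 1000.0 / max_freq_mhz


if __name__ == "__main__":
    # Odd-Even merge sort, 8 inputs: 6 cycles at 411.73 MHz (Table IV)
    print(round(execution_time_ns(6, 411.73), 2))    # 14.57
    # Serial bubble sort, 8 inputs: 14 cycles at 409.75 MHz (Table IV)
    print(round(execution_time_ns(14, 409.75), 2))   # 34.17
```

Both values agree with the ET column reported for these rows in Table IV.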
TABLE IV
RESULTS OF PIPELINED ALGORITHMS WITH 8 DATA INPUTS

Type of algorithm       Slices (LUT+FF)   Max. freq. (MHz)   Clock cycles   ET (ns)
Serial bubble sort      723               409.75             14             34.17
Parallel bubble sort    672               411.73             8              19.43
Selection sort          730               411.73             14             34.01
Insertion sort          802               384.27             15             39.32
Odd-Even merge sort     474               411.73             6              14.57
Bitonic merge sort      535               411.73             6              14.57

TABLE V
RESULTS OF PIPELINED ALGORITHMS WITH 16 DATA INPUTS

Type of algorithm       Slices (LUT+FF)   Max. freq. (MHz)   Clock cycles   ET (ns)
Serial bubble sort      2883              380.79             30             78.78
Parallel bubble sort    2757              370.13             16             43.23
Selection sort          2898              380.79             30             78.78
Insertion sort          2955              326.71             31             94.89
Odd-Even merge sort     1575              370.13             10             27.02
Bitonic merge sort      1779              370.13             10             27.02

TABLE VI
RESULTS OF PIPELINED ALGORITHMS WITH 32 DATA INPUTS

Type of algorithm       Slices (LUT+FF)   Max. freq. (MHz)   Clock cycles   ET (ns)
Serial bubble sort      10868             398.86             62             155.43
Parallel bubble sort    10171             312.14             32             102.53
Selection sort          10839             398.86             62             155.43
Insertion sort          10905             332.81             63             189.32
Odd-Even merge sort     4692              370.13             15             40.53
Bitonic merge sort      5276              370.13             15             40.53

Fig. 3. Scalability of different sorting algorithms in terms of ET.

Fig. 4. Scalability of different sorting algorithms in terms of area.

V. CONCLUSION

In this paper, different sorting algorithms are implemented, synthesized and simulated on an FPGA in two different architectures: synchronous and pipelined. For the synchronous architecture, non-pipelined Bitonic and non-pipelined Odd-Even merge sort have the best performance in terms of execution time, while non-pipelined selection and non-pipelined insertion sort have the lowest area. For pipelined architectures, it is shown that Bitonic and Odd-Even merge sort have much lower execution time when implemented in hardware. Further, Odd-Even merge sort has the smallest area among all the architectures. Bitonic merge sort is slightly larger in area and slower in execution than Odd-Even merge sort, but it is preferred by designers because of its regular structure.

REFERENCES

[1] D. E. Knuth, “The art of computer programming. Sorting and searching”, vol. III, Addison-Wesley, 2011.
[2] Z. Long and Z. Zhang, “FPGA-based collaborative hardware sorting unit for embedded data processing system”, 10th International Conference on Intelligent Computation Technology and Automation (ICICTA), pp. 260-264, October 2017.
[3] W. Dobosiewicz, “An efficient variation of bubble sort”, Information Processing Letters, Volume 11, Issue 1, pp. 5-6, 1980.
[4] H. Iraj, M. H. S. Afsari, S. Hassanzadeh, “A new external sorting algorithm with selecting the record list location”, WSEAS Transactions on Communications, 5(5): 909-913, 2006.
[5] A. Kumari, S. Chakraborty, “Software complexity: A statistical case study through insertion sort”, Applied Mathematics and Computation, 190(1): 40-50, 2007.
[6] M. Z. Jafarlou and P. Y. Fard, “Heuristic and pattern based merge sort”, Procedia Computer Science, 3: 322-324, 2011.
[7] Y. Yang, P. Yu, and Y. Gan, “Experimental study on the five sort algorithms”, Second International Conference on Mechanic Automation and Control Engineering, Inner Mongolia, China, pp. 1314-1317, 2011.
[8] Y. Ben Jmaa, R. Ben Atitallah, D. Duvivier and M. Ben Jemaa, “A comparative study of sorting algorithms with FPGA acceleration by high level synthesis”, Computación y Sistemas, Vol. 23, No. 1, pp. 213-230, 2019.
[9] M.-A. Tétrault, “Two FPGA case studies comparing high level synthesis and manual HDL for HEP applications”, 2018.
[10] D. Purnomo, J. Marhaendro, A. Arinaldi, D. Priyantini, A. Wibisono, and A. Febrian, “Implementation of serial and parallel bubble sort on FPGA”, Journal of Computer Science and Information, 9, no. 2: 113-120, 2016.
[11] R. Lipu, R. Amin, M. N. Islam Mondal and M. A. Mamun, “Exploiting parallelism for faster implementation of Bubble sort algorithm using FPGA”, 2nd International Conference on Electrical, Computer and Telecommunication Engineering (ICECTE), pp. 1-4, 2016.
[12] M. Fahad Alif, S. M. R. Islam and P. Deb, “Design and implementation of sorting algorithms based on FPGA”, International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), pp. 1-4, 2019.
[13] K. E. Batcher, “Sorting networks and their applications”, in Proc. AFIPS Spring Joint Computer Conf., pp. 307-314, 1968.
[14] K. Gayathri, S. Harshiniv, “Hardware implementation of sorting algorithm using FPGA”, Vol-4 Issue-2, 2395-4396, 2018.
[15] V. Magesh, S. Megavarnan, A. Pragadish and S. Saravanan, “FPGA implementation of sorting algorithms”, International Journal for Technological Research in Engineering, Volume 5, Issue 8, ISSN: 2347-4718, 2018.
[16] J. Lobo and S. Kuwelkar, “Performance analysis of merge sort algorithms”, International Conference on Electronics and Sustainable Communication Systems (ICESC), pp. 110-115, 2020.
[17] V. S. Harshini and K. K. S. Kumar, “Design of Hybrid Sorting Unit”, International Conference on Smart Structures and Systems (ICSSS), pp. 1-6, 2019.