FPGA Based Hardware Accelerator for Sorting Data

2021 9th International Japan-Africa Conference on Electronics, Communications, and Computations (JAC-ECC) | 978-1-6654-8292-9/21/$31.00 ©2021 IEEE | DOI: 10.1109/JAC-ECC54461.2021.9691432

Maher Abdelrasoul
Dept. of Electrical Engineering
Faculty of Engineering, Shoubra, Benha University
Cairo, Egypt

Ahmed Sayed Shaban
Dept. of Electronics and Communications
Institute of Aviation Engineering and Technology
Giza, Egypt

Hala Abdel-Kader
Dept. of Electrical Engineering
Faculty of Engineering, Shoubra, Benha University
Cairo, Egypt

Abstract—Sorting is one of the most important operations in data processing, and real-time data access urgently requires fast processing. Hardware accelerators are therefore used to speed up data processing. In this paper, we present FPGA-based hardware accelerators for data sorting using the bubble, selection, insertion, and merge sorting algorithms. Further, we provide a fair comparison between them in terms of execution time and area. Our implementation results show that, for small data sets, merge sort is the best sorting algorithm in terms of execution time. Therefore, it can be used in a parallel cooperative system with a CPU for high-speed data processing.

Index Terms—Sorting Algorithms, FPGA, Hardware Accelerator.

I. INTRODUCTION

Sorting is a method for reordering a set of data in ascending or descending order. Sorting is an important operation because it is used in many systems, such as image processing, binary search, real-time embedded applications, numerous computing systems, and video compression. A large body of computer science and engineering research has focused on determining the proper algorithm for sorting a set of data [1]. There is therefore a need to optimize the sorting process to make searching, insertion, and deletion easier. Further, hardware accelerators are needed in place of software processing to achieve better time performance.

Finding a faster and more efficient platform for executing sorting algorithms is essential for high-speed data processing. In this respect, FPGA-based hardware implementations can outperform general-purpose computers [2]. Therefore, it is essential to examine the existing sorting algorithms and identify the most hardware-friendly one, that is, the algorithm that performs fastest and most efficiently when implemented in hardware.

Most of the literature focuses on software implementations of sorting algorithms, and little work addresses hardware implementations. Our goal is to design hardware accelerators for various sorting algorithms using FPGAs, taking advantage of the programmability and speed of hardware implementation. Therefore, in this paper, we focus on hardware implementations of bubble, selection, insertion, and merge sort and provide a clear comparison between them in terms of execution time and area.

The rest of this paper is organized in four sections. Section II briefly presents the most popular sorting algorithms: bubble, selection, insertion, and merge sort. Section III reviews the literature on implementing and optimizing these algorithms. Section IV presents our experiments and results. Section V concludes the paper.

II. SORTING ALGORITHMS

A. Bubble sort

Bubble sort is frequently used to introduce the concept of data sorting [3]. It compares adjacent elements and swaps them if they are not in the intended order, which can be ascending or descending. For a data set of n elements, the first round needs n − 1 comparisons (and at most n − 1 swaps) to move the maximum or minimum value to its final position. In the next round, n − 1 elements remain unsorted, so n − 2 comparisons are needed to move the second maximum or minimum value to its position, and so on. Therefore, bubble sort needs (n − 1) + (n − 2) + ... + 1 = n(n − 1)/2 comparisons to sort a data set of size n.

B. Selection sort

Selection sort arranges elements in their proper positions by repeatedly finding the element that belongs at the next position and placing it there [4]. The element with the smallest value is selected and swapped with the first element. The smallest value among the remaining elements is then identified and exchanged with the second element, and so on until all elements are in their correct positions.

C. Insertion sort

The insertion sort algorithm starts by sorting the first two elements of the data set (elements 1 and 2). Then the third element is inserted into its proper place. If element 3 is less than both elements 1 and 2, these two elements are shifted by one position and element 3 takes the first position. If element 3 is less than element 2 but not less than element 1, only element 2 moves to the position of element 3, and its place is taken by element 3. If element 3 is not less than either element, it remains in its current place. Each subsequent element is inserted into its correct location in the same way until the end of the list [5].
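As a software reference for subsections A, B, and C, the following is a minimal Python sketch of the three algorithms as described above, sorting in ascending order. It is our own illustration, not the authors' Verilog design, and the function names are hypothetical. `bubble_sort` also counts comparisons so that the n(n − 1)/2 figure can be checked.

```python
from typing import List, Tuple


def bubble_sort(data: List[int]) -> Tuple[List[int], int]:
    """Serial bubble sort; returns (sorted copy, number of comparisons)."""
    a = list(data)
    n = len(a)
    comparisons = 0
    for rnd in range(n - 1):
        # Round `rnd` moves the largest remaining value to index n-1-rnd,
        # which takes n-1-rnd adjacent comparisons.
        for i in range(n - 1 - rnd):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a, comparisons


def selection_sort(data: List[int]) -> List[int]:
    """Swap the smallest remaining value into position i, for i = 0, 1, ..."""
    a = list(data)
    for i in range(len(a) - 1):
        m = min(range(i, len(a)), key=a.__getitem__)  # index of the minimum
        a[i], a[m] = a[m], a[i]
    return a


def insertion_sort(data: List[int]) -> List[int]:
    """Insert each element into the already sorted prefix a[0:j]."""
    a = list(data)
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift larger prefix elements one position to the right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a
```

For example, `bubble_sort([5, 3, 8, 1])` returns `([1, 3, 5, 8], 6)`, and 6 = 4(4 − 1)/2.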

Authorized licensed use limited to: UNIVERSITAT POLITECNICA DE CATALUNYA. Downloaded on December 11,2023 at 16:11:57 UTC from IEEE Xplore. Restrictions apply.

D. Merge sort

Merge sort was developed by John von Neumann [6] and is one of the first sorting algorithms used on a computer. It works by dividing a data set into two halves and then repeatedly halving the resulting data sets until they reach the smallest possible size, which is two elements. The smaller data sets are sorted and then merged into larger ones until the whole target data set is sorted. The algorithm is recursive by design.

III. OVERVIEW OF SORTING ALGORITHM IMPLEMENTATIONS

There is a large body of research on sorting algorithms, most of which focuses on software implementations [1] [7]. Consequently, the focus has been on the time complexity of each algorithm, which depends on the number of comparisons performed during sorting. In hardware implementations, the number of comparisons affects the circuit area, and the number of comparisons that can be performed in parallel is critical when the focus is on execution time. Further, pipelining should be applied when the focus is on the throughput of the sorting accelerator. In this section, we review the literature on hardware implementations of sorting algorithms.

High-level synthesis (HLS) hardware implementations of several sorting algorithms were proposed in [8]. HLS is used to develop FPGA applications in a familiar programming language such as C, C++, or MATLAB, without requiring detailed knowledge of the target hardware architecture. Selection sort was shown to be faster than the other sorting algorithms for N < 64; otherwise, insertion sort is the more efficient option. Verilog and HLS designs differ in logic resources and performance: HLS requires more than twice as many flip-flops and look-up tables, while Verilog achieves lower latency [9].

Bubble sort has been implemented in hardware with two architectures, serial and parallel bubble sort [10]. Serial bubble sort is the traditional bubble sort, while parallel bubble sort divides the swapping operation into odd and even phases. In the odd phase, each odd-indexed element is compared with its adjacent element and swapped if necessary; in the even phase, the same is done for each even-indexed element. The two phases are repeated n/2 times. The advantage of this algorithm over serial bubble sort is that it parallelizes the sorting process. In [10], both architectures were implemented on FPGA as combinational circuits. Serial bubble sort was shown to require less memory than parallel bubble sort; however, parallel bubble sort is faster.

In [11], serial and parallel bubble sort are compared in sequential architectures, and the parallel implementation of bubble sort is shown to be almost 10 times faster than the serial implementation for 20 different data inputs.

Bubble, selection, and insertion sort hardware implementations on FPGA with sequential architectures are compared in [12], and the results show that insertion sort is much faster than the other algorithms.

Batcher [13] proposed two architectures for merge sort: Odd-Even merge sort and Bitonic merge sort. Both require a power-of-two number of inputs (M).

A. Odd-Even merge sort

The sorted sequence can be generated through a series of parallel merging units, OE-2, OE-4, OE-8, ..., up to OE-M, as shown in Fig. 1. The architecture is parallel and suitable for pipelined design. The M-input merging unit receives an odd-indexed and an even-indexed sequence, each containing M/2 samples.

Fig. 1. The architecture of 8-input Odd-Even merge sort.

B. Bitonic merge sort

A bitonic sequence is made up of an ascending and a descending sequence, as seen in Fig. 2. Bitonic sort is widely used because of its regular structure. The M-input merging unit receives an ascending and a descending sequence, each containing M/2 samples.

Fig. 2. The architecture of 8-input Bitonic merge sort.

In [14], parallel bubble sort, Odd-Even merge sort, and Bitonic merge sort are implemented on FPGA with sequential architectures. It is concluded that the Odd-Even merge sort
is the fastest, but Bitonic merge sort provides a regular structure. Further, Odd-Even merge sort uses less area than Bitonic merge sort.

Magesh et al. [15] implemented bubble, Odd-Even, and Bitonic merge sort and proposed a pipelined Bitonic merge sort on FPGA. They noted that the previously existing system was non-pipelined, so its execution time and delay were the same. They therefore introduced pipelining, after which execution time and delay differ. Their proposed architecture was shown to give a reduced delay compared to the other algorithms, with an area lower than that of Bitonic merge sort but larger than that of Odd-Even merge sort.

In [16], five merge sorting algorithms, namely serial, parallel, Bitonic, Odd-Even, and modified merge sort, are compared. The sorting algorithms are implemented on FPGA in pipelined architectures and evaluated in terms of resource utilization, delay, and area. Serial and parallel merge use the most resources compared to Bitonic, Odd-Even, and modified merge. Further, the delay of parallel merge is much lower than that of serial merge. Furthermore, Odd-Even and modified merge use very similar area, while Bitonic merge uses slightly more.

V. S. Harshini and K. K. S. Kumar [17] proposed a hybrid system of Odd-Even and Bitonic merge sort and compared it with Bitonic merge sort and Odd-Even merge sort on FPGA. The hybrid sorting unit achieves a smaller number of slices, a fixed structure, and optimal delay because it uses fewer comparators. Further, pipelined implementations of Bitonic, Odd-Even, and hybrid sort produce low delay and high speed.

Table I compares the literature works that evaluate hardware implementations of different sorting algorithms. As shown, there is no clear comparison among all of the most important sorting algorithms. Most of the literature compares a sorting algorithm with a modification of it. Some works compare the most important sorting algorithms, but their comparison does not consider the modifications that may make one sorting algorithm better in terms of area or maximum working frequency. Further, the merge sort structure used in their comparisons is not clearly specified. In our paper, we compare the most important algorithms (bubble, insertion, selection, and merge sort) and their modifications, giving designers a clear basis for choosing a suitable algorithm for their applications.

TABLE I
COMPARISON BETWEEN THE LITERATURE WORKS
Type of algorithm [10,11] [12] [14,15] [8] [16,17]
√ √ √ √
Bubble sort √
Parallel bubble sort √ √
Selection sort √ √
Insertion sort √
Merge sort √ √
Odd-Even merge sort √ √
Bitonic merge sort
Implementation methodology RTL RTL RTL HLS RTL

IV. EXPERIMENTS AND RESULTS

We implemented, synthesized, and simulated the different sorting algorithms in Verilog HDL on a Virtex-5 family FPGA. All algorithms are designed for 4, 8, 16, and 32 data inputs, each 8 bits wide. The main factors we focus on are area (number of slices) and execution time (ET) in ns, which is obtained by dividing the number of clock cycles required to finish the sorting process by the maximum working frequency.

We implemented bubble sort, parallel bubble sort, selection sort, insertion sort, Odd-Even merge sort, and Bitonic merge sort as sequential (non-pipelined) structures. The implementation results are shown in Table II in terms of area and execution time. Odd-Even merge sort and Bitonic merge sort give better performance in terms of execution time, while selection and insertion sort are better in terms of area.

TABLE II
RESULTS OF NON-PIPELINED ALGORITHMS (AREA IN SLICES, ET IN NS)
Type of algorithm      Size 4          Size 8           Size 16          Size 32
                       Slices  ET      Slices  ET       Slices  ET       Slices  ET
Bubble sort            122     27.65   258     109.93   448     620.52   737     2591.10
Parallel bubble sort   93      14.18   185     28.36    330     56.72    700     113.44
Selection sort         88      32.13   182     152.81   336     569.43   640     2398.38
Insertion sort         112     36.05   150     148.51   284     667.04   626     3206.27
Odd-Even merge sort    138     8.66    419     21.97    1036    42.91    2962    64.85
Bitonic merge sort     100     8.20    434     24.32    1056    42.18    2538    69.12

We modified the architectures to be pipelined to improve throughput, so that all algorithms can output a sorted data set every clock cycle. Tables III, IV, V, and VI show the results of the pipelined sorting algorithms with 4, 8, 16, and 32 input data sets, respectively. The results show that Bitonic and Odd-Even merge sort have the best execution time. However, Bitonic merge sort is preferred because of its regular structure. Further, Odd-Even merge sort shows the smallest area among all algorithms. Odd-Even merge sort also has the highest scalability over different data sizes in terms of area and execution time, as depicted in Fig. 3 and Fig. 4.

It should be noted that if a designer prefers small-area designs over faster ones, the non-pipelined selection or insertion sort architecture can be chosen, as these show the smallest area among all the algorithms.

TABLE III
RESULTS OF PIPELINED ALGORITHMS WITH 4 DATA INPUTS
Type of algorithm      Slices (LUT+FF)  Max. freq. (MHz)  Clock cycles  ET (ns)
Serial bubble sort     171              409.75            6             14.65
Parallel bubble sort   160              411.73            4             9.72
Selection sort         174              411.73            6             14.57
Insertion sort         186              381.53            5             13.11
Odd-Even merge sort    122              411.73            3             7.29
Bitonic merge sort     135              411.73            3             7.29
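The clock-cycle counts of the pipelined designs follow a visible pattern across Tables III to VI: 4, 8, 16, and 32 cycles for parallel bubble sort, and 3, 6, 10, and 15 cycles for both Odd-Even and Bitonic merge sort. These values coincide with the number of compare-exchange stages of the corresponding comparator networks: n stages for odd-even transposition (the parallel bubble sort described in Section III) and m(m + 1)/2 stages for Batcher's networks with n = 2^m inputs. The Python sketch below is our own software model, not the authors' Verilog. It builds each network as a list of layers, assumes one layer per pipeline stage (our assumption, not stated in the paper), and evaluates ET = clock cycles / maximum frequency. All function names are hypothetical.

```python
from typing import List, Tuple

# A layer holds compare-exchange operations that can run in the same stage.
# A pair (lo, hi) moves the minimum to wire lo and the maximum to wire hi.
Layer = List[Tuple[int, int]]


def odd_even_transposition(n: int) -> List[Layer]:
    """Parallel bubble sort: n alternating even/odd phases of adjacent pairs."""
    return [[(i, i + 1) for i in range(p % 2, n - 1, 2)] for p in range(n)]


def batcher_odd_even(n: int) -> List[Layer]:
    """Batcher's odd-even merge sort, one layer per (p, k) step; n = 2**m."""
    layers: List[Layer] = []
    p = 1
    while p < n:  # merge sorted runs of length p into runs of length 2p
        k = p
        while k >= 1:
            layer: Layer = []
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # Only compare wires that lie in the same 2p-wide block.
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        layer.append((i + j, i + j + k))
            layers.append(layer)
            k //= 2
        p *= 2
    return layers


def bitonic(n: int) -> List[Layer]:
    """Batcher's bitonic sort, one layer per (k, j) step; n = 2**m."""
    layers: List[Layer] = []
    k = 2
    while k <= n:  # width of the blocks being sorted in this pass
        j = k // 2
        while j >= 1:
            layer: Layer = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Index bit k picks each block's direction (ascending when
                    # clear), so adjacent blocks form bitonic sequences.
                    ascending = (i & k) == 0
                    layer.append((i, partner) if ascending else (partner, i))
            layers.append(layer)
            j //= 2
        k *= 2
    return layers


def run(layers: List[Layer], data: List[int]) -> List[int]:
    """Simulate the network by applying its layers in order."""
    a = list(data)
    for layer in layers:
        for lo, hi in layer:
            if a[lo] > a[hi]:
                a[lo], a[hi] = a[hi], a[lo]
    return a


def execution_time_ns(cycles: int, fmax_mhz: float) -> float:
    """ET = cycles / f_max; one cycle at f MHz lasts 1000 / f ns."""
    return cycles * 1000.0 / fmax_mhz


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        depths = [len(build(n)) for build in
                  (odd_even_transposition, batcher_odd_even, bitonic)]
        print(n, depths)
    print(f"{execution_time_ns(3, 411.73):.2f} ns")
```

Running the script prints the stage counts [4, 3, 3], [8, 6, 6], [16, 10, 10], and [32, 15, 15] for n = 4, 8, 16, and 32, followed by 7.29 ns, which matches the Odd-Even and Bitonic entries of Table III.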

TABLE IV
RESULTS OF PIPELINED ALGORITHMS WITH 8 DATA INPUTS
Type of algorithm      Slices (LUT+FF)  Max. freq. (MHz)  Clock cycles  ET (ns)
Serial bubble sort     723              409.75            14            34.17
Parallel bubble sort   672              411.73            8             19.43
Selection sort         730              411.73            14            34.01
Insertion sort         802              384.27            15            39.32
Odd-Even merge sort    474              411.73            6             14.57
Bitonic merge sort     535              411.73            6             14.57

TABLE V
RESULTS OF PIPELINED ALGORITHMS WITH 16 DATA INPUTS
Type of algorithm      Slices (LUT+FF)  Max. freq. (MHz)  Clock cycles  ET (ns)
Serial bubble sort     2883             380.79            30            78.78
Parallel bubble sort   2757             370.13            16            43.23
Selection sort         2898             380.79            30            78.78
Insertion sort         2955             326.71            31            94.89
Odd-Even merge sort    1575             370.13            10            27.02
Bitonic merge sort     1779             370.13            10            27.02

TABLE VI
RESULTS OF PIPELINED ALGORITHMS WITH 32 DATA INPUTS
Type of algorithm      Slices (LUT+FF)  Max. freq. (MHz)  Clock cycles  ET (ns)
Serial bubble sort     10868            398.86            62            155.43
Parallel bubble sort   10171            312.14            32            102.53
Selection sort         10839            398.86            62            155.43
Insertion sort         10905            332.81            63            189.32
Odd-Even merge sort    4692             370.13            15            40.53
Bitonic merge sort     5276             370.13            15            40.53

Fig. 3. Scalability of different sorting algorithms in terms of ET.

Fig. 4. Scalability of different sorting algorithms in terms of area.

V. CONCLUSION

In this paper, different sorting algorithms are implemented, synthesized, and simulated on FPGA in two different architectures: synchronous (non-pipelined) and pipelined. For the non-pipelined architectures, Bitonic and Odd-Even merge sort have the best performance in terms of execution time, while selection and insertion sort have the lowest area. For the pipelined architectures, Bitonic and Odd-Even merge sort have much lower execution time when implemented in hardware. Further, Odd-Even merge sort has the smallest area among all architectures. Bitonic merge sort is slightly larger in area and slower in execution than Odd-Even merge sort, but it is preferred by designers because of its regular structure.

REFERENCES

[1] D. E. Knuth, “The Art of Computer Programming, Vol. III: Sorting and Searching”, Addison-Wesley, 2011.
[2] Z. Long and Z. Zhang, “FPGA-based collaborative hardware sorting unit for embedded data processing system”, 10th International Conference on Intelligent Computation Technology and Automation (ICICTA), pp. 260-264, October 2017.
[3] W. Dobosiewicz, “An efficient variation of bubble sort”, Information Processing Letters, vol. 11, no. 1, pp. 5-6, 1980.
[4] H. Iraj, M. H. S. Afsari, and S. Hassanzadeh, “A new external sorting algorithm with selecting the record list location”, WSEAS Transactions on Communications, vol. 5, no. 5, pp. 909-913, 2006.
[5] A. Kumari and S. Chakraborty, “Software complexity: A statistical case study through insertion sort”, Applied Mathematics and Computation, vol. 190, no. 1, pp. 40-50, 2007.
[6] M. Z. Jafarlou and P. Y. Fard, “Heuristic and pattern based merge sort”, Procedia Computer Science, vol. 3, pp. 322-324, 2011.
[7] Y. Yang, P. Yu, and Y. Gan, “Experimental study on the five sort algorithms”, Second International Conference on Mechanic Automation and Control Engineering, Inner Mongolia, China, pp. 1314-1317, 2011.
[8] Y. Ben Jmaa, R. Ben Atitallah, D. Duvivier, and M. Ben Jemaa, “A comparative study of sorting algorithms with FPGA acceleration by high level synthesis”, Computación y Sistemas, vol. 23, no. 1, pp. 213-230, 2019.
[9] M.-A. Tétrault, “Two FPGA case studies comparing high level synthesis and manual HDL for HEP applications”, 2018.
[10] D. Purnomo, J. Marhaendro, A. Arinaldi, D. Priyantini, A. Wibisono, and A. Febrian, “Implementation of serial and parallel bubble sort on FPGA”, Journal of Computer Science and Information, vol. 9, no. 2, pp. 113-120, 2016.
[11] R. Lipu, R. Amin, M. N. Islam Mondal, and M. A. Mamun, “Exploiting parallelism for faster implementation of Bubble sort algorithm using FPGA”, 2nd International Conference on Electrical, Computer and Telecommunication Engineering (ICECTE), pp. 1-4, 2016.
[12] M. Fahad Alif, S. M. R. Islam, and P. Deb, “Design and implementation of sorting algorithms based on FPGA”, International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), pp. 1-4, 2019.
[13] K. E. Batcher, “Sorting networks and their applications”, in Proc. AFIPS Spring Joint Computer Conf., pp. 307-314, 1968.
[14] K. Gayathri and S. Harshiniv, “Hardware implementation of sorting algorithm using FPGA”, vol. 4, issue 2, 2395-4396, 2018.
[15] V. Magesh, S. Megavarnan, A. Pragadish, and S. Saravanan, “FPGA implementation of sorting algorithms”, International Journal for Technological Research in Engineering, vol. 5, issue 8, ISSN: 2347-4718, 2018.
[16] J. Lobo and S. Kuwelkar, “Performance analysis of merge sort algorithms”, International Conference on Electronics and Sustainable Communication Systems (ICESC), pp. 110-115, 2020.
[17] V. S. Harshini and K. K. S. Kumar, “Design of hybrid sorting unit”, International Conference on Smart Structures and Systems (ICSSS), pp. 1-6, 2019.
