Abstract
ASICs and FPGAs are considered ideal platforms for specialized high-speed computation
because of their hardware structure, and how to implement computational algorithms on
them is a hot research topic. The CORDIC (Coordinate Rotation Digital Computer)
algorithm breaks elementary functions down into shift and add/subtract operations,
which lays the foundation for realizing complex logic. However, the functions selected
by traditional CORDIC for angle encoding are too complex, which leads to problems such
as excessive area consumption and large delay. In this paper, an optimization of the
CORDIC algorithm is proposed that reduces the number of adders and comparators and
decreases the complexity and delay of the hardware implementation. The proposed
algorithm is modeled in the Verilog hardware description language and implemented on
an FPGA. The simulation results show that the sine and cosine functions are realized
successfully, and that the proposed algorithm both improves the computation speed and
reduces the hardware resources used by the system.
1. Introduction
The CORDIC algorithm is well suited to evaluating transcendental functions such as
trigonometric, inverse trigonometric, exponential, and logarithmic functions, because it
has a simple structure, saves resources, and is highly efficient. It is also widely used
for matrix operations and is particularly significant for porting highly complex
operations to FPGAs, such as analysis algorithms for multidimensional data arrays [1].
CORDIC is an iterative algorithm commonly used to calculate basic arithmetic functions.
Its principle is to approach the desired angle through a sequence of rotations by fixed
angles tied to the radix, rather than rotating by the desired angle directly. It can
therefore be regarded as a numerical approximation method. Because the fixed angles are
related to the radix, the computation involves only shift and add/subtract operations.
It therefore costs fewer FPGA (Field Programmable Gate Array) resources than
conventional calculation methods based on multiplication and division, which are
difficult to implement or cannot meet the designer's requirements. CORDIC was introduced
to solve this problem: it greatly reduces FPGA resource usage and yields a hardware
implementation that meets the designer's requirements.
In this paper we discuss how to implement the CORDIC algorithm on an FPGA, model it in
the Verilog hardware description language, simulate it with EDA tools, and analyze its
performance in terms of resource consumption, delay, number of iterations, and
computational accuracy.
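Rotating the initial vector $(x_i, y_i)$ by an angle $\theta$ yields a new vector
$(x_n, y_n)$ whose coordinates can be expressed as:
$$
\begin{aligned}
x_n &= \sqrt{x_i^2 + y_i^2}\,\cos(\varphi + \theta) = \cos\theta\, x_i - \sin\theta\, y_i \\
y_n &= \sqrt{x_i^2 + y_i^2}\,\sin(\varphi + \theta) = \sin\theta\, x_i + \cos\theta\, y_i
\end{aligned} \tag{1}
$$
where $\varphi$ denotes the angle of the initial vector.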
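Equation (1) can be written in matrix form as:
$$
\begin{bmatrix} x_n \\ y_n \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \end{bmatrix} \tag{2}
$$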
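Assuming $\tan\alpha_i = 2^{-i}$, each step rotates by $\delta_i\alpha_i = \delta_i\arctan 2^{-i}$,
and the rotation angle $\theta$ can be approximated as:
$$
\theta \approx \sum_{i=0}^{N} \delta_i \alpha_i = \sum_{i=0}^{N} \delta_i \arctan 2^{-i} \tag{3}
$$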
In (3), $\delta_i \in \{1, -1\}$, and the sign of $\delta_i$ determines the direction of
rotation: the vector is rotated toward the target vector if $\delta_i = 1$ and in the
opposite direction if $\delta_i = -1$. The angle remaining after each rotation can be
expressed as $Z_{i+1} = Z_i - \delta_i\alpha_i$, where $\delta_i = \mathrm{sign}(Z_i)$.
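Assuming that the rotation by $\theta$ is completed by the iterations $i = 0, \dots, N$,
the rotation process can be represented by (5), where $(x_i, y_i)$ on the right-hand side
is the initial vector of (1):
$$
\begin{aligned}
\begin{bmatrix} x_{N+1} \\ y_{N+1} \end{bmatrix}
&= \prod_{i=0}^{N}
\begin{bmatrix} \cos\alpha_i & -\delta_i\sin\alpha_i \\ \delta_i\sin\alpha_i & \cos\alpha_i \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \end{bmatrix} \\
&= \prod_{i=0}^{N} \cos\alpha_i \prod_{i=0}^{N}
\begin{bmatrix} 1 & -\delta_i\tan\alpha_i \\ \delta_i\tan\alpha_i & 1 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \end{bmatrix} \\
&= \prod_{i=0}^{N} \frac{1}{\sqrt{1 + 2^{-2i}}} \prod_{i=0}^{N}
\begin{bmatrix} 1 & -\delta_i 2^{-i} \\ \delta_i 2^{-i} & 1 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \end{bmatrix}
\end{aligned} \tag{5}
$$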
where $k_i = 1/\sqrt{1 + 2^{-2i}}$; the product of the $k_i$ converges to a constant as the
number of iterations increases. The gain constant $K$ of the rotation can be expressed as:
$$
K = \prod_{i=0}^{N} k_i = \prod_{i=0}^{N} \frac{1}{\sqrt{1 + 2^{-2i}}} \tag{6}
$$
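To illustrate how quickly the gain constant settles, the following Python sketch evaluates the product in (6) for a few values of N. The function name `cordic_gain` is hypothetical, and double-precision floats stand in for the fixed-point arithmetic that a hardware design would use.

```python
import math

def cordic_gain(n):
    """Return K = prod_{i=0}^{n} 1/sqrt(1 + 2^(-2i)), as in (6)."""
    k = 1.0
    for i in range(n + 1):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

# Print K for a few iteration counts to show the convergence.
for n in (0, 4, 8, 15):
    print(n, round(cordic_gain(n), 6))
```

For N = 0 the product is $1/\sqrt{2} \approx 0.7071$, and for larger N it approaches about 0.60725, which is why K can be treated as a precomputed constant once N is fixed.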
The value of $K$ depends on the number of iterations $N$; $K$ is usually called the gain
constant or scaling factor. In general, the scaling factor and the rotation can be
computed separately, with the rotation carried out according to (7).
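$$
\begin{bmatrix} x'_{N+1} \\ y'_{N+1} \end{bmatrix} =
\prod_{i=0}^{N}
\begin{bmatrix} 1 & -\delta_i 2^{-i} \\ \delta_i 2^{-i} & 1 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \end{bmatrix} \tag{7}
$$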
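A minimal Python sketch of the rotation in (7), followed by a single multiplication by K, is given below. It is a floating-point model under stated assumptions rather than the hardware design: the function name `cordic_sin_cos`, the start vector (1, 0), and the default of 16 iterations are choices made for illustration, and multiplications by $2^{-i}$ stand in for the right shifts used in hardware.

```python
import math

def cordic_sin_cos(theta, n_iter=16):
    """Rotation-mode CORDIC; theta in radians, within the convergence range."""
    # Elementary angles alpha_i = arctan(2^-i); a small ROM table in hardware.
    alphas = [math.atan(2.0 ** -i) for i in range(n_iter)]
    # Gain constant K for the chosen number of iterations.
    gain = 1.0
    for i in range(n_iter):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0          # delta_i = sign(Z_i)
        t = 2.0 ** -i                        # a right shift by i in hardware
        x, y = x - d * t * y, y + d * t * x  # both updates use the old x and y
        z -= d * alphas[i]                   # remaining angle
    return gain * y, gain * x                # (sin, cos)

s, c = cordic_sin_cos(math.radians(30.0))
print(s, c)
```

With 16 iterations the remaining angle is bounded by $\arctan 2^{-15}$, so in this floating-point model the results agree with the library sine and cosine to roughly four decimal places.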
However, the range of $\delta_i$ can be extended to $\{-1, 0, 1\}$ so as to skip
unnecessary rotations and greatly reduce the number of iterations. The gain $K$ is then
no longer a constant, so after each rotation step not only the coordinates $x_i$ and
$y_i$ but also the scaling factor $k_i$ of that step must be updated [4]. In 1971,
J. S. Walther proposed a CORDIC algorithm that unified rotation in the circular, linear,
and hyperbolic coordinate systems into a single iteration equation:
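$$
\begin{aligned}
X_{i+1} &= k_i \left[ X_i - m\,\delta_i\, 2^{-S(mj)}\, Y_i \right] \\
Y_{i+1} &= k_i \left[ Y_i + \delta_i\, 2^{-S(mj)}\, X_i \right] \\
Z_{i+1} &= Z_i - \delta_i\, \alpha_{mj}
\end{aligned}
\qquad i = 0, \dots, N-1 \tag{8}
$$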
where $m = 1, 0, -1$ selects the circular, linear, or hyperbolic coordinate system,
respectively. The constant angle $\alpha_{mj}$ can be expressed uniformly as:
$$
\alpha_{mj} = \frac{1}{\sqrt{m}} \tan^{-1}\!\left( \sqrt{m} \cdot 2^{-S(mj)} \right) \tag{9}
$$
where $S(mj)$ is the shift sequence, which is given by (10) for the different systems.
$$
S(mj) =
\begin{cases}
0, 1, 2, 3, 4, 5, \dots, N, & m = 1 \\
0, 1, 2, 3, 4, 5, \dots, N, & m = 0 \\
0, 1, 2, 3, 4, 5, \dots, N, \ \text{repeat } (3^{i+2} - 1)/2, & m = -1
\end{cases} \tag{10}
$$
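The unified scaling factor can be expressed as:
$$
k_i = \frac{1}{\sqrt{1 + m \left( \delta_i \cdot 2^{-S(mj)} \right)^2}}, \qquad
K = \prod_{i=0}^{N} k_i = \prod_{i=0}^{N} \frac{1}{\sqrt{1 + m \left( \delta_i \cdot 2^{-S(mj)} \right)^2}}
$$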
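The unified iteration can be sketched in Python as follows. This is an illustrative floating-point model, not the paper's implementation. It assumes $\delta_i \in \{1, -1\}$, applies the accumulated scaling once at the end instead of at every step, and for the hyperbolic case uses a shift sequence that starts at 1 and repeats the shifts 4, 13, 40, ..., which is an assumption of this sketch. The names `unified_cordic` and `hyperbolic_shifts` are hypothetical.

```python
import math

def hyperbolic_shifts(n):
    """Shifts 1..n with 4, 13, 40, ... repeated once (assumed for m = -1)."""
    shifts, repeat = [], 4
    for s in range(1, n + 1):
        shifts.append(s)
        if s == repeat:
            shifts.append(s)            # repeat this shift once
            repeat = 3 * repeat + 1     # 4 -> 13 -> 40 -> ...
    return shifts

def unified_cordic(x, y, z, m, shifts):
    """Rotation mode; m = 1 circular, m = 0 linear, m = -1 hyperbolic."""
    gain = 1.0
    for s in shifts:
        t = 2.0 ** -s
        if m == 1:
            alpha = math.atan(t)
        elif m == 0:
            alpha = t
        else:
            alpha = math.atanh(t)
        d = 1.0 if z >= 0 else -1.0                 # delta_i = sign(Z_i)
        x, y = x - m * d * t * y, y + d * t * x     # X and Y updates
        z -= d * alpha                              # Z update
        gain /= math.sqrt(1.0 + m * t * t)          # k_i with delta_i^2 = 1
    return gain * x, gain * y, z

cos_v, sin_v, _ = unified_cordic(1.0, 0.0, math.pi / 6, 1, range(24))
_, prod_v, _ = unified_cordic(3.0, 0.0, 0.75, 0, range(24))
cosh_v, sinh_v, _ = unified_cordic(1.0, 0.0, 0.5, -1, hyperbolic_shifts(20))
print(cos_v, sin_v, prod_v, cosh_v, sinh_v)
```

The same loop yields cosine and sine for m = 1, the product $x_0 z_0$ for m = 0, and hyperbolic cosine and sine for m = -1, which is the sense in which one datapath can serve several functions.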
Unifying the iteration expression across the different systems lays the theoretical
foundation for implementing multiple functions on the same hardware. The results in the
different systems and modes for the input $(x_0, y_0, z_0)$ are shown in Table 1.
Like the traditional algorithm, the CORDIC algorithm based on angle encoding assembles
the rotation angle $\theta$ as a linear combination of small elementary angles. The
difference is that the rotation direction may also be zero, i.e.,
$\delta_i \in \{-1, 0, 1\}$. A greedy mechanism is used: each step selects the elementary
angle nearest to the remaining rotation angle. The pseudo-code of the angle encoder
CORDIC algorithm is as follows:
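initial: $\theta_0 = \theta$; $\delta_i = 0$ for $0 \le i \le n-1$; $k = 0$.
repeat until: $|\theta_k| < \alpha_{n-1}$
    choose $i_k$, $0 \le i_k \le n-1$, such that
        $\big|\,|\theta_k| - \alpha_{i_k}\big| = \min_{0 \le i \le n-1} \big|\,|\theta_k| - \alpha_i\big|$,
    then select $i_k$ and set
        $\theta_{k+1} = \theta_k - \delta_{i_k}\alpha_{i_k}$, $\delta_{i_k} = \mathrm{sign}(\theta_k)$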
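The greedy selection can be sketched in Python as shown below. This is a floating-point illustration under stated assumptions, not the hardware angle encoder: the names `angle_recode` and `rotate` are hypothetical, the elementary angles are taken as $\arctan 2^{-i}$ for $i = 0, \dots, n-1$, and the scaling factor is accumulated only over the selected steps because skipped rotations contribute no gain.

```python
import math

def angle_recode(theta, n=16):
    """Greedy recoding: return [(index, direction), ...] and the residual angle."""
    alphas = [math.atan(2.0 ** -i) for i in range(n)]
    steps, residual = [], theta
    while abs(residual) >= alphas[-1]:
        # Pick the elementary angle nearest to the remaining angle.
        i = min(range(n), key=lambda j: abs(abs(residual) - alphas[j]))
        d = 1 if residual > 0 else -1          # direction = sign of residual
        steps.append((i, d))
        residual -= d * alphas[i]
    return steps, residual

def rotate(steps):
    """Apply only the selected micro-rotations to the vector (1, 0)."""
    x, y, scale = 1.0, 0.0, 1.0
    for i, d in steps:
        t = 2.0 ** -i
        x, y = x - d * t * y, y + d * t * x
        scale /= math.sqrt(1.0 + t * t)        # gain depends on the chosen steps
    return scale * x, scale * y                # (cos, sin) of theta - residual

steps, residual = angle_recode(0.6)
print(len(steps), rotate(steps))
```

Because the set of selected indices changes with the input angle, `scale` has to be recomputed for every angle, which is the non-constant scaling factor discussed in the text.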
To extend the convergence range, the algorithm uses interval folding to map the rotation
angle into the range (0, π/2) according to the following rules: if θ > 2π, let
θ = θ - 2π; if π < θ < 2π, replace $[x\ y]^T$ with $[-x\ -y]^T$ and let θ = θ - π; if
π/2 < θ < π, replace $[x\ y]^T$ with $[-y\ x]^T$ and let θ = θ - π/2.
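The folding rules can be sketched in Python as follows. The function name `fold_to_first_quadrant` is hypothetical and the sketch assumes a non-negative angle in radians. A half turn is applied as the map from $[x\ y]^T$ to $[-x\ -y]^T$, and a quarter turn as the map from $[x\ y]^T$ to $[-y\ x]^T$, which is the standard rotation by π/2.

```python
import math

def fold_to_first_quadrant(theta, x=1.0, y=0.0):
    """Fold theta >= 0 into [0, pi/2) and pre-rotate the start vector (x, y)."""
    theta = math.fmod(theta, 2.0 * math.pi)   # remove full turns
    if theta >= math.pi:
        x, y = -x, -y                         # half turn
        theta -= math.pi
    if theta >= math.pi / 2.0:
        x, y = -y, x                          # quarter turn
        theta -= math.pi / 2.0
    return theta, x, y

print(fold_to_first_quadrant(math.radians(200.0)))
```

Rotating the returned vector by the returned angle gives the same result as rotating the original vector by the original angle, so a CORDIC core that only converges for first-quadrant angles can still cover the full circle.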
Since the rotation angles of CORDIC are known in advance, the angle encoder method can
reduce the number of iterations by at least 50% at N-bit precision. The implementation
of the angle encoder CORDIC is shown in Figure 2. The algorithm requires an N-bit
adder/subtracter and a comparison unit for finding the minimum. Since all of these
operations lie on the critical path of each iteration, the iteration time and the area
consumption increase greatly. The angle encoder CORDIC algorithm thus greatly reduces
the number of iterations at the cost of a longer latency per iteration, and because it
skips some unnecessary rotation angles, the scaling factor is no longer a constant [6].
$$
-\sum_{n=0}^{N-1} \arctan(2^{-n}) \le \theta \le \sum_{n=0}^{N-1} \arctan(2^{-n})
$$
The range of angles corresponding to each N can be calculated from the formula given.
As Table 3 shows, the maximum range of the rotation angle is -99.88° ≤ θ ≤ 99.88°, which
does not cover a complete cycle. To ensure that the CORDIC algorithm converges, the sum
of the rotation angles must be larger than the rotation angle actually required, so the
input angle must be preprocessed [7].
N      Max. angle      N      Max. angle
1      45°             8      99.44°
2      71.56°          9      99.67°
3      85.60°          10     99.77°
4      92.73°          11     99.83°
5      96.30°          12     99.85°
6      98.09°          13     99.87°
7      98.99°          ≥ 14   99.88°
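The entries of this table can be reproduced with a few lines of Python that sum the elementary angles $\arctan 2^{-n}$ for $n = 0, \dots, N-1$. The function name `max_angle_deg` is hypothetical, and small differences in the last printed digit relative to the table can arise from rounding.

```python
import math

def max_angle_deg(n):
    """Sum of arctan(2^-k) for k = 0..n-1, in degrees."""
    return math.degrees(sum(math.atan(2.0 ** -k) for k in range(n)))

# Print the convergence limit for N = 1..14.
for n in range(1, 15):
    print(n, f"{max_angle_deg(n):.2f}")
```

The sum grows monotonically with N and saturates just below 99.89°, which is why angles outside this range must be folded before the iterations start.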
(0,1,1)                 (1,0)
(0,0,1) or (0,1,0)      (0,1)
(0,0,0) or (1,1,1)      (0,0)
(1,0,0)                 (-1,0)
(1,0,1) or (1,1,0)      (0,-1)
Based on the above analysis, the improved CORDIC algorithm works only when
i = max(m, l) and N = 15. Therefore, it is necessary to combine (7), (18) and (19) to
calculate a function value. For example, if N = 16, we first perform nine iterations
with formula (7), with the modulus correction factor
$$
K = \prod_{i=0}^{8} \frac{1}{\sqrt{1 + 2^{-2i}}} = 0.607259,
$$
and then perform three iterations for the following six stages, with the modulus
correction factor equal to one [8]. The whole process thus needs twelve iterations,
which saves three pipeline stages and six storage units, cuts the consumption of
hardware units, reduces the number of ROM accesses, and shortens the computation time.
Thus, while guaranteeing system performance, the optimized CORDIC algorithm economizes
hardware resources and increases the computation speed [12-13].
Assuming the input angle is $z_0$, four cases define the conversion rules for the 16-bit
CORDIC algorithm, as shown in Table 5.
270° ≤ z0 ≤ 360°      z0 - 270°      y15      -x15
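A fixed-point sketch of a 16-bit shift-add CORDIC core with quadrant conversion is given below. It is an illustrative Python model under several assumptions that do not come from the paper: a Q2.14 number format, 15 conventional iterations, angles in degrees at the interface, and the hypothetical names `cordic_core` and `sin_cos_deg`. The mapping for 270° to 360° follows the surviving table row (outputs y15 and -x15); the mappings for the other three quadrants are derived here from the same trigonometric identities.

```python
import math

FRAC_BITS = 14                      # assumed Q2.14 format: 1.0 == 16384
ONE = 1 << FRAC_BITS
N_ITER = 15                         # iterations 0..14

# Elementary angles in radians, quantized to Q2.14 (a small ROM in hardware).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
# Start value pre-scaled by K so that no multiplier is needed afterwards.
K_FIXED = round(ONE * math.prod(1.0 / math.sqrt(1.0 + 4.0 ** -i)
                                for i in range(N_ITER)))

def cordic_core(z):
    """Shift-add CORDIC for a first-quadrant angle z (Q2.14 radians)."""
    x, y = K_FIXED, 0
    for i in range(N_ITER):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x, y                     # approximately (cos z, sin z) in Q2.14

def sin_cos_deg(z0):
    """Return (cos, sin) of z0 degrees, 0 <= z0 < 360, as Q2.14 integers."""
    quadrant, rem = divmod(z0, 90)
    x, y = cordic_core(round(math.radians(rem) * ONE))
    # Post-processing per quadrant; the last entry is the 270..360 degree case.
    return [(x, y), (-y, x), (-x, -y), (y, -x)][int(quadrant)]

c, s = sin_cos_deg(200)
print(c / ONE, s / ONE)
```

Python's `>>` on negative integers is an arithmetic shift, which matches the sign-extending shift a signed hardware datapath would use. The truncation in each shift limits the accuracy of this model to a few least significant bits.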
[Figure: FPGA design flow. Blocks: need of user, input of design, synthesis and layout,
functional simulation, timing simulation, device edition, test of device.]
System engineers can connect the internal logic blocks of an FPGA as needed through
programmable interconnects, much like placing a test circuit board inside a chip. Both
the logic blocks and the interconnects of an FPGA can be configured by the designer to
implement the required logic functions [10].
[Figure: system block diagram. Blocks: UART controller, cache allocator, pre-processing
unit, optimized CORDIC algorithm unit, post-processing unit. Signals: clk, rst, RXD,
recedata, Z0, Z1, X0, Y0, XN, YN, cos, sin.]
The UART controller handles the communication between the CORDIC hardware and the
serial device. The cache allocator combines two 8-bit data words into one 16-bit word
that serves as the initial value for the pre-processing unit. The pre-processing unit
converts the initial angle into the first quadrant and triggers the iterative
computation of the optimized CORDIC unit. Of the five modules, the optimized CORDIC
unit is the core and determines the performance of the system.
The flow chart of the optimized CORDIC algorithm is shown in Figure 5.
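[Figure 5: flow chart of the optimized CORDIC algorithm. Steps: initialization of the
system; wait until data is received (NO: keep waiting, YES: continue); pre-processing;
optimized CORDIC, using the phase information; post-processing; output.]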
Input angle    Theoretical value    Simulated value    Error            Output (hex)
99.8°          0.98769              0.98768            -1×10^-5         7E6C
110°           0.93969              0.98492            4.527×10^-2      7E14
200°           -0.34202             -0.98492           6.429×10^-1      81EE
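The hexadecimal outputs can be related to the decimal values with a short Python sketch. It assumes that each 16-bit word is a signed Q1.15 fraction, which is an interpretation made here for illustration and is not stated in the text; the function name `q15_to_float` is hypothetical.

```python
def q15_to_float(word):
    """Interpret a 16-bit word as a signed Q1.15 fraction (assumed format)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    # Two's complement: words with the top bit set are negative.
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / 32768.0

for word in (0x7E6C, 0x81EE):
    print(f"{word:04X} -> {q15_to_float(word):.5f}")
```

Under this assumption, 81EE decodes to about -0.98492, which agrees with the simulated value listed for 200°.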
Tables 6 and 7 show that when the input angle is less than 99.8°, the simulation error
is negligible, whereas for larger angles the error is large. This verifies the
discussion above and indicates that conversion of the input angle is necessary.
The results show that the optimized CORDIC achieves high accuracy while also raising the
operating frequency.
6. Conclusions
In this paper, we optimized the conventional CORDIC algorithm, addressed the trade-off
among speed, area, and precision in the design, removed the limitation on angle
coverage, provided an optimization applicable to the various functions computed by
CORDIC, and simulated the optimized 16-bit CORDIC algorithm on an FPGA. Comparing the
simulation results of the optimized and traditional CORDIC algorithms, we conclude that
the optimized algorithm reduces resource consumption and increases the maximum
operating frequency, while its accuracy of 10^-5 is the same as that of the traditional
algorithm.
Acknowledgements
This work is supported by the Science and Technology Research Funds of the Education
Department of Heilongjiang Province under Grant No. 12541174.
References
[1] C. S. Wu, A. Y. Wu and C. H. Lin, "High-performance/low-latency vector rotational CORDIC
architecture based on extended elementary angle set and trellis-based searching schemes", IEEE
Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, no. 9, (2003),
pp. 589-601.
[2] X. Hu, R. G. Harber and S. C. Bass, "Expanding the range of convergence of the CORDIC algorithm",
IEEE Transactions on Computers, vol. 40, no. 1, (1991), pp. 13-21.
[3] K. Maharatna, S. Banerjee, E. Grass, M. Krstic and A. Troya, "Modified virtually scaling-free adaptive
CORDIC rotator algorithm and architecture", IEEE Transactions on Circuits and Systems for Video
Technology, vol. 15, no. 11, (2005), pp. 1463-1474.
[4] F. J. Jaime, M. A. Sánchez and J. Hormigo, "Enhanced scaling-free CORDIC", IEEE Transactions on
Circuits and Systems, Regular Papers, vol. 57, no. 7, (2010), pp. 1654-1662.
[5] H. Y. Hu, "The quantization effects of the CORDIC algorithm", IEEE Transactions on Signal Processing,
vol. 40, (1992), pp. 834-844.
[6] T.-B. Juang and M.-Y. Tsai, "Para-CORDIC: Parallel CORDIC rotation algorithm", IEEE
Transactions on Circuits and Systems, vol. 51, no. 8, (2004), pp. 1515-1524.
[7] K. H. Bed and R. E. Siferd, "VLSI implementations of low-power leading-one detector circuits",
Proceedings of the IEEE SoutheastCon 2006, (2006), pp. 279-284.
[8] J. M. P. Langlois and D. Al-Khalili, "Hardware optimized direct digital frequency synthesizer
architecture with 60-dBc spectral purity", IEEE International Symposium on, vol. 5, May (2002), pp.
361-364.
[9] B. D. Yang, J. H. Choi, S. H. Han, L. S. Kim and H. K. Yu, "An 800-MHz low-power direct digital
frequency synthesizer with an on-chip D/A converter", IEEE Journal of Solid-State Circuits, vol. 39,
no. 5, May (2004), pp. 761-774.
[10] M. A. Butt and S. Masud, "FPGA based bandwidth adjustable all digital direct frequency synthesizer",
9th International Symposium on Communications and Information Technology (ISCIT 2009), IEEE,
(2009), pp. 1399-1404.
[11] M. Genovese and E. Napoli, "Direct digital frequency synthesizers implemented on high end FPGA
devices", 9th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), IEEE,
(2013), pp. 137-140.
[12] Y.-J. Cao, Y. Wang and T.-Y. Sung, "A ROM-less direct digital frequency synthesizer based on a
scaling-free CORDIC algorithm", 6th International Forum on Strategic Technology (IFOST), IEEE,
(2011), pp. 1186-1189.
[13] L. S. J. Chimakurthy, M. Ghosh, F. F. Dai and R. C. Jaeger, "A novel DDS using nonlinear ROM
addressing with improved compression ratio and quantization noise", IEEE Transactions on Ultrasonics,
Ferroelectrics and Frequency Control, (2006), pp. 274-283.