FPGA Implementation of Modified Non-Restoring Square Root Core
202 www.erpublication.org
End
Remainder $R = r_0$
III. ARCHITECTURES
A. Pipelined Architecture
To implement this architecture, the algorithm explained in Section II must be unfolded, so n stages with n adders/subtractors appear. Observing the first iteration yields a reduction:

$$r_{n-1} = D_{2n-1}D_{2n-2} - 01$$
$$Q_{n-1} = \begin{cases} 1, & r_{n-1} \ge 0 \\ 0, & r_{n-1} < 0 \end{cases}$$

where $D_{2n-1}D_{2n-2}$ denotes the two MSBs of D read as a 2-bit number. Since $r_{n-1} \ge 0$ exactly when these two bits are not both 0, $Q_{n-1} = D_{2n-1} \lor D_{2n-2}$. There is therefore no need to perform the first subtraction and wait one cycle: the result of the first iteration is obtained directly from the two MSBs of D. The first stage can be embedded into the second stage, leaving n-1 pipeline stages.

Figure 1: Pipelined architecture

The longest path delay occurs in the last stage, because the adder/subtractor grows in size as the stages advance. A further improvement can be made by pipelining the last stages and merging the initial ones.
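Section II is not reproduced here, so the following is a minimal behavioural sketch that assumes the classic non-restoring recurrence of Li and Chu [1]. It shows that embedding the first stage, using only the two MSBs of D, gives the same root and remainder as running the full first subtraction. Each loop pass stands for one unfolded stage. The function name `nonrestoring_sqrt` and its `embed_first_stage` flag are hypothetical illustrations, not part of the core.

```python
import math


def nonrestoring_sqrt(d, n, embed_first_stage=True):
    """Return (Q, R) with Q = floor(sqrt(d)) and R = d - Q*Q for a 2n-bit d.

    Assumed recurrence (non-restoring square root, Li and Chu [1]):
      r >= 0:  r <- 4r + D[2i+1:2i] - (4Q + 01)
      r <  0:  r <- 4r + D[2i+1:2i] + (4Q + 11)
    Each loop iteration models one unfolded stage.
    """
    assert n >= 1 and 0 <= d < 1 << (2 * n)
    if embed_first_stage:
        # Reduced first iteration: r_{n-1} = D_{2n-1}D_{2n-2} - 01 and
        # Q_{n-1} = D_{2n-1} OR D_{2n-2}; no subtraction stage is spent on it.
        top = (d >> (2 * n - 2)) & 0b11
        r, q, first = top - 1, int(top != 0), n - 2
    else:
        r, q, first = 0, 0, n - 1

    for i in range(first, -1, -1):
        pair = (d >> (2 * i)) & 0b11          # bits D_{2i+1} D_{2i}
        if r >= 0:
            r = 4 * r + pair - (4 * q + 1)
        else:
            r = 4 * r + pair + (4 * q + 3)
        q = 2 * q + int(r >= 0)               # next root bit is 1 iff r >= 0

    if r < 0:                                  # final remainder correction
        r += 2 * q + 1
    return q, r


if __name__ == "__main__":
    n = 4
    for d in range(1 << (2 * n)):
        expected = (math.isqrt(d), d - math.isqrt(d) ** 2)
        assert nonrestoring_sqrt(d, n) == expected
        assert nonrestoring_sqrt(d, n, embed_first_stage=False) == expected
    print(nonrestoring_sqrt(200, n))   # (14, 4)
```

Both variants agree on every 8-bit radicand. The only difference is that the embedded version starts its loop one stage later, which is the saving the pipelined core exploits.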
The pipelined architecture is depicted in Figure 1. The computation of the remainder is not shown, although the core can compute it if the user requires it. Note that the dotted rectangles indicate the registers that would have appeared if the reduction of the first stage had not been performed. This architecture can produce a new square root every cycle, with an initial latency of n cycles.

B. Combinatorial Architecture
This architecture is implemented because some non-real-time applications need it, and also to allow a comparison with the core that does have a fully combinatorial architecture. The architecture is very simple: it is the fully pipelined architecture without the pipelining registers, so it has only one register at the input and one at the output.
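The difference between the two architectures can be illustrated with a small cycle-level model. It assumes the same recurrence as above and uses n-1 stages, with the reduced first iteration folded into stage 0. The pipelined version places a register after every stage, so after the pipeline fills it emits one root per clock. The combinatorial version evaluates the same stages back to back within a single cycle. The names `stage`, `combinational` and `run_pipeline` are hypothetical. The model also omits the input and output registers mentioned in the text, so its cycle counts illustrate throughput rather than the core's exact latency. As in the core, only Q is produced and the remainder correction is skipped.

```python
import math
from collections import deque


def stage(n, k, token):
    """Stage k (0 .. n-2) of the unfolded datapath; token = (d, r, q)."""
    d, r, q = token
    if k == 0:
        # Reduced first iteration, taken from the two MSBs of d.
        top = (d >> (2 * n - 2)) & 0b11
        r, q = top - 1, int(top != 0)
    i = n - 2 - k                      # radicand bit pair handled by this stage
    pair = (d >> (2 * i)) & 0b11
    if r >= 0:
        r = 4 * r + pair - (4 * q + 1)
    else:
        r = 4 * r + pair + (4 * q + 3)
    q = 2 * q + int(r >= 0)
    return (d, r, q)


def combinational(n, d):
    """Combinatorial variant: all n-1 stages evaluated in one clock cycle."""
    assert n >= 2
    token = (d, 0, 0)
    for k in range(n - 1):
        token = stage(n, k, token)
    return token[2]


def run_pipeline(n, radicands):
    """Register after every stage; return a list of (cycle, d, Q) outputs."""
    assert n >= 2
    depth = n - 1
    regs = [None] * depth              # one pipeline register per stage
    pending = deque(radicands)
    outputs, cycle = [], 0
    while pending or any(reg is not None for reg in regs):
        cycle += 1
        incoming = (pending.popleft(), 0, 0) if pending else None
        # On the clock edge every register loads its stage's output; each
        # stage reads the previous register (stage 0 reads the new radicand).
        sources = [incoming] + regs[:-1]
        regs = [stage(n, k, src) if src is not None else None
                for k, src in enumerate(sources)]
        if regs[-1] is not None:
            d, _, q = regs[-1]
            outputs.append((cycle, d, q))
    return outputs


if __name__ == "__main__":
    assert all(combinational(4, d) == math.isqrt(d) for d in range(256))
    for cycle, d, q in run_pipeline(4, [200, 16, 255]):
        print(f"cycle {cycle}: sqrt({d}) = {q}")
    # cycle 3: sqrt(200) = 14
    # cycle 4: sqrt(16) = 4
    # cycle 5: sqrt(255) = 15
```

Three radicands fed on consecutive cycles leave the 3-stage pipeline on consecutive cycles. The combinatorial variant produces the same values with no intermediate registers.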
International Journal of Engineering and Technical Research (IJETR)
ISSN: 2321-0869, Volume-3, Issue-4, April 2015
This reduces the width of the adder/subtractor by 2 bits. The two-bit result ba is obtained in parallel, and the carry-in comes from just an OR gate, so the new adder/subtractor uses n bits and has a carry-in. Also note that, as in the pipelined case, the MSB of the second operand of the adder/subtractor is 0. Figure 3 depicts this architecture.
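The claim that the two LSBs can be split off can be checked numerically. The sketch below assumes the recurrence used earlier in this section, with an operand ending in 01 for subtraction and 11 for addition. It compares a full-width step against a split step in which ba = xy - 01 (mod 4) is computed separately and the upper adder/subtractor receives a carry-in c = x OR y. Subtraction is modelled with the convention this section attributes to lpm_add_sub: result = A - B - 1 + cin, so cin = 1 means "no borrow". The function names are hypothetical, and bit widths are not modelled because Python integers are unbounded. `lsb_table` regenerates the 2-bit truth tables that follow, before and after inverting c.

```python
def step_direct(r, q, xy):
    """Reference: one iteration on the full-width adder/subtractor."""
    if r >= 0:
        return 4 * r + xy - (4 * q + 0b01)
    return 4 * r + xy + (4 * q + 0b11)


def step_split(r, q, xy):
    """Same iteration with the two LSBs produced by separate logic.

    The upper adder/subtractor sees only r, q and a carry-in. Assumed
    subtraction convention: A - B - 1 + cin (cin = 1 means no borrow).
    """
    x, y = (xy >> 1) & 1, xy & 1
    ba = (xy - 1) & 0b11      # same two bits for xy + 11 and xy - 01 (mod 4)
    cin = x | y               # carry-in (inverted borrow when subtracting)
    if r >= 0:
        upper = r - q - 1 + cin   # subtract with active-low borrow-in
    else:
        upper = r + q + cin       # add with active-high carry-in
    return 4 * upper + ba


def lsb_table():
    """Rows (xy, cba of xy + 11, cba of xy - 01, xy - 01 with c inverted)."""
    rows = []
    for xy in range(4):
        add = xy + 0b11                 # 3-bit sum; c is the carry out
        sub = (xy - 0b01) & 0b111       # 3-bit two's complement; c is the borrow
        rows.append((xy, add, sub, sub ^ 0b100))
    return rows


if __name__ == "__main__":
    for xy, add, sub, sub_inv in lsb_table():
        print(f"xy={xy:02b}  xy+11 -> {add:03b}  xy-01 -> {sub:03b}  c inverted -> {sub_inv:03b}")
    # Exhaustive check of the split datapath against the reference step.
    assert all(step_split(r, q, xy) == step_direct(r, q, xy)
               for r in range(-64, 64) for q in range(32) for xy in range(4))
```

After the inversion the last column equals the addition column. This is why a single OR gate can drive the carry-in for both operations.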
cba = xy + 11        cba = xy - 01
xy  cba              xy  cba
00  011              00  111
01  100              01  000
10  101              10  001
11  110              11  010

Figure 4: FSM for iterative architecture
c: carry-in for the next stage of the adder/subtractor; ba: result of the operation.

The bits ba depend only on xy, but c depends on the type of operation. Fortunately, a conventional adder/subtractor with carry-in (e.g., the lpm_add_sub megafunction) treats the carry-in as positive logic when adding and as negative logic when subtracting [3] (this is done to reduce gate usage). So, for subtraction, c has to be inverted to ensure the proper working of the adder/subtractor. The new truth table is:

cba = xy + 11        cba = xy - 01 (c inverted)
xy  cba              xy  cba
00  011              00  011
01  100              01  100
10  101              10  101
11  110              11  110

After the inversion both operations produce the same cba, so ba = xy - 01 (mod 4) and c = x OR y regardless of the operation. This is why the carry-in comes from just an OR gate.

The finite state machine of the iterative architecture, depicted in Figure 4, controls the iterative architecture. The process starts when s = 1. After n clock cycles, the result is available in register Q, done = 1, and a new process can be started.

IV. RESULTS
The architectures were synthesized successfully using Xilinx ISE v14.1. After synthesis, the core was implemented on the FPGA device XC3S400-TQ144 (Xilinx Spartan-3 family) with speed grade -5. The core presented does not compute the remainder, since it is rarely used. Figure 5 depicts the core with all its options. Table 1 establishes a comparison between this core and the Altera core.
Results are shown only for a specific device (Spartan-3), because even a single device produces a large amount of result data, and these results are enough to demonstrate the benefits of the implemented core.
Figure 5: Parameter comparison graph for Spartan
Figure 5: Parameter comparison graph for Virtex
REFERENCES
[1] Y. Li and W. Chu, "A New Non-Restoring Square Root Algorithm and Its VLSI Implementations," Proc. 1996 IEEE International Conference on Computer Design: VLSI in Computers and Processors, Austin, Texas, USA, October 1996, pp. 538-544.
[2] J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, 2nd ed., Morgan Kaufmann Publishers, Inc., 1996.
[3] G. Knittel, "A VLSI-Design for Fast Vector Normalization," Computers & Graphics, Vol. 19, No. 2, 1995, pp. 261-271.
[4] J. Bannur and A. Varma, "The VLSI Implementation of a Square Root Algorithm," Proc. IEEE Symposium on Computer Arithmetic, IEEE Computer Society Press, Washington, D.C., 1985, pp. 159-165.
[5] J. O'Leary, M. Leeser, J. Hickey, and M. Aagaard, "Non-Restoring Integer Square Root: A Case Study in Design by Principled Optimization," Proc. 2nd International Conference on Theorem Provers in Circuit Design (TPCD94), 1994, pp. 52-71.
[6] K. C. Johnson, "Efficient Square Root Implementation on the 68000," ACM Transactions on Mathematical Software, Vol. 13, No. 2, 1987, pp. 138-151.
[7] H. Kabuo, T. Taniguchi, A. Miyoshi, H. Yamashita, M. Urano, H. Edamatsu, and S. Kuninobu, "Accurate Rounding Scheme for the Newton-Raphson Method Using Redundant Binary Representation," IEEE Transactions on Computers, Vol. 43, No. 1, 1994, pp. 43-51.
[8] S. Brown and Z. Vranesic, Fundamentals of Digital Logic with VHDL Design, McGraw-Hill, 2000.
[9] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, Springer-Verlag, Berlin Heidelberg, May 2001.