
2012 International Conference on Computing, Electronics and Electrical Technologies [ ICCEET]

Design & Implementation of Floating Point ALU on a FPGA Processor

Prashanth B.U.V (1), P. Anil Kumar (2), G. Sreenivasulu (3)
(1) Maintenance Engineer, DST-PURSE, (2) M.Tech Student, (3) Associate Professor,
S. V. University, Tirupati-517502

Abstract- In this paper, DSP modules such as a floating point ALU are designed and presented. The design is based on the high performance FPGA "Cyclone II", and implementation is done after functional and timing simulation. The simulation tool used is ModelSim. The tool for synthesis and implementation is Quartus II. The experimental results show the functional and timing analysis for all the DSP modules, carried out using high performance synthesis software from Altera.

Keywords- Floating point ALU, adder, subtractor, multiplier, divider.

978-1-4673-0210-4/12/$31.00 ©2012 IEEE

Authorized licensed use limited to: College of Engineering - THIRUVANANTHAPURAM. Downloaded on December 17, 2022 at 16:02:22 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

The IEEE 754 floating point format consists of three fields. The sign bit is 1 bit: it is 1 for a negative number and 0 for a positive number. The exponent is 8 bits. The exponent represents a power of two and uses excess-127 notation. Special cases arise when the biased exponent is 255: a zero fraction then represents infinity and a non-zero fraction represents NaN. When the biased exponent and fraction fields are both zero, the number represented is zero. Denormalized numbers are of the form 0.f * 2^Emin. The fraction field is 23 bits. Here the significand is represented as 1.fffff..., where the fraction part represents a number less than one. The leading one is implicit and does not appear in the representation. The exponent value ranges from -126 to 127; it is biased and therefore ranges from 1 to 254. Further, the exponents 0 and 255 represent special cases. The number represented using the IEEE 754 standard is (-1)^s * 1.f * 2^(e - bias).

II. TYPES OF EXCEPTIONS THAT ARISE

The four types of exceptions that arise are as follows. The Overflow exception occurs when the result has an exponent of 0xFF or if any input operand is infinity. The Underflow exception occurs if the implicit bit of the result is zero, or if the exponent out is -126 or 0x01, that is, the number is too small to be represented fully in single precision format. The Division by zero exception occurs when the divisor is zero; the result is set to infinity. The Invalid operation exception occurs when the operation cannot be performed on the operands, for example subtraction of infinity and NaN inputs.

III. BLOCK DIAGRAM OF FLOATING POINT ADDITION AND SUBTRACTION

Figure 1. Block diagram of floating point adder and subtractor (Stages 1 to 3).

A. Floating Point Addition Algorithm

The floating point addition algorithm is as follows.

The two operands N1 and N2 are read in and checked for denormalization and infinity.
If a number is denormalized, its implicit bit is set to 0; otherwise it is set to 1.
The fractional part is extended to 24 bits.
The two exponents e1 and e2 are compared and, if e1 < e2, N1 and N2 are swapped. Now f2 is referred to as f1 and vice versa.
f2 is right shifted by the difference between the two exponents. Now both numbers have the same exponent.
The signs are used to see whether the operation is addition or subtraction.
If the operation is subtraction, 2's complement addition is performed.
If the result is negative, the 2's complement of the result is taken to obtain the actual result.
The result is normalized by left shifting.
Then the result is checked for overflow and underflow, and the sign of the result is also computed.
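As a rough illustration, the addition steps above can be sketched as a small software model in Python. This is only a sketch under stated assumptions, not the hardware description used in the paper: the names to_bits, to_float, unpack and fp_add and the flag strings are hypothetical, results are truncated rather than rounded, infinity and NaN inputs are only flagged, and a zero result is always returned as +0. Python integers are signed, so the 2's complement subtraction and the re-complement of a negative result appear here as an ordinary subtraction followed by a negation.

```python
import struct

def to_bits(x):
    """Bit pattern of a Python float stored in single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_float(bits):
    """Inverse of to_bits."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def unpack(bits):
    """Split a 32-bit pattern into sign, biased exponent and 23-bit fraction."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fp_add(a_bits, b_bits):
    """Add two single precision bit patterns; returns (result_bits, flag)."""
    s1, e1, f1 = unpack(a_bits)
    s2, e2, f2 = unpack(b_bits)
    # Assumption: infinity or NaN inputs are flagged and not computed further.
    if e1 == 0xFF or e2 == 0xFF:
        return 0x7FC00000, "special_input"
    # Implicit bit is 0 for denormalized inputs, 1 otherwise (24-bit values).
    m1 = f1 | (0 if e1 == 0 else 1 << 23)
    m2 = f2 | (0 if e2 == 0 else 1 << 23)
    # Denormalized numbers share the scale of biased exponent 1.
    e1, e2 = max(e1, 1), max(e2, 1)
    # Swap so that operand 1 carries the larger exponent.
    if e1 < e2:
        s1, e1, m1, s2, e2, m2 = s2, e2, m2, s1, e1, m1
    # Align: right shift the smaller operand; shifted-out bits are dropped.
    m2 >>= e1 - e2
    # Equal signs mean addition, different signs mean subtraction.
    if s1 == s2:
        total, sign = m1 + m2, s1
    else:
        total, sign = m1 - m2, s1
        if total < 0:                 # negative result: negate, flip the sign
            total, sign = -total, s2
    if total == 0:
        return 0, "zero"
    exp = e1
    # Normalize: one right shift on carry out, left shifts after cancellation.
    if total >> 24:
        total >>= 1
        exp += 1
    while total < (1 << 23) and exp > 1:
        total <<= 1
        exp -= 1
    if exp >= 0xFF:
        return (sign << 31) | 0x7F800000, "overflow"
    if total < (1 << 23):             # implicit bit of the result is zero
        return (sign << 31) | total, "underflow"
    return (sign << 31) | (exp << 23) | (total & 0x7FFFFF), "ok"
```

For example, fp_add(to_bits(1.5), to_bits(2.25)) takes the swap path, aligns the smaller operand by one bit, and returns the bit pattern of 3.75 together with the flag "ok".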

B. Functional and Timing Analysis and Synthesis Results

The functional and timing analysis are carried out using Quartus II software from Altera. For the adder, the result is obtained after 7 clock cycles for a clock of 5 ns. The special cases and exception cases are signaled by a flag. For the subtractor, the functional and timing analysis are likewise carried out using Quartus II software from Altera. The result is obtained after 4 clock cycles for a clock of 4 ns. The special cases and exception cases are signaled by a flag.

The synthesis results for the adder are as follows. The Logic Utilization is 1%. The Combinational ALUTs are 354/38,000 (<1%). The Dedicated Logic Registers are 321/38,000 (<1%). The total pin count is 99/296 (33%). The Fmax for the slow 1100 mV 85C model is 225.63 MHz. The Fmax for the slow 1100 mV 0C model is 227.53 MHz.

For the subtractor, the synthesis results are as follows. The Logic Utilization is 1%. The Combinational ALUTs are 466/38,000 (1%). The Dedicated Logic Registers are 65/38,000 (<1%). The total pin count is 105/296 (35%). The maximum frequency Fmax for the slow 1100 mV 85C model is 225.63 MHz.

C. SIMULATION WAVEFORMS

Figure 2. Simulation Waveform for Adder

Figure 3. Simulation Waveform for Subtractor

IV. BLOCK DIAGRAM OF FLOATING POINT MULTIPLIER

Figure 4. Block diagram of Floating Point multiplier

A. Floating Point Multiplication Algorithm


Algorithm Steps

Step 1
The two operands are read and checked for denormalization and infinity. If an operand is denormalized, the implicit bit is set to '0'; otherwise it is set to '1'.

Step 2
The exponents are added and the fractional parts are multiplied. The sign bit is obtained by XORing the sign bits of the two numbers.

Step 3
f_ans is shifted to the left until the first bit becomes '1', and the amount of shift is calculated. e_ans is obtained by subtracting the amount of shift.
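The three steps above can be sketched as a small software model in Python. This is a minimal sketch under stated assumptions rather than the paper's implementation: the names to_bits, unpack and fp_mul and the flag strings are hypothetical, one bias of 127 is removed after the exponents are added (the text does not spell this out), the product is truncated rather than rounded, infinity and NaN inputs are only flagged, and results that would be denormalized are flushed to zero with an "underflow" flag.

```python
import struct

BIAS = 127

def to_bits(x):
    """Bit pattern of a Python float stored in single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def unpack(bits):
    """Split a 32-bit pattern into sign, biased exponent and 23-bit fraction."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fp_mul(a_bits, b_bits):
    """Multiply two single precision bit patterns; returns (bits, flag)."""
    s1, e1, f1 = unpack(a_bits)
    s2, e2, f2 = unpack(b_bits)
    sign = s1 ^ s2                        # Step 2: XOR of the sign bits
    # Assumption: infinity or NaN inputs are flagged and not computed further.
    if e1 == 0xFF or e2 == 0xFF:
        return 0x7FC00000, "special_input"
    # Step 1: implicit bit is 0 for denormalized inputs, 1 otherwise.
    m1 = f1 | (0 if e1 == 0 else 1 << 23)
    m2 = f2 | (0 if e2 == 0 else 1 << 23)
    if m1 == 0 or m2 == 0:
        return sign << 31, "zero"
    # Step 2: add the exponents (keeping a single bias) and multiply the
    # 24-bit values into a product of up to 48 bits.
    exp = max(e1, 1) + max(e2, 1) - BIAS
    prod = m1 * m2
    # Step 3: shift left until the top bit (bit 47) is 1, counting the shifts.
    shift = 0
    while not (prod >> 47) & 1:
        prod <<= 1
        shift += 1
    # A set bit 47 is worth one extra power of two; subtract the shift count.
    exp = exp + 1 - shift
    if exp >= 0xFF:
        return (sign << 31) | 0x7F800000, "overflow"
    if exp <= 0:
        return sign << 31, "underflow"
    # Keep the 23 bits below the leading one as the result fraction.
    return (sign << 31) | (exp << 23) | ((prod >> 24) & 0x7FFFFF), "ok"
```

For example, fp_mul(to_bits(1.5), to_bits(2.5)) needs a single left shift in Step 3 and returns the bit pattern of 3.75 with the flag "ok".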

A. Functional and Timing Analysis Results

The functional and timing analysis are carried out using Quartus II software. The result is obtained after 5 clock cycles for a clock of 10 ns. The special cases and exception cases are signaled by a flag.

B. Simulation Waveform

Figure 5. Simulation of Multiplier

C. Synthesis Results

The synthesis results are obtained as follows. The Logic Utilization is <1%. The Combinational ALUTs are 66/38,000 (<1%). The Dedicated Logic Registers are 85/38,000 (<1%). The total pin count is 97/296 (33%). Further, the DSP block 18-bit elements are 4/216 (2%). Fmax for the slow 1100 mV 85C model is 99.4 MHz. Fmax for the slow 1100 mV 0C model is 109.1 MHz.

D. RTL Schematic of Floating Point Multiplier

Figure 6. RTL Schematic of Floating Point multiplier

V. FLOATING POINT DIVISION

A. Floating Point Division Algorithm

The algorithm steps are as follows.
1. The operands are read.
2. The non-restoring division algorithm is used for performing division.
3. The exponents are subtracted and the result is biased by adding 127.
4. The result is normalized.
5. The non-restoring division algorithm is as follows. Initially the register A is set to zero.
6. The divisor is loaded into register M and the dividend into the Q register.


Further, the following steps are carried out n times.

1. If the sign of A is 0, shift A and Q left and subtract M from A; otherwise, shift A and Q left and add M to A.
2. If the sign of A is 0, set q0 to 1; otherwise set q0 to 0.
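The register-level procedure above can be sketched in Python on unsigned integers. This is an illustrative sketch under stated assumptions, not the paper's RTL: the function names nonrestoring_divide and biased_quotient_exponent are hypothetical, the registers A, Q and M are modelled as plain Python integers, and the final correction of a negative remainder is the usual closing step of non-restoring division, which the text does not list. The second function shows only the exponent handling of step 3 of the division algorithm.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero exception")
    a, q, m = 0, dividend, divisor        # A = 0, Q = dividend, M = divisor
    mask = (1 << n) - 1
    for _ in range(n):
        a_was_negative = a < 0            # sign of A before the shift
        # Shift the register pair (A, Q) left by one bit.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        # Subtract M if A was non-negative, otherwise add M.
        a = a + m if a_was_negative else a - m
        # The new quotient bit q0 is 1 when A is non-negative, else 0.
        if a >= 0:
            q |= 1
    # Assumption: standard final correction, not listed in the steps above.
    if a < 0:
        a += m
    return q, a

def biased_quotient_exponent(e1, e2):
    """Subtract the biased exponents and re-bias the difference by 127."""
    return e1 - e2 + 127
```

For example, nonrestoring_divide(7, 2, 3) runs three iterations and returns (3, 1), that is, quotient 3 and remainder 1.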
B. Functional and Timing Analysis Results

The functional and timing analyses are carried out using Quartus II software from Altera. The result is obtained after 50 clock cycles for a clock of 11 ns. The special cases and exception cases are signaled by a flag.

C. Simulation Waveform

Figure 7. Simulation Waveform of Divider

D. Synthesis Results

The synthesis results are as follows.
o Logic Utilization: 2%
o Combinational ALUTs: 762/38,000 (2%)
o Dedicated Logic Registers: 201/38,000 (<1%)
o Total Pins: 130/296 (44%)
o Fmax for slow 1100 mV 85C Model: 87.47 MHz
o Fmax for slow 1100 mV 0C Model: 94.91 MHz

E. RTL Schematic of a Floating Point Divider

Figure 8. RTL Schematic of Floating Point Divider

VI. CONCLUSIONS

The floating point based ALU modules are designed and summarized in the table below.

Module                       Clock    Output obtained after
Floating point adder         5 ns     7 cycles (36.293 ns)
Floating point subtractor    4 ns     4 cycles (15.565 ns)
Floating point multiplier    10 ns    5 cycles (56.1 ns)
Floating point divider       11 ns    50 clock cycles (556.01 ns)

Table 1.

In this paper the implemented DSP modules are floating point based ALU computational systems. All


these DSP modules are designed from the block diagram approach through to the synthesis and simulation aspects. The functional timing analysis results and the synthesis results are measured in a precise and accurate manner. Finally, the simulation waveforms are obtained in the FPGA simulation tools and verified against the hardware design aspects, and matching results are obtained. The filter structure is implemented in a modular form. All the simulations are carried out in the Quartus-II software. The target device selected is Stratix-III. The functional and timing analysis for all the modules are carried out and accurate measurements are obtained. As the hardware and software results match, this closes the gap between the hardware implementation and the software simulation, and it also clarifies the visualization of the concepts incorporated.

ABOUT THE AUTHORS:

Prashanth B.U.V has completed B.E from R.V College of Engg, Bangalore and M.Tech in Embedded Systems from J.N.T.U, Hyderabad. Currently he is working as Maintenance Engineer in the DST-PURSE Programme, S.V. University, Tirupati.

Mr. P. Anil Kumar is a student (M.Tech), Instrumentation & Control Systems, Sri Venkateswara University College of Engineering, Tirupati. He passed out of B.Tech (2009) in Electronics & Instrumentation Engg. from Vijayawada (VRSEC), Andhra Pradesh, India.

Dr. G. Sreenivasulu is an Associate Professor in the ECE department at Sri Venkateswara University College of Engineering, Tirupati.
