FPGA Design of a Fast 32-bit Floating Point Multiplier Unit

Anna Jain, Baisakhy Dash, Ajit Kumar Panda, Member, IEEE, Muchharla Suresh, Member, IEEE

Abstract- An architecture for a fast 32-bit floating point multiplier compliant
with the single precision IEEE 754-2008 standard has been proposed in this
paper. This design intends to make the multiplier faster by reducing the delay
caused by the propagation of the carry by implementing adders having the least
power delay constant. The implementation of the multiplier module has been done
in a top down approach. The sub-modules have been written in Verilog HDL and
then synthesized and simulated using the Xilinx ISE 12.1 targeted on the
Spartan 3E FPGA.

I. INTRODUCTION

With the advent of technology, the demand for high-speed digital systems is on
the rise, and the multiplier is a ubiquitous unit in almost every digital
system. Compared to other operations in an arithmetic logic unit, the
multiplier consumes more time and power. Hence researchers have always been
trying to design multipliers which incorporate an optimal combination in terms
of speed, power and area.

In computing, floating point describes a method of representing real numbers in
a way that can support a wide range of values. Floating point units are widely
used in a dynamic range of engineering and technology applications. This calls
for the development of faster floating point arithmetic circuits.

In this paper we propose an architecture for a fast floating point multiplier
compliant with the single precision IEEE 754-2008 standard. The major issue in
the implementation of a high speed multiplier circuit is the delay due to the
propagation of carry in every component used in its design. In this proposed
architecture we are trying to minimize the carry propagation at every level
possible.

Modern Field Programmable Gate Arrays (FPGAs) are a suitable solution that
provides thousands of logic elements and dedicated blocks as well as several
desired properties such as intrinsic parallelism, flexibility, low cost and
customizable approaches. All this allows for a better performance and
accelerated execution of the involved algorithms. FPGAs are quickly becoming
suitable for major floating point computations.

To attain a generic design, Verilog hardware description language was used for
design entry of the entire multiplier unit as it presents a tremendous
productivity improvement for circuit designers and descriptions of large
circuits can be written in a relatively compact and concise form.

Over the years, several different floating-point representations have been
used in computers; however, for the last ten years the most commonly
encountered representation is that defined by the IEEE Standard for
Floating-Point Arithmetic (IEEE 754) [1]. It is a technical standard
established by the Institute of Electrical and Electronics Engineers (IEEE) and
the most widely used standard for floating-point computation, followed by many
hardware and software implementations. Single precision representation occupies
32 bits: a sign bit, 8 bits for exponent and 23 for the mantissa. It also
specifies standards for arithmetic operations and rounding algorithms.

The rest of the paper is organized as follows. Section 2 presents the proposed
floating point multiplier design and explains the architectural details.
Section 3 lists the progress and proposals to achieve the objective of the
paper. The implementation is described in section 4 and with section 5 we
conclude the paper.

II. FLOATING POINT MULTIPLIER DESIGN

The floating point multiplication is carried out in three parts [2]:

In the first part, we determine the sign of the product by performing an XOR
operation on the sign bits of the two operands.

In the second part, the exponent bits of the operands are passed to an adder
stage and a bias of 127 is subtracted from the obtained output. The addition
and bias subtraction operations are both implemented using 8-bit Kogge-Stone
adders. Overflow and underflow conditions are indicated by setting the
respective flags.

In the third and most important stage, we find the product of the mantissa
bits. The multiplication of mantissa bits is performed in the following stages.

A. Partial product generator: There are various ways of generating partial
products for a given multiplier [3]. The ones that we have considered are Booth
encoding and radix-4 Booth encoding. The radix-4 Booth encoding was found to be
faster so it has been implemented in the final multiplier architecture. The
output of this stage is twelve partial products.

B. Partial product accumulator: The 24-bit partial products obtained from the
previous stage are shifted appropriately in a shifter module and then
accumulated using multi-operand tree adders like Wallace tree, Dadda tree,
overturned stairs tree and 4:2 compressor tree. In our design we have used the
Wallace tree structure, which comprises carry-save adders. Use of carry-save
adders greatly reduces the carry propagation time of this stage.

C. Final stage adder: The 48-bit sum and carry outputs obtained from the
partial product accumulator are added in the final stage adder to give the
product of the mantissas. This stage calls for the implementation of adders
with less delay and greater speed. Studying and comparing the power and delay
characteristics of various adders, we concluded that the Kogge-Stone adder is
the fastest.

[Figure: block diagram of the multiplier; inputs X[31], Y[31], X[30:23],
Y[30:23], X[22:0], Y[22:0]]
Fig.1. Multiplier module

D. Normalization and rounding: In this stage, the product of the mantissas is
normalised and truncated. To do so, the leading-one is detected and the
exponent is adjusted accordingly. The leading one is the implied bit and hence
dropped. The remaining bits are truncated to a 26-bit value. A few extra bits
from the truncated value are used for accuracy and extra precision namely the
guard, round and sticky bits [4]. The truncated value is finally rounded off
using the rounding to nearest even technique to give the 23 bit mantissa of the
product.

To avoid unnecessary calculations in the event of occurrence of zero in the
input, a zero detect block is included in the multiplier architecture.

III. IMPLEMENTATION

The main objective of this paper is to increase the multiplier speed by
minimizing the overall delay. As is obvious from our proposed architecture,
almost every module is built on the fundamental unit of an adder. So, we
surveyed the different fast adders available. Studying and comparing the power
and delay characteristics of the adders, we concluded that the Kogge-Stone
adder is the fastest and then proceeded to implement the same at every stage.

The various sub modules of the single precision floating point multiplier have
been individually designed in Verilog HDL, synthesized and simulated using the
Xilinx ISE 12.1 targeted on the Spartan 3E FPGA. The blocks have then been
integrated to form the complete architecture of the multiplier. The same
implementation has been compared with that on the target device of Virtex4
family.

IV. RESULTS

The following figure shows the functional simulation of the multiplier. It
takes two 32-bit floating point numbers and gives their resultant product of
32-bits.

[Figure: simulation waveform]
Fig.2. Simulation of Multiplier

The multiplier module was implemented on two families of Xilinx FPGA devices:
Spartan 3E and Virtex4. The design and timing information of the multiplier
sub-modules have been summarized in the following tables. Comparing the
information in these tables we find that the proposed multiplier has a lower
delay when implemented on the Virtex4 FPGA.

TABLE 1: SYNTHESIS REPORT OF MODIFIED BOOTH ENCODER

                 Spartan3E (xc3s500E)   Virtex4 (xc4vlx15)
No. of Slices    68/4656 (1%)           68/6144 (1%)
No. of LUTs      123/9312 (1%)          123/12288 (1%)
Delay            11.306 ns              7.244 ns

TABLE 2: SYNTHESIS REPORT OF KOGGE-STONE ADDER

                 Spartan3E (xc3s500E)   Virtex4 (xc4vlx15)
No. of Slices    204/4656 (4%)          205/6144 (3%)
No. of LUTs      357/9312 (3%)          358/12288 (2%)
Delay            18.946 ns              11.097 ns

TABLE 3: SYNTHESIS REPORT OF WALLACE TREE

                 Spartan3E (xc3s500E)   Virtex4 (xc4vlx15)
No. of Slices    515/4656 (11%)         538/6144 (8%)
No. of LUTs      895/9312 (10%)         935/12288 (7%)
Delay            13.708 ns              8.825 ns
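
As a rough illustration of the carry-save accumulation characterised in Table 3,
the following Python sketch reduces a set of shifted partial products to one sum
word and one carry word, leaving a single carry-propagating addition for the
final stage. It is only a behavioural model under stated assumptions: the
function names carry_save_add and wallace_reduce are hypothetical, the grouping
of operands into levels is one simple choice among many, and the usage example
generates plain shift-and-AND partial products rather than the radix-4 Booth
encoding used in the design.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word."""
    partial_sum = a ^ b ^ c                      # per-bit sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit carries, weighted x2
    return partial_sum, carry


def wallace_reduce(operands):
    """Reduce non-negative integers to two words using levels of 3:2 compressors.

    Assumption: operands are grouped three at a time in list order; a real
    Wallace tree groups bits column by column, but the arithmetic is the same.
    """
    words = list(operands)
    while len(words) > 2:
        next_level = []
        # Every complete group of three is compressed to two words.
        for i in range(0, len(words) - 2, 3):
            next_level.extend(
                carry_save_add(words[i], words[i + 1], words[i + 2])
            )
        # Zero, one or two leftover words pass through unchanged.
        leftover = len(words) % 3
        next_level.extend(words[len(words) - leftover:])
        words = next_level
    while len(words) < 2:
        words.append(0)
    return words[0], words[1]


# Usage: accumulate the partial products of two 24-bit mantissas.
x, y = 0xC00000, 0xA00000
partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(24)]
sum_word, carry_word = wallace_reduce(partial_products)
final_product = sum_word + carry_word  # the only carry-propagating addition
```

No carry ripples between bit positions inside carry_save_add, which is why the
depth of the reduction grows with the number of levels rather than with the
word width.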
TABLE 4: SYNTHESIS REPORT OF MANTISSA MULTIPLIER

                 Spartan3E (xc3s500E)   Virtex4 (xc4vlx15)
No. of Slices    1307/4656 (28%)        1306/6144 (21%)
No. of LUTs      2332/9312 (25%)        2329/12288 (18%)
Delay            28.600 ns              16.316 ns

TABLE 5: SYNTHESIS REPORT OF FINAL MULTIPLIER

                 Spartan3E (xc3s500E)   Virtex4 (xc4vlx15)
No. of Slices    1269/4656 (27%)        1269/6144 (20%)
No. of LUTs      2270/9312 (24%)        2270/12288 (18%)
Delay            34.333 ns              18.783 ns

V. CONCLUSION

We have designed an architecture for a fast floating point multiplier based on
the IEEE-754 single precision format. The modules are written in Verilog HDL to
optimize implementation on any FPGA. The design is done in such a way that the
floating point unit can be effectively interfaced with any 32-bit processor.
The main idea is to increase the speed of the multiplier by reducing delay at
every stage using the optimal adder design. We plan to extend this work to
design a fast floating point arithmetic logic unit.

VI. REFERENCES

[1] IEEE Standards Board, IEEE Standard for Floating-Point Arithmetic, 2008.
[2] Paschalakis, S., Lee, P., "Double Precision Floating-Point Arithmetic on
    FPGAs", In Proc. 2003 2nd IEEE International Conference on Field
    Programmable Technology (FPT '03), Tokyo, Japan, Dec. 15-17, pp. 352-358,
    2003.
[3] Hamacher, Carl, Vranesic, Zvonko, Zaky, Safwat, "Computer Organization",
    Fifth Edition, pp. 367-390.
[4] Hamid, L.S.A., Shehata, K., El-Ghitani, H., ElSaid, M., "Design of Generic
    Floating Point Multiplier and Adder/Subtractor Units", 12th International
    Conference on Computer Modelling and Simulation, 2010, pp. 615-618.
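
The data path of Section II (sign XOR, exponent addition with bias subtraction,
mantissa multiplication, normalization and round to nearest even) can be
sketched as a small behavioural model in Python. This is a minimal sketch, not
the Verilog design: all names (fp32_multiply, float_to_bits, bits_to_float) are
hypothetical, only zero and normalised operands are handled (no subnormals,
infinities or NaNs), the mantissa product uses Python integer multiplication in
place of the Booth encoder, Wallace tree and Kogge-Stone adder, and the round
bit is folded into the sticky bit, which is sufficient for round to nearest
even.

```python
import struct

BIAS = 127  # single precision exponent bias


def float_to_bits(value):
    """Return the 32-bit IEEE 754 single precision pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bits_to_float(bits):
    """Return the Python float encoded by a 32-bit single precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp32_multiply(x_bits, y_bits):
    """Multiply two single precision patterns; returns (bits, overflow, underflow).

    Assumption: operands are zero or normalised numbers.
    """
    # Part 1: sign of the product is the XOR of the operand signs.
    sign = ((x_bits >> 31) ^ (y_bits >> 31)) & 1

    # Zero detect: skip the data path when either operand is zero.
    if (x_bits & 0x7FFFFFFF) == 0 or (y_bits & 0x7FFFFFFF) == 0:
        return sign << 31, False, False

    # Part 2: add the biased exponents and subtract one bias.
    exponent = ((x_bits >> 23) & 0xFF) + ((y_bits >> 23) & 0xFF) - BIAS

    # Part 3: multiply the 24-bit mantissas (implied leading one restored).
    man_x = (x_bits & 0x7FFFFF) | 0x800000
    man_y = (y_bits & 0x7FFFFF) | 0x800000
    product = man_x * man_y  # 48-bit result, leading one at bit 47 or bit 46

    # Normalization: locate the leading one and adjust the exponent.
    if product >> 47:
        shift = 24
        exponent += 1
    else:
        shift = 23
    kept = product >> shift                       # implied one + 23 fraction bits
    guard = (product >> (shift - 1)) & 1          # first discarded bit
    sticky = (product & ((1 << (shift - 1)) - 1)) != 0  # OR of the remaining bits

    # Round to nearest even: round up above half, or at half when LSB is odd.
    if guard and (sticky or (kept & 1)):
        kept += 1
        if kept >> 24:        # rounding carried out of the mantissa
            kept >>= 1
            exponent += 1

    overflow = exponent >= 255
    underflow = exponent <= 0
    bits = (sign << 31) | ((exponent & 0xFF) << 23) | (kept & 0x7FFFFF)
    return bits, overflow, underflow


# Usage: 1.5 * 2.5 in single precision.
result_bits, overflow_flag, underflow_flag = fp32_multiply(
    float_to_bits(1.5), float_to_bits(2.5)
)
result = bits_to_float(result_bits)
```

When either flag is set the returned bit pattern is not meaningful; how the
hardware encodes the result in those cases is not specified in the text, so the
model only reports the flags.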
