FPGA Design of A Fast 32-Bit Floating Point
Multiplier Unit
Anna Jain, Baisakhy Dash, Ajit Kumar Panda, Member, IEEE, Muchharla Suresh, Member, IEEE
Abstract- An architecture for a fast 32-bit floating point multiplier compliant with the single precision IEEE 754-2008 standard has been proposed in this paper. This design intends to make the multiplier faster by reducing the delay caused by the propagation of the carry, by implementing adders having the least power-delay constant. The implementation of the multiplier module has been done in a top-down approach. The sub-modules have been written in Verilog HDL and then synthesized and simulated using Xilinx ISE 12.1, targeted on the Spartan 3E FPGA.

I. INTRODUCTION

With the advent of technology, the demand for high-speed digital systems is on the rise, and the multiplier is a ubiquitous unit in almost every digital system. Compared to other operations in an arithmetic logic unit, the multiplier consumes more time and power. Hence researchers have always been trying to design multipliers which offer an optimal combination of speed, power and area.

In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Floating point units are widely used in a wide range of engineering and technology applications. This calls for the development of faster floating point arithmetic circuits.

In this paper we propose an architecture for a fast floating point multiplier compliant with the single precision IEEE 754-2008 standard. The major issue in the implementation of a high speed multiplier circuit is the delay due to the propagation of carry in every component used in its design. In the proposed architecture we try to minimize the carry propagation at every level possible.

Modern Field Programmable Gate Arrays (FPGAs) are a suitable solution that provides thousands of logic elements and dedicated blocks as well as several desired properties such as intrinsic parallelism, flexibility, low cost and customizable approaches. All this allows for better performance and accelerated execution of the involved algorithms. FPGAs are quickly becoming suitable for major floating point computations.

To attain a generic design, the Verilog hardware description language was used for design entry of the entire multiplier unit, as it presents a tremendous productivity improvement for circuit designers, and descriptions of large circuits can be written in a relatively compact and concise form.

Over the years, several different floating-point representations have been used in computers; however, for the last ten years the most commonly encountered representation is that defined by the IEEE Standard for Floating-Point Arithmetic (IEEE 754) [1]. It is a technical standard established by the Institute of Electrical and Electronics Engineers (IEEE) and the most widely used standard for floating-point computation, followed by many hardware and software implementations. Single precision representation occupies 32 bits: a sign bit, 8 bits for the exponent and 23 for the mantissa. The standard also specifies arithmetic operations and rounding algorithms.

The rest of the paper is organized as follows. Section 2 presents the proposed floating point multiplier design and explains the architectural details. Section 3 lists the progress and proposals to achieve the objective of the paper. The implementation is described in section 4, and with section 5 we conclude the paper.

II. FLOATING POINT MULTIPLIER DESIGN

The floating point multiplication is carried out in three parts [2]:

In the first part, we determine the sign of the product by performing an XOR operation on the sign bits of the two operands.

In the second part, the exponent bits of the operands are passed to an adder stage and a bias of 127 is subtracted from the obtained output. The addition and bias subtraction operations are both implemented using 8-bit Kogge-Stone adders. Overflow and underflow conditions are indicated by setting the respective flags.

In the third and most important part, we find the product of the mantissa bits. The multiplication of mantissa bits is performed in the following stages.

A. Partial product generator: There are various ways of generating partial products for a given multiplier [3]. The ones that we have considered are Booth encoding and radix-4 Booth encoding. The radix-4 Booth encoding was found to be faster, so it has been implemented in the final multiplier architecture. The output of this stage is twelve partial products.

B. Partial product accumulator: The 24-bit partial products obtained from the previous stage are shifted appropriately in a shifter module and then accumulated using a multi-operand tree adder such as a Wallace tree, Dadda tree, overturned-stairs tree or 4:2 compressor tree. In our design we have used the Wallace tree structure, which consists of carry-save adders. Use of carry-save adders greatly reduces the carry propagation time of this stage.

C. Final stage adder: The 48-bit sum and carry outputs obtained from the partial product accumulator are added in the final stage adder to give the product of the mantissas. This stage calls for the implementation of adders with less delay and greater speed. Studying and comparing the power and delay characteristics of various adders, we concluded that the Kogge-Stone adder is the fastest.

The same implementation has been compared with that on the target device of the Virtex-4 family.
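
The three parts of the multiplication described in Section II can be illustrated with a small bit-level software model. The following Python sketch is not the authors' Verilog design; it is a minimal reference model under several assumptions, all of which are ours: the function names (kogge_stone_add, fp32_multiply, float_to_bits, bits_to_float) are hypothetical, the exponent adders are widened to 10 bits so that the overflow and underflow flags can be read off directly (the paper uses 8-bit adders), a plain integer multiplication stands in for the Booth, Wallace tree and final adder stages, the result is truncated rather than rounded, and only normalised operands are handled.

```python
import struct

BIAS = 127  # single precision exponent bias


def kogge_stone_add(a, b, width, cin=0):
    """Parallel-prefix (Kogge-Stone) addition; returns (sum, carry_out)."""
    mask = (1 << width) - 1
    g = a & b & mask            # per-bit generate
    p = (a ^ b) & mask          # per-bit propagate
    p0 = p                      # kept for the final sum
    g |= (p & 1) & cin          # fold the carry-in into bit 0
    d = 1
    while d < width:            # log2(width) prefix levels
        g = (g | (p & (g << d))) & mask
        p = (p & (p << d)) & mask
        d <<= 1
    carries = ((g << 1) | cin) & mask   # carry into each bit position
    return (p0 ^ carries) & mask, (g >> (width - 1)) & 1


def float_to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(w):
    return struct.unpack(">f", struct.pack(">I", w))[0]


def fp32_multiply(a_bits, b_bits):
    """Multiply two normalised words; returns (word, overflow, underflow)."""
    # Part 1: sign of the product is the XOR of the operand signs.
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1

    # Part 2: add the exponents, then subtract the bias (two's complement).
    width = 10  # assumption: widened so the flags are easy to derive
    ea = (a_bits >> 23) & 0xFF
    eb = (b_bits >> 23) & 0xFF
    e_sum, _ = kogge_stone_add(ea, eb, width)
    exp, _ = kogge_stone_add(e_sum, ~BIAS & ((1 << width) - 1), width, cin=1)

    # Part 3: 24-bit x 24-bit mantissa product (hidden ones restored).
    # Stand-in for the Booth / Wallace tree / final adder stages.
    ma = (a_bits & 0x7FFFFF) | 0x800000
    mb = (b_bits & 0x7FFFFF) | 0x800000
    prod = ma * mb                      # 48-bit result
    if prod >> 47:                      # product in [2, 4): normalise
        frac = (prod >> 24) & 0x7FFFFF  # truncation, no rounding
        exp, _ = kogge_stone_add(exp, 1, width)
    else:                               # product in [1, 2)
        frac = (prod >> 23) & 0x7FFFFF

    negative = (exp >> (width - 1)) & 1
    underflow = bool(negative) or exp == 0
    overflow = (not negative) and exp >= 255
    word = (sign << 31) | ((exp & 0xFF) << 23) | frac
    return word, overflow, underflow
```

For example, fp32_multiply(float_to_bits(1.5), float_to_bits(2.0)) returns the word for 3.0 with both flags clear. When either flag is set, the returned word is not meaningful and a complete design would substitute the appropriate special value.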