
ISSN 2321-8665
WWW.IJITECH.ORG
Vol. 03, Issue 11, December-2015, Pages: 2107-2112

An Efficient Implementation of Floating Point Multiplier using Verilog

G. SRUTHI 1, M. RAJENDRA PRASAD 2
1 PG Scholar, Dept of ECE, VidyaJyothi Institute of Technology, JNTU Hyderabad, Telangana, India, Email: [email protected]
2 Assoc Prof, Dept of ECE, VidyaJyothi Institute of Technology, JNTU Hyderabad, Telangana, India, Email: [email protected]

Abstract: To represent very large or small values, a large range is required, as the integer representation is no longer appropriate. Such values can be represented using the IEEE-754 standard based floating point representation. Floating point multiplication is one of the most widely used operations in DSP/math processors, robots, air traffic controllers and digital computers. Because of its vast area of application, the main emphasis is on implementing it effectively, with low combinational delay and high speed. This project implements a high speed floating point arithmetic unit which can perform multiplication on 32-bit operands. Multiplication is one of the common arithmetic operations, and this floating point multiplier handles the various conditions that arise: overflow, underflow, normalization and rounding. In this project we use the IEEE rounding method to round the resulting number. The project reviews the implementation of an IEEE 754 single precision floating point multiplier; the implementation handles the overflow and underflow cases. The pre-normalization and post-normalization units are also discussed, along with exception handling. All the functions are built from feasible, efficient algorithms with several changes incorporated that can improve overall latency and, if pipelined, throughput. The algorithms are modeled in Verilog HDL; the RTL code for the multiplier is synthesized using Xilinx and the multiplier is simulated using ModelSim.

Keywords: Normalization Unit, Higher Throughput, Verilog HDL, Adder, Multiplier.

I. INTRODUCTION
The demand for floating point arithmetic operations in most business, financial and web based applications is increasing day by day, so it becomes essential to find a way to feed binary numbers directly as input to these applications. This saves time and is much easier. At present this is not possible because, in the adder/subtractor, inputs have to be given in IEEE 754 format [1]. The binary inputs cannot be given as such; they must first be converted into sign, exponent and mantissa form, which will be described in detail later. Hence, in this project we have implemented a binary to floating point converter for single precision numbers which solves this issue to an extent. The converter is based on the IEEE single precision format, which is 32 bits wide. The various modules are written in the Verilog Hardware Description Language [6] and simulated with the help of Xilinx tools; they are then synthesized using the Xilinx Integrated Software Environment (ISE) design suite. The work has been carried out at the Centre for Development of Advanced Computing (C-DAC), where a 32-bit floating point adder/subtractor module [3] conforming to the IEEE-754 format is already implemented and is presently in use for several specific applications. The proposed converter can be added to the existing adder/subtractor to obtain the full functionality of the design [9]. The basic distinction between fixed-point and floating-point digital signal processors (DSPs) is their numeric representation of data. Whereas fixed-point hardware performs strictly integer arithmetic, floating-point DSPs support integer or real arithmetic, the latter normalized in the form of scientific notation. A 32-bit binary floating-point DSP [5], supporting industry-standard single precision operations, provides greater accuracy and precision than fixed-point devices thanks to its wider word width and exact internal representation of data. Fixed-point devices had to implement real arithmetic indirectly through software routines that add instruction overhead [2] and development time, whereas with the floating-point format real arithmetic can be coded directly into hardware operations. So, this work emphasizes utilizing the capabilities of the floating-point format. The binary input can vary from 0 to 256 bits, the maximum input range that can be provided while still satisfying the exponent field of the 32-bit IEEE 754 single precision format.

II. BINARY TO FLOATING POINT CONVERSION
Converting a base-ten real number into the IEEE 754 binary32 format [4] follows this outline:
• Consider a real number with an integer part and a fraction part, such as 12.375.
• Convert and normalize the integer part into binary.
• Convert the fraction part using the method shown below.
• Add the two results and adjust them to produce the proper final conversion.

Conversion of the fractional part is done as shown below. Consider 0.375, the fractional part of 12.375. To convert it into a binary fraction, multiply the fraction by 2, take the integer part, and re-multiply the new fraction by 2 until a fraction of zero is found or until the precision limit is reached, which is 23 fraction digits for the IEEE 754 binary32 format.

0.375 x 2 = 0.750 = 0 + 0.750 => b-1 = 0; the integer part is the next binary fraction digit. Next, re-multiply 0.750 by 2 to proceed.
0.750 x 2 = 1.500 = 1 + 0.500 => b-2 = 1
0.500 x 2 = 1.000 = 1 + 0.000 => b-3 = 1; fraction = 0.000, terminate.

We see that (0.375)10 can be represented exactly in binary as (0.011)2. Not all decimal fractions can be represented by a finite-digit binary fraction; for example, decimal 0.1 cannot be represented exactly in binary and is therefore only approximated. Therefore:

(12.375)10 = (12)10 + (0.375)10 = (1100)2 + (0.011)2 = (1100.011)2

• In the IEEE 754 binary32 format, real values must be represented in normalized form; hence 1100.011 becomes 1.100011 x 2^3.
• The exponent is 3 (in biased form: 127 + 3 = 130 = (1000 0010)2).
• The fraction is 100011 (the bits to the right of the binary point).

The resulting 32-bit IEEE 754 binary32 representation of 12.375 is 0 10000010 10001100000000000000000 = 41460000H.
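The conversion outline above can be checked in simulation. The sketch below assumes a SystemVerilog-capable simulator; the testbench name is ours, while $shortrealtobits is the standard system function returning the raw binary32 encoding. It replays the repeated multiply-by-two fraction conversion and prints the encoding of 12.375 for comparison with 41460000H.

    // Sketch of the multiply-by-two fraction conversion described above,
    // plus a cross-check against the simulator's own binary32 encoding.
    module tb_b2f_check;
      real       f;     // running fraction for the multiply-by-two loop
      shortreal  v;     // full value, for the simulator's own encoding
      reg [22:0] frac;
      integer    i;
      initial begin
        f    = 0.375;   // fractional part of 12.375
        frac = 0;
        for (i = 22; i >= 0; i = i - 1) begin
          f = f * 2.0;  // 0.375 -> 0.750 -> 1.500 -> 1.000 -> done
          if (f >= 1.0) begin
            frac[i] = 1'b1;  // integer part is the next fraction digit
            f = f - 1.0;
          end
        end
        $display("fraction bits b-1..b-23 = %b", frac);  // expect 011000...0
        v = 12.375;
        $display("12.375 encodes as %h (expected 41460000)",
                 $shortrealtobits(v));
      end
    endmodule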

III. SINGLE PRECISION FLOATING POINT MULTIPLIER
Floating point numbers are one possible way of representing real numbers in binary format; the IEEE 754 standard [1] presents two different floating point formats, the binary interchange format and the decimal interchange format. Multiplying floating point numbers is a critical requirement for DSP applications involving a large dynamic range. This paper focuses only on the single precision normalized binary interchange format. Fig. 1 shows the IEEE 754 single precision binary format representation; it consists of a one-bit sign (S), an eight-bit exponent (E), and a twenty-three-bit fraction (M, or mantissa). An extra bit is added to the fraction to form what is called the significand. If the exponent is greater than 0 and smaller than 255, and there is a 1 in the MSB of the significand, then the number is said to be a normalized number; in this case the real number is represented by (1):

Figure 1. IEEE single precision floating point format

Z = (-1)^S * 2^(E - Bias) * (1.M)    (1)

where M = m22*2^-1 + m21*2^-2 + m20*2^-3 + ... + m1*2^-22 + m0*2^-23 and Bias = 127.

Multiplying two numbers in floating point format is done by (1) adding the exponents of the two numbers and then subtracting the bias from their sum, (2) multiplying the significands of the two numbers, and (3) calculating the sign by XORing the signs of the two numbers. In order to represent the multiplication result as a normalized number there should be a 1 in the MSB of the result (the leading one). Floating-point implementation on FPGAs has been the interest of many researchers; in [2], an IEEE 754 single precision pipelined floating point multiplier was implemented on multiple FPGAs (4 Actel A1280).

A. Floating Point Multiplication Algorithm
As stated in the introduction, normalized floating point numbers have the form Z = (-1)^S * 2^(E - Bias) * (1.M). To multiply two floating point numbers, the following is done (a behavioral sketch of these steps appears after the list):
1. Multiplying the significands, i.e. (1.M1 * 1.M2)
2. Placing the decimal point in the result
3. Adding the exponents, i.e. (E1 + E2 - Bias)
4. Obtaining the sign, i.e. S1 xor S2
5. Normalizing the result, i.e. obtaining 1 at the MSB of the result's significand
6. Rounding the result to fit in the available bits
7. Checking for underflow/overflow occurrence
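These seven steps condense into the behavioral Verilog sketch below. It is an illustration rather than the paper's pipelined RTL: the module and signal names are ours, truncation rounding is assumed, and overflow/underflow flagging and special values are left to the later sections.

    // Behavioral sketch of binary32 multiplication (truncation rounding).
    // Names are illustrative; denormals, NaN/Infinity and exponent
    // overflow/underflow are not handled here (see Section V).
    module fp_mul_behav (
      input  [31:0] a, b,
      output [31:0] p
    );
      wire        sa = a[31], sb = b[31];
      wire [7:0]  ea = a[30:23], eb = b[30:23];
      wire [23:0] ma = {1'b1, a[22:0]};   // significand with hidden '1'
      wire [23:0] mb = {1'b1, b[22:0]};
      wire        sp = sa ^ sb;           // step 4: sign
      wire [47:0] ip = ma * mb;           // steps 1-2: intermediate product
      // Step 3: add exponents and subtract the bias once.
      wire [9:0]  esum = {2'b0, ea} + {2'b0, eb} - 10'd127;
      // Step 5: leading one is at bit 47 or 46; if 47, shift right and
      // increment the exponent.  Step 6: truncate to 23 fraction bits.
      wire        norm = ip[47];
      wire [22:0] frac = norm ? ip[46:24] : ip[45:23];
      wire [9:0]  ep   = esum + norm;
      assign p = {sp, ep[7:0], frac};
    endmodule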

Consider a floating point representation similar to the IEEE 754 single precision floating point format, but with a reduced number of mantissa bits (only 4) while still retaining the hidden '1' bit for normalized numbers:

A = 0 10000100 0100 = 40, B = 1 10000001 1110 = -7.5

To multiply A and B:

1. Multiply the significands:

         1.0100
       x 1.1110
      _________
          00000
         10100
        10100
       10100
      10100
     __________
     1001011000

2. Place the decimal point: 10.01011000

3. Add the exponents:

      10000100
    + 10000001
    __________
     100000101

The exponent of each input is already shifted/biased by the bias value (127) and is not the true exponent; i.e. EA = EA-true + bias and EB = EB-true + bias, so that

EA + EB = EA-true + EB-true + 2*bias

We should therefore subtract the bias from the resulting exponent, otherwise the bias would be counted twice:

     100000101
    - 01111111
    __________
      10000110

4. Obtain the sign bit and put the result together: 1 10000110 10.01011000

5. Normalize the result so that there is a 1 just before the radix point (decimal point). Moving the radix point one place to the left increments the exponent by 1; moving one place to the right decrements the exponent by 1.

1 10000110 10.01011000 (before normalizing)
1 10000111 1.001011000 (normalized)

The result is (without the hidden bit): 1 10000111 00101100

6. The mantissa is wider than 4 bits (the available mantissa bits), so rounding is needed. If we apply the truncation rounding mode, the stored value is: 1 10000111 0010.
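The same example can be replayed on the behavioral sketch from Section III. In binary32 terms, A = 40 corresponds to 42200000H and B = -7.5 to C0F00000H, and the expected product -300 is C3960000H. The testbench name below is ours:

    // Hypothetical testbench replaying the worked example on fp_mul_behav.
    module tb_fp_mul_example;
      reg  [31:0] a = 32'h42200000;  // 40.0
      reg  [31:0] b = 32'hC0F00000;  // -7.5
      wire [31:0] p;
      fp_mul_behav dut (.a(a), .b(b), .p(p));
      initial begin
        #1 $display("40.0 * -7.5 -> %h (expected c3960000)", p);
      end
    endmodule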
Figure 2. Floating point multiplier block diagram

In this paper we present a floating point multiplier in which normalization is implemented. Figure 2 shows the multiplier structure: the exponent addition, the significand multiplication, and the sign calculation of the result are independent and are done in parallel. The significand multiplication is done on two 8-bit numbers and results in a 16-bit product, which we will call the intermediate product (IP). The IP is represented as (15 down to 0) and the decimal point is located between bits 15 and 14 of the IP. The following sections detail each block of the floating point multiplier.

IV. DADDA MULTIPLIER (MANTISSA MULTIPLICATION)
Dadda proposed a sequence of matrix heights that are predetermined to give the minimum number of reduction stages. To reduce the N by N partial product matrix, the Dadda multiplier develops a sequence of matrix heights that are found by working back from the final two-row matrix. In order to realize the minimum number of reduction stages, the height of each intermediate matrix is limited to the largest integer that is no more than 1.5 times the height of its successor. The process of reduction for a Dadda multiplier [4] is developed using the following recursive algorithm:
1. Let d1 = 2 and dj+1 = floor(1.5 * dj), where dj is the matrix height for the jth stage from the end. Find the smallest j such that at least one column of the original partial product matrix has more than dj bits.
2. In the jth stage from the end, employ (3,2) and (2,2) counters to obtain a reduced matrix with no more than dj bits in any column.
3. Let j = j - 1 and repeat step 2 until a matrix with only two rows is generated.

This method of reduction, because it attempts to compress each column, is called a column compression technique. Another advantage of Dadda multipliers is that they use the minimum number of (3,2) counters. For Dadda multipliers there are N^2 bits in the original partial product matrix and 4N - 3 bits in the final two-row matrix. Since each (3,2) counter takes three inputs and produces two outputs, the number of bits in the matrix is reduced by one with each applied (3,2) counter; therefore the total number of (3,2) counters is #(3,2) = N^2 - 4N + 3, and the length of the carry propagate adder is CPA length = 2N - 2. The 8 by 8 multiplier takes 4 reduction stages, with matrix heights 6, 4, 3 and 2; the reduction uses 35 (3,2) counters, 7 (2,2) counters, and a 14-bit carry propagate adder. The total delay for the generation of the final product is the sum of one AND gate delay, one (3,2) counter delay for each of the four reduction stages, and the delay through the final 14-bit carry propagate adder. The decimal point is between bits 45 and 46 in the significand of the intermediate result (IR). The critical path determines the time taken by the Dadda multiplier: it starts at the AND gate of the first partial products, passes through the full adder of each stage, and then through the vector merging adder. There are fewer stages in this multiplier than in the carry save multiplier, and it is therefore faster.

A. DADDA Algorithm
The performance of the mantissa calculation unit determines the overall performance of the floating point multiplier. Figure 3 illustrates the Dadda algorithm with a dot diagram. The algorithm proceeds as follows (a small sketch computing the stage heights and counter counts appears after the list):
1. Multiply (logical AND) each bit of one input by each bit of the other, yielding the partial product matrix.
2. Reduce the partial product matrix to two vectors using a series of full and half adders.
3. This reduction is carried out with the help of the height of each level.
4. The height of the preceding level is the height of the succeeding level multiplied by 1.5 (rounded down).
5. Continue while the height is less than the number of bits in the operands being multiplied.
6. Turn the two vectors into two numbers and add them with a conventional multi-bit adder.
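As a cross-check on the figures quoted above, the following self-checking sketch (the module name dadda_params is ours, not the paper's) computes the Dadda height sequence dj and the counter counts for an N x N multiplier; for N = 8 it reports 4 reduction stages, 35 (3,2) counters and a 14-bit carry propagate adder, matching the text.

    // Dadda stage heights and counter counts for an N x N multiplier.
    // For N = 8 it prints d_j = 2, 3, 4, 6 (the reduction passes through
    // heights 6 -> 4 -> 3 -> 2), #(3,2) = N^2 - 4N + 3 = 35 and
    // CPA length = 2N - 2 = 14.
    module dadda_params;
      parameter N = 8;
      integer d, stages;
      initial begin
        d = 2;
        stages = 0;
        // d_{j+1} = floor(1.5 * d_j); stop once d_j reaches N.
        while (d < N) begin
          $display("d_%0d = %0d", stages + 1, d);
          d = (3 * d) / 2;   // integer floor of 1.5 * d
          stages = stages + 1;
        end
        $display("reduction stages = %0d", stages);
        $display("(3,2) counters   = %0d", N*N - 4*N + 3);
        $display("CPA length       = %0d", 2*N - 2);
      end
    endmodule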

Fig. 3. Implementation of the dot diagram for the Dadda multiplier.

B. Normalizer
The result of the significand multiplication (the intermediate product) must be normalized to have a leading '1' just to the left of the decimal point (i.e., in bit 46 of the intermediate product). Since the inputs are normalized numbers, the intermediate product has its leading one at bit 46 or bit 47:
1. If the leading one is at bit 46 (i.e. to the left of the decimal point) then the intermediate product is already a normalized number and no shift is needed.
2. If the leading one is at bit 47 then the intermediate product is shifted to the right and the exponent is incremented by 1.

The shift operation is done using combinational shift logic made of multiplexers. Fig. 4 shows the simplified logic of a normalizer that has an 8-bit intermediate product input and a 6-bit intermediate exponent input.
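The two cases reduce to a single multiplexer on the product and an increment on the exponent. Below is a minimal combinational sketch at the simplified 8-bit-product / 6-bit-exponent width used for the figure (module and port names are ours):

    // Simplified normalizer: 8-bit intermediate product, 6-bit exponent.
    // Bit 7 here stands in for bit 47 of the full 48-bit product: if the
    // leading one sits there, shift right once and add 1 to the exponent.
    module normalizer_sketch (
      input  [7:0] ip,       // intermediate product, leading one at bit 7 or 6
      input  [5:0] e_in,     // intermediate exponent
      output [7:0] ip_norm,
      output [5:0] e_out
    );
      assign ip_norm = ip[7] ? (ip >> 1) : ip;
      assign e_out   = ip[7] ? (e_in + 6'd1) : e_in;
    endmodule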
Fig. 4. Flow graph of the floating point multiplier.

V. UNDERFLOW/OVERFLOW DETECTION
Overflow or underflow means that the result's exponent is too large or too small to be represented in the exponent field. For this single precision multiplier the exponent field is 8 bits wide and must lie between 1 and 254, otherwise the value is not a normalized one. An overflow may occur while adding the two exponents or during normalization. An overflow due to the exponent addition may be compensated during the subtraction of the bias, resulting in a normal output value (normal operation). An underflow may occur while subtracting the bias to form the intermediate exponent: if the intermediate exponent < 0 then it is an underflow that can never be compensated; if the intermediate exponent = 0 then it is an underflow that may be compensated during normalization by adding 1 to it. When an overflow occurs, an overflow flag signal goes high and the result turns to ±Infinity (with the sign determined from the signs of the floating point multiplier inputs). When an underflow occurs, an underflow flag signal goes high and the result turns to ±Zero (again with the sign determined from the inputs). Denormalized inputs are flushed to zero with the appropriate sign calculated from the inputs, and an underflow flag is raised. Assume that E1 and E2 are the (biased) exponents of the two numbers A and B respectively; the result's exponent is calculated by the formula below.

Eresult = E1 + E2 - 127

E1 and E2 can take values from 1 to 254, so Eresult can take values from -125 (2 - 127) to 381 (508 - 127); but for normalized numbers, Eresult can only take values from 1 to 254.
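A sketch of this exponent and flag logic is given below (module and port names are ours; the intermediate exponent is computed 11 bits wide and signed so that values below 1 and above 254 stay visible):

    // Overflow/underflow detection on the biased result exponent
    // (binary32, bias 127).  Names are illustrative.
    module exp_flags (
      input  [7:0]         e1, e2,      // biased exponents of A and B
      input                norm_shift,  // 1 when the normalizer shifted right
      output signed [10:0] e_res,       // E1 + E2 - 127 (+1 after normalizing)
      output               overflow,
      output               underflow
    );
      assign e_res     = e1 + e2 - 127 + norm_shift;
      assign underflow = (e_res <= 0);  // < 0: never compensable; = 0: maybe
      assign overflow  = (e_res > 254); // result is forced to +/-Infinity
    endmodule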

VI. PIPELINING THE MULTIPLIER
In order to enhance the performance of the multiplier, three pipelining stages are used to divide the critical path, thus increasing the maximum operating frequency of the multiplier. The pipelining registers are embedded at the following locations (a register-staging sketch appears at the end of this section):
1. In the middle of the significand multiplier, and in the middle of the exponent adder (before the bias subtraction).
2. After the significand multiplier, and after the exponent adder.
3. At the floating point multiplier outputs (sign, exponent and mantissa bits).

Fig. 5. Pipelining stages shown as dotted lines.

Three pipelining stages mean that the output has a latency of three clock cycles. The synthesis tool's "retiming" option was used so that the synthesizer applies its optimization logic to place the pipelining registers across the critical path to best effect.
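The staging can be pictured with the following behavioral sketch (module and register names are ours; the real design cuts the multiplier array and exponent adder part-way through, and the synthesizer's retiming pass refines the placement):

    // Three-stage pipelining sketch: registers at the three cut points
    // named above.  Datapath detail is elided; only staging is shown.
    module fp_mul_pipe_sketch (
      input             clk,
      input      [31:0] a, b,
      output reg [31:0] p
    );
      reg [47:0] mul_mid, mul_done;  // significand product staging
      reg [9:0]  exp_mid, exp_done;  // exponent staging
      reg        sgn_1, sgn_2;
      always @(posedge clk) begin
        // Stage 1: nominally mid-multiplier / mid-exponent-adder; modeled
        // here by registering the full products before bias subtraction.
        mul_mid  <= {1'b1, a[22:0]} * {1'b1, b[22:0]};
        exp_mid  <= a[30:23] + b[30:23];
        sgn_1    <= a[31] ^ b[31];
        // Stage 2: after the multiplier and after the bias subtraction.
        mul_done <= mul_mid;
        exp_done <= exp_mid - 10'd127;
        sgn_2    <= sgn_1;
        // Stage 3: registered outputs (normalization folded in; flags and
        // rounding omitted for brevity).
        p <= {sgn_2, exp_done[7:0] + mul_done[47],
              mul_done[47] ? mul_done[46:24] : mul_done[45:23]};
      end
    endmodule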

VII. SIMULATION RESULTS
This section presents the simulation results obtained using the Xilinx simulator, together with the synthesis results.

A. Simulation Results

Fig. 6. Simulation result for the single precision floating point multiplier.

B. Synthesis Result
1. Technological View

Fig. 7. Technological view of the multiplier.

2. RTL Diagram

Fig. 8. Single precision floating point RTL schematic.

VIII. CONCLUSION
This paper describes an implementation of a floating point multiplier using a Dadda multiplier that supports the IEEE 754-2008 binary interchange format. To improve speed, the multiplication of the mantissas is done using a Dadda multiplier in place of a carry save multiplier. The design achieves high speed, with a maximum frequency of 526 MHz, compared to existing floating point multipliers.

IX. REFERENCES
[1] Mohamed Al-Ashrafy, Ashraf Salem and Wagdy Anis, "An Efficient Implementation of Floating Point Multiplier," IEEE Transactions on VLSI, 978-1-4577-0069-9/11, IEEE/Mentor Graphics, 2011.
[2] B. Fagin and C. Renard, "Field Programmable Gate Arrays and Floating Point Arithmetic," IEEE Transactions on VLSI, vol. 2, no. 3, pp. 365-367, 1994.
[3] N. Shirazi, A. Walters, and P. Athanas, "Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines," Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM'95), pp. 155-162, 1995.
[4] L. Louca, T. A. Cook, and W. H. Johnson, "Implementation of IEEE Single Precision Floating Point Addition and Multiplication on FPGAs," Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM'96), pp. 107-116, 1996.
[5] Jaenicke and W. Luk, "Parameterized Floating-Point Arithmetic on FPGAs," Proc. of IEEE ICASSP, vol. 2, pp. 897-900, 2001.
[6] Whitney J. Townsend and Earl E. Swartz, "A Comparison of Dadda and Wallace Multiplier Delays," Computer Engineering Research Center, The University of Texas.
[7] B. Lee and N. Burgess, "Parameterisable Floating-Point Operations on FPGA," Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems, and Computers, 2002.
[8] Xilinx, "Synthesis and Simulation Design Guide," UG626 (v13.4), January 19, 2012.
[9] "DesignChecker User Guide," HDL Designer Series 2010.2a, Mentor Graphics, 2010.
[10] "Precision Synthesis User's Manual," Precision RTL Plus 2010a Update 2, Mentor Graphics, 2010.
[11] D. Patterson and J. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 2005.
[12] John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, Third Edition, 1996.
[13] L. Song and K. K. Parhi, "Efficient Finite Field Serial/Parallel Multiplication," Proc. of International Conf. on Application Specific Systems, Architectures and Processors, pp. 72-82, Chicago, USA, 1996.
[14] P. E. Madrid, B. Millar, and E. E. Swartzlander, "Modified Booth Algorithm for High Radix Fixed-Point Multiplication," IEEE Trans. VLSI Syst., vol. 1, no. 2, pp. 164-167, June 1993.
[15] A. Booth, "A Signed Binary Multiplication Technique," Q. J. Mech. Appl. Math., vol. 4, pp. 236-240, 1951.
[16] Nhon T. Quach, Naofumi Takagi, and Michael J. Flynn, "Systematic IEEE Rounding Method for High-Speed Floating-Point Multipliers," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 5, May 2004.
[17] Dr. Raj Singh, Report on Efficient Floating Point 32-bit Single Precision Multipliers Design using VHDL, VLSI Group, CEERI, Pilani.
[18] Xianyang Jiang, Peng Xiao, Meikang Qiu, and Gaofeng Wang, "Performance Effects of Pipeline Architecture on an FPGA-Based Binary32 Floating Point Multiplier," Microprocessors and Microsystems, 2013.
[19] Mamun Bin Ibne Reaz, Md. Shabiul Islam, and Mohd. S. Sulaiman (Faculty of Engineering, Multimedia University, Cyberjaya, Selangor, Malaysia), "Pipeline Floating Point ALU Design using VHDL," ICSE 2002 Proc., Penang, Malaysia, 2002.
[20] M. Al-Ashrafy, A. Salem, and W. Anis, "An Efficient Implementation of Floating Point Multiplier," 2011.
[2] A. J. Eldon and C. Robertson, "A Floating Point Format for Signal Processing," pp. 717-720, 1982.
[21] N. Brisebarre and J. M. Muller, "Correctly Rounded Multiplication by Arbitrary Precision Constants," Symposium on Computer Arithmetic, pp. 1-8, 2005.
[22] A. B. Enriquez and K. R. Jones, "Design of a Multi-Mode Pipelined Multiplier for Floating-Point Applications," pp. 77-81, 1991.
[23] A. Amaricai, M. Vladutiu, M. Udrescu, L. Prodan, and O. Boncalo, "Floating Point Multiplication Rounding Schemes for Interval Arithmetic," pp. 19-24, 2008.
[24] L. Louca, T. A. Cook, and W. H. Johnson, "Implementation of IEEE Single Precision Floating Point Addition and Multiplication on FPGAs," pp. 107-116, 1996.
[25] M. A. Awan and M. R. Siddiqui, "Resolving IEEE Floating-Point Error using Precision-Based Rounding Algorithm," pp. 329-333, 2005.
[26] B. Fagin and C. Renard, "Field Programmable Gate Arrays and Floating Point Arithmetic," vol. 2, no. 3, pp. 365-367, 1994.
[27] Nhon T. Quach, Naofumi Takagi, and Michael J. Flynn, "Systematic IEEE Rounding Method for High-Speed Floating-Point Multipliers," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 5, May 2004.
[28] Dr. Raj Singh, Report on Efficient Floating Point 32-bit Single Precision Multipliers Design using VHDL, VLSI Group, CEERI, Pilani.
[29] Loucas Louca, Todd A. Cook, and William H. Johnson, "Implementation of IEEE Single Precision Floating Point Addition and Multiplication on FPGAs," IEEE, Sept. 1996.
[30] Pardeep Sharma and Ajay Pal Singh, "Implementation of Floating Point Multiplier on Reconfigurable Hardware and Study its Effect on 4-input LUTs," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, issue 7, pp. 244-248, July 2012.
