
Proceedings of 2018 4th International Conference on Electrical, Electronics and System Engineering,

ICEESE2018

Hardware Implementation of 24-bit Vedic Multiplier in 32-bit Floating-Point Divider

C R S Hanuman, J Kamala
Department of ECE, CEG, Guindy, Anna University, Chennai, India
[email protected], [email protected]

Abstract—Most digital operations in computing systems are performed using Floating-Point (FP) arithmetic. FP multiplication is more widely used than addition, subtraction and division. Multipliers built using the Vedic technique show higher speed of operation with better precision, but they occupy slightly more area than conventional multipliers. In this paper, we implement a 24-bit Vedic multiplier using the Urdhva-Tiryakbhyam (UT) technique with modified Carry Save Adders (CSA). The proposed high-speed multiplier is used for calculating the Mantissa part (24-bit) in single precision FP Division. This method outperforms existing multipliers used for FP Division in terms of speed and accuracy. All the design parameters are evaluated using the VIVADO synthesis tool and the results are verified by simulation. The design was coded in Verilog HDL and is implemented on a NEXYS 4 DDR FPGA kit.

Keywords—Floating-point, Vedic Multiplier, UT, FPGA.

I. INTRODUCTION
Even though Fixed-point arithmetic is easy to operate in human presence applications, Floating-point arithmetic has unique advantages in terms of higher dynamic range with better precision. In performance critical applications, Floating-point arithmetic is widely used [1] and it is supported by all computer systems and Digital Signal Processors. The operations performed in FP arithmetic are much more complex than those in Fixed-point arithmetic and, in particular, FP Division needs much more hardware than the FP multiplier and FP adder. In FP division, the multiplier and adder are used iteratively [2]. For implementing FP designs in FPGA, we need efficient algorithms with optimum utilization of resources and low latency. Almost 40 percent of FP applications [3] use multipliers. Floating point numbers are represented in single precision (binary32) and double precision (binary64) formats. These formats are standardized in the IEEE-754 standard [4]. The single precision FP number representation is shown in Fig. 1.

Fig.1. Single precision floating point format

The 24-bit mantissa multiplication (including sticky bit) is implemented by the Vedic Multiplier (Urdhva-Tiryakbhyam Sutra) technique. This is one of the 16 sutras developed by Swami Bharati, collectively called Vedic Mathematics [5]. Another widely used technique is the Karatsuba algorithm (divide and conquer). In this method [6], the larger design is divided into smaller modules that are processed separately. Various algorithms [7], [8] for Vedic multiplication show improved performance in terms of area and speed.

The FP Division algorithms are classified into two families, named Digit Recurrence and Functional Iteration algorithms. The Digit Recurrence algorithm generalizes the paper-and-pencil algorithm, which produces one quotient bit per iteration, and the result is exact [9]. Functional Iteration algorithms use Newton-Raphson iterations for finding the function 1/x. The number of iterations required (O(log p)) is much smaller than for Digit Recurrence algorithms, but each iteration [10] requires more multiplications and subtractions. These iterations are not exact because the method starts by approximating the inverse. In this paper, we design a 24-bit Vedic multiplier module which is used as a submodule in the 32-bit FP Multiplier. The Newton-Raphson based FP Division needs three iterations of the 32-bit FP Multiplier and one 32-bit FP Adder (for subtraction).

The remainder of the paper is organized as follows: Section II introduces the concept of Vedic multiplication (Urdhva Tiryakbhyam). Section III discusses the design of the 32-bit FP Multiplier using the Vedic multiplier. Section IV provides hardware results and simulation analysis. The Conclusion and Future work are presented in Section V.

II. 24-BIT VEDIC MULTIPLIER
A. Background
The Vedic concepts were formulated in the early 20th century, but their implementation in digital systems started in the early 2000s. The proposed method uses the UT technique, where the operations take place vertically and crosswise. The 24-bit multiplier was designed with a modular approach, where the fundamental building block is a 3-bit UT multiplier. It is further extended to 6-bit, 12-bit and finally 24-bit UT multipliers. This technique applies to all types of number systems with the same procedure. For the 3-bit Vedic multiplier, let X and Y be the 3-bit binary inputs and let the output P be a 6-bit number. The vertical and crosswise operations of the 3-bit multiplier need 9 AND gates: X0Y0, X0Y1, X1Y0, X0Y2, X2Y0, X1Y1, X1Y2, X2Y1 and X2Y2 [11]. In the UT technique, the ending digits on each line are multiplied and the output is added to the carry generated in the previous step. Fig. 2 shows the line diagram for implementing the 3-bit UT multiplier, where M0 to M5 are intermediate signals. The high-speed CSAs used in this multiplier decrease the latency and improve the frequency of operation.

Fig.2. Line diagram of 3-bit UT multiplier

B. 6-bit UT Multiplier
To implement the above logic, 3 Half adders and 3 Full adders are required. The 6-bit UT multiplier, designed from 3-bit UT multipliers and 6-bit carry save adders, is shown in Figure 3. The modified 6-bit CSA with 6-bit inputs produces the summing output shown in Fig. 4.

Fig.4. Architecture of 6-bit Carry Save Adder

C. 24-bit UT Multiplier
The design is further extended to a 12-bit UT multiplier, and finally the 24-bit UT multiplier is designed using four 12-bit UT multipliers and three 24-bit carry save adders, as shown in Fig. 5.

Fig.5. Modular architecture of 24-bit UT Multiplier
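As a software illustration of the modular construction described in this section, the following Python sketch models a 3-bit UT multiplier from its nine AND terms and then builds 6-, 12- and 24-bit multipliers from four half-width multipliers each. It is a behavioural sketch under stated assumptions, not the Verilog design of this paper: the function names ut3 and ut are hypothetical, and ordinary integer addition stands in for the carry save adders.

```python
def ut3(x: int, y: int) -> int:
    """3-bit vertical-and-crosswise product (assumes 0 <= x, y < 8)."""
    xb = [(x >> i) & 1 for i in range(3)]
    yb = [(y >> i) & 1 for i in range(3)]
    product, carry = 0, 0
    for col in range(5):  # columns 0..4 of the crosswise pattern
        # Sum the AND terms X_i * Y_(col-i) that fall in this column, plus the carry.
        s = carry + sum(xb[i] & yb[col - i] for i in range(3) if 0 <= col - i < 3)
        product |= (s & 1) << col
        carry = s >> 1
    return product | (carry << 5)  # the last carry becomes bit 5


def ut(x: int, y: int, n: int) -> int:
    """n-bit product built from four n/2-bit products, for n in {3, 6, 12, 24}."""
    if n == 3:
        return ut3(x, y)
    h = n // 2
    mask = (1 << h) - 1
    xl, xh = x & mask, x >> h
    yl, yh = y & mask, y >> h
    ll = ut(xl, yl, h)  # low  x low
    lh = ut(xl, yh, h)  # low  x high
    hl = ut(xh, yl, h)  # high x low
    hh = ut(xh, yh, h)  # high x high
    # Three additions combine the four partial products.
    # Assumption: plain integer addition replaces the carry save adders here.
    return ll + ((lh + hl) << h) + (hh << n)
```

Calling ut(x, y, 24) with any two 24-bit operands should return the same 48-bit value as x * y, which gives a simple way to check the decomposition.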
Fig.3. Architecture of 6-bit UT multiplier

The algorithm for implementing the 24-bit UT multiplier is shown below.

Algorithm-1. 24-bit UT multiplier
Inputs: 24-bit inputs A, B
Output: 48-bit result, P
Initialize the output to ‘0’.
Run for loop with variables i and j
Result P(m) = P(m) + a(i) * b(i-j)
Run the loop in reverse from 23 to 1.
Result P(m) = P(m) + a(i) * b(24 - (i-j))
End first loop
Increment m by 1
End second loop
Start for loop again by taking m as variable
Multiplication result updated
End loop
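Algorithm-1 leaves the loop bounds and index expressions partly implicit. One possible reading, sketched below in Python under that assumption, accumulates for each output column m all AND terms a(i) * b(m-i) together with the carry from the previous column. The function name ut_columns and the explicit bounds lo and hi are illustrative choices and are not taken from the paper.

```python
def ut_columns(a: int, b: int, n: int = 24) -> int:
    """Column-wise product of two n-bit operands (assumes 0 <= a, b < 2**n)."""
    p, carry = 0, 0
    for m in range(2 * n - 1):       # output columns 0 .. 2n-2
        lo = max(0, m - n + 1)       # first index i with m - i <= n - 1
        hi = min(m, n - 1)           # last index i with m - i >= 0
        s = carry
        for i in range(lo, hi + 1):
            s += ((a >> i) & 1) & ((b >> (m - i)) & 1)
        p |= (s & 1) << m            # low bit of the column sum is result bit m
        carry = s >> 1               # the rest propagates to the next column
    return p | (carry << (2 * n - 1))  # final carry is the top result bit
```

The column count first grows from 1 to n terms and then shrinks back to 1, which may correspond to the forward loop and the reverse loop of the listing.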


III. 32-BIT FLOATING-POINT MULTIPLIER
The block diagram of the 32-bit FP Multiplier using the 24-bit UT Multiplier is shown in Figure 6. The inputs to the multiplier are single precision FP numbers [12], divided into Sign (1-bit), Exponent (8-bit) and Mantissa (23-bit) parts as shown in Fig. 1.

Fig.6. Architecture of 32-bit FP Multiplier.

A. Pre-processing Block
The sign bit is calculated by taking the XOR of the MSBs of the inputs A and B. The inputs are checked for special cases such as zero, infinity and NaN (Not a Number). The functions OR(Exponent A) and OR(Exponent B) are evaluated, and each result (always logic ‘1’, except for zero) is placed at the MSB of the corresponding Fractional part to make it a 24-bit fraction.

B. 24-bit UT Multiplier
The inputs to this module are the 24-bit significands of A and B. The multiplication is performed using Algorithm-1 with the high-speed Vedic Multiplier specified in Section II. The result P[47:0] is truncated to 27 bits by normalization.

C. Normalization
This step ensures the result to be exact by properly scaling the higher order bits in 6 steps, shown in Table I.

TABLE I. Normalization Calculations
Step   Zeros   Operation
1      Z[5]    Not (OR (P[46:15]))
2      Z[4]    Not (OR (P[46:31]))
3      Z[3]    Not (OR (P[46:39]))
4      Z[2]    Not (OR (P[46:43]))
5      Z[1]    Not (OR (P[46:45]))
6      Z[0]    Not (P[46])

The value Z0 is the normalized fraction part, where the least significant digits [22:0] are ORed and placed at the LSB. The result is then fed into the rounding block.

D. Rounding
The input to the rounding [13] operation is the 27-bit frac_norm. The rounding operation is performed in four modes, named round_to_0, round_to_nearest, round_to_Pinfinity and round_to_Ninfinity. The 2-bit input rm[1:0] selects the corresponding rounding mode when it is specified. The final result from the rounding block is used to update the exponent and fraction values.

IV. RESULTS
The multiplier circuit based on Urdhva Tiryakbhyam was simulated and synthesized in VIVADO 18.1 and implemented on the Nexys 4 DDR FPGA with the help of Analog Discovery. The experimental setup is shown in Fig. 7.

Fig.7. Experimental Setup

TABLE II provides the device utilization summary and latency for the Wallace, Urdhva Tiryakbhyam [14] and proposed UT multipliers.

TABLE II. Optimization parameter of Existing and Proposed methods
Logic Utilization    Wallace   UT Multiplier [14]   Proposed UT Multiplier
Slice LUTs           1163      1121                 1187
Number of I/Os       83        96                   81
Latency (ns)         47.33     27.76                22.42
On-chip power (W)    0.161     ----                 0.168

From the synthesis reports, the number of I/Os is reduced by about 15% relative to the existing UT multiplier [14] (from 96 to 81), and the latency of the proposed method (22.42 ns) is lower than that of the existing UT multiplier (27.76 ns). The area and on-chip power are close to the existing values. Therefore, the proposed multiplier achieves better speed with fewer I/Os while maintaining roughly the same area and power dissipation. The simulation results shown in Fig. 8 verify 16 different combinations, divided into four sub-images, that cover the entire range of values. The values are shown in decimal form for human convenience.
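The relative figures quoted from the synthesis reports can be recomputed from Table II. The short Python sketch below does this for the three fully populated rows; the dictionary table_ii and the helper percent_change are hypothetical names introduced only for this check, and the numbers are transcribed from the table as printed.

```python
# Values transcribed from Table II.
# Column order: Wallace, UT multiplier [14], proposed UT multiplier.
table_ii = {
    "Slice LUTs": (1163, 1121, 1187),
    "Number of I/Os": (83, 96, 81),
    "Latency (ns)": (47.33, 27.76, 22.42),
}


def percent_change(reference: float, proposed: float) -> float:
    """Signed change of the proposed design relative to a reference, in percent."""
    return 100.0 * (proposed - reference) / reference


# For each metric: (change vs Wallace, change vs UT multiplier [14]).
changes = {
    name: (percent_change(wallace, proposed), percent_change(ut14, proposed))
    for name, (wallace, ut14, proposed) in table_ii.items()
}

for name, (vs_wallace, vs_ut14) in changes.items():
    print(f"{name}: {vs_wallace:+.1f}% vs Wallace, {vs_ut14:+.1f}% vs UT [14]")
```

Relative to the UT multiplier [14], this gives roughly 15.6% fewer I/Os and 19.2% lower latency, at about 5.9% more slice LUTs.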


Fig.8. Simulation results for 24 bit UT Multiplier

From Fig. 7, the FPGA board provides 16 LED outputs (for P[47:32]), and two Analog Discovery units show the outputs P[31:16] and P[15:0] respectively. Fig. 9 shows the snapshots of hardware results for the 24-bit UT Multiplier. The two inputs A and B are 32-bit numbers directly fed into the code by multiplexer logic. The 48-bit output (P[47:0]) shown in the figures is represented in hexadecimal form. Table III shows the four combinations which were displayed in the hardware results.

TABLE III. List of Input combinations for Hardware results
Input A    Input B    Decimal output    Hexadecimal output
0          15         0                 000000000000
1111111    1111111    1234567654321     011F71F76BB1
1234567    7654321    9449772114007     898324F6057
2097152    4194304    8796093022208     80000000000
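The entries of Table III can be cross-checked in software. The following Python sketch, with the hypothetical helper check_row, confirms for each row that both inputs fit in 24 bits and that the decimal and hexadecimal outputs both equal the product of the inputs. The rows are transcribed from the table as printed.

```python
# Rows transcribed from Table III: (input A, input B, decimal output, hex output).
rows = [
    (0, 15, 0, "000000000000"),
    (1111111, 1111111, 1234567654321, "011F71F76BB1"),
    (1234567, 7654321, 9449772114007, "898324F6057"),
    (2097152, 4194304, 8796093022208, "80000000000"),
]


def check_row(a: int, b: int, dec: int, hex_str: str) -> bool:
    """True if both inputs fit in 24 bits and both listed outputs equal a * b."""
    fits_24_bits = a < (1 << 24) and b < (1 << 24)
    product = a * b
    return fits_24_bits and product == dec and product == int(hex_str, 16)


results = [check_row(*row) for row in rows]
print(results)

# The same products written as 12 hexadecimal digits, i.e. the full 48-bit P[47:0].
print([f"{a * b:012X}" for a, b, _, _ in rows])
```

All four rows pass this check. The last two hexadecimal entries in the table are listed with 11 digits, so the leading zero of the 48-bit value is omitted there.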


V. CONCLUSION
This paper presents the hardware implementation of the 24-bit
UT multiplier. The proposed design outperforms Wallace and
Karatsuba multipliers in terms of speed and frequency of
operation. However, this design occupies 5% more resources
when compared to other designs. The simulation results are
verified on real-time hardware and the results match exactly,
without any delay. We will extend this work by designing a
54-bit UT multiplier for double precision floating point
dividers.
References
[1] M. D. Ercegovac and T. Lang, Digital Arithmetic. San
Francisco: Morgan Kaufmann, 2004.
[2] K. Scott Hemmert and Keith D. Underwood “Floating Point
Divider Design for FPGAs”, IEEE Transaction on very large scale
integration systems, vol. 15, No. 1, pp. 115-118, Jan 2007.
[3] P. Belanovic and M. Leeser, “A library of parameterized floating point
modules and their use”, in Springer Conf on Field Programmable logic
and Applications, 2002, pp. 657-666.
[4] https://en.wikipedia.org/wiki/IEEE_floating_point
[5] R. K. Kodali, S. K. Gundabathula, and L. Boppana. “FPGA
implementation of IEEE-754 floating point Karatsuba Multiplier” IEEE
Conf on Control, Instrumentation, Communication and Computational
Technologies, 2014, pp. 300-304
[6] A. Mehta, C. Bidhul, S. Joseph, and P. Jayakrishnan, “Implementation
of single precision floating point multiplier using karatsuba algorithm,”
in International Conference on,Green Computing, Communication and
Conservation of Energy, Dec 2013, pp. 254–256.
[7] M S Athira Menon, R J Renjith, “Implementation of 24-bit High Speed
Floating Point Multiplier” in IEEE Conf on Networks and Advances in
Computational Technologies, 2017, pp. 453-457.
[8] Priyesh Dalmia, Abinav Parashar, Neeta Pandey, “Novel High Speed
Vedic Multiplier Proposal incorporating Adder based on Quaternary
Signed Digit Number System” in IEEE Conf on VLSI Design, 2018, pp.
289-294.
[9] Mohamed Anane, Hamid Bessalah, Mohamed Issad, Nadjia Anane and
Hassen Salhi “Higher radix and redundancy factor for floating point
SRT Division”, IEEE Transaction on very large scale integration
systems, vol. 16, no. 16, pp. 122-128, June 2008.
[10] A. A. Liddicoat, “High performance arithmetic for division and
elementary functions,” Ph.D. dissertation, Stanford University, Stanford,
CA, Feb. 2002.
[11] Y. Bansal, C. Madhu, and P. Kaur, “High speed vedic multiplier
designs-a review,” in IEEE Conf on Recent Advances in Engineering and
Computational Sciences, 2014, pp. 1–6.
[12] Prashant S. Howal, Kishore Upla, Mehul Patel, “HDL Implementation
of Digital Filters using Floating Point Vedic Multiplier” in IEEE Conf
on Circuits and Systems, 2017, pp. 274–279.
[13] G. Even and P.-M. Seidel, “A comparison of three rounding algorithms
for ieee floating-point multiplication,” IEEE Transactions on Computers
, vol. 49, no. 7, pp. 638–650, Jul 2000.
[14] Ravi Kishore Kodali, Ravi Bopanna and Sai Y, “FPGA Implementation
of Vedic Floating Point Multipliers” in IEEE Conf on Signal Processing
Informatics, Communication and Energy Systems, 2015, pp. 1-4