The Efficient Implementation of An Array Multiplier
Guoping Wang
Indiana University Purdue University, Fort Wayne
James Shield
University of Oklahoma
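[Figure: multiplier cells. CELL I: Full Adder with 4to1 Mux (signals xi, yi, xj, yj, si, Sin, Sout, c in, c out). CELL II: Full Adder (xi=xj, yi=yj, si=sj; signals xiyi, cj, cj+1, s out, c out).]
[Figure: multiplier array built from partial products x0y0 to x4y4, CLA adders, and full adders (F.A.), producing product bits p0 to p9.]
[Table fragment: "Pass-sum", 8, "Easily scalable"; "Parallel Calc", 8, "Fast and easily scalable".]
[Figure: full adders (FA) combining the partial products X2Y2, X3Y3, and X4Y4 of one column, with Carry and Output signals.]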
3.2 Minimizing Partial Products

Because the generated partial products are both non-symmetrical and irregular, a direct implementation of Wallace trees is not practical for the minimization of the partial products. However, an adapted Wallace tree format is proposed. This new form is based on carry-save adders, just as in Wallace trees. A Wallace tree implementation would provide a slightly faster and smaller implementation; however, this paper does not investigate the design and implementation of Wallace trees for irregular partial products.

Looking at the 8-bit example, the number of partial products in a given column differs from that of its neighboring columns. Therefore, each column must be treated independently of its neighbors; only the carries connect one column to the next. Figure 5 is an example of how adders can be combined for a given column to add the partial products.

In this example, only full adders are needed; however, where only two inputs remain to be added, a half adder is used instead. The example also shows how the adders are chained together: each adder takes the carries from the previous column and the output (sum) of the preceding adder in the same column.

To test and verify the performance of the proposed multiplier, VHDL code was written for both the proposed multiplier and the published array multipliers and simulated using a VHDL simulator. As can be seen in Sections 2 and 3, this design can be easily adapted to any size of multiplier. The design consists of N-1 multiplexers, AND gates, and adders (half and full adders). Because this design has only four basic components tied together with interconnect signals, it is very suitable for VLSI implementation; however, design complexity increases as N becomes large. This is due to the large number of adders needed to compress partial products with a height of (N/2+1).

Table 3 and Table 4 compare the two implementations using the simulation results. The tables compare the recently published array multiplier, implemented with 8-bit inputs in a Xilinx Virtex FPGA, to the improved design with the same input and device requirements. The proposed 8-bit design does not offer a significant speed improvement (56.7 ns versus 57.2 ns, or 0.5 ns faster); however, the reduction in size by nearly a factor of two is a great improvement (801 versus 1413 gates, or 43% smaller).
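As a rough illustration of the column-wise reduction described in Section 3.2, the following Python sketch compresses each column of partial products with full adders, falls back to a half adder when only two bits remain, and passes every carry to the next column. It is a minimal sketch under stated assumptions, not the paper's circuit: the partial products here are plain AND terms x_i·y_j (the multiplexer-based partial products of the proposed design are irregular and are not reproduced), and the names `full_adder`, `half_adder`, `reduce_columns`, and `multiply` are hypothetical helpers introduced for this example.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def reduce_columns(columns):
    """Compress every column to a single bit.

    Within a column, the sum of one adder feeds the next adder of the
    same column; each carry is appended to the following column.
    Returns the result bits (LSB first) and the adder counts.
    """
    columns = [list(col) for col in columns]
    bits, n_full, n_half = [], 0, 0
    i = 0
    while i < len(columns):
        col = columns[i]
        while len(col) > 1:
            if i + 1 == len(columns):
                columns.append([])          # room for the outgoing carry
            if len(col) >= 3:
                s, c = full_adder(col.pop(), col.pop(), col.pop())
                n_full += 1
            else:                           # only two bits left: half adder
                s, c = half_adder(col.pop(), col.pop())
                n_half += 1
            col.append(s)                   # sum stays in this column
            columns[i + 1].append(c)        # carry moves to the next column
        bits.append(col[0] if col else 0)
        i += 1
    return bits, n_full, n_half


def multiply(x, y, n=8):
    """Multiply two unsigned n-bit integers by column-wise reduction.

    Assumption: ordinary AND-gate partial products, used only to have
    something to compress.
    """
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(xb[i] & yb[j])
    bits, n_full, n_half = reduce_columns(columns)
    value = sum(b << k for k, b in enumerate(bits))
    return value, n_full, n_half


if __name__ == "__main__":
    value, n_full, n_half = multiply(200, 100)
    print(value, n_full, n_half)
```

Because a full adder satisfies a + b + c = s + 2·carry and a half adder satisfies a + b = s + 2·carry, every step preserves the weighted sum of the bits, so `multiply(13, 11, n=4)[0]` returns 143. The adder counts returned alongside the product give a feel for how the number of adders grows with the column height.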
Table 3. Multiplier Comparison

Design                        8-bit Multiplier          16-bit Multiplier
                              # of gates   Max delay    # of gates   Max delay
Pekmestzi Array Multiplier    1413         57.2 ns      2684         108.8 ns
Improved Array Multiplier     801          56.7 ns      1410         92.1 ns

Table 4. Multiplier Comparison

Design                        32-bit Multiplier         64-bit Multiplier
                              # of gates   Max delay    # of gates   Max delay
Pekmestzi Array Multiplier    5803         282.9 ns     14256        508.1 ns
Improved Array Multiplier     3220         281.7 ns     6129         413.0 ns

5 Summary

A new, improved array multiplier has been shown to be both smaller and faster than currently published array multipliers. The cost saving of the proposed implementation comes from a slightly different implementation of the multiplexer. Where Pekmestzi designed for a strictly array-based multiplier, the proposed multiplier aims for both speed and minimal size, while remaining scalable to other multiplier sizes. The proposed array multiplier is applicable to both VLSI and FPGA implementation. Current research aims to extend the unsigned multiplier to signed numbers.

References:

[1] S. D. Pezaris, "A 40-ns 17-bit by 17-bit array multiplier," IEEE Transactions on Computers, vol. 20, pp. 442-447, April 1971.
[2] K. Z. Pekmestzi, "Multiplexer-based array multipliers," IEEE Transactions on Computers, vol. 48, no. 1, pp. 15-23, Jan. 1999.
[3] C. Wallace, "A suggestion for a fast multiplier," IEEE Transactions on Electronic Computers, vol. 13, pp. 14-17, 1964.
[4] L. Dadda, "Some schemes for parallel multipliers," Alta Frequenza, vol. 34, pp. 349-356, March 1965.
[5] N. Takagi, H. Yasuura, and S. Yajima, "High-speed VLSI multiplication algorithm with a redundant binary addition tree," IEEE Transactions on Computers, vol. 34, no. 9, pp. 789-796, Sept. 1985.
[6] H. Makino, Y. Nakase, H. Suzuki, H. Morinaka, H. Shinohara, and K. Mashiko, "An 8.8-ns 54x54-bit multiplier with high speed redundant binary architecture," IEEE Journal of Solid-State Circuits, vol. 31, no. 6, pp. 773-783, June 1996.
[7] A. Booth, "A signed binary multiplication technique," Quarterly Journal of Mechanics and Applied Mathematics, vol. 4, pp. 236-240, 1951.
[8] L. MacSorley, "High speed arithmetic in binary computers," Proc. IRE, vol. 49, Jan. 1961.
[9] Y. Wang, Y. Jiang, and E. Sha, "On area-efficient low power array multipliers," in 8th IEEE International Conference on Electronics, Circuits and Systems, vol. 3, pp. 1429-1432, Sept. 2001.