Design of a Vedic Multiplier Based 64-Bit Multiplier Accumulator Unit
2024 5th International Conference on Innovative Trends in Information Technology (ICITIIT) | 979-8-3503-8681-3/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICITIIT61487.2024.10580179
S Depar
Abstract—VLSI (Very Large-Scale Integration) design is the process of designing integrated circuits (ICs) by integrating millions or even billions of transistors on a single silicon wafer. The three main cornerstones of a VLSI system are area, power and delay. Low-power VLSI is a niche field in which recent advancements are happening. One of the main applications of low-power VLSI is the Multiply Accumulate (MAC) unit, which is extensively used in signal processing. This brief presents a Verilog implementation of a 64-bit MAC unit built using the Vedic sutra Urdhva Tiryagbhyam. The proposed methodology improves delay by 39.7%, area by 32.5% and power by 27.6% compared to a conventional MAC unit.
Keywords—Multiply Accumulate Unit (MAC), Low-Delay, Urdhva Tiryagbhyāṃ, Verilog HDL, Kogge-Stone Adder
I. INTRODUCTION
In the ever-evolving landscape of technology, the demand for high performance at low power, delay and area has become paramount. This work focuses specifically on the design and synthesis of a Multiply Accumulate (MAC) unit.
The MAC unit in Fig. 1, when broken down, is essentially multiplication followed by accumulation. It is a fundamental component in many digital signal processing (DSP) systems and also in the arithmetic logic unit (ALU) of some microprocessors. The adder comes into the picture when the result of the previous multiplication is stored and then added to the successive product, thus increasing the overall speed of computation.
The key applications involving the Multiply-Accumulate operation include audio processing, image processing and communication. It is very efficient for tasks like filtering and convolution [1].
The role of computing in VLSI circuits is critical as it decides the power consumption of the design, while power, area and time decide the overall efficiency of a MAC unit. The basic building blocks of a Multiplier-Accumulator unit are a multiplication block and an accumulator block. The Urdhva Tiryagbhyam sutra yields a fast multiplier when compared to traditional multipliers such as Booth's multiplier and the Wallace-tree multiplier. In addition to the gain in computation speed, the area and power consumption of the Vedic multiplier are lower than those of the traditional multipliers mentioned here. Thus, the use of a Vedic multiplier offers a reduction of area, power and delay in a MAC unit.
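The multiply-accumulate operation described in this section can be illustrated with a small behavioral model. This is only a sketch in Python, not the Verilog design of this work: the function name `mac` and the accumulator width of twice the operand width are assumptions made for illustration.

```python
def mac(pairs, width=64):
    """Behavioral multiply-accumulate over an iterable of (a, b) operand pairs.

    Assumption: the accumulator is 2 * width bits wide and wraps on overflow.
    """
    operand_mask = (1 << width) - 1
    acc_mask = (1 << (2 * width)) - 1
    acc = 0
    for a, b in pairs:
        # Multiplier stage: product of two width-bit operands.
        product = (a & operand_mask) * (b & operand_mask)
        # Adder + accumulator stage: add the product to the stored result.
        acc = (acc + product) & acc_mask
    return acc


# Three-tap filter example: samples (1, 2, 3) weighted by coefficients (4, 5, 6).
result = mac([(1, 4), (2, 5), (3, 6)])
```

Here `result` is 1*4 + 2*5 + 3*6 = 32, which is the kind of sum-of-products that filtering and convolution repeat for every output sample.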
This paper is structured as follows: Section I provides the introduction, Section II explores related works, Section III presents the design and implementation, Section IV outlines the findings and Section V draws the conclusion.
II. LITERATURE REVIEW
The authors of [2] designed a 64-bit MAC unit which
employs a vedic multiplier and a carry propagate adder (CPA).
It exhibits a significant reduction in delay when compared to
a multiplier accumulator system using a Booth Multiplier or a
shift and add Multiplier using a CPA.
Fig. 1. Block Diagram of an N-Bit MAC Unit
thorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on May 12,2025 at 12:00:48 UTC from IEEE Xplore. Restrictions appl
The work in [3] designed a Kogge-Stone adder for high-speed operation. Their research primarily focuses on the design and implementation of different types of adders. They compared the proposed methodology with a Carry Skip adder (CSKA) in terms of area, delay, speed and power consumption. It also suggests that the Kogge-Stone adder is an efficient adder, as it has minimal power consumption with area compaction and high speed.
In [4], a detailed analysis and implementation of combinational circuits, together with their delay and power, is carried out. This gives a good idea of how each individual leaf cell of the entire system might perform. Concepts of low-power design were explored from [5].
In this study, the multiplier employs the concept of Vedic multiplication, which has its roots in Vedic mathematics. The Atharvaveda is believed to be the primary source of this knowledge, from which the sixteen sutras of Vedic mathematics have been derived. Each sutra facilitates the execution of mathematical calculations with enhanced speed and precision. For this study, the Urdhva-Tiryagbhyam sutra is adopted, which can be succinctly translated as "vertically and crosswise."
When choosing an adder, conventional options like the Carry Look-Ahead Adder (CLA) and the Carry Propagate Adder (CPA) are commonly favored for their low delay. However, when evaluating the selection from the standpoint of a specific application or entity, it becomes crucial to consider the trade-offs associated with the use of a traditional fast adder. The proposed design integrates the Kogge-Stone Adder with the aim of further reducing delay, albeit with slight compromises in terms of area, considering that area concerns are already addressed by the inclusion of the Vedic multiplier [6].
A. Vedic Multiplier
• As previously stated, there are 16 Vedic Sutras in Vedic mathematics, and the Urdhva-Tiryagbhyam (UT) sutra is a method that may be employed for any kind of multiplication [7].
• The fundamental notion is to execute vertical and crosswise multiplication.
• Through the process of simplifying multiplication, the three outputs, referred to as partial products, can be determined with greater ease.
• The adoption of this technique becomes more beneficial as the number of bits multiplied increases.
B. Booth Multiplier
• In certain circumstances, the Booth multiplier, originally developed by Andrew Donald Booth, is considered the natural selection for the multiplier unit.
• In the Booth algorithm, the objective is to limit the number of partial products added during multiplication, which depends on the total number of bits to be encoded.
• It is also known as the binary 2's complement multiplication algorithm [8].
• This algorithm takes only one-half of the stages, but the stages themselves are more complicated than those of a simple multiplier.
C. Kogge-Stone Adder
• The Kogge-Stone adder (KSA) is one kind of parallel-prefix adder.
• Compared to a traditional adder, the delay of the KSA is very low. The KSA has minimum fan-out, which makes it a fast adder [9].
• The functionality of the Kogge-Stone adder can be broken down into three simple steps:
o Preprocessing.
o Carry generation.
o Post processing.
These will be explained in detail in the upcoming sections.
III. DESIGN AND IMPLEMENTATION
As discussed earlier, the MAC unit is composed of a multiplier unit, an adder and an accumulator. Hence the proposed design follows a systematic approach in which each block was designed and analyzed individually and then finally integrated.
Multiplication is the most crucial part of a MAC unit. Based on the study by H. D. Tiwari et al. in [10], the use of India's Vedic mathematics technique is advantageous for several VLSI applications demanding low power and high speed. The Urdhva Tiryagbhyam sutra (both vertically and crosswise) is utilized for the multiplication process. Here, the fundamental 2x2 Vedic multiplier can be used to create a 64x64 Vedic multiplier.
1) 2x2 Vedic Multiplier:
Consider two 2-bit binary numbers, $x_1 x_0$ and $y_1 y_0$, being the multiplicand and the multiplier respectively. The least significant bit ($p_0$) of the final product is derived only from the least significant bits of the two operands, $x_0$ and $y_0$. They are multiplied vertically, as indicated in Fig. 2 (step 1). Subsequently, the product of the multiplicand's $x_0$ bit and the multiplier's $y_1$ bit is summed with the product of the multiplier's $y_0$ bit and the multiplicand's $x_1$ bit (crosswise), as seen in Fig. 2 (step 2), to yield the second LSB, $p_1$, of the final product and the carry bit $c_1$. The third bit, $p_2$, of the final product and the carry bit, $c_2$, are calculated by vertically multiplying the MSBs (Most Significant Bits), $x_1$ of the multiplicand with $y_1$ of the multiplier, and adding the carry $c_1$, as illustrated in Fig. 2 (step 3).
Fig. 2. Example of 2x2 Vedic multiplication
As represented in Fig. 3, the fundamental 2x2-bit Vedic multiplier can be realized employing four 2-input AND gates and 2 half adders.
3) Generalization of Vedic Multipliers:
Subsequently, an NxN Vedic multiplier can be created using 2x2 Vedic multipliers. The first step is to divide the multiplicand into $y_1$ and $y_0$, where $y_1$ consists of bits $N-1$ to $N/2$ and $y_0$ of bits $N/2-1$ to $0$. A similar process is done to produce $x_1$ and $x_0$. The next step is pairing the multiplicand and multiplier as $y_1 y_0$ and $x_1 x_0$ respectively. This is now analogous to a 2-bit multiplication. Utilizing the fundamental building blocks and the partitioning technique, the distinct products are obtained [11].
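The 2x2 block and the partitioning technique can be sketched together as a bit-level Python model. This is an illustrative sketch rather than the Verilog of the proposed design: the function names are hypothetical, N is assumed to be a power of two, and the partial products are combined with Python's `+` where the hardware would use dedicated adder stages.

```python
def vedic_2x2(x, y):
    """2x2 Urdhva Tiryagbhyam multiplier: four AND gates and two half adders."""
    x0, x1 = x & 1, (x >> 1) & 1
    y0, y1 = y & 1, (y >> 1) & 1
    p0 = x0 & y0                 # step 1: vertical product of the LSBs
    a, b = x0 & y1, x1 & y0      # step 2: crosswise products
    p1, c1 = a ^ b, a & b        # half adder 1 gives p1 and carry c1
    v = x1 & y1                  # step 3: vertical product of the MSBs
    p2, c2 = v ^ c1, v & c1      # half adder 2 gives p2 and carry c2
    return (c2 << 3) | (p2 << 2) | (p1 << 1) | p0


def vedic_multiply(x, y, n):
    """n x n multiplier (n a power of two, n >= 2) built recursively from 2x2 blocks."""
    if n == 2:
        return vedic_2x2(x, y)
    half = n // 2
    mask = (1 << half) - 1
    x0, x1 = x & mask, x >> half     # lower and upper halves of x
    y0, y1 = y & mask, y >> half     # lower and upper halves of y
    low = vedic_multiply(x0, y0, half)                                   # vertical (LSB halves)
    cross = vedic_multiply(x1, y0, half) + vedic_multiply(x0, y1, half)  # crosswise
    high = vedic_multiply(x1, y1, half)                                  # vertical (MSB halves)
    # Weight each partial product by the position of its halves.
    return low + (cross << half) + (high << n)
```

Calling `vedic_multiply(x, y, 64)` treats the two 32-bit halves of each operand like the two bits of the 2x2 case, so four 32x32 products are formed and each of those is split again until the 2x2 base block is reached.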
3. Post Processing:
This is the final stage, in which the sum bits are computed from the input bits. The sum bit and the carry are calculated using equations (5) and (6) respectively. They are the same for all adders.
$Sum_i = PProp_i \oplus Carry_i \quad (5)$
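The three steps of the Kogge-Stone adder (preprocessing, carry generation and post processing) can be sketched as a bit-level Python model. This is a minimal sketch under stated assumptions, not the adder used in the proposed design: the function name is hypothetical, there is no carry-in, and the carry XORed with each propagate bit is taken to be the carry entering that bit position, which is one common indexing convention.

```python
def kogge_stone_add(a, b, width=64):
    """Return (sum, carry_out) of two width-bit integers via a Kogge-Stone prefix network."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Preprocessing: per-bit propagate and generate signals.
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]

    # Carry generation: combine group signals at distances 1, 2, 4, ...
    G, P = g[:], p[:]
    distance = 1
    while distance < width:
        next_G, next_P = G[:], P[:]
        for i in range(distance, width):
            next_G[i] = G[i] | (P[i] & G[i - distance])
            next_P[i] = P[i] & P[i - distance]
        G, P = next_G, next_P
        distance *= 2
    # G[i] is now the carry out of bit position i.

    # Post processing: each sum bit is the initial propagate XOR the incoming carry.
    carry_into = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_into[i]) << i
    return total, G[-1]
```

The carry-generation loop runs for log2(width) levels, so a 64-bit addition needs six prefix levels, compared with a carry rippling through all 64 positions in a ripple-carry adder.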
adders discussed by Ramesh S. R. et al. in [12], which was used in the adder accumulator unit, and low-delay full adders discussed by Al-Akel et al. in [13]. Since the aim of this work is to reduce the delay of the multiplier unit, it adopts the full adder architecture from the above research findings.
For example, Fig. 5 shows a 4-bit KSA circuit. As demonstrated, a "propagate" and a "generate" bit are created at each vertical stage. In the last stage (vertically), the carry generate bits (the carries) are generated. An XOR operation is performed with the initial propagate after the input to find the sum bits. Using the propagate in the second box from the right side, an XOR operation is performed to compute the second bit.
Fig. 7. Proposed Accumulator architecture
As discussed earlier, the 64-bit input sequence is divided into two sections: $X[31:0]$ and $X[63:32]$ for the multiplier and $Y[31:0]$ and $Y[63:32]$ for the multiplicand. These represent the two bits analogous to a 2x2 Vedic multiplier. The lower halves of both the multiplier and the multiplicand, i.e., $X[31:0]$ and $Y[31:0]$, are multiplied using a 32x32-bit Vedic multiplier block. The output $P_1$ is 64 bits long, of which the 32 least significant bits form the final product bits $M[31:0]$. The remaining 32 bits of $P_1$ are padded with zeros to increase the length to 64 bits. Let us represent it as $P_1'$ in Equation (7).
$S_1 = P_1' + P_2 \quad (10)$
The sums $S_1'$ and $S_2$ are passed to the stage 2 adder, consisting of a 96-bit Kogge-Stone adder, which performs Equation (13).
$S_3 = S_1' + S_2 \quad (13)$
Fig. 6. Proposed Multiplier Architecture
Hence, the multiplication unit takes two inputs $X[63:0]$ and $Y[63:0]$ and outputs their product as $M[127:0]$.
Spartan7 Evaluation board. The simulation was carried out using ModelSim. The current method has been compared to the existing model presented in [2]. From Figure 9, we can infer that the area efficiency of the proposed methodology is
MAC unit with KSA and Vedic Multiplier
  Total on-chip: 0.095, 0.106, 0.541
  Logic: 80µ, 0.001, 0.147
Fig. 8. Simulation of proposed MAC unit using ModelSim
TABLE II. COMPUTED DELAYS
Method | Min Total (ns) | Min Logic (ns) | Min Net (ns) | Max Total (ns) | Max Logic (ns) | Max Net (ns)
MAC unit with KSA and Vedic multiplier | 3.6228 | 1.5148 | 2.108 | 38.3764 | 8.9472 | 29.4292
MAC Design | Delay (ns) | Gate Delay (ns) | Net Delay (ns)
Booth Multiplier and CPA | 163.671 | 87.722 | 79.95
Shift-Add Multiplier and CPA | 201.144 | 130.329 | 70.816
Vedic Multiplier and CPA | 63.771 | 33.085 | 30.686
Vedic Multiplier (KSA) | 38.376 | 8.9472 | 29.4292
The delay was computed, and it was observed that the proposed methodology is 39.7% more delay efficient than [2]. The results are reported in Table II and Fig. 12.
V. CONCLUSION
This work discusses in detail how to construct a Multiplier-Accumulator unit using the Vedic Multiplier and the Kogge-Stone Adder. In comparison to past works, the proposed methodology has reduced delay and area utilisation. The RTL schematic and synthesis were completed in Xilinx Vivado 2020.2, and the simulation output of the 64-bit MAC unit was evaluated in ModelSim. The reduced delay and area of the design cater to the needs of high-end filter design and many other applications. As future scope, further reduction in delay and area can be targeted using architectural modifications.
REFERENCES
[1] D. S. Manikanta, K. S. S. Ramakrishna, M. Giridhar, N. Avinash, T. Srujan and Ramesh S. R, "Hardware Realization of Low power and Area Efficient Vedic MAC in DSP Filters," 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2021, pp. 46-50.
[2] E. V. Babu, S. Talasila, N. Divya, S. S. Deva and C. Vani, "Analysis of Low-Delay in 64-bit Vedic multiplier based MAC unit," 2023 International Conference for Advancement in Technology (ICONAT), Goa, India, 2023, pp. 1-6.
[3] U. Penchalaiah and S. K. VG, "Design of High-Speed and Energy-Efficient Parallel Prefix Kogge Stone Adder," 2018 IEEE International Conference on System, Computation, Automation and Networking (ICSCA), Pondicherry, India, 2018, pp. 1-7.
[4] R. Kumari and R. Mehra, "Power and delay analysis of CMOS multipliers using Vedic algorithm," 2016 IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES), Delhi, India, 2016, pp. 1-6.
[5] C. Devika and J. P. Anita, "Design of a High-Speed Binary Counter Using a Stacking Circuit," in G. Ranganathan, X. Fernando and F. Shi (eds), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems, vol. 311, Springer, Singapore, 2022, pp. 135-143.
[6] P. Singh, C. Dinendra, R. J. Hemanth and N. M. Vivek, "Design Of High-Speed Vedic Multiplier Using Urdhva Tiryakbhyam Sutra," 2022 First International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), Trichy, India, 2022, pp. 1-5.
[7] D. Jaina, K. Sethi and R. Panda, "Vedic Mathematics Based Multiply Accumulate Unit," 2011 International Conference on Computational Intelligence and Communication Networks, Gwalior, India, 2011, pp. 754-757.
[8] V. Thamizharasan and N. Kasthuri, "High-Speed Hybrid Multiplier Design Using a Hybrid Adder with FPGA Implementation," IETE Journal of Research, vol. 69, no. 5, 2023, pp. 2301-2309.
[9] A. M. and R. K.S., "Comparative Study of Parallel Prefix Adders Based on Carry Propagation and Sum Propagation," 2023 International Conference on Power, Instrumentation, Control and Computing (PICC), Thrissur, India, 2023, pp. 1-6.
[10] H. D. Tiwari, G. Gankhuyag, Chan Mo Kim and Yong Beom Cho, "Multiplier design based on ancient Indian Vedic Mathematics," 2008 International SoC Design Conference, Busan, Korea (South), 2008, pp. II-65-II-68.
[11] K. D. Rao, C. Gangadhar and P. K. Korrai, "FPGA implementation of complex multiplier using minimum delay Vedic real multiplier architecture," 2016 IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON), Varanasi, India, 2016.
[12] G. D. Mahesh, K. Sohith, P. R. Yeshwanth, P. Shyaamsrinivas and Ramesh S. R, "Low Area Design Architecture of XOR-MUX Full Adder based Discrete Wavelet Transform," TENCON 2022 - 2022 IEEE Region 10 Conference (TENCON), Hong Kong, Hong Kong, 2022, pp. 1-5.
[13] W. Al-Akel, K. Abugharbieh, A. Hasan and H. W. Marar, "A Power Efficient 500MHz Adder," 2019 SoutheastCon, Huntsville, AL, USA, 2019, pp. 1-6.
[14] B. Surya, D. Prakalya, K. Abinandhan and N. Mohankumar, "Design and Synthesis of reversible data selectors for low power applications," in Proc. of 2020 Third Int. Conf. on Smart Systems and Inventive Technology (ICSSIT), pp. 657-661, 2020.
[15] P. Zicari, S. Perri, P. Corsonello and G. Cocorullo, "An optimized adder accumulator for high speed MACs," 2005 6th International Conference on ASIC, Shanghai, China, 2005, pp. 757-760.
[16] A. S. Phuse and P. P. Tasgaonkar, "Design and Implementation of Different Multiplier Techniques and Efficient MAC Unit on FPGA," 2022 International Conference on Signal and Information Processing (IConSIP), Pune, India, 2022, pp. 1-5, doi: 10.1109/ICoNSIP49665.2022.10007482.
[17] H. M. Rakesh and G. S. Sunitha, "Design and Implementation of Novel 32-Bit MAC Unit for DSP Applications," 2020 International Conference for Emerging Technology (INCET), Belgaum, India, 2020, pp. 1-6, doi: 10.1109/INCET49848.2020.9154177.