This paper presents a power-efficient core micro-architecture based on the RISC-V instruction set, focusing on optimizing the Arithmetic Logic Unit (ALU) to reduce power consumption, particularly in multipliers. The proposed redesign utilizes the Baugh-Wooley algorithm for signed multiplication, achieving a reduction in power consumption by 7.4% and area by 18.5% compared to existing systems. The implementation is conducted in Verilog HDL, demonstrating the effectiveness of the proposed architecture for energy-efficient RISC-V processors.


A Power-Efficient Core Micro-Architecture Based on RISC-V Instruction Set Architecture


1st Chatrapathi Srs Krishna P
Department of Electronics and Communication Engineering,
Amrita School of Engineering, Coimbatore,
Amrita Vishwa Vidyapeetham, India.
[email protected]

2nd Prabhu E
Department of Electronics and Communication Engineering,
Amrita School of Engineering, Coimbatore,
Amrita Vishwa Vidyapeetham, India.
e [email protected]

Abstract: RISC processors have become a dominant force in processor design due to their simplicity and efficiency. One such architecture, RISC-V, offers a powerful yet streamlined instruction set, making it a popular choice for embedded systems. This paper delves into the design of a RISC processor with a particular focus on optimizing the power consumption of the Arithmetic Logic Unit (ALU). Since multipliers are known to be the most power-hungry components within an ALU, the paper proposes a redesign of the multiplier’s internal circuitry using suitable alternatives. Specifically, the design utilizes the Baugh-Wooley algorithm for signed multiplication due to its efficiency. Furthermore, the paper details how the internal circuit of this multiplier is redesigned to minimize both power consumption and area footprint, ultimately leading to a more energy-efficient system overall. By focusing on optimizing the power-hungry multiplier within the ALU, this design approach aims to reduce the overall power consumption of the RISC-V processor significantly. The proposed system is implemented in Verilog HDL. Compared to the existing system, the proposed design reduces area by 18.5% and power by 7.4%.

Keywords: ALU, Baugh-Wooley, CMOS, FPGA, HDL, RISC-V ISA, Xilinx, XOR

2024 IEEE Region 10 Symposium (TENSYMP) | 979-8-3503-6486-6/24/$31.00 ©2024 IEEE | DOI: 10.1109/TENSYMP61132.2024.10752175

I. INTRODUCTION
In today’s era of advanced technology, there is a growing demand for various appliances like mobile phones, medical devices, wired and wireless communication tools, and numerous Internet of Things (IoT) devices. All these devices require efficient processors to carry out their functions effectively. Moreover, many of these applications are designed to be portable and operate on battery power. Consequently, there is a need for processors that offer high performance while consuming minimal power, thus preserving battery life. Over the past four decades, processor performance has primarily improved through CMOS scaling. However, this trend is reaching its limits as CMOS technology approaches physical constraints. As a result, designers are compelled to explore alternative avenues in processor design to achieve better power efficiency and performance within the specified constraints. Among the crucial aspects of processor design, Instruction Set Architecture (ISA) and micro-architecture stand out as the most significant considerations [1].

Taking into account the Instruction Set Architecture (ISA), a novel ISA known as RISC-V has been created to facilitate computer architecture research, educational purposes, and industrial implementation. Numerous architectures have been built upon the foundation of RISC-V, owing to its status as an open-source ISA. Today, most architectures have been migrating to and utilizing the RISC-V ISA [2]. Optimizing the micro-architecture is essential for lower power consumption and lower area. Most of the power-hungry circuits in the architecture lie in the Arithmetic Logic Unit (ALU). Of the components present in the ALU, the multiplier plays a significant role in deciding the overall power consumption, and hence optimizing it is an essential task.

This paper is organised as follows: Section II describes the literature survey. Section III describes the design architecture, with subsection A describing the Baugh-Wooley multiplier, subsection B discussing the existing design and subsection C detailing the proposed design. Section IV describes the application of the proposed multiplier, the RISC-V ISA. The results and discussion, with a detailed comparison of the existing system with the proposed system, are given in Section V. Section VI draws the conclusion of the paper with the future scope.

II. LITERATURE SURVEY
The RISC-V ISA extensions have been under research, and several models were surveyed in [3] by Enfang Cui, Tianzheng Li, and Qian Wei. One of the extensions of the RISC-V ISA is the multiplication extension, with which the notation becomes RV32IM (RISC-V ISA, 32-bit integer multiplication extension).

In [4] the authors discuss deep learning and Convolutional Neural Networks (CNNs), which have become essential in various image-related tasks and industrial applications, such as object detection, speech processing, and autonomous vehicles. While GPUs and ASICs have been used for accelerator designs, FPGAs offer better trade-offs in terms of design flexibility and power consumption. However, developing an energy- and area-efficient hardware design for CNN accelerators on FPGAs remains a challenge due to limited hardware resources such as MAC units and on-chip memories. Previous research has focused on optimizing CNN designs on FPGAs, but limited MAC units and multi-operand adders have been identified as bottlenecks. They proposed a new processing



Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY TIRUCHIRAPALLI. Downloaded on January 22,2025 at 17:46:35 UTC from IEEE Xplore. Restrictions apply.
element design based on Modified Booth Encoding (MBE) multipliers and Wallace tree-based adders to overcome these limitations and achieve hardware optimization for an efficient CNN accelerator design. However, the experimental results and discussion in the paper lack detailed analysis and comparison with other state-of-the-art designs, limiting the understanding of the performance and efficiency improvements achieved by the proposed design.

The design of full adders using multiplexers has shown a significant improvement in power and area, and several works have been done in this area. A full adder is generally designed with XOR gates, AND gates and OR gates. Suitable replacement or redesign with a multiplexer can reduce the power consumption [5]. In that work, the authors propose an XOR-MUX based design: a 1-bit full adder with two XOR gates for ‘Sum’ and one 2x1 multiplexer for ‘Carry’. Their results show that the proposed design is efficient in power consumption and delay. The same paper discusses the need for efficient discrete wavelet transform (DWT) architectures in signal and image processing applications, highlighting the challenges posed by the size and power consumption of conventional adders and multipliers. The authors propose a novel approach using XOR-MUX adders and truncation multipliers to reduce logic size and power consumption in DWT architectures, and they demonstrate the performance of the proposed architecture in terms of delay, area, and power using VHDL and FPGA implementation. Extending this work, the authors in [6] designed a DWT architecture using a truncation multiplier with XOR-MUX based adders. Their results show a further reduction in area and a considerable reduction in dynamic power.

Based on the XOR-MUX adders, a Vedic multiplier has been designed in [7]. The Vedic multiplier is implemented as a 4x4 multiplier, demonstrating up to 74.26% reduction in power-delay product, and the design is extended to an 8x8 Vedic multiplier. The Vedic multiplier design with the XOR-MUX based full adder exhibits significant reductions in power-delay product, making it suitable for VLSI circuits. A 16-bit Vedic multiplier using a logic gate modification technique is proposed in [8]. The paper builds the multiplier from BEC adders and modified logic gates, showcasing power savings compared to traditional approaches. The logic gates are modified and redesigned using multiplexers, and their different combinations are used to check the multiplier performance. The basic gates such as AND, OR and XOR are replaced with 2x1 multiplexers. The utilization of multiplexers decreases the quantity of computations conducted. Their results show that the power consumption and setup time are reduced by 7.23% and 10.3% respectively by using the modified AND gate and modified OR gate architecture. Based on Vedic mathematics, another efficient multiplier was designed by the authors in [9]. Based on their research, the Vedic multiplier was found to be efficient compared to the Dadda multiplier that is used for floating-point arithmetic. The use of the Vedic multiplier, early normalization, leading zero anticipation, compound addition, and dual reduction techniques contribute to high performance and decreased power consumption in floating-point processors. The paper focuses on reducing power consumption and area while improving the overall efficiency of the floating-point dot product unit.

An FPGA implementation of a high-performance multiplier was designed in [10]. The design focuses on reducing latency and area while keeping the multiplier accurate. It utilizes a single LUT5 with multiplexers for accurate 8-bit multipliers. The authors achieved a notable decrease in area and delay for signed and unsigned multipliers. In digital signal processing, computationally heavy arithmetic tasks, such as image smoothing and filtering, often rely on multiplication-based operations like inner-product generation and accumulation. Among these operations, the time taken for multiplication plays a crucial role in the overall execution time of any DSP chip. The functional units within the multiplier contribute significantly to power dissipation due to their switching activity. The authors in [11] introduce a high-speed multiplier designed to address these challenges. Their multiplier focuses on minimizing switching activity and optimizing the count of computations. They achieved performance improvements in delay and power-delay product (PDP).

Research has been carried out on multipliers, and the most widely used are the Wallace tree multiplier, the array-based multiplier, the Vedic multiplier and the Baugh-Wooley multiplier. Shanmuganathan and Brindhadevi provided a comparative analysis of different multiplier types in [12]. Additionally, the authors in [13] conducted an analysis and comparison of various multipliers, such as the Wallace tree, array, and Baugh-Wooley multipliers. Their aim was to improve the performance of compound circuit designs. They verified the physical characteristics of all substitute blocks of the multipliers to demonstrate their effectiveness and relevance. Furthermore, they optimized transistor size for low power consumption. The paper focuses on the design of multipliers and their analysis through the reduction of partial products. It simulates area, power, and speed for the three multipliers and analyzes the values using Xilinx ISE.

From a detailed study of the literature, it can be summarized that the Baugh-Wooley multiplier has proven to be effective in terms of power and delay. The Baugh-Wooley multiplier has a properly structured form of partial product arrays, which reduces the complexity and increases the effectiveness of the multiplier. Hence, it is selected in this paper.

III. DESIGN ARCHITECTURE

A. Baugh-Wooley Multiplier
Out of the multipliers available in the literature for signed multiplication, the Baugh-Wooley multiplier has proven to be efficient in terms of power and area. The mathematical representation of the signed multiplication of two numbers A and B, with the product P, is expanded by separating the Most Significant Bit (MSB). By using (1), the partial products are calculated individually. The signed

bit is separated from the other bits in both the multiplicand and the multiplier. All the other bits are represented as a summation weighted by two raised to the power of the bit position.

$$
\begin{aligned}
P &= A \times B \\
  &= \left(-a_{p-1}2^{p-1} + \sum_{i=0}^{p-2} a_i 2^{i}\right) \times \left(-b_{p-1}2^{p-1} + \sum_{j=0}^{p-2} b_j 2^{j}\right) \\
  &= a_{p-1}b_{p-1}2^{2p-2} + \sum_{i=0}^{p-2}\sum_{j=0}^{p-2} a_i b_j 2^{i+j}
     - 2^{p-1}\sum_{j=0}^{p-2} a_{p-1} b_j 2^{j}
     - 2^{p-1}\sum_{i=0}^{p-2} a_i b_{p-1} 2^{i}
\end{aligned}
\tag{1}
$$

In Eq. (1), when the two signed numbers A and B are multiplied, the carry-out of both numbers can be taken out and represented by two ones, one at the end of the first partial product and the other at the end of the last partial product. The MSB of every partial-product row is negated except in the last row. In the last row, all the elements of the partial product are negated except the MSB. Hence, an organized and structured representation without sign extension can be achieved by mathematical analysis of signed numbers. This procedure of reduced-hardware multiplication constitutes the Baugh-Wooley algorithm. The partial products generated for an example 4x4 multiplier are shown in Fig. 1. The mathematical form is transformed into the generation of partial products, which are arranged in a structured way that provides regularity.

Fig. 1. Partial products of the Baugh-Wooley multiplier

B. Existing Multiplier Design
The internal circuit of the Baugh-Wooley multiplier for an example 4x4 signed multiplication is shown in Fig. 2. It contains ‘White cells’ and ‘Grey cells’ joined together. In every row except the last two rows, all the non-MSB bits are fed to the white cells. The MSB bits are fed to the grey cells. In the row just above the last row, the MSB is fed to the white cell and the other bits are fed to the grey cells. The white cell consists of an AND gate and a full adder. The full adder receives one of its three inputs from the output of the AND gate; the other two inputs come from the sum and carry of the previous block, as shown in Fig. 2. The grey cell consists of a NAND gate and a full adder. The grey cell functions like the white cell except that it receives one of its three inputs from the output of the NAND gate. The other two inputs come from the sum and carry of the previous stage. The white cell and the grey cell are shown in Fig. 3. The outputs of these cells are fed as the inputs to the next stage of white cells and grey cells, and this process continues until the last stage. The last stage has a row of full adders arranged as a parallel adder. The partial products in each row of the multiplier are generated by AND gates and NAND gates. The full adder in each white cell and grey cell contains two XOR gates along with AND gates and OR gates. The XOR gates in each of the full adders consume a lot of power and have proven to be power-hungry circuits. If these XOR gates are replaced with suitable alternatives, the power consumption of the multiplier could be reduced.

Fig. 2. Internal circuit of the Baugh-Wooley multiplier with Grey cells and White cells [13]

C. Proposed Multiplier Design - Full Adder using Multiplexers
A full adder can be conveniently designed using a multiplexer, thereby reducing the power consumption by Ex-Or

gates. XOR gates are the most power-hungry circuits in any design, and a suitable replacement is the need of the hour. Fig. 4 shows the realization of the full adder with two 4x1 multiplexers. The selection lines of both 4x1 multiplexers are driven by ‘a’ and ‘b’. The inputs to the first 4x1 mux are ‘c’, ‘!c’, ‘!c’ and ‘c’ respectively, and this mux gives the ‘SUM’ output. The inputs to the second 4x1 mux are ‘0’, ‘c’, ‘c’ and ‘1’ respectively, and this mux gives the ‘CARRY’ output. Together, these two multiplexers realize the one-bit full adder. Redesigning a one-bit full adder with 4x1 multiplexers reduces the power consumption and the LUTs occupied on the FPGA. The full adders present in the white cell and the grey cell can be replaced with the 4x1 multiplexers. The proposed modification consumes less power and less area than the existing Baugh-Wooley multiplier. In this paper, the proposed multiplier is designed for 32-bit signed multiplication and is placed in the data path of the single-cycle RISC-V architecture. By incorporating the efficient 32-bit signed multiplier, the overall power of the RISC-V system can be reduced.

IV. APPLICATION OF THE PROPOSED MULTIPLIER - THE RISC-V ISA
The RISC-V ISA was designed by engineers at UC Berkeley in 2010. Several engineers and researchers have tried to build customized architectures based on the RISC-V ISA. It has become a standard free and open architecture for industry applications. RISC-V (RV32) has 32 general-purpose registers, each 32 bits wide. Each instruction is also 32 bits wide and should be aligned to a four-byte boundary. The general architecture contains five steps, namely Fetch, Decode, Execute, Memory and Write-back. The instruction formats are shown in Table I. The RISC-V instructions are classified into different categories such as R, I, S, and U. These fall under the RV32 integer core instructions, excluding the branching instructions. The R-type instructions operate on the data present in the source registers, and the result is written to the destination register. The I-type instructions work on the immediate value provided as part of the instruction. S-type instructions store data from the register bank to memory; loads use the I-type format. U-type instructions carry a 20-bit upper immediate, imm[31:12], as shown in Table I. In this brief, the integer instructions
are extended to include multiplication instructions, specifically for signed multiplication. This signed multiplication is performed by the proposed Baugh-Wooley multiplier.

Fig. 3. White cell & grey cell

The single-cycle RISC-V architecture is shown in Fig. 5. The system is designed in three sections, namely the instruction module, the control path module and the datapath module. All the required instructions are stored in the instruction memory. Each instruction is 32 bits wide and is fetched from the address held in the program counter (PC). The PC points to the next instruction to be fetched. After the instruction is fetched, the PC is incremented by 4 so that it points to the next instruction to execute. As branching instructions such as jumps are not included in this brief, the PC is incremented by four every time an instruction is executed. The fetched instruction is then passed to the instruction parser, which breaks it down into fields such as opcode, func3 and func7.
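The fetch and parse steps described above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' Verilog: the names `parse_instruction` and `fetch_all`, the use of a Python list as instruction memory, and the PC starting at address 0 are assumptions. The bit positions follow the standard RISC-V R-type layout listed in Table I.

```python
def parse_instruction(word):
    """Split a 32-bit instruction word into its R-type fields."""
    return {
        "opcode": word & 0x7F,          # bits [6:0]
        "rd":     (word >> 7) & 0x1F,   # bits [11:7]
        "func3":  (word >> 12) & 0x7,   # bits [14:12]
        "rs1":    (word >> 15) & 0x1F,  # bits [19:15]
        "rs2":    (word >> 20) & 0x1F,  # bits [24:20]
        "func7":  (word >> 25) & 0x7F,  # bits [31:25]
    }


def fetch_all(instruction_memory):
    """Fetch sequentially: the PC is a byte address that advances by 4.

    instruction_memory is a list of 32-bit words (hypothetical model of
    the instruction memory); there are no branches, as in the paper.
    """
    pc = 0
    while pc < 4 * len(instruction_memory):
        word = instruction_memory[pc // 4]
        yield pc, parse_instruction(word)
        pc += 4


# Hypothetical two-instruction program: add x3, x1, x2 ; mul x4, x1, x2
program = [0x002081B3, 0x02208233]
trace = list(fetch_all(program))
for pc, fields in trace:
    print(f"pc={pc:#06x} {fields}")
```

Both example words share the opcode 0110011; the second differs only in func7 and rd, which is how the control unit would tell a multiplication from an addition in this sketch.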
The operation to be performed is determined by the opcode given as part of the 32-bit instruction. The opcode occupies the seven least significant bits of the 32-bit instruction. Further classification of the instructions is done by the ‘func3’ and ‘func7’ fields of the instruction. These are fed to the control unit to define various control outputs. The control unit is responsible for the overall operation of the system. The datapath contains the register bank where the 32-bit registers are located. The registers are classified as source registers (rs1 and rs2) and destination registers (rd1). The data is fetched from the source registers and the corresponding results are written back to the destination registers. The read data1 and read data2 signals are the outputs of the register bank and carry the 32-bit data. These outputs are fed as the inputs to the ALU in the data path.
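The register read, ALU and write-back path described above can be modelled behaviourally as follows. This is a sketch under stated assumptions, not the authors' Verilog: all function names are hypothetical, the multiplier sums its partial-product rows with a simple ripple chain rather than the cell array of Fig. 2, and MUL is taken to return the low 32 bits of the product as in the RISC-V multiplication extension. The partial products follow Eq. (1) in its AND/NAND form with the two constant ones, and every addition goes through a full adder built from two 4x1 multiplexers as described in Section III-C.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits=32):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mux4(sel_a, sel_b, d0, d1, d2, d3):
    """4x1 multiplexer with select lines (a, b): ab=00 -> d0 ... ab=11 -> d3."""
    return (d0, d1, d2, d3)[(sel_a << 1) | sel_b]


def mux_full_adder(a, b, c):
    """One-bit full adder from two 4x1 muxes selected by a and b."""
    nc = c ^ 1
    s = mux4(a, b, c, nc, nc, c)     # SUM inputs:   c, !c, !c, c
    carry = mux4(a, b, 0, c, c, 1)   # CARRY inputs: 0,  c,  c, 1
    return s, carry


def ripple_add(x, y, width):
    """Add two width-bit patterns with a chain of mux-based full adders."""
    result, carry = 0, 0
    for k in range(width):
        s, carry = mux_full_adder((x >> k) & 1, (y >> k) & 1, carry)
        result |= s << k
    return result  # final carry is dropped, i.e. modulo 2**width


def baugh_wooley_mul(a, b, p=32):
    """Signed p x p multiply; a and b are p-bit patterns, result is 2p bits."""
    width = 2 * p
    total = 0
    for j in range(p):
        row = 0
        for i in range(p):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            # Exactly one sign bit involved -> NAND term (grey cell),
            # otherwise a plain AND term (white cell).
            if (i == p - 1) != (j == p - 1):
                bit ^= 1
            row |= bit << (i + j)
        total = ripple_add(total, row, width)
    # The two constant ones that replace sign extension.
    return ripple_add(total, (1 << p) | (1 << (width - 1)), width)


def alu(op, x, y):
    """Tiny ALU model: 32-bit add, sub and signed mul (low word)."""
    if op == "add":
        return ripple_add(x, y, 32)
    if op == "sub":
        return (x - y) & MASK32
    if op == "mul":
        return baugh_wooley_mul(x, y, 32) & MASK32
    raise ValueError(f"unsupported operation: {op}")


def execute_r_type(regs, op, rd, rs1, rs2):
    """Read rs1/rs2, run the ALU, write the result back to rd."""
    read_data1, read_data2 = regs[rs1], regs[rs2]
    result = alu(op, read_data1, read_data2)
    if rd != 0:  # x0 is hard-wired to zero in RISC-V
        regs[rd] = result
    return result


regs = [0] * 32
regs[1] = (-7) & MASK32
regs[2] = 6
execute_r_type(regs, "mul", rd=3, rs1=1, rs2=2)
print(to_signed(regs[3]))
```

Because the model works on bit patterns, the same `baugh_wooley_mul` can be checked exhaustively at a small width such as p = 4 before trusting it at 32 bits.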
Fig. 4. Full adder using 4x1 Multiplexers

Fig. 5. Single Cycle RISC-V architecture

TABLE I. RISC-V ISA Instruction formats [2]

Format  Fields (MSB to LSB)
R       func7       rs2  rs1  func3  rd        opcode
I       imm[11:0]        rs1  func3  rd        opcode
S       imm[11:5]   rs2  rs1  func3  imm[4:0]  opcode
U       imm[31:12]                   rd        opcode

TABLE II. Comparison of Existing Baugh-Wooley multiplier with the proposed Baugh-Wooley multiplier

Parameter          Existing multiplier design  Proposed multiplier design
Utilization (LUT)  2943                        2134
Power (mW)         164                         157

The ALU operates on the data provided by these registers, which form its operands. All arithmetic instructions are executed according to the ALUOp signal from the alu control unit. This alu control unit is part of the control unit and regulates the functions of the ALU. The data
path also contains the signed 32-bit Baugh-Wooley multiplier. The multiplication extension of the RV32I instructions is carried out by this modified multiplier. The outputs from the ALU are stored in the data memory. The mem addr signal points to the location where the data is stored inside the data memory. The mem write signal acts as an enable signal for the data to be written to the data memory. Based on the type of instruction given by the user, the mem write output from the control unit decides and steers the data to be written back to the register bank, based on the address pointed to by mem addr. This constitutes the write-back process of the architecture.

V. RESULTS AND DISCUSSION
The design and verification of this project are done in Xilinx Vivado using Verilog HDL, targeting the Artix-7 FPGA platform. Table II compares the existing multiplier design and the proposed multiplier design with respect to LUT utilization (area occupied) and power consumption. The existing Baugh-Wooley multiplier designed with conventional full adders occupies 2943 LUTs, whereas the proposed Baugh-Wooley multiplier designed with multiplexer-based full adders occupies 2134 LUTs. This is a reduction of about 27.5% in LUT utilization. The reduction in power, from 164 mW to 157 mW, is about 4.2% compared to the existing multiplier design. The simulation results for the Baugh-Wooley multiplier with 32-bit signed inputs are shown in Fig. 6. Different sets of 32-bit inputs are given to the multiplier and tested for the possible combinations of the data.

Fig. 6. Simulation Result of Baugh-Wooley Multiplier

The proposed multiplier is used in the design of the single-cycle 32-bit RISC-V architecture. Table III shows the comparison of both architectures in terms of LUT

utilization, slices, power consumption and delay. The power consumed and the area occupied are selected specifically to compare both architectures.

TABLE III. Comparison of the Existing system with the proposed system

Parameter           RISC-V system with existing BW multiplier  RISC-V system with proposed BW multiplier
Utilization (LUTs)  3699                                       3014
Slices              1085                                       932
Power (mW)          284                                        263
Delay (ns)          9.72                                       9.68

The original multiplier’s area and power are compared with those of the modified multiplier. The redesigned multiplier increases efficiency by reducing the power consumed and the number of LUTs utilized. The delay remains essentially unchanged (9.72 ns versus 9.68 ns). The RISC-V system with the existing multiplier consumes a total power of 284 mW, whereas the RISC-V system with the proposed multiplier consumes 263 mW, a power reduction of about 7.4%. The existing system occupies 3699 LUTs and the proposed system occupies 3014 LUTs, which is 18.5% fewer. The simulation results are shown in Fig. 7. The inputs are the clock, reset, and the 32-bit instruction code. The instruction code is divided into several fields such as opcode, func3 and func7. The values of read data1, read data2 and write data are also captured, along with the Program Counter (PC) value and the alu control. The system starts operating after the reset signal is de-asserted. The instruction code for all the operations is given as a string of 32 bits.

Fig. 7. Simulation Result of the RISC-V architecture

VI. CONCLUSION AND FUTURE SCOPE
An efficient design of the single-cycle 32-bit RISC-V architecture is proposed in this paper. This is achieved through the optimization of the internal components of the architecture, specifically the multiplier design. The optimization is done through a modified 32-bit Baugh-Wooley signed multiplier. The proposed multiplier is used for the power and area reduction of the ALU of the processor design, thereby improving the overall system performance. The synthesis results from Xilinx Vivado show that the designed RISC-V system with the proposed Baugh-Wooley multiplier is power- and area-efficient without compromising on delay. The proposed system reduces power consumption by 7.4% and area by 18.5%. The proposed system is best suited to low-power applications. As future scope, the single-cycle system can be extended to multi-cycle and pipelined architectures with further internal modifications [14].

REFERENCES
[1] S. Bora and R. Paily, “A high-performance core micro-architecture based on RISC-V ISA for low power applications,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 68, no. 6, pp. 2132–2136, 2021.
[2] A. Waterman, K. Asanovic, and SiFive, “The RISC-V instruction set manual volume I: Unprivileged ISA,” https://riscv.org/wp-content/uploads/2019/12/riscv-spec-20191213.pdf, Berkeley, CA, USA, Dec. 2019.
[3] E. Cui, T. Li, and Q. Wei, “RISC-V instruction set architecture extensions: A survey,” IEEE Access, vol. 11, pp. 24696–24711, 2023.
[4] F. U. D. Farrukh, C. Zhang, Y. Jiang, Z. Zhang, Z. Wang, Z. Wang, and H. Jiang, “Power efficient tiny YOLO CNN using reduced hardware resources based on Booth multiplier and Wallace tree adders,” IEEE Open Journal of Circuits and Systems, vol. 1, pp. 76–87, 2020.
[5] P. Radhakrishnan and G. Themozhi, “FPGA implementation of XOR-MUX full adder based DWT for signal processing applications,” Microprocessors and Microsystems, vol. 73, p. 102961, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0141933119304818
[6] G. D. Mahesh, K. Sohith, P. R. Yeshwanth, P. Shyaamsrinivas, and R. S. R, “Low area design architecture of XOR-MUX full adder based discrete wavelet transform,” in TENCON 2022 - 2022 IEEE Region 10 Conference (TENCON), 2022, pp. 1–5.
[7] A. Verma, A. Khan, and S. Wairya, “Performance analysis of Vedic multiplier using high performance XOR-MUX based adder for fast computation,” in Advances in VLSI, Communication, and Signal Processing, A. Dhawan, R. A. Mishra, K. V. Arya, and C. R. Zamarreño, Eds. Singapore: Springer Nature Singapore, 2022, pp. 693–705.
[8] K. Rupa Lakshmi, K. Bala Sindhuri, K. Mani Kumar, and N. Udaya Kumar, “Design of 16-bit Vedic multiplier using modified logic gates and BEC technique,” in VLSI, Microwave and Wireless Technologies, B. Mishra and M. Tiwari, Eds. Singapore: Springer Nature Singapore, 2023, pp. 65–72.
[9] D. L. Prasanna and E. Prabhu, “An efficient fused floating-point dot product unit using Vedic mathematics,” 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 12–15, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:204230567
[10] P. Mahendra and S. R. Ramesh, “FPGA implementation of high performance precise signed and unsigned multiplier using ternary 6-LUT architecture,” in 2022 International Conference on Inventive Computation Technologies (ICICT), 2022, pp. 202–207.
[11] S. Karthick, C. Kamalanathan, P. Sunita, S. Ananthakumaran, and E. Prabhu, “High speed energy efficient multiplier for signal processing,” International Journal of Engineering Systems Modelling and Simulation, vol. 12, no. 4, pp. 221–229, 2021.
[12] R. Shanmuganathan and K. Brindhadevi, “Comparative analysis of various types of multipliers for effective low power,” Microelectronic Engineering, vol. 214, pp. 28–37, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167931719301042
[13] S. Y. Neyaz, I. Saxena, N. Alam, and S. A. Rahman, “FPGA and ASIC implementation and comparison of multipliers,” in 2020 International Symposium on Devices, Circuits and Systems (ISDCS), 2020, pp. 1–4.
[14] J. Zhu, H. Liu, R. Zhang, and J. Qu, “Optimization of ALU with gated clock and its internal modules in RVIM64 processor,” Journal of Physics: Conference Series, vol. 2625, no. 1, p. 012005, Oct. 2023. [Online]. Available: https://dx.doi.org/10.1088/1742-6596/2625/1/012005

