2023 International Conference on Computer Communication and Informatics (ICCCI), Jan. 23 − 25, 2023, Coimbatore, INDIA

Novel, Clock Gating Broadcasting Applications for Low-Power FPGA Architectures
2023 International Conference on Computer Communication and Informatics (ICCCI) | 979-8-3503-4821-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCCI56745.2023.10128437

1st Dr. Tata Jagannadha Swamy
Professor
Electronics and Communication Engineering
Gokaraju Rangaraju Institute of Engineering and Technology
Bachupally, Hyderabad, India
[email protected]

2nd Pranav Kumar
M.Tech, VLSI Design
Electronics and Communication Engineering
Gokaraju Rangaraju Institute of Engineering and Technology
Bachupally, Hyderabad, India
[email protected]

Abstract—In VLSI technology, clock gating strategies and designs are required for efficient power utilization as well as for other design applications. In this connection, the clock is one of the efficient tools for gating broadcasting applications for low-power FPGA architectures. The proposed method mainly explores the use of clock gating strategies to reduce the power consumption of streaming applications that result from asynchronous dataflow architectures. Streaming applications include a wide variety of computing methods from diverse fields such as digital logic, digital content coding, and encryption. The dynamic streaming nature of the algorithms is taken into account in this research to present a set of strategies that can reduce power usage by selectively shutting off sections of the circuitry when they are briefly idle. These methods may be included in the synthesis phase of a high-level dataflow design flow without regard to the semantics of the program being designed. According to experimental studies, at-scale implementations on field-programmable gate array platforms show that power may be reduced without impacting bandwidth utilization.

Index Terms—Clock Gating Strategies, Power FPGA Architecture, Digital Content Coding, Encryption, Data flow Design, Bandwidth Utilisation.

I. INTRODUCTION

Area, performance, cost, and reliability were historically more important to the VLSI designer than power considerations. This has changed over the years, and power is now considered alongside area and speed. The growth of this trend may be attributed to many causes. The proliferation of mobile computing devices (laptops, tablets, and smartphones) and wireless communication systems (personal digital assistants, communicators, and other similar devices) that require fast arithmetic operations and complex functionality with low power consumption is likely the most important factor. Since the clock is the only signal that switches continuously, the sequential circuits in a system account for a disproportionate share of the energy consumed [1], [2]. Furthermore, the clock signal is usually heavily loaded. The clock must be distributed, and clock skew must be controlled, by constructing a clock network (often a clock tree) that includes clock buffers. This raises the clock network's capacitance. According to recent research, the clock signals in digital systems account for a significant portion (15–45%) of the total energy used by the system [1]. Decreasing the clock power is therefore an effective way to save power in the circuit. Lower voltage swings, buffer insertion, and clock routing have been the primary focuses of system energy reduction efforts [2]. Clock switching often results in excessive gate activity. For this reason, programmable clocks are being integrated into electronic circuits. This implies that sub-clocks are derived from the master clock and that, under certain conditions, they may be made to run at a slower rate than the master clock or even be halted entirely. Power savings are a natural consequence of this scheme for the following reason: the load on the master clock is reduced, and the clock tree needs fewer buffers. As a result, clock tree power dissipation may be decreased [3], [4].

Today, silicon computing technologies are limited mostly by their ability to dissipate power. Saving energy has obvious financial benefits, but it also offers additional advantages, such as reduced cooling requirements, increased durability, and greater autonomy for battery-powered devices. These factors mean that power is often a deciding factor in selecting a computing platform from the outset. While the power consumption of a field-programmable gate array (FPGA) is greater per logic unit than that of an equivalent application-specific integrated circuit (ASIC), it is often lower than that of a traditional processor. Fig. 1 shows the block diagram of clock gating [5], [6], [9].

This paper is organized as follows. Section II presents the literature review. Section III describes the system model, its clock functions, and its timing constraints. Section IV discusses the implementation of the system model with the help of various parameters. Section V discusses the simulations and their corresponding results, such as the design summary and timing reports, followed by the conclusion in Section VI.

II. LITERATURE REVIEW

In a GALS-based system, asynchronous communication channels connect several locally synchronized components. Researchers have placed GALS research into the following three categories: the first is partitioning,

the second is communication mechanisms, and the third is special-purpose configurations. Several authors have devoted much time and effort to modelling, analysing, and refining dataflow concepts for GALS-based architectures. System capacity, such as processing capacity, is considered during design optimization in the approach developed for modelling the deployment of a GALS-based stream-processing infrastructure for cyclo-static workloads [8], [9], [11].

Fig. 1. Block Diagram of Clock Gating

To decrease the power dissipation of sequential CMOS circuits, we look at using activity-driven clock trees in this article. By gating the clock signals during the active and idle periods of the clocked components, segments of the clock tree may be switched on and off. We suggest an approach for obtaining the switching activity patterns of the clocked circuits early in the design phase. Three new activity-driven problems are formulated, whose goal is to reduce the amount of dynamic power a system needs to run. Advanced design and synthesis methods for applications running on programmable hardware and embedded systems are becoming more necessary due to the increasing complexity of digital signal processing applications. For complex systems, previous studies have demonstrated that raising the abstraction level of the design phases does not always have a negative impact on performance or resource requirements [10], [12], [14].

Dataflow programs describe behaviour in a way that expresses both the concurrent and the sequential parts of an application, facilitating the abstraction, modification, and porting that are inherent to good software design. The power consumption of any silicon-based device may be split into two classes: static and dynamic. Static power dissipation, also called quiescent or standby power consumption, results from the leakage currents of the transistors and is affected by the ambient temperature. Dynamic power, by contrast, is the consequence of transistor switching and of the resistance of the wires carrying current. It increases with frequency because of the parasitic capacitances that must be charged and discharged. Over the last twenty years, clock gating (CG) strategies have been used by ASIC developers to lessen this effect [13], [16].

III. SYSTEM MODEL

When designing synchronous circuits, clock gating is a common practice for lowering dynamic power dissipation. Clock gating reduces power consumption by adding logic to a circuit that disables unnecessary clocks. Pruning the clock prevents unnecessary state transitions by the flip-flops in the affected parts of the circuit. Switching states consumes power. When a flip-flop is not switching, its switching power consumption is zero and only leakage currents are incurred [15].

Most modern SoC designs include clock gating, a technique for conserving dynamic power that disables the clocks of a block while it is not in use. There are two possible levels of clock gating in SoC designs:
• Clock gating at RTL is designed into the SoC architecture and coded as part of the RTL. When a block is not being actively used, its clock is stopped, which effectively disables the block. When large blocks of logic do not have to switch for many cycles, a lot of dynamic power may be saved. As shown in Fig. 2, the most basic and common form of clock gating is the use of a logical "AND" function to selectively disable the clock to particular blocks through a control signal.
• During synthesis, the tools may turn off the clocks to groups of flip-flops (FFs) that share a common enable control signal.

Both of these techniques involve inserting physical gates into the clock paths that drive the downstream clocks. When mapped onto an SoC, these gates can cause clock skew and setup/hold-time violations; nonetheless, these issues are mitigated by the clock-tree synthesis and layout tools at different points in the SoC's back-end flow. Timing closure is achieved in SoC designs by clock-tree synthesis, which balances the clocks between sources and destinations along paths that may or may not include clock gating. This balancing is not supported by FPGA technology, so if the SoC design uses a large number of gated clocks or a complicated clock structure, the mapping process will need to use a different technique [17].

Clocking Characteristics and Clock-Gating: A Characterization: A flip-flop is triggered whenever the clock source makes a transition. To trigger the flip-flop, any clock other than the master clock must follow the same pattern of transitions. Denoting the logic values of the clock source $clk$ before and after a transition as $clk(t)$ and $clk^{+}(t)$, respectively, as illustrated in Fig. 2, we may represent four distinct behaviours of the clock using a single quaternary variable, $clk$. The values $\alpha$ and $\beta$ represent the two transitional behaviours, and the values 0 and 1 represent the two stable behaviours. (Despite their similarity in appearance to the data-signal values 0 and 1, their interpretations are distinct.)

The clock is described in terms of a literal operation, defined in (1):


$$
clk^{b} = \begin{cases} 1, & \text{if } clk = b \\ 0, & \text{if } clk \neq b \end{cases} \tag{1}
$$

where $b \in \{0, \alpha, \beta, 1\}$. For this reason, a clock's rising and falling transitions, denoted by $clk^{\alpha}$ and $clk^{\beta}$, are binary variables that may be used as arguments in Boolean operations. Using Fig. 2 as an example,

$$
clk^{0} = \overline{clk} \cdot \overline{clk^{+}}, \quad clk^{\alpha} = \overline{clk} \cdot clk^{+}, \quad clk^{\beta} = clk \cdot \overline{clk^{+}}, \quad \text{and} \quad clk^{1} = clk \cdot clk^{+}
$$

Suppose there are $n$ flip-flops in a sequential circuit, and that their outputs and clock sources are $Q_i$ and $clk_i$, where $i = 0, 1, \ldots, n-1$. Since the same master clock ($clk$) drives all flip-flops in a synchronous sequential circuit, $clk_i = clk$. However, a derived clock for $Q_i$ should be used if the flip-flop $Q_i$ must be disconnected from the master clock for some (idle) cycles. Keep in mind that, for the circuit to remain synchronous, the derived clock has to be "in step" with the master clock. The derived clock is generally assumed to be a function of $clk$ and the outputs of the other flip-flops $Q_0, \ldots, Q_{i-1}, Q_{i+1}, \ldots, Q_{n-1}$ (which make transitions following the triggering transition of their respective clocks). Since both AND gating and OR gating may be used to control the master clock, we have two distinct clock-gating structures (2):

$$
clk_i = g_i + p_i \cdot clk, \quad \text{and} \quad clk_i = g_i \cdot (p_i + clk) \tag{2}
$$

where $Q_0, \ldots, Q_{i-1}, Q_{i+1}, \ldots, Q_{n-1}$ are the flip-flop outputs and $g_i$ and $p_i$ are functions of them. As an example, consider a flip-flop that is triggered by the falling transition of the clock (i.e. a negative edge-triggered flip-flop).

The timing relationships among $clk$, $p_i$, $p_i \cdot clk$, and $p_i + clk$ are shown in Fig. 2. After the clock falls, $p_i$ lags behind, shows some glitches (marked by the vertical grid lines), and then settles while $clk = 0$. As may be seen, $p_i + clk$ does not eliminate the glitches and may introduce new ones. Because of this, only (3) works with flip-flops that are triggered by a negative edge, whereas (4) does not. Note that when $clk = 0$, $g_i$ in (3) must be free of glitches. From the above explanation, we can deduce that the falling transition of $clk_i$ in (3) occurs in the following two situations. First, when $g_i = 0$ and $p_i = 1$, the derived clock $clk_i$ falls whenever $clk$ does; we may therefore refer to $p_i$ as the transition propagate term. Second, when $clk$ and $p_i \cdot clk$ are both 0 and $g_i$ falls from 1, the derived clock $clk_i$ also makes a falling transition; we may therefore refer to $g_i$ as the transition generate term [14], [16], [18].

$$
clk_i^{\beta} = g_i^{\beta} + \bar{g}_i \cdot p_i \cdot clk^{\beta} \tag{3}
$$

The clock signal produced in (4) may similarly be shown to be appropriate for flip-flops triggered by the rising transitions of the clock. When $clk = 1$, $g_i$ in (4) must be free of glitches.

The expression for the rising transition of $clk_i$ may be written as

$$
clk_i^{\alpha} = g_i^{\alpha} + \bar{g}_i \cdot p_i \cdot clk^{\alpha} \tag{4}
$$

It is worth noting that the additional circuitry required for generating the derived clock needs to be simple, so as to minimise the power overhead of this circuitry. As a result, the functions $g_i$ and $p_i$ in equations (3) and (4) ought to be rather elementary. We also need $g_i$ to be simple so that it does not produce harmful glitches. In a synchronous sequential circuit, if $g_i = 0$ in (3) or $g_i = 1$ in (4), we are back at the case where the master clock $clk$ is applied. Fig. 2 shows the clock and its behaviour for various input levels.

Fig. 2. Clock and its Behavior

IV. MODEL FOR IMPLEMENTATION

In H.265/HEVC, horizontal filtering is performed first, followed by vertical filtering. The exact sequence of operations is shown in Fig. 3. The deblocking filter uses parallel processing to run two horizontal filtering operations in parallel, and likewise two vertical filtering operations in parallel, on the luminance block. For the horizontal filter, the order is top to bottom and left to right. In the first stage, we apply the horizontal filter between blocks E1 and B1. Concurrently, the horizontal filter is applied between blocks B1 and B2. In the second stage, horizontal filter operations are carried out between blocks B2 and B3. Likewise, the horizontal filter operations between blocks B3 and B4 are carried out concurrently.

Fig. 3. The proposed parallel-zigzag processing order


The operations in the rows below the first are very similar to those of the first row. By storing the coding blocks in internal memory and reusing them for the subsequent vertical filters, we can cut down on the number of external memory accesses [18], [19], [20]. Fig. 4 shows the data flow of the suggested simultaneous contra filter design.
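
The staging of the horizontal filter described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `horizontal_stages`, the `lanes` parameter, and the representation of a row as a list of block labels are assumptions introduced here. It assumes that each stage filters two adjacent block edges concurrently, moving left to right along a row, with E1 as the left neighbour of B1.

```python
def horizontal_stages(row_blocks, lanes=2):
    """Group the edges between consecutive blocks of one row into stages.

    row_blocks: block labels from left to right (hypothetical layout).
    lanes: number of edge filters assumed to run concurrently per stage.
    """
    # An edge is the boundary between two horizontally adjacent blocks.
    edges = list(zip(row_blocks, row_blocks[1:]))
    # Consecutive groups of `lanes` edges form one stage each.
    return [edges[i:i + lanes] for i in range(0, len(edges), lanes)]


if __name__ == "__main__":
    row = ["E1", "B1", "B2", "B3", "B4"]
    for number, stage in enumerate(horizontal_stages(row), start=1):
        print(f"stage {number}: {stage}")
```

For the row E1, B1, B2, B3, B4 this gives two stages: (E1, B1) together with (B1, B2), then (B2, B3) together with (B3, B4), which matches the order given in the text.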

Fig. 4. The data flow of the suggested simultaneous contra filter design
Fig. 5. RTL Schematic Diagram

The data for the vertical filter are transposed from rows to columns using a RAM transposer. Vertical filtering starts in step 4 with the application of the vertical filter between blocks F1 and B1. The vertical filter between blocks B1 and B5 can be run in parallel. In the fifth step, the vertical filter is applied between blocks F2 and B2. Likewise, the vertical filtering between blocks B2 and B6 may be carried out concurrently. The steps in the subsequent columns are generally similar to those in the first. Each chrominance block's data-flow procedure is similar to that of the luminance block.
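
The role of the RAM transposer can be illustrated with a short Python sketch. It is a software analogy under stated assumptions rather than the hardware design: `transpose`, `smooth`, `filter_rows`, and `filter_columns` are hypothetical names, and `smooth` is a placeholder 1-D filter, not the H.265/HEVC deblocking filter. The point is only that, once the data are transposed from rows to columns, the same row-oriented filter path can be reused for the vertical direction.

```python
def transpose(block):
    """Swap rows and columns of a rectangular block (list of equal-length rows)."""
    return [list(column) for column in zip(*block)]


def smooth(samples):
    """Placeholder 1-D filter (NOT the HEVC deblocking filter).

    Averages each sample with its right neighbour; the last sample is
    averaged with itself.
    """
    shifted = samples[1:] + samples[-1:]
    return [(a + b) // 2 for a, b in zip(samples, shifted)]


def filter_rows(block):
    """Apply the 1-D filter along every row (the horizontal direction)."""
    return [smooth(row) for row in block]


def filter_columns(block):
    """Filter along columns by reusing the row path on transposed data."""
    return transpose(filter_rows(transpose(block)))


# Hypothetical 4x4 block of sample values, used only for illustration.
block = [
    [10, 20, 30, 40],
    [12, 22, 32, 42],
    [50, 60, 70, 80],
    [52, 62, 72, 82],
]

if __name__ == "__main__":
    for row in filter_columns(block):
        print(row)
```

Reusing one filter path in this way trades an extra transposition step for not having to build a second, column-oriented datapath.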
Following this processing order, the filtering of the coding block may be completed in eleven stages. Fig. 3 shows the proposed parallel-zigzag processing order, and Fig. 4 shows how the contra filter design works [13], [17], [20]. Fig. 5, Fig. 6, and Fig. 7 show the RTL schematic diagram, the synthesis and implementation of the design, and its detailed connections, respectively. Fig. 8 and Fig. 9 give the design summary and timing report of the simulation results.

V. SIMULATION RESULTS

The functionality of the design is verified by simulation. After the RTL model has been functionally verified, synthesis may begin using the Xilinx ISE software. During synthesis, the RTL model is translated into a gate-level netlist that is mapped to a particular technology library. The Xilinx ISE software supports a wide variety of devices from the Spartan 3E series. The "XC3S500E" device, in the "FG320" package and with speed grade "-4", was selected for the synthesis of this design. The following is an analysis of the synthesis results for this design.

Fig. 6. Synthesis and Implementation of the Design

VI. CONCLUSION

This paper introduces a CG approach that is incorporated into the synthesis phase of an HLS design flow for dataflow designs. No extra work or time is required during the design phase of the application's dataflow program to use the power-saving strategy; it works regardless of the application's semantics. The CG logic is generated at this step, together with the synthesis of the computing kernels connected by the FIFO queues that form the dataflow network. These methods might feasibly be used to improve upon existing dataflow approaches to computing. The experimental findings are highly promising, demonstrating a decrease in power dissipation with just a marginal increase in control logic and no noticeable drop in throughput. Finally, it could be worthwhile to give some further thought to regulating the clock rate and, perhaps, the supply voltage.


Fig. 7. RTL Schematic Diagram with detailed connections

Fig. 8. Design Summary

Fig. 9. Timing Report

REFERENCES
[1] M. Pedram, “Power minimization in IC design: Principles and applications,” ACM Trans. Design Autom. Electron. Syst., vol. 1, no. 1, pp. 3–56, Jan. 1996. [Online]. Available: http://doi.acm.org/10.1145/225871.225877
[2] Q. Wu, M. Pedram, and X. Wu, “Clock-gating and its application to low power design of sequential circuits,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 47, no. 3, pp. 415–420, Mar. 2000.
[3] G. E. Tellez, A. Farrahi, and M. Sarrafzadeh, “Activity-driven clock design for low power circuits,” in IEEE/ACM Int. Conf. Comput.-Aided Design Dig. Tech. Papers (ICCAD), San Jose, CA, USA, Nov. 1995, pp. 62–65.
[4] E. A. Lee and A. Sangiovanni-Vincentelli, “Comparing models of computation,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Austin, TX, USA, 1997, pp. 234–241.
[5] G. Kahn, “The semantics of simple language for parallel programming,” in Proc. IFIP Congr., Stockholm, Sweden, 1974, pp. 471–475.
[6] E. A. Lee and D. G. Messerschmitt, “Static scheduling of synchronous data flow programs for digital signal processing,” IEEE Trans. Comput., vol. 36, no. 1, pp. 24–35, Jan. 1987.
[7] E. A. Lee and T. M. Parks, “Dataflow process networks,” Proc. IEEE, vol. 83, no. 5, pp. 773–801, May 1995.
[8] S. Suhaib, D. Mathaikutty, and S. Shukla, “Dataflow architectures for GALS,” Electron. Notes Theor. Comput. Sci., vol. 200, no. 1, pp. 33–50, 2008.
[9] T.-Y. Wuu and S. B. K. Vrudhula, “Synthesis of asynchronous systems from data flow specification,” Inf. Sci. Inst., Univ. Southern California, Los Angeles, CA, USA, Tech. Rep. ISI/RR-93-366, Dec. 1993.
[10] B. Ghavami and H. Pedram, “High performance asynchronous design flow using a novel static performance analysis method,” Comput. Elect. Eng., vol. 35, no. 6, pp. 920–941, Nov. 2009.
[11] S. C. Brunet et al., “Partitioning and optimization of high level stream applications for multi clock domain architectures,” in Proc. IEEE Workshop Signal Process. Syst. (SiPS), Taipei, Taiwan, Oct. 2013, pp. 177–182.
[12] Kumar, Devarasetty and Jagannadha Swamy, Tata, “VHDL Design and Implementation of C.P.U by Reversible Logic Gates,” International Journal of Advanced Scientific Technologies in Engineering and Management Sciences. 2. 108, 2016, 10.22413/ijastems/2016/v2/i12/41283.
[13] M. Monajati and E. Kabir, “A modified inexact arithmetic median filter for removing salt-and-pepper noise from gray-level images,” IEEE Trans. on Circuits and Systems-II: Express Briefs, vol. 67, no. 4, 2020.
[14] U. Erkan, L. Gökrem and S. Enginoglu, Different applied median filter in salt and pepper noise, vol. 70, 2018.
[15] E. Sindhu and K. Vasanth, “VLSI architectures for 8 Bit data comparators for rank ordering image applications,” Int. Conf. on Communication and Signal Processing, 2019.
[16] Zhenghao Shi, Yaowei Li, Changqing Zhang, Minghua Zhao, Yaning Fenf and Bo Jiang, “Weighted median guided filtering method for single image rain removal,” EURASIP Journal on Image and Video Processing, 2018.
[17] Saohua Wan, Yu Xia, Lianyong Qi, Yee-Hong Yang and Mohammed Atiquzzaman, “Automated colorization of a grayscale image with seed points propagation,” IEEE Trans. on Multimedia, 2020.
[18] Ugur Erkan, Levent Gokrem and Serdar Enginooglu, “Different applied median filter in salt and pepper noise,” Computer and Electrical Engineering (Elsevier), vol. 70, pp. 789-798, 2018.
[19] Luka Sekanina, Zdenek Vasicek and Vojtech Mrazek, “Automated search-based functional approximation for digital circuits,” in Approximate Circuits, Cham: Springer, 2018, ISBN 978-3-319-99321-8.
[20] V. Geetha, V. Anbumani, K. Ragakavya, P. Navaladi and S. Ponraj, “Performance assessment of different VLSI architectures for data comparators for cost effective sorting networks,” International Journal of Engineering and Advanced Technology (IJEAT), vol. 9, no. 1, 2019, ISSN 2249-8958.
[21] A.P., H., C, P., & I.R, R. (2019). Patient Monitoring and Abnormality Detection Along with an Android Application. International Journal of Computer Communication and Informatics, 1(1), 52-57. https://doi.org/10.34256/ijcci1919
