Keywords- Direct Memory Access, Direct Memory Access Controller, Advanced High-performance Bus, Finite State Machine, Power consumption, Propagation delay.
INTRODUCTION
1.1 Introduction
Consumer demand for greater functionality and higher performance, but also for lower costs, puts significant pressure on System-on-Chip (SoC) manufacturers. The continuing advances in process technology, and the ability to design highly complex SoCs, do not come without a cost, so the next generation of processes undoubtedly brings the next generation of challenges. With ever-increasing SoC complexity, energy consumption has become the most critical constraint for today’s integrated circuit (IC) design [1]. Consequently, a great deal of effort is spent on designing for low power dissipation. Power consumption has become a primary constraint in design, along with performance, clock frequency and die size [8]. Low power can be achieved only by designing at all levels of abstraction: from architectural design to intellectual property (IP) component selection and physical implementation [10]. Designers should use components that deliver the latest developments in low power technology.
Increased area overhead of state memory due to separate sub-FSM encoding. When the designer decomposes a large FSM into two or more parts, it requires more area than a single FSM, and the states of each individual sub-FSM also require their own encoding.
More power consumption due to crossing transitions between the various sub-FSMs in the synchronous design. In a single FSM there is no need for crossing transitions, but when it is split into several parts, moving from one sub-FSM to another requires transitions called crossing transitions, which in turn consume more power.
demonstrated or evaluated at the technological level. It is therefore vital to use IP components, such as embedded memories and logic libraries, that offer flexibility in choosing different design and power-saving techniques. As process geometries have scaled, design teams have used more and more of the extra silicon area available on chips to integrate embedded memories that serve as scratchpads, FIFOs and caches to store data for the computational cores [22]. These embedded memories allow significantly better system performance and lower power compared with a solution where off-chip memories are used. As a result, most current designs have more than half of their area occupied by embedded memories, and these memories account for 50-70% of the total SoC power dissipation. Whether designing the smallest handheld wireless device, the latest consumer electronics product, or a networking and high-performance computing solution, power is now a key consideration for all designs.
The size and complexity of today’s ICs make it essential to consider power throughout the design stages (the chip/system architecture, power architecture, and design, including micro-architecture decisions) and all the way to implementation with power-aware synthesis, placement, and routing. Similarly, in order to keep functional issues from surfacing in the final silicon, power-aware verification must be performed throughout the development process. The implementation of the low power design must be verified to be consistent with the hardware’s actual behavior [26].
For example, state information must be retained and restored, ports must be isolated to prevent leakage and clamped to safe logic values, while multi-voltage systems require level shifting to translate logic values from one voltage domain to another. Previously, verification of the functional implications of low power design was performed late in the process, often after physical design, because not all of the relevant data was available earlier in a usable form. As a result, low power design verification has been burdened with all the problems of full-timing, gate-level simulation: slow simulations with long turnaround times, long debug identification, and long resolution times. The different parts of the power-aware development process may be summarized as follows:
Power-aware design starts with the architectural specification of the chip system, including partitioning the system into its hardware and software components. At this stage, evaluations should be made as to which blocks are not performance critical, which means they can potentially be run at a lower voltage and/or frequency to conserve power. Similarly, certain blocks may be suitable candidates for "sleep mode" or to be completely shut down to conserve power when they are idle.
When such a portion of the design is restored to its operating condition, it must behave as defined in the chip system architectural specification. The next stage in the development process is to refine the power architecture [26].
For example, the architectural specification may determine that a particular block should be implemented such that it can be completely powered down. In the power architecture portion of the process, the team determines exactly how often this block is to be shut down, and also any interdependencies between this block and other blocks [28]. To put this in context, before a particular block is powered down, it may be necessary to first power down one or more other blocks in a specific order, and to ensure that each block is completely powered down before a subsequent block is powered down. Similarly, it is necessary to ensure that the various blocks are powered up in a specific order.
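The ordering constraint described above can be illustrated with a small sketch. It assumes the interdependencies are available as a mapping from each block to the blocks that must be off first; the block names, the helper functions, and the choice to power up in the reverse of the power-down order are all hypothetical and are not taken from the source.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def power_down_order(must_be_off_first):
    """Return one valid power-down order.

    must_be_off_first maps a block name to the set of blocks that have to be
    completely powered down before that block may be powered down.
    """
    # static_order() yields a node only after all of its predecessors,
    # which matches "power down a block only after its prerequisites are off".
    return list(TopologicalSorter(must_be_off_first).static_order())


def power_up_order(must_be_off_first):
    """Assumption: blocks are powered up in the reverse of the power-down order."""
    return list(reversed(power_down_order(must_be_off_first)))


# Hypothetical dependency chain, for illustration only.
deps = {
    "dma_engine": set(),
    "fifo": {"dma_engine"},
    "mem_if": {"fifo"},
}

print(power_down_order(deps))
print(power_up_order(deps))
```

A cyclic set of dependencies would make `TopologicalSorter` raise `graphlib.CycleError`, which is a convenient early check that the power architecture is consistent.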
In this context, "design" refers to the portion of the flow where, taking the results from the chip system architecture and power architecture stages, design engineers capture the RTL descriptions for the various blocks forming the design. The designers associated with each block are responsible for ensuring that the block meets its functional, timing, and power requirements while occupying the minimum silicon real estate.
The implementation phase is where the majority of the work performed during the chip system architecture, power architecture, and power-aware design stages comes to fruition with the aid of power-aware engines for logic synthesis, clock gating, placement, clock-tree synthesis, routing, and so on [17]. In addition, during the implementation stage, the logical and physical structures required for the various power techniques are created. These include power grid synthesis, power plane implementation, and insertion of level shifters, switch cells, isolation cells, and state retention cells.
1.2.5 Power-Aware Verification:
DMA CONTROLLER
ARCHITECTURE
2.1 DMA CONTROLLER ARCHITECTURE:
1. Data can be transferred in both directions, either from the AHB bus to the peripheral interface or from the peripheral interface to the AHB bus. In the proposed architecture, reception and transmission can be performed separately.
2. The low power technique “FSM Decomposition” is used to lower the power in the proposed architecture.
3. To indicate the completion of transmission, the “End_of_transfer” signal is used. It goes active after the transmission of the last byte.
4. A user-defined FIFO size is used. The FIFO size can be varied according to the incoming data.
5. This architecture can work in full duplex mode. Data transmission and reception can be performed at the same time.
2.4 SIGNAL DESCRIPTION
Table 2.2 Signal description of Rx DMA
Sr. No. | Signal name | No. of Bits | Signal Description | Signal type
2.5 WORKING OF DMA CONTROLLER ARCHITECTURE
1. The AHB Slave block interfaces the AHB bus to the DMA channel registers. To initiate a transfer, the CPU programs all the operational registers of the peripheral IP and all the operational registers of the AHB-DMA.
2. The block size for a particular channel (Tx or Rx) is the value programmed into the “Block Size Register” of the DMA Controller. The FIFOs (Tx and Rx) inside the DMA can have either a fixed or a parameterizable depth.
3. In the case of transmit, whenever the FIFO is not full, data fetched from the system memory is written into the Tx FIFO.
4. Similarly, in the case of receive, whenever the FIFO is not empty, data read from the Rx FIFO is stored to system memory.
5. The DMA control signal is then checked for the transmitting/receiving process. Only after the activation of this control signal can data proceed further.
6. After activation of the control signal, the receiving side (AHB side or peripheral interface side) reports whether or not it is ready for the transfer.
7. At every transmission, the last byte is matched.
When the last bytes are transferred, an “End of Transfer” (Tx or Rx) is generated by the DMA to confirm the completion of the operation.
A FIFO is a First-In-First-Out memory queue with control logic that manages the read and write operations, generates status flags, and provides optional handshake signals for interfacing with the user logic. It is commonly used to control the flow of data between source and destination. A FIFO can be classified as synchronous or asynchronous depending on whether the same clock or different (asynchronous) clocks control the read and write operations.
FIFO full and empty flags are generated and passed on to the source and destination logic, respectively, to pre-empt any overflow or underflow of data. In this way data integrity between source and destination is maintained.
The clock domain that supplies data to the FIFO is often referred to as the write or input logic, and the clock domain that reads data from the FIFO is often referred to as the read or output logic. If the read and write clock domains are governed by the same clock signal, the FIFO is said to be synchronous; if the read and write clock domains are governed by different (asynchronous) clock signals, the FIFO is said to be asynchronous.
The FIFO full and FIFO empty flags are of great concern, as no data should be written in the full condition and no data should be read in the empty condition; doing so can lead to loss of data or generation of non-relevant data. The full and empty conditions of the FIFO are controlled using binary or Gray pointers.
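As a brief illustration of the two pointer encodings mentioned above, the following sketch converts between a binary pointer value and its Gray-coded equivalent using the standard reflected Gray code. The function names are chosen here for illustration and do not come from the source design.

```python
def bin_to_gray(value: int) -> int:
    """Convert a binary pointer value to reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Convert a reflected Gray code back to its binary value."""
    value = 0
    while gray:
        value ^= gray   # accumulate g ^ (g >> 1) ^ (g >> 2) ^ ...
        gray >>= 1
    return value


# Successive Gray-coded pointer values differ in exactly one bit.
for pointer in range(8):
    print(pointer, format(bin_to_gray(pointer), "03b"))
```

Because only one bit changes per increment, Gray pointers are commonly preferred when a pointer has to be sampled in another clock domain, whereas binary pointers are sufficient when both sides share one clock.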
A synchronous FIFO refers to a FIFO design where data values are written sequentially into a memory array using a clock signal, and the data values are sequentially read out from the memory array using the same clock signal.
In a synchronous FIFO the generation of the empty and full flags is straightforward, as there is no clock domain crossing involved. Given this, the user can even generate programmable almost-empty and almost-full flags, which are required in many applications.
2.6.1.2 Block Diagram of FIFO
[Figure: block diagram of the synchronous FIFO. Write side signals: write_data, wdata_valid, fifo_full, fifo_afull, write_ack. Read side signals: fifo_empty, fifo_aempty, rdata_valid, read_data, read_req.]
Table 2.4: Write Side Ports
Name | I/O | Width | Description
write_data | I | 16 | 16-bit data input to FIFO
wdata_valid | I | 1 | Qualifies the write data. Logic high indicates that the data on the write_data bus is valid and needs to be sampled at the next rising edge of the clock.
2.6.2 Write Control Logic
Write Control Logic is used to control the write operation of the FIFO’s internal memory. It generates a binary-coded write pointer which points to the memory location where the incoming data is to be written. The write pointer is incremented by one after every successful write operation. Additionally, it generates the FIFO full and almost full flags, which in turn are used to prevent any data loss. For example, if a write request arrives when the FIFO is full, Write Control Logic stalls the write into the memory until the fifo_full flag is de-asserted. It signals the stall to the source by not sending any acknowledgement in response to the write request.
2.6.2.1 Block diagram of write control logic
wdata_valid | I | 1 | Qualifies write data in. A logic high indicates that the data on the write_data bus is valid.
Read Control Logic is used to control the read operation of the FIFO’s internal memory. It generates a binary-coded read pointer which points to the memory location from which the data is to be read. The read pointer is incremented by one after every successful read operation. Additionally, it generates the FIFO empty and almost empty flags, which in turn are used to prevent any spurious data read. For example, if a read request arrives when the FIFO is empty, Read Control Logic stalls the read from the memory until the fifo_empty flag is de-asserted. It signals the stall to the requestor by not asserting rdata_valid in response to the read request.
2.6.3.1 Block diagram of read control logic
fifo_afull | O | 1 | Indicates to the source that the FIFO’s internal memory has only a few spaces left for new data. Upon seeing this, the source may decide to slow down or stall the write operation.
write_data | I | 16 | Data input to the memory
read_addr | I | 4 | Read address to the memory. It is derived from the read pointer by dropping its MSB.
At reset, both the read and write pointers are 0. This is the empty condition of the FIFO: fifo_empty is pulled high and fifo_full is low.
At empty, reads are blocked and the only operation possible is a write.
Since fifo_full is low, upon seeing valid write data Write Control Logic ensures that the data is written into location 0 of the memory array and that write_ptr is incremented to 1. This causes the empty signal to go low.
With fifo_empty pulled down, a read operation can now be performed. Upon seeing a read request in this state, Read Control Logic fetches data from location 0 and increments read_ptr to 1.
In this way reads keep following writes until the FIFO becomes empty again.
If write operations are not matched by reads, the FIFO will soon become full and any further write is stalled until fifo_full is pulled down by a read.
With the help of the FIFO full and empty flags, data integrity is maintained between the source and the requestor.
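The sequence of operations above can be mimicked with a small behavioural model. This is only a sketch under stated assumptions: the pointers carry one bit more than the 4-bit memory address (consistent with the address being the pointer without its MSB), and "full" is detected when the two pointers differ only in that extra bit, which is a common convention rather than something the text specifies. The class and method names are hypothetical.

```python
class SyncFifo:
    """Behavioural sketch of a synchronous FIFO (not the RTL from the text)."""

    def __init__(self, addr_bits: int = 4):
        self.depth = 1 << addr_bits          # 16 locations for a 4-bit address
        self.ptr_mask = 2 * self.depth - 1   # pointers have one extra MSB
        self.mem = [0] * self.depth
        self.write_ptr = 0                   # reset state: both pointers are 0
        self.read_ptr = 0

    def _addr(self, ptr: int) -> int:
        # Memory address = pointer with its MSB dropped.
        return ptr & (self.depth - 1)

    @property
    def fifo_empty(self) -> bool:
        return self.write_ptr == self.read_ptr

    @property
    def fifo_full(self) -> bool:
        # Assumed rule: same address bits, different MSB.
        return (self.write_ptr ^ self.read_ptr) == self.depth

    def write(self, data: int) -> bool:
        """Attempt a write; the return value plays the role of write_ack."""
        if self.fifo_full:
            return False                     # stalled: no acknowledgement
        self.mem[self._addr(self.write_ptr)] = data
        self.write_ptr = (self.write_ptr + 1) & self.ptr_mask
        return True

    def read(self):
        """Attempt a read; returns (rdata_valid, read_data)."""
        if self.fifo_empty:
            return False, None               # stalled: rdata_valid stays low
        data = self.mem[self._addr(self.read_ptr)]
        self.read_ptr = (self.read_ptr + 1) & self.ptr_mask
        return True, data


fifo = SyncFifo()
print(fifo.fifo_empty, fifo.fifo_full)   # state right after reset
fifo.write(0xABCD)
print(fifo.fifo_empty, fifo.read())
```

A software model like this has no clock, so each call stands in for one successful clock-edge operation; almost-full and almost-empty thresholds are left out to keep the sketch short.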
Figure 2.6 Complete flow of internal Architecture
Figure 2.8 Signals applied to DUT for testing
LOW POWER DMA
CONTROLLER
3.1 FSM Decomposition Approach
CMOS is dynamic power consumption caused by the actual effort of the circuit to
switch. A first order approximation of the dynamic power consumption of CMOS
circuitry is given by the formula:
$P = C \cdot V^{2} \cdot f$ (3.1)
The power dissipation arises from the charging and discharging of the circuit node capacitances found at the output of each logic gate. Each low-to-high logic transition in a digital circuit incurs a change in voltage, drawing energy from the power supply.
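To make the quadratic dependence on supply voltage in Equation (3.1) concrete, the following worked example compares the same circuit (same C and f) at two supply voltages. The 3.3 V figure is chosen purely for illustration and is not a value used in the source.

```latex
\frac{P_{3.3\,\mathrm{V}}}{P_{5\,\mathrm{V}}}
  = \frac{C \cdot (3.3)^{2} \cdot f}{C \cdot (5)^{2} \cdot f}
  = \frac{10.89}{25}
  \approx 0.44
```

Under these assumptions, lowering the supply from 5 V to 3.3 V would cut the first-order dynamic power to roughly 44% of its original value, while halving the frequency alone would only cut it to 50%.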
A designer at the technological and architectural level can try to minimize the variables in these equations in order to minimize the overall energy consumption. However, power minimization is often a complex process of trade-offs between speed, area, and power consumption.
Static energy consumption is caused by short-circuit, bias, and leakage currents. During a transition on the input of a CMOS gate, both the p-channel and n-channel devices may conduct simultaneously, briefly establishing a short from the supply voltage to ground.
capacitance. This can be achieved at the architectural level of design and also at the logic and physical implementation level.
Connections to external components, such as external memory, typically have much greater capacitance than connections to on-chip resources. As a result, accessing external memory can increase energy consumption. One way to reduce capacitance is therefore to reduce external accesses and optimize the system by using on-chip resources such as caches and registers. In addition, using fewer external outputs and switching them infrequently results in dynamic power savings.
$P = P_{\text{Switching}} + P_{\text{Short-Circuit}} + P_{\text{Leakage}}$ (3.2)
$P_{\text{Switching}}$, called switching power, is due to charging and discharging capacitors driven by the circuit.
$P_{\text{Short-Circuit}}$, called short-circuit power, is caused by the short-circuit currents that arise when pairs of PMOS/NMOS transistors are conducting simultaneously.
Finally, $P_{\text{Leakage}}$, called leakage power, originates from substrate injection and subthreshold effects. For older technologies (0.8 µm and above), $P_{\text{Switching}}$ was predominant.
For deep-submicron processes, $P_{\text{Leakage}}$ becomes more important. Design for low power implies the ability to reduce all three components of power consumption in CMOS circuits during the development of a low power electronic product.
$P_{\text{Switching}} = C_L V_{DD}^{2} f$ (3.3)
where $C_L$ is the output load of the gate,
$V_{DD}$ is the supply voltage,
$f$ is the expected frequency,
where $f$ can be calculated as
$f = P(1 - P)$ (3.4)
3.3.2 Power Estimation using Entropy
For a total of N nodes, where node i has frequency $f_i$ and capacitance $C_i$, the total power at constant $V_{dd}$ is
$P_{\text{total}} = \sum_{i=1}^{N} C_i V_{dd}^{2} f_i$ (3.7)
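Equations (3.4) and (3.7) can be evaluated with a few lines of code. The sketch below is illustrative only: the function names and the two example nodes are made up, and units are left to the caller (the result is in whatever unit capacitance times voltage squared times frequency gives).

```python
def switching_activity(p: float) -> float:
    """Equation (3.4): f = P(1 - P) for a probability P between 0 and 1."""
    return p * (1.0 - p)


def total_power(nodes, vdd: float) -> float:
    """Equation (3.7): sum of C_i * Vdd^2 * f_i over all nodes.

    nodes is an iterable of (capacitance, frequency) pairs.
    """
    return sum(c * vdd ** 2 * f for c, f in nodes)


# Two hypothetical nodes: (C_i, f_i)
example_nodes = [(1.0, switching_activity(0.5)), (2.0, 0.1)]
print(total_power(example_nodes, vdd=5.0))
```

Note that P(1 - P) peaks at P = 0.5, where it equals 0.25, so a node that is equally likely to be 0 or 1 contributes the most switching activity under this model.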
that some particular units in a larger architecture do not require peak performance for some clock cycles. Selective frequency scaling (and voltage scaling) on such units may therefore be applied, at no penalty in the overall system speed. Optimization approaches that have a lower impact on performance, yet allow significant power savings, are those targeting the minimization of the switched capacitance (i.e., the product of the capacitive load and the switching activity). Static solutions (i.e., applicable at design time) handle switched capacitance minimization through area optimization (which corresponds to a decrease in the capacitive load) and switching activity reduction by exploiting different kinds of signal correlations (temporal, spatial, spatiotemporal). Dynamic techniques, on the other hand, aim at eliminating power waste that may be caused by the application of certain system workloads (i.e., the data being processed).
3.4 Tx FSM
A finite state machine is decomposed into a number of coupled sub machines.
These sub machines are said to be coupled in the sense that state transitions take place
either within a submachine or between two sub machines. Most of the time, only one
of the sub machines will be activated which, consequently, could lead to substantial
savings in power consumption.
Basic Methodology:
Keep crossing transitions to a minimum to reduce power consumption. The designer inspects the state transition table to find which state has the maximum number of transitions. The state with the maximum number of transitions is excluded from crossing transitions, and the state with the minimum number of transitions is suitable for the outgoing transition.
The first bit of the state code is a control bit that distinguishes between submachines. For example, if the state code is 011, the first bit is 0, which shows that it belongs to submachine M1. Similarly, if the state code is 101, the first bit is 1, which shows that it belongs to submachine M2.
The remaining two bits distinguish between states within each submachine.
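The encoding rule just described can be expressed as a small decoder. This is a minimal sketch: the function name and the choice to represent state codes as 3-character bit strings are assumptions made for illustration.

```python
def decode_state(code: str):
    """Split a 3-bit state code into (submachine, local state bits).

    The first bit is the control bit: 0 selects M1, 1 selects M2.
    The remaining two bits identify the state inside that submachine.
    """
    if len(code) != 3 or any(bit not in "01" for bit in code):
        raise ValueError(f"expected a 3-bit code such as '011', got {code!r}")
    submachine = "M1" if code[0] == "0" else "M2"
    return submachine, code[1:]


for code in ("011", "101", "110"):
    print(code, decode_state(code))
```

With this encoding, a crossing transition is simply any transition in which the control bit changes, so it can be detected by comparing the first bit of the current and next state codes.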
3.4.1 State Transition matrix of Tx FSM
ST6 110
The table above shows the codes of states ST1 to ST6. This FSM can be decomposed depending upon the values of the state codes.
In the table above, the State Transition matrix is created on the basis of the occurrence of the states when an input sequence is given to it. The input sequence is listed in the leftmost column of the state transition matrix, and the probability of occurrence of each state is mapped against these input sequences.
When 1 is applied, it jumps to the next state, ST5. After that the data proceeds to ST6, where the last byte of data is checked.
3.4.3 Decomposition of Tx FSM
The figure above shows the original FSM with dashed lines which decompose it into two parts, an upper half and a lower half [24]. Decomposition is done on the basis of the first digit of the state codes. For example, the state codes of ST1, ST2 and ST3 have 0 as the first bit, so these states belong to submachine M1. Similarly, the other states have 1 as the first bit, so these belong to submachine M2.
3.4.3.1 Power Calculation of upper Tx FSM
As can be seen from the State Transition Matrix, state ST2 in submachine M1 has the maximum number of transitions. So we cannot have an outgoing crossing transition from state ST2, nor from state ST1, as this state also has a larger number of transitions. Only state ST3 in submachine M1 can have an outgoing crossing transition, as it has the minimum number of transitions in submachine M1. So, a crossing transition is made from state ST3 to state ST6 depending upon the last three bits of the state code
Table 3.3: State Transition Matrix for TX upper fsm M1
i/p Sequence ST1 ST2 ST3 ST4 ST5 ST6
0 1
1 1
0 1
1 1
0 1
Total Transitions 2 1 2
3.4.3.2 Power calculation of lower Tx FSM
P(ST4) = 0
P(ST5) = (1/72) ln(72/1) = 0.0593
P(ST6) = (2/72) ln(72/2) = 0.0995
The sum of the switching probabilities of these states gives the total probability of occurrence of states in the Lower Sub FSM.
Total switching probability of Lower Tx FSM M2 = P(ST4) + P(ST5) + P(ST6)
= 0 + 0.0593 + 0.0995
= 0.1588
f = P * (1 - P)
= 0.1588 * (1 - 0.1588)
= 0.13358
Let C = 1 mF, Vdd = 5 V
Power of Lower FSM of Tx M2 = $C \cdot V^{2} \cdot f$
= $1 \times 5^{2} \times 0.13358$ mW
= 3.3395 mW
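The calculation above follows a fixed recipe, so it can be reproduced with a short script. This is a sketch under the document's own conventions (per-state term (n/72) ln(72/n), C = 1 and Vdd = 5 with the result read in mW); the function names are hypothetical. Because the script keeps full floating-point precision instead of truncating intermediate values to four digits, its results differ from the hand calculation in the third or fourth significant digit.

```python
import math


def state_term(count: int, total: int) -> float:
    """Per-state term (n/total) * ln(total/n); defined as 0 when n is 0."""
    if count == 0:
        return 0.0
    return (count / total) * math.log(total / count)


def fsm_power(counts, total: int, cap: float = 1.0, vdd: float = 5.0):
    """Return (P, f, power) for one FSM or sub-FSM.

    counts holds the transition count n of each state, P is the sum of the
    per-state terms, f = P * (1 - P), and power = C * Vdd^2 * f.
    """
    p_sum = sum(state_term(n, total) for n in counts)
    activity = p_sum * (1.0 - p_sum)
    return p_sum, activity, cap * vdd ** 2 * activity


# Lower Tx sub-FSM M2: ST4, ST5, ST6 with 0, 1 and 2 transitions out of 72.
p, f, power = fsm_power([0, 1, 2], total=72)
print(round(p, 4), round(f, 4), round(power, 2))
```

The same function can be applied to the upper sub-FSM or to the undecomposed FSM by passing the corresponding transition counts, which makes it easy to compare the decomposed and original designs on equal terms.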
3.5 Rx FSM
There are 6 states in the Rx FSM of the DMA controller, which consist of the data transfer states from the peripheral interface to the AHB side. In the table above, the State Transition matrix is created on the basis of the occurrence of the states when an input sequence is given to it. The input sequence is listed in the leftmost column of the state transition matrix, and the probability of occurrence of each state is mapped against these input sequences.
Referring to the figure above, the FSM is initially in state ST1. When input 1 is applied, it jumps to ST2. When 1 is applied again, the next state is ST3, which indicates whether or not the block size has a value. At ST3, when 1 is applied it jumps to state ST4, which indicates whether or not the FIFO is free to accept the operation when data arrives at the FIFO. When 1 is applied, the next state is ST5. After that the data proceeds to ST6, where the last byte of data is checked. After the last byte is received, an end-of-transfer signal confirms the completion of the operation.
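The state walk described above can be traced with a small table-driven sketch. Only the forward path on input 1 comes from the text; holding the current state on input 0, and treating the ST5 to ST6 step as also triggered by input 1, are assumptions made here so that the example is complete. The names are hypothetical.

```python
# Forward transitions taken when the input is 1 (ST1 -> ST2 -> ... -> ST6).
NEXT_ON_ONE = {
    "ST1": "ST2",
    "ST2": "ST3",
    "ST3": "ST4",
    "ST4": "ST5",
    "ST5": "ST6",
}


def step(state: str, inp: int) -> str:
    """Return the next state for one input bit."""
    if inp == 1 and state in NEXT_ON_ONE:
        return NEXT_ON_ONE[state]
    return state  # assumption: input 0 (or the final state) holds the state


def run(inputs, state: str = "ST1"):
    """Return the list of states visited, starting with the initial state."""
    trace = [state]
    for inp in inputs:
        state = step(state, inp)
        trace.append(state)
    return trace


print(run([1, 1, 1, 1, 1]))
```

Counting how often each state appears in such traces over a set of input sequences is one way to obtain the per-state transition totals that the state transition matrix records.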
= 6 * 12
= 72
= 0.0593
P(ST3) = (2/72) ln(72/2) = 0.0995
P(ST4) = (1/72) ln(72/1) = 0.0593
P(ST5) = (2/72) ln(72/2) = 0.0995
P(ST6) = (2/72) ln(72/2) = 0.0995
= 0.1605 + 0.0593 + 0.0995 + 0.0593 + 0.0995 + 0.0995
= 0.5776
f = P * (1 - P)
= 0.2439
Let C = 1 mF, Vdd = 5 V
= 6.0975 mW
3.5.3 Decomposition of Rx FSM
The figure above shows the original FSM with dashed lines which decompose it into two parts, an upper half and a lower half [24]. Decomposition is done on the basis of the first digit of the state codes. For example, the state codes of ST1, ST2 and ST3 have 0 as the first bit, so these states belong to submachine M1. Similarly, the other states have 1 as the first bit, so these belong to submachine M2.
3.5.3.1 Power calculation of upper Rx FSM
= 0.0995
Total switching probability of Upper Rx FSM M1 = P(ST1) + P(ST2) + P(ST3)
= 0.2583
f = P * (1 - P)
= 0.2583 * (1 - 0.2583)
= 0.1915
1 1
0 1
Total Transitions 2 2
P(ST4) = 0
P(ST5) = (2/72) ln(72/2) = 0.0995
P(ST6) = (2/72) ln(72/2) = 0.0995
Total switching probability of Lower Rx FSM M2 = P(ST4) + P(ST5) + P(ST6)
= 0 + 0.0995 + 0.0995
= 0.199
f = P * (1 - P)
= 0.199 * (1 - 0.199)
= 0.1593
Let C = 1 mF, Vdd = 5 V
Power of Lower FSM of Rx M2 = $C \cdot V^{2} \cdot f$
= $1 \times 5^{2} \times 0.1593$ mW
= 3.9825 mW