ABSTRACT

A DMA (Direct Memory Access) Controller is used to transfer data from the memory side to the peripheral side and from the peripheral side to the memory side while keeping the CPU free during the data transfer. AHB (Advanced High-performance Bus) is a new generation of the AMBA bus intended to address the requirements of high-performance synthesizable designs. It is a high-performance system bus that supports multiple bus masters and provides high-bandwidth operation. Power consumption and power-related issues have become a major concern for most designs. The primary method used for reducing power has been supply voltage reduction; however, this technique begins to lose its effectiveness as voltages drop towards the sub-threshold range, and further reductions in the supply voltage begin to create more problems than they solve. This work presents a new approach to the synthesis of finite state machines with the reduction of power dissipation as a design objective. A finite state machine is decomposed into a number of coupled sub-machines. Most of the time, only one of the sub-machines is activated, which consequently can lead to substantial savings in power consumption. The designer computes two sub-FSMs that together have the same functionality as the original FSM. To minimize the average switching activity, a small cluster of states with high stationary state probability is identified and used to create the smaller sub-FSM. In this way the designer obtains a small amount of logic that is active most of the time, during which it disables a much larger circuit, the other sub-FSM.

Keywords: Direct Memory Access, Direct Memory Access Controller, Advanced High-performance Bus, Finite State Machine, power consumption, propagation delay.
INTRODUCTION

1.1 Introduction

Consumer demand for greater functionality and higher performance, but also for lower cost, places significant pressure on System-on-Chip (SoC) manufacturers. The continuing advances in process technology, and the ability to design highly complex SoCs, do not come without cost, and each new generation of processes brings a new generation of challenges. With ever-increasing SoC complexity, energy consumption has become the most critical constraint in today's integrated circuit (IC) design [1]. Consequently, a great deal of effort is spent on designing for low power dissipation. Power consumption has become an essential design constraint, alongside performance, clock frequency and die size [8]. Low power can be achieved only by designing at all levels of abstraction, from architectural design to intellectual property (IP) selection and physical implementation [10]. Designers should use components that incorporate the latest developments in low-power technology.

The largest power savings are obtained by making the right choices early, at the system and architectural levels of abstraction. In addition to power-conscious hardware design techniques, it is important to save power through careful design of the operating system and the application programs. In CMOS circuits, power is dissipated in a gate when the gate output switches. In sequential circuit design, an effective way to reduce power dissipation is to "turn off" portions of the circuit and thereby reduce the switching activity in the circuit [9]. In this work we propose a technique that is likewise based on selectively turning off parts of a circuit. Our approach is motivated by the observation that, for an FSM, state transitions occur only within a subset of the states during any given period of time. In a CMOS circuit, the switching activity of the gate outputs generally contributes most of the total power dissipation. For low-power FSM design, partitioning proves to be an effective way to reduce switching activity: the original FSM is partitioned into several smaller sub-FSMs and only one of them is active at a time. However, two issues typically arise [13]:

 Increased area overhead of the state memory due to the separate encoding of each sub-FSM. When the designer decomposes a large FSM into two or more parts, the result requires more area than a single FSM, and the states of each sub-FSM require their own encoding.
 Additional power consumed by crossing transitions between the sub-FSMs in a synchronous design. In a single FSM there is no need for crossing transitions, but when it is split into parts, moving from one sub-FSM to another requires transitions, called crossing transitions, which in turn consume extra power.

1.2 Design Abstractions:

Power reduction can be applied at several levels of design abstraction: the system, architectural, gate, circuit and technology levels. At the system level, idle modules may be turned off to save power. At the architectural level, parallel hardware may be used to reduce global interconnect and to allow a reduction in supply voltage without degrading system throughput. Clock gating is commonly used at the gate level [23]. A variety of design techniques can be used at the circuit level to reduce both dynamic and static power. For a given design specification, designers have many choices to make at the various levels of abstraction. Based on specific design constraints (for example power, performance and cost), the designer must choose a particular algorithm and architecture and decide parameters such as the supply voltage and the clock frequency.

This multi-dimensional design space offers a wide range of possible trade-offs. Properties of a design are most influential at the highest levels of abstraction; therefore the best design decisions come from choosing and optimizing architectures and algorithms at those levels. However, it is a challenge to predict the outcome and effectiveness of design decisions made at the higher levels of abstraction, since implementation details can only be accurately modelled or estimated at the technology level. It is therefore important to use IP components, such as embedded memories and logic libraries, that offer flexibility in selecting different design and power-saving techniques. As process geometries have scaled, design teams have used more and more of the extra silicon area available on chips to integrate embedded memories that serve as scratch-pads, FIFOs and caches to store data for the computational cores [22]. These embedded memories allow significantly better system performance and lower power than a solution in which off-chip memories are used. As a result, most current designs have more than half of their area occupied by embedded memories, and these memories account for 50-70% of the total SoC power dissipation. Whether the goal is the smallest handheld wireless device, the latest consumer electronics product, or a networking and high-performance computing solution, power is now a key consideration for all designs.

IC power consumption directly affects the designer's ability to differentiate a product in terms of features, cost, performance, time to market and even reliability. With this shift in focus, power is now a primary design constraint, joining the traditional constraints of timing and area. This means that an effective design environment and methodology must consider all design constraints (including power) simultaneously in a consistent, closed-loop, multi-objective intent-to-signoff flow. Addressing these requirements is essential regardless of the types of power reduction techniques being applied: basic techniques (multi-VT libraries and clock gating), more advanced techniques (dynamic voltage/frequency scaling and power shut-off), or emerging techniques (back biasing and low-swing clocks). Whichever combination of techniques is applied, design teams must effectively estimate and manage risks such as functional and structural defects while maintaining productivity. Notably, architects cannot simply "bolt on low power" at the end of the development process.

The size and complexity of today's ICs make it essential to consider power throughout the design stages, from the chip/system architecture, the power architecture and the micro-architectural decisions all the way to implementation with power-aware synthesis, placement and routing. Similarly, in order to keep functional issues from surfacing in the final silicon, power-aware verification must be performed throughout the development process. The implementation of the low-power design must be verified to be consistent with the hardware's actual behaviour [26].

For example, state information must be retained and restored, ports must be isolated to prevent leakage and to clamp protected logic values, and multi-voltage systems require level shifting to move logic values from one voltage domain to another. Previously, verification of the functional implications of a low-power design was performed late in the process, often after physical design, because all the relevant data was not available earlier in an explicit form. As a result, low-power design verification was burdened with all the problems of full-timing, gate-level simulation: slow simulations with long turnaround times, long debug identification, and long resolution times. The different parts of the power-aware development process may be summarized as follows:

1.2.1 Chip/System Architectural Specification:

Power-aware design starts with the architectural specification of the chip/system, including partitioning the system into its hardware and software components. At this stage, assessments should be made as to which blocks are not performance critical, which means they can potentially be run at a lower voltage and/or frequency to conserve power. Similarly, certain blocks may be suitable candidates for a "sleep mode" or for being shut down completely to conserve power when they are idle.

1.2.2 Power Architecture:

Following the definition of the chip/system architectural specification, the next stage in the development process is to refine the power architecture [26]. For example, the architectural specification may state that a particular block should be implemented such that it is capable of being completely powered down. In the power architecture portion of the process, the team decides exactly how often this block is to be shut down, and also any interdependencies between this block and other blocks [28]. To put this in context, before a particular block is powered down it may be necessary to first power down one or more other blocks in a specific order, and to guarantee that each block is fully powered down before a subsequent block is powered down. Similarly, it is important to ensure that the various blocks are powered up in a specific order.

1.2.3 Power-Aware Design:

In this context, "design" refers to the part of the flow where, taking the results from the chip/system architecture and power architecture stages, design engineers capture the RTL descriptions of the various blocks forming the design. The designers associated with each block are responsible for ensuring that the block meets its functional, timing and power requirements while occupying the minimum silicon real estate.

1.2.4 Power-Aware Implementation:

The implementation phase is where the bulk of the work performed during the chip/system architecture, power architecture and power-aware design stages comes together, with the aid of power-aware engines for logic synthesis, clock gating, placement, clock-tree synthesis, routing, and so on [17]. In addition, during the implementation stage the logical and physical structures required for the various power techniques are created. These include power grid synthesis, power plane implementation, and insertion of level shifters, switch cells, isolation cells and state retention cells.

1.2.5 Power-Aware Verification:

Verification commences with the planning process. Every hardware and software element of the design that is to be tested is detailed, the way in which each element will be verified is defined, and the required coverage metrics for each element are specified. In a modern design environment, verification planning must be supplemented by sophisticated verification management functionality that interprets the plan and automatically deploys the appropriate tools required to perform comprehensive verification.

1.3 POWER ANALYSIS

Power analysis, like timing analysis, should be consistent and concurrent throughout a design flow. Early power analysis that is not implementation aware provides no insight into the kinds of timing, area and power optimizations that will be required to meet the design's constraints [24], and amounts to little more than a guess. Unrealistic early power estimation can have serious negative consequences, for example cost overruns due to a more expensive package, pursuit of the wrong architecture or optimization strategy, or schedule overruns. Throughout the implementation flow, power analysis needs to accurately calculate and report all components of power consumption, including active power and leakage power. Power analysis also requires comprehensive reporting that enables designers to understand where power is consumed and how it can be reduced [20]. A good design flow should use consistent power calculations that apply the best available implementation knowledge, from early power estimates through sign-off power calculation, so that power is always measured, refined and appropriately acted upon.

DMA CONTROLLER ARCHITECTURE

2.1 DMA CONTROLLER ARCHITECTURE:

Under AMBA (Advanced Microcontroller Bus Architecture) three protocols are defined: AHB (Advanced High-performance Bus), APB (Advanced Peripheral Bus) and ASB (Advanced System Bus), among which AHB is used for high-speed, low-latency and high-frequency operation. AHB is a new generation of the AMBA bus intended to address the requirements of high-performance synthesizable designs. It is a high-performance system bus that supports multiple bus masters and provides high-bandwidth operation. Features of AHB: 1. It supports multiple bus masters and provides high-bandwidth operation. 2. It supports high clock frequency operation. 3. It can handle burst transfers. 4. It uses single-clock-edge operation. AMBA-AHB has the following four components: 1. AHB Master: initiates read and write operations; only a single master may be active on the bus at a time. 2. AHB Slave: responds to read and write operations at particular addresses, and reports back to the AHB master whether or not the operation has been performed. 3. AHB Arbiter: ensures that only one bus master is granted the bus at a time. 4. AHB Decoder: decodes the address of each transfer and provides a select signal for the slave involved in the transfer. The DMA can be programmed through its AHB slave interface by the system processor. The DMA's master controller is initially idle and enters active mode after the processor programs a set of control registers through the AHB slave interface. The parameters to be set are the "Base address for next transfer", the "Block size (in bytes) for next transfer" and the "DMA enable/disable" bit (TX/RX). Once the registers are programmed, the DMA controller starts the bus transaction as a bus master on the AHB bus/IP bus to perform the requested data transfer.
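As an illustration of this programming step, the following is a minimal Verilog sketch of an AHB-slave register block holding the three parameters named above. The register names follow the text, but the address offsets, register widths and the simplified (non-pipelined) AHB handshake are assumptions for illustration only, not the actual design.

```verilog
// Illustrative sketch only: simplified AHB slave register block for the
// DMA control registers.  Offsets, widths and the flattened handshake
// (no HTRANS/HREADY pipelining) are assumptions.
module dma_slave_regs (
  input             hclk,
  input             hresetn,
  input             hsel,         // select from the AHB decoder
  input             hwrite,       // 1 = write transfer
  input      [31:0] haddr,
  input      [31:0] hwdata,
  output reg [31:0] hrdata,
  output reg [31:0] base_addr,    // "Base address for next transfer"
  output reg [15:0] block_size,   // "Block size (in bytes) for next transfer"
  output reg        dma_enable    // "DMA enable/disable" bit
);
  always @(posedge hclk or negedge hresetn) begin
    if (!hresetn) begin
      base_addr  <= 32'd0;
      block_size <= 16'd0;
      dma_enable <= 1'b0;
    end else if (hsel && hwrite) begin
      case (haddr[3:0])            // assumed register offsets
        4'h0: base_addr  <= hwdata;
        4'h4: block_size <= hwdata[15:0];
        4'h8: dma_enable <= hwdata[0];
        default: ;
      endcase
    end
  end

  always @(*) begin                // read-back path
    case (haddr[3:0])
      4'h0: hrdata = base_addr;
      4'h4: hrdata = {16'd0, block_size};
      4'h8: hrdata = {31'd0, dma_enable};
      default: hrdata = 32'd0;
    endcase
  end
endmodule
```

Once these registers hold valid values, the DMA master controller leaves its idle state and requests the AHB bus to carry out the transfer.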

2.2 BENEFITS OF ARCHITECTURE:

1. Data can be transferred in both directions, either from the AHB bus to the peripheral interface or from the peripheral interface to the AHB bus. In the proposed architecture, receiving and transmitting can be performed separately.
2. The low-power technique "FSM decomposition" is used to lower the power of the proposed architecture.
3. To indicate the completion of a transmission, the "End_of_transfer" signal is used. It goes active after the transmission of the last byte.
4. A user-defined FIFO size is used; the FIFO depth can be varied according to the incoming data.
5. The architecture can work in full-duplex mode: data transmission and reception can be performed at the same time.

2.3 BLOCK DIAGRAM

Figure 2.1: AHB-DMA Block Diagram

2.4 SIGNAL DESCRIPTION

Table 2.1: Signal description of Tx DMA

Sr. No. | Signal name | No. of bits | Signal description | Signal type
1 | Clk | 1 | Clock signal for the DMA controller | Input
2 | Rst | 1 | Reset signal for the DMA controller | Input
3 | on_ip | 1 | Start signal for the data transfer from AHB to the peripheral interface | Input
4 | block_size | 1 | Indicates whether the block size holds a value or not | Input
5 | fifo_data | 1 | Indicates whether the Tx FIFO is empty or not | Input
6 | dma_tx_process | 1 | DMA control signal for transmitting the data | Input
7 | ahb_tx_process | 1 | Indicates whether the AHB is ready to transmit the data | Input
8 | tx_last_byte | 1 | Last-byte confirmation signal | Input
9 | input_data | 8 | The data to be transferred from AHB to the peripheral | Input
10 | output_data | 8 | The data received by the peripheral interface | Output

Table 2.2: Signal description of Rx DMA
Sr. No. | Signal name | No. of bits | Signal description | Signal type
1 | Clk | 1 | Clock signal for the DMA controller | Input
2 | Rst | 1 | Reset signal for the DMA controller | Input
3 | on_ip | 1 | Start signal for the data transfer from the peripheral interface to the AHB side | Input
4 | Rx_block_size | 1 | Indicates whether the block size holds a value or not | Input
5 | Rx_fifo_data | 1 | Indicates whether the Rx FIFO is empty or not | Input
6 | dma_rx_process | 1 | DMA control signal for receiving the data | Input
7 | Ahb_rx_process | 1 | Indicates whether the AHB is ready to receive the data | Input
8 | Rx_last_byte | 1 | Last-byte confirmation signal | Input
9 | Rx_input_data | 8 | The data to be transferred from the peripheral to AHB | Input
10 | ahb_output_data | 8 | The data received by the AHB side | Output
11 | end_of_transfer | 1 | Indicates the completion of the data transfer from the peripheral to AHB | Output

2.5 WORKING OF DMA CONTROLLER ARCHITECTURE
1. The AHB Slave block interfaces the AHB bus to the DMA channel registers. To initiate a transfer, the CPU programs all the operational registers of the peripheral IP and all the operational registers of the AHB-DMA.
2. The transfer length is the value programmed into the "Block Size Register" of the DMA controller for a particular channel (Tx or Rx). The FIFOs (Tx and Rx) inside the DMA can have either a fixed or a parameterizable depth.
3. In the case of transmit, whenever the FIFO is not full, data fetched from the system memory is written into the Tx FIFO.
4. Similarly, in the case of receive, whenever the FIFO is not empty, data read from the Rx FIFO is stored to system memory.
5. The DMA control signal is then checked for the transmitting/receiving process. Only after this control signal is activated can the data proceed further.
6. After activation of the control signal, the receiving side (AHB side or peripheral interface side) reports whether or not it is ready for the transfer.
7. At every transmission the last byte is checked. When the last byte has been transferred, an "End of Transfer" signal (Tx or Rx) is generated by the DMA to confirm the completion of the operation.
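Step 7 can be pictured with a small, illustrative Verilog fragment. Only tx_last_byte and end_of_transfer appear in the signal tables; the byte_done qualifier and the single-cycle pulse behaviour are assumptions used here for the sketch.

```verilog
// Illustrative sketch only: generate a one-clock "End of Transfer" pulse
// when the last byte of the programmed block completes.
module eot_gen (
  input      clk,
  input      rst,
  input      byte_done,        // assumed: a byte completed this cycle
  input      tx_last_byte,     // last-byte confirmation signal
  output reg end_of_transfer
);
  always @(posedge clk or posedge rst) begin
    if (rst)
      end_of_transfer <= 1'b0;
    else
      end_of_transfer <= byte_done & tx_last_byte;  // pulse on the final byte
  end
endmodule
```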

2.6 INTERNAL ARCHITECTURE


2.6.1 FIFO Architecture

A FIFO is a First-In-First-Out memory queue with control logic that manages the read and write operations, generates status flags, and provides optional handshake signals for interfacing with the user logic. It is commonly used to control the flow of data between a source and a destination. A FIFO can be classified as synchronous or asynchronous, depending on whether the same clock or different (asynchronous) clocks control the read and write operations.

The FIFO full and empty flags are generated and passed to the source and destination logic, respectively, to prevent any overflow or underflow of data. In this way data integrity between the source and the destination is maintained.

The clock domain that supplies data to the FIFO is often referred to as the write or input logic, and the clock domain that reads data from the FIFO is often referred to as the read or output logic. If the read and write domains are governed by the same clock signal the FIFO is said to be synchronous; if the read and write clock domains are governed by different (asynchronous) clock signals the FIFO is said to be asynchronous.

2.6.1.1 Synchronous FIFO

The FIFO full and FIFO empty flags are of particular concern, as no data should be written in the full condition and no data should be read in the empty condition; violating either can lead to loss of data or to the generation of invalid data. The full and empty conditions of the FIFO are tracked using binary or Gray-coded pointers.

A synchronous FIFO refers to a FIFO design in which data values are written sequentially into a memory array using a clock signal, and the data values are read out sequentially from the memory array using the same clock signal.

In a synchronous FIFO the generation of the empty and full flags is straightforward, as there is no clock domain crossing involved. Because of this, the user can even generate programmable partially-empty and partially-full flags, which are required in many applications.
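The flag generation can be sketched in Verilog as follows. This assumes binary pointers, a depth of 16, and 5-bit pointers whose extra MSB marks wrap-around (consistent with the 5-bit pointers and 4-bit addresses listed in Tables 2.6 to 2.8); the actual flag logic of the design may differ.

```verilog
// Illustrative sketch only: full/empty detection from binary pointers.
module fifo_flags (
  input  [4:0] write_ptr,   // from the Write Control Logic
  input  [4:0] read_ptr,    // from the Read Control Logic
  output       fifo_empty,
  output       fifo_full
);
  // Empty: pointers identical, including the wrap (MSB) bit.
  assign fifo_empty = (write_ptr == read_ptr);
  // Full: addresses equal but wrap bits differ, i.e. the write side has
  // lapped the read side by exactly one full depth.
  assign fifo_full  = (write_ptr[4]   != read_ptr[4]) &&
                      (write_ptr[3:0] == read_ptr[3:0]);
endmodule
```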

2.6.1.2 Block Diagram of FIFO

Figure 2.2: Block diagram of FIFO (a synchronous FIFO with write-side ports write_data, wdata_valid, write_ack, fifo_full and fifo_afull; read-side ports read_req, read_data, rdata_valid, fifo_empty and fifo_aempty; and common inputs clk, reset_n and flush)

2.6.1.3 Signal description of FIFO

Table 2.3: FIFO common signal description


Name | I/O | Width | Description
Clk | I | 1 | Clock input to the FIFO. This is a common input to both the read and write sides of the FIFO
reset_n | I | 1 | Active-low asynchronous reset input to the FIFO read and write logic
Flush | I | 1 | Active-high synchronous flush input to the FIFO. A clock-wide pulse resets the FIFO read and write pointers

Table 2.4: Write Side Ports

Name | I/O | Width | Description
write_data | I | 16 | 16-bit data input to the FIFO
wdata_valid | I | 1 | Qualifies the write data. A logic high indicates that the data on the write_data bus is valid and needs to be sampled at the next rising edge of the clock
fifo_full | O | 1 | Indicates to the source that the FIFO's internal memory has no space left to take in new data
fifo_afull | O | 1 | Indicates to the source that the FIFO's internal memory has only a few spaces left for new data. On seeing this the source may decide to slow down or stall the write operation
write_ack | O | 1 | Write acknowledgement to the source

Table 2.5: Read Side Ports


Name | I/O | Width | Description
read_req | I | 1 | Data read request to the FIFO from the requestor
read_data | O | 16 | Read data in response to a read request. The data is valid in the cycle after read_req, provided the FIFO is not empty
rdata_valid | O | 1 | Qualifies the read data out. A logic high indicates that the data on the read_data bus is valid and needs to be sampled at the next rising edge of the clock
fifo_empty | O | 1 | Indicates to the requestor that the FIFO's internal memory is empty and therefore has no data to serve for the read request
fifo_aempty | O | 1 | Indicates to the requestor that the FIFO's internal memory is almost empty and therefore has only a few data words left to serve for future read requests. On seeing this the requestor may decide to slow down or stall the read operation

2.6.2 Write Control Logic

The Write Control Logic controls the write operation of the FIFO's internal memory. It generates a binary-coded write pointer which points to the memory location where the incoming data is to be written. The write pointer is incremented by one after every successful write operation. Additionally, it generates the FIFO full and almost-full flags, which in turn are used to prevent any data loss. For example, if a write request arrives when the FIFO is full, the Write Control Logic stalls the write into the memory until the fifo_full flag is de-asserted. It signals the stalling of the write to the source by not sending an acknowledgement in response to the write request.
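A minimal Verilog sketch of this write-side behaviour is given below, using the port list of Table 2.6. The occupancy computation and the almost-full threshold (two entries below full) are assumptions for illustration.

```verilog
// Illustrative sketch only: FIFO write control with a 5-bit binary pointer.
module write_ctrl (
  input            clk,
  input            reset_n,
  input            flush,
  input            wdata_valid,
  input      [4:0] read_ptr,       // from the Read Control Logic
  output           write_enable,   // to the memory array
  output reg [4:0] write_ptr,
  output reg       write_ack,
  output           fifo_full,
  output           fifo_afull
);
  wire [4:0] used = write_ptr - read_ptr;          // current occupancy
  assign fifo_full    = (used == 5'd16);
  assign fifo_afull   = (used >= 5'd14);           // assumed threshold
  assign write_enable = wdata_valid & ~fifo_full;  // stall writes when full

  always @(posedge clk or negedge reset_n) begin
    if (!reset_n) begin
      write_ptr <= 5'd0;
      write_ack <= 1'b0;
    end else if (flush) begin
      write_ptr <= 5'd0;
      write_ack <= 1'b0;
    end else begin
      write_ack <= write_enable;                   // no ack while stalled
      if (write_enable)
        write_ptr <= write_ptr + 5'd1;
    end
  end
endmodule
```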
2.6.2.1 Block diagram of write control logic

Figure 2.3 Block diagram of write control logic


2.6.2.2 Signal description of write control logic

Table 2.6: Signal description of write control logic


Name | I/O | Width | Description
Clk | I | 1 | Clock input
reset_n | I | 1 | Active-low reset input
Flush | I | 1 | Active-high synchronous flush input to the FIFO. A clock-wide pulse resets the FIFO read and write pointers
wdata_valid | I | 1 | Qualifies the write data in. A logic high indicates that the data on the write_data bus is valid
read_ptr | I | 5 | Read pointer from the Read Control Logic. This, along with the write pointer, is used to determine the FIFO full and almost-full conditions
write_enable | O | 1 | Write enable to the FIFO's internal memory
write_ptr | O | 5 | Write pointer value. This serves as the write address to the FIFO's internal memory
write_ack | O | 1 | Acknowledgement to the source that the write operation is done
fifo_full | O | 1 | Indicates to the source that the FIFO's internal memory has no space left to take in new data
fifo_afull | O | 1 | Indicates to the source that the FIFO's internal memory has only a few spaces left for new data. On seeing this the source may decide to slow down or stall the write operation

2.6.3 Read Control Logic

The Read Control Logic controls the read operation of the FIFO's internal memory. It generates a binary-coded read pointer which points to the memory location from which the data is to be read. The read pointer is incremented by one after every successful read operation. Additionally, it generates the FIFO empty and almost-empty flags, which in turn are used to prevent any spurious data read. For example, if a read request arrives when the FIFO is empty, the Read Control Logic stalls the read from the memory until the fifo_empty flag is de-asserted. It signals the stalling of the read to the requestor by not asserting rdata_valid in response to the read request.
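A matching Verilog sketch of the read side follows, using the ports of Table 2.7; the almost-empty threshold (two entries) is an assumption.

```verilog
// Illustrative sketch only: FIFO read control with a 5-bit binary pointer.
module read_ctrl (
  input            clk,
  input            reset_n,
  input            flush,
  input            read_req,
  input      [4:0] write_ptr,      // from the Write Control Logic
  output           read_enable,    // to the memory array
  output reg [4:0] read_ptr,
  output reg       rdata_valid,
  output           fifo_empty,
  output           fifo_aempty
);
  wire [4:0] used = write_ptr - read_ptr;          // current occupancy
  assign fifo_empty  = (used == 5'd0);
  assign fifo_aempty = (used <= 5'd2);             // assumed threshold
  assign read_enable = read_req & ~fifo_empty;     // stall reads when empty

  always @(posedge clk or negedge reset_n) begin
    if (!reset_n) begin
      read_ptr    <= 5'd0;
      rdata_valid <= 1'b0;
    end else if (flush) begin
      read_ptr    <= 5'd0;
      rdata_valid <= 1'b0;
    end else begin
      rdata_valid <= read_enable;                  // data valid one cycle later
      if (read_enable)
        read_ptr <= read_ptr + 5'd1;
    end
  end
endmodule
```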

2.6.3.1 Block diagram of read control logic

Figure 2.4 Bock diagram of read control logic


2.6.3.2 Signal description of read control logic
Table 2.7: Signal description of read control logic
Name | I/O | Width | Description
Clk | I | 1 | Clock input
reset_n | I | 1 | Active-low reset input
Flush | I | 1 | Active-high synchronous flush input to the FIFO. A clock-wide pulse resets the FIFO read and write pointers
read_req | I | 1 | Read request from the requestor
write_ptr | I | 5 | Write pointer from the Write Control Logic. This, along with the read pointer, is used to determine the FIFO empty and almost-empty conditions
read_enable | O | 1 | Read enable to the FIFO's internal memory
read_ptr | O | 5 | Read pointer value. This serves as the read address to the FIFO's internal memory
rdata_valid | O | 1 | Qualifies the read data returned to the requestor
fifo_empty | O | 1 | Indicates to the requestor that the FIFO's internal memory is empty and therefore has no data to serve for the read request
fifo_aempty | O | 1 | Indicates to the requestor that the FIFO's internal memory is almost empty and therefore has only a few data words left to serve for future read requests. On seeing this the requestor may decide to slow down or stall the read operation

2.6.4 Memory Array

The Memory Array is an array of flip-flops that stores the data. The number of data words the memory array can store is referred to as the depth of the FIFO, and the length of each data word is referred to as the width of the FIFO. Besides the flop array, it comprises the read and write address decoding logic.

2.6.4.1 Block diagram of memory array

Figure 2.5 Block diagram of memory Array

2.6.4.2 Signal description of memory array

Table 2.8: Signal description of Memory Array

Name | I/O | Width | Description
Clk | I | 1 | Clock input
write_addr | I | 4 | Write address to the memory. It is derived from the write pointer by knocking off its MSB
write_enable | I | 1 | Active-high write enable input to the memory
write_data | I | 16 | Data input to the memory
read_addr | I | 4 | Read address to the memory. It is derived from the read pointer by knocking off its MSB
read_enable | I | 1 | Active-high read enable to the memory
read_data | O | 16 | Data read out from the memory
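A minimal Verilog sketch of the memory array with the ports of Table 2.8 is shown below. The depth of 16 words and the registered read (data valid one cycle after the request, as described for Table 2.5) are assumptions consistent with the 4-bit addresses.

```verilog
// Illustrative sketch only: 16 x 16-bit flop-based memory with registered read.
module memory_array (
  input             clk,
  input       [3:0] write_addr,
  input             write_enable,
  input      [15:0] write_data,
  input       [3:0] read_addr,
  input             read_enable,
  output reg [15:0] read_data
);
  reg [15:0] mem [0:15];                 // flip-flop based storage

  always @(posedge clk) begin
    if (write_enable)
      mem[write_addr] <= write_data;     // write location selected by write_addr
    if (read_enable)
      read_data <= mem[read_addr];       // registered read, valid next cycle
  end
endmodule
```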

2.6.5 Working of Internal Architecture

 At reset, both the read and write pointers are 0. This is the empty condition of the FIFO: fifo_empty is driven high and fifo_full is low.
 When empty, reads are blocked and the only operation possible is a write.
 Since fifo_full is low, on seeing valid write data the Write Control Logic will ensure the data is written into location 0 of the memory array and write_ptr is incremented to 1. This causes the empty signal to go low.
 With fifo_empty de-asserted, read operations can now be performed. On seeing a read request in this state, the Read Control Logic will fetch data from location 0 and increment read_ptr to 1.
 In this way reads keep following writes until the FIFO becomes empty again.
 If write operations are not matched by reads, the FIFO will soon become full and any further write will be stalled until fifo_full is pulled down by a read.
 With the help of the FIFO full and empty flags, data integrity is maintained between the source and the requestor.

Figure 2.6 Complete flow of internal Architecture

2.6.6 Test Bench of Internal Architecture

A test bench is a virtual environment used to verify the correctness or soundness of a design or model. It is a program written in Verilog HDL for the purpose of verifying a module's functional behavior.

Figure 2.7 Simplified view of Test Bench

Figure 2.8 Signals applied to DUT for testing

The main aspects of the test bench for the internal architecture are as follows:

 Parameterized clock generator block
 Reset signal generator
 A 16-bit LFSR counter that generates 16-bit pseudo-random input data; the data is produced by a maximum-length LFSR (a sketch is given after this list)
 The data_valid signal is generated from the FIFO full/empty condition as well as the ready signal
 LFSR bits are also used to generate the source_ready and read_req signals
 Source and destination buffers store each data word sent to the FIFO and read out of the FIFO, respectively
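The pseudo-random data generator can be sketched as a maximum-length 16-bit LFSR. The particular tap polynomial (x^16 + x^15 + x^13 + x^4 + 1) and the seed value are assumptions; the actual test bench may use a different maximal-length configuration.

```verilog
// Illustrative sketch only: maximum-length 16-bit Fibonacci LFSR.
module lfsr16 (
  input             clk,
  input             reset_n,
  output reg [15:0] data
);
  // Taps 16, 15, 13, 4 (a known maximal-length polynomial).
  wire feedback = data[15] ^ data[14] ^ data[12] ^ data[3];

  always @(posedge clk or negedge reset_n) begin
    if (!reset_n)
      data <= 16'hACE1;                  // any non-zero seed works
    else
      data <= {data[14:0], feedback};    // shift left, feed back into bit 0
  end
endmodule
```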

LOW POWER DMA CONTROLLER

3.1 FSM Decomposition Approach

A finite state machine is decomposed into a number of coupled sub-machines. These sub-machines are said to be coupled in the sense that state transitions occur either within a sub-machine or between two sub-machines. Most of the time only one of the sub-machines is active, which can lead to considerable savings in power consumption; the aim is therefore to reduce the power dissipation in FSM circuits by limiting the switching activity in the state registers.
The key steps in our approach are:
 Decomposition of the finite state machine into sub-machines such that there is a high probability that state transitions will be confined to the smaller sub-machine most of the time.
 Synthesis of the coupled sub-machines to optimize the logic circuits, i.e. assignment of state codes to the states of the sub-machines.
 Decomposition is based on the idea that redundant computation can be dynamically disabled to reduce the overall power dissipation.
 When an input arrives, the currently active sub-machine may remain active; in that case the other sub-machine is not turned on and remains inactive. On a later input, the active sub-machine may instead turn on the other sub-machine and turn itself off, becoming inactive.
 At any moment only one sub-machine is active (with its corresponding combinational circuit turned on) while all other sub-machines are inactive (with their corresponding combinational circuits turned off).
 An effective approach to reducing power dissipation is to "turn off" portions of the circuit and hence reduce the switching activity in the circuit.
 In general, since the combinational circuit for each sub-machine is smaller than that of the original machine, the power consumption of the decomposed machine will be smaller than that of the original machine.
The sources of energy consumption on a CMOS chip can be classified as static and dynamic power dissipation. The dominant component of energy consumption in CMOS is the dynamic power caused by the switching activity of the circuit. A first-order approximation of the dynamic power consumption of CMOS circuitry is given by the formula:

P = C × V² × f (3.1)

where P is the dynamic power, C is the effective switched capacitance, V is the supply voltage, and f is the frequency of operation.

The power dissipation arises from the charging and discharging of the circuit node capacitances found on the output of every logic gate. Every low-to-high logic transition in a digital circuit incurs a change of voltage, drawing energy from the power supply.

A designer working at the technological and architectural levels can attempt to minimize the variables in these equations in order to minimize the overall energy consumption. However, power minimization is often an intricate process of trade-offs between speed, area and power consumption.

Static energy consumption is caused by cut-off, bias and leakage currents. During a transition on the input of a CMOS gate, both the p- and n-channel devices may conduct simultaneously, briefly establishing a short from the supply voltage to ground.

While statically biased gates are normally found only in a few specialized circuits, for example PLAs, their use has been drastically reduced. Leakage current is becoming the dominant component of static energy consumption. Until recently it was regarded as a second-order effect; however, the total amount of static power consumption roughly doubles with each new process technology.

Energy consumption in CMOS circuitry is proportional to capacitance; therefore, one technique that can be used to reduce energy consumption is to minimize the capacitance. This can be achieved at the architectural level of design as well as at the logic and physical implementation levels.

Connections to external components, for example external memory, typically have much greater capacitance than connections to on-chip resources. Consequently, accessing external memory can increase energy consumption. Thus, one approach to reducing capacitance is to reduce external accesses and to optimize the system by using on-chip resources such as caches and registers. Likewise, using fewer external outputs and switching them less frequently will bring additional power savings.

Routing capacitance is the main cause of the limitation on clock frequency. Circuits that can run faster can do so because of a lower routing capacitance; therefore, they dissipate less power at a given clock frequency. In this way, energy reduction can be achieved by improving the clock frequency capability of the design, even if the resulting performance is far in excess of the requirements.

3.2 Basic Principles


CMOS is by far the most common technology used for manufacturing digital ICs. There are three major sources of power dissipation in a CMOS circuit:

P = Pswitching + Pshort-circuit + Pleakage (3.2)

Pswitching, called the switching power, is due to the charging and discharging of capacitors driven by the circuit.
Pshort-circuit, called the short-circuit power, is caused by the short-circuit currents that arise when pairs of PMOS/NMOS transistors conduct simultaneously.
Finally, Pleakage, called the leakage power, originates from substrate injection and sub-threshold effects. For older technologies (0.8 µm and above), Pswitching was predominant. For deep-submicron processes, Pleakage becomes more important. Design for low power implies the ability to reduce all three components of power consumption in CMOS circuits during the development of a low-power electronic product.

Pswitching = CL × Vdd² × f (3.3)

where CL is the output load capacitance of the gate, Vdd is the supply voltage, and f is the expected switching frequency, which can be calculated as

f = P(1 - P) (3.4)

where P is the transition probability of a state, i.e. the probability of occurrence of the state.

3.3 Signal Entropy


Entropy theory has been used effectively in communication systems to analyse the information content or capacity of a system. Its application to VLSI power analysis is relatively new. In entropy analysis, the signals in a logic circuit are treated as a collection of random signals, and the entropy, or randomness, of the signals is related to the average switching activity of the circuit.

3.3.1 Basics of entropy

Entropy is a measure of the randomness carried by a set of discrete events observed over time. In information theory, the information content Ci of an event Ei is quantified by taking the logarithm of the event probability:

Ci = log2(1 / Pi) (3.5)

Since 0 ≤ Pi ≤ 1, the logarithmic term is non-negative and we have Ci ≥ 0.
The average information content of the system is the sum of the information contents Ci weighted by their occurrence probabilities; this is also called the entropy of the system:

H(X) = Σ (i = 1 to N) Pi log2(1 / Pi) (3.6)
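As a small worked example of equation (3.6), with illustrative numbers that are not taken from this design, consider a source with two events of probabilities 0.75 and 0.25:

```latex
H(X) = 0.75\,\log_2\!\left(\tfrac{1}{0.75}\right) + 0.25\,\log_2\!\left(\tfrac{1}{0.25}\right)
     \approx 0.75(0.415) + 0.25(2.000) \approx 0.81 \text{ bits}
```

The more skewed the probabilities, the lower the entropy, which is the property exploited for power estimation in the next subsection.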

3.3.2 Power Estimation using Entropy

Intuitively, entropy also correlates with the average switching frequency of the signals. An n-bit signal that rarely toggles suggests that the word-level values are relatively static and that many values never appear; this skewed occurrence probability gives a low entropy measure. On the other hand, if the signal switching is very active, all word-level values are likely to appear with similar probability, which increases the entropy of the signals. This observation motivates the use of signal entropy for power estimation.

For a total of N nodes, where node i switches with frequency fi and has capacitance Ci, the total power at a constant supply voltage Vdd is

Ptotal = Σ (i = 1 to N) Ci × Vdd² × fi (3.7)

3.3.3 Shortcoming of the solution

The major shortcoming of this approach is the increase in area caused by the larger number of FSMs and by the crossing transitions needed to jump from one sub-FSM to another. Power is therefore reduced at the cost of area, which reduces the overall efficiency of the implementation, and the designer has to manage both constraints. The decomposition also affects speed, since it takes extra time to move from one sub-FSM to the other. As a result, both design and technology solutions must be applied in order to compensate for the decrease in circuit performance introduced by decomposing the machine into multiple sub-FSMs.

A similar problem, i.e. a performance decrease, is encountered when power optimization is obtained through frequency scaling. Techniques that rely on reductions of the clock frequency to lower power consumption are therefore usable only under the condition that some performance slack exists. Although this may not always hold for a design considered as a whole, it happens regularly that particular units in a larger architecture do not require peak performance for some machine cycles. Selective frequency scaling (and voltage scaling) may therefore be applied to such units at no penalty in overall system speed. Optimization approaches that have a lower impact on performance, yet still permit significant power savings, are those targeting the minimization of the switched capacitance (i.e. the product of the capacitive load and the switching activity). Static solutions (i.e. those applicable at design time) tackle switched-capacitance minimization through area optimization (which corresponds to a decrease in the capacitive load) and through switching activity reduction that exploits various kinds of signal correlations (temporal, spatial, spatio-temporal). Dynamic techniques, on the other hand, aim at eliminating power waste that may be caused by particular system workloads (i.e. the data being processed).

3.4 Tx FSM
A finite state machine is decomposed into a number of coupled sub machines.
These sub machines are said to be coupled in the sense that state transitions take place
either within a submachine or between two sub machines. Most of the time, only one
of the sub machines will be activated which, consequently, could lead to substantial
savings in power consumption.
Basic Methodology:
 Keep crossing transitions to a minimum to reduce power consumption. The designer looks at the state transition table and finds which state has the maximum number of transitions; the state with the maximum number of transitions is not used for a crossing transition, while the state with the minimum number of transitions is the most suitable source for an outgoing crossing transition.
 The first bit of the state code is a control bit that distinguishes between the sub-machines. For example, if the state code is 011, the first bit is 0, which shows that the state belongs to sub-machine M1; similarly, if the state code is 101, the first bit is 1, which shows that it belongs to sub-machine M2.
 The remaining two bits distinguish between the states within each sub-machine, as illustrated in the sketch below.
3.4.1 State Transition matrix of Tx FSM

Figure 3.1 FSM of TX DMA controller

Table 3.1: States and codes of Tx FSM


States | Codes
ST1 | 001
ST2 | 010
ST3 | 011
ST4 | 100
ST5 | 101
ST6 | 110
The table above lists the codes of states ST1 to ST6. The FSM can then be decomposed depending on the values of these state codes.

Table 3.2: State Transition Matrix for Tx fsm


i/p Sequence ST1 ST2 ST3 ST4 ST5 ST6
0 1
1 1
0 1
1 1
0 1
1 1
0 1
1 1
0 1
1 1
0 1
1 1
Total 3 3 2 1 1 2

In the table above, the state transition matrix is created on the basis of the occurrence of the states when an input sequence is applied. The input sequence is listed in the leftmost column of the state transition matrix and, depending on these inputs, the occurrences of the states are mapped into it.

Referring to the figure above, the machine is initially in state ST1. When input 1 is applied, it jumps to ST2. When 0 is then applied, the next state is ST3, which checks whether the block size has a value or not. At ST3, when 1 is applied, the machine jumps to state ST4, which checks whether the FIFO is free to accept the operation when data arrives. When 1 is applied, it jumps to the next state, ST5. After that the data proceeds to ST6, where the last byte of data is checked.

3.4.2 Power calculation of Tx FSM

Total number of combinations = number of states × length of input sequence = 6 × 12 = 72
Frequency f = P(1 - P), where P = transition probability of the states

P(ST1) = (3/72) ln(72/3) = 0.1324
P(ST2) = (3/72) ln(72/3) = 0.1324
P(ST3) = (2/72) ln(72/2) = 0.0995
P(ST4) = (1/72) ln(72/1) = 0.0593
P(ST5) = (1/72) ln(72/1) = 0.0593
P(ST6) = (2/72) ln(72/2) = 0.0995

Total switching probability of the FSM = sum of the switching probabilities of all states
= P(ST1) + P(ST2) + P(ST3) + P(ST4) + P(ST5) + P(ST6)
= 0.1324 + 0.1324 + 0.0995 + 0.0593 + 0.0593 + 0.0995 = 0.5824
f = P × (1 - P) = 0.2432
Let C = 1 mF, Vdd = 5 V
Power of Tx FSM = C × V² × f = 1 × 5² × 0.2432 mW = 6.08 mW

3.4.3 Decomposition of Tx FSM

Figure 3.2 Decomposition of Tx FSM

The figure above shows the original FSM with dashed lines that decompose it into two parts, an upper half and a lower half [24]. The decomposition is done on the basis of the first digit of the state codes: the state codes of ST1, ST2 and ST3 have 0 as the first bit, so these states belong to sub-machine M1; similarly, the other states have 1 as the first bit, so they belong to sub-machine M2.

3.4.3.1 Power Calculation of upper Tx FSM

Figure 3.3 FSM decomposed in upper half sub FSM M1

As can be seen from the state transition matrix, state ST2 in sub-machine M1 has the maximum number of transitions. We therefore cannot place an outgoing crossing transition at state ST2, nor at state ST1, since this state also has a high number of transitions. Only state ST3 in sub-machine M1 can have an outgoing crossing transition, as it has the minimum number of transitions in M1. A crossing transition is therefore made from state ST3 to state ST6, depending upon the last three bits of the state code.

Table 3.3: State Transition Matrix for TX upper fsm M1
i/p ST1 ST2 ST3 ST4 ST5 ST6
Sequence
0 1
1 1
0 1
1 1
0 1
Total 2 1 2
Transitions

P(ST1) = (2/72) ln(72/2) = 0.0995
P(ST2) = (1/72) ln(72/1) = 0.0593
P(ST3) = (2/72) ln(72/2) = 0.0995
Total switching probability of upper Tx FSM M1 = P(ST1) + P(ST2) + P(ST3) = 0.2583
f = P × (1 - P) = 0.2583 × (1 - 0.2583) = 0.19158
Let C = 1 mF, Vdd = 5 V
Power of upper Tx FSM M1 = C × V² × f = 1 × 5² × 0.19158 mW = 4.789 mW

3.4.3.2 Power calculation of lower Tx FSM

Figure 3.4 Lower sub-machine of Tx FSM, M2


Table 3.4: State Transition Matrix for Tx Lower fsm M2
i/p ST1 ST2 ST3 ST4 ST5 ST6
Sequence
0
1 1
0
1 1
0 1
Total 1 2
Transitions

P(ST4) = 0
P(ST5) = (1/72) ln(72/1) = 0.0593
P(ST6) = (2/72) ln(72/2) = 0.0995
The sum of the switching probabilities of these states gives the total probability of occurrence of the states in the lower sub-FSM.
Total switching probability of lower Tx FSM M2 = P(ST4) + P(ST5) + P(ST6)
= 0 + 0.0593 + 0.0995 = 0.1588
f = P × (1 - P) = 0.1588 × (1 - 0.1588) = 0.13358
Let C = 1 mF, Vdd = 5 V
Power of lower Tx FSM M2 = C × V² × f = 1 × 5² × 0.13358 mW = 3.3395 mW

3.4.3.3 Power saving of Tx FSM

Power consumption of the decomposed Tx FSM = average of M1 Tx and M2 Tx
= (4.789 + 3.3395) / 2 = 4.064 mW
Percentage of power saving = (Poriginal - Psub-FSM) / Poriginal
= (6.08 - 4.064) / 6.08 = 33.15%

3.5 Rx FSM

Figure 3.5 FSM of Rx DMA controller

Table 3.5: State and codes of Rx FSM


States | Codes
ST1 | 001
ST2 | 010
ST3 | 011
ST4 | 100
ST5 | 101
ST6 | 110

There are 6 states in the Rx FSM of the DMA controller, covering the data transfer from the peripheral interface to the AHB side. In the state transition matrix below, the matrix is created on the basis of the occurrence of the states when an input sequence is applied: the input sequence is listed in the leftmost column of the matrix and, depending on these inputs, the occurrences of the states are mapped into it.

Referring to the figure above, the machine is initially in state ST1. When input 1 is applied, it jumps to ST2. When 1 is applied again, the next state is ST3, which checks whether the block size has a value or not. At ST3, when 1 is applied, the machine jumps to state ST4, which checks whether the FIFO is free to accept the operation when data arrives. When 1 is applied, the next state is ST5. After that the data proceeds to ST6, where the last byte of data is checked. After the last byte is received, the end_of_transfer signal confirms the completion of the operation.

3.5.1 State transition matrix of Rx FSM


Table 3.6 State Transition Matrix for Rx FSM
i/p Sequence ST1 ST2 ST3 ST4 ST5 ST6
0 1
1 1
0 1
1 1
0 1
1 1
0 1
1 1
0 1
1 1
0 1
1 1
Total 4 1 2 1 2 2

3.5.2 Power calculation of original Rx FSM

Total number of combinations = number of states × length of input sequence = 6 × 12 = 72
Frequency f = P(1 - P), where P = transition probability of the states

P(ST1) = (4/72) ln(72/4) = 0.1605
P(ST2) = (1/72) ln(72/1) = 0.0593
P(ST3) = (2/72) ln(72/2) = 0.0995
P(ST4) = (1/72) ln(72/1) = 0.0593
P(ST5) = (2/72) ln(72/2) = 0.0995
P(ST6) = (2/72) ln(72/2) = 0.0995

Total switching probability of the FSM = sum of the switching probabilities of all states
= P(ST1) + P(ST2) + P(ST3) + P(ST4) + P(ST5) + P(ST6)
= 0.1605 + 0.0593 + 0.0995 + 0.0593 + 0.0995 + 0.0995 = 0.5776
f = P × (1 - P) = 0.2439
Let C = 1 mF, Vdd = 5 V
Power of Rx FSM = C × V² × f = 1 × 5² × 0.2439 mW = 6.0975 mW

3.5.3 Decomposition of Rx FSM

Figure 3.6 Decomposition of DMA Rx FSM

The figure above shows the original FSM with dashed lines that decompose it into two parts, an upper half and a lower half [24]. The decomposition is done on the basis of the first digit of the state codes: the state codes of ST1, ST2 and ST3 have 0 as the first bit, so these states belong to sub-machine M1; similarly, the other states have 1 as the first bit, so they belong to sub-machine M2.

3.5.3.1 Power calculation of upper Rx FSM

Figure 3.7 Upper Sub machine of Rx M1


Table 3.7: State Transition Matrix for Rx upper FSM
i/p ST1 ST2 ST3 ST4 ST5 ST6
Sequence
0 1
1 1
0 1
1 1
0 1
Total 2 1 2
Transitions

P(ST1) = (2/72) ln(72/2) = 0.0995
P(ST2) = (1/72) ln(72/1) = 0.0593
P(ST3) = (2/72) ln(72/2) = 0.0995
Total switching probability of upper Rx FSM M1 = P(ST1) + P(ST2) + P(ST3) = 0.2583
f = P × (1 - P) = 0.2583 × (1 - 0.2583) = 0.1915
Let C = 1 mF, Vdd = 5 V
Power of upper Rx FSM M1 = C × V² × f = 1 × 5² × 0.1915 mW = 2.2875 mW

3.5.3.2 Power calculation of lower Rx FSM

Figure 3.8 Lower Sub machine of Rx M2


Table 3.8 State Transition Matrix for Rx Lower FSM M2
i/p ST1 ST2 ST3 ST4 ST5 ST6
Sequence
1 1
0 1

1 1
0 1
Total 2 2
Transitions

P(ST4) = 0
P(ST5) = (2/72) ln(72/2) = 0.0995
P(ST6) = (2/72) ln(72/2) = 0.0995
Total switching probability of lower Rx FSM M2 = P(ST4) + P(ST5) + P(ST6)
= 0 + 0.0995 + 0.0995 = 0.199
f = P × (1 - P) = 0.199 × (1 - 0.199) = 0.1593
Let C = 1 mF, Vdd = 5 V
Power of lower Rx FSM M2 = C × V² × f = 1 × 5² × 0.1593 mW = 3.9825 mW

3.5.3.3 Power Saving of Rx FSM

Power consumption of the decomposed Rx FSM = average of M1 Rx and M2 Rx
= (2.2875 + 3.9825) / 2 = 3.135 mW
Percentage of power saving in the Rx FSM = (Poriginal - Psub-FSM) / Poriginal
= (6.0975 - 3.135) / 6.0975 = 48.58%
Percentage of power saving in the Tx FSM = 33.15%
