
Memory Systems
CH13 DRAM Controller
Prof. Ren-Shuo Liu
Outline
• DRAM controller model
• Controller's strategies
• Row buffer management
• Address mapping
• Command scheduling

DRAM Controller Model

• The DRAM controller accepts physical-address requests from clients, such as processors and I/O devices
• An arbiter schedules transactions entering the controller

DRAM Controller Model

• A physical address request is mapped (translated) to a DRAM address and then converted into a sequence of DRAM commands
• DRAM commands are placed in queues
• Depending on the command scheduling policy, commands are issued to the DRAM devices

Row Buffer Management
• Sense amplifiers can act as buffers (i.e., row buffers) to provide temporary data storage
• Row buffer management significantly affects DRAM system performance
• Accessing an opened row is fast (i.e., a row buffer hit)
• Accessing another row requires a precharge and an activation

Row reopened for each access:  ACT → READ → PRE → ACT → READ → PRE   (time →)
Second access hits open row:   ACT → READ → READ → PRE               (time →)
Row Buffer Management
• Policies
• Open-page
• Close-page
• Hybrid (dynamic)

Open-Page Policy
• Leave a row open after a row access
• Anticipate temporally and spatially adjacent future accesses to the same row
• Exploit applications' locality
• Achieve the minimal row-hit latency (tCAS)

Close-Page Policy
• Close a row immediately after a row access
• Favor systems with low degrees of access locality
• Precharge is performed as soon as possible
• Reduce the row-miss latency

Hybrid (Dynamic) Policy
• Neither a strictly open-page policy nor a strictly close-page policy achieves the best performance
• Modern controllers typically adopt a dynamic combination of the two policies
• Runtime behaviors, including access locality and request rate, are both considered
• Example
• Hit rate-aware hybrid policy
• Time-aware hybrid policy

Hit Rate-Aware Hybrid Policy
• Controller switches between the two policies
• If the row-hit probability is greater than a threshold, switch to open-page
• Otherwise, switch to close-page

Hit Rate-Aware Hybrid Policy
• Threshold selection based on a simple analysis
• Close-page: every access costs tRCD + tCAS
• Open-page: a row hit costs only tCAS, but a row miss costs tRP + tRCD + tCAS
• Plotting expected access latency against row-hit probability, the two policies break even at a row-hit probability of tRP / (tRCD + tRP); above this threshold, open-page is better
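The threshold follows from equating the expected access latencies of the two policies; a short derivation using the timing parameters above:

    % Expected access latency with row-hit probability p:
    %   close-page: every access pays tRCD + tCAS
    %   open-page:  hits pay tCAS, misses pay tRP + tRCD + tCAS
    \begin{align*}
    L_{\mathrm{close}} &= t_{RCD} + t_{CAS} \\
    L_{\mathrm{open}}  &= p\,t_{CAS} + (1-p)\,(t_{RP} + t_{RCD} + t_{CAS})
    \end{align*}
    % Setting L_open = L_close and solving for p gives the break-even point:
    \[
    p^{*} = \frac{t_{RP}}{t_{RCD} + t_{RP}}
    \]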
Time-Aware Hybrid Policy
• Concept
• Prevent rows from staying open too long, which wastes power
• Mechanism (sketched in code below)
• A timer is set to a predetermined value when a row is activated
• The timer counts down every cycle
• When the timer reaches zero, a precharge command is issued to precharge the bank
• On a row buffer hit to an open bank, the timer is reset to a higher value
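A minimal C sketch of the per-bank timer; the struct, hook, and reset values are illustrative assumptions, not from the slide:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool     row_open;
        uint32_t timer;      /* cycles left before a forced precharge */
    } bank_state_t;

    #define TIMER_INITIAL 100   /* assumed value set at activation    */
    #define TIMER_ON_HIT  200   /* assumed higher value after a hit   */

    static void issue_precharge(bank_state_t *b) { (void)b; /* send PRE (stub) */ }

    void on_activate(bank_state_t *b) { b->row_open = true; b->timer = TIMER_INITIAL; }
    void on_row_hit(bank_state_t *b)  { b->timer = TIMER_ON_HIT; }

    /* Called once per DRAM clock cycle. */
    void bank_tick(bank_state_t *b) {
        if (b->row_open && b->timer > 0 && --b->timer == 0) {
            issue_precharge(b);   /* timer expired: close the row */
            b->row_open = false;
        }
    }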

Address Mapping

1….N

1….M

Channel, rank,
bank, row, column
13
Address Mapping
• Considerations
• Minimize the probability of bank conflicts among temporally adjacent requests
• Maximize the row hit ratio
• Maximize the parallelism
• Available parallelism in memory systems
• Channel
• Bank
• Rank

Address Mapping
• Channel-level parallelism
• The DRAM memory system places no restrictions on requests issued to different logical channels
• Mapping consecutive cache lines to different channels maximizes the parallelism of sequential accesses
• Mapping nearby cache lines to the same row maximizes the chance of row hits

Address Mapping
• Rank-level and bank-level parallelism
• Consecutive memory accesses can proceed in parallel to different ranks or different banks
• In general, scheduling consecutive accesses to different banks of a given rank is more efficient than to different ranks
• Switching ranks incurs the rank-to-rank switching latency, tRTRS

Address Mapping Examples
Virtual address: 36 bits      Physical memory: 4 GB    # Banks per rank: 8
Virtual page: 4 KB            # Channels: 2            # Columns per row: 8 K
# Bytes per cache line: 64    # Ranks per ch.: 2       # Bytes per column: 1

Virtual address (36 bits):  VPN (24) | page offset (12)
        ↓ TLB
Physical address (32 bits): PPN (20) | page offset (12)
        ↓ address mapping
DRAM address (32 bits):     channel / rank / bank / row / column addresses
Baseline Close-Page Mapping
(same parameters as above)

Virtual address (36 bits):  VPN (24) | page offset (12)
        ↓ TLB
Physical address (32 bits): PPN (20) | page offset (12)
        ↓ address mapping
DRAM address (32 bits), MSB → LSB:
row addr (14) | higher column addr (7) | rank addr (1) | bank addr (3) | ch addr (1) | lower column addr (6)

• Consecutive cache lines spread across channels, banks, and ranks first, which suits a close-page policy
Baseline Open-Page Mapping
(same parameters as above)

Virtual address (36 bits):  VPN (24) | page offset (12)
        ↓ TLB
Physical address (32 bits): PPN (20) | page offset (12)
        ↓ address mapping
DRAM address (32 bits), MSB → LSB:
row addr (14) | rank addr (1) | bank addr (3) | higher column addr (7) | ch addr (1) | lower column addr (6)

• Consecutive cache lines alternate only between channels and stay within the open rows, which maximizes row hits
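A C sketch of how this open-page layout decodes a 32-bit physical address; the struct and function names are illustrative:

    #include <stdint.h>

    /* Open-page layout: PA[31:18]=row, PA[17]=rank, PA[16:14]=bank,
     * PA[13:7]=column high, PA[6]=channel, PA[5:0]=column low. */
    typedef struct {
        uint32_t row, rank, bank, channel, column;
    } dram_addr_t;

    dram_addr_t decode_open_page(uint32_t pa) {
        dram_addr_t d;
        uint32_t col_lo = pa & 0x3F;            /* 6 bits  */
        d.channel       = (pa >> 6)  & 0x1;     /* 1 bit   */
        uint32_t col_hi = (pa >> 7)  & 0x7F;    /* 7 bits  */
        d.bank          = (pa >> 14) & 0x7;     /* 3 bits  */
        d.rank          = (pa >> 17) & 0x1;     /* 1 bit   */
        d.row           = (pa >> 18) & 0x3FFF;  /* 14 bits */
        d.column        = (col_hi << 6) | col_lo;  /* full 13-bit column */
        return d;
    }

The close-page layout differs only in field order: its rank and bank bits sit immediately above the channel bit, below the higher column bits.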
Possible Issue
• Stride collision

    char a[256*1024];
    char b[256*1024];
    char c[256*1024];
    ...
    for (int i…)
        a[i] = b[i] + c[i];

• Each array is 256 KB = 2^18 bytes, so a[i], b[i], and c[i] are separated by multiples of 2^18
• In the baseline open-page layout, bits [17:0] hold the rank, bank, channel, and column addresses, so addresses 2^18 apart differ only in their row bits
• a[i], b[i], and c[i] therefore probably all map to the same bank (in different rows), and the accesses keep conflicting
HW Solution to Stride Collision
• Enlarge the collision stride: XOR some physical-address (PPN) bits into the rank/bank address bits before indexing (a sketch follows)

    char a[256*1024];
    char b[256*1024];
    char c[256*1024];
    ...
    for (int i…)
        a[i] = b[i] + c[i];

DRAM address (32 bits), MSB → LSB:
row addr (14) | rank addr (1) | bank addr (3) | higher column addr (7) | ch addr (1) | lower column addr (6)
(the XOR is applied between PPN bits and the rank/bank field)

• Arrays separated by 2^18 bytes then no longer share a bank
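A C sketch of one common realization of this XOR (permutation-based bank indexing); the exact bit choice here, the low 3 row bits folded into the bank field, is an assumption:

    #include <stdint.h>

    /* Hashed bank index for the open-page layout:
     * bank' = bank XOR (low 3 bits of row). Addresses that differ
     * only in row bits (e.g., 256 KB strides) now spread across banks. */
    uint32_t hashed_bank(uint32_t pa) {
        uint32_t bank = (pa >> 14) & 0x7;     /* 3-bit bank field  */
        uint32_t row  = (pa >> 18) & 0x3FFF;  /* 14-bit row field  */
        return bank ^ (row & 0x7);            /* permuted bank index */
    }

With this hash, a[i], b[i], and c[i] from the previous slide differ in their low row bits and would typically land in three different banks.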
Command Scheduling
• Each cycle it is possible to issue a command
• Activation
• Column read/write
• There's an option to issue an auto-precharge
• Precharge command
• To any bank
• To all banks of a rank
• Power up/down command
• Refresh command

Optimization Goal
• Performance (delay)
• Sum of execution times for all involved programs

• Energy-delay product
• Sum of EDPs for all involved programs

• Fairness
• Compute the slowdown of each program relative to its single-threaded execution
• The fairness metric is the ratio of the max slowdown to the min slowdown
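In symbols, writing T_i(shared) and T_i(alone) for program i's execution times (notation assumed, not from the slide):

    \[
    \mathrm{slowdown}_i = \frac{T_i^{\mathrm{shared}}}{T_i^{\mathrm{alone}}},
    \qquad
    \mathrm{fairness} = \frac{\max_i \mathrm{slowdown}_i}{\min_i \mathrm{slowdown}_i}
    \]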

Example Scheduler
• Preliminary schedulers
• First-come first-serve (FCFS)
• Open-page first-ready first-come first-serve (FR-FCFS)
• Close-page
• Power-down
• First-ready-round-robin (FRRR)
• Credit-fair
• MLP-aware
• PAR-BS

First-Come First-Serve (FCFS)
• Algorithm (flowchart and sketch below)
• The read queue is ordered by request arrival time
• Every cycle, the scheduler scans the read queue sequentially until it finds a request that can issue in the current cycle
• When the write queue size exceeds a high water mark, writes are drained similarly until a low water mark is reached
• Writes are also drained if there are no pending reads

First-Come First-Serve (FCFS)
START
  Write Q > HI_WM || Read Q == 0 ?   Y → write drain mode = true
  Write drain mode && Write Q > LO_WM ?
    N → find the first issuable command in the read queue
    Y → find the first issuable command in the write queue
  Found → issue that command
END
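A compact C sketch of this loop; the queue type, watermark values, and the can_issue/issue hooks are illustrative assumptions:

    #include <stdbool.h>

    #define HI_WM 40   /* assumed high water mark */
    #define LO_WM 20   /* assumed low water mark  */

    typedef struct queue queue_t;              /* requests in arrival order */
    extern int  queue_len(queue_t *q);
    extern bool can_issue(queue_t *q, int i);  /* timing checks pass now?   */
    extern void issue(queue_t *q, int i);      /* send the command to DRAM  */

    static bool write_drain_mode = false;

    /* Called once per DRAM cycle. */
    void fcfs_schedule(queue_t *readq, queue_t *writeq) {
        if (queue_len(writeq) > HI_WM || queue_len(readq) == 0)
            write_drain_mode = true;
        if (queue_len(writeq) <= LO_WM)
            write_drain_mode = false;  /* assumed: drain ends at low mark */

        queue_t *q = write_drain_mode ? writeq : readq;

        /* Scan in arrival order; issue the first request that can go. */
        for (int i = 0; i < queue_len(q); i++) {
            if (can_issue(q, i)) { issue(q, i); return; }
        }
    }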
Close-Page
• Algorithm
• Mainly based on FR-FCFS
• In every idle cycle, the scheduler issues precharge operations to banks that last serviced a column read/write

Close-Page
START
  Write Q > HI_WM || Read Q == 0 ?   Y → write drain mode = true
  Write drain mode && Write Q > LO_WM ?
    N → find the first issuable command in the read queue
    Y → find the first issuable command in the write queue
  Found     → issue that command
  Not found → try issuing PRECHARGE
END
Power-Down
• Algorithm
• Issues power-down commands in every idle cycle

Power-Down
START
  Write Q > HI_WM || Read Q == 0 ?   Y → write drain mode = true
  Write drain mode && Write Q > LO_WM ?
    N → find the first issuable command in the read queue
    Y → find the first issuable command in the write queue
  Found     → issue that command
  Not found → try issuing PWR_DN
END
First-Ready-Round-Robin
• Algorithm (a priority-pick sketch follows)
• First tries to issue any open-row hits with the "correct" thread ID (as defined by the current round-robin flag)
• Then other row hits
• Then row misses with the "correct" thread ID
• Finally, a random request

• Effects
• Combines the benefit of open-row hits with the fairness of a round-robin scheduler
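A C sketch of the four-level pick; the request record and helper fields are assumptions:

    #include <stdbool.h>

    typedef struct {
        int  thread_id;
        bool row_hit;    /* would hit an open row?       */
        bool issuable;   /* timing constraints pass now? */
    } request_t;

    static int rr_thread = 0;  /* the "correct" thread this round */

    /* Returns the index of the request to issue, or -1 if none. */
    int frrr_pick(const request_t *rq, int n) {
        int any_hit = -1, rr_miss = -1, any = -1;
        for (int i = 0; i < n; i++) {
            if (!rq[i].issuable) continue;
            if (rq[i].row_hit && rq[i].thread_id == rr_thread)
                return i;                                    /* 1: RR-thread row hit */
            if (rq[i].row_hit && any_hit < 0) any_hit = i;   /* 2: any row hit       */
            if (!rq[i].row_hit && rq[i].thread_id == rr_thread && rr_miss < 0)
                rr_miss = i;                                 /* 3: RR-thread miss    */
            if (any < 0) any = i;                            /* 4: fallback          */
        }
        if (any_hit >= 0) return any_hit;
        if (rr_miss >= 0) return rr_miss;
        return any;  /* slide says a random request; first issuable used here */
    }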

Credit-Fair
• Algorithm (credit bookkeeping sketched below)
• For every channel, this algorithm maintains a credit counter for each thread
• When scheduling reads, the thread with the most credits is chosen
• Reads that will be open-row hits get a 50% bonus to their credit count for that round of arbitration
• When a column read command is issued, that thread's credit total for that channel is cut in half
• Each cycle, all threads gain one credit
• Write queue draining happens in an FR-FCFS manner
• Effects
• Threads with infrequent DRAM reads store up credits over many cycles, so they have priority when they eventually need them
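A C sketch of the per-channel credit bookkeeping; the thread count and function names are assumptions:

    #include <stdint.h>

    #define NUM_THREADS 8   /* assumed */

    static uint64_t credits[NUM_THREADS];  /* one channel shown */

    /* Called once per cycle: every thread earns one credit. */
    void credit_tick(void) {
        for (int t = 0; t < NUM_THREADS; t++) credits[t]++;
    }

    /* Priority of thread t's read this round; open-row hits get +50%. */
    uint64_t effective_credits(int t, int row_hit) {
        return row_hit ? credits[t] + credits[t] / 2 : credits[t];
    }

    /* Called when a column read for thread t issues on this channel. */
    void on_read_issued(int t) {
        credits[t] /= 2;   /* halve the thread's credits */
    }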
MLP-Aware
• Algorithm
• Assumes that threads with many outstanding misses (high memory-level parallelism, MLP) are less limited by memory access time
• Prioritizes requests from low-MLP threads over those from high-MLP threads
• To support fairness, a request's wait time in the queue is also considered
• Writes are handled as in FCFS, with appropriate high and low water marks

Parallelism-Aware Batch Scheduling (PAR-BS)
• Recall FR-FCFS
• Exploits the latency benefit when successive requests hit the same row buffer
• PAR-BS
• A more sophisticated scheduling policy
• Improves average latency and fairness among threads, i.e., both fairness and speedup
• Major policies
• Batch formation
• Request prioritization
• Thread ranking
• Misc.
Policy: Batch Formation
Policy: Request Prioritization
Policy: Thread Ranking

3. Remaining ties are broken according to process/thread IDs: P0 > P1 > P2, and so on.
