CH13 DRAM Controller
Prof. Ren-Shuo Liu
Outline
• DRAM controller model
• Controller's strategies
• Row buffer management
• Address mapping
• Command scheduling
DRAM Controller Model
(figure: DRAM controller model)
Row Buffer Management
• Sense amplifiers can act as buffers (i.e., row buffers)
to provide temporary data storage
• The row buffer significantly affects DRAM system performance
• Accessing an open row is fast (i.e., a row buffer hit)
• Accessing a different row requires a precharge and an activation (see the latency comparison below)
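A first-order latency comparison that the later policy analysis builds on (standard DRAM timing terms; queuing and data transfer time ignored):

\[
t_{\text{row hit}} = t_{CAS}, \qquad
t_{\text{row miss}} = t_{RP} + t_{RCD} + t_{CAS}
\]

i.e., a miss to a bank whose row buffer holds a different row pays a precharge (tRP) and an activation (tRCD) before the column access (tCAS).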
Open-Page Policy
• Leave a row open after it has been accessed
• Anticipate future temporally and spatially adjacent
memory accesses to the same row
• Exploit applications' locality
• Achieve the minimal row hit latency (tCAS)
Close-Page Policy
• Close a page immediately after it has been accessed
• Favor systems with low degrees of access locality
• The precharge is performed as soon as possible
• Reduce the row-miss latency (the precharge has already been done when the next request arrives)
Hybrid (Dynamic) Policy
• Neither a strictly open-page policy nor a strictly close-page policy achieves the best performance
• Modern controllers typically adopt a dynamic combination of the two policies
• Runtime behavior, including access locality and request rate, is taken into account
• Examples
• Hit rate-aware hybrid policy
• Time-aware hybrid policy
Hit Rate-Aware Hybrid Policy
• Controller switches between the two policies
• If the row-hit probability is greater than a threshold, switch to open-page
• Otherwise, switch to close-page
Hit Rate-Aware Hybrid Policy
• Threshold selection based on a simple analysis
(plot: access latency vs. row-hit probability, from 0 to 1; the latency approaches tCAS as the hit probability approaches 1, and the switching threshold is marked at tRP / (tRCD + tRP))
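The threshold in the plot follows from a back-of-the-envelope comparison of expected latencies, assuming a row hit costs tCAS, an open-page miss costs tRP + tRCD + tCAS, and a close-page access always costs tRCD + tCAS (queuing and bus time ignored). Setting the two policies' expected latencies equal at hit probability p:

\[
p\,t_{CAS} + (1-p)\,(t_{RP} + t_{RCD} + t_{CAS}) \;=\; t_{RCD} + t_{CAS}
\quad\Longrightarrow\quad
p \;=\; \frac{t_{RP}}{t_{RCD} + t_{RP}}
\]

Above this hit probability the open-page policy wins on average; below it the close-page policy wins.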
Time-Aware Hybrid Policy
• Concept
• Prevent rows from staying open too long, which wastes power
• Mechanism
• A timer is set to a predetermined value when a row is
activated
• The timer counts down
• When the timer reaches zero, a precharge command is
issued to precharge the bank
• In case of a row buffer hit to an open bank, the timer is reset to a higher value (sketched below)
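A minimal per-bank sketch of this timer mechanism in C; the timeout constants and function names are illustrative, not taken from the slides.

#include <stdbool.h>
#include <stdint.h>

#define INITIAL_TIMEOUT 16u   /* cycles a newly activated row stays open (illustrative) */
#define HIT_TIMEOUT     64u   /* longer timeout granted after a row-buffer hit (illustrative) */

struct bank_state {
    bool     row_open;
    uint32_t timer;           /* counts down; reaching zero triggers a precharge */
};

/* Called when an activate command opens a row in this bank. */
static void on_activate(struct bank_state *b) {
    b->row_open = true;
    b->timer    = INITIAL_TIMEOUT;
}

/* Called when a column read/write hits the open row. */
static void on_row_hit(struct bank_state *b) {
    b->timer = HIT_TIMEOUT;   /* reward locality: keep the row open longer */
}

/* Called every controller cycle; returns true when a precharge should be issued. */
static bool tick(struct bank_state *b) {
    if (!b->row_open)
        return false;
    if (b->timer > 0)
        b->timer--;
    if (b->timer == 0) {
        b->row_open = false;  /* close the row: issue the precharge command */
        return true;
    }
    return false;
}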
Address Mapping
(figure: the controller maps each physical address onto channel, rank, bank, row, and column addresses)
Address Mapping
• Considerations
• Minimize the probability of bank conflicts in temporally
adjacent requests
• Maximize the row hit ratio
• Maximize the parallelism
• Available parallelism in memory systems
• Channel
• Bank
• Rank
Address Mapping
• Channel-level parallelism
• The DRAM memory system imposes no restrictions on requests issued to different logical channels
• Mapping consecutive cache lines to different channels
maximizes the parallelism of sequential accesses
• Mapping nearby cache lines to the same row maximizes
the row hit chances
Address Mapping
• Rank-level and bank-level parallelism
• Consecutive memory accesses can proceed in parallel to
different ranks or different banks
• In general, however, scheduling consecutive accesses to different banks of a given rank is more efficient than scheduling them to different ranks
• This is because switching between ranks incurs the rank-to-rank switching latency, tRTRS
Address Mapping Examples
• Virtual address: 36 bits; virtual page: 4 KB; bytes per cache line: 64
• Physical memory: 4 GB; channels: 2; ranks per channel: 2; banks per rank: 8
• Columns per row: 8 K; bytes per column: 1
• The virtual address (24-bit VPN + 12-bit page offset) is translated by the TLB into a 32-bit physical address (20-bit PPN + 12-bit page offset)
• The DRAM controller then maps the 32-bit physical address onto channel, rank, bank, row, and column addresses
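As a sanity check (not in the original figure), the DRAM address field widths used in the next two mappings follow directly from these parameters:

\[
\begin{aligned}
\text{channel} &: \log_2 2 = 1 \text{ bit} & \text{rank} &: \log_2 2 = 1 \text{ bit} & \text{bank} &: \log_2 8 = 3 \text{ bits} \\
\text{column} &: \log_2 8192 = 13 \text{ bits} & \text{row} &: 32 - (1+1+3+13) = 14 \text{ bits}
\end{aligned}
\]

The low log2(64) = 6 column bits select the byte within a cache line, and the widths total 32 bits, matching the 4 GB physical address space.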
Baseline Close-Page Mapping
• Same system parameters as the previous slide
• The 32-bit physical address is split, from MSB to LSB, into row (14 bits) : column high bits (7) : rank (1) : bank (3) : channel (1) : column low bits / byte within the cache line (6)
• Placing the channel, bank, and rank bits just above the cache-line offset spreads consecutive cache lines across channels, banks, and ranks, maximizing parallelism for sequential accesses
Baseline Open-Page Mapping
• Same system parameters as above
• The 32-bit physical address is split, from MSB to LSB, into row (14 bits) : rank (1) : bank (3) : column high bits (7) : channel (1) : column low bits / byte within the cache line (6)
• Keeping the column bits low in the address maps nearby cache lines into the same row of the same bank, maximizing row-hit chances, while consecutive cache lines still alternate across the two channels (both splits are decoded in the C sketch below)
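A small C sketch that decodes a physical address under the two baseline splits above; the struct and function names are illustrative, not part of the slides.

#include <stdint.h>

struct dram_addr {
    unsigned channel, rank, bank, row, column;   /* column includes the byte-in-line bits */
};

/* Baseline close-page split: row(14) : col_hi(7) : rank(1) : bank(3) : chan(1) : col_lo(6) */
static struct dram_addr decode_close_page(uint32_t pa) {
    struct dram_addr d;
    unsigned col_lo = pa & 0x3F;             /* bits  5..0  */
    d.channel       = (pa >> 6)  & 0x1;      /* bit   6     */
    d.bank          = (pa >> 7)  & 0x7;      /* bits  9..7  */
    d.rank          = (pa >> 10) & 0x1;      /* bit   10    */
    unsigned col_hi = (pa >> 11) & 0x7F;     /* bits 17..11 */
    d.row           = (pa >> 18) & 0x3FFF;   /* bits 31..18 */
    d.column        = (col_hi << 6) | col_lo;
    return d;
}

/* Baseline open-page split: row(14) : rank(1) : bank(3) : col_hi(7) : chan(1) : col_lo(6) */
static struct dram_addr decode_open_page(uint32_t pa) {
    struct dram_addr d;
    unsigned col_lo = pa & 0x3F;             /* bits  5..0  */
    d.channel       = (pa >> 6)  & 0x1;      /* bit   6     */
    unsigned col_hi = (pa >> 7)  & 0x7F;     /* bits 13..7  */
    d.bank          = (pa >> 14) & 0x7;      /* bits 16..14 */
    d.rank          = (pa >> 17) & 0x1;      /* bit   17    */
    d.row           = (pa >> 18) & 0x3FFF;   /* bits 31..18 */
    d.column        = (col_hi << 6) | col_lo;
    return d;
}

With a 64-byte cache-line stride, both splits toggle the channel bit between consecutive lines; under the open-page split the next address bits are column bits (same bank, same row, so likely row hits), whereas under the close-page split they are bank and rank bits (so the lines spread across banks and ranks).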
Optimization Goals
• Performance (delay)
• Sum of execution times for all involved programs
• Energy-delay product
• Sum of EDPs for all involved programs
• Fairness
• Compute the slowdown for each program, relative to its
single-thread execution
• Fairness metric is the ratio of the max slowdown to the
min slowdown
Example Schedulers
• Preliminary schedulers
• First-come, first-serve (FCFS)
• Open-page, first-ready, first-come-first-serve (FR-FCFS)
• Close-page
• Power-down
• First-ready-round-robin (FRRR)
• Credit-fair
• MLP-aware
• PAR-BS
First-Come First-Serve (FCFS)
• Algorithm
• Read queue is ordered by request arrival time
• Every cycle, the scheduler scans the read queue sequentially until it finds a request that can issue in the current cycle (see the sketch below)
• When the write queue size exceeds a high water mark,
writes are drained similarly until a low water mark is
reached
• Writes are also drained if there are no pending reads
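A simplified C-style sketch of this FCFS policy; the queue variables, water-mark constants, and the can_issue()/issue() helpers are hypothetical stand-ins for controller internals.

#include <stdbool.h>
#include <stddef.h>

#define WR_HIGH_WATERMARK 48   /* start draining writes (illustrative) */
#define WR_LOW_WATERMARK  16   /* stop draining writes (illustrative)  */

struct request;                                  /* a queued read or write */
extern struct request *read_q[];  extern size_t read_q_len;
extern struct request *write_q[]; extern size_t write_q_len;
extern bool can_issue(const struct request *r);  /* bank/bus timing allows it this cycle? */
extern void issue(struct request *r);

static bool draining_writes = false;

/* Called once per controller cycle. */
void fcfs_schedule(void) {
    /* Hysteresis on the write queue: drain between high and low water marks,
       or opportunistically when there are no pending reads. */
    if (write_q_len >= WR_HIGH_WATERMARK || read_q_len == 0)
        draining_writes = true;
    if (write_q_len <= WR_LOW_WATERMARK && read_q_len > 0)
        draining_writes = false;

    struct request **q = draining_writes ? write_q : read_q;
    size_t           n = draining_writes ? write_q_len : read_q_len;

    /* Scan in arrival order; issue the oldest request that can go this cycle. */
    for (size_t i = 0; i < n; i++) {
        if (can_issue(q[i])) {
            issue(q[i]);
            return;
        }
    }
}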
Close-Page
• Algorithm
• Mainly based on FR-FCFS
• In every idle cycle, the scheduler issues precharge operations to banks that last serviced a column read/write (see the sketch below)
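A minimal sketch of the idle-cycle precharge step layered on top of an FR-FCFS scheduler; the bank bookkeeping arrays and helper names are assumptions.

#include <stdbool.h>

#define NUM_BANKS 32   /* 2 channels x 2 ranks x 8 banks in the running example */

extern bool fr_fcfs_schedule(void);           /* returns true if it issued a command this cycle */
extern bool bank_row_open[NUM_BANKS];         /* is a row currently open in this bank?          */
extern bool bank_did_column_op[NUM_BANKS];    /* was the bank's last command a column RD/WR?    */
extern bool can_precharge(int bank);          /* tRAS/tWR and other timing satisfied?           */
extern void issue_precharge(int bank);

/* Close-page variant: if FR-FCFS found nothing to do this cycle,
   use the idle cycle to close banks that just finished a column access. */
void close_page_schedule(void) {
    if (fr_fcfs_schedule())
        return;                               /* not an idle cycle */
    for (int b = 0; b < NUM_BANKS; b++) {
        if (bank_row_open[b] && bank_did_column_op[b] && can_precharge(b)) {
            issue_precharge(b);
            bank_row_open[b] = false;
            return;                           /* one command per cycle */
        }
    }
}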
Power-Down
• Algorithm
• Issues power-down commands in every idle cycle
First-Ready-Round-Robin
• Algorithm
• First tries to issue any open row hits with the “correct”
thread-id (as defined by the current round robin flag)
• Then other row hits
• Then row misses with the “correct” thread-id
• Finally, a random request
• Effects
• Combines the latency benefit of open-row hits with the fairness of a round-robin scheduler (priority order sketched below)
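A sketch of the four-level priority search described above; the request/queue types and helper predicates are illustrative.

#include <stdbool.h>
#include <stddef.h>

struct request { int thread_id; /* ... */ };

extern struct request *read_q[]; extern size_t read_q_len;
extern int  rr_thread;                          /* current round-robin flag */
extern bool can_issue(const struct request *r);
extern bool is_row_hit(const struct request *r);
extern void issue(struct request *r);

void frrr_schedule(void) {
    struct request *row_hit_rr = NULL, *row_hit_any = NULL,
                   *miss_rr    = NULL, *any         = NULL;

    for (size_t i = 0; i < read_q_len; i++) {
        struct request *r = read_q[i];
        if (!can_issue(r))
            continue;
        bool hit = is_row_hit(r);
        bool rr  = (r->thread_id == rr_thread);
        if (hit && rr  && !row_hit_rr)  row_hit_rr  = r;  /* 1st: row hit, "correct" thread  */
        if (hit        && !row_hit_any) row_hit_any = r;  /* 2nd: any row hit                */
        if (!hit && rr && !miss_rr)     miss_rr     = r;  /* 3rd: row miss, "correct" thread */
        if (!any)                       any         = r;  /* 4th: fall back to any issuable
                                                             request (the slides say random;
                                                             first found is used here)       */
    }

    struct request *pick = row_hit_rr  ? row_hit_rr  :
                           row_hit_any ? row_hit_any :
                           miss_rr     ? miss_rr     : any;
    if (pick)
        issue(pick);
}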
Credit-Fair
• Algorithm
• For every channel, this algorithm maintains a set of counters
for credits for each thread
• When scheduling reads, the thread with the most credits is
chosen
• Reads that will be open row hits get a 50% bonus to their
number of credits for that round of arbitration
• When a column read command is issued, that thread’s total
number of credits for using that channel is cut in half
• Each cycle all threads gain one credit
• Write queue draining happens in an FR-FCFS manner
• Effects
• Threads with infrequent DRAM reads store up credits over many cycles, so they have priority when they eventually need them (see the sketch below)
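A rough sketch of the per-channel credit bookkeeping; the constants, array sizes, and helpers are assumptions, and the FR-FCFS write-queue draining mentioned in the slides is omitted.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 16

struct request { int thread_id; /* ... */ };

extern struct request *read_q[]; extern size_t read_q_len;
extern bool can_issue(const struct request *r);
extern bool is_row_hit(const struct request *r);
extern void issue_column_read(struct request *r);

static uint64_t credits[MAX_THREADS];   /* one counter set per channel in a real design */

void credit_fair_schedule_reads(void) {
    /* Every cycle, each thread earns one credit for this channel. */
    for (int t = 0; t < MAX_THREADS; t++)
        credits[t]++;

    struct request *best = NULL;
    uint64_t best_score = 0;
    for (size_t i = 0; i < read_q_len; i++) {
        struct request *r = read_q[i];
        if (!can_issue(r))
            continue;
        uint64_t score = credits[r->thread_id];
        if (is_row_hit(r))
            score += score / 2;          /* open-row hits get a 50% credit bonus for this round */
        if (score > best_score) {
            best_score = score;
            best = r;
        }
    }

    if (best) {
        issue_column_read(best);
        credits[best->thread_id] /= 2;   /* issuing a column read halves that thread's credits */
    }
}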
MLP-Aware
• Algorithm
• Assumes that threads with many outstanding misses (high memory-level parallelism, MLP) are not as limited by memory access time
• Prioritizes requests from low-MLP threads over those from high-MLP threads (one possible scoring is sketched below)
• To support fairness, a request’s wait time in the queue is
also considered
• Writes are handled as in FCFS, with appropriate high and
low water marks
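One possible way to combine MLP and queue waiting time into a priority score, purely as an illustration; the slides do not specify the exact formula, and the weights below are arbitrary.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct request { int thread_id; uint64_t arrival_cycle; /* ... */ };

extern struct request *read_q[]; extern size_t read_q_len;
extern bool     can_issue(const struct request *r);
extern unsigned outstanding_misses(int thread_id);   /* current MLP of the thread */
extern uint64_t now_cycle(void);
extern void     issue(struct request *r);

#define WAIT_WEIGHT 1    /* how strongly waiting time offsets the MLP penalty (illustrative) */
#define MLP_WEIGHT  64   /* penalty per outstanding miss (illustrative)                      */

void mlp_aware_schedule(void) {
    struct request *best = NULL;
    int64_t best_score = INT64_MIN;

    for (size_t i = 0; i < read_q_len; i++) {
        struct request *r = read_q[i];
        if (!can_issue(r))
            continue;
        int64_t wait = (int64_t)(now_cycle() - r->arrival_cycle);
        /* Low-MLP threads get higher priority; long-waiting requests catch up for fairness. */
        int64_t score = WAIT_WEIGHT * wait - MLP_WEIGHT * (int64_t)outstanding_misses(r->thread_id);
        if (score > best_score) {
            best_score = score;
            best = r;
        }
    }
    if (best)
        issue(best);
}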
Parallelism-Aware Batch Scheduling (PAR-BS)
• Recall FR-FCFS
• Exploits the latency benefit when successive requests hit the same row buffer
• PAR-BS is a more sophisticated scheduling policy
• Improves average latency, speedup, and fairness among threads
• Major policies
• Batch formation
• Request prioritization
• Thread ranking
• Misc.
Policy: Batch Formation
Policy: Request Prioritization
Policy: Thread Ranking