CS2115 Chapter-6
2023/2024 Sem A
Chapter 6: Memory
1
Memory
• By the word "memory", we mean the components of a computer system that store instructions and data, from registers and caches down to main memory and disks
2
The Memory Wall
• Processor speed has improved much faster than memory speed, so memory access time increasingly limits overall performance (the "memory wall")
3
Principle of Locality
• Temporal locality: recently accessed items are likely to be accessed again soon
• Spatial locality: items near recently accessed items are likely to be accessed soon
4
Memory Hierarchy
• Organize the memory system into a hierarchy, with faster (but smaller) memory closer to the processor
5
Memory Hierarchy
6
Cache Memory
7
Direct-Mapped Caches
Simplified Example: suppose the cache has 8 lines, the memory has 32 words, and each cache line stores one word.
8
Direct-Mapped Caches
Simplified Example: suppose the cache has 8 lines, the memory has 32 words, and each cache line stores one word.
Cache_Index = Memory_Address mod Cache_Size
Cache_Tag = Memory_Address / Cache_Size
(Cache table columns: Index | Tag | Contents)
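A minimal C sketch of the mapping formulas above (the cache size and the word address used here are illustrative, not from the slides):

    #include <stdio.h>

    #define CACHE_SIZE 8   /* number of cache lines in the simplified example */

    int main(void) {
        unsigned memory_address = 13;                        /* a word address in 0..31 (illustrative) */
        unsigned cache_index = memory_address % CACHE_SIZE;  /* which cache line the word maps to */
        unsigned cache_tag   = memory_address / CACHE_SIZE;  /* identifies which memory block occupies the line */
        printf("address %u -> index %u, tag %u\n", memory_address, cache_index, cache_tag);
        return 0;
    }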
9
Direct-Mapped Caches
• When accessing a cache with 2^W lines:
– Index into the cache with the W index bits of the address
– Read out the valid bit, tag, and data
– If valid bit == 1 and the tag matches -> Cache Hit
– Otherwise -> Cache Miss
– Offset bits: determined by the number of memory blocks stored in each cache line
(A cache line is invalid if its value is no longer consistent with memory, e.g., because the data has been modified by other agents: DMA, other cores, …)
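A minimal C sketch of this lookup, assuming a 64-line, one-word-per-line cache like the one in the later example (the struct and function names are illustrative, not from the slides):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 64                     /* 2^W lines, W = 6 */

    struct cache_line {
        bool     valid;
        uint32_t tag;                        /* upper 24 address bits */
        uint32_t data;                       /* one 32-bit word */
    };

    static struct cache_line cache[NUM_LINES];

    /* Returns true on a cache hit and writes the cached word to *out. */
    bool cache_read(uint32_t addr, uint32_t *out) {
        uint32_t index = (addr >> 2) & 0x3F; /* 6 index bits (byte offset is bits 1:0) */
        uint32_t tag   = addr >> 8;          /* remaining 24 bits are the tag */

        if (cache[index].valid && cache[index].tag == tag) {
            *out = cache[index].data;        /* valid bit set and tag matches -> hit */
            return true;
        }
        return false;                        /* otherwise -> miss */
    }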
Example: 8-block direct-mapped cache (W = 3); 32-bit byte address; each cache line stores one word.
Cache line format: Valid bit | Tag (27 bits) | Data (32 bits); lines indexed 0–7.
Example address: 0000 0000 0000 0000 0000 0000 1110 1000 (0x000000E8).
Access example: OFFSET = 0x0, the tag matches → Hit! Read from cache: 0x42424242.
(Figure: cache contents; e.g., line 63: valid = 1, tag = 0x000058, data = 0xF7324A32.)
11
Direct-Mapped Caches
64-line direct-mapped cache → 64 indexes → 6 index bits
Access example: OFFSET = 0x0, the tag does not match → Miss! Load from memory: 0x12345678; the line's tag is updated to 0x000040.
(Figure: cache contents; e.g., line 63: valid = 1, tag = 0x000058, data = 0xF7324A32.)
12
Direct-Mapped Caches
64-line direct-mapped cache → 64 indexes → 6 index bits
Access example: OFFSET = 0x0 → Hit! Read data: 0x42424242.
(Figure: cache contents; e.g., line 63: valid = 1, tag = 0x000058, data = 0xF7324A32.)
• Would Read Mem[0x00004008] be a hit?
– INDEX = 0x2 → tag mismatch → miss
• What are the addresses of the data in indexes 0, 1, and 2?
– TAG = 0x58 → 0000 0000 0000 0000 0101 1000 iiii ii00 (substitute iiiiii with index 0, 1, 2)
– → 0x5800, 0x5804, 0x5808
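Going the other way, the addresses can be rebuilt from the tag and index; a tiny sketch using the values in this example:

    #include <stdio.h>

    int main(void) {
        unsigned tag = 0x58;
        for (unsigned index = 0; index <= 2; index++) {
            /* address = tag : index : byte offset = (tag << 8) | (index << 2) | 0 */
            unsigned addr = (tag << 8) | (index << 2);
            printf("index %u -> 0x%04X\n", index, addr);  /* prints 0x5800, 0x5804, 0x5808 */
        }
        return 0;
    }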
13
Fully-Associative Cache
• A memory block can be stored in any cache line
(Figure: Cache and Memory; any of the 32 memory blocks 00000–11111 can be placed in any of the 8 cache lines 000–111.)
14
Fully-Associative Cache
• A memory block can be stored in any cache line
Example: 32-bit byte address 0000 0000 0000 0000 0000 0000 1110 1000; each cache line stores 1 word, so the address is split into tag bits and offset bits only (there are no index bits)
• To check if an access is a cache hit
– Need to check all cache lines (cache line 1, cache line 2, …, cache line N) by comparing the tag
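A minimal C sketch of this check, comparing the tag against every line (the 8-line size and the names are illustrative assumptions):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 8

    struct line { bool valid; uint32_t tag; uint32_t data; };
    static struct line cache[NUM_LINES];

    /* Fully associative: every line must be checked for a matching tag. */
    bool fa_lookup(uint32_t tag, uint32_t *out) {
        for (int i = 0; i < NUM_LINES; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                *out = cache[i].data;
                return true;   /* hit */
            }
        }
        return false;          /* miss: a replacement policy must pick a victim */
    }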
15
Fully-Associative Cache
• When a cache miss occurs, we need to load the missed data from memory into the cache
• Some data currently stored in the cache will be replaced
• The replacement policy decides which data is replaced
• Many different policies
– Random, FIFO (First-In-First-Out), LRU (Least Recently Used), …
16
Fully-Associative Cache
• FIFO replacement policy
– Each cache line has an age (young → old)
– When a miss occurs, the loaded data becomes the youngest
• Other data's ages increase by 1
– When a hit occurs, no change
• LRU replacement policy
– Same as FIFO on a miss
– When a hit occurs, the accessed data becomes the youngest
– Previously younger entries' ages increase by 1
(Figure: example with contents a, b, c, d ordered young → old, showing a miss on f and then a hit on b under each policy.)
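The two age-update rules can be sketched as follows; a minimal illustration on a 4-entry fully-associative cache of tags (the sizes and names are assumptions, not from the slides):

    #include <stdbool.h>
    #include <stdint.h>

    #define N 4
    static uint32_t tags[N];
    static bool     valid[N];
    static int      age[N];    /* 0 = youngest, larger = older */

    /* Returns true on a hit; lru = true selects LRU, false selects FIFO. */
    bool access_block(uint32_t tag, bool lru) {
        for (int i = 0; i < N; i++) {
            if (valid[i] && tags[i] == tag) {            /* hit */
                if (lru) {                               /* LRU: hit entry becomes youngest,    */
                    for (int j = 0; j < N; j++)          /* previously younger entries age by 1 */
                        if (valid[j] && age[j] < age[i]) age[j]++;
                    age[i] = 0;
                }                                        /* FIFO: no change on a hit */
                return true;
            }
        }
        /* miss: use an empty entry if there is one, otherwise evict the oldest */
        int victim = 0;
        for (int i = 0; i < N; i++) {
            if (!valid[i]) { victim = i; break; }
            if (age[i] > age[victim]) victim = i;
        }
        for (int i = 0; i < N; i++)                      /* all other entries age by 1 */
            if (valid[i] && i != victim) age[i]++;
        valid[victim] = true;
        tags[victim]  = tag;
        age[victim]   = 0;                               /* loaded entry is the youngest */
        return false;
    }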
17
FIFO Algorithm
• FIFO (First-In-First-Out): replaces the earliest-loaded page
• Example: (figure: a reference sequence and the resulting contents after each access)
18
LRU Algorithm
• LRU (Least Recently Used): replaces the page that has not been used for the longest time
• Example: (figure: a reference sequence and the resulting contents after each access)
19
Direct-Mapped vs. Fully-Associative
• Fully-associative cache is more flexible
– Higher utilization of the resource
• A memory block can be stored in any cache line, so there is a lower chance of conflicts
– More difficult to search
• Need to search the entire cache
20
Set-Associative Cache
• Can we combine the easy searching of direct-mapped and the
flexibility of fully-associative caches?
• Yes: set-associative
– Each cache block can go in only 1 set (same modulo addressing)
– But, each set can have multiple blocks (associative searching)
• Easy to search:
– You know which set based on the address
– Only have to check the few cache blocks in the set
• Flexible:
– Can put each cache block anywhere in the set
– Reduces (but doesn't eliminate) conflict misses
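A minimal C sketch of this two-step lookup, assuming an 8-set, 4-way cache with one word per line (the sizes and names are illustrative, not from the slides):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 8      /* 2^3 sets -> 3 index bits */
    #define WAYS     4      /* lines per set */

    struct line { bool valid; uint32_t tag; uint32_t data; };
    static struct line cache[NUM_SETS][WAYS];

    bool sa_read(uint32_t addr, uint32_t *out) {
        uint32_t set = (addr >> 2) & (NUM_SETS - 1);  /* index bits pick the set (2 byte-offset bits below) */
        uint32_t tag = addr >> 5;                     /* remaining bits are the tag */
        for (int w = 0; w < WAYS; w++) {              /* associative search, but only within one set */
            if (cache[set][w].valid && cache[set][w].tag == tag) {
                *out = cache[set][w].data;
                return true;                          /* hit */
            }
        }
        return false;                                 /* miss */
    }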
21
Set-Associative Cache
Simplified Example (memory blocks map to one of 4 sets by their low address bits):
– Block 8 → 001000 → Set 0
– Block 9 → 001001 → Set 1
– Block 18 → 010010 → Set 2
(Figure: tag bits; memory blocks 0–31 and the cache sets they map to.)
22
Set-Associative Cache
• Address: suppose the cache has 2^3 = 8 sets
32-bit byte address: 0000 0000 0000 0000 0000 0000 1110 1000
23
Set-Associative Cache
(Figure: 4-way set-associative cache organization – the address is split into Tag, Index, and Byte Offset (2 bits); four comparators check the tags of the selected set in parallel and a 4-to-1 multiplexor selects the 32-bit data on a hit.)
24
Quantifying Cache Performance
• With a perfect cache (no misses): CPU time = CPU clock cycles × Clock cycle time
• When cache misses are taken into account:
CPU time = (CPU clock cycles + Memory stall cycles) × Clock cycle time
Memory stall cycles = Memory accesses × Miss rate × Miss penalty
25
Example #1
• Assume that the CPI (cycle per instruction) of a computer is 1. If 30% of the instructions access data
memory, miss penalty is 100 cycles and the overall miss rate of accesses to both data and
instructions is 5%, how much faster the computer be if all instructions were cache hits?
• If all memory accesses are cache hits
CPU timeideal = CPU clock cycles × Clock cycle time IC (instruction count): the
= IC × CPI × Clock cycle time total number of instructions
= IC × 1.0 × Clock cycle time
• With cache misses
Memory stall cycles= IC × (1.0 + 0.3) × miss rate × miss penalty
= IC × 1.3 × 0.05 × 100 × Clock cycle time
CPU timeMemStall = (IC × 1.0 + IC × 6.5) × Clock cycle time
= 7.5 × IC × Clock cycle time
Speedup = 7.5
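The same calculation written out as a small program (the numbers are the ones given in this example):

    #include <stdio.h>

    int main(void) {
        double cpi          = 1.0;    /* base CPI with a perfect cache */
        double mem_per_inst = 1.3;    /* 1 instruction fetch + 0.3 data accesses per instruction */
        double miss_rate    = 0.05;
        double miss_penalty = 100.0;  /* cycles */

        double stall_cpi = mem_per_inst * miss_rate * miss_penalty;  /* 6.5 stall cycles per instruction */
        double speedup   = (cpi + stall_cpi) / cpi;                  /* 7.5 */
        printf("stall cycles/instruction = %.1f, speedup = %.1f\n", stall_cpi, speedup);
        return 0;
    }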
26
Memory Hierarchy
27
RAM
• RAM: Random-Access Memory
– allows data items to be read or written in almost the same amount of time, irrespective of the physical location of the data inside the memory (unlike some other types of memory, where access time depends on the physical location)
• Two major types:
– SRAM (static random-access memory)
– DRAM (dynamic random-access memory)
• SRAM and DRAM are both volatile memory
– The data is lost when powered off
28
SRAM vs DRAM
• SRAM (static random-access memory)
– Uses flip-flops to store bits
– Usually used for caches and registers
• DRAM (dynamic random-access memory)
– Stores each bit as charge on a capacitor, which must be refreshed periodically
– Usually used for main memory
29
SRAM vs DRAM
30
Hard Disk Drive
• HDD: Hard Disk Drive
– an electro-mechanical data storage device that uses magnetic storage on one or more rigid, rapidly rotating platters coated with magnetic material
– HDDs are a type of non-volatile memory (NVM), retaining stored data
even when powered off
– Used to store mass data
Solid State Disk
• SSD: Solid State Disk
– uses integrated circuit assemblies to store data persistently, typically
using flash memory
32
Flash Memory
• Toshiba developed flash memory in the 1980s
• Two main types of flash memory: NAND (for data) and NOR
(for code)
• Started to be heavily used because of digital cameras
• Exponential growth with smartphones/tablets
33
Comparison of SSD and HDD
(Figure: SSD vs. HDD trend from 2009 to 2019.)
34
Basic NAND Flash Cell
• Store charges in the floating gate
• Threshold voltage (Vth) represents data
Floating-Gate Cell
Note: a high threshold voltage represents 0 and a low threshold voltage represents 1
35
Flash Cell Organization
(Figure: flash cells organized into an array; the vertical lines are bit lines, BL.)
36
Read from Flash
• To read a page, the read voltage Vread = 2.5 V is applied to the selected page (page 2 in the figure) and the pass voltage Vpass = 5.0 V is applied to all other pages (pages 1, 3, 4), so that unselected cells always conduct
• A cell whose threshold voltage is below the read voltage conducts and is read as 1 (erased); a cell whose threshold voltage is above it does not conduct and is read as 0 (programmed)
(Figure: example threshold voltages of the cells on pages 1–4, and the floating-gate cell structure: control gate, gate oxide, floating gate, tunnel oxide, source, drain, substrate.)
38
Garbage Collection
• Three basic operations: read, write (page) and erase (block)
• Out-of-place update: a page cannot be overwritten in place, so new data is written to a free page and the old copy is marked invalid
• Garbage collection reclaims the space occupied by invalid pages
– Valid pages in the victim block are copied elsewhere and the block is erased to obtain contiguous free space
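A minimal sketch of out-of-place update with a logical-to-physical page map (all sizes and names are illustrative assumptions; a real flash translation layer is far more involved):

    #define PAGES_PER_BLOCK 4
    #define NUM_BLOCKS      4
    #define NUM_PAGES       (PAGES_PER_BLOCK * NUM_BLOCKS)

    enum page_state { FREE, VALID, INVALID };
    static enum page_state state[NUM_PAGES];
    static int map[NUM_PAGES];            /* logical page -> physical page, -1 = unmapped */

    void ftl_init(void) {
        for (int i = 0; i < NUM_PAGES; i++) map[i] = -1;
    }

    /* Out-of-place update: write the new data to a free page and invalidate the old copy. */
    int write_logical(int lpn) {
        for (int p = 0; p < NUM_PAGES; p++) {
            if (state[p] == FREE) {
                if (map[lpn] >= 0) state[map[lpn]] = INVALID;  /* old copy becomes garbage */
                state[p] = VALID;
                map[lpn] = p;
                return p;
            }
        }
        /* No free page: garbage collection must copy the valid pages of a victim
           block elsewhere and erase the whole block before this write can proceed. */
        return -1;
    }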
39
SLC Threshold Voltage Distribution
• SLC: single level cell
– Each cell stores 1 bit
(Figure: two threshold-voltage distributions; cells with a threshold voltage below Vread are read as 1 (lower voltage), cells above Vread are read as 0 (higher voltage).)
40
MLC Threshold Voltage Distribution
• MLC: multi-level cell
– Each cell stores 2 bits, using four threshold-voltage states separated by three read voltages V1, V2, V3
(Figure: from the lowest-voltage state to the highest, the LSBs are 1, 1, 0, 0 and the MSBs are 1, 0, 0, 1.)
41
Different Densities of Flash
(Figure: threshold-voltage distributions for SLC (2 states: 1, 0), MLC (4 states, read voltages V1–V3), TLC (8 states, read voltages V1–V7), and QLC (16 states), together with the bit pattern assigned to each state.)
• In general, storing n bits per cell requires 2^n threshold-voltage states, separated by 2^n - 1 read voltages
42
Comparison among Different Types
Types of flash | SLC | MLC | TLC | QLC
Bits/cell | 1 | 2 | 3 | 4
P/E cycles (K) | 100 | 3 | 1 | <1
Read time (µs) | 30 | 50 | 75 | 110
Program time (µs) | 300 | 600 | 1000 | 2000
Erase time (µs) | 1500 | 3000 | 4500 | 5000
Price ($/GB) | 0.37 | 0.33 | 0.2 | 0.12
43
Exercise
• Suppose you have a $100 budget to build the memory hierarchy of a computer. The cost of SRAM is $10/MByte, the cost of DRAM is $1/MByte, and the cost of Flash is $0.1/MByte.
• There are 3 options:
– Option 1: $50 for SRAM, $49 for DRAM and $1 for Flash
– Option 2: $10 for SRAM, $20 for DRAM and $70 for Flash
– Option 3: $60 for SRAM, $5 for DRAM and $35 for Flash
• Which option do you think is the best? Why?
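As a starting point (not the answer), the capacity each option buys at the stated prices works out as follows:

    #include <stdio.h>

    int main(void) {
        /* Dollars spent on SRAM, DRAM, Flash for each option */
        double spend[3][3] = { {50, 49,  1},
                               {10, 20, 70},
                               {60,  5, 35} };
        double price[3] = {10.0, 1.0, 0.1};   /* $/MByte for SRAM, DRAM, Flash */

        for (int o = 0; o < 3; o++)
            printf("Option %d: %.0f MB SRAM, %.0f MB DRAM, %.0f MB Flash\n",
                   o + 1, spend[o][0] / price[0], spend[o][1] / price[1], spend[o][2] / price[2]);
        return 0;
    }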
44
Exercise
• In a flash memory, a high voltage (2.5 V–5 V) represents 0 and a low voltage (0 V–2.5 V) represents 1. Suppose the current voltage in each cell is as shown in the following figure. What voltages should we apply to pages 1, 2, 3, and 4, respectively, in order to read the value stored in the cell in the blue circle in the picture?
45