Lec8 Memory
Memories: Review
• SRAM:
– value is stored on a pair of inverting gates
– very fast but takes up more area than DRAM (4 to 6 transistors per bit)
• DRAM:
– value is stored as a charge on capacitor (must be refreshed)
– very small but slower than SRAM (factor of 5 to 10)
Exploiting Memory Hierarchy
1997 figures:
• SRAM access times are 2-25 ns, at a cost of $100 to $250 per MB.
• DRAM access times are 60-120 ns, at a cost of $5 to $10 per MB.
• Disk access times are 10 to 20 million ns, at a cost of $0.10 to $0.20 per MB.
[Figure: the memory hierarchy from Level 1 (closest to the CPU) down to Level n; access time increases with distance from the CPU.]
Locality
• If an item is referenced:
– temporal locality: it will tend to be referenced again soon
– spatial locality: items near it will tend to be referenced soon
Cache
• Two issues:
– How do we know if a data item is in the cache?
– If it is, how do we find it?
• Our first example:
– block size is one word of data
– "direct mapped": each memory block maps to exactly one cache location
(e.g., lots of items at the lower level share locations in the upper level)
[Figure: an eight-block direct-mapped cache (indices 000-111) below a larger memory; each memory address maps to the cache block given by its low-order three bits, so the addresses shown (00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101) all map to block 001, 101, etc.]
Direct Mapped Cache
[Figure: a direct-mapped cache with 1024 entries. The address supplies a 20-bit tag and a 10-bit index (entries 0-1023); a hit requires the indexed entry's valid bit set and its stored tag equal to the address tag, and the entry's 32-bit data is returned.]
[Figure: a 64 KB direct-mapped cache with four-word (16-byte) blocks: 16-bit tag, 12-bit index into 4K entries, and a 2-bit block offset that selects one of four 32-bit words from the 128-bit block through a multiplexor.]
Hits vs. Misses
• Read hits
– this is what we want!
• Read misses
– stall the CPU, fetch block from memory, deliver to cache, restart
• Write hits:
– write the data into both the cache and memory (write-through)
– write the data only into the cache, and copy it back to memory later (write-back)
• Write misses:
– read the entire block into the cache, then write the word
Hardware Issues
[Figure: three memory organizations: (a) a one-word-wide memory, (b) a wide memory and bus with a multiplexor between the cache and the CPU, and (c) an interleaved memory with multiple banks on a one-word-wide bus.]
Performance
[Figure: miss rate (0%-35%) versus block size (4 to 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB; miss rate falls as block size grows, then turns back up for large blocks in small caches.]
Performance
• Simplified model:
– execution time = (execution cycles + stall cycles) × cycle time
– stall cycles = # of instructions × miss ratio × miss penalty
• Two ways of improving performance:
– decreasing the miss ratio
– decreasing the miss penalty
Set Associative Caches
[Figure: locating a block in a direct-mapped, a set-associative, and a fully associative cache; the direct-mapped cache compares one tag, the set-associative cache compares all tags within the selected set in parallel, and the fully associative cache compares every tag.]
Decreasing miss ratio with associativity
[Figure: the same total capacity organized with different associativities: one-way set associative (direct mapped) with one tag/data pair per block (blocks 0-7), and two-way set associative with two tag/data pairs per set (sets 0-3).]
An implementation
[Figure: a four-way set-associative cache with 256 sets. The address supplies a 22-bit tag and an 8-bit index (sets 0-255); the four tags in the indexed set are compared in parallel, and a 4-to-1 multiplexor selects the data from the way that hit, producing the Hit and Data outputs.]
Performance
[Figure: miss rate (0%-15%) versus associativity (one-way, two-way, four-way, eight-way) for cache sizes from 1 KB to 128 KB; increasing associativity lowers the miss rate, with the largest gains for the smaller caches.]
• Advantages:
– Miss ratio decreases as associativity increases
• Disadvantages:
– Extra memory needed for extra tag bits in cache
– Extra time for associative search
Block Replacement Policies
• On a miss, which block in the set should be replaced?
– e.g., least recently used (LRU): evict the block that has gone unused the longest
Example
Cache size = 4 one word blocks
Replacement Policy = LRU
Sequence of memory references 0,8,0,6,8
Set associativity = 4 (Fully Associative); Number of Sets = 1
Address | Hit/Miss | Cache contents after access
0       | M        | 0
8       | M        | 0 8
0       | H        | 0 8
6       | M        | 0 8 6
8       | H        | 0 8 6
Example cont’d
Cache size = 4 one word blocks
Replacement Policy = LRU
Sequence of memory references 0,8,0,6,8
Set associativity = 2 ; Number of Sets = 2
(block addresses 0, 8, and 6 all map to set 0, since each is even)
Address | Hit/Miss | Set 0 contents
0       | M        | 0
8       | M        | 0 8
0       | H        | 0 8
6       | M        | 0 6   (LRU block 8 evicted)
8       | M        | 8 6   (LRU block 0 evicted)
Example cont’d
Cache size = 4 one word blocks
Replacement Policy = LRU
Sequence of memory references 0,8,0,6,8
Set associativity = 1 (Direct Mapped Cache)
(0 and 8 both map to block 0; 6 maps to block 2)
Address | Hit/Miss | Blk 0 | Blk 1 | Blk 2 | Blk 3
0       | M        | 0     |       |       |
8       | M        | 8     |       |       |
0       | M        | 0     |       |       |
6       | M        | 0     |       | 6     |
8       | M        | 8     |       | 6     |
Decreasing miss penalty with multilevel caches
• Example:
– CPI of 1.0 on a 500 MHz machine with a 5% miss rate, 200 ns DRAM access
– Adding a 2nd-level cache with 20 ns access time decreases the miss rate to 2%
[Figure: improvement factor (log scale, 1 to 100) versus year (1980-1996) for fast CPUs, slow CPUs, and DRAM; CPU performance improves far faster than DRAM, so the processor-memory gap grows each year.]
[Figure: miss rate per type (0%-14%) versus cache size (1 KB to 128 KB); capacity misses fall steadily as the cache grows.]
Virtual Memory
• Main memory can act as a cache for the secondary storage (disk)
[Figure: virtual addresses translated to physical addresses in main memory or to disk addresses]
• Advantages:
– illusion of having more physical memory
– program relocation
– protection
Recall: Each MIPS program has an address space of size 2^32 bytes.
[Figure: the MIPS user address space from address 0 upward: a reserved segment at 0, the text segment starting at pc = 0040 0000 (hex), dynamic data above it, and the stack growing down from $sp = 7fff ffff (hex).]
[Figure: address translation. A 32-bit virtual address (virtual page number in bits 31-12, page offset in bits 11-0) is mapped to a physical address; translation replaces the virtual page number with a physical page number and leaves the page offset unchanged.]
Page Tables
[Figure: a page table indexed by virtual page number. Each entry holds a valid bit and either a physical page number or a disk address; valid entries (1) point to pages in physical memory, while invalid entries (0) point to the page's location in disk storage.]
Page Tables
[Figure: translation through the page table. The 20-bit virtual page number (address bits 31-12) indexes the page table, whose entry supplies an 18-bit physical page number; this is concatenated with the 12-bit page offset to form the physical address. If the entry's valid bit is 0, the page is not present in memory.]
Making Address Translation Fast
[Figure: a TLB caching page-table entries. Each TLB entry holds a valid bit, a tag (the virtual page number), and the physical page address; on a TLB miss the full page table is consulted, whose valid entries point into physical memory and whose invalid entries point to disk storage.]
[Figure: flowchart for a memory reference. The virtual address accesses the TLB; a TLB miss raises an exception, while a TLB hit yields the physical address. A read then tries the cache; a write first checks the write access bit, which must be on for the access to proceed, and data is finally delivered to the CPU.]
Modern Systems
• Very complicated memory systems:
Characteristic    | Intel Pentium Pro                  | PowerPC 604
Virtual address   | 32 bits                            | 52 bits
Physical address  | 32 bits                            | 32 bits
Page size         | 4 KB, 4 MB                         | 4 KB, selectable, and 256 MB
TLB organization  | Separate instruction and data TLBs | Separate instruction and data TLBs
                  | Four-way set associative           | Two-way set associative
                  | Pseudo-LRU replacement             | LRU replacement
                  | Instruction TLB: 32 entries        | Instruction TLB: 128 entries
                  | Data TLB: 64 entries               | Data TLB: 128 entries
                  | TLB misses handled in hardware     | TLB misses handled in hardware
Some Issues
• Trends:
– synchronous SRAMs (provide a burst of data)
– redesign DRAM chips to provide higher bandwidth or processing
– restructure code to increase locality
– use prefetching (make cache visible to ISA)