Memory Hierarchy Design
Chapter 2
Increase in memory requests/sec
Increase in memory accesses/sec
= 409.6 GB/s!
[Figure sequence: MEMORY holds words 009 to 024 in rows of four (009 010 011 012, 013 014 015 016, 017 018 019 020, 021 022 023 024) next to a CACHE. The last frame is marked CACHE MISS! and shows the row 017 018 019 020 in the CACHE.]
[Figure: the CACHE has four block frames (Block 0, Block 1, Block 2, Block 3); MEMORY is divided into blocks of four words.]
MEMORY
Block 2: 009 010 011 012
Block 3: 013 014 015 016
Block 4: 017 018 019 020
Block 5: 021 022 023 024
[Figure sequence: lookup in a cache with two sets (Set 0, Set 1). The address is split into TAG <2>, Index <1>, and Block Offset <1>; each data entry is <8> bits wide. Memory addresses 1000 to 1111 form blocks B4 to B7, with B4 and B6 mapping to Set 0 and B5 and B7 mapping to Set 1.]
INDEX: selects set
TAG: used to check for cache hit if valid bit (VB) is 1.
Frames shown:
Tag comparison =: CACHE HIT
Tag comparison =: CACHE MISS. WHY?
Tag comparison ≠: CACHE MISS
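The INDEX and TAG roles can be sketched in Python. This is a minimal illustration, not the slide's own code: the names `split_address`, `lookup`, and `cache` are hypothetical, the field widths (TAG 2 bits, index 1 bit, block offset 1 bit) are read off the figure, and the contents of Set 1 are invented to show an invalid entry.

```python
def split_address(addr):
    """Split a 4-bit address into (tag, index, offset) with widths 2, 1, 1."""
    tag = (addr >> 2) & 0b11      # upper two bits
    index = (addr >> 1) & 0b1     # selects the set
    offset = addr & 0b1           # selects the word inside the block
    return tag, index, offset


def lookup(cache, addr):
    """Return (hit, data). A hit needs VB == 1 and a matching tag."""
    tag, index, offset = split_address(addr)
    entry = cache[index]
    if entry["valid"] == 1 and entry["tag"] == tag:
        return True, entry["data"][offset]
    return False, None


# Set 0 holds block B4 (addresses 1000 and 1001) with the data bytes from the figure.
# Set 1 is an assumed invalid entry whose stale tag happens to equal B5's tag.
cache = [
    {"valid": 1, "tag": 0b10, "data": [0b11101111, 0b10001001]},
    {"valid": 0, "tag": 0b10, "data": [0, 0]},
]

hit_case = lookup(cache, 0b1001)       # tag equal, VB = 1
invalid_case = lookup(cache, 0b1010)   # tag equal, VB = 0
mismatch_case = lookup(cache, 0b1101)  # tag differs (block B6)
```

Under these assumptions `hit_case` is a hit, while the other two are misses: one because the valid bit is 0 even though the tags match, the other because the tags differ.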
[Figure sequence: the CACHE holds 017 018 019 020 in Block 1 (Block 2 and Block 3 are also shown) and word 018 is requested. Frames are marked CACHE HIT! and CACHE MISS!; in the last frame Block 1 holds 021 022 023 024. MEMORY holds words 009 to 024 in rows of four.]
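The hit, miss, and replacement behaviour in these frames can be sketched with a deliberately tiny model. The code below assumes a cache with a single block frame and four-word blocks, with words numbered from 001 so that words 009 to 012 fall in memory block 2. The class `OneFrameCache` and the helper names are hypothetical, and a real cache would have many frames and a placement policy.

```python
BLOCK_WORDS = 4  # words per block, as in the figure


def block_of(word):
    """Memory block number of a word (words are numbered from 1)."""
    return (word - 1) // BLOCK_WORDS


def words_in(block):
    """List the word numbers stored in a memory block."""
    first = block * BLOCK_WORDS + 1
    return list(range(first, first + BLOCK_WORDS))


class OneFrameCache:
    """A cache with one block frame; every miss replaces its contents."""

    def __init__(self):
        self.block = None  # nothing cached yet

    def access(self, word):
        wanted = block_of(word)
        if self.block == wanted:
            return "HIT"
        self.block = wanted  # fetch the whole block from memory
        return "MISS"


frame = OneFrameCache()
trace = [frame.access(w) for w in (18, 18, 19, 22)]
final_contents = words_in(frame.block)
```

The first access to word 018 misses and brings in words 017 to 020; the next two accesses hit; the access to 022 misses and replaces the frame with words 021 to 024.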
IC*(0.014*26+(0.336+0.35)*1+0.006*51+(0.144+0.15)*26)*CC = 9 * IC * CC
IC*(0.007*26+(0.334+0.35)*1+0.003*51+(0.147+0.15)*26)*CC = 8.741 * IC * CC
IC*(0.014*11+(0.336+0.35)*1+0.006*21+(0.144+0.15)*11)*CC = 4.2 * IC * CC
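The three coefficients can be checked with a few lines of Python. This sketch only re-evaluates the arithmetic; it reads each product as a (fraction, cycles) pair and treats IC and CC as common factors, which is an assumption about the slide's notation. The names `cycles_per_instruction` and `cases` are hypothetical.

```python
def cycles_per_instruction(terms):
    """Sum fraction * cycles over all (fraction, cycles) pairs."""
    return sum(fraction * cycles for fraction, cycles in terms)


# Each list mirrors one line of the slide, term by term.
cases = {
    "first": [(0.014, 26), (0.336 + 0.35, 1), (0.006, 51), (0.144 + 0.15, 26)],
    "second": [(0.007, 26), (0.334 + 0.35, 1), (0.003, 51), (0.147 + 0.15, 26)],
    "third": [(0.014, 11), (0.336 + 0.35, 1), (0.006, 21), (0.144 + 0.15, 11)],
}

results = {name: round(cycles_per_instruction(terms), 3) for name, terms in cases.items()}
```

Rounded to three decimals, the results agree with the factors 9, 8.741, and 4.2 that multiply IC * CC on the slide.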
The figure on slide 36 shows that the access time for the 4-way L1 cache is approximately 1.4 times that of
the 2-way associative cache.
Avg. access time for 4-way associative = HitTime + MissRate * MissPenalty
= 1.4 * 1 + 0.037 * 15 = 1.955
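The same formula as a small Python helper, in case other parameter values need to be tried. The function name `average_access_time` is hypothetical; the inputs are the ones used above (hit time 1.4 * 1, miss rate 0.037, miss penalty 15).

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time = HitTime + MissRate * MissPenalty."""
    return hit_time + miss_rate * miss_penalty


# 4-way associative case from the slide: hit time is 1.4 times the baseline of 1.
four_way = average_access_time(1.4 * 1, 0.037, 15)
```

With these inputs `four_way` evaluates to 1.955 up to floating-point rounding, matching the value above.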
No write buffering
Write buffering
Register prefetch
Loads data into register
Cache prefetch
Loads data into cache
Cycle time
Minimum time between unrelated requests
to memory
SRAMs don’t need to refresh => cycle
time and access time are very close.
Some optimizations:
Multiple accesses to same row (using row buffer)
Synchronous DRAM (SDRAM)
Added clock to DRAM interface
Burst mode with critical word first
Wider interfaces (DDR2 and DDR3)
Double data rate (DDR): transfer data on both the rising and falling
edges of the clock.
Multiple independent banks on each DRAM device =>
address = (bank #, row #, column #)
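The (bank #, row #, column #) decomposition can be sketched as bit slicing. The field widths below (3 bank bits, 14 row bits, 10 column bits) and the bit order are assumptions chosen for illustration, not values from the slides; real devices and memory controllers use their own mappings. The helper names are hypothetical.

```python
BANK_BITS, ROW_BITS, COL_BITS = 3, 14, 10  # assumed widths, for illustration only


def split_dram_address(addr):
    """Split an address into (bank, row, column), bank in the top bits."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col


def same_open_row(addr_a, addr_b):
    """True when both addresses fall in the same row of the same bank,
    so the second access could be served from the row buffer."""
    bank_a, row_a, _ = split_dram_address(addr_a)
    bank_b, row_b, _ = split_dram_address(addr_b)
    return (bank_a, row_a) == (bank_b, row_b)


example = (5 << (ROW_BITS + COL_BITS)) | (7 << COL_BITS) | 3
fields = split_dram_address(example)  # bank 5, row 7, column 3
```

Two addresses that differ only in their column bits land in the same row of the same bank, which is the case the row-buffer optimization targets; accesses to different banks can proceed independently.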
Role of architecture:
Provide user mode and supervisor mode
Protect certain aspects of CPU state
Provide mechanisms for switching between user
mode and supervisor mode
Provide mechanisms to limit memory accesses
Provide TLB to translate addresses
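How address translation and mode-based protection fit together can be sketched as follows. This is a toy model under stated assumptions: a 4096-byte page, a TLB represented as a Python dict, and a single `supervisor_only` flag per entry. The names `translate`, `ProtectionFault`, and `tlb` are hypothetical, and real hardware handles a TLB miss by walking the page table rather than raising an error.

```python
PAGE_SIZE = 4096  # assumed page size


class ProtectionFault(Exception):
    """Raised when user mode touches a supervisor-only page."""


def translate(tlb, vaddr, supervisor):
    """Translate a virtual address to a physical one using the TLB."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # virtual page number, page offset
    entry = tlb.get(vpn)
    if entry is None:
        raise KeyError("TLB miss")  # placeholder for a page-table walk
    if entry["supervisor_only"] and not supervisor:
        raise ProtectionFault(f"user access to page {vpn}")
    return entry["frame"] * PAGE_SIZE + offset


tlb = {
    0: {"frame": 5, "supervisor_only": False},
    1: {"frame": 9, "supervisor_only": True},
}

user_ok = translate(tlb, 0x10, supervisor=False)
kernel_ok = translate(tlb, PAGE_SIZE + 4, supervisor=True)
try:
    translate(tlb, PAGE_SIZE + 4, supervisor=False)
    user_blocked = False
except ProtectionFault:
    user_blocked = True
```

The same virtual address succeeds in supervisor mode and faults in user mode, which is the kind of limit on memory accesses the list above describes.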