Large and Fast: Exploiting Memory Hierarchy
[Figure: memory hierarchy with split L1 instruction and data caches, a unified L2 cache, and a path to main memory]
Memory hierarchy operation
(4) Else, get it from I/O (Chapter 8)
[Figure: on a miss, the block is brought in from main memory]
Searching the cache
Need a way to determine whether the desired instruction or data is held in the cache
Need a scheme for replacing blocks when a new block needs to be brought in on a miss
Cache organization alternatives
Direct mapped: each block can be placed in only one cache location
[Figure: memory blocks mapping onto the sets of a direct mapped cache]
Searching a direct mapped cache
Need log2(number of sets) address bits (the index) to select the block location
Block offset bits used to select the desired byte, half-word, or word within the block
Remaining bits (the tag) used to determine if this is the desired block or another that shares the same cache location
Example: assume a cache with 16 byte blocks and 8 sets
4 block offset bits
3 index bits
25 tag bits (for a 32-bit memory address)
[Figure: memory address split into tag, index, and block offset fields]
Searching a direct mapped cache
Block is placed in set: (block address) mod (number of sets)
For a direct mapped cache, number of sets = cache size / block size
[Figure: cache-contents snapshots after each reference for two access patterns; in access pattern 2 (A, A, B, A), A and B map to the same set, so B evicts A and the final reference to A misses]
Reducing capacity misses
Increase the cache size
More cache blocks can be simultaneously held in the cache
Drawback: increased access time
Block replacement policy
Determines what block to replace on a cache
miss to make room for the new block
Least recently used (LRU)
Pick the one that has been unused for the longest time
Based on temporal locality
Requires ordering bits to be kept with each set
Too expensive beyond 4-way
Random
Pseudo-randomly pick a block
Generally not as effective as LRU (higher miss rates)
Simple even for highly associative organizations
The tradeoffs in increasing page size are similar to those for cache block size
Virtual and physical addresses
The virtual addresses in your program are
translated during program execution into
the physical addresses used to address
memory
Some virtual addresses may refer to data
that is not in memory (not memory
resident)
Address translation
The virtual page number (vpn) part of the
virtual address is translated into a physical
page number (ppn) that points to the
desired page
The low-order page offset bits point to
which byte is being accessed within a page
The ppn + page offset form the physical
address
[Figure: translating a 4GB virtual address space; the vpn selects the page (the location of its first byte), and the page offset selects the addressed byte among the 2^(page offset bits) bytes within the page]
Address translation
Address translation is performed by the
hardware and the operating system (OS)
The address of a page table entry is formed by adding the vpn (an offset into the page table) to the page table base register
The page table is located in memory: requires loads/stores to access!
The TLB: faster address translation
Major problem: for each instruction or data
access we have to first access the page
table in memory to get the physical address
Solution: cache the address translations in a
Translation Lookaside Buffer (TLB) in
hardware
The TLB holds the ppns of the most recently
accessed pages
Hardware first checks the TLB for the
vpn/ppn pair; if not found (TLB miss), then
the page table is accessed to get the ppn
The ppn is loaded into the TLB for later
access