Computer Architecture: Cache Memory
Outline
Cache Memory Introduction
Memory Hierarchy
Direct-Mapped Cache
Set-Associative Cache
Cache Performance
Introduction
Memory access time is important to performance!
Users want large memories with fast access times (ideally, unlimited fast memory).
Principle of locality: programs access a relatively small portion of their address space at any instant of time.
Levels of the Memory Hierarchy
[Figure: levels of the memory hierarchy, with the CPU at the top]
Cache
[Figure: the processor exchanges words with a small, fast cache; the cache exchanges blocks with a large, inexpensive (slow) main memory]

The processor performs all memory operations through the cache.

Miss – If the requested word is not in the cache, a block of words containing the requested word is brought into the cache, and then the processor request is completed.

Hit – If the requested word is in the cache, the read or write operation is performed directly in the cache, without accessing main memory.

Block – the minimum amount of data transferred between the cache and main memory.
Direct-Mapped Placement
A block can only go into one place in the cache.
The index for block placement is usually given by some low-order bits of the block's address.
[Figure: a small direct-mapped cache (columns Index, Valid, Tag, Data; index values 00 through 11) in front of a main memory with addresses 0000xx through 1111xx, where the two low-order bits (xx) select the byte within the word]

Q1: Is it there? Compare the cache tag with the two high-order memory address bits to tell if the memory block is in the cache.

Q2: How do we find it? Use the next two low-order memory address bits (the index) to determine which cache block to use, i.e., (block address) modulo (# of blocks in the cache).
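As a concrete illustration of these two questions, here is a minimal C sketch (not from the slides; the structure and names are assumptions) that splits an address into byte offset, index, and tag using the field widths shown above, answers Q2 with the modulo rule, and answers Q1 by checking the valid bit and comparing tags.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Assumed layout matching the figure: 2-bit byte offset, 2-bit index, 2-bit tag. */
    #define OFFSET_BITS 2
    #define INDEX_BITS  2
    #define NUM_BLOCKS  (1 << INDEX_BITS)

    struct cache_line {
        bool     valid;
        uint32_t tag;
        uint32_t data;              /* one word per block in this example */
    };

    static struct cache_line cache[NUM_BLOCKS];

    /* Q2: which cache block? (block address) modulo (# of blocks in the cache) */
    static uint32_t index_of(uint32_t addr) {
        return (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1);
    }

    /* Q1: is it there? The entry must be valid and its tag must match. */
    static bool is_hit(uint32_t addr) {
        uint32_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
        const struct cache_line *line = &cache[index_of(addr)];
        return line->valid && line->tag == tag;
    }

    int main(void) {
        uint32_t addr = 0x2C;       /* 101100: tag 10, index 11, byte offset 00 */
        printf("index = %u, hit = %d\n", index_of(addr), is_hit(addr));
        return 0;
    }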
Direct-Mapped Cache
Block placement: (block address) modulo (# of blocks in the cache)

[Figure: direct-mapped cache of 8 blocks, block size = 1 word, in front of a 32-word main memory (addresses 00000 through 11111); each word address splits into a 2-bit tag and a 3-bit index, e.g., memory address 11101 → tag 11, index 101]
[Figure: direct-mapped cache of 4 blocks, block size = 2 words, in front of the same 32-word main memory; each word address splits into a 2-bit tag, a 2-bit index, and a 1-bit block offset, e.g., memory address 11101 → tag 11, index 10, block offset 1]
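For the multi-word-block version of the figure, the low-order bit selects the word within the block before the index and tag are extracted. A short C helper showing that split (word-addressed, as in the figure; the parameters are taken from the slide, everything else is an illustrative assumption):

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed split for the 4-block, 2-words-per-block cache in the figure:
       5-bit word address = 2-bit tag | 2-bit index | 1-bit block offset. */
    static void split(uint32_t addr) {
        uint32_t block_offset = addr & 0x1;
        uint32_t index        = (addr >> 1) & 0x3;
        uint32_t tag          = (addr >> 3) & 0x3;
        printf("addr %u -> tag %u, index %u, block offset %u\n",
               addr, tag, index, block_offset);
    }

    int main(void) {
        split(0x1D);   /* 11101 -> tag 11, index 10, block offset 1, as on the slide */
        return 0;
    }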
[Figure: cache example for a 32-word, byte-addressable main memory; the cache address consists of a tag plus a 2-bit byte offset, with no index field, and the LRU block is chosen for replacement; e.g., memory address 11101 00 → tag 11101, byte offset 00]
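Because the cache address above has no index field, any memory block may be placed in any cache entry (a fully associative arrangement), so a lookup must compare the tag against every entry and a miss evicts the least recently used one. The following C sketch is an assumed illustration of that search; the entry count and bookkeeping are not from the slides:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_ENTRIES 8           /* assumed cache size, for illustration only */

    struct fa_line {
        bool     valid;
        uint32_t tag;
        uint64_t last_used;         /* timestamp used for LRU bookkeeping */
    };

    static struct fa_line cache[NUM_ENTRIES];
    static uint64_t now;            /* advances on every access */

    /* Search every entry; return its slot on a hit, or -1 on a miss. */
    static int lookup(uint32_t tag) {
        for (int i = 0; i < NUM_ENTRIES; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                cache[i].last_used = ++now;    /* hit: mark most recently used */
                return i;
            }
        }
        return -1;
    }

    /* Victim on a miss: an invalid entry if any, else the least recently used. */
    static int lru_victim(void) {
        int victim = 0;
        for (int i = 0; i < NUM_ENTRIES; i++) {
            if (!cache[i].valid) return i;
            if (cache[i].last_used < cache[victim].last_used) victim = i;
        }
        return victim;
    }

    int main(void) {
        uint32_t tag = 0x1D;        /* tag 11101 from the slide's example */
        if (lookup(tag) < 0) {      /* miss: fill the LRU victim */
            int v = lru_victim();
            cache[v].valid = true;
            cache[v].tag = tag;
            cache[v].last_used = ++now;
        }
        printf("second access hit: %d\n", lookup(tag) >= 0);
        return 0;
    }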
Two-Way Set-Associative Cache
[Figure: two-way set-associative cache in front of a 32-word main memory; the cache has 4 sets (index values 00 through 11), each holding two tagged blocks (e.g., set 00 holds tags 000 and 011); the memory address splits into a 3-bit tag, a 2-bit index, and a 2-bit byte offset, e.g., 111 01 00 → tag 111, index 01, byte offset 00; an arrow marks the block being requested, and the LRU block within the indexed set is replaced on a miss]
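A hedged C sketch of the lookup-and-replace behaviour implied by the figure (4 sets, 2 ways, LRU within a set); the names and the example address decoding are illustrative, and no data array is modeled:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Two-way set-associative cache matching the figure: 4 sets (2-bit index),
       3-bit tag, 2-bit byte offset; LRU within each set. */
    #define NUM_SETS 4
    #define WAYS     2

    struct way { bool valid; uint32_t tag; };

    struct set {
        struct way way[WAYS];
        int lru;                      /* which way was used least recently: 0 or 1 */
    };

    static struct set sets[NUM_SETS];

    static bool access_addr(uint32_t addr) {
        uint32_t index = (addr >> 2) & 0x3;    /* skip the 2-bit byte offset */
        uint32_t tag   = (addr >> 4) & 0x7;    /* 3-bit tag */
        struct set *s  = &sets[index];

        for (int w = 0; w < WAYS; w++) {
            if (s->way[w].valid && s->way[w].tag == tag) {
                s->lru = 1 - w;                /* the other way becomes LRU */
                return true;                   /* hit */
            }
        }
        /* miss: replace the LRU way of this set */
        int victim = s->lru;
        s->way[victim].valid = true;
        s->way[victim].tag   = tag;
        s->lru = 1 - victim;
        return false;
    }

    int main(void) {
        /* 111 01 00 -> tag 111, index 01, byte offset 00, as in the slide */
        printf("hit = %d\n", access_addr(0x74));
        return 0;
    }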
Page Replacement Algorithms
The FIFO Policy
Treats the page frames allocated to a process as a circular buffer:
when the buffer is full, the oldest page is replaced; hence first-in, first-out.
A frequently used page is often the oldest, so it will be repeatedly paged out by FIFO.
Simple to implement: requires only a pointer that circles through the page frames of the process.
FIFO Page Replacement
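A minimal C simulation of the FIFO policy described above, treating the frames as a circular buffer and counting page faults; the frame count and reference string are illustrative assumptions, not taken from the original example figure.

    #include <stdbool.h>
    #include <stdio.h>

    #define FRAMES 3                  /* assumed number of page frames */

    int main(void) {
        int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3};   /* illustrative reference string */
        int n = sizeof refs / sizeof refs[0];

        int frame[FRAMES];
        int next = 0;                 /* pointer that circles through the frames */
        int used = 0, faults = 0;

        for (int i = 0; i < n; i++) {
            bool present = false;
            for (int f = 0; f < used; f++)
                if (frame[f] == refs[i]) { present = true; break; }

            if (!present) {
                faults++;
                if (used < FRAMES) {
                    frame[used++] = refs[i];          /* a free frame is available */
                } else {
                    frame[next] = refs[i];            /* evict the oldest page */
                    next = (next + 1) % FRAMES;
                }
            }
        }
        printf("FIFO page faults: %d\n", faults);
        return 0;
    }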
Optimal Page Replacement
The Optimal policy selects for replacement the page that will not be used for the longest period of time.
Impossible to implement (we would need to know the future), but it serves as a standard against which to compare the other algorithms we shall study.
Optimal Page Replacement
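Since OPT needs to know the future, it can only be evaluated offline on a recorded reference string. A C sketch of such a simulation, with an assumed reference string and frame count:

    #include <stdbool.h>
    #include <stdio.h>

    #define FRAMES 3                  /* assumed number of page frames */

    /* Position of the next use of 'page' at or after 'from'; n means never again. */
    static int next_use(int page, int refs[], int n, int from) {
        for (int i = from; i < n; i++)
            if (refs[i] == page) return i;
        return n;
    }

    int main(void) {
        int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3};   /* illustrative reference string */
        int n = sizeof refs / sizeof refs[0];
        int frame[FRAMES];
        int used = 0, faults = 0;

        for (int i = 0; i < n; i++) {
            bool present = false;
            for (int f = 0; f < used; f++)
                if (frame[f] == refs[i]) { present = true; break; }
            if (present) continue;

            faults++;
            if (used < FRAMES) {
                frame[used++] = refs[i];
            } else {
                /* evict the resident page whose next use lies farthest in the future */
                int victim = 0, farthest = -1;
                for (int f = 0; f < FRAMES; f++) {
                    int nu = next_use(frame[f], refs, n, i + 1);
                    if (nu > farthest) { farthest = nu; victim = f; }
                }
                frame[victim] = refs[i];
            }
        }
        printf("OPT page faults: %d\n", faults);
        return 0;
    }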
The LRU Policy
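LRU replaces the page that has not been referenced for the longest time; by the principle of locality, that page is the one least likely to be needed soon. A small C simulation using per-frame timestamps (the reference string and frame count are again illustrative assumptions):

    #include <stdio.h>

    #define FRAMES 3                  /* assumed number of page frames */

    int main(void) {
        int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3};   /* illustrative reference string */
        int n = sizeof refs / sizeof refs[0];
        int frame[FRAMES], last[FRAMES];   /* page held by each frame, time of last use */
        int used = 0, faults = 0;

        for (int t = 0; t < n; t++) {
            int hit = -1;
            for (int f = 0; f < used; f++)
                if (frame[f] == refs[t]) { hit = f; break; }

            if (hit >= 0) {
                last[hit] = t;                         /* refresh recency on a hit */
            } else {
                faults++;
                if (used < FRAMES) {
                    frame[used] = refs[t];
                    last[used++] = t;
                } else {
                    int victim = 0;                    /* least recently used frame */
                    for (int f = 1; f < FRAMES; f++)
                        if (last[f] < last[victim]) victim = f;
                    frame[victim] = refs[t];
                    last[victim] = t;
                }
            }
        }
        printf("LRU page faults: %d\n", faults);
        return 0;
    }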
Comparison of OPT with LRU
[Figure: example comparison of OPT and LRU]
Comparison of FIFO with LRU
[Figure: example comparison of FIFO and LRU]
Cache Performance
When caches were originally introduced, the typical system had a single cache. More recently, the use of multiple caches has become the norm. If a reference misses in the first-level (L1) cache and there is no second-level cache, the processor must go across the system bus to main memory. Due to the typically slow bus speed and slow memory access time, this results in poor performance. On the other hand, if an L2 SRAM (static RAM) cache is used, then frequently the missing information can be quickly retrieved. If the SRAM is fast enough to match the bus speed, then the data can be accessed using a zero-wait-state transaction, the fastest type of bus transfer.
Some high-performance systems also include an additional L3 cache, which sits between the L2 cache and main memory. Its arrangement differs, but the principle is the same.
The cache is placed both physically closer and logically closer to the CPU
than the main memory.
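One common way to quantify how much an L2 cache helps is average memory access time (AMAT): the L1 hit time plus the L1 miss rate times the cost of going to the next level. The sketch below uses purely illustrative latencies and miss rates, chosen only to show the calculation:

    #include <stdio.h>

    /* Average memory access time with and without an L2 cache.
       All latencies (in cycles) and miss rates are illustrative assumptions. */
    int main(void) {
        double l1_hit = 1.0,  l1_miss_rate = 0.05;
        double l2_hit = 10.0, l2_miss_rate = 0.25;
        double mem    = 100.0;

        double amat_l1_only = l1_hit + l1_miss_rate * mem;
        double amat_with_l2 = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem);

        printf("AMAT, L1 only: %.2f cycles\n", amat_l1_only);
        printf("AMAT, L1 + L2: %.2f cycles\n", amat_with_l2);
        return 0;
    }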