04 - Cache Memory
Computer Organization
and Architecture
7th Edition
Chapter 4
Cache Memory
Characteristics
• Location
• Capacity
• Addressable units (N = 2^A, where A is the number of address-line bits; e.g., A = 10 address lines → N = 2^10 = 1024 addressable units)
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location
• CPU
• Internal (main)
• External (secondary)
Capacity
• Word size
—The natural unit of organisation
• Number of words
—or Bytes
Unit of Transfer
• Internal
—Usually governed by data bus width
• External
—Usually a block which is much larger than a
word
• Addressable unit
—Smallest location which can be uniquely
addressed
—Word internally
Access Methods (1)
• Sequential
—Start at the beginning and read through in
order
—Access time depends on location of data and
previous location
—e.g. tape
• Direct
—Individual blocks have unique address
—Access is by jumping to vicinity plus
sequential search
—Access time depends on location and previous
location
—e.g. disk
Access Methods (2)
• Random
—Individual addresses identify locations exactly
—Access time is independent of location or
previous access
—e.g. RAM
• Associative
—Random in nature. Data is located by a
comparison with contents of a portion of the
store
—Access time is independent of location or
previous access
—e.g. cache
Performance
• Access time
—Time between presenting the address and getting the valid data (random-access memory); time to position the read-write mechanism (non-random-access memory)
• Memory Cycle time
—Time may be required for the memory to
“recover” before next access
—Cycle time is access + recovery
• Transfer Rate
—Rate at which data can be moved (for random access, rate = 1/cycle time)
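As a quick worked example of these definitions (the timing values below are hypothetical, chosen only for illustration):

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical random-access memory timings (not from the text) */
    double access_ns   = 60.0;  /* time from address to valid data    */
    double recovery_ns = 40.0;  /* time needed before the next access */
    double cycle_ns    = access_ns + recovery_ns;  /* cycle = access + recovery */

    /* For random-access memory, transfer rate = 1 / cycle time */
    double rate_mps = 1e3 / cycle_ns;  /* millions of accesses per second */

    printf("cycle time    = %.0f ns\n", cycle_ns);        /* 100 ns */
    printf("transfer rate = %.0f million accesses/s\n", rate_mps);  /* 10 */
    return 0;
}
```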
Method of Accessing Units of Data
• Sequential access, direct access, random access, associative access
Going down the memory hierarchy:
• Decreasing frequency of access by the processor
• Increasing access time
• Decreasing cost/bit
• Increasing capacity
[Figure: performance of a simple two-level memory; average access time ranges from T1 (always found in the faster level) to T1+T2 (always found in the slower level)]
Cache Sizes of Some Processors
[Table omitted; two values separated by a slash refer to instruction and data caches]
Mapping Example
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s − r) bits
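A minimal sketch of extracting the three direct-mapping fields from an address. The concrete bit widths (24-bit address, w = 2, r = 14, s = 22) are assumed example values consistent with the deck's 24-bit example addresses, not a requirement of the technique:

```c
#include <stdio.h>
#include <stdint.h>

/* Direct mapping: address = | tag (s-r bits) | line (r bits) | word (w bits) | */
enum { W = 2, R = 14, S = 22 };  /* assumed example widths */

int main(void) {
    uint32_t addr = 0x16339C;  /* arbitrary 24-bit address */

    uint32_t word = addr & ((1u << W) - 1);         /* low w bits         */
    uint32_t line = (addr >> W) & ((1u << R) - 1);  /* next r bits        */
    uint32_t tag  = addr >> (W + R);                /* remaining s-r bits */

    printf("addr=0x%06X  tag=0x%02X  line=0x%04X  word=%u\n",
           addr, tag, line, word);
    return 0;
}
```

The line field picks exactly one cache line, so a lookup needs only one tag comparison.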
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for a given block
—If a program repeatedly accesses two blocks that map to the same line, cache misses are very high (thrashing)
Associative Mapping
• A main memory block can load into any
line of cache
• Memory address is interpreted as tag and
word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
Fully Associative Cache Organization
Associative Mapping Example
Associative Mapping Address Structure
• Tag: 22 bits
• Word: 2 bits
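With the 22-bit tag / 2-bit word split above, field extraction reduces to a shift and a mask; a sketch, assuming the same 24-bit addresses:

```c
#include <stdint.h>

/* Associative mapping: no line field; the whole block number is the tag. */
static uint32_t assoc_tag (uint32_t addr) { return addr >> 2; }   /* upper 22 bits */
static uint32_t assoc_word(uint32_t addr) { return addr & 0x3u; } /* lower 2 bits  */
```

On a lookup this tag must be compared against every line's tag in parallel, which is what makes associative searching expensive in hardware.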
Set Associative Mapping Example
• 13-bit set number
• Block number in main memory is taken modulo 2^13
• 000000, 008000, ..., FF8000 map to the same set
k-Way Set Associative Cache Organization
Set Associative Mapping Address Structure
• Tag: 9 bits
• Set: 13 bits
• Word: 2 bits
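A sketch tying the 9/13/2 split to the example above; the addresses are the slide's 24-bit values, and the helper names are invented for illustration:

```c
#include <stdio.h>
#include <stdint.h>

/* Set associative mapping: address = | tag (9) | set (13) | word (2) | */
enum { W = 2, D = 13 };  /* d = 13 set bits -> 2^13 sets */

static uint32_t set_of(uint32_t addr) { return (addr >> W) & ((1u << D) - 1); }
static uint32_t tag_of(uint32_t addr) { return addr >> (W + D); }

int main(void) {
    /* Addresses 0x8000 apart differ only in the tag field, */
    /* so they all fall into the same set (set 0 here).     */
    uint32_t addrs[] = { 0x000000, 0x008000, 0xFF8000 };
    for (int i = 0; i < 3; i++)
        printf("addr=0x%06X  tag=0x%03X  set=%u\n",
               addrs[i], tag_of(addrs[i]), set_of(addrs[i]));
    return 0;
}
```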
Once the cache has been filled, when a new block is brought into the cache one of the existing blocks must be replaced. For direct mapping there is only one possible line for any particular block, and no choice is possible.
If at least one write operation has been performed on a word in that line of the cache, then main memory must be updated by writing the line of cache out to the block of memory before bringing in the new block.

A more complex problem occurs when multiple processors are attached to the same bus and each processor has its own local cache: if a word is altered in one cache, it could conceivably invalidate a word in other caches.
Write through
• All writes go to main memory as well as
cache
• Multiple CPUs can monitor main memory
traffic to keep local (to CPU) cache up to
date
• Lots of traffic
• Slows down writes
Write back
• Updates initially made in cache only
• Update bit for cache slot is set when
update occurs
• If block is to be replaced, write to main
memory only if update bit is set
• Other caches get out of sync
• I/O must access main memory through
cache (portions of memory are invalid)
• 15% of memory references are writes (33% to 50% in HPC applications)
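A toy sketch contrasting the two policies; the one-line cache, the field names, and the word-sized blocks are simplifications invented for illustration:

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy one-line cache with word-sized blocks. */
typedef struct {
    uint32_t tag;    /* doubles as the memory address here */
    uint32_t data;
    bool     valid;
    bool     dirty;  /* the "update bit": set on write-back writes */
} Line;

static uint32_t memory[1024];

/* Write through: update the cache AND main memory on every write. */
void write_through(Line *l, uint32_t addr, uint32_t value) {
    if (l->valid && l->tag == addr) l->data = value;
    memory[addr] = value;  /* always goes to memory -> lots of traffic */
}

/* Write back: update the cache only; mark the line dirty. */
void write_back(Line *l, uint32_t addr, uint32_t value) {
    if (l->valid && l->tag == addr) { l->data = value; l->dirty = true; }
    /* miss handling omitted in this sketch */
}

/* On replacement, a dirty line must be flushed before reuse. */
void evict(Line *l) {
    if (l->valid && l->dirty) memory[l->tag] = l->data;
    l->valid = false;
    l->dirty = false;
}

int main(void) {
    Line l = { .tag = 7, .data = 0, .valid = true, .dirty = false };
    write_back(&l, 7, 42);  /* memory[7] is stale at this point...     */
    evict(&l);              /* ...until the dirty line is written back */
    printf("memory[7] = %u\n", memory[7]);  /* prints 42 */
    return 0;
}
```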
Possible approaches to cache coherency:
•Bus watching with write through
Each cache controller monitors the address
lines to detect write operations to memory
by other bus masters
•Hardware transparency
Additional hardware is used to ensure that
all updates to memory are reflected to all
caches
•Noncacheable memory
More than one processor shares a portion of
memory which is designated as
noncacheable.
Line Size
•As the block size increases from very small to
larger sizes, the hit ratio will at first increase
(principle of locality)
•The hit ratio will begin to decrease as the block
becomes even bigger (the probability of using the
newly fetched information becomes less than the
probability of reusing the information that has to be replaced)
•Larger blocks reduce the number of blocks that
fit into cache
•As a block becomes larger, each additional word
is farther from the requested one.
Line Size
When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved. As the block size increases, more useful data are brought into the cache. Two specific effects come into play:
• Larger blocks reduce the number of blocks that fit into a cache
• As a block becomes larger, each additional word is farther from the requested word
The on-chip cache reduces the processor’s external bus activity, speeds up
execution, and increases overall system performance
When the requested instruction or data is found in the on-chip cache, the bus
access is eliminated
On-chip cache accesses complete appreciably faster than even
zero-wait-state bus cycles
During this period the bus is free to support other transfers
Two-level cache:
Internal cache designated as level 1 (L1)
External cache designated as level 2 (L2)
Potential savings due to the use of an L2 cache depends on the hit rates
in both the L1 and L2 caches
The use of multilevel caches complicates all of the design issues related
to caches, including size, replacement algorithm, and write policy
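To make concrete how the savings depend on both hit rates, here is the usual average-access-time estimate; all timing and hit-rate numbers below are hypothetical:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical parameters, not from the text */
    double t1 = 1.0, t2 = 10.0, tmem = 100.0;  /* ns: L1, L2, main memory   */
    double h1 = 0.95, h2 = 0.80;  /* L1 hit rate; L2 hit rate on L1 misses */

    /* Misses fall through each level, accumulating its access time. */
    double avg = t1 + (1.0 - h1) * (t2 + (1.0 - h2) * tmem);

    printf("average access time = %.2f ns\n", avg);  /* 1 + 0.05*(10+20) = 2.50 */
    return 0;
}
```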
Hit Ratio (L1 & L2)
[Figure: total hit ratio (L1 and L2) plotted for 8-Kbyte and 16-Kbyte L1 caches]
Split Caches
• Recently, it has become common to split the
cache into two: one dedicated to instructions, one
dedicated to data.
• Two potential advantages of unified cache:
—It balances the load between instruction and
data fetches automatically
—Only one cache needs to be designed and
implemented
• The key advantage of the split cache is that it
eliminates contention between the instruction
fetch/decode unit and the execution unit in
superscalar machines (parallel instruction
execution and the prefetching of predicted future
instructions, pipelining)
Pentium 4 Cache
• 80386 – no on chip cache
• 80486 – 8k using 16 byte lines and four way set
associative organization
• Pentium (all versions) – two on chip L1 caches
— Data & instructions
• Pentium III – L3 cache added off chip
• Pentium 4
— L1 caches
– 8k bytes
– 64 byte lines
– four way set associative
— L2 cache
– Feeding both L1 caches
– 256k
– 128 byte lines
– 8 way set associative
— L3 cache on chip
Intel Cache Evolution

Problem: External memory slower than the system bus.
Solution: Add external cache using faster memory technology.
First appears: 386

Problem: Increased processor speed results in the external bus becoming a bottleneck for cache access.
Solution: Move the external cache on-chip, operating at the same speed as the processor.
First appears: 486

Problem: Internal cache is rather small, due to limited space on chip.
Solution: Add external L2 cache using faster technology than main memory.
First appears: 486