Ch 4A: Cache Memory (2)
The memory system
• The memory unit is an essential component of any digital computer, since it is needed for storing programs and data
• Not all stored information is needed by the CPU at the same time
• It is therefore more economical to use low-cost storage devices as a backup for information that is not currently being used by the CPU
Characteristics of Computer Memory
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organization
Access Method
• Associative
– This is a random-access type of memory that enables a comparison of desired bit locations within a word for a specified match
– Access time is independent of location or previous access
– A word is retrieved based on a portion of its contents rather than its address
– e.g., many cache systems
Performance
• Three performance parameters are used:
• Access time (TA)
– the time it takes to perform a read or write operation
• Memory cycle time
– additional time may be required for the memory to “recover” before the next access
– Cycle time = recovery time + access time
• Transfer rate (R, in bits/second)
– the rate at which data can be transferred into or out of a memory unit; for random-access memory, it is equal to 1/(cycle time)
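As a quick worked example (illustrative numbers only, not from a specific device): if the cycle time of a random-access memory is 100 ns, then 1/(100 × 10⁻⁹ s) = 10⁷ word transfers per second; with a 32-bit word, R = 3.2 × 10⁸ bits/second.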
Physical Types
• Semiconductor
– RAM (DRAM or SRAM)
• Magnetic
– Disk & Tape
• Optical
– CD & DVD
• Others
– Bubble
– Hologram
Physical Characteristics
• Volatility
– Volatile memory: contents are lost when power is removed
– Magnetic-surface memories are nonvolatile
– Semiconductor memory may be either volatile or nonvolatile
• Erasability
– Non-erasable memory cannot be altered
– e.g., ROM (Read-Only Memory)
Organization
• For random-access memory, the organization is a key design
issue
• Organization refers to the physical arrangement of bits to
form words.
The Memory Hierarchy
• Going down the hierarchy:
– decreasing cost per bit
– increasing capacity
– increasing access time (slower access)
– decreasing frequency of access of the memory by the processor
The main memory occupies a central position: it communicates directly with the CPU, and with auxiliary memory devices through an I/O processor.
Cache Memory
• Cache is a small amount of fast memory
• Purpose is to exploit Locality of Reference
• Contains copies of portions of main memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module
[Figure: Cache and Main Memory – a single cache sits between the CPU and main memory]
Cache operation - overview
• CPU requests a word (contents of a memory location)
• Check cache for this word
• If present (= cache hit), get from cache (fast)
• If not present (= cache miss), read the block of main memory that contains the word into the cache
• Simultaneously, deliver the word to the CPU
• Cache includes tags to identify which block of main memory is
in each cache slot
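A minimal sketch of this hit/miss flow in Python (a toy model with an assumed block size; mapping and replacement policy are deliberately ignored here):

# Toy model of a cache read: check the cache, fetch a whole block on a miss.
BLOCK_SIZE = 4                                # words per block (assumed)

def read_word(address, cache, memory):
    block_no = address // BLOCK_SIZE          # which memory block holds the word
    if block_no not in cache:                 # cache miss
        start = block_no * BLOCK_SIZE
        cache[block_no] = memory[start:start + BLOCK_SIZE]  # copy block into cache
    return cache[block_no][address % BLOCK_SIZE]            # deliver word to CPU

memory = list(range(64))                      # pretend memory: word i holds value i
cache = {}
print(read_word(42, cache, memory))           # miss: block 10 is fetched first
print(read_word(43, cache, memory))           # hit: the block is already cached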
[Figure: Cache Read Operation flowchart]
Cache/Main Memory Structure
• Because there are fewer cache lines than main memory blocks, an
algorithm is needed for mapping main memory blocks into cache
lines
• Three techniques can be used: direct, associative, and set associative
Direct Mapping pros & cons
• Each block of main memory maps to only one cache line
– i.e. if a block is in cache, it must be in one specific
place
• Simple
• Inexpensive
• Fixed location for given block
– A given block always maps to the same line in the cache
– If a program repeatedly accesses two blocks that map to the same line, the miss rate becomes very high
– This is called thrashing (see the sketch below)
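A minimal Python sketch of the direct-mapping computation (the line count is assumed for illustration):

# Direct mapping: each block can live in exactly one cache line.
NUM_LINES = 4                           # number of cache lines (assumed)

def direct_map(block_number):
    line = block_number % NUM_LINES     # the single line this block maps to
    tag = block_number // NUM_LINES     # distinguishes blocks sharing that line
    return line, tag

# Blocks 0, 4, 8, ... all map to line 0; alternating between any two of them
# evicts the other on every access -- the thrashing case described above.
print(direct_map(0), direct_map(4), direct_map(8))   # (0, 0) (0, 1) (0, 2)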
Associative Mapping
• A main memory block can load into any line of cache
• The line is determined by a replacement algorithm
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
– This is done in parallel for all tags
– Requires complex circuitry
• Cache searching gets expensive (a lookup sketch follows below)
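A sketch of the associative lookup in Python; in hardware the tag comparisons happen simultaneously, so the sequential loop below is only a software stand-in:

# Associative mapping: a block may occupy any line, so every tag is examined.
cache = [(7, "block 7 data"), None, (3, "block 3 data"), None]  # (tag, data) per line

def lookup(tag):
    for entry in cache:                 # hardware compares all tags in parallel
        if entry is not None and entry[0] == tag:
            return entry[1]             # hit
    return None                         # miss: replacement algorithm picks a line

print(lookup(3))                        # hit
print(lookup(5))                        # miss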
Set Associative Mapping
• Combination of direct and associative mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• Direct (mod) mapping identifies the set; a replacement algorithm chooses a line within that set (see the sketch below)
• Thus a block maps to any line in a given set
– e.g. Block B can be in any line of set i
• E.g. 2 lines per set
– Called 2-way associative mapping
– A given block can be in one of 2 lines in a particular
set
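A sketch of a 2-way set-associative lookup in Python (set count assumed): the mod operation picks the set, and only that set’s lines are searched:

# 2-way set-associative: direct (mod) mapping to a set, associative search inside it.
NUM_SETS = 4
cache = [[] for _ in range(NUM_SETS)]   # each set holds up to 2 (tag, data) pairs

def lookup(block_number):
    s = block_number % NUM_SETS         # set index (the "direct" part)
    tag = block_number // NUM_SETS
    for entry_tag, data in cache[s]:    # search only the lines in this set
        if entry_tag == tag:
            return data                 # hit
    return None                         # miss: replace a line within this set

cache[2].append((1, "block 6 data"))    # block 6 -> set 6 % 4 = 2, tag 6 // 4 = 1
print(lookup(6))                        # hit
print(lookup(10))                       # miss (same set 2, but tag 2)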
Replacement Algorithms for:
Associative & Set Associative Mapping
• Once the cache has been filled, when a new block is brought into the cache, one
of the existing blocks must be replaced
• How do we determine which block in cache should be replaced?
• For the associative and set-associative techniques, a replacement algorithm is needed.
• Least recently used (LRU)
– replace the block that has gone the longest without being referenced (an LRU sketch follows below)
• First-in first-out (FIFO)
– replace the block that has been in the cache longest
• Least frequently used (LFU)
– replace the block that has had the fewest hits
• Random
– performs only slightly worse than the usage-based algorithms
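An LRU replacement sketch in Python using OrderedDict (an illustrative software model; a 2-way set-associative cache can implement LRU in hardware with a single USE bit per set):

from collections import OrderedDict

class LRUCache:
    """Evicts the entry that has gone the longest without being referenced."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()               # block_number -> block data

    def access(self, block_number, fetch_block):
        if block_number in self.entries:
            self.entries.move_to_end(block_number) # hit: now most recently used
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
            self.entries[block_number] = fetch_block(block_number)
        return self.entries[block_number]

cache = LRUCache(2)
cache.access(1, lambda b: f"block {b}")
cache.access(2, lambda b: f"block {b}")
cache.access(1, lambda b: f"block {b}")            # touch 1, so 2 becomes LRU
cache.access(3, lambda b: f"block {b}")            # evicts block 2, not block 1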
Cache Write Policy
• In addition to determining which victim to select for replacement,
designers must also decide what to do with so-called dirty blocks of cache,
or blocks that have been modified.
• Need to ensure coherence of the cache and main memory
• Must not overwrite a cache block with a main memory block unless main
memory is up to date
• Main memory might be out of date for a number of reasons
– Multiple CPUs may have individual caches
– I/O may address main memory directly
• Techniques
– Write through
– Write back
Write through
• When the processor writes into a cache block, the corresponding block in main memory is written with the new data at the same time
• The write-through policy thus updates both the cache and main memory on every write
• This is slower than write-back, but ensures that the cache is always consistent with main memory
• All writes go to main memory as well as to the cache
• Generates a lot of memory traffic and slows down writes
Write back
• When the processor writes into a cache block, that block is tagged as dirty using a 1-bit flag; before a dirty block is replaced by a new block, it is copied back to main memory
• Minimizes memory writes
• Updates are initially made in the cache only
• The UPDATE (dirty) bit for a cache slot is set when the update occurs (both policies are sketched below)
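A side-by-side sketch of the two policies in Python (a simplified word-level model; real caches apply these at block granularity):

def write_through(cache, memory, address, value):
    cache[address] = value
    memory[address] = value             # every write also updates main memory

def write_back(cache, dirty, address, value):
    cache[address] = value
    dirty[address] = True               # only set the dirty bit; main memory is
                                        # updated when the block is later evicted

def evict(cache, dirty, memory, address):
    if dirty.pop(address, False):       # dirty block: copy back before replacing
        memory[address] = cache[address]
    cache.pop(address, None)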
Line Size
• Larger line (block) sizes and therefore fewer lines
can be used
– Beneficial at first, because more of each fetched block is used and the hit ratio increases
– But a line size that is too large replaces useful cache contents with far-away words of the block that are less likely to be needed
– Beyond that point, the probability of using the newly fetched data becomes less than the probability of reusing the data it replaced
– This reduces the hit ratio
• There is no “universal optimum” between block size and hit ratio
– Depends on program characteristics
– 8 to 32 byte block size seems to work well in most
situations
Number of caches
• When caches were originally introduced, the typical system had a
single cache
• More recently, the use of multiple caches has become the norm
• Two aspects of this design issue
– multilevel caches (L1 vs L2)
– unified versus split caches
• multilevel caches
– two-level cache, with the internal cache designated as
level 1 (L1) and the external cache designated as level 2
(L2)
– on-chip cache (internal cache) reduces the processor’s
external bus activity and therefore speeds up execution
times and increases overall system performance
Unified vs. Split Cache
• Unified cache: data and instructions are cached in the same cache
• Split cache: separate caches for data and instructions
• Advantages of unified cache
– Balances possible imbalance between amount of data
and instructions in a program
– Only one cache needs to be manufactured
• Advantage of split cache
– Eliminates contention between the instruction fetch unit and the execution unit
– Supports pipelining and speculative execution