
Chapter 4

Computer Memory System

The memory system
• The memory unit is an essential component of any digital computer, since it is needed for storing programs and data
• Not all accumulated information is needed by the CPU at the same time
• Therefore, it is more economical to use low-cost storage devices as a backup for storing the information that is not currently used by the CPU
Characteristics of Computer Memory
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organization
Location

• Location refers to whether memory is internal or external to the computer
• Internal
– Main memory (RAM), cache memory
• External
– Secondary memory – hard disks
– Removable media – ZIP, CD-ROM, Tape
Capacity
• Word size
– The natural unit of organization of memory
– For internal memory, capacity is expressed in terms of bytes (1 byte = 8 bits) or words
– For example, on a 32-bit machine, the word is 32 bits long
– On many machines the address length equals the word length, so the word size determines the size of the address space
– Common word lengths: 8, 16, 32, 64 bits
– External memory capacity is typically expressed in terms of bytes
• Addressable units are typically words
– N = 2^A, where N = size of the address space and A = address length (in bits)
– On disks the addressable units are blocks or clusters
– E.g. the most memory you can address with 32 bits is 2^32 bytes = 4 GB
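
A quick check of the N = 2^A relationship, using the 32-bit example above (a minimal sketch; the GB conversion assumes byte-addressable units):

    A = 32                       # address length in bits
    N = 2 ** A                   # size of the address space: N = 2^A
    print(N)                     # 4294967296 addressable units
    print(N // 2**30, "GB")      # 4 GB when each addressable unit is one byte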
Unit of Transfer
• For main memory, the unit of transfer is
– the number of bits read out of or written into memory at a time
– determined by the data bus width
– possibly different from the word or the addressable unit
• For external memory
– usually a block, which is much larger than a word
Access Methods
• Sequential
– Start at the beginning and read through in order
– Access time depends on the location of the data and the previous location
– e.g. tape
• Direct
– Individual blocks or records have a unique address based on physical location
– Access is accomplished by direct access to reach a general vicinity, plus sequential searching, counting, or waiting to reach the final location
– Access time is variable: it depends on the location and the previous location, but is much faster than sequential access
– e.g. disk
Access Methods
• Random
– Each addressable location in memory has a unique, physically wired-in addressing mechanism
– The time to access a given location is independent of the sequence of prior accesses and is constant
– Thus, any location can be selected at random and directly addressed and accessed
– e.g. main memory, some caches

• Associative
– A random-access type of memory that enables one to compare desired bit locations within a word for a specified match
– A word is retrieved based on a portion of its contents rather than its address
– Access time is independent of location or previous accesses
– e.g. many cache systems
Performance
• Three performance parameters are used:
• Access time (TA)
– the time it takes to perform a read or write operation
• Memory cycle time
– time may be required for the memory to “recover” before the next access
– cycle time = access time + recovery time
• Transfer rate (R, in bits/second)
– the rate at which data can be transferred into or out of a memory unit; for random-access memory, it is equal to 1/(cycle time)
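
As a quick illustration (with hypothetical timing values, not from any particular device), this sketch relates the three parameters for a random-access memory:

    # Hypothetical timings for a random-access memory.
    access_time_ns = 60        # time to perform one read or write
    recovery_time_ns = 40      # extra time before the next access can begin

    cycle_time_ns = access_time_ns + recovery_time_ns   # 100 ns per cycle
    transfer_rate = 1 / (cycle_time_ns * 1e-9)          # R = 1/(cycle time)

    print(f"Cycle time:    {cycle_time_ns} ns")
    print(f"Transfer rate: {transfer_rate:.1e} words/second")   # 1.0e+07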
Physical Types

• Semiconductor
– RAM (DRAM or SRAM)
• Magnetic
– Disk & Tape
• Optical
– CD & DVD
• Others
– Bubble
– Hologram
Physical Characteristics
• Volatility
– Volatile memory: without power, the contents are erased
– Magnetic-surface memories are nonvolatile
– Semiconductor memory can be volatile or nonvolatile
• Erasable
– Non-erasable memory cannot be altered
– E.g. ROM = Read-Only Memory
Organization
• For random-access memory, the organization is a key design issue
• Organization refers to the physical arrangement of bits to form words
The Memory Hierarchy

• The design constraints on a computer’s memory can be summed up by three questions:
– How much? (capacity)
– How fast? (performance)
– How expensive? (cost)
• There is a trade-off among the three key characteristics of memory: capacity, access time, and cost
• The following relationships hold:
– faster access time, greater cost per bit
– greater capacity, smaller cost per bit
– greater capacity, slower access time
Memory Hierarchy
• The computer memory hierarchy is a pyramid structure that is commonly used to illustrate the significant differences among memory types
• The memory unit that directly communicates with the CPU is called the main memory
• Devices that provide backup storage are called auxiliary memory
• The memory hierarchy system consists of all storage devices employed in a computer system, from the slow but high-capacity auxiliary memory, to the faster main memory, to an even smaller and faster cache memory
Memory Hierarchy - Diagram
[Pyramid diagram: lower levels have lower cost, bigger capacity, and slower / less frequent access]
As one goes down the memory hierarchy, one finds:
– decreasing cost/bit,
– increasing capacity,
– increasing (slower) access time, and
– decreasing frequency of access of the memory by the processor.
The main memory occupies a central position by being able to communicate directly with the CPU, and with auxiliary memory devices through an I/O processor.
Cache Memory
• Cache is a small amount of fast memory
• Purpose is to exploit Locality of Reference
• Contains copies of portions of main memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module

[Figure: Cache and Main Memory – single cache]
Cache operation - overview
• CPU requests a word (the contents of a memory location)
• The cache is checked for this word
• If present (a cache hit), the word is fetched from the cache (fast)
• If not present (a cache miss), the block of main memory containing the word is read into the cache
• Simultaneously, the word is delivered to the CPU
• The cache includes tags to identify which block of main memory is in each cache slot
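
A toy model of this hit/miss flow (hypothetical sizes, not any real processor), with the block number serving as the tag:

    BLOCK_SIZE = 4                    # words per block
    cache = {}                        # block number (tag) -> list of words
    main_memory = list(range(64))     # 64 words of fake data

    def read_word(address):
        block = address // BLOCK_SIZE          # which block holds this word
        if block in cache:                     # cache hit: serve from the cache
            print(f"hit  addr {address}")
        else:                                  # cache miss: fetch the whole block
            print(f"miss addr {address}")
            start = block * BLOCK_SIZE
            cache[block] = main_memory[start:start + BLOCK_SIZE]
        return cache[block][address % BLOCK_SIZE]   # deliver the word to the CPU

    read_word(9)    # miss: block 2 is loaded
    read_word(10)   # hit: same block is already cached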
Cache Read Operation
Cache/Main Memory Structure

For mapping purposes:
– M = 2^n / K blocks in main memory
– C << M blocks (lines) in the cache
where n = address length (in bits), K = number of words per block, C = number of cache lines, and M = number of main-memory blocks.
Cache Design
• Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address.
• For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. That is, there are M = 2^n / K blocks in main memory.
• The cache consists of m blocks, called lines. Each line contains K words, plus a tag of a few bits.
• The length of a line, not including tag and control bits, is the line size.
• The line size may be as small as 32 bits, with each “word” being a single byte; in this case the line size is 4 bytes.
• The number of lines is considerably less than the number of main-memory blocks (m << M).
• Because there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block. Thus, each line includes a tag that identifies which particular block is currently being stored.
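
A worked example of these quantities, using hypothetical parameters chosen only to make the formulas concrete:

    n = 24                    # address length in bits -> 2^24 addressable words
    K = 4                     # words per block
    m = 16384                 # cache lines (2^14)

    M = 2**n // K             # main-memory blocks: M = 2^n / K = 4,194,304
    print(f"main-memory blocks M = {M}")
    print(f"cache lines m        = {m}  (m << M, so each line needs a tag)")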
Typical Cache Organization
Elements of Cache Design
Cache Addressing
• Where does the cache sit?
– Between the processor and the (virtual) memory management unit, or
– Between the MMU and main memory
• A logical (virtual) cache stores data using virtual addresses
– The processor accesses the cache directly, not through the MMU
– Cache access is faster, since it happens before MMU address translation
– But virtual addresses use the same address space for different applications, so the cache must be flushed on each context switch
• A physical cache stores data using main-memory physical addresses
– For reads from and writes to main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory
Logical and Physical Caches
Cache Size
• Cost
– A larger cache is more expensive
• Speed
– A larger cache is faster, but only up to a point
– Beyond that point, access slows down, because checking a large cache for data takes more time
– As a result, large caches tend to be slightly slower than small ones
Mapping Function

• Because there are fewer cache lines than main-memory blocks, an algorithm is needed for mapping main-memory blocks into cache lines
• Three techniques can be used: direct, associative, and set-associative
Direct Mapping pros & cons
• Each block of main memory maps to only one cache line: line i = block j mod m, for a cache of m lines
– i.e. if a block is in the cache, it must be in one specific place
• Simple
• Inexpensive
• Fixed location for a given block
– A given block always maps to the same line in the cache
– If a program repeatedly accesses 2 blocks that map to the same line, the cache miss rate is very high
– This is called thrashing (illustrated in the sketch below)
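
A minimal sketch of direct-mapped lookup (hypothetical sizes), including how two blocks that share a line evict each other:

    NUM_LINES = 8
    lines = [None] * NUM_LINES     # tag of the cached block in each line, or None

    def access(block):
        line = block % NUM_LINES   # direct mapping: line i = block j mod m
        tag = block // NUM_LINES   # tag distinguishes blocks sharing this line
        if lines[line] == tag:
            print(f"block {block}: hit  (line {line})")
        else:
            print(f"block {block}: miss (line {line}), old contents evicted")
            lines[line] = tag

    # Blocks 0 and 8 both map to line 0, so alternating between them thrashes:
    for block in (0, 8, 0, 8):
        access(block)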
Associative Mapping
• A main memory block can load into any line of cache
• The line is determined by a replacement algorithm
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
– This is done in parallel for all tags
– Requires complex circuitry
• Cache searching gets expensive
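
A sketch of fully associative lookup (hypothetical size); the parallel tag comparison is modeled here as a sequential search, and the victim choice is a placeholder for a real replacement algorithm:

    NUM_LINES = 4
    tags = [None] * NUM_LINES          # tag stored in each line

    def access(block):
        if block in tags:              # hardware examines every tag in parallel
            print(f"block {block}: hit")
            return
        # Miss: use a free line if one exists; otherwise a replacement
        # algorithm (LRU, FIFO, ...) would choose the victim.
        victim = tags.index(None) if None in tags else 0
        tags[victim] = block
        print(f"block {block}: miss, loaded into line {victim}")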
Set Associative Mapping
• A combination of direct and associative mapping
• The cache is divided into a number of sets
• Each set contains a number of lines
• Direct (mod) mapping identifies the set; a replacement algorithm chooses a line within the set
• Thus a block maps to any line in a given set
– e.g. block B can be in any line of set i
• E.g. with 2 lines per set
– called 2-way set-associative mapping
– a given block can be in one of 2 lines in a particular set (see the sketch below)
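
A minimal sketch of a 2-way set-associative cache (hypothetical geometry): the block number selects a set by mod mapping, and the block may sit in either line of that set:

    NUM_SETS = 4
    WAYS = 2                                   # 2 lines per set: 2-way
    sets = [[None] * WAYS for _ in range(NUM_SETS)]

    def access(block):
        s = block % NUM_SETS                   # direct (mod) mapping picks the set
        tag = block // NUM_SETS
        if tag in sets[s]:                     # associative search within the set
            print(f"block {block}: hit in set {s}")
            return
        # Miss: placeholder victim choice; a real design would run a
        # replacement algorithm within the set.
        way = sets[s].index(None) if None in sets[s] else 0
        sets[s][way] = tag
        print(f"block {block}: miss, placed in set {s}, way {way}")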
Replacement Algorithms for Associative & Set-Associative Mapping
• Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced
• How do we determine which block in the cache should be replaced?
• For the associative and set-associative techniques, a replacement algorithm is needed
• Least recently used (LRU)
– replace the block that has gone unreferenced the longest (sketched below)
• First in, first out (FIFO)
– replace the block that has been in the cache longest
• Least frequently used (LFU)
– replace the block that has had the fewest hits
• Random
– pick a victim at random; this is not significantly inferior to the other algorithms
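
A sketch of LRU for a single set, using an ordered dictionary as the recency structure (hardware typically tracks recency with use bits or counters instead):

    from collections import OrderedDict

    WAYS = 4
    lru_set = OrderedDict()            # tag -> block data; first entry = least recent

    def access(tag):
        if tag in lru_set:
            lru_set.move_to_end(tag)   # a hit makes this block most recently used
            print(f"tag {tag}: hit")
            return
        if len(lru_set) == WAYS:
            victim, _ = lru_set.popitem(last=False)   # evict least recently used
            print(f"evicting tag {victim}")
        lru_set[tag] = "block data"
        print(f"tag {tag}: miss, loaded")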
Cache Write Policy
• In addition to determining which victim to select for replacement, designers must also decide what to do with so-called dirty blocks of cache, i.e. blocks that have been modified
• Need to ensure coherence between the cache and main memory
• Must not overwrite a cache block with a main-memory block unless main memory is up to date
• Main memory might be out of date for a number of reasons
– multiple CPUs may have individual caches
– I/O may address main memory directly
• Techniques
– write through
– write back
Write through
• When the processor writes into a cache block, the corresponding frame in primary memory is also written with the new data
• The write-through policy updates both the cache and main memory simultaneously on every write
• This is slower than write back, but ensures that the cache is consistent with main system memory
• All writes go to main memory as well as to the cache
– lots of memory traffic
– slows down writes
Write back
• When the processor writes into a cache block, that cache block is tagged as a dirty block using a 1-bit tag; before a dirty block is replaced by a new frame, the dirty block is copied into primary memory
• Minimizes memory writes
• Updates are initially made in the cache only
• The UPDATE (dirty) bit for a cache slot is set when an update occurs
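
A toy model contrasting the two policies; the dirty bit is what defers the main-memory copy in write back until eviction:

    main_memory = {}           # address -> value
    cache = {}                 # address -> {"value": ..., "dirty": bool}

    def write(addr, value, policy):
        cache[addr] = {"value": value, "dirty": policy == "write-back"}
        if policy == "write-through":
            main_memory[addr] = value      # memory updated on every write

    def evict(addr):
        entry = cache.pop(addr)
        if entry["dirty"]:                 # write back: copy only now, on eviction
            main_memory[addr] = entry["value"]

    write(0x10, 42, "write-back")
    print(main_memory.get(0x10))   # None: memory is stale until eviction
    evict(0x10)
    print(main_memory.get(0x10))   # 42: the dirty block was copied back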
Line Size
• Larger line (block) sizes, and therefore fewer lines, can be used
– beneficial at first, because the hit ratio increases due to locality
– but too large a line size replaces useful existing cache content with less useful content (the farther words in a large block)
– the probability of needing the newly fetched data becomes less than the probability of needing the data it displaced
– this reduces the hit ratio
• There is no “universal optimum” between block size and hit ratio
– it depends on program characteristics
– block sizes of 8 to 32 bytes seem to work well in most situations
Number of caches
• When caches were originally introduced, the typical system had a single cache
• More recently, the use of multiple caches has become the norm
• Two aspects of this design issue:
– multilevel caches (L1 vs L2)
– unified versus split caches
• Multilevel caches
– in a two-level cache, the internal cache is designated level 1 (L1) and the external cache level 2 (L2)
– the on-chip (internal) cache reduces the processor’s external bus activity, and therefore speeds up execution times and increases overall system performance
Unified vs. Split Cache
• Unified cache: data and instructions are cached in the same cache
• Split cache: separate caches for data and for instructions
• Advantages of a unified cache
– automatically balances the mix of data and instructions in a program
– only one cache needs to be designed and manufactured
• Advantages of a split cache
– eliminates contention between the instruction fetch unit and the execution unit
– supports pipelining and speculative execution
