
Chapter 4

Computer Memory System

The memory system
• The memory unit is an essential component of any digital computer, since it is needed for storing programs and data
• Not all stored information is needed by the CPU at the same time
• Therefore, it is more economical to use low-cost storage devices as a backup for storing the information that is not currently used by the CPU
Characteristics of Computer Memory
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organization
Location
• Location refers to whether memory is internal or external to the computer
• Internal
– Main memory (RAM), cache memory
• External
– Secondary memory – hard disks
– Removable media – ZIP, CD-ROM, tape
Capacity
• Word size
– The natural unit of organization of memory
– For internal memory, capacity is expressed in terms of bytes (1 byte = 8 bits) or words
– For example, on a 32-bit machine, the word is 32 bits long
– On many machines, the word size also determines the address length, and hence the size of the address space
– Common word lengths: 8, 16, 32, and 64 bits
– External memory capacity is typically expressed in terms of bytes
• Addressable units are typically words
– N = 2^A, where N = size of the address space, A = address length (in bits)
– On disks, the addressable units are blocks or clusters
– E.g. the most memory you can address with 32 bits is 2^32 bytes = 4 GB
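As a quick check of the N = 2^A relationship, here is a minimal C sketch (assuming byte-addressable memory, so each addressable unit is one byte; the 32-bit value is just the example above):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        unsigned a = 32;                     /* A = address length in bits */
        uint64_t n = 1ULL << a;              /* N = 2^A addressable bytes  */
        printf("%u-bit addresses -> %llu bytes (%llu GB)\n",
               a, (unsigned long long)n, (unsigned long long)(n >> 30));
        return 0;
    }

This prints 4294967296 bytes, i.e. the 4 GB figure quoted above.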
Unit of Transfer
• For main memory, the unit of transfer is
– the number of bits read out of or written into memory at a time
– determined by the data bus width
– possibly different from the word or the addressable unit
• For external memory
– Usually a block, which is much larger than a word
Access Methods
• Sequential
– Start at the beginning and read through in order
– Access time depends on the location of the data and the previous location
– e.g. tape
• Direct
– Individual blocks or records have a unique address based on physical location
– Access is accomplished by moving directly to a general vicinity, plus sequential searching, counting, or waiting to reach the final location; access time is therefore variable
– Access time depends on the location and the previous location, but is much faster than sequential access
– e.g. disk
Access Methods
• Random
– Each addressable location in memory has a unique, physically wired-in addressing mechanism
– The time to access a given location is independent of the sequence of prior accesses, and is constant
– Thus, any location can be selected at random and directly addressed and accessed
– e.g. main memory, some caches
• Associative
– A random access type of memory that enables one to compare desired bit locations within a word for a specified match
– Access time is independent of location or previous accesses
– A word is retrieved based on a portion of its contents rather than its address
– e.g. many cache systems
Performance
• Three performance parameters are used:
• Access time (TA)
– the time it takes to perform a read or write operation
• Memory cycle time
– Time may be required for the memory to "recover" before the next access
– Cycle time = access time + recovery time
• Transfer rate (R, in bits/second)
– the rate at which data can be transferred into or out of a memory unit
– For random-access memory, R = 1 / (cycle time)
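A small worked example of these parameters, as a C sketch (the timing values are illustrative, not taken from any specific device):

    #include <stdio.h>

    int main(void) {
        double access_ns   = 60.0;           /* time for one read/write    */
        double recovery_ns = 10.0;           /* settle time before reuse   */
        double cycle_ns    = access_ns + recovery_ns;
        double rate        = 1e9 / cycle_ns; /* R = 1/(cycle time); bits/s
                                                if one bit moves per access */
        printf("cycle time = %.0f ns, R = %.2e accesses/s\n", cycle_ns, rate);
        return 0;
    }

With these numbers, the cycle time is 70 ns and R is about 1.4 x 10^7 accesses per second.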
Physical Types
• Semiconductor
– RAM (DRAM or SRAM)
• Magnetic
– Disk and tape
• Optical
– CD and DVD
• Others
– Bubble
– Hologram
Physical Characteristics
• Volatility
– Volatile memory: without power, the contents are erased
– Magnetic-surface memories are nonvolatile
– Semiconductor memory can be volatile or nonvolatile
• Erasability
– Non-erasable memory cannot be altered
– E.g. ROM = Read-Only Memory
Organization
• For random-access memory, the organization is a key design issue
• Organization refers to the physical arrangement of bits to form words
The Memory Hierarchy
• The design constraints on a computer's memory can be summed up by three questions:
– How much? (capacity)
– How fast? (performance)
– How expensive? (cost)
• There is a trade-off among the three key characteristics of memory: capacity, access time, and cost
• The following relationships hold:
– Faster access time, greater cost per bit
– Greater capacity, smaller cost per bit
– Greater capacity, slower access time
Memory Hierarchy
• The computer memory hierarchy is a pyramid structure that is commonly used to illustrate the significant differences among memory types
• The memory unit that directly communicates with the CPU is called the main memory
• Devices that provide backup storage are called auxiliary memory
• The memory hierarchy system consists of all storage devices employed in a computer system, from the slow but high-capacity auxiliary memory, to a relatively faster main memory, to an even smaller and faster cache memory
Memory Hierarchy - Diagram
(Pyramid diagram: each level downward has lower cost, bigger capacity, and slower, less frequent access.)
As one goes down the memory hierarchy, one finds:
– decreasing cost per bit,
– increasing capacity,
– increasing access time (slower access),
– decreasing frequency of access of the memory by the processor
The main memory occupies a central position by being able to communicate directly with the CPU and with auxiliary memory devices through an I/O processor.
Cache Memory
• Cache is a small amount of fast memory
• Its purpose is to exploit locality of reference
• Contains copies of portions of main memory
• Sits between normal main memory and the CPU
• May be located on the CPU chip or module
(Figure: a single cache between the CPU and main memory.)
Cache operation - overview
• The CPU requests a word (the contents of a memory location)
• The cache is checked for this word
• If present (a cache hit), the word is fetched from the cache (fast)
• If not present (a cache miss), the block of main memory that contains this word is read into the cache
• Simultaneously, the word is delivered to the CPU
• The cache includes tags to identify which block of main memory is in each cache slot
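The flow above, as a minimal runnable C sketch. For concreteness it uses a direct-mapped organization (covered later); the geometry (4 lines, 4-byte blocks, a 64-byte toy memory) is invented purely for illustration:

    #include <stdio.h>
    #include <stdint.h>

    #define LINES 4                          /* cache lines (slots)          */
    #define BLOCK 4                          /* words (bytes) per block      */

    static uint8_t memory[64];               /* toy "main memory"            */
    static struct { int valid; uint32_t tag; uint8_t data[BLOCK]; } cache[LINES];

    static uint8_t cache_read(uint32_t addr) {
        uint32_t offset = addr % BLOCK;              /* word within block    */
        uint32_t index  = (addr / BLOCK) % LINES;    /* cache slot           */
        uint32_t tag    = addr / (BLOCK * LINES);    /* identifies the block */
        if (!cache[index].valid || cache[index].tag != tag) {
            printf("miss at 0x%02x\n", (unsigned)addr);
            for (int i = 0; i < BLOCK; i++)          /* fetch the whole block */
                cache[index].data[i] = memory[addr - offset + i];
            cache[index].tag = tag;
            cache[index].valid = 1;
        } else {
            printf("hit  at 0x%02x\n", (unsigned)addr);
        }
        return cache[index].data[offset];            /* deliver word to CPU  */
    }

    int main(void) {
        for (int i = 0; i < 64; i++) memory[i] = (uint8_t)i;
        cache_read(0x10);                    /* miss: block 0x10-0x13 loaded */
        cache_read(0x12);                    /* hit: same block              */
        cache_read(0x20);                    /* miss: same slot, new tag     */
        return 0;
    }

Note how the second access hits because the whole block was fetched on the first miss, and how the tag check catches the third access even though it maps to the same slot.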
Cache Read Operation
(Figure: cache read flow. On a hit the word is delivered directly; on a miss the containing block is first loaded into the cache.)
Cache/Main Memory Structure
For mapping purposes:
– n = address length (in bits)
– K = number of words in a block
– M = 2^n / K = number of blocks in main memory
– C = number of cache blocks (lines), where C << M
Cache Design
• Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address
• For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each; that is, there are M = 2^n / K blocks in main memory
• The cache consists of m blocks, called lines; each line contains K words, plus a tag of a few bits
• The length of a line, not including the tag and control bits, is the line size
• The line size may be as small as 32 bits, with each "word" being a single byte; in this case the line size is 4 bytes
• The number of lines is considerably less than the number of main memory blocks (m << M)
• Because there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block; thus, each line includes a tag that identifies which particular block is currently being stored
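A quick worked example with invented numbers: for n = 16 and K = 4, main memory holds 2^16 = 65,536 words, so M = 2^16 / 4 = 16,384 blocks; a cache of m = 128 lines can hold well under 1% of those blocks at any one time, which is why each line needs a tag saying which block it currently holds.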
Typical Cache Organization
Elements of Cache Design
Cache Addressing
• Where does the cache sit?
– Between the processor and the virtual memory management unit (MMU)
– Between the MMU and main memory
• A logical (virtual) cache stores data using virtual addresses
– The processor accesses the cache directly, without going through the MMU
– Cache access is faster, since it occurs before MMU address translation
– But different applications use the same virtual address space
• So the cache must be flushed on each context switch
• A physical cache stores data using main memory physical addresses
– For reads from and writes to main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory
Logical and Physical Caches
(Figure: placement of logical vs. physical caches relative to the MMU.)
Cache Size
• Cost
– A larger cache is more expensive
• Speed
– A larger cache is faster, but only up to a point
– Beyond that point, access slows down, because checking the cache for data takes more time
– Large caches therefore tend to be slightly slower than small ones
Mapping Function
• Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines
• Three techniques can be used: direct, associative, and set associative
Direct Mapping pros & cons
• Each block of main memory maps to only one cache line
– i.e. if a block is in the cache, it must be in one specific place
• Simple
• Inexpensive
• Fixed location for a given block
– A given block always maps to the same line in the cache
– If a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high
– This is called thrashing
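A sketch of how a direct-mapped cache splits an address into tag, line (index), and word (offset) fields. The geometry (16-byte lines, 1024 lines, i.e. a 16 KB cache) is hypothetical:

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 16     /* bytes per line  -> 4 offset bits */
    #define NUM_LINES 1024   /* lines in cache  -> 10 index bits */

    int main(void) {
        uint32_t addr   = 0x12345678;
        uint32_t offset = addr % LINE_SIZE;               /* word within the line   */
        uint32_t index  = (addr / LINE_SIZE) % NUM_LINES; /* the one line this
                                                             block can occupy       */
        uint32_t tag    = addr / (LINE_SIZE * NUM_LINES); /* identifies which block */
        printf("tag=0x%x line=%u word=%u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }

Two blocks whose addresses differ only in the tag field collide on the same line, which is exactly the thrashing case described above.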
Associative Mapping
• A main memory block can load into any line of cache
• The line is determined by a replacement algorithm
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
– This is done in parallel for all tags
– Requires complex circuitry
• Cache searching gets expensive
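A software analogue of the tag search (real hardware compares all tags in parallel; a loop is the closest sequential equivalent). The geometry and the cached block are invented for illustration:

    #include <stdio.h>
    #include <stdint.h>

    #define LINES 8
    #define BLOCK 16
    struct line { int valid; uint32_t tag; };

    /* Fully associative lookup: every line's tag is compared
       against the tag derived from the address. */
    static int assoc_lookup(const struct line cache[], uint32_t addr) {
        uint32_t tag = addr / BLOCK;        /* block number serves as the tag */
        for (int i = 0; i < LINES; i++)
            if (cache[i].valid && cache[i].tag == tag)
                return i;                   /* hit: matching line  */
        return -1;                          /* miss                */
    }

    int main(void) {
        struct line cache[LINES] = {0};
        cache[5] = (struct line){1, 0x30 / BLOCK};  /* pretend a block is cached */
        printf("lookup 0x30 -> line %d\n", assoc_lookup(cache, 0x30));
        printf("lookup 0x99 -> line %d\n", assoc_lookup(cache, 0x99));
        return 0;
    }

One comparator per line is what makes the required circuitry complex and the search expensive.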
Set Associative Mapping
• A combination of direct and associative mapping
• The cache is divided into a number of sets
• Each set contains a number of lines
• Direct (mod) mapping identifies the set; a replacement algorithm chooses a line within the set
• Thus a block maps to any line in a given set
– e.g. block B can be in any line of set i
• E.g. 2 lines per set
– Called 2-way set-associative mapping
– A given block can be in one of 2 lines in a particular set
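A sketch of the address breakdown for a 2-way set-associative cache; the geometry (256 sets of 2 lines, 16-byte blocks) is hypothetical:

    #include <stdio.h>
    #include <stdint.h>

    #define WAYS  2       /* lines per set (2-way)           */
    #define SETS  256     /* number of sets -> 8 index bits  */
    #define BLOCK 16      /* bytes per line -> 4 offset bits */

    int main(void) {
        uint32_t addr = 0xABCDE;
        uint32_t set  = (addr / BLOCK) % SETS;   /* direct (mod) mapping to a set */
        uint32_t tag  = addr / (BLOCK * SETS);   /* compared against WAYS tags    */
        printf("addr 0x%x -> set %u, tag 0x%x (block may sit in any of %d lines)\n",
               (unsigned)addr, (unsigned)set, (unsigned)tag, WAYS);
        return 0;
    }

Only the WAYS tags within the selected set are compared, so the comparison hardware stays small while collisions on a set no longer force an immediate eviction.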
Replacement Algorithms for:
Associative & Set Associative Mapping
• Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced
• How do we determine which block in the cache should be replaced?
• For the associative and set-associative techniques, a replacement algorithm is needed
• Least recently used (LRU)
– replace the block that has gone the longest time without a hit
• First in first out (FIFO)
– replace the block that has been in the cache longest
• Least frequently used (LFU)
– replace the block that has had the fewest hits
• Random
– not significantly inferior to the others
Cache Write Policy
• In addition to determining which victim to select for replacement, designers must also decide what to do with so-called dirty blocks of cache, i.e. blocks that have been modified
• Need to ensure coherence of the cache and main memory
• A cache block must not be overwritten by a main memory block unless main memory is up to date
• Main memory might be out of date for a number of reasons
– Multiple CPUs may have individual caches
– I/O may address main memory directly
• Techniques
– Write through
– Write back
Write through
• When the processor writes into a cache block, the corresponding block in primary memory is also written with the new data
• The write-through policy thus updates both the cache and main memory simultaneously on every write
• This is slower than write-back, but ensures that the cache is always consistent with main system memory
• Since all writes go to main memory as well as the cache, there is lots of memory traffic, which slows down writes
Write back
• When the processor writes something into a cache block, that cache block is tagged as a dirty block using a 1-bit tag; before a dirty block is replaced by a new block, the dirty block is copied back into primary memory
• This minimizes memory writes
• Updates are initially made in the cache only
• The UPDATE (dirty) bit for a cache slot is set when the update occurs
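A fragment-level C sketch contrasting the two policies (the types and helper names are invented; `dirty` is the 1-bit UPDATE tag described above):

    #include <stdint.h>
    #include <string.h>

    struct line { int valid, dirty; uint32_t tag; uint8_t data[16]; };

    /* On a write hit, both policies update the cache; they differ only
       in when main memory is updated. */
    void write_hit(struct line *l, uint32_t offset, uint8_t value,
                   uint8_t *main_memory, uint32_t addr, int write_through) {
        l->data[offset] = value;           /* always update the cached copy */
        if (write_through)
            main_memory[addr] = value;     /* memory updated on every write */
        else
            l->dirty = 1;                  /* write-back: just mark dirty   */
    }

    /* Write-back touches memory only when a dirty line is evicted. */
    void evict(struct line *l, uint8_t *main_memory, uint32_t block_base) {
        if (l->dirty)                      /* copy the dirty block back     */
            memcpy(&main_memory[block_base], l->data, sizeof l->data);
        l->valid = l->dirty = 0;
    }

The trade-off is visible in the code: write-through pays a memory access on every write, while write-back defers the cost to eviction time at the price of a temporarily stale main memory.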
Line Size
• Larger line (block) sizes mean fewer lines can be used
– Larger lines are beneficial at first, because the hit ratio increases
– But too large a line size replaces useful existing cache content with less useful data (the farther words in a large block)
– The probability of needing the newly fetched data becomes less than the probability of reusing the data that was replaced
– This reduces the hit ratio
• There is no "universal optimum" relationship between block size and hit ratio
– It depends on program characteristics
– Block sizes of 8 to 32 bytes seem to work well in most situations
Number of caches
• When caches were originally introduced, the typical system had a single cache
• More recently, the use of multiple caches has become the norm
• There are two aspects to this design issue:
– multilevel caches (L1 vs. L2)
– unified versus split caches
• Multilevel caches
– A two-level cache has the internal cache designated as level 1 (L1) and the external cache designated as level 2 (L2)
– The on-chip (internal) cache reduces the processor's external bus activity, and therefore speeds up execution times and increases overall system performance
Unified vs. Split Cache
• Unified cache: data and instructions are cached in the same cache
• Split cache: separate caches for data and instructions
• Advantages of a unified cache
– It automatically balances any imbalance between the amounts of data and instructions in a program
– Only one cache needs to be designed and implemented
• Advantages of a split cache
– It eliminates contention between the instruction fetch and execute units
– This supports pipelining and speculative execution
