UNIT V (Memory System)
The Memory System
• The execution speed of programs is highly dependent on the speed
with which instructions and data can be transferred between the
processor and the memory.
• Ideally, the memory would be fast, large, and inexpensive.
Unfortunately, it is impossible to meet all three of these requirements
simultaneously.
• The memory of a computer comprises a hierarchy, including a cache,
the main memory, and secondary storage.
• Here we describe the most common components and organizations
used to implement these units, so as to improve the effective speed
and size of the memory.
Basic Concepts
Memory Measurement Units
Address Space
• The number of locations represents the size of the address space of the
computer.
• Example: Machines whose instructions generate 32-bit addresses can
utilize a memory that contains up to 2^32 = 4G (giga) locations.
• The memory is usually designed to store and retrieve data in word-length
quantities.
• Example:
• Consider a byte-addressable computer whose instructions generate 32-bit addresses.
• The high-order 30 bits determine which word will be accessed.
• If a byte quantity is specified, the low-order 2 bits of the address specify which byte
location is involved.
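As a minimal sketch of this address split (assuming 4-byte words; the function names and the example address are illustrative, not from the text):

```c
#include <stdint.h>
#include <stdio.h>

/* Byte-addressable machine, 32-bit addresses, 4-byte words:
   the high-order 30 bits select the word, the low-order 2 bits the byte. */
static uint32_t word_index(uint32_t addr)   { return addr >> 2; }  /* high 30 bits */
static uint32_t byte_in_word(uint32_t addr) { return addr & 0x3; } /* low 2 bits   */

int main(void) {
    uint32_t addr = 0x0000100D;                 /* example byte address */
    printf("word %u, byte %u\n", word_index(addr), byte_in_word(addr));
    return 0;                                   /* prints: word 1027, byte 1 */
}
```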
Memory and Processor Connections
• A memory whose normal operation involves only reading the stored data is
called a read-only memory (ROM).
ROM Cell
• A logic value 0 is stored in the cell if the
transistor is connected to ground at point P;
otherwise, a 1 is stored.
• The state of the connection to ground in each
cell is determined when the chip is
manufactured
• The bit line is connected through a resistor to
the power supply.
• To read the state of the cell, the word line is
activated to close the transistor switch.
• If there is no connection to ground, the bit line
remains at the high voltage level, indicating a 1.
• A sense circuit at the end of the bit line
generates the proper output value.
Types of ROM
PROM (Programmable ROM)
• Some ROM designs allow the data to be loaded by the user, thus providing a
programmable ROM (PROM).
• Programmability is achieved by inserting a fuse at point P in Figure.
• Before it is programmed, the memory contains all 0s.
• The user can insert 1s at the required locations by burning out the fuses at these
locations using high-current pulses.
• Of course, this process is irreversible.
• PROMs provide flexibility and convenience not available with ROMs.
• They offer a more convenient and considerably less expensive approach, because
memory chips can be programmed directly by the user.
EPROM (Erasable PROM)
• It allows the stored data to be erased and
new data to be written into it.
• It provides considerable flexibility during
the development phase of digital systems.
• Erasure is done by exposing the chip to
ultraviolet light, which erases the entire
contents of the chip.
• To make this possible, EPROM chips are
mounted in packages that have
transparent windows.
EPROM Working
• An EPROM cell has a structure similar to the ROM cell
• However, the connection to ground at point P is made through a
special transistor.
• The transistor is normally turned off, creating an open switch. It can
be turned on by injecting charge into it that becomes trapped inside.
• Erasure requires dissipating the charge trapped in the transistors that
form the memory cells.
• This can be done by exposing the chip to ultraviolet light, which
erases the entire contents of the chip.
EEPROM (Electrically Erasable PROM)
• An EPROM must be physically removed from the circuit for reprogramming.
• Also, the stored information cannot be erased selectively. The entire
contents of the chip are erased when exposed to ultraviolet light.
• EEPROM can be programmed, erased, and reprogrammed electrically.
• It does not have to be removed for erasure.
• Moreover, it is possible to erase the cell contents selectively.
• One disadvantage of EEPROMs is that different voltages are needed for
erasing, writing, and reading the stored data, which increases circuit
complexity.
• Due to their many advantages, EEPROMs have replaced EPROMs in practice.
EPROM Vs EEPROM
Flash Memory
• Similar to EEPROM technology
• The key difference is that, in a flash device, it is only possible to write an entire
block of cells.
• Prior to writing, the previous contents of the block are erased (a sketch of this
behaviour follows the list below).
• Flash devices have greater density, which leads to higher capacity and a lower
cost per bit.
• They require a single power supply voltage, and consume less power in their
operation.
• The low power consumption of flash memories makes them attractive for use in
portable, battery-powered equipment.
• Typical applications include hand-held computers, cell phones, digital cameras,
and MP3 music players.
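A minimal C sketch of the block-erase-before-write behaviour noted above (the block size, type, and function names are illustrative assumptions):

```c
#include <string.h>

#define BLOCK_SIZE 4096                      /* illustrative block size */

/* A flash block must be erased as a whole before new data are programmed;
   individual bytes cannot be rewritten in place. */
typedef struct { unsigned char data[BLOCK_SIZE]; } flash_block_t;

void flash_write_block(flash_block_t *blk, const unsigned char *new_data) {
    memset(blk->data, 0xFF, BLOCK_SIZE);     /* erase: flash erases to all 1s */
    memcpy(blk->data, new_data, BLOCK_SIZE); /* then program the entire block */
}
```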
Flash Memories Vs SSD (Solid-State Drive)
• Similarities:
• Both are faster than HDDs and do not have moving parts.
• Both are forms of non-volatile memory, so they
retain any information saved to them even after
the computer is shut down.
• Both flash and SSD storage are easily rewritable.
• Differences:
• The flash memory in flash drives is often much
slower than the flash used in SSDs.
• While most SSDs use flash memory, not all SSDs
do ("SSD" only indicates the absence of moving parts).
HDD Vs SSD (HDD has moving parts, but SSD doesn't)
Memory Hierarchy
Cache Memories
• The cache is a small and very fast memory, interposed
between the processor and the main memory.
• Its purpose is to make the main memory appear to the
processor to be much faster than it actually is.
• The effectiveness of this approach is based on a property of
computer programs called locality of reference.
• many instructions in localized areas of the program are
executed repeatedly during some time period
• This behaviour manifests itself in two ways: temporal and
spatial.
Temporal and Spatial Locality
• Temporal locality:
• suggests that whenever an information item, instruction or data, is first
needed, this item should be brought into the cache, because it is likely to be
needed again soon.
• Spatial locality:
• suggests that instead of fetching just one item from the main memory to the
cache, it is useful to fetch several items that are located at adjacent addresses
as well.
• The term cache block (cache line) refers to a set of contiguous address
locations of some size.
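As a small illustration of spatial locality, the following C sketch (array size arbitrary) contrasts a traversal that uses every word of each fetched cache block with one that wastes most of it:

```c
#define N 1024
static int a[N][N];

/* Row-major traversal touches adjacent addresses, so each cache block
   fetched from main memory is fully used before it is evicted. */
long sum_row_major(void) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];          /* consecutive addresses: good locality */
    return s;
}

/* Column-major traversal strides N*sizeof(int) bytes between accesses,
   using one word per fetched block; typically far more cache misses. */
long sum_col_major(void) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];          /* large stride: poor spatial locality */
    return s;
}
```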
Working of
Cache Memory
• When the processor issues a
Read request, the contents of
a block of memory words
containing the location
specified are transferred into
the cache.
• Subsequently, when the
program references any of
the locations in this block, the
desired contents are read
directly from the cache.
Replacement Algorithm
• The correspondence between the main memory blocks and those in
the cache is specified by a mapping function.
• When the cache is full and a memory word (instruction or data) that
is not in the cache is referenced, the cache control hardware must
decide which block should be removed to create space for the new
block that contains the referenced word.
• The collection of rules for making this decision constitutes the cache’s
replacement algorithm.
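The text does not prescribe a particular algorithm here; one common choice is least recently used (LRU). A minimal sketch, with an illustrative struct and a software counter standing in for the hardware's usage-tracking bits:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 128

typedef struct { uint16_t tag; bool valid; uint32_t last_used; } cache_line_t;

/* LRU replacement: evict the block whose most recent reference is oldest.
   last_used is assumed to be stamped with a counter on every access. */
int lru_victim(const cache_line_t cache[NUM_BLOCKS]) {
    int victim = 0;
    for (int i = 1; i < NUM_BLOCKS; i++)
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;
    return victim;   /* index of the block to be replaced */
}
```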
Cache Hit
• The cache control circuitry determines whether the word requested by the processor
currently exists in the cache.
• If it does, the Read or Write operation is performed on the appropriate cache location.
• In this case, a cache hit is said to have occurred.
• The main memory is not involved when there is a cache hit in a Read operation.
• For a Write operation, the system can proceed in one of two ways:
• write-through protocol: both the cache location and the main memory location are
updated.
• write-back, or copy-back protocol:
• update only the cache location and to mark the block containing it with an associated flag bit,
often called the dirty or modified bit.
• The main memory location of the word is updated later, when the block containing this
marked word is removed from the cache to make room for a new block (cache replacement).
• Both protocols result in unnecessary write operations in the main memory.
• The write-back protocol is used most often, to take advantage of the high speed with
which data blocks can be transferred to memory chips.
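A minimal sketch of the two write-hit policies described above (the types and the mem_write_word helper are illustrative assumptions):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t tag;
    bool     valid;
    bool     dirty;            /* used by write-back only */
    uint32_t data[16];         /* 16-word block, as in the running example */
} cache_line_t;

extern void mem_write_word(uint32_t addr, uint32_t value);  /* assumed helper */

/* Write hit, write-through: update both the cache and main memory. */
void write_hit_through(cache_line_t *line, int word, uint32_t addr, uint32_t v) {
    line->data[word] = v;
    mem_write_word(addr, v);   /* memory updated immediately */
}

/* Write hit, write-back: update only the cache and set the dirty bit;
   memory is updated later, when the block is evicted. */
void write_hit_back(cache_line_t *line, int word, uint32_t v) {
    line->data[word] = v;
    line->dirty = true;        /* mark the block as modified */
}
```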
Cache Miss
• Read Miss - A Read operation for a word that is not in the cache:
• It causes the block of words containing the requested word to be copied from
the main memory into the cache.
• After this, the particular word requested is forwarded to the processor.
• Alternatively, this word may be sent to the processor as soon as it is read from
the main memory (called load-through, or early restart).
• Write miss - A Write operation for a word that is not in the cache:
• If it occurs in a computer that uses the write-through protocol, the information is
written directly into the main memory.
• For the write-back protocol, the block containing the addressed word is first
brought into the cache, and then the desired word in the cache is overwritten
with the new information.
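A sketch of read-miss handling with load-through, under the same illustrative types (the helper functions are assumptions, and a 64-byte, 16-word block is assumed):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t tag; bool valid; uint32_t data[16]; } cache_line_t;

extern void     mem_read_block(uint32_t block_addr, uint32_t *dest); /* assumed helper */
extern uint32_t mem_read_word(uint32_t addr);                        /* assumed helper */
extern uint32_t tag_of(uint32_t addr);        /* depends on the mapping function */

/* Read miss with load-through (early restart): the requested word is sent
   to the processor as soon as it is read, while the block fill completes. */
uint32_t read_miss(cache_line_t *line, uint32_t addr) {
    uint32_t word = mem_read_word(addr);       /* forward this word at once */
    mem_read_block(addr & ~0x3Fu, line->data); /* copy the 64-byte block in */
    line->tag   = tag_of(addr);
    line->valid = true;
    return word;
}
```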
Separate Instruction and Data Caches
• When the memory operand fetch for the current instruction and the fetch of
the subsequent instruction must be done at the same time, the instruction
fetch is delayed until the data access operation is completed.
• the instruction pipeline is stalled during this operation
• To avoid stalling the pipeline, many processors use separate caches
for instructions and data, making it possible for the two operations to
proceed in parallel
Lecture-32 (UNIT-V)
Cache Mapping Techniques
• There are several methods for determining where memory blocks are
placed in the cache.
• Example: Assume that the memory system has
• A cache consisting of 128 blocks of 16 words each, for a total of 2048 (2K) words.
• A main memory consisting of 4K blocks of 16 words each, for a total of 64K words.
• consecutive addresses refer to consecutive words.
• Techniques:
• Direct Mapping
• Associative Mapping
• Set - Associative Mapping
Direct Mapping
• In this technique, block j of the main
memory maps onto block j modulo
128 (j%128) of the cache
• Mappings: MB -> CB
• 0->0, 128->0, 256->0, … 3968->0
• 1->1, 129->1, 257->1, … 3969->1
• When accessing a memory word, the
corresponding TAG fields are compared:
Tag bits = log2(no. of blocks in main memory / no. of blocks in cache memory)
         = log2(4096 / 128) = 5
• Match -> hit and Mismatch -> miss
• Contention is resolved by overwriting the old
block with the new one (e.g., memory blocks 1
and 257 both map to cache block 1; there is no
choice: direct replacement)
• May lead to poor performance if both
blocks are required frequently
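For the running example (64K words, 16-bit word addresses), the address splits into a 5-bit tag, a 7-bit block field, and a 4-bit word field. A minimal sketch (function names illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* 16-bit word address for the running example (64K words of main memory):
   | tag: 5 bits | cache block: 7 bits | word in block: 4 bits |           */
static unsigned word_field (uint16_t a) { return a & 0xF; }         /* low 4 bits  */
static unsigned block_field(uint16_t a) { return (a >> 4) & 0x7F; } /* next 7 bits */
static unsigned tag_field  (uint16_t a) { return a >> 11; }         /* high 5 bits */

int main(void) {
    /* main-memory block 257 maps to cache block 257 % 128 = 1 */
    uint16_t addr = 257 * 16;   /* first word of memory block 257 */
    printf("tag=%u block=%u word=%u\n",
           tag_field(addr), block_field(addr), word_field(addr));
    return 0;                   /* prints: tag=2 block=1 word=0 */
}
```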
Associative Mapping
• It is the most flexible mapping method, in which a
main memory block can be placed into any cache
block position
• In the example, 12 tag bits are required to identify
a memory block when it is resident in the cache
• The tag bits of an address received from the
processor are compared to the tag bits of each
block of the cache to see if the desired block is
present.
• When a new block is brought into the cache, it
replaces (ejects) an existing block only if the cache
is full.
• In this case, we need a replacement algorithm to
select the block to be replaced.
• The complexity of an associative cache is higher
than that of a direct-mapped cache, because of the
need to search all 128 tag patterns.
• To avoid a long delay, the tags must be searched in
parallel. A search of this kind is called an
associative search.
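A minimal sketch of the associative lookup for the running example (types and names illustrative; a software loop stands in for the parallel tag comparison done in hardware):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 128

typedef struct { uint16_t tag; bool valid; uint32_t data[16]; } cache_line_t;

/* Fully associative lookup: the 12-bit tag (16-bit address minus the 4 word
   bits) is compared with the tag of every cache block. Hardware performs
   all 128 comparisons in parallel; software can only model it with a loop. */
int assoc_lookup(cache_line_t cache[NUM_BLOCKS], uint16_t addr) {
    uint16_t tag = addr >> 4;             /* 12 tag bits in the example */
    for (int i = 0; i < NUM_BLOCKS; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return i;                     /* hit: block i holds the word */
    return -1;                            /* miss: run the replacement algorithm */
}
```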
Set - Associative Mapping
• It is a combination of the direct- and associative-
mapping techniques.
• The figure shows a set-associative-mapped cache
with two blocks per set.
• The blocks of the cache are grouped into sets, and
the mapping allows a block of the main memory
to reside in any block of a specific set.
• Hence, the contention problem of the direct
method is eased by having a few choices for block
placement.
• At the same time, the hardware cost is reduced by
decreasing the size of the associative search (fewer
tag bits and fewer tag-field comparisons).
• A two-way associative search is required in the
example.
• The number of blocks per set is a parameter that
can be selected to suit the requirements of a
particular computer.
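For the two-way version of the running example, the 16-bit address splits into a 6-bit tag, a 6-bit set field, and a 4-bit word field. A minimal sketch (types and names illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 64   /* 128 blocks grouped as 64 sets of 2 (two-way) */
#define WAYS      2

typedef struct { uint16_t tag; bool valid; uint32_t data[16]; } cache_line_t;

/* 16-bit address split for the two-way set-associative example:
   | tag: 6 bits | set: 6 bits | word: 4 bits |
   Only the WAYS blocks of one set need their tags compared. */
int set_assoc_lookup(cache_line_t cache[NUM_SETS][WAYS], uint16_t addr) {
    unsigned set = (addr >> 4) & 0x3F;    /* 6-bit set index */
    unsigned tag = addr >> 10;            /* 6-bit tag       */
    for (int w = 0; w < WAYS; w++)        /* two comparisons, parallel in hardware */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return w;                     /* hit in way w of this set */
    return -1;                            /* miss: replace a block within the set */
}
```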
Stale Data in cache and Valid Bit
• A control bit, usually called the valid bit, must be provided for each
cache block to indicate whether the data in that block are valid (1) or
invalid (0).
• When power is first turned on, the cache contains no valid data, and the
valid bits of all the cache blocks are set to 0.
• If the memory blocks being updated (using DMA) are currently in the
cache, the valid bits of the corresponding cache blocks are set to 0.
• As program execution proceeds, the valid bit of a given cache block is
set to 1 when a memory block is loaded into that location.
• The processor fetches data from a cache block only if its valid bit is
equal to 1.
• The use of the valid bit in this manner ensures that the processor will
not fetch stale data from the cache.
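A minimal sketch of how the valid bit gates a fetch and is cleared on a DMA update (types and names illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint16_t tag; bool valid; uint32_t data[16]; } cache_line_t;

/* The processor uses a cached block only if its valid bit is 1. */
bool cache_block_usable(const cache_line_t *line, uint16_t tag) {
    return line->valid && line->tag == tag;
}

/* When DMA updates a memory block that is currently cached, the
   corresponding valid bit is cleared so stale data cannot be fetched. */
void invalidate_on_dma(cache_line_t *line) {
    line->valid = false;
}
```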
Cache Coherence
• Consider a system that uses the write-back protocol for cache.
• Under this protocol, new data written into the cache are not written to the memory at the same
time.
• Hence, data in the memory do not always reflect the changes that may have been made in the
cached copy.
• It is important to ensure that such stale data in the memory are not transferred to the disk.
• One solution is to flush the cache, by forcing all dirty blocks to be written back to the memory
before performing the transfer to the disk.
• The operating system can do this by issuing a command to the cache before initiating the DMA
operation that transfers the data to the disk.
• Flushing the cache does not affect performance greatly, because such disk transfers do not occur
often.
• The need to ensure that two different entities (the processor and the DMA subsystems in this
case) use identical copies of the data is referred to as a cache-coherence problem.
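A minimal sketch of such a flush over the 128-block cache of the running example (the write_block_to_memory helper is an illustrative assumption):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 128

typedef struct { uint16_t tag; bool valid, dirty; uint32_t data[16]; } cache_line_t;

extern void write_block_to_memory(const cache_line_t *line);  /* assumed helper */

/* Flush: before a DMA transfer to disk, force every dirty (modified) block
   back to main memory so the disk sees up-to-date data. */
void cache_flush(cache_line_t cache[NUM_BLOCKS]) {
    for (int i = 0; i < NUM_BLOCKS; i++)
        if (cache[i].valid && cache[i].dirty) {
            write_block_to_memory(&cache[i]); /* update main memory */
            cache[i].dirty = false;           /* block is now clean */
        }
}
```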