
Chapter 6

Memory

6.1 Introduction

• Memory lies at the heart of the stored-program computer.
• In previous chapters, we studied the components from
which memory is built and the ways in which memory
is accessed by various ISAs.
• In this chapter, we focus on memory organization. A
clear understanding of these ideas is essential for the
analysis of system performance.

6.2 Types of Memory

• There are two kinds of main memory: random access memory (RAM) and read-only memory (ROM).
• There are two types of RAM, dynamic RAM (DRAM)
and static RAM (SRAM).
• Dynamic RAM consists of capacitors that slowly leak
their charge over time. Thus they must be refreshed every
few milliseconds to prevent data loss.
• DRAM is “cheap” memory owing to its simple design.

Cont….

• SRAM consists of circuits similar to the D flip-flop.


• SRAM is very fast memory and it doesn’t need to be
refreshed like DRAM does. (Once it holds data, it remains
stable as long as power is supplied.)
• It is used to build cache memory, which we will discuss
in detail later.
• ROM does not need to be refreshed, either. In fact,
it needs very little charge to retain its memory.
• ROM is used to store permanent or semi-permanent data
that persists even while the system is turned off. (PROM,
EPROM, EEPROM)
6.3 The Memory Hierarchy

• Generally speaking, faster memory is more expensive than slower memory.
•Today’s computer systems use a combination of memory
types to provide the best performance at the best cost. This
approach is called hierarchical memory.
•The memories are classified based on their “distance” from
the processor, with distance measured by the number of
machine cycles required for access.
•The closer memory is to the processor, the faster it should
be.
•As memory gets further from the main processor, it can afford longer access times.
Cont….

• This storage organization can be thought of as a pyramid:

Cont….

• To access a particular piece of data, the CPU first sends a request to its nearest memory, usually cache.
• If the data is not in cache, then main memory is
queried. If the data is not in main memory, then the
request goes to disk.
• Once the data is located, the data and a number of its nearby data elements are fetched into cache memory.

Cont….

• This leads us to some definitions.


❖A hit is when data is found at a given memory level.
❖A miss is when it is not found.
❖The hit rate is the percentage of memory accesses
found in a given level of memory.
❖The miss rate is the percentage of memory accesses not
found in a given level of memory.
Miss rate = 1 - hit rate.
❖The hit time is the time required to access data at a
given memory level.
❖The miss penalty is the time required to process a miss,
including the time that it takes to replace a block of
memory plus the time it takes to deliver the data to the
processor.
Cont….

• An entire block of data is copied after a miss because the principle of locality tells us that once a byte is accessed, it is likely that a nearby data element will be needed soon.
• The principle is that the instruction currently being fetched/executed is very close in memory to the instruction to be fetched/executed next. Equivalently, it is the tendency of a processor to access the same set of memory locations repeatedly over a short period of time.
• The principle of locality describes the phenomenon of the same value, or related storage locations, being frequently accessed.
Cont….

• There are three forms of locality:


▪ Temporal locality - Recently-accessed data elements tend to be accessed again (the reuse of specific data and/or resources within relatively small time durations).
⮚ It is likely that the same location will be referenced again in the near
future.
▪ Spatial locality - refers to the use of data elements
within relatively close storage locations.
⮚ It is likely that nearby memory locations will be referenced in the near future.
▪ Sequential locality - a special case of spatial locality,
occurs when data elements are arranged and accessed
sequentially.
6.4 Cache Memory

• Cache memory is a small, high-speed (and thus high-cost) type of memory that serves as a buffer for frequently accessed data. (Its access time is a fraction of that of main memory.)
• The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU, instead of in main memory.
• Unlike main memory, which is accessed by address, cache
is typically accessed by content; hence, it is often called
content addressable memory.
• Because of this, a single large cache memory isn’t always desirable-- it takes longer to search.
Cont….

• The size of cache memory can vary enormously. A typical personal computer’s level 2 (L2) cache is 256K or 512K.
• Level 1 (L1) cache is smaller, typically 8K or 16K and it
resides on the processor, so it is faster than L2 cache
whereas L2 cache resides between the CPU and main
memory.
• Main memory is typically composed of DRAM whereas
cache is typically composed of SRAM, providing faster
access with a much shorter cycle time than DRAM.

Cont….

Cache Mapping Schemes


• The CPU uses a specific mapping scheme that “converts” the
main memory address into a cache location.
•This address conversion is done by giving special significance
to the bits in the main memory address. We first divide the bits
into distinct groups we call fields.
•The fields into which a memory address is divided provide a
many-to-one mapping between larger main memory and the
smaller cache memory.
•Many blocks of main memory map to a single block of cache.
A tag field in the cache block distinguishes one cached memory
block from another.
Cont….

• In order to copy data from main memory to cache, main memory and cache are first divided into blocks of the same size.
• When a memory address is generated, cache is searched first
to see if the required word exists there. When the requested
word is not found in cache, the entire main memory block in
which the word resides is loaded into cache (because of the principle of locality).

Cont….

The three main cache mapping schemes


1. Direct mapped cache
•The simplest cache mapping scheme is direct mapped cache.
(uses a modular approach)
•In a direct mapped cache consisting of N blocks of cache, block X of main memory maps to cache block X mod N (see the sketch below).
• Once a block of memory is copied into its slot in cache, a
valid bit is set for the cache block to let the system know that
the block contains valid data.
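As a minimal illustrative sketch (not part of the original slides; the function name and the 16-block cache are assumptions), the modular mapping can be written as:

```python
# Illustrative sketch: where a main-memory block lands in a direct mapped cache.
def direct_mapped_slot(block_number: int, num_cache_blocks: int) -> int:
    """Block X of main memory maps to cache block X mod N."""
    return block_number % num_cache_blocks

# With 16 cache blocks, main-memory blocks 5, 21, 37, ... all compete for slot 5.
assert direct_mapped_slot(5, 16) == direct_mapped_slot(21, 16) == 5
```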

Cont….

• The diagram below is a schematic of what cache looks like.

• Block 0 contains multiple words from main memory, identified with the tag 00000000. Block 1 contains words identified with the tag 11110101.
• The other two blocks are not valid.

Cont….

Example.
The size of each field into which a memory address is
divided depends on the size of the cache.
•Suppose our memory consists of 2^14 words, cache has 16 = 2^4 blocks, and each block holds 8 words.
– Thus memory is divided into 2^14 / 2^3 = 2^11 blocks.
•For our field sizes, we know we need 4 bits for the block, 3 bits for the word, and the tag is what’s left over (14 - 4 - 3 = 7 bits):

Cont….

• The word field (sometimes called the offset field) uniquely identifies a word from a specific block.
• The block field selects a unique block of cache.
• The tag field is whatever is left over. (The tag is a special
group of bits derived from the main memory address that is
stored with its corresponding block in cache)
• When a block of main memory is copied to cache, this tag is
stored with the block and uniquely identifies this block.

Cont….

For example, suppose a program generates the address 1AA. In 14-bit binary, this number is 00000110101010.
•The first 7 bits of this address go in the tag field, the next 4
bits go in the block field, and the final 3 bits indicate the
word within the block.
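To make the field split concrete, here is a hedged sketch (the helper below is hypothetical, not from the slides) that decomposes the 14-bit address with shifts and masks:

```python
# Illustrative sketch: split a 14-bit address into 7-bit tag, 4-bit block, 3-bit word.
def split_direct_mapped(addr: int) -> tuple[int, int, int]:
    word = addr & 0b111            # low 3 bits: word within the block
    block = (addr >> 3) & 0b1111   # next 4 bits: cache block
    tag = addr >> 7                # remaining 7 bits: tag
    return tag, block, word

tag, block, word = split_direct_mapped(0x1AA)
print(f"tag={tag:07b} block={block:04b} word={word:03b}")
# Prints: tag=0000011 block=0101 word=010, so 1AA maps to cache block 0101.
```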

Cont….

• If subsequently the program generates the address 1AB, it will find the data it is looking for in block 0101, word 011.

• However, if the program generates the address 3AB instead, the block loaded for address 1AA would be evicted from the cache and replaced by the block associated with the 3AB reference.
N.B. The disadvantage of direct mapping is that two words with the same index in their address but with different tag values cannot reside in cache memory at the same time.
Cont….

• Suppose a program generates a series of memory references such as: 1AB, 3AB, 1AB, 3AB, . . . The cache will continually evict and replace blocks.
• The theoretical advantage offered by the cache is lost in
this extreme case.
• This is the main disadvantage of direct mapped cache.
• Other cache mapping schemes are designed to prevent
this kind of thrashing.

Cont….

E.g. Suppose a computer using direct mapped cache has 2^32 words of main memory, and a cache of 1024 blocks, where each cache block contains 32 words.

a. How many blocks of main memory are there?
b. What is the format of a memory address as seen by the cache (what are the sizes of the tag, block, and word fields)?
c. To which cache block will the memory reference 000063FA (hexadecimal) map?

Cont….

a. 2^32 / 2^5 = 2^27 blocks
b. 32-bit addresses with 17 bits in the tag field, 10 in the block field, and 5 in the word field
c. 000063FA = 00000000000000000 1100011111 11010, which implies block 799
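Part (c) can be double-checked with a short sketch (illustrative only, not from the slides), using the 17/10/5 bit split derived in part (b):

```python
# Illustrative check of part (c): 5-bit word field, 10-bit block field, 17-bit tag.
addr = 0x000063FA
word = addr & 0x1F           # low 5 bits (32 words per block)
block = (addr >> 5) & 0x3FF  # next 10 bits (1024 cache blocks)
tag = addr >> 15             # remaining 17 bits
print(block)                 # 799, so the reference maps to cache block 799
```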

Cont….

2. Fully associative cache


•Instead of placing memory blocks in specific cache
locations based on memory address, we could allow a
block to go anywhere in cache.
•In this way, cache would have to fill up before any
blocks are evicted.
•This is how fully associative cache works.
•A memory address is partitioned into only two fields:
the tag and the word.

Cont….

• Suppose, as before, we have 14-bit memory addresses and a cache with 16 blocks, each block of size 8. The field format of a memory reference is:

• When the cache is searched, all tags are searched in parallel to retrieve the data quickly.
• This requires special, costly hardware.
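The sketch below (illustrative only; the dictionary-based cache lines are assumptions) models the same tag comparison sequentially in software, whereas real associative hardware compares every tag at once:

```python
# Illustrative sketch of a fully associative lookup (14-bit address, 3-bit word field).
def fully_associative_lookup(addr: int, cache: list) -> bool:
    tag = addr >> 3                                # everything except the word field
    for line in cache:                             # hardware does these checks in parallel
        if line["valid"] and line["tag"] == tag:
            return True                            # hit
    return False                                   # miss

cache = [{"valid": False, "tag": 0} for _ in range(16)]
cache[3] = {"valid": True, "tag": 0x1AA >> 3}      # block containing address 1AA
print(fully_associative_lookup(0x1AB, cache))      # True: 1AB lies in the same block
```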

Cont….

• You will recall that direct mapped cache evicts (removes) a block whenever another memory reference needs that block.
• With fully associative cache, we have no such mapping, thus we
must devise an algorithm to determine which block to evict
from the cache.
• The block that is evicted is the victim block.
• There are a number of ways to pick a victim.

Cont….

3. N-way Set associative cache


•Set associative cache combines the ideas of direct mapped cache
and fully associative cache.
•Owing to its speed and complexity, associative cache is very
expensive.
•An N-way set associative cache mapping is like direct mapped
cache in that a memory reference maps to a particular location in
cache.
•The important difference with direct mapping is that a memory reference
maps to a set of several cache blocks, similar to the way in which
fully associative cache works.
•Instead of mapping anywhere in the entire cache, a memory
reference can map only to a subset of cache slots.
Cont….

• The number of cache blocks per set in set associative cache varies according to overall system design.
• For example, a 2-way set associative cache can be
conceptualized as shown in the schematic below.
• Each set contains two different memory blocks.

Cont….

• In set associative cache mapping, a memory reference is divided into three fields: tag, set, and word, as shown below.
• As with direct-mapped cache, the word field chooses the
word within the cache block, and the tag field uniquely
identifies the memory address.
• The set field determines the set to which the memory
block maps.

Cont….

• Suppose we have a main memory of 2^14 words.


• This memory is mapped to a 2-way set associative cache
having 16 blocks where each block contains 8 words.
• Since this is a 2-way cache, each set consists of 2 blocks,
and there are 8 sets.
• Thus, we need 3 bits for the set, 3 bits for the word,
giving 8 leftover bits for the tag:
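(The field-layout figure is not reproduced here.) As a hedged sketch of this tag/set/word split (the helper name is an assumption), for the 8-set, 2-way cache just described:

```python
# Illustrative sketch: 14-bit address -> 8-bit tag, 3-bit set, 3-bit word.
def split_set_associative(addr: int) -> tuple[int, int, int]:
    word = addr & 0b111              # low 3 bits: word within the block
    set_index = (addr >> 3) & 0b111  # next 3 bits: one of the 8 sets
    tag = addr >> 6                  # remaining 8 bits: tag
    return tag, set_index, word

tag, set_index, word = split_set_associative(0x1AA)
print(f"tag={tag:08b} set={set_index:03b} word={word:03b}")
# The block may then occupy either of the two slots in that set.
```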

Cont….

Replacement Policies
There are several popular replacement policies.
•With fully associative and set associative cache, a
replacement policy is invoked when it becomes necessary to
evict a block from cache.
•The algorithm for determining replacement is called the
replacement policy.
❖Optimal algorithm -- We like to keep values in cache that
will be needed again soon, and throw out blocks that won’t
be needed again, or that won’t be needed for some time.

Cont….

• The replacement policy that we choose depends upon the locality that we are trying to optimize-- usually, we are interested in temporal locality.
• A least recently used (LRU) algorithm keeps track of the last time that a block was accessed (assigning a timestamp to the block) and evicts (removes) the block that has been used least recently.
e.g. Assume these are the pages (blocks) requested by the CPU, in order:
3 2 1 3 4 1 6 2 4 3 4
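The following small simulation (not from the slides; the 3-block cache capacity is an assumption chosen for illustration) applies LRU to this reference string:

```python
# Illustrative LRU simulation over the reference string above.
from collections import OrderedDict

def simulate_lru(references, capacity=3):
    cache = OrderedDict()                     # keys kept in least-recently-used order
    hits = 0
    for block in references:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # mark as most recently used
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)     # evict the least recently used victim
            cache[block] = True
    return hits

refs = [3, 2, 1, 3, 4, 1, 6, 2, 4, 3, 4]
print(simulate_lru(refs))                     # 3 hits for this string with 3 blocks
```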
• The disadvantage of this approach is its complexity: LRU requires the
system to keep a history of accesses for every cache block, which requires
significant space and slows down the operation of the cache.
Cont….

❖ First-in, first-out (FIFO) is a popular cache replacement policy.
• In FIFO, the block that has been in cache the longest
(regardless of how recently it has been used) would be
selected as the victim to be removed from cache memory.
❖ A random replacement policy does what its name implies:
It picks a block at random and replaces it with a new
block.
• Random replacement can certainly evict a block that will
be needed often or needed soon, but it never thrashes.

Cont….

EAT (Effective Access Time)


•The performance of hierarchical memory is measured by its
effective access time (EAT).
•EAT is a weighted average that takes into account the hit
ratio and relative access times of successive levels of
memory.
•The EAT for a two-level memory is given by:
EAT = H × Access_C + (1 - H) × Access_MM,
where H is the cache hit rate and Access_C and Access_MM
are the access times for cache and main memory,
respectively.
Cont….

• For example, consider a system with a main memory access time of 200 ns supported by a cache having a 10 ns access time and a hit rate of 99%.
•The EAT is: 0.99(10 ns) + 0.01(200 ns) = 9.9 ns + 2 ns = 11.9 ns.
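A one-function sketch of the formula (illustrative, not from the slides):

```python
# Illustrative sketch of the two-level EAT formula.
def effective_access_time(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    return hit_rate * cache_ns + (1 - hit_rate) * memory_ns

print(effective_access_time(0.99, 10, 200))   # 11.9 ns, matching the worked example
```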

Cont….

Cache Write Policies


•Cache replacement policies must also take into account dirty
blocks, those blocks that have been updated while they were in
the cache.
•Dirty blocks must be written back to memory. A write policy
determines how this will be done.
•If a cache block is modified, the cache write policy
determines when the actual main memory block is updated
to match the cache block.
•There are two types of write policies: write through and write back.
Cont….

• The simplest and most commonly used procedure is to update main memory with every memory write operation, with cache memory being updated in parallel if it contains the word at the specified address. This is called the write-through method.
• The disadvantage of write through is that memory
must be updated with each cache write, which slows
down the access time on updates.

Cont….

• Write back (also called copyback) only updates blocks in main memory when the cache block is selected as a victim and must be removed from cache.
• In this method, when data is modified in the cache, the
corresponding main memory location is not immediately
updated. Instead, the updated data is marked as "dirty" in the
cache, indicating that it differs from the corresponding data in
the main memory.
• The advantage of write back is that memory traffic is minimized, but its disadvantage is the risk to data availability: the cache could fail (and so suffer data loss) before the modified data is written back to main memory, resulting in the data being lost.
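The toy sketch below (all names and structures are assumptions, not a real cache model) shows the dirty-bit bookkeeping: a write only marks the cache line dirty, and main memory is updated when the line is evicted:

```python
# Illustrative sketch of write-back behaviour with a dirty bit.
memory = {0x2A: 7}                                    # backing store
cache_line = {"addr": 0x2A, "data": 7, "dirty": False}

def write(line, value):
    line["data"] = value
    line["dirty"] = True            # main memory is NOT updated yet (write back)

def evict(line, memory):
    if line["dirty"]:               # only dirty victims are written back
        memory[line["addr"]] = line["data"]

write(cache_line, 99)
evict(cache_line, memory)
print(memory[0x2A])                 # 99 -- memory updated only at eviction time
```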
6.5 Virtual Memory
• Cache memory enhances performance by providing faster
memory access speed.
• Virtual memory enhances performance by providing greater
memory capacity, without the expense of adding main memory.
• Instead, a portion of a disk drive serves as an extension of main
memory.
• If a system uses paging, virtual memory partitions main memory into individually managed page frames that are written (or paged) to disk when they are not immediately needed.
• The easiest way to think about virtual memory is to
conceptualize it as an imaginary memory location in which all
addressing issues are handled by the operating system.
Cont….

• Virtual memory allows a program to run when only specific pieces are present in memory. The parts not currently being used are stored in the page file on disk.
❖Virtual address—The logical or program address that the
process uses. Whenever the CPU generates an address, it is
always in terms of virtual address space.
❖ Physical address—The real address in physical memory.
❖Mapping—The mechanism by which virtual addresses are
translated into physical ones.
Programs create virtual addresses that are mapped to physical
addresses by the memory management unit (which is a
hardware device).
❖Page frames—The equal-size chunks or blocks into which
main memory (physical memory) is divided.
Cont….

❖Pages—The chunks or blocks into which virtual memory (the logical address space) is divided, each equal in size to a page frame. Virtual pages are stored on disk until needed.
❖Paging—The process of copying a virtual page from disk to a
page frame in main memory.
❖Fragmentation—Memory that becomes unusable. (when the
paging process results in the creation of small, unusable clusters
of memory addresses)
❖Page fault—An event that occurs when a requested page is
not in main memory and must be copied into memory from disk.

Cont….

• Main memory and virtual memory are divided into equal-sized pages.
• The entire address space required by a process need not
be in memory at once. Some parts can be on disk, while
others are in main memory.
• Further, the pages allocated to a process do not need to
be stored contiguously-- either on disk or in memory.
• In this way, only the needed pages are in memory at any
time, the unnecessary pages are in slower disk storage.

Cont….

• Information concerning the location of each page, whether on disk or in memory, is maintained in a data structure called a page table (shown below).
• There is one page table for each active process.

Cont….

• When a process generates a virtual address, the operating system translates it into a physical memory address.
• To accomplish this, the virtual address is divided into two
fields: A page field, and an offset field.
• The page field determines the page location of the address,
and the offset indicates the location of the address within
the page.
• The logical page number is translated into a physical page
frame through a lookup in the page table.

Cont….

• If the valid bit is zero in the page table entry for the logical
address, this means that the page is not in memory and must
be fetched from disk.
▪ This is a page fault.
▪ If necessary, a page is evicted from memory and is
replaced by the page retrieved from disk, and the valid
bit is set to 1.
• If the valid bit is 1, the virtual page number is replaced by
the physical frame number.
• The data is then accessed by adding the offset to the
physical frame number.
Cont….

• As an example, suppose a system has a virtual address space of 8K and a physical address space of 4K, and the system uses byte addressing and a page size of 1K words.
▪ We have 2^13/2^10 = 2^3 virtual pages.
• A virtual address has 13 bits (8K = 2^13) with 3 bits for the page field and 10 for the offset, because the page size is 1024.
• A physical memory address requires 12 bits, the first two
bits for the page frame and the trailing 10 bits the offset.
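The sketch below (an illustration, not from the slides) splits such a 13-bit virtual address into its page and offset fields and performs the frame lookup; the page-table entry used here matches the example on the following slides (virtual page 101 maps to frame 01):

```python
# Illustrative sketch: translate a 13-bit virtual address (3-bit page, 10-bit offset).
PAGE_SIZE = 1024                               # 1K words -> 10-bit offset

def translate(virtual_addr: int, page_table: dict) -> int:
    page = virtual_addr >> 10                  # high 3 bits
    offset = virtual_addr & (PAGE_SIZE - 1)    # low 10 bits
    frame = page_table.get(page)
    if frame is None:
        raise RuntimeError("page fault")       # page must be fetched from disk
    return (frame << 10) | offset              # 2-bit frame number + 10-bit offset

page_table = {5: 1}                            # virtual page 101 -> frame 01
print(format(translate(5459, page_table), "012b"))   # 010101010011
```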

Cont….

Suppose we have the page table shown below.


•What happens when the CPU generates address 5459 (decimal) = 1010101010011 (binary)?

Cont….

• The address 1010101010011 (binary) is converted to physical address 010101010011 because the page field 101 is replaced by frame number 01 through a lookup in the page table.

Cont….

• What happens when the CPU generates address 1000000000100 (binary)?

N.B. In this case virtual address 4100 (decimal) generates a page fault; page 4 = 100 (binary) is not valid in the page table.
Cont….

• Since page tables are read constantly, it makes sense to keep them
in a special cache called a translation look-aside buffer (TLB).
• TLBs are a special associative cache that stores the mapping of
virtual pages to physical pages.
• They are special caches used to keep track of recently used address translations.
• We can speed up the page table lookup by storing the most recent
page lookup values in a TLB.
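A minimal sketch of that idea (the dictionaries below are stand-ins, not a real TLB or page-table format): consult the TLB first and fall back to the page table only on a TLB miss:

```python
# Illustrative sketch: TLB lookup before the page table.
def lookup_frame(page: int, tlb: dict, page_table: dict) -> int:
    if page in tlb:                  # TLB hit: no page-table access needed
        return tlb[page]
    frame = page_table[page]         # TLB miss: walk the page table
    tlb[page] = frame                # cache the translation for next time
    return frame

tlb, page_table = {}, {5: 1}
print(lookup_frame(5, tlb, page_table))   # 1 (TLB miss, then cached)
print(lookup_frame(5, tlb, page_table))   # 1 (TLB hit)
```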

Cont….

* Current State of the TLB for the previous page table

Cont….

• Another approach to implement virtual memory is the use of segmentation.
• Instead of dividing memory into equal-sized pages, virtual
address space is divided into variable-length segments,
often under the control of the programmer.
• Physical memory isn’t really divided or partitioned into
anything.
• When a segment needs to be copied into physical
memory, the operating system looks for a chunk of free
memory large enough to store the entire segment.

Cont….

• A segment is located through its entry in a segment table, which contains the segment’s memory location and a bounds limit that indicates its size.
• Both paging and segmentation can cause fragmentation.
• Paging is subject to internal fragmentation because a
process may not need the entire range of addresses
contained within the page. Thus, there may be many pages
containing unused fragments of memory.
• Segmentation is subject to external fragmentation, which
occurs when contiguous chunks of memory become broken
up as segments are allocated and deallocated over time.

Cont….

• Paging is not the same as segmentation.


• Paging is based on a purely physical value: The program and
main memory are divided up into the same physical size
chunks.
• Segmentation, on the other hand, allows for logical portions
of the program to be divided into variable-sized partitions.
• Paging is easier to manage: allocation, freeing, swapping, and
relocating are easy when everything’s the same size.

6.6 Real-World Example
• The Pentium architecture supports both paging and
segmentation, and they can be used in various
combinations including unpaged unsegmented,
segmented unpaged, and unsegmented paged.
• The processor supports two levels of cache (L1 and L2),
both having a block size of 32 bytes.
• The L1 cache is next to the processor, and the L2 cache
sits between the processor and memory.
• The L1 cache is in two parts: the Pentium (like many
other machines) separates L1 cache into cache used to
hold instructions called instruction cache (I-cache) and a
data cache (D-cache).
The next slide shows this organization schematically.
Cont….

