0% found this document useful (0 votes)

32 views

Memory Technology & Hierarchy: Caching and Virtual Memory Parallel System Architectures

This document discusses cache memory and cache design. It begins by summarizing that caches are small, fast memories that store recently used data close to the processor. As memory access times have increased, processors now commonly use multiple levels of cache, such as L1, L2, and L3 caches. Caches operate transparently to retrieve frequently accessed data but programmers must be aware of cache behavior when designing code. The document then provides details on cache organization, mapping, and addressing.

Uploaded by

vamshi krishna

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

32 views

Memory Technology & Hierarchy: Caching and Virtual Memory Parallel System Architectures

Uploaded by

vamshi krishna

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 36

1.

Memory technology
& Hierarchy
Caching and Virtual Memory

Parallel System Architectures

Andy D. Pimentel
Caches and their design
cf. Henessy & Patterson, Chap. 5
Caching - summary
Caches are small fast memories that store recently used data close to the
processor (usually on-chip)
As the memory wall has grown, the number of levels of cache between main
memory and the processor has increased
from 0 to 1 to 2 and now many systems use 3 levels
Caches are largely transparent to the programmer
but programmers must be aware of the cache while
designing code to ensure regular access patterns

Parallel System Architectures, Andy D. Pimentel

The processor’s memory hierarchy
100MHz GHz
1/cycle time clocks clocks
(bandwidth)

Registers - on-chip SRAM <1 <1

L1 cache - on-chip SRAM 1-2 1-6
L2 cache - on-chip SRAM 2-6 ≃10
off-chip SRAM 4-8 ≥10
L3 cache - off-chip SRAM ≃10 ≥100
Main memory - DRAM ≥10 ≥1000
Distributed memory ≥100 ≥10000
Size
Parallel System Architectures, Andy D. Pimentel
Cache operation at multiple levels
Caches contain copies of blocks of data from main memory - cache lines
Reads to memory go down the memory hierarchy 
- at each level a check is made to determine if the data is present at that level
Cache hit - the required data is in the cache: the data is taken from that level and
propagated up the hierarchy
Cache miss - the required data is not in the cache: request goes down a level until found
A cache miss at any level may overwrite old data when the requested new data is propagated
up the hierarchy - “thrashing” occurs when the old data is needed shortly
When data is written to the cache, it is written back to main memory either immediately, when
space is required in the cache, or, in a multi-processor system, when another processor requires it

Parallel System Architectures, Andy D. Pimentel

Caching principles
Caches provide reuse of recently fetched data transparently to the programmer or compiler
Shorter delay of access to same data after the first access to a
longer delay memory
Caches rely on the principle of locality:
Temporal locality - information that has just been used is likely to be used again in the
future
Spatial locality - because a cache line contains more than one word of data, words close to
the original miss will now be resident in the cache and may be accessed without further penalty
The former requires frequent access to the same data,  
the latter requires regular access patterns to memory  
e.g. regular small strides through memory – e.g. consecutive words

Parallel System Architectures, Andy D. Pimentel

Cache design issues
Caches can be:
Unified or separate w.r.t. data and instructions
L1 cache normally separate and L2/L3 normally unified
Write through - data is written to cache and also sent to the lower level
Write around (no-write-allocate) - data is sent to lower level but not written to cache
Write/Copy back - data is written to cache but not sent down the hierarchy: the lower level memories
may become inconsistent with respect to program state
Copy back is used in multi-processor systems: a write around/through strategy can consume a large
amount of bus or network bandwidth
How to maintain coherence between multiple copies?
Higher levels of cache are normally write around/through

Parallel System Architectures, Andy D. Pimentel

Mapping from memory to cache
The line or block size is the unit of data managed by the cache, typically 32-256 bytes
each line has a tag (from its address) stored in the cache and used to determine which memory
block is mapped to the cache line
A cache mapping determines which line(s) in a cache an address in memory can mapped to:
Direct mapped (simplest) yields a unique line in cache for any given block in memory -
based on its address
Fully associative (most complex) allows any memory block to be mapped to any cache line
Set-associative cache gives a compromise between these extremes
for example, a “4-way set associative” cache has sets of 4 lines where a line may be mapped to
Associative mapping requires concurrent tag matching to find a line in a single memory cycle

Parallel System Architectures, Andy D. Pimentel

Cache lines

state tag Data

The tag comprises enough address information to identify which block of

memory the cache line holds
The bits required depend on the mapping strategy
State used in algorithm to replace lines e.g. valid/invalid

Parallel System Architectures, Andy D. Pimentel

Cache mapping - example
Fully Direct 2-way
associative mapped set associative
Block no 12345678 Block no 12345678 Block no 12345678

Set Set Set Set

0 1 2 3

For the memory address 386, 32-byte cache lines and an 8 line cache: 
<block addr> = floor(<mem addr> / <cache line size>) = floor(386 / 32) = 12
Direct mapped: line = <block addr> mod <nr. of lines> = 12 mod 8 = 4
2-way set associative: <nr. of sets> = <nr. of lines> / <set associativity> 
set = <block addr> mod <nr. of sets> = 12 mod 8/2 = 0
Fully associative: one set of 8 lines, so anywhere in cache
Parallel System Architectures, Andy D. Pimentel
Direct mapped caches
Cache line number 000 001 010 011 100 101 110 111

Cache line size

...00001 ...00101 ...01001 ...01101 ...10001 ...10101 ...11001 ...11101 ...00001 ...00101

Memory address

Parallel System Architectures, Andy D. Pimentel

Direct-mapped caches
A direct mapped cache is simple and fast
…but has problems from its inflexibility in mapping
Address strides (differences between consecutive addresses) of a multiple of the
cache line size map subsequent accesses (to different memory blocks) all to the
same cache line – even though other lines may be empty!
This is called a pathological access pattern
Direct mapped cache is often used as 2nd or 3rd level cache which is much larger and
hence has less contention but the programmer must still be aware of this restriction

Parallel System Architectures, Andy D. Pimentel

Direct-mapped cache addressing
31... 19 18... 5 4... 0
13-bit tag 0 .. 214-1 0 .. 32

tag line address byte

in line
E.g. a 32-bit byte address into a direct-mapped cache of size of 512KBytes and a line size of 32
Bytes (i.e. 16K lines) the address fields above comprise:
5 bits of byte address (0..4) gives the byte offset in the cache line
14 bits of cache line address (5..18) give cache line (16K direct mapped)
the remaining 13 bits (19..31) determine which block from the 8K possible memory blocks
is mapped to the cache line
tags stored in cache line, matched with the address from the processor to check hits

Parallel System Architectures, Andy D. Pimentel

Example 4-byte access in DM cache
31... 19 18... 5 4 2 1 0

Address

Address word
cache hit Data
Address line

Valid tag
0
1
.
line data .
.
.
.
.

16382
16383

Cache-hit logic

Parallel System Architectures, Andy D. Pimentel

8-way set associative cache addressing
31... 19 18... 5 4... 0
16-bit tag 0 .. 211-1 0 .. 32

tag set address byte

in line
E.g. a 32-bit byte address into an 8-way set associative cache of size of 512KBytes and a line size
of 32 Bytes (i.e. 16K lines):
5 bits of address (0..4) gives the byte offset in the cache line
11 bits (5..15) address 2K sets of 8 cache lines (16K lines total)
16 bit tag (16..31) determines which block from the 64K possible memory blocks is
mapped to one of the cache lines in that set
stored as tag in the cache line and matched with the address from the processor

Parallel System Architectures, Andy D. Pimentel

4-byte access in 8-way set associative cache
31... 16 15... 5 4 2 1 0

Address

Data
Address set
cache hit
Valid tag data
0
1
.
.
.
.
.
.
.
2046
2047

set of lines

Parallel System Architectures, Andy D. Pimentel

Line sets in associative caches
Tag from address Data put on bus by matching line

=
=
= hit

=
+
=
=
=
=

8 tags compared in parallel

Parallel System Architectures, Andy D. Pimentel

Fully associative cache addressing
31... 5 4... 0
27-bit tag 0 .. 32

byte
tag in line

E.g. a 32-bit byte address into an fully associative cache of size of 16KBytes and a line
size of 32 Bytes (i.e. 512 lines - fully associative means each line requires a comparator):
5 bits of address (0..4) gives the byte offset in the cache line
27 bits (5..31) determine which block from the 128M possible memory blocks is
mapped to one of the cache line in that set
stored as tag in the cache line and matched with the address from the processor

Parallel System Architectures, Andy D. Pimentel

Access to fully associative cache
31... 5 4 0
Tag Address

Data put on bus by matching line

0
. =
. …
.
. =
.
. = hit
.
. =
+
. =
.
510
=
511 =
=

Parallel System Architectures, Andy D. Pimentel

Virtual Memory
cf. Henessy & Patterson, App. C4
Virtual Memory
It is easier for the programmer to have a large virtual memory than to program
explicit I/O due to memory limitations
Also in multi-programming memory is shared between many programs, some
suspended or inactive for a while
only a small fraction of virtual memory is used at any
one time in a multi-programming environment
Virtual memory uses main memory to store only part of the larger virtual memory
space and the remainder is held on external storage, e.g. discs
The unit exchanged between memory and disc is called a page
Parallel System Architectures, Andy D. Pimentel
Virtual memory mapping

Pages
in main
Pages memory
Virtual stored
address space externally
Parallel System Architectures, Andy D. Pimentel
VM Terminology
The address produced by the processor is called a virtual address
This gets translated by a MMU via a page table  
into a physical address (PT hit) or page fault (PT miss)
The page table is in main memory but has a special cache called a TLB
(translation look-aside buffer)
Page faults usually managed by a software trap to an operating
system
This mapping process is called address translation
Parallel System Architectures, Andy D. Pimentel
VM Address translation
31... 12 11 ... 0

Virtual page number Page offset

Translation

29... 12 11 ... 0

Physical page number Page offset

This shows address mapping from a 4 GiB virtual address space onto in
a 1 GiB physical address space using 4KiB memory pages
The translation is performed using a 1M entries (3MiB) table in
memory, addressed by the virtual page number
Parallel System Architectures, Andy D. Pimentel
Virtual memory issues
Need flexibility in page placement to avoid costly page misses
Unlike cache mapping, VM mapping is implemented as a table in main memory - allows arbitrary
mapping
indexed by virtual address
that yields the physical address
Page misses are handled by software and incur a large penalty
Pages must be sufficiently large to amortize this large overhead  
and to minimize the mapping table size
4 to 64KByte is a typical page size 
with variable size pages can be as large as 1MByte

Parallel System Architectures, Andy D. Pimentel

Replacement, processes and protection

Sophisticated algorithms for placement can be coded in software

pages known to be often required can be locked down
Each process has its own virtual address space and page translation
this means programs can not interfere (read/write) the memory of any other
To achieve protection, user code must be prevented from altering the page tables
This is normally achieved by having different modes of operation  
(eg. user mode vs. kernel mode)
alternatively, using security capabilities on the page table data

Parallel System Architectures, Andy D. Pimentel

Page table
31... 12 11 ... 210

Page table register Virtual page number Page offset

Valid
bit
Physical page number Note: the page table,
the PC and the state of
the registers all
contribute to the state
of a program

Page fault 29... 12 11 ... 210

logic
Physical page number Page offset

Parallel System Architectures, Andy D. Pimentel

Translation Look-aside buffers
Translation Look-aside buffers (TLB) cache the page table in small fast memory
NB: The page table is too large to be held entirely in fast memory
Without the TLB, access to memory would be twice as slow
One access to the page table for address translation
One to the data itself
Address translation and L1 cache access can be performed in one or two processor cycles  
(as long as we get a cache hit)
Big question: which memory space do we cache:  
Virtual or Physical?
Parallel System Architectures, Andy D. Pimentel
Physically addressed caches
Addresses translated by memory management unit (MMU) before
cache lookup
Sequential - even with a TLB and cache hit, 
access can be slow as it requires sequential memory accesses
MMU

Processor Cache Main memory

Parallel System Architectures, Andy D. Pimentel

Virtually addressed caches
Addresses translated by MMU in parallel with cache lookup
Aliasing – is where the same virtual address in different processes maps to the same
location in cache
Context switching therefore requires a full cache invalidation (time expensive)  
or a process identifier in the tag (space expensive)
Aliasing is averted if all processes share the same virtual address space
Processor Cache Main memory

MMU

Parallel System Architectures, Andy D. Pimentel

Virtually-indexed, physically tagged cache
Virtually vs physically addressed cache (cont’d)
niversity
of
msterdam
Cache indexing during address translation
Virtually-indexed, physically tagged cache
Page offset bits in virtual address used for cache index
Cache indexing during translation
Page offset bits in VA used as cache index
Number of sets in cache limited (dependent on page
Number of sets in cache limited (dependent on page
size)!
size)
Solutions: large cache sets or page colouring (OS
Solutions:
support)larger sets or page coloring (OS support) 
Use as cache index

Page offset
identical bits bits
for VA and PA
CSP
Parallel System Architectures, Andy D. Pimentel
Page table size
The example earlier was for 32-bit addresses and yielded a 3MiB table
For a 64-bit architecture and say a 48-bit virtual address and 4KiB pages we get:
48 12 36 39
table size = 2 /2 = 2 entries = 2 bytes = 512GiB!!
and this is replicated for each process (!!)
Solution is to grow page table as required
keep limit and check limit on each access
increase (e.g. double size) on each overflow

Parallel System Architectures, Andy D. Pimentel

Page table size
Address usage may be sparse
Another solution is to use a multi-level page table
as this takes advantage of sparseness
e.g. use very large pages and keep a table of these
within a large page keep a table of smaller pages  
(e.g. 4KiB)

Parallel System Architectures, Andy D. Pimentel

Multi-level page table

Parallel System Architectures, Andy D. Pimentel

Summary
Memory system summary
The memory hierarchy is a critical component of both computer and algorithm design in
determining performance
A major problem is the rapidly increasing on-chip or processor clock rates and the relatively slow
change in memory cycle times
DRAMs are designed for density not speed
Caching works well with regular accesses to memory, but some applications do not possess this
property - in this case we see the performance of the main memory system which may be 10-100
times slower than the processor performance!
New architectures will be needed, as the memory wall gets taller, 
that exploit latency tolerance to avoid memory-limited performance

Parallel System Architectures, Andy D. Pimentel

Cache
No ratings yet
Cache
36 pages
Chapter 03
No ratings yet
Chapter 03
57 pages
Memory Organisation
No ratings yet
Memory Organisation
34 pages
Memory Unit Bindu Agarwalla
No ratings yet
Memory Unit Bindu Agarwalla
62 pages
Understand CPU Caching Concepts
No ratings yet
Understand CPU Caching Concepts
14 pages
Mc9211unit 5 PDF
No ratings yet
Mc9211unit 5 PDF
89 pages
Lec2 Mem
No ratings yet
Lec2 Mem
12 pages
Lecture 5: Memory Hierarchy and Cache Traditional Four Questions For Memory Hierarchy Designers
No ratings yet
Lecture 5: Memory Hierarchy and Cache Traditional Four Questions For Memory Hierarchy Designers
10 pages
dlcoa question bank
No ratings yet
dlcoa question bank
18 pages
Seth 740 Fall13 Module3.5 Main Memory Part1
No ratings yet
Seth 740 Fall13 Module3.5 Main Memory Part1
69 pages
Chapter 5-The Memory System
No ratings yet
Chapter 5-The Memory System
84 pages
L14_ The Memory Hierarchy
No ratings yet
L14_ The Memory Hierarchy
41 pages
L07-MemoryII
No ratings yet
L07-MemoryII
27 pages
Unit-III: Memory: Topics
No ratings yet
Unit-III: Memory: Topics
54 pages
Lecture 7 Main Memory
No ratings yet
Lecture 7 Main Memory
36 pages
Cache Memory Unit-4
No ratings yet
Cache Memory Unit-4
32 pages
Cache Design
No ratings yet
Cache Design
59 pages
Lecture 16
No ratings yet
Lecture 16
22 pages
Chap 7
No ratings yet
Chap 7
98 pages
Chapter 3
No ratings yet
Chapter 3
16 pages
chapter 5
No ratings yet
chapter 5
60 pages
Basic Components of A Parallel (Or Serial) Computer: Processors
No ratings yet
Basic Components of A Parallel (Or Serial) Computer: Processors
14 pages
Unit 5 COA
No ratings yet
Unit 5 COA
34 pages
Unit 5 Notes (1)
No ratings yet
Unit 5 Notes (1)
26 pages
N06_MemoryOrganization
No ratings yet
N06_MemoryOrganization
12 pages
Cpu Concepts-2
No ratings yet
Cpu Concepts-2
52 pages
Module2 Part2 Memory
No ratings yet
Module2 Part2 Memory
113 pages
Embedded Systems Unit 8 Notes
No ratings yet
Embedded Systems Unit 8 Notes
17 pages
Comp Arch Chapter 6
No ratings yet
Comp Arch Chapter 6
93 pages
Components of The Memory System
No ratings yet
Components of The Memory System
11 pages
Introduction To Multicore Programming: University of Western Ontario, London, Ontario (Canada)
No ratings yet
Introduction To Multicore Programming: University of Western Ontario, London, Ontario (Canada)
60 pages
Shashank Aca Assignment
No ratings yet
Shashank Aca Assignment
21 pages
Slides04 05
No ratings yet
Slides04 05
25 pages
Conspect of Lecture 7
No ratings yet
Conspect of Lecture 7
13 pages
Cache&Virtual Memory
No ratings yet
Cache&Virtual Memory
50 pages
Computer Organization: Hierarchical Speed
No ratings yet
Computer Organization: Hierarchical Speed
25 pages
Lecture 3 (Memory Hierarchy and Caches)
No ratings yet
Lecture 3 (Memory Hierarchy and Caches)
88 pages
ACA - Memory
No ratings yet
ACA - Memory
26 pages
Memory System
No ratings yet
Memory System
41 pages
CH04
No ratings yet
CH04
46 pages
Pentium Cache
No ratings yet
Pentium Cache
5 pages
Com Achitecture Exams-1
No ratings yet
Com Achitecture Exams-1
6 pages
Chapter03 Assembly Part1
No ratings yet
Chapter03 Assembly Part1
8 pages
06 Memory
No ratings yet
06 Memory
37 pages
CSA Cache
No ratings yet
CSA Cache
62 pages
AC14L08 Memory Hierarchy
No ratings yet
AC14L08 Memory Hierarchy
20 pages
Memory Hierarchy
No ratings yet
Memory Hierarchy
38 pages
The Memory System: Chapter Objectives
No ratings yet
The Memory System: Chapter Objectives
70 pages
COUPDATED
No ratings yet
COUPDATED
27 pages
Unit IV Memory System Notes
No ratings yet
Unit IV Memory System Notes
13 pages
co unit 4
No ratings yet
co unit 4
42 pages
Chap 5 Memory System p1
No ratings yet
Chap 5 Memory System p1
30 pages
UNIT 4_ee4604ab899872c2fb7aa8d7693177f7
No ratings yet
UNIT 4_ee4604ab899872c2fb7aa8d7693177f7
36 pages
CPU Cache
No ratings yet
CPU Cache
19 pages
Mapping Functions
No ratings yet
Mapping Functions
23 pages
05) Cache Memory Introduction
No ratings yet
05) Cache Memory Introduction
20 pages
Lec 22
No ratings yet
Lec 22
14 pages
Cache AN3544
No ratings yet
Cache AN3544
12 pages
Cache1 2
No ratings yet
Cache1 2
30 pages
Memory Basics Explained
From Everand
Memory Basics Explained
Alisa Turing
No ratings yet
Parallel System Architectures: System and Network Engineering Lab, Uva E-Mail: A.D.Pimentel@Uva - NL
No ratings yet
Parallel System Architectures: System and Network Engineering Lab, Uva E-Mail: A.D.Pimentel@Uva - NL
6 pages
Chapter 01: Introduction: Technological Trends, Measuring and Improving Performance
No ratings yet
Chapter 01: Introduction: Technological Trends, Measuring and Improving Performance
31 pages
The Performance Equation
No ratings yet
The Performance Equation
4 pages
This Unit: - Metrics
No ratings yet
This Unit: - Metrics
7 pages
Present Perfect and Simple Past
No ratings yet
Present Perfect and Simple Past
22 pages
Buy D-Link 0.51 MM (24 AWG) 8 Meter Online - GeM
No ratings yet
Buy D-Link 0.51 MM (24 AWG) 8 Meter Online - GeM
3 pages
Philosophy Bullshit and Peer Review
No ratings yet
Philosophy Bullshit and Peer Review
74 pages
Thomas Thanga Durai Jonus Gladwin Raj
No ratings yet
Thomas Thanga Durai Jonus Gladwin Raj
1 page
Installation Guide
No ratings yet
Installation Guide
89 pages
Process Plant Layout PDF
0% (2)
Process Plant Layout PDF
22 pages
OMPC002 Final Edition YnR
100% (1)
OMPC002 Final Edition YnR
18 pages
NCP (Diarrhea)
83% (30)
NCP (Diarrhea)
2 pages
The Ecumenical Councils and Their Chief Doctrines
No ratings yet
The Ecumenical Councils and Their Chief Doctrines
4 pages
SampleTron 2 User Manual
100% (1)
SampleTron 2 User Manual
35 pages
Kovesi A Câștigat La CEDO Împotriva României: Decizia CCR de Revocare A Ei de La Șefia DNA, Abuzivă
No ratings yet
Kovesi A Câștigat La CEDO Împotriva României: Decizia CCR de Revocare A Ei de La Șefia DNA, Abuzivă
62 pages
Employment Offices - 2024
No ratings yet
Employment Offices - 2024
2 pages
Manajemen Limbah Industri Di Indonesia
No ratings yet
Manajemen Limbah Industri Di Indonesia
26 pages
Testi I Pare Ylberi
No ratings yet
Testi I Pare Ylberi
6 pages
22 UPPER OTTAWA Fall2022
No ratings yet
22 UPPER OTTAWA Fall2022
2 pages
Air Intelligence Report, V1N22
No ratings yet
Air Intelligence Report, V1N22
9 pages
Case Digest Art 12-15
No ratings yet
Case Digest Art 12-15
23 pages
FREE WORKSHEET - Articles
No ratings yet
FREE WORKSHEET - Articles
3 pages
Naukri AnirudhaDutta (5y 0m)
No ratings yet
Naukri AnirudhaDutta (5y 0m)
2 pages
DILG Memo Circular 2014130 Bc08a262c0
No ratings yet
DILG Memo Circular 2014130 Bc08a262c0
2 pages
Spring Build
100% (4)
Spring Build
47 pages
Scholastic Mega-Fun Card-Game Math Grade 1-3
No ratings yet
Scholastic Mega-Fun Card-Game Math Grade 1-3
49 pages
Marasmus
No ratings yet
Marasmus
3 pages
Security Incidents and Event Management With Qradar (Foundation)
No ratings yet
Security Incidents and Event Management With Qradar (Foundation)
4 pages
47th IChO-Theoretical Official English Version For Students Final
No ratings yet
47th IChO-Theoretical Official English Version For Students Final
41 pages
Oral Thrush
No ratings yet
Oral Thrush
2 pages
Subject: English Lesson Name: - Class: Vii Day/ Date
No ratings yet
Subject: English Lesson Name: - Class: Vii Day/ Date
2 pages
CSP Final
No ratings yet
CSP Final
31 pages
DLL Math 7 Quarter 2 Week 2
No ratings yet
DLL Math 7 Quarter 2 Week 2
13 pages
Cornell CS578: Introduction
No ratings yet
Cornell CS578: Introduction
11 pages