
Carnegie Mellon

Virtual Memory: Details


15-213/15-513: Introduction to Computer Systems
18th Lecture, July 13, 2022

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 1



Review: Virtual Addressing


⬛ Each process has its own virtual address space
⬛ Page tables map virtual to physical addresses
⬛ Physical memory can be shared among processes

[Figure: address translation maps the virtual address spaces of Process 1 and Process 2 (each numbered 0 to N-1, with virtual pages VP 1, VP 2, ..., VP k) onto a single physical address space in DRAM (numbered 0 to M-1, with physical pages PP 2, PP 6, PP 8). PP 6 is shared between the processes (e.g., read-only library code).]

Today
⬛ Multi-level page tables
⬛ Translation lookaside buffers
⬛ Activity 1
⬛ Concrete examples of virtual memory systems
▪ “Simple memory system” from CSAPP 9.6.4
▪ Intel Core i7
⬛ Activity 2
⬛ Nifty things virtual memory makes possible
▪ Paging/swapping (disk as extra RAM)
▪ Memory-mapped files (RAM as cache for disk)
▪ Copy-on-write sharing
⬛ Activity 3


The problem (with one-level page tables)

One 64-bit array element for each 4096-byte page of a $2^{48}$-byte address space:

$$\frac{2^{48}}{4096} \cdot 8 \text{ bytes} = 2^{39} \text{ bytes} = 512 \text{ gigabytes}$$

for one page table.
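
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch that plugs in the slide's numbers; the variable names are illustrative.

```python
# Size of a flat (one-level) page table, using the numbers on this slide.
ADDRESS_BITS = 48   # 2^48-byte virtual address space
PAGE_SIZE = 4096    # bytes per page
PTE_SIZE = 8        # one 64-bit array element per page

num_pages = 2**ADDRESS_BITS // PAGE_SIZE   # one PTE per virtual page
table_bytes = num_pages * PTE_SIZE

print(num_pages == 2**36)      # True
print(table_bytes == 2**39)    # True
print(table_bytes // 2**30)    # 512 (gigabytes)
```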


A Two-Level Page Table Hierarchy


32-bit address space, 4-byte PTEs, 4096-byte pages
[Figure: a Level 1 page table (PTE 0 to PTE 1023), the Level 2 page tables it points to (each PTE 0 to PTE 1023), and the virtual memory they map.
▪ Level 1 PTE 0 is null: VP 0…1023 are 1024 unallocated (unmapped) pages
▪ Level 1 PTE 1 and PTE 2 each point to a Level 2 page table; together these map the 2048 allocated pages for code and data, starting at VP 1024
▪ 1020 more Level 1 PTEs are null; the pages they cover are unallocated (unmapped)
▪ Level 1 PTE 1023 points to a Level 2 page table with 1023 null PTEs, whose PTE 1023 maps the last virtual page: 1 allocated page for the stack]
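
A rough sketch of the space saving, under the slide's parameters (32-bit address space, 4-byte PTEs, 4096-byte pages). The count of three Level 2 tables is an assumption read off the figure: two for the 2048 code and data pages and one for the stack page.

```python
# Page table memory: flat table vs. the two-level hierarchy in the figure.
ADDRESS_BITS = 32
PAGE_SIZE = 4096
PTE_SIZE = 4

ptes_per_table = PAGE_SIZE // PTE_SIZE                    # 1024 PTEs fit in one page
flat_bytes = (2**ADDRESS_BITS // PAGE_SIZE) * PTE_SIZE    # one PTE for every virtual page

# Assumption from the figure: 3 Level 2 tables exist
# (2 covering code and data, 1 covering the stack page).
level2_tables = 3
two_level_bytes = (1 + level2_tables) * ptes_per_table * PTE_SIZE

print(flat_bytes // 1024)        # 4096 (KB, i.e. 4 MB)
print(two_level_bytes // 1024)   # 16 (KB)
```

Null Level 1 PTEs are what make the difference: a Level 2 table that would map only unallocated pages never has to exist.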


Translating with a k-level Page Table

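[Figure: the page table base register (PTBR) points to the Level 1 page table. The virtual address (bits n-1 to 0) is split into VPN 1, VPN 2, ..., VPN k, followed by the VPO (bits p-1 to 0). VPN 1 indexes the Level 1 page table, VPN 2 indexes a Level 2 page table, and so on up to VPN k, which indexes a Level k page table; that last entry supplies the PPN. The physical address (bits m-1 to 0) is the PPN followed by the PPO (bits p-1 to 0).]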
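
The walk in the figure can be sketched in Python. This is a toy model, not how an MMU is implemented: each page table is a hypothetical `dict` from a VPN field to either the next-level table or, at the last level, the PPN, and the function name `translate` and its parameters are illustrative only.

```python
def translate(va, root, vpn_bits, vpo_bits):
    """Walk a k-level page table, where k = len(vpn_bits).

    root     -- Level 1 table (stands in for what the PTBR points to)
    vpn_bits -- widths of VPN 1 .. VPN k, most significant field first
    vpo_bits -- width p of the page offset
    """
    vpo = va & ((1 << vpo_bits) - 1)
    vpn = va >> vpo_bits

    # Peel the VPN fields off from the least significant end, then restore order.
    fields = []
    for bits in reversed(vpn_bits):
        fields.append(vpn & ((1 << bits) - 1))
        vpn >>= bits
    fields.reverse()

    node = root
    for index in fields:
        node = node.get(index)          # one memory access per level
        if node is None:
            raise LookupError("page fault")
    ppn = node                          # the Level k entry holds the PPN
    return (ppn << vpo_bits) | vpo      # PPO is the VPO, unchanged


# Two-level example: 10-bit VPN 1, 10-bit VPN 2, 12-bit VPO.
root = {1: {0: 0x2A}}                   # VPN 1 = 1, VPN 2 = 0 maps to PPN 0x2A
print(hex(translate(0x00400123, root, [10, 10], 12)))   # 0x2a123
```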


The problem (with k-level page tables)

[Figure: the same k-level translation diagram (PTBR, VPN 1 ... VPN k, VPO, Level 1 to Level k page tables, PPN and PPO), now annotated with "Cache miss!" at every step of the walk.]


Speeding up Translation with a TLB


⬛ Page table entries (PTEs) are cached like any other memory word
▪ PTEs may be evicted by other data references
▪ Even a PTE hit still costs the cache access delay
⬛ Solution: Translation Lookaside Buffer (TLB)
▪ Dedicated cache for page table entries
▪ TLB hit = page table not consulted
▪ Can be fairly small: one TLB entry covers 4 KB of memory or more
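
To see why a small TLB goes a long way, one can multiply the number of entries by the page size. A minimal sketch follows; the helper name `tlb_reach` is made up for illustration, and the 64-entry figure is the Core i7 L1 d-TLB size quoted later in this lecture.

```python
def tlb_reach(entries, page_size):
    """Bytes of memory covered when every TLB entry maps a distinct page."""
    return entries * page_size

# 64 entries with 4 KB pages
print(tlb_reach(64, 4096) // 1024)   # 256 (KB)
```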


Accessing the TLB


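⬛ MMU uses the VPN portion of the virtual address to access the TLB:

[Figure: a TLB with T = 2^t sets (Set 0, Set 1, ..., Set T-1), each set holding lines of (v, tag, PTE). The virtual address is split into the TLB tag (TLBT, bits n-1 to p+t), the TLB index (TLBI, bits p+t-1 to p), and the VPO (bits p-1 to 0); TLBT and TLBI together make up the VPN. TLBI selects the set; TLBT matches the tag of a line within the set.]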
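
The bit slicing described above might be written as follows. The function name `split_for_tlb` is hypothetical; `p` and `t` are the symbols from the slide (page offset bits and set index bits).

```python
def split_for_tlb(va, p, t):
    """Split a virtual address into (TLBT, TLBI, VPO) for a TLB with 2**t sets."""
    vpo = va & ((1 << p) - 1)      # bits p-1 .. 0
    vpn = va >> p                  # bits n-1 .. p
    tlbi = vpn & ((1 << t) - 1)    # low t bits of the VPN select the set
    tlbt = vpn >> t                # remaining VPN bits are the tag
    return tlbt, tlbi, vpo

# Example parameters: 64-byte pages (p = 6) and 4 sets (t = 2)
print(split_for_tlb(0x03D4, p=6, t=2))   # (3, 3, 20)
```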


TLB Hit

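[Figure: on the CPU chip, (1) the CPU sends the VA to the MMU, (2) the MMU sends the VPN to the TLB, (3) the TLB returns the PTE, (4) the MMU sends the PA to cache/memory, (5) cache/memory returns the data to the CPU.]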

A TLB hit eliminates memory accesses to the page table


TLB Miss

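[Figure: on the CPU chip, (1) the CPU sends the VA to the MMU, (2) the MMU sends the VPN to the TLB, which misses, (3) the MMU sends the PTE address (PTEA) to cache/memory, (4) cache/memory returns the PTE, which is placed in the TLB, (5) the MMU sends the PA to cache/memory, (6) cache/memory returns the data to the CPU.]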

A TLB miss incurs additional memory accesses (PTE lookup)


Fortunately, TLB misses are rare. Why?

Simple Memory System Example


⬛ Addressing
▪ 14-bit virtual addresses
▪ 12-bit physical addresses
▪ Page size = 64 bytes

Virtual address (bits 13 to 0):
  bits 13-6   VPN (Virtual Page Number)
  bits 5-0    VPO (Virtual Page Offset)

Physical address (bits 11 to 0):
  bits 11-6   PPN (Physical Page Number)
  bits 5-0    PPO (Physical Page Offset)


Simple Memory System TLB


⬛ 16 entries
⬛ 4-way associative

Virtual address (bits 13 to 0):
  bits 13-8   TLBT (TLB tag)
  bits 7-6    TLBI (TLB index)
  bits 5-0    VPO
TLBT and TLBI together form the VPN.

Example: VPN bits 0 0 0 0 1 1 0 1, so VPN = 0b1101 = 0x0D

Translation Lookaside Buffer (TLB)

Set   Tag PPN Valid   Tag PPN Valid   Tag PPN Valid   Tag PPN Valid
0     03  –   0       09  0D  1       00  –   0       07  02  1
1     03  2D  1       02  –   0       04  –   0       0A  –   0
2     02  –   0       08  –   0       06  –   0       03  –   0
3     07  –   0       03  0D  1       0A  34  1       02  –   0


Simple Memory System Page Table


⬛ Only showing the first 16 entries (out of 256)

VPN  PPN  Valid      VPN  PPN  Valid
00   28   1          08   13   1
01   –    0          09   17   1
02   33   1          0A   09   1
03   02   1          0B   –    0
04   –    0          0C   –    0
05   16   1          0D   2D   1
06   –    0          0E   11   1
07   –    0          0F   0D   1

Example: VPN 0x0D → PPN 0x2D


Simple Memory System Cache


⬛ 16 lines, 4-byte cache line size
⬛ Physically addressed
⬛ Direct mapped

Physical address (bits 11 to 0):
  bits 11-6   CT (cache tag), same bits as the PPN
  bits 5-2    CI (cache index)
  bits 1-0    CO (cache offset)
CI and CO together are the PPO.

Example: V[0b00001101101001] = V[0x369]
         P[0b101101101001] = P[0xB69] = 0x15

Bit     11 10  9  8  7  6  5  4  3  2  1  0
Value    1  0  1  1  0  1  1  0  1  0  0  1

Idx Tag Valid B0 B1 B2 B3      Idx Tag Valid B0 B1 B2 B3
0   19  1     99 11 23 11      8   24  1     3A 00 51 89
1   15  0     –  –  –  –       9   2D  0     –  –  –  –
2   1B  1     00 02 04 08      A   2D  1     93 15 DA 3B
3   36  0     –  –  –  –       B   0B  0     –  –  –  –
4   32  1     43 6D 8F 09      C   12  0     –  –  –  –
5   0D  1     36 72 F0 1D      D   16  1     04 96 34 15
6   31  0     –  –  –  –       E   13  1     83 77 1B D3
7   16  1     11 C2 DF 03      F   14  0     –  –  –  –


Address Translation Example

Virtual Address: 0x03D4

Bit     13 12 11 10  9  8  7  6  5  4  3  2  1  0
Value    0  0  0  0  1  1  1  1  0  1  0  1  0  0
TLBT = bits 13-8, TLBI = bits 7-6 (together the VPN), VPO = bits 5-0

VPN: 0x0F
TLBI: 0x3
TLBT: 0x03
TLB Hit? Y
Page Fault? N
PPN: 0x0D

TLB

Set   Tag PPN Valid   Tag PPN Valid   Tag PPN Valid   Tag PPN Valid
0     03  –   0       09  0D  1       00  –   0       07  02  1
1     03  2D  1       02  –   0       04  –   0       0A  –   0
2     02  –   0       08  –   0       06  –   0       03  –   0
3     07  –   0       03  0D  1       0A  34  1       02  –   0

Physical Address: 0x354

Bit     11 10  9  8  7  6  5  4  3  2  1  0
Value    0  0  1  1  0  1  0  1  0  1  0  0
PPN = bits 11-6, PPO = bits 5-0
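
The whole example can be replayed in Python using the TLB, page table, and cache contents shown on the preceding slides. This is an illustrative model only: the names `TLB`, `PAGE_TABLE`, `CACHE`, `translate`, and `read_byte` are made up here, invalid entries are simply left out, and page table entries beyond the 16 shown are treated as missing.

```python
# Valid TLB entries only: set index -> list of (tag, PPN)
TLB = {
    0: [(0x09, 0x0D), (0x07, 0x02)],
    1: [(0x03, 0x2D)],
    2: [],
    3: [(0x03, 0x0D), (0x0A, 0x34)],
}

# Valid page table entries among the 16 shown: VPN -> PPN
PAGE_TABLE = {0x00: 0x28, 0x02: 0x33, 0x03: 0x02, 0x05: 0x16, 0x08: 0x13,
              0x09: 0x17, 0x0A: 0x09, 0x0D: 0x2D, 0x0E: 0x11, 0x0F: 0x0D}

# Valid cache lines only: index -> (tag, [B0, B1, B2, B3])
CACHE = {
    0x0: (0x19, [0x99, 0x11, 0x23, 0x11]), 0x2: (0x1B, [0x00, 0x02, 0x04, 0x08]),
    0x4: (0x32, [0x43, 0x6D, 0x8F, 0x09]), 0x5: (0x0D, [0x36, 0x72, 0xF0, 0x1D]),
    0x7: (0x16, [0x11, 0xC2, 0xDF, 0x03]), 0x8: (0x24, [0x3A, 0x00, 0x51, 0x89]),
    0xA: (0x2D, [0x93, 0x15, 0xDA, 0x3B]), 0xD: (0x16, [0x04, 0x96, 0x34, 0x15]),
    0xE: (0x13, [0x83, 0x77, 0x1B, 0xD3]),
}

def translate(va):
    """Return (physical address, tlb_hit) for a 14-bit virtual address."""
    vpn, vpo = va >> 6, va & 0x3F          # 64-byte pages: 6 offset bits
    tlbi, tlbt = vpn & 0x3, vpn >> 2       # 4 sets: 2 index bits
    for tag, ppn in TLB[tlbi]:
        if tag == tlbt:
            return (ppn << 6) | vpo, True
    if vpn not in PAGE_TABLE:              # TLB miss and no valid PTE
        raise LookupError("page fault")
    return (PAGE_TABLE[vpn] << 6) | vpo, False

def read_byte(pa):
    """Return the cached byte at a 12-bit physical address, or None on a miss."""
    co, ci, ct = pa & 0x3, (pa >> 2) & 0xF, pa >> 6
    line = CACHE.get(ci)
    if line is None or line[0] != ct:
        return None
    return line[1][co]

pa, hit = translate(0x03D4)
print(hex(pa), hit, hex(read_byte(pa)))    # 0x354 True 0x36

pa, hit = translate(0x0369)
print(hex(pa), hit, hex(read_byte(pa)))    # 0xb69 True 0x15
```

The second call reproduces the V[0x369] = P[0xB69] = 0x15 example from the cache slide.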


Intel Core i7 Memory System

[Figure: Intel Core i7 memory system]

Processor package, 4 cores. Each core contains:
▪ Registers, instruction fetch, MMU (address translation)
▪ L1 d-cache: 32 KB, 8-way
▪ L1 i-cache: 32 KB, 8-way
▪ L1 d-TLB: 64 entries, 4-way
▪ L1 i-TLB: 128 entries, 4-way
▪ L2 unified cache: 256 KB, 8-way
▪ L2 unified TLB: 512 entries, 4-way

Shared by all cores:
▪ L3 unified cache: 8 MB, 16-way
▪ DDR3 memory controller: 3 x 64 bit @ 10.66 GB/s, 32 GB/s total, connected to main memory
▪ QuickPath interconnect: 4 links @ 25.6 GB/s each, to other cores and to the I/O bridge


End-to-end Core i7 Address Translation

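[Figure: end-to-end Core i7 address translation]
▪ The CPU issues a virtual address (VA): 36-bit VPN, 12-bit VPO
▪ For the L1 TLB (16 sets, 4 entries/set), the VPN is split into a 32-bit TLBT and a 4-bit TLBI
▪ TLB hit: the TLB supplies the 40-bit PPN
▪ TLB miss: the VPN is split into VPN1, VPN2, VPN3, VPN4 (9 bits each); starting from CR3, each field selects a PTE in one level of the page tables, and the last PTE supplies the 40-bit PPN
▪ The physical address (PA) is the 40-bit PPN plus the 12-bit PPO
▪ For the L1 d-cache (64 sets, 8 lines/set), the PA is split into CT (40 bits), CI (6 bits), CO (6 bits)
▪ L1 hit: the 32/64-bit result is returned to the CPU
▪ L1 miss: the request goes to L2, L3, and main memory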


Core i7 Level 1-3 Page Table Entries

Entry format when P=1:

Bits     Field
63       XD
62-52    Unused
51-12    Page table physical base address
11-9     Unused
8        G
7        PS
6        (no field shown)
5        A
4        CD
3        WT
2        U/S
1        R/W
0        P=1

When P=0, the remaining bits are available for the OS (page table location on disk).

Each entry references a 4 KB child page table. Significant fields:

P: Child page table present in physical memory (1) or not (0).
R/W: Read-only or read-write access permission for all reachable pages.
U/S: User or supervisor (kernel) mode access permission for all reachable pages.
WT: Write-through or write-back cache policy for the child page table.
A: Reference bit (set by MMU on reads and writes, cleared by software).
PS: Page size either 4 KB or 4 MB (defined for Level 1 PTEs only).
Page table physical base address: 40 most significant bits of physical page table address (forces page tables to be 4 KB aligned).
XD: Disable or enable instruction fetches from all pages reachable from this PTE.


Core i7 Level 4 Page Table Entries

Entry format when P=1:

Bits     Field
63       XD
62-52    Unused
51-12    Page physical base address
11-9     Unused
8        G
7        (no field shown)
6        D
5        A
4        CD
3        WT
2        U/S
1        R/W
0        P=1

When P=0, the remaining bits are available for the OS (page location on disk).

Each entry references a 4 KB child page. Significant fields:

P: Child page is present in memory (1) or not (0).
R/W: Read-only or read-write access permission for child page.
U/S: User or supervisor mode access.
WT: Write-through or write-back cache policy for this page.
A: Reference bit (set by MMU on reads and writes, cleared by software).
D: Dirty bit (set by MMU on writes, cleared by software).
G: Global page (don’t evict from TLB on task switch).
Page physical base address: 40 most significant bits of physical page address (forces pages to be 4 KB aligned).
XD: Disable or enable instruction fetches from this page.
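
As a sketch of how software might pick these fields out of a 64-bit Level 4 entry, the bit positions from the layout above can be applied with shifts and masks. The function name `decode_pte` and the dictionary it returns are illustrative, not an OS or hardware interface.

```python
BASE_MASK = ((1 << 52) - 1) & ~0xFFF    # bits 51-12

def decode_pte(pte):
    """Decode a Level 4 PTE using the bit positions listed on the slide."""
    if not pte & 1:
        # P=0: the other bits belong to the OS (e.g. page location on disk)
        return {"P": 0}
    return {
        "P": 1,
        "R/W": (pte >> 1) & 1,
        "U/S": (pte >> 2) & 1,
        "WT": (pte >> 3) & 1,
        "CD": (pte >> 4) & 1,
        "A": (pte >> 5) & 1,
        "D": (pte >> 6) & 1,
        "G": (pte >> 8) & 1,
        "XD": (pte >> 63) & 1,
        "base": pte & BASE_MASK,        # 4 KB-aligned physical page address
    }

# Made-up entry: XD set, PPN 0x12345, flags D, A, U/S, R/W, P set
example = (1 << 63) | (0x12345 << 12) | 0b1100111
fields = decode_pte(example)
print(hex(fields["base"]), fields["D"], fields["XD"])   # 0x12345000 1 1
```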


Core i7 Page Table Translation

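[Figure: Core i7 page table translation]
▪ The virtual address is split into VPN 1, VPN 2, VPN 3, VPN 4 (9 bits each) and a 12-bit VPO
▪ CR3 holds the physical address of the L1 PT
▪ L1 PT (page global directory): VPN 1 selects the L1 PTE, which gives the 40-bit address of an L2 PT; 512 GB region per entry
▪ L2 PT (page upper directory): VPN 2 selects the L2 PTE, which gives the 40-bit address of an L3 PT; 1 GB region per entry
▪ L3 PT (page middle directory): VPN 3 selects the L3 PTE, which gives the 40-bit address of an L4 PT; 2 MB region per entry
▪ L4 PT (page table): VPN 4 selects the L4 PTE, which gives the 40-bit physical address of the page; 4 KB region per entry
▪ The 12-bit VPO is the offset into both the physical and the virtual page
▪ The physical address is the 40-bit PPN followed by the 12-bit PPO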
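
The field widths in the figure translate directly into shifts and masks. The following is a sketch with a made-up helper name (`split_va`); it also recomputes the "region per entry" sizes from the fact that each table holds 512 entries.

```python
def split_va(va):
    """Split a 48-bit virtual address into ([VPN1, VPN2, VPN3, VPN4], VPO)."""
    vpo = va & 0xFFF                                          # low 12 bits
    vpns = [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]   # 9 bits each
    return vpns, vpo

def region_bytes(level):
    """Bytes of virtual address space covered by one PTE at the given level (1-4)."""
    return 4096 * 512 ** (4 - level)

va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(split_va(va))                        # ([1, 2, 3, 4], 5)
print(region_bytes(1) // 2**30)            # 512 (GB)
print(region_bytes(3) // 2**20)            # 2 (MB)
```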


Cute Trick for Speeding Up L1 Access


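[Figure: the virtual address (VA) is a 36-bit VPN plus a 12-bit VPO. The physical address (PA) is a 40-bit PPN plus a 12-bit PPO, which the L1 cache reads as CT (40 bits), CI (6 bits), and CO (6 bits). Address translation turns the VPN into the PPN, while the offset bits pass through with no change, so CI can be taken straight from the VPO to index the L1 cache. The CT obtained from translation is then used for the tag check.]
⬛ Observation
▪ Bits that determine CI are identical in the virtual and physical address
▪ Can index into the cache while address translation is taking place
▪ Generally we hit in the TLB, so the PPN bits (CT bits) are available quickly
▪ “Virtually indexed, physically tagged”
▪ Cache carefully sized to make this possible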
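
The "carefully sized" condition can be checked numerically. This sketch assumes the Core i7 L1 d-cache parameters quoted earlier in the lecture (32 KB, 8-way) and a 64-byte line, which is what a 6-bit CO implies; the variable names are illustrative.

```python
CACHE_BYTES = 32 * 1024
WAYS = 8
LINE_BYTES = 64          # assumption implied by the 6-bit CO
PAGE_BYTES = 4096

sets = CACHE_BYTES // (WAYS * LINE_BYTES)
co_bits = LINE_BYTES.bit_length() - 1     # log2 of the line size
ci_bits = sets.bit_length() - 1           # log2 of the number of sets
vpo_bits = PAGE_BYTES.bit_length() - 1    # log2 of the page size

print(sets, co_bits, ci_bits, vpo_bits)   # 64 6 6 12
# The trick works when the index and offset bits fit inside the page offset.
print(ci_bits + co_bits <= vpo_bits)      # True
```

Put differently, the size of one way (sets times line size) must not exceed the page size.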


Paging (aka Swapping)


⬛ Use (part of) disk as additional working memory
⬛ Adds another layer to the memory hierarchy, but…
▪ “Main memory” is 10–1000x slower than the caches
▪ Disk is 10,000x slower than main memory
▪ Enormous miss penalty drives design

⬛ Consequences
▪ Large page (block) size: 4 KB and bigger
▪ Always write-back and fully associative
▪ Managed entirely in software
▪ Plenty of time to execute complex replacement algorithms


Locality to the Rescue Again!


⬛ Paging is terribly inefficient
⬛ Only works because of locality
⬛ At any point in time, programs tend to access a set of active virtual pages called the working set
▪ Programs with good temporal locality will have small working sets

⬛ If working set size < main memory size
▪ Good performance after compulsory misses
⬛ If working set size > main memory size
▪ Thrashing: Performance meltdown, computer spends most of its time copying pages in and out of RAM
▪ In the worst case, no forward progress at all (livelock)
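
The cliff between the two cases can be illustrated with a toy simulation. This sketch assumes LRU replacement over a fully associative set of page frames; the slides do not say which replacement algorithm an OS uses, and the function name `count_faults` is made up.

```python
from collections import OrderedDict

def count_faults(references, frames):
    """Count page faults for a reference string with `frames` page frames (LRU)."""
    resident = OrderedDict()                 # page -> True, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)       # mark as most recently used
        else:
            faults += 1
            if len(resident) == frames:
                resident.popitem(last=False) # evict the least recently used page
            resident[page] = True
    return faults

refs = list(range(4)) * 100                  # loop over a 4-page working set, 400 accesses
print(count_faults(refs, frames=4))          # 4   (compulsory misses only)
print(count_faults(refs, frames=3))          # 400 (every access faults)
```

With one frame too few, this access pattern evicts each page just before it is needed again, which is the thrashing behavior described above.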


Memory-Mapped Files
⬛ Paging = every page of a program’s physical RAM is backed by some page of disk*
⬛ Normally, those pages belong to swap space
⬛ But what if some pages were backed by … files?

* This is how it used to work 20 years ago. Nowadays, not always true.


[Figure: pages of a process's virtual memory held in physical memory, backed either by swap space or by a file on disk.]


[Figure: the same diagram with two processes, Process 1 and Process 2, each with its own virtual memory, alongside physical memory, swap space, and the file on disk.]
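
From a program's point of view, a memory-mapped file looks like a byte array whose contents are the file. A minimal sketch using Python's standard `mmap` module follows; the function name `mmap_demo` and the temporary file are made up for illustration.

```python
import mmap
import os
import tempfile

def mmap_demo():
    """Write to a file through a mapping, then read it back with ordinary I/O."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello, world\n")
        # Length 0 maps the whole file; the default mapping is shared and writable.
        with mmap.mmap(fd, 0) as mapping:
            before = bytes(mapping[:5])
            mapping[:5] = b"HELLO"       # a plain memory write, no write() call
            mapping.flush()              # push the dirty page back to the file
        os.lseek(fd, 0, os.SEEK_SET)
        after = os.read(fd, 13)
    finally:
        os.close(fd)
        os.unlink(path)
    return before, after

print(mmap_demo())   # (b'hello', b'HELLO, world\n')
```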


Copy-on-write sharing
⬛ fork creates a new process by copying the entire address space of the parent process
▪ That sounds slow
▪ It is slow

⬛ Clever trick:
▪ Just duplicate the page tables
▪ Mark everything read only
▪ Copy only on write faults

[Figure: the parent's virtual memory mapped onto physical memory, backed by swap space and a file on disk.]
[Figure: after fork, the child's virtual memory and the parent's virtual memory refer to the same pages of physical memory, still backed by swap space and the file on disk.]
[Figure: the same diagram after a write. One page is labeled "Child wrote to this page"; that page is the one copied on the write fault.]
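
A program cannot see the copying itself, only its effect: after fork, a write in the child is invisible to the parent. The sketch below shows that effect with `os.fork` on a POSIX system. Whether the kernel actually implements it with copy-on-write is an OS detail not visible from Python, and the function name `fork_demo` is made up.

```python
import os

def fork_demo():
    """Show that a child's write after fork does not change the parent's memory."""
    data = bytearray(b"parent")
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: modify its view of the data and report what it sees.
        try:
            os.close(read_end)
            data[:] = b"child!"          # the write that would trigger a page copy
            os.write(write_end, bytes(data))
        finally:
            os._exit(0)                  # never fall back into the parent's code path
    # Parent: read the child's report, then look at its own copy.
    os.close(write_end)
    child_view = os.read(read_end, 16)
    os.close(read_end)
    os.waitpid(pid, 0)
    return bytes(data), child_view

print(fork_demo())   # (b'parent', b'child!')
```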