Memory Technology & Hierarchy: Caching and Virtual Memory
Parallel System Architectures, Andy D. Pimentel
For the memory address 386, 32-byte cache lines and an 8-line cache:
<block addr> = floor(<mem addr> / <cache line size>) = floor(386 / 32) = 12
Direct mapped: line = <block addr> mod <nr. of lines> = 12 mod 8 = 4
2-way set associative: <nr. of sets> = <nr. of lines> / <set associativity> = 8 / 2 = 4
set = <block addr> mod <nr. of sets> = 12 mod 4 = 0
Fully associative: one set of 8 lines, so the block can go anywhere in the cache
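A minimal sketch of this calculation in C; the constants mirror the example above and all names are illustrative only:

    #include <stdio.h>

    /* Constants from the example: 32-byte lines, 8 lines, 2-way set associative. */
    #define LINE_SIZE     32
    #define NUM_LINES     8
    #define ASSOCIATIVITY 2

    int main(void)
    {
        unsigned addr     = 386;
        unsigned block    = addr / LINE_SIZE;            /* floor(386 / 32) = 12 */
        unsigned dm_line  = block % NUM_LINES;           /* 12 mod 8 = 4         */
        unsigned num_sets = NUM_LINES / ASSOCIATIVITY;   /* 8 / 2 = 4            */
        unsigned sa_set   = block % num_sets;            /* 12 mod 4 = 0         */

        printf("block %u: direct-mapped line %u, 2-way set %u\n",
               block, dm_line, sa_set);
        return 0;
    }

Running it prints "block 12: direct-mapped line 4, 2-way set 0", matching the hand calculation.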
Direct mapped caches
[Figure: memory blocks map to cache lines by the low-order bits of their block address; e.g. all blocks whose addresses end in ...001 (...00001, ...00101, ..., ...11101) map to line 001 of an 8-line cache]
[Figure: direct-mapped cache lookup: the address is split into a tag, a line index (lines 0..16383) and a byte offset; the indexed line's valid bit and stored tag feed the cache-hit logic, which compares the stored tag with the address tag to produce the cache-hit signal and deliver the data word]
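A minimal sketch of the hit logic in the figure above, assuming one data word per line and a 2-bit byte offset; the struct and function names are illustrative, not from the slides:

    #include <stdbool.h>
    #include <stdint.h>

    #define INDEX_BITS 14                 /* 16384 lines, as in the figure    */
    #define NUM_LINES  (1u << INDEX_BITS)

    struct cache_line {
        bool     valid;
        uint32_t tag;
        uint32_t data;                    /* one data word per line (assumed) */
    };

    static struct cache_line cache[NUM_LINES];

    /* Direct-mapped lookup: the line field of the address selects exactly one
     * line; the lookup hits only if that line is valid and its tag matches. */
    bool dm_lookup(uint32_t addr, uint32_t *word)
    {
        uint32_t index = (addr >> 2) & (NUM_LINES - 1);  /* skip 2-bit byte offset */
        uint32_t tag   = addr >> (2 + INDEX_BITS);

        struct cache_line *line = &cache[index];
        if (line->valid && line->tag == tag) {           /* valid AND tags equal   */
            *word = line->data;
            return true;                                 /* cache hit              */
        }
        return false;                                    /* cache miss             */
    }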
[Figure: set-associative cache lookup: the set-index field of the address selects one of 2048 sets (a 'set of lines'); within the selected set, every line's valid bit and stored tag ('tag in line') are compared in parallel with the address tag, the comparator outputs are combined into the cache-hit signal, and the byte offset selects the requested data from the matching line]
E.g. a 32-bit byte address into a fully associative cache of size 16 KBytes and a line
size of 32 Bytes (i.e. 512 lines - fully associative means each line requires a comparator):
5 bits of the address (0..4) give the byte offset in the cache line
27 bits (5..31) determine which of the 128M possible memory blocks is mapped to one of the cache lines (the whole cache forms a single set)
these 27 bits are stored as the tag in the cache line and matched with the address from the processor
[Figure: fully associative cache lookup: the tags of all 512 lines (0..511) are compared in parallel with the address tag; a valid line with a matching tag raises the hit signal and the byte offset selects the data within that line]
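A sketch of the fully associative lookup from the example above (16 KBytes, 32-byte lines, 512 lines); in hardware the 512 tag comparisons happen in parallel, here they are modelled as a loop, and all names are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_SIZE 32    /* 5-bit byte offset (address bits 0..4) */
    #define NUM_LINES 512   /* 16 KBytes / 32 Bytes                  */

    struct fa_line {
        bool     valid;
        uint32_t tag;       /* address bits 5..31 (27 bits) of the cached block */
        uint8_t  data[LINE_SIZE];
    };

    static struct fa_line cache[NUM_LINES];

    /* Fully associative lookup: the address tag is compared with the tag of
     * every line (one comparator per line in hardware). */
    bool fa_lookup(uint32_t addr, uint8_t *byte)
    {
        uint32_t offset = addr & (LINE_SIZE - 1);   /* bits 0..4  */
        uint32_t tag    = addr >> 5;                /* bits 5..31 */

        for (int i = 0; i < NUM_LINES; i++) {
            if (cache[i].valid && cache[i].tag == tag) {
                *byte = cache[i].data[offset];
                return true;                        /* hit  */
            }
        }
        return false;                               /* miss */
    }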
[Figure: virtual memory: the pages of the virtual address space are either resident in main memory or stored externally (e.g. on disk)]
VM Terminology
The address produced by the processor is called a virtual address
This gets translated by an MMU via a page table
into a physical address (PT hit) or page fault (PT miss)
The page table is in main memory but has a special cache called a TLB
(translation look-aside buffer)
Page faults are usually handled by a software trap to the operating system
This mapping process is called address translation
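A sketch of this translation flow, modelling the TLB as a small direct-mapped table; the structures, sizes and function names are assumptions for illustration only:

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_BITS   12                  /* 4 KiB pages                      */
    #define TLB_ENTRIES 64                  /* assumed TLB size                 */

    struct tlb_entry { bool valid; uint32_t vpn, ppn; };
    struct pte       { bool valid; uint32_t ppn; };

    static struct tlb_entry tlb[TLB_ENTRIES];
    static struct pte page_table[1u << 20]; /* one entry per 4 KiB virtual page */

    /* Returns true and a physical address on a TLB or page-table hit;
     * returns false on a page fault, which the OS would handle in a trap. */
    bool translate(uint32_t va, uint32_t *pa)
    {
        uint32_t vpn    = va >> PAGE_BITS;
        uint32_t offset = va & ((1u << PAGE_BITS) - 1);

        struct tlb_entry *t = &tlb[vpn % TLB_ENTRIES];
        if (t->valid && t->vpn == vpn) {                     /* TLB hit          */
            *pa = (t->ppn << PAGE_BITS) | offset;
            return true;
        }
        if (page_table[vpn].valid) {                         /* PT hit: fill TLB */
            *t = (struct tlb_entry){ true, vpn, page_table[vpn].ppn };
            *pa = (page_table[vpn].ppn << PAGE_BITS) | offset;
            return true;
        }
        return false;                                        /* page fault       */
    }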
VM Address translation
[Figure: a 32-bit virtual address (virtual page number in bits 31..12, page offset in bits 11..0) is translated into a 30-bit physical address (physical page number in bits 29..12, page offset in bits 11..0 unchanged)]
This shows address mapping from a 4 GiB virtual address space onto a 1 GiB physical address space using 4 KiB memory pages
The translation is performed using a 1M-entry (3 MiB) table in memory, addressed by the virtual page number: 2^32 / 2^12 = 2^20 = 1M pages, each entry holding an 18-bit physical page number plus flags (roughly 3 bytes)
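These numbers can be checked with a few lines of C; the 3-bytes-per-entry figure is an assumption (an 18-bit physical page number plus flags, rounded up to whole bytes):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long vspace = 1ULL << 32;   /* 4 GiB virtual address space  */
        unsigned long long page   = 1ULL << 12;   /* 4 KiB pages                  */

        unsigned long long entries = vspace / page;   /* 2^20 = 1M table entries  */
        unsigned ppn_bits          = 30 - 12;         /* 1 GiB physical: 18-bit PPN */
        /* assumption: ~3 bytes per entry (18-bit PPN plus flags, rounded up)     */
        unsigned long long bytes   = entries * 3;

        printf("%llu entries, %u PPN bits, %llu MiB table\n",
               entries, ppn_bits, bytes >> 20);
        return 0;
    }

This prints "1048576 entries, 18 PPN bits, 3 MiB table".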
Virtual memory issues
Need flexibility in page placement to avoid costly page misses
Unlike cache mapping, VM mapping is implemented as a table in main memory - this allows arbitrary mapping
indexed by virtual address
that yields the physical address
Page misses are handled by software and incur a large penalty
Pages must be sufficiently large to amortize this large overhead and to minimize the mapping table size
4 to 64 KByte is a typical page size
with variable-size pages, pages can be as large as 1 MByte
[Figure: the MMU translates a VA by using its virtual page number to index the page table; each entry holds a valid bit and a physical page number; the page-offset bits are identical for VA and PA]
Note: the page table, the PC and the state of the registers all contribute to the state of a program
Page table size
The example earlier was for 32-bit addresses and yielded a 3MiB table
For a 64-bit architecture and say a 48-bit virtual address and 4KiB pages we get:
table size = 2^48 / 2^12 = 2^36 entries = 2^39 bytes (at 8 bytes per entry) = 512 GiB!!
and this is replicated for each process (!!)
Solution is to grow the page table as required, as sketched below
keep a limit and check it on each access
increase the size (e.g. double it) on each overflow
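A minimal sketch of this growth strategy: the limit is checked on each access and the table is doubled on overflow; all names are illustrative:

    #include <stdint.h>
    #include <stdlib.h>

    /* Growable page table: 'limit' entries are allocated; the limit is checked
     * on every access and the table is doubled when a virtual page number
     * beyond the current limit is touched. */
    struct pte        { uint64_t ppn; int valid; };
    struct page_table { struct pte *entries; uint64_t limit; };

    struct pte *pt_access(struct page_table *pt, uint64_t vpn)
    {
        if (vpn >= pt->limit) {                       /* limit check on each access */
            uint64_t new_limit = pt->limit ? pt->limit : 1;
            while (vpn >= new_limit)
                new_limit *= 2;                       /* double until the VPN fits  */
            struct pte *p = realloc(pt->entries, new_limit * sizeof *p);
            if (!p)
                abort();                              /* sketch: no error recovery  */
            for (uint64_t i = pt->limit; i < new_limit; i++)
                p[i] = (struct pte){0};               /* new entries start invalid  */
            pt->entries = p;
            pt->limit   = new_limit;
        }
        return &pt->entries[vpn];
    }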