Lecture 5
Architecture
Lecture five: Multiprocessor architectures
2024
Contents
1. Background
2. Multiprocessor architectures
3. Cache coherence problem
4. Cache coherence solution
Background
• Multiprocessors were used in extremely high-performance
computing (servers and supercomputers) in the 90s
• Due to limits in instruction-level parallelism (ILP) and power
consumption, multiprocessors became necessary in desktop computing (circa 2004)
• The highest clock in 2004 was 3.8 GHz (Pentium 4), and currently it
is 5.7 GHz (14900KS) [a 50% increase in 20 years]
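The bracketed figure can be checked with a few lines of arithmetic. The sketch below uses only the two clock speeds quoted above; the per-year rate assumes steady compound growth, which is a simplification for illustration and not a claim from the lecture.

```python
# Clock speeds quoted on the slide, in GHz.
clock_2004_ghz = 3.8   # Pentium 4
clock_now_ghz = 5.7    # 14900KS
years = 20

# Total relative increase over the whole period.
total_increase = clock_now_ghz / clock_2004_ghz - 1

# Equivalent compound annual growth rate (assumes steady growth).
annual_growth = (clock_now_ghz / clock_2004_ghz) ** (1 / years) - 1

print(f"total increase: {total_increase:.0%}")
print(f"per year:       {annual_growth:.2%}")
```

Under that assumption the clock grew by only about 2% per year, which is why extra performance had to come from more cores.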
Background
• A multiprocessor is any system with more than one processor,
whether multiple chips across systems or a single-chip
design (multicore).
• single task – parallel processing
• many tasks – multiprogramming
• Multiprocessor characteristics are:
• Tightly coupled processors controlled by one OS
• Shared memory and I/O
Background
Advantages:
• Replication instead of unique designs at every iteration
• Efficiency modes of operation for better battery life and heat
dissipation
Disadvantage: memory coherency
Multiprocessor architectures
Symmetric/Centralised shared memory (SMP):
• <32 cores
• uniform shared memory access (UMA)
• Some have non-uniform cache access (NUCA)
• Equal access to I/O and main memory
• 30 – 50 clock cycles to send a message
Distributed shared memory (DSM):
• >32 cores
• Non-uniform shared memory access (NUMA)
• Unequal access to I/O and main memory
• 100 – 300 clock cycles to send a message
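To give the cycle counts a sense of scale, they can be converted to wall-clock time. This is a minimal sketch: the helper `cycles_to_ns` and the choice of a 3.8 GHz clock (the Pentium 4 figure quoted earlier) are assumptions for illustration; real machines differ.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at a given clock frequency."""
    # One cycle lasts 1 / clock_ghz nanoseconds.
    return cycles / clock_ghz


# Message latency ranges from the slide, in clock cycles.
LATENCY_CYCLES = {"SMP": (30, 50), "DSM": (100, 300)}

CLOCK_GHZ = 3.8  # assumed clock frequency, for illustration only

for name, (low, high) in LATENCY_CYCLES.items():
    low_ns = cycles_to_ns(low, CLOCK_GHZ)
    high_ns = cycles_to_ns(high, CLOCK_GHZ)
    print(f"{name}: {low_ns:.1f} to {high_ns:.1f} ns per message")
```

At any fixed clock the ratio is the same: a DSM message costs roughly two to ten times as much as an SMP message.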
Cache coherence - problem
• When multiple processors share access to a memory system, who
decides when each processor may read or write data? Nobody.
• The result? Inconsistency and incoherence
• Inconsistency: values are read in a different order from the one in
which they were written
• Incoherence: a read may return a value other than the one we expect
Cache coherence - problem
Going a step further: suppose processor B writes to X after processor A,
and then processor C reads X, only to find the value written by A. A still
believes X holds its value, although X should now hold B's value, and C
has read A's stale value as if it were current.
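The scenario can be reproduced with a toy model. The sketch below assumes private write-back caches with no coherence mechanism at all; the `Processor` class and its `write_back` method are hypothetical names invented for this illustration, not part of the lecture.

```python
class Processor:
    """A processor with a private write-back cache and NO coherence."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory   # shared main memory (a dict)
        self.cache = {}        # private cache: address -> value

    def read(self, addr):
        # On a miss, fetch from main memory; on a hit, trust the cached copy.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Write-back: only the private copy changes; nobody else is told.
        self.cache[addr] = value

    def write_back(self, addr):
        # Copy the cached value out to main memory (e.g. on eviction).
        self.memory[addr] = self.cache[addr]


memory = {"X": 0}
a, b, c = (Processor(name, memory) for name in "ABC")

a.write("X", "value from A")
a.write_back("X")              # A's value reaches main memory
b.write("X", "value from B")   # B's newer value stays in B's cache
seen_by_c = c.read("X")        # C misses and fetches from main memory
seen_by_a = a.read("X")        # A hits its own, now outdated, copy

print("C sees:", seen_by_c)
print("A sees:", seen_by_a)
print("B sees:", b.read("X"))
```

C and A both see A's value while B sees its own, so the three processors disagree about the contents of X.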
Cache coherence – solution (in theory)
• A read of X by Pn that follows a write to X by the same processor Pn, with no
writes to X by any other processor in between, always returns the value written by Pn
• A read of X by Pn that follows a write to X by Pm always returns the written
value if the two operations are sufficiently separated in time and no other
writes to X occur between the two operations.
• Writes to the same location are serialised. This implies that writes to any
location by any two processors are seen in the same order by all processors.
For example, if the values 1 and then 2 are written to location X, then no
processor should read a value of 2 now but a value of 1 later.
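The write-serialisation condition can be expressed as a small check. This is a sketch under stated assumptions: the function name `respects_write_order` is hypothetical, the written values are assumed to be distinct, and each processor's reads are given in program order.

```python
def respects_write_order(write_order, observed):
    """Return True if `observed` never shows an older write after a newer one.

    write_order: values written to X, in the serialised order (assumed distinct).
    observed: values one processor read from X, in program order.
    """
    position = {value: i for i, value in enumerate(write_order)}
    last = -1
    for value in observed:
        if position[value] < last:
            return False   # an earlier write reappeared after a later one
        last = position[value]
    return True


writes_to_x = [1, 2]
ok = respects_write_order(writes_to_x, [1, 1, 2, 2])   # 1 first, then 2
bad = respects_write_order(writes_to_x, [2, 1])        # 2 now, 1 later

print(ok, bad)
```

A coherent system must pass this check for every processor against the same write order.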
Cache coherence – solution (in theory)
Directory-based protocol:
• keeps the sharing status of each block in one location (directory)
• SMP can have the directory at a serialisation point like the shared cache
• DSM shouldn’t have a central directory; it would cause a bottleneck. Each
processor has part of a “distributed” directory
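A minimal sketch of the bookkeeping a directory performs is given below. The `Directory` class, its method names, and the modulo-based `home_node` mapping are all assumptions made for illustration; real protocols track more states and handle races that this sketch ignores.

```python
class Directory:
    """Tracks, per block, which caches hold a copy (illustrative only)."""

    def __init__(self):
        self.sharers = {}   # block -> set of cache ids holding a copy
        self.owner = {}     # block -> cache id holding a modified copy

    def read_miss(self, block, cache_id):
        # Any exclusive owner is downgraded to a plain sharer,
        # and the requester is added to the sharer set.
        self.owner.pop(block, None)
        self.sharers.setdefault(block, set()).add(cache_id)

    def write_miss(self, block, cache_id):
        # Every other sharer must be invalidated before the write proceeds.
        to_invalidate = self.sharers.get(block, set()) - {cache_id}
        self.sharers[block] = {cache_id}
        self.owner[block] = cache_id
        return to_invalidate


def home_node(block, num_nodes):
    """Pick which node's directory slice tracks a block (assumed mapping)."""
    return block % num_nodes


directory = Directory()
directory.read_miss(6, cache_id=0)
directory.read_miss(6, cache_id=1)
invalidated = directory.write_miss(6, cache_id=2)

print("invalidate caches:", sorted(invalidated))
print("home node of block 6 with 4 nodes:", home_node(6, 4))
```

Spreading blocks across nodes with a mapping like `home_node` is one way to avoid the single-directory bottleneck in a DSM.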
Snooping protocol:
• Every cache tracks the status of its blocks by listening in (snooping) on the
shared bus (SMP) to determine whether it holds a copy of an affected block.
• Cannot be used in a DSM since there is no shared bus
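The snooping idea can be sketched as caches attached to a shared bus. The version below assumes a write-invalidate policy with write-through to memory, chosen only to keep the example short; the slide does not specify a policy, and the `Bus` and `SnoopingCache` classes are hypothetical.

```python
class Bus:
    """A shared bus that every cache listens to."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr, sender):
        # Every cache except the writer snoops the invalidation.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory   # shared main memory (a dict)
        self.lines = {}        # private copies: address -> value
        bus.attach(self)

    def read(self, addr):
        # A miss (including after an invalidation) refetches from memory.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Announce the write on the bus, then update own copy and memory.
        self.bus.broadcast_invalidate(addr, self)
        self.lines[addr] = value
        self.memory[addr] = value   # write-through (assumed for brevity)

    def snoop_invalidate(self, addr):
        # Drop the local copy if this cache holds one.
        self.lines.pop(addr, None)


memory = {"X": 0}
bus = Bus()
a, b, c = (SnoopingCache(bus, memory) for _ in range(3))

first_read = c.read("X")    # C caches the initial value
a.write("X", 1)             # invalidates C's copy
b.write("X", 2)             # invalidates A's copy
latest_read = c.read("X")   # C misses and fetches the latest value

print(first_read, latest_read, a.read("X"))
```

Because every write is broadcast on the one bus, the bus itself serialises writes; that is the property a DSM lacks.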