Multiprocessors
Outline
• MP Motivation
• SISD v. SIMD v. MIMD
• Centralized vs. Distributed Memory
• Challenges to Parallel Programming
• Consistency, Coherency, Write Serialization
• Write Invalidate Protocol
• Example
• Conclusion
CSCI 330 – Computer Architecture
[Figure: uniprocessor performance growth relative to the VAX-11/780, 1978–2006; roughly 25%/year through 1986, then 52%/year through 2002, flattening afterward]
• VAX : 25%/year 1978 to 1986
• RISC + x86: 52%/year 1986 to 2002
• RISC + x86: ??%/year 2002 to present
Other Factors => Multiprocessors
• Growth in data-intensive applications
– Data bases, file servers, …
• Growing interest in high-end servers as cloud
computing and SaaS raise the importance of server performance
• Increasing desktop performance less important
– Outside of graphics
• Improved understanding of how to use multiprocessors
effectively
– Especially servers, where there is significant natural TLP (thread-level parallelism)
• Advantage of leveraging design investment by
replication
– Rather than unique design
Flynn’s Taxonomy M.J. Flynn, "Very High-Speed Computing Systems",
Proc. of the IEEE, V 54, pp. 1901-1909, Dec. 1966.
[Figure: centralized shared-memory organization (processors with caches ($) sharing one main memory over an interconnection network) vs. distributed-memory organization (each node pairs a processor and cache with its own local memory, nodes joined by the interconnection network)]
Centralized Memory Multiprocessor
• Also called symmetric multiprocessors (SMPs)
because single main memory has a symmetric
relationship to all processors
• Large caches => a single memory can satisfy the
memory demands of a small number of processors
• Can scale to a few dozen processors by using a
switch and by using many memory banks
• Although scaling beyond that is technically
conceivable, it becomes less attractive as the
number of processors sharing centralized memory
increases
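The "many memory banks" idea above can be sketched as low-order address interleaving, which spreads consecutive block addresses across banks so several processors can access memory concurrently. This is an illustrative sketch only; the bank count and function names are assumptions, not from the slides:

```python
NUM_BANKS = 8  # hypothetical bank count

def bank_of(block_address: int) -> int:
    """Map a cache-block address to a bank (low-order interleaving)."""
    return block_address % NUM_BANKS

# Consecutive blocks land in different banks, so independent
# processors rarely collide on the same bank:
print([bank_of(a) for a in range(10)])  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```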
Distributed Memory Multiprocessor
• Pro: Cost-effective way to scale memory
bandwidth
– If most accesses are to local memory
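The "if most accesses are local" caveat can be made concrete with a weighted-average latency calculation; the specific latencies here (100 ns local, 1000 ns remote) are hypothetical numbers chosen for illustration:

```python
def avg_access_time(local_ns: float, remote_ns: float, remote_frac: float) -> float:
    """Average memory latency given the fraction of remote accesses."""
    return (1 - remote_frac) * local_ns + remote_frac * remote_ns

# Even 10% remote accesses nearly doubles the average latency,
# which is why locality matters so much in distributed-memory MPs:
print(avg_access_time(100, 1000, 0.10))  # ≈ 190 ns
```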
[Figure: conceptual picture — processors share memory and I/O devices over an interconnect; P1 and P2 have read u (= 5) into their caches while a write of u = 7 makes those cached copies stale]
Multiprocessor Cache Coherence
Intuitive Memory Model
• Reading an address should return the last value written to that address
– Easy in uniprocessors, except for I/O
[Figure: memory hierarchy P → L1 (addr 100: 67) → L2 (addr 100: 35) → Memory → Disk (addr 100: 34); the levels hold different values for address 100, updated by cache-memory and I/O transactions]
[Figure: example coherence problem — P1 and P2 read u (= 5) from memory into their caches; a later write of u = 7 leaves stale copies in P1's and P2's caches. Access sequence: P1: R R R R R W; P2: R R R R R R]
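The write-invalidate protocol named in the outline can be sketched as a toy simulation of the u = 5 / u = 7 example: on a write, the writer broadcasts an invalidate on the shared bus so other caches drop their copies and refetch the new value on their next read. The Cache/Bus classes and write-through behavior are simplifying assumptions, not the real protocol machinery:

```python
class Bus:
    """Shared medium that every cache snoops."""
    def __init__(self):
        self.caches = []

    def invalidate(self, addr, owner):
        for c in self.caches:
            if c is not owner:
                c.data.pop(addr, None)  # other copies become invalid

class Cache:
    def __init__(self, bus):
        self.data = {}  # address -> value (valid copies only)
        self.bus = bus
        bus.caches.append(self)

    def read(self, mem, addr):
        if addr not in self.data:  # miss: fetch from memory
            self.data[addr] = mem[addr]
        return self.data[addr]

    def write(self, mem, addr, value):
        self.bus.invalidate(addr, owner=self)  # broadcast invalidate first
        self.data[addr] = value
        mem[addr] = value  # write-through, to keep the sketch simple

mem = {"u": 5}
bus = Bus()
p1, p2, p3 = Cache(bus), Cache(bus), Cache(bus)

p1.read(mem, "u")       # P1 caches u = 5
p2.read(mem, "u")       # P2 caches u = 5
p3.write(mem, "u", 7)   # P3 writes u = 7, invalidating P1's and P2's copies
print(p1.read(mem, "u"), p2.read(mem, "u"))  # 7 7 (both miss and refetch)
```

Without the `invalidate` broadcast, P1 and P2 would keep returning the stale value 5 — exactly the coherence problem in the figure above.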
Summary
And in Conclusion …
• “End” of uniprocessor speedup => Multiprocessors
• Parallelism challenges: % parallelizable, long latency to
remote memory
• Centralized vs. distributed memory
– Centralized suits small MP; distributed gives lower latency, larger BW for larger MP
• Message Passing vs. Shared Address
– Uniform access time vs. Non-uniform access time
• Snooping cache over shared medium for smaller MP
by invalidating other cached copies on write
• Sharing cached data => Coherence (values returned
by a read), Consistency (when a written value will be
returned by a read)
• Shared medium serializes writes => Write consistency
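The "% parallelizable" challenge in the summary is Amdahl's Law; a minimal sketch, with the 95%-parallel fraction chosen purely as an illustrative assumption:

```python
def speedup(parallel_frac: float, n_processors: int) -> float:
    """Amdahl's Law: speedup on n processors when only parallel_frac
    of the work can be parallelized; the rest runs serially."""
    return 1.0 / ((1 - parallel_frac) + parallel_frac / n_processors)

# Even with 100 processors, a 5% serial fraction caps speedup near 17x:
print(round(speedup(0.95, 100), 1))  # 16.8
```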
Next Time…
More Multiprocessors