cs668 Lec1 ParallelArch
[Figure: IBM dual-core memory hierarchy: registers, separate 1st-level instruction and data caches, unified 2nd-level cache (instructions & data)]
[Figure: An 8-input, 8-output Omega network of 2x2 switches]
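A hedged sketch of how such a network is commonly routed (destination-tag routing; the port numbers and helper name below are illustrative, not from the slides): at each of the log2(8) = 3 stages the label is perfect-shuffled, then the 2x2 switch forwards on the next bit of the destination address, most significant bit first.

#include <stdio.h>

#define N 8        /* 8 inputs, 8 outputs */
#define STAGES 3   /* log2(N) stages of 2x2 switches */

/* Perfect shuffle on log2(N) bits: rotate the label left by one bit. */
static unsigned shuffle(unsigned x) {
    return ((x << 1) | (x >> (STAGES - 1))) & (N - 1);
}

int main(void) {
    unsigned src = 2, dst = 6;   /* example input and output ports */
    unsigned label = src;

    printf("route %u -> %u:", src, dst);
    for (int stage = 0; stage < STAGES; stage++) {
        label = shuffle(label);
        /* Switch setting: replace the low bit with the next destination bit. */
        unsigned bit = (dst >> (STAGES - 1 - stage)) & 1;
        label = (label & ~1u) | bit;
        printf(" %u", label);
    }
    printf("\n");   /* after the last stage, label equals dst */
    return 0;
}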
Shared Memory
• One or more memories
• Global address space (all system memory visible to all processors)
• Transfer of data between processors is usually implicit: just read from (write to)
a given address (OpenMP); see the sketch after this list.
• Cache-coherency protocol to maintain consistency between processors.
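A minimal sketch of the implicit-transfer idea, assuming a C compiler with OpenMP support (the array names and sizes are illustrative, not from the slides): every thread reads and writes the same global address space, so no explicit communication appears in the code, and the cache-coherency protocol keeps the processors' views consistent.

#include <stdio.h>

#define N 1000

int main(void) {
    static double a[N], b[N];

    for (int i = 0; i < N; i++)
        a[i] = i;

    /* Each thread works on part of the loop; reading a[i] and writing b[i]
     * are ordinary loads and stores to the shared address space, so data
     * moves between processors implicitly. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        b[i] = 2.0 * a[i];

    printf("b[%d] = %f\n", N - 1, b[N - 1]);
    return 0;
}

Compile with an OpenMP flag (e.g. gcc -fopenmp) to run the loop in parallel; without it the pragma is ignored and the program still runs serially.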
[Figures: parallel machine organizations built around an interconnection network: CPUs with network interfaces and memory modules connected through the interconnection network]
Flynn's Taxonomy
• SISD (single instruction stream, single data stream): uniprocessor
• SIMD (single instruction stream, multiple data streams): processor arrays
• MISD (multiple instruction streams, single data stream): systolic array
• MIMD (multiple instruction streams, multiple data streams): multiprocessors, multicomputers
Top 500 List
• Some highlights from http://www.top500.org/
– On the new list, the IBM BlueGene/L system, installed at DOE’s
Lawrence Livermore National Laboratory (LLNL), retains the No. 1 spot
with a Linpack performance of 280.6 teraflops (trillions of calculations
per second, or Tflop/s).
– The new No. 2 system is Sandia National Laboratories’ Cray Red
Storm supercomputer, only the second system ever recorded to
exceed the 100 Tflop/s mark, with 101.4 Tflop/s. The initial Red Storm
system was ranked No. 9 in the last listing.
– Slipping to No. 3 from No. 2 last June is the IBM eServer Blue Gene
Solution system, installed at IBM’s Thomas Watson Research Center
with 91.20 Tflop/s Linpack performance.
– The new No. 5 is the largest system in Europe, an IBM JS21 cluster
installed at the Barcelona Supercomputing Center. The system reached
62.63 Tflop/s.
Linux/Beowulf cluster basics
• Goal
– Get supercomputing processing power at the
cost of a few PCs
• How
– Commodity components: PCs and networks
– Free, open-source software (see the sketch below)
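The open-source software stack on such a cluster typically includes a message-passing library; as a sketch only (the slides do not name MPI, and an implementation such as MPICH or Open MPI is assumed to be installed), each node runs its own copy of the program and data moves between nodes only through explicit messages:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1) {
        int token = 42;
        /* Explicit transfer: unlike the shared-memory case above, the data
         * reaches another node only when it is sent over the network. */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 of %d received %d\n", size, token);
    }

    MPI_Finalize();
    return 0;
}

Launched with something like mpirun -np 2 ./a.out, rank 0 sends and rank 1 receives; no shared address space is assumed.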
CPU nodes
• A typical configuration
– Dual-socket nodes
– Dual-core AMD or Intel processors
– 4 GB memory per node
Network Options