5.4 Parallel Computer Organizations
[Figure: a parallel program split into TASK 1, TASK 2 and TASK 3, each executed on a CPU to produce a RESULT]
Processor Designs
Pipelined ALU
Within operations
Across operations
Parallel ALUs
Parallel processors
Parallel Processor
Increase system performance by using multiple processors that can execute in parallel
Symmetric Multi-Processor
Cluster
Non-Uniform Memory Access (NUMA)
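The idea of splitting one program into tasks that run side by side can be sketched in a few lines. This is only an illustration: the names `partial_sum` and `parallel_sum` are hypothetical, and the sketch uses Python threads, which in CPython show the structure (split, execute, combine) rather than a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """One task: sum a slice of the data."""
    return sum(chunk)


def parallel_sum(data, workers=4):
    """Split data into chunks, run one task per chunk, then combine the results."""
    size = max(1, -(-len(data) // workers))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_sum, chunks))  # one RESULT per task
    return sum(results)  # combine the partial results


if __name__ == "__main__":
    print(parallel_sum(list(range(101))))
```

For CPU-bound tasks, a process pool would be the usual substitute for the thread pool assumed here.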
Pipelining
Instruction-level parallelism: execution of successive instructions overlaps
Loop-level parallelism: iterations of a loop execute in parallel
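The benefit of overlapping instructions can be estimated with a simple cycle count. The sketch below assumes an idealised pipeline (every stage takes one cycle, no hazards or stalls); the function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipeline; each later one finishes one cycle after."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined cycle counts under the ideal assumptions."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    print(pipelined_cycles(100, 5), pipeline_speedup(100, 5))
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows.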
Uni-processor
Single processor
Single instruction stream
Data stored in single memory
CU - Control Unit
IS - Instruction Stream
PU - Processing Unit
DS - Data Stream
MU - Memory Unit
LM - Local Memory
MIMD - Overview
General purpose processors
Each can process all instructions necessary
Further classified by method of processor communication
SMP
Multiple similar processors within the same computer, interconnected by a bus or switch
The main problem is cache coherence
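A toy model makes the cache coherence problem concrete. The sketch below is an assumption for illustration only: a write-through bus with an optional invalidate-on-write step, where `SharedBus` and `stale_read_demo` are hypothetical names and not a description of any real protocol implementation.

```python
class SharedBus:
    """Toy shared bus: main memory plus one private cache (a dict) per processor."""

    def __init__(self, memory, invalidate=True):
        self.memory = dict(memory)
        self.invalidate = invalidate  # whether writes invalidate other caches
        self.caches = []

    def new_cache(self):
        cache = {}
        self.caches.append(cache)
        return cache

    def read(self, cache, addr):
        if addr not in cache:  # miss: fetch from main memory
            cache[addr] = self.memory[addr]
        return cache[addr]

    def write(self, cache, addr, value):
        cache[addr] = value
        self.memory[addr] = value  # write-through to main memory
        if self.invalidate:
            for other in self.caches:
                if other is not cache:
                    other.pop(addr, None)  # drop the copy held by other caches


def stale_read_demo(invalidate):
    """CPU 1 caches a word, CPU 0 overwrites it, then CPU 1 reads it again."""
    bus = SharedBus({0x10: 1}, invalidate=invalidate)
    cpu0, cpu1 = bus.new_cache(), bus.new_cache()
    bus.read(cpu1, 0x10)
    bus.write(cpu0, 0x10, 2)
    return bus.read(cpu1, 0x10)


if __name__ == "__main__":
    print(stale_read_demo(invalidate=False), stale_read_demo(invalidate=True))
```

Without invalidation the second processor keeps returning its old cached copy; with invalidation it misses and fetches the updated word.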
Symmetric Multiprocessors
A standalone computer with the following characteristics:
Two or more similar processors of comparable capacity
Processors share the same memory and I/O
Processors are connected by a bus or other internal connection such that memory access time is approximately the same for each processor
All processors share access to I/O
Either through the same channels or through different channels giving paths to the same devices
All processors can perform the same functions (hence symmetric)
System controlled by an integrated operating system that provides interaction between processors, thread scheduling and synchronisation
Interaction at job, task, file and data element levels
SMP Advantages
Performance
If some work can be done in parallel
Availability
Since all processors can perform the same functions, failure of a single processor does not halt the system
Incremental growth
User can enhance performance by adding additional processors
Scaling
Vendors can offer a range of products based on the number of processors
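How much performance is gained "if some work can be done in parallel" can be estimated with Amdahl's law, a standard model that the slides do not name but that fits the point being made. The function name and parameters below are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speedup when `parallel_fraction` of the work is spread over `n_processors`.

    The remaining (1 - parallel_fraction) of the work is assumed to stay serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The model suggests that adding processors helps only as far as the parallel portion of the workload allows.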
[Figure: IBM z990 Multiprocessor Structure]
Chip Multiprocessing
More than one processor implemented on a single chip
Cluster
A group of interconnected whole computers working together as a unified computing resource.
Clusters
Alternative to SMP
High performance
High availability
Server applications
Cluster Benefits
Absolute scalability
Incremental scalability
High availability
Superior price/performance
Cluster v. SMP
Both provide multiprocessor support to high-demand applications.
Both are available commercially
SMP has been available for longer
SMP:
Easier to manage and control
Closer to single processor systems
Scheduling is the main difference
Less physical space
Lower power consumption
Clustering:
Superior incremental & absolute scalability
Superior availability
Redundancy
NUMA
Shared-memory multiprocessor in which the access time from a given processor to a word in memory varies with the location of the memory word.
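The location-dependent access time can be illustrated with a small model. Everything specific here is an assumption: memory is divided evenly among nodes, and the latencies (100 ns local, 300 ns remote) are made-up round numbers rather than measurements.

```python
def access_time_ns(cpu_node, addr, node_size, local_ns=100, remote_ns=300):
    """Access time for one word: cheaper when the word lives on the CPU's own node."""
    home_node = addr // node_size  # node whose local memory holds this address
    return local_ns if home_node == cpu_node else remote_ns


def average_access_time_ns(cpu_node, addresses, node_size, local_ns=100, remote_ns=300):
    """Mean access time over a sequence of addresses touched by one processor."""
    times = [
        access_time_ns(cpu_node, addr, node_size, local_ns, remote_ns)
        for addr in addresses
    ]
    return sum(times) / len(times)


if __name__ == "__main__":
    # Processor on node 0, 1024 words per node: one local and one remote access.
    print(average_access_time_ns(0, [10, 2048], node_size=1024))
```

The more of a processor's accesses fall in its own node's memory, the closer the average gets to the local latency.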
Motivation
SMP has a practical limit on the number of processors
Bus traffic limits an SMP to between 16 and 64 processors
CC-NUMA Organization
Vector Computation
Maths problems involving physical processes present different difficulties for computation
Aerodynamics, seismology, meteorology
Continuous field simulation
High precision
Repeated floating point calculations on large arrays of numbers
Supercomputers handle these types of problem
Array processor
Alternative to supercomputer
Configured as peripherals to mainframes and minicomputers
Just run the vector portion of problems
Vector Processor
Process vectors or arrays of Data
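The difference between element-at-a-time and vector-at-a-time processing can be sketched as follows. The vector length of 4 and the function names are assumptions for illustration; each pass of the second loop stands in for a single vector instruction and is not how real vector hardware is programmed.

```python
def scalar_add(a, b):
    """Scalar style: one addition per loop iteration."""
    result = []
    for i in range(len(a)):
        result.append(a[i] + b[i])
    return result


def vector_add(a, b, vector_length=4):
    """Vector style: each iteration models one instruction over `vector_length` elements.

    Returns the result and the number of modelled vector instructions.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = []
    instructions = 0
    for start in range(0, len(a), vector_length):
        stop = start + vector_length
        result.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
        instructions += 1
    return result, instructions


if __name__ == "__main__":
    print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

Both functions compute the same sums; the second needs fewer modelled instructions because each one covers several elements.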
Approaches to Vector Computation
Multiple Processors
Multithreading
One instruction stream per slot
Multithreading
Alleviates some of the memory latency problems
Still has problems
What if the red thread waits for data from memory and there is a cache miss?
The yellow thread waits unnecessarily
Hyperthreading
More than one instruction stream per slot
SMP vs SMT
[Figure: Dual Processor, HyperThreading and Dual Core configurations, each showing Arch. State, APIC, Processor Core, On-Die Cache and System Bus]
Multicores
Two or more processors on the same chip
Each has an independent interface to the front side bus
Both the OS and the applications must support thread-level parallelism
Typically