PP16 Lec4 Arch3

This document discusses parallel processing architectures and platforms. It covers:
- Explicitly parallel processor architectures, including SIMD and MIMD systems.
- Memory configurations, including shared memory, distributed memory, and the differences between physical and logical memory.
- Inter-processor communication methods, including shared memory, message passing, and different interconnect technologies.
- Programming models such as SPMD and MPMD and how they apply to different architectures.
- Examples of parallel platforms, including SMP, clusters, and vector/array processors.


Parallel Processing

Spring 2016, Lecture #4
Dr M Shamim Baig

1.1

Explicitly Parallel Processor Architectures:
Task-Level Parallelism

1.2

Elements of (Explicit) Parallel Architectures

Processor configurations:
- Instruction/data-stream based
Memory configurations:
- Physical & logical organization based
- Access-delay based
Inter-processor communication:
- Communication-interface design
- Data exchange / synch approach
1.3

Example SIMD & MIMD Systems

Variants of SIMD have found use in coprocessing units such as the MMX units in Intel processors, in DSP chips such as the Sharc, & in Nvidia graphics processors (GPUs).
Examples of MIMD platforms include current-generation Sun Ultra Servers, SGI Origin Servers, multiprocessor PCs, workstation clusters & the IBM SP.

1.4

Ex: Conditional Execution in SIMD Processors

It is often necessary to selectively turn off operations on certain data items. For this, most SIMD programming paradigms provide an "activity mask", which determines whether each processor participates in a computation or not. A sketch of this masking scheme is given after the slide.

Executing a conditional statement on an SIMD computer with four processors:
(a) the conditional statement; (b) the execution of the statement in two steps.

1.5
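
The two-step masked execution above can be emulated in plain C. This is a minimal sketch, not from the slides: the conditional itself is not reproduced here, so the classic if (b == 0) c = a; else c = a/b; example is assumed, with one array slot standing in for each of the four SIMD processors.

#include <stdio.h>

#define LANES 4  /* one array slot per SIMD processor */

int main(void) {
    int a[LANES] = {5, 4, 1, 0};
    int b[LANES] = {0, 2, 1, 0};   /* illustrative data */
    int c[LANES];
    int active[LANES];             /* the activity mask */

    /* Step 1: processors with b == 0 are active, the rest idle. */
    for (int i = 0; i < LANES; i++) active[i] = (b[i] == 0);
    for (int i = 0; i < LANES; i++)
        if (active[i]) c[i] = a[i];

    /* Step 2: the mask is inverted; the previously idle
     * processors now execute the else-branch. */
    for (int i = 0; i < LANES; i++)
        if (!active[i]) c[i] = a[i] / b[i];

    for (int i = 0; i < LANES; i++)
        printf("c[%d] = %d\n", i, c[i]);
    return 0;
}

All four "processors" see both steps, but the mask keeps each one idle during the branch it does not take, which is exactly the cost of conditionals on SIMD hardware.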

Programming Models: MPMD / SPMD

There are two programming models for parallel processing. In Multiple-Program Multiple-Data (MPMD), different processors execute different programs. In Single-Program Multiple-Data (SPMD), all processors execute the same program on different parts of the data.
An SIMD system can execute only one program, which works on different parts of the data. An MIMD system can execute the same or different programs, which likewise work on different parts of the data.
Hence SIMD supports only the SPMD programming model. Although MIMD supports both models (MPMD & SPMD), SPMD is the preferred choice because it simplifies software management. An SPMD sketch is given after the slide.
1.6
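
A minimal SPMD sketch, assuming MPI (the slide names the model, not a library): every process runs this same program, and the rank returned by MPI_Comm_rank steers each process to different work, so MPMD-like behaviour can be recovered inside the SPMD model.

/* Build/run (typical): mpicc spmd.c -o spmd && mpirun -np 4 ./spmd */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* my process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total processes */

    if (rank == 0)
        printf("rank 0 of %d: coordinating\n", size);  /* master role */
    else
        printf("rank %d of %d: computing my data slice\n", rank, size);

    MPI_Finalize();
    return 0;
}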

Comparison: SIMD vs MIMD

Control flow: synchronous in SIMD vs asynchronous in MIMD.
Programming model: SIMD supports only the SPMD prog-model, while MIMD supports both (SPMD & MPMD) prog-models.
Cost: SIMD computers require less hardware than MIMD computers (a single control unit). However, since SIMD processors are specially designed, they tend to be expensive & have long design cycles. In contrast, MIMD processors can be built from inexpensive off-the-shelf components with relatively little effort in a short time.
Flexibility: SIMD performs very well for specialized, regularly structured applications (e.g. image processing) but not for all applications, while MIMD is more flexible & general-purpose.
1.7

Elements of (Explicit) Parallel Architectures

Processor configurations:
- Instruction/data-stream based
Memory configurations:
- Physical & logical organization based
- Access-delay based
Inter-processor communication:
- Communication-interface design
- Data exchange / synch approach
1.8

Parallel Platforms:
Memory (Physical vs Logical) Configurations

Physical memory config: SM, DM, CSM
Logical address-space config: SAS, NSAS
Combinations:
- CSM + SAS (SMP; UMA)
- DM + SAS (DSM; NUMA)
- DM + NSAS (Multicomputer / Clusters)

1.9

Shared-Memory (SM) Multiprocessor

It is important to note the difference between the terms Shared Memory & Shared Address Space. The former is a physical memory configuration, while the latter is the logical address-space view presented to a program. It is possible to provide a shared address space using physically distributed memory.
SM-multiprocessor systems are SAS-based, with the physical memory configured either as CSM or as DM (giving DSM).

1.10

UMA vs NUMA

SM-multiprocessors are further categorized by memory-access delay as UMA (uniform memory access) & NUMA (non-uniform memory access).
A UMA system is based on the (CSM + SAS) config, where each processor has the same delay when accessing any memory location.
A NUMA system is based on the (DM + SAS = DSM) config, where a processor may see different delays when accessing different memory locations.

1.11

UMA & NUMA Arch Block Diagrams

Both are SM-multiprocessors, differing in memory-access-delay format:
UMA (CSM + SAS) vs NUMA (DM + SAS = DSM)

[Figure] Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.
1.12

Simplistic View of a Small Shared-Memory
Symmetric Multiprocessor (SMP): (CSM + SAS + Bus)

[Diagram: processors connected over a bus to the shared memory]

Examples:
Dual Pentiums
Quad Pentiums
An OpenMP sketch for such an SMP is given after the slide.
1.13
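
A minimal sketch of programming such a bus-based SMP, assuming OpenMP and gcc (the slide itself names no software): the loop iterations are divided among the processors, and all of them communicate through the single shared memory.

/* Compile with: gcc -fopenmp smp.c -o smp */
#include <omp.h>
#include <stdio.h>

int main(void) {
    double sum = 0.0;
    /* Each processor works on a slice of the iteration space;
     * the reduction combines the per-processor partial sums
     * through shared memory. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000000; i++)
        sum += 1.0 / i;
    printf("harmonic(1e6) = %f using up to %d procs\n",
           sum, omp_get_num_procs());
    return 0;
}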

Quad Pentium Shared-Memory SMP

[Diagram: four processors, each with its own L1 cache, L2 cache & bus interface, sharing a processor/memory bus; a memory controller attaches the shared memory, and an I/O interface attaches the I/O bus]
1.14

Multicomputer (Cluster) Platform

Complete computers P (CU + PE), DM with NSAS, & an interconnection-network interface at the I/O-bus level.

[Diagram: computers, each with a processor & local memory, exchanging messages over an interconnection network]

These platforms comprise a set of processors, each with its own (exclusive, distributed) memory. Instances of such a view come naturally from non-shared-address-space (NSAS) multicomputers, e.g. clustered workstations.
1.15

Data Exchange/Synch Approaches:
Shared Data vs Message Passing

There are two primary approaches to data exchange/synchronization in parallel systems:
- the shared-data approach
- the message-passing approach
SM-multiprocessors use the shared-data approach for data exchange/synch. Multicomputers (clusters) use the message-passing approach. A shared-data sketch is given after the slide.
1.16
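
A minimal shared-data sketch, assuming POSIX threads (the slide names the approach, not an API): the threads exchange data simply by writing and reading the same variable, and a mutex provides the synchronization.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                 /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);       /* synchronize access */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);  /* 400000 */
    return 0;
}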

Data Exchange/Synch Platforms:
Shared Memory vs Message Passing

Shared-memory platforms have low communication overhead and can support finer grain levels, while message-passing platforms have more communication overhead & are therefore better suited to coarser grain levels.
SM-multiprocessors are faster but have poor scalability; message-passing multicomputer platforms are slower but have higher scalability. A send/receive sketch is given after the slide.
1.17
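
A minimal message-passing sketch, again assuming MPI: every exchange is an explicit send/receive pair crossing the interconnect, which is the communication overhead the slide refers to.

/* Run with two processes, e.g. mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Explicitly ship the data to process 1. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Block until the matching message arrives. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}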

Clusters as a Computing Platform

Clusters: in the early 1990s, a network of computers became a very attractive alternative to the expensive supercomputers used for high-performance computing.
Several early projects, notably:
- the Berkeley NOW (Network of Workstations) project
- the NASA Beowulf project
1.18

Advantages of Cluster Computers (NOW-like)

- Very high performance workstations and PCs are readily available at low cost.
- The latest processors can easily be incorporated into the system as they become available.
- Easily scalable.
- Existing software can be used or easily modified.
1.19

Beowulf Clusters*

A group of interconnected commodity computers achieving high performance at low cost, typically using commodity interconnects (e.g. high-speed Ethernet) & a commodity OS (e.g. Linux).

* "Beowulf" comes from the name given to the NASA Goddard Space Flight Center cluster project.

1.20

Cluster Interconnects: LAN vs SAN

LANs: Fast / Gigabit / 10-Gigabit Ethernet
SANs: Myrinet, Quadrics, InfiniBand

Comparison, LAN vs SAN:
Distance: LANs span longer distances (km vs m), causing more delay, i.e. slower communication.
Reliability: LANs are designed for less reliable networks, so they include overhead (error correction etc.) which adds to delays.
Processing speed: LANs use OS calls, causing more processing delay.
1.21

Vector / Array Data Processors

Vector processor: 1-D temporal parallelism using a pipelined arithmetic unit & vector chaining.
Float-add pipeline stages: compare exponents, align mantissas, add mantissas, normalize. A stage-by-stage sketch is given after the slide.
Array processor: 1-D spatial parallelism using an ALU array as SIMD.
Systolic array: combines 2-D spatial parallelism with a pipelined computational wavefront.
Block diagrams of vector/array & systolic processing: ?????
1.22
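
A sketch of the four float-add pipeline stages named above, using a simplified base-10 (mantissa, exponent) number format of my own choosing, not from the slides. A real vector unit streams a new element pair into stage 1 every cycle, so all four stages operate on different vector elements simultaneously.

#include <stdio.h>

typedef struct { int mant; int exp; } Fp;   /* value = mant * 10^exp */

static Fp fp_add(Fp a, Fp b) {
    /* Stage 1: compare exponents. */
    int shift = a.exp - b.exp;
    /* Stage 2: align mantissas to the larger exponent. */
    while (shift > 0) { b.mant /= 10; b.exp++; shift--; }
    while (shift < 0) { a.mant /= 10; a.exp++; shift++; }
    /* Stage 3: add mantissas. */
    Fp r = { a.mant + b.mant, a.exp };
    /* Stage 4: normalize (keep at most 4 mantissa digits here). */
    while (r.mant >= 10000 || r.mant <= -10000) { r.mant /= 10; r.exp++; }
    return r;
}

int main(void) {
    Fp a = {9500, -3};                       /* 9.5 */
    Fp b = {750, -2};                        /* 7.5 */
    Fp c = fp_add(a, b);
    printf("%d x 10^%d\n", c.mant, c.exp);   /* prints 1700 x 10^-2, i.e. 17 */
    return 0;
}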

Summary: Parallel Platforms;
Memory & Interconnect Configurations

Memory config (physical vs logical):
- Physical memory config: SM, DM, CSM
- Logical address-space config: SAS, NSAS
- Combinations:
  CSM + SAS (SMP; UMA)
  DM + SAS (DSM; NUMA)
  DM + NSAS (Multicomputer / Clusters)

Interconnection network:
o Interface level: memory bus (using MBEU) in SM-multiprocessors (UMA, NUMA) vs I/O bus (using NIU) in multicomputers / clusters
o Data exchange / synch: shared-data model vs message-passing model
1.23
