5 4 Parallel

The document discusses different types of parallel processing architectures including symmetric multiprocessing (SMP), clusters, and non-uniform memory access (NUMA). It describes the key characteristics of each architecture and compares SMP systems to clusters. Additionally, it covers parallel processor concepts such as pipelining, vector processing, multithreading, and multicore processors.

Uploaded by

Uday Shubhraj

Parallel processors

Dr. Mrs. B. Janet,


Department of Computer Applications,
NIT, Trichy 15.

Computer Organizations

SEQUENTIAL & PARALLEL PROCESSING

[Figure: SEQUENTIAL (a program's tasks run one at a time on a single CPU, each giving a RESULT) versus PARALLEL (a program's TASK 1, TASK 2 and TASK 3 run on separate CPUs to give a RESULT)]
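As a rough illustration of the sequential and parallel cases above, the sketch below runs the same three tasks first one after another and then on a pool of workers. The `task` function and its inputs are hypothetical, and Python threads stand in for the separate CPUs. In CPython, CPU-bound threads do not truly run at the same time, so this shows the structure of the two approaches rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def task(n):
    """Hypothetical task: sum the integers 0..n-1."""
    return sum(range(n))


def run_sequential(inputs):
    # One CPU: each task starts only after the previous one has finished.
    return [task(n) for n in inputs]


def run_parallel(inputs, workers=3):
    # Several workers: tasks are handed out and may overlap in time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, inputs))


inputs = [10, 100, 1000]  # stand-ins for TASK 1, TASK 2, TASK 3
print(run_sequential(inputs))
print(run_parallel(inputs))
```

Both calls return the same results in the same order; only the way the work is scheduled differs.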

MASSIVELY PARALLEL COMPUTERS

CAN HAVE THOUSANDS OF CPUs

Processor Designs
Pipelined ALU
Within operations
Across operations

Parallel ALUs
Parallel processors

Parallel Processor
Increases system performance by using multiple processors that can execute in parallel
Symmetric Multiprocessor (SMP)
Cluster
Non-Uniform Memory Access (NUMA)

Pipelining
Instruction-level parallelism: the execution of successive instructions overlaps
Loop-level parallelism: iterations of a loop execute in parallel
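Loop-level parallelism can be sketched as follows, assuming the loop iterations are independent of each other (each output element depends only on its own input element). The function names and the chunking scheme are hypothetical, and threads are used only to show how the iteration space is divided.

```python
from concurrent.futures import ThreadPoolExecutor


def square_chunk(values):
    # Each iteration touches only its own element, so chunks are independent.
    return [v * v for v in values]


def parallel_squares(data, workers=4):
    # Split the iteration space into roughly equal contiguous chunks.
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(square_chunk, chunks))
    # Reassemble the per-chunk results in the original order.
    return [v for chunk in results for v in chunk]


print(parallel_squares(list(range(10))))
```

A loop in which iteration i needs the result of iteration i-1 (a running total, for example) could not be split this simply.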

Multiple Processor Organization

Types of Parallel Processor Systems

Single instruction, single data stream - SISD
Single instruction, multiple data stream - SIMD
Multiple instruction, single data stream - MISD
Multiple instruction, multiple data stream - MIMD

Single Instruction, Single Data Stream - SISD

Single processor
Single instruction stream
Data stored in single memory
Uni-processor

Figure legend:
CU - Control Unit
IS - Instruction Stream
PU - Processing Unit
DS - Data Stream
MU - Memory Unit

Single Instruction, Multiple Data Stream - SIMD
Single machine instruction controls simultaneous execution of a number of processing elements on a lockstep basis
Each processing element has an associated data memory
Each instruction is executed on a different set of data by the different processors
Vector and array processors
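The lockstep idea can be sketched with a toy model: a control unit issues each instruction once, and every processing element applies it to its own local memory before the next instruction is issued. The class, the instruction format `(op, dst, src_a, src_b)` and the two operations are all hypothetical, and the inner Python loop runs one element after another where real SIMD hardware would act on all elements at once.

```python
import operator

# Hypothetical instruction set for the sketch.
OPS = {"add": operator.add, "mul": operator.mul}


class ProcessingElement:
    def __init__(self, local_memory):
        # Each element owns its data memory; nothing is shared.
        self.local_memory = list(local_memory)

    def execute(self, op, dst, src_a, src_b):
        lm = self.local_memory
        lm[dst] = OPS[op](lm[src_a], lm[src_b])


def broadcast(elements, program):
    # One instruction stream: every element executes instruction k
    # before any element sees instruction k+1 (lockstep).
    for instruction in program:
        for pe in elements:
            pe.execute(*instruction)


# Three elements, each holding a different (a, b, result) triple.
elements = [ProcessingElement([a, b, 0]) for a, b in [(1, 10), (2, 20), (3, 30)]]
program = [("add", 2, 0, 1),   # result = a + b
           ("mul", 2, 2, 2)]   # result = result * result
broadcast(elements, program)
print([pe.local_memory[2] for pe in elements])  # [121, 484, 1089]
```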

Parallel Organizations - SIMD

Figure legend:
LM - Local Memory

Multiple Instruction, Single Data Stream - MISD
Sequence of data is transmitted to a set of processors
Each processor executes a different instruction sequence
Has never been implemented commercially

Multiple Instruction, Multiple Data Stream - MIMD
Set of processors simultaneously execute different instruction sequences on different sets of data
SMPs, clusters and NUMA systems
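A minimal sketch of the MIMD idea, assuming two Python threads stand in for two processors: each runs a different instruction sequence (a different function) on a different set of data. The function names and inputs are hypothetical.

```python
import threading


def word_count(text, out):
    # Instruction sequence 1: operates on a string.
    out["words"] = len(text.split())


def total(numbers, out):
    # Instruction sequence 2: operates on a list of numbers.
    out["total"] = sum(numbers)


def run_mimd():
    results = {}
    workers = [
        threading.Thread(target=word_count, args=("set of processors", results)),
        threading.Thread(target=total, args=([1, 2, 3, 4], results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


print(run_mimd())
```

Contrast this with SIMD, where every processing element would run the same instruction at the same time.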

MIMD - Overview
General purpose processors
Each can process all instructions
necessary
Further classified by method of processor
communication

Parallel Organizations - MIMD, Distributed Memory

Taxonomy of Parallel Processor Architectures

SMP
Multiple similar processors within the same computer, interconnected by a bus or switching.
The main problem is cache coherence.
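To see why cache coherence is a problem, consider the toy model below: two processors each keep a private cache in front of one shared memory. The slides do not name a protocol, so the write-through, write-invalidate scheme used here is an assumption chosen for brevity, and all class and method names are hypothetical.

```python
class SharedMemorySystem:
    """Toy SMP: one shared memory, one private cache per processor."""

    def __init__(self, n_cpus):
        self.memory = {}
        self.caches = [{} for _ in range(n_cpus)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:
            # Cache miss: fetch the word from shared memory.
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value, invalidate=True):
        # Write-through: update the writer's cache and shared memory.
        self.caches[cpu][addr] = value
        self.memory[addr] = value
        if invalidate:
            # Write-invalidate (assumed protocol): drop every other copy.
            for other, cache in enumerate(self.caches):
                if other != cpu:
                    cache.pop(addr, None)


smp = SharedMemorySystem(n_cpus=2)
smp.write(0, 0x10, 1)
print(smp.read(1, 0x10))                    # 1, now cached by CPU 1

smp.write(0, 0x10, 2, invalidate=False)     # no coherence action
print(smp.read(1, 0x10))                    # 1, a stale copy

smp.write(0, 0x10, 3)                       # other copies invalidated
print(smp.read(1, 0x10))                    # 3, refetched from memory
```

The middle read returns an out-of-date value because nothing told CPU 1 that its cached copy had changed; that is the situation a coherence mechanism has to prevent.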

Symmetric Multiprocessors
A stand-alone computer with the following characteristics:
Two or more similar processors of comparable capacity
Processors share the same memory and I/O
Processors are connected by a bus or other internal connection such that memory access time is approximately the same for each processor
All processors share access to I/O, either through the same channels or through different channels giving paths to the same devices
All processors can perform the same functions (hence symmetric)
System controlled by an integrated operating system providing interaction between processors, thread scheduling and synchronisation
Interaction at job, task, file and data element levels

SMP Advantages
Performance: better if some of the work can be done in parallel
Availability: since all processors can perform the same functions, failure of a single processor does not halt the system
Incremental growth: user can enhance performance by adding additional processors
Scaling: vendors can offer a range of products based on the number of processors

Block Diagram of Tightly Coupled Multiprocessor

Tightly Coupled - SMP
Processors share memory
Communicate via that shared memory
Symmetric Multiprocessor (SMP)
Share single memory or pool
Shared bus to access memory
Memory access time to a given area of memory is approximately the same for each processor
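Communication through shared memory can be sketched as below, assuming Python threads stand in for the processors and a dictionary stands in for the shared memory. The lock is not mentioned in the slides; it is added here because concurrent read-modify-write updates to one location need some form of synchronisation. All names are hypothetical.

```python
import threading


def run_shared_counter(workers=4, increments=1000):
    shared = {"count": 0}      # the single memory every "processor" sees
    lock = threading.Lock()    # serialises updates to the shared word

    def worker():
        for _ in range(increments):
            with lock:
                shared["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


print(run_shared_counter())  # 4000
```

No data is copied or sent between the workers; each one simply reads and writes the same location.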

Symmetric Multiprocessor Organization

IBM z990 Multiprocessor Structure

Chip Multiprocessing
More than one processor implemented on
a single chip

Multithreading and Chip Multiprocessors
Instruction stream divided into smaller streams (threads)
Executed in parallel
Wide variety of multithreading designs

Cluster
A Group of interconnected whole computers
working together as a unified computing
resource.

Clusters
Alternative to SMP
High performance
High availability
Server applications
A group of interconnected whole computers working together as a unified resource
Illusion of being one machine
Each computer called a node
Cluster Benefits

Absolute scalability
Incremental scalability
High availability
Superior price/performance

Cluster Configurations - Standby Server, No Shared Disk

Cluster Configurations - Shared Disk

Cluster v. SMP
Both provide multiprocessor support to high-demand applications
Both are available commercially; SMP has been for longer
SMP:
Easier to manage and control
Closer to single-processor systems; scheduling is the main difference
Less physical space
Lower power consumption
Clustering:
Superior incremental & absolute scalability
Superior availability through redundancy

NUMA
Shared memory multi-processor in which
the access time from a given processor to
a word in memory varies with the location
of the memory word.

Nonuniform Memory Access (NUMA)
Alternative to SMP & clustering

Uniform memory access
All processors have access to all parts of memory using load & store
Access time to all regions of memory is the same
Access time to memory is the same for different processors
As used by SMP

Nonuniform memory access
All processors have access to all parts of memory using load & store
Access time of a processor differs depending on the region of memory
Different processors access different regions of memory at different speeds

Cache coherent NUMA
Cache coherence is maintained among the caches of the various processors
Significantly different from SMP and clusters
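The difference between uniform and nonuniform access can be sketched with a toy latency model. The two latency figures, the node size and the function names are assumptions made up for illustration, not measurements of any real machine.

```python
LOCAL_NS = 100     # assumed latency when the word lives on the CPU's own node
REMOTE_NS = 300    # assumed latency when the word lives on another node


def home_node(address, words_per_node=1024):
    # Memory is split into equal contiguous regions, one per node.
    return address // words_per_node


def numa_access_time_ns(cpu_node, address, words_per_node=1024):
    # NUMA: every address is reachable, but the cost depends on where it lives.
    if home_node(address, words_per_node) == cpu_node:
        return LOCAL_NS
    return REMOTE_NS


def uma_access_time_ns(cpu_node, address):
    # UMA (as in an SMP): same cost for every CPU and every address.
    return LOCAL_NS


print(numa_access_time_ns(0, 10))     # 100: address 10 is on node 0
print(numa_access_time_ns(0, 2000))   # 300: address 2000 is on node 1
print(numa_access_time_ns(1, 2000))   # 100: local to node 1
```

The same load instruction works in every case; only the access time changes with the location of the word.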

Motivation
SMP has a practical limit to the number of processors
Bus traffic limits this to between 16 and 64 processors

In clusters each node has its own memory
Apps do not see a large global memory
Coherence is maintained by software, not hardware

NUMA retains SMP flavour while giving large-scale multiprocessing
e.g. Silicon Graphics Origin NUMA, up to 1024 MIPS R10000 processors

Objective is to maintain a transparent system-wide memory while permitting multiprocessor nodes, each with its own bus or internal interconnection system

CC-NUMA Organization

Scalar Processor Approaches

Single-threaded scalar
Simple pipeline
No multithreading

Interleaved multithreaded scalar
Easiest multithreading to implement
Switch threads at each clock cycle
Pipeline stages kept close to fully occupied
Hardware needs to switch thread context between cycles

Blocked multithreaded scalar
Thread executed until a latency event occurs that would stop the pipeline
Processor switches to another thread
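The two multithreaded approaches differ only in when the processor switches threads, which the sketch below makes concrete by producing an issue order for each. It is a simplification: stall durations are not modelled, a trailing `*` on an instruction name is a made-up marker for a latency event, and the function names are hypothetical.

```python
def interleaved(threads):
    """Issue one instruction per cycle, rotating through the threads."""
    queues = [list(t) for t in threads]
    schedule = []
    while any(queues):
        for tid, queue in enumerate(queues):
            if queue:
                schedule.append((tid, queue.pop(0)))
    return schedule


def blocked(threads, is_latency_event):
    """Run one thread until a latency event, then switch to the next."""
    queues = [list(t) for t in threads]
    schedule = []
    tid = 0
    while any(queues):
        queue = queues[tid]
        while queue:
            instr = queue.pop(0)
            schedule.append((tid, instr))
            if is_latency_event(instr):
                break  # e.g. a cache miss: give the pipeline to another thread
        tid = (tid + 1) % len(queues)
    return schedule


threads = [["a1", "a2*", "a3"], ["b1", "b2"]]
print(interleaved(threads))
print(blocked(threads, lambda instr: instr.endswith("*")))
```

With these inputs the interleaved order alternates a1, b1, a2*, b2, a3, while the blocked order is a1, a2*, b1, b2, a3: thread 0 keeps the pipeline until its latency event.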

Vector Computation
Maths problems involving physical processes present different difficulties for computation
Aerodynamics, seismology, meteorology
Continuous field simulation
High precision
Repeated floating point calculations on large arrays of numbers

Supercomputers handle these types of problem
Hundreds of millions of flops
Cost $10-15 million
Optimised for calculation rather than multitasking and I/O
Limited market: research, government agencies, meteorology

Array processor
Alternative to supercomputer
Configured as peripherals to mainframe & mini
Just run vector portion of problems

Vector Addition Example
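The original figure for this slide is not available, so the following is a stand-in sketch of vector addition, C = A + B. The scalar version handles one element per loop iteration; the second version processes the arrays in fixed-size strips, the way a vector unit would consume `vector_length` elements per vector instruction. The function names and the strip length of 4 are assumptions, and plain Python does not actually execute a strip in one step.

```python
def scalar_add(a, b):
    # Scalar processing: one addition per loop iteration.
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


def vector_add(a, b, vector_length=4):
    # Strip-mined processing: each pass stands for one vector instruction
    # that adds up to vector_length element pairs.
    c = []
    for start in range(0, len(a), vector_length):
        stop = start + vector_length
        c.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
    return c


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 10.0, 10.0, 10.0, 10.0]
print(scalar_add(a, b))
print(vector_add(a, b))
```

Both functions produce the same result; with five elements and a strip length of 4, the second needs two passes where the first needs five iterations.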

Vector Processor
Process vectors or arrays of Data

Approaches to Vector Computation

Symmetric Multiprocessing to the Rescue

Multiple Processors

Multithreading
One instruction stream per slot

Multithreaded Processor
Replicates some components of the processor to execute multiple threads concurrently

Multithreading
Alleviates some of the memory latency problems
Still has problems
What if the red thread waits for data from memory and there is a cache miss?
The yellow thread waits unnecessarily

Hyperthreading
More than one instruction stream per slot

SMP vs SMT

Having Multiple Cores

[Figure: Dual Processor, HyperThreading and Dual Core configurations, each drawn from Arch. State, APIC, Processor Core, Cache or On-Die Cache, and System Bus blocks]

Multicores
Two or more processors on the same chip
Each has an independent interface to the front side bus
Both the OS and the applications must support thread-level parallelism

Typically
