Parallel Processing: sp2016 Lec#3
Dr M Shamim Baig
Pipeline Performance
Instruction & Arithmetic-unit Pipeline
Ideal pipeline Speed-up calculation & Limits
Chained Pipeline Performance
The speed-up of a pipeline is ultimately limited by the
number of stages and the time of the slowest stage.
For this reason (to shorten the slowest stage by splitting
work into more, smaller stages), conventional processors
relied on very deep pipelines (a 20-stage pipeline is an
example of a deep pipeline, compared to a normal pipeline
of 3-6 stages).
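A minimal sketch of the ideal speed-up limit, assuming k stages of equal latency t and n instructions (the names k, n, t are illustrative, not from the slides):

# Ideal pipeline speed-up: sequential time / pipelined time,
# assuming k equal stages of latency t and n instructions.
def ideal_speedup(k, n, t=1.0):
    sequential = n * k * t        # each instruction takes all k stages
    pipelined = (k + n - 1) * t   # fill the pipe, then one result per cycle
    return sequential / pipelined

# As n grows, speed-up approaches k (the number of stages):
for n in (1, 10, 100, 10_000):
    print(n, round(ideal_speedup(k=5, n=n), 3))  # -> 1.0, 3.571, 4.808, 4.998

This shows why deeper pipelines are attractive: the asymptotic speed-up equals the stage count k, provided no stalls occur.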
Superscalar Processor
One simple way of alleviating the deep pipeline
bottlenecks is to use multiple (concurrent) short
pipelines.
Issue multiple independent instructions
simultaneously
Examples: MIPS R10000, PowerPC & Pentium
Superscalar Scheduler
The superscalar scheduler is on-chip hardware that
looks at a number of instructions in an instruction
queue at runtime & selects an appropriate number
of instructions to execute concurrently.
Scheduling of instructions concurrently is
determined by a number of factors:
Resolve Data Dependency Issues
Resolve Resource Constraint Issues
Resolve Branch Prediction Issues
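A minimal sketch of the first two checks for a 2-wide issue window; the instruction encoding (unit, destination register, source registers) is an illustrative assumption, not a real ISA:

# Toy 2-way issue check: co-issue two instructions only if the second
# does not read or overwrite the first's result (data dependency) and
# they need different functional units (resource constraint).
def can_dual_issue(i1, i2):
    unit1, dest1, _ = i1
    unit2, dest2, srcs2 = i2
    data_dep = dest1 in srcs2 or dest1 == dest2   # RAW or WAW hazard
    resource_conflict = unit1 == unit2            # one unit of each kind
    return not (data_dep or resource_conflict)

add = ("alu", "r1", ("r2", "r3"))   # r1 = r2 + r3
mul = ("mul", "r4", ("r1", "r5"))   # r4 = r1 * r5 (reads r1 -> dependent)
ld  = ("mem", "r6", ("r7",))        # r6 = load [r7] (independent)

print(can_dual_issue(add, mul))  # False: RAW hazard on r1
print(can_dual_issue(add, ld))   # True: independent, different units

Branch prediction is the third factor; a real scheduler also speculates past conditional branches, which this sketch omits.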
[Superscalar pipeline diagrams: stages IF, ID, NA, NA, WB (OF not required). An execution-unit constraint or a data dependency can cause additional delays beyond the ideal pipeline.]
Superscalar Execution:
Efficiency Considerations
Not all functional units can be kept busy at all times.
If during a cycle, no functional units are utilized, this is
referred to as vertical waste.
If during a cycle, only some of the functional units are
utilized, this is referred to as horizontal waste.
Due to limited parallelism in typical instruction traces
(dependencies) & the limited time/scope the scheduler has
to extract parallelism, the performance of superscalar
processors is ultimately limited.
Conventional microprocessors typically support four-way superscalar execution.
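A minimal sketch of the two waste measures, assuming a cycle-by-cycle utilization table for a four-way machine (the table contents are made up for illustration):

# Vertical waste: cycles in which no functional unit is busy.
# Horizontal waste: idle unit-slots in cycles where some units are busy.
# Each row is one cycle; an entry is 1 if that unit was used.
schedule = [
    [1, 1, 0, 0],   # partially used -> 2 slots of horizontal waste
    [0, 0, 0, 0],   # nothing issued -> 1 cycle of vertical waste
    [1, 1, 1, 1],   # fully utilized -> no waste
]

vertical = sum(1 for cycle in schedule if not any(cycle))
horizontal = sum(cycle.count(0) for cycle in schedule if any(cycle))
print(vertical, horizontal)  # 1 cycle vertical, 2 slots horizontal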
Superscalar Execution:
Instruction Issue Mechanisms
In the simpler model, instructions can be issued
only in the order in which they are encountered,
i.e., if the second instruction cannot be issued
because it has a data dependency with the first,
only one instruction is issued in the cycle.
This is called in-order issue.
In a more aggressive model, instructions can be
issued out of order. In this case, if the second
instruction has data dependencies with the first,
but the third instruction does not, the first and
third instructions can be co-scheduled.
This is also called dynamic issue.
Performance of in-order issue is generally more limited
than that of dynamic issue.
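A minimal sketch contrasting the two issue policies on the three-instruction example above; the dependency encoding is an illustrative assumption:

# Each instruction is (name, set of instructions it depends on).
instrs = [("i1", set()), ("i2", {"i1"}), ("i3", set())]

def issue(instrs, width=2, in_order=True):
    if in_order:
        issued = []
        for name, deps in instrs:
            if deps or len(issued) == width:
                break          # a stalled instruction blocks all later ones
            issued.append(name)
        return issued
    # dynamic (out-of-order) issue: pick any ready instructions
    ready = [name for name, deps in instrs if not deps]
    return ready[:width]

print(issue(instrs, in_order=True))   # ['i1']       -- i2 blocks i3
print(issue(instrs, in_order=False))  # ['i1', 'i3'] -- co-scheduled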
Comparison: Superscalar vs
Very Long Instruction Word (VLIW)
Superscalar implements the scheduler as on-chip hardware,
while VLIW implements it in compiler software.
Superscalar schedules concurrent instructions at runtime,
while VLIW does it at compile time.
The superscalar scheduler's scope is limited to a few
instructions from the instruction queue, while the VLIW
scheduler has a bigger context (possibly the full program)
to process.
Due to more time & context, the VLIW scheduler can use
more powerful algorithms (e.g. loop unrolling, branch
prediction, etc.), giving better results, which superscalar
cannot afford (loop unrolling is sketched below).
Compilers, however, do not have runtime information (e.g.
cache misses, branch variable state, etc.), so VLIW
scheduling is inherently more conservative than superscalar.
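A minimal sketch of loop unrolling, one of the compile-time transformations mentioned above; Python stands in for the compiler's machine code here, and the 4x unroll factor and names are illustrative assumptions:

# Original loop: one multiply per iteration, serialized by loop control.
def scale(a, c):
    for i in range(len(a)):
        a[i] *= c

# Unrolled 4x: four independent multiplies per iteration that a VLIW
# compiler could pack into one wide instruction word
# (assumes len(a) is a multiple of 4).
def scale_unrolled(a, c):
    for i in range(0, len(a), 4):
        a[i]     *= c
        a[i + 1] *= c
        a[i + 2] *= c
        a[i + 3] *= c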
[Architecture diagram: instruction streams (IS1 ... ISn) and data streams (DS1 ... DSn) connecting processing units to MEMORY.]
SIMD Processors
Some of the earliest parallel computers such as the
Illiac IV, MPP, DAP, CM-2, and MasPar MP-1 belonged to
this class of machines.
Variants of this concept have found use in co-processing
units such as the MMX units in Intel processors, DSP
chips such as the Sharc & Nvidia's GPUs.
SIMD relies on the regular structure of computations (such
as those in image processing).
It is often necessary to selectively turn off operations on
certain data items. For this reason, most SIMD
programming paradigms allow for an "activity mask",
which determines whether a processor should participate
in a computation or not.
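A minimal sketch of the activity-mask idea using NumPy's boolean masking (NumPy stands in for SIMD hardware here; the condition and data are illustrative assumptions):

import numpy as np

# All "processors" receive the same instruction (add 1), but the
# activity mask turns it off for elements that fail the condition.
data = np.array([3, -1, 4, -5, 9])
mask = data > 0                        # activity mask: participate or not
data = np.where(mask, data + 1, data)  # masked-off elements keep old values
print(data)                            # -> [ 4 -1  5 -5 10]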