Data Level Parallelism in SIMD, Vector and GPU: by 19PW40 S.Sayana

Loop level parallelism exploits parallelism between iterations of a loop by distributing the loop iterations across multiple processors, which reduces the overall execution time of a program. For a loop to be parallelized this way, its dependencies must not form a cycle; a loop-carried dependency, where a statement in one iteration depends on a statement in a previous iteration, is the case that needs checking. True, anti, output, and input dependencies can all exist between statements in a loop. A loop whose dependencies are not circular can be restructured to run in parallel, and techniques such as loop unrolling convert loop-level parallelism into instruction-level parallelism.

DATA PARALLELISM

Data level parallelism (also known as loop level parallelism) is a form of parallel computing in which the data is distributed across multiple parallel processor nodes, each of which performs the same operation on its share of the data.
LOOP LEVEL PARALLELISM

• Loop level parallelism is a form of parallelism that is concerned with extracting parallel tasks from loops.
• It often arises in programs where data is stored in random-access data structures such as arrays.
• It uses multiple processes that iterate over the data structure and operate on some or all of the indices at the same time.
• It reduces the overall execution time of the program.
LOOP LEVEL PARALLELISM

The simplest and most common way to increase the amount of parallelism available among instructions is to exploit parallelism among iterations of a loop. This type of parallelism is often called loop level parallelism.
DEPENDENCE IN LOOPS

Example 1:
    for (i = 1; i <= 1000; i = i + 1)
        x[i] = x[i] + y[i];
This is a parallel loop. Every iteration of the loop can overlap with any other iteration, although within each loop iteration there is little opportunity for overlap.
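The independence of the iterations in Example 1 can be illustrated with a minimal C sketch. The function names `add_range` and `add_chunked` are hypothetical, the arrays are 0-based for convenience, and `workers` is assumed to be at least 1. The chunks are deliberately processed in reverse order on a single processor to show that the order does not affect the result; in a real program each chunk would be handed to a different processor.

```c
#include <stddef.h>

/* Adds y[i] into x[i] for every i in [lo, hi). Each iteration touches only
   its own element, so disjoint ranges can be handled by different workers. */
void add_range(double *x, const double *y, size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++)
        x[i] = x[i] + y[i];
}

/* Hypothetical driver: splits n elements into `workers` chunks (workers >= 1)
   and processes the chunks last-to-first. Because no iteration depends on
   another, the result matches a plain front-to-back loop. */
void add_chunked(double *x, const double *y, size_t n, size_t workers)
{
    size_t chunk = (n + workers - 1) / workers;   /* ceiling division */
    for (size_t w = workers; w-- > 0; ) {
        size_t lo = w * chunk;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        if (lo < hi)
            add_range(x, y, lo, hi);
    }
}
```

Calling `add_chunked(x, y, n, 3)` should leave `x` in the same state as `add_range(x, y, 0, n)`, which is what makes the loop safe to distribute.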
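Example 2:
    for (i = 1; i <= 100; i = i + 1) {
        a[i] = a[i] + b[i];      //S1
        b[i+1] = c[i] + d[i];    //S2
    }
Is this loop parallel? If not, how can it be made parallel?
S1 uses the value of b[i] that S2 assigned in the previous iteration, so there is a loop-carried dependence from S2 to S1. Despite this dependence, the loop can be made parallel because the dependence is not circular.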
LOOP CARRIED DEPENDENCY

• When a statement in one iteration of a loop depends in some way on a statement in a different iteration of the same loop, a loop-carried dependence exists.
• If a statement in one iteration of a loop depends only on a statement in the same iteration of the loop, this creates a loop-independent dependence.
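CIRCULAR DEPENDENCY

• A dependence is circular when S1 depends on S2 and S2 in turn depends on S1, or when a statement depends on itself across iterations.
• In Example 2 neither statement depends on itself, and although S1 depends on S2, S2 does not depend on S1, so the dependence is not circular.

A loop is parallel unless there is a cycle in the dependencies, since the absence of a cycle means that the dependencies give a partial ordering on the statements.
This allows us to replace the loop in Example 2 with the following code sequence:
    a[1] = a[1] + b[1];
    for (i = 1; i <= 99; i = i + 1) {
        b[i+1] = c[i] + d[i];
        a[i+1] = a[i+1] + b[i+1];
    }
    b[101] = c[100] + d[100];
In the new loop each iteration uses only values computed in the same iteration, so the dependence is no longer loop-carried.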
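One way to convince yourself that the restructured sequence computes the same values as the Example 2 loop is to run both on the same inputs and compare. The sketch below does that; the function names `example2_original` and `example2_transformed` and the constant `N` are hypothetical, and the arrays are sized `N + 2` so that the 1-based indices up to 101 used in the slides stay in bounds.

```c
#define N 100

/* Original loop from Example 2: S1 in iteration i+1 reads the b[i+1]
   that S2 wrote in iteration i (a loop-carried dependence). */
void example2_original(double a[N + 2], double b[N + 2],
                       const double c[N + 2], const double d[N + 2])
{
    for (int i = 1; i <= N; i++) {
        a[i] = a[i] + b[i];         /* S1 */
        b[i + 1] = c[i] + d[i];     /* S2 */
    }
}

/* Restructured loop: the first S1 and the last S2 are peeled off, and the
   two statements are swapped so that b[i+1] is produced and consumed in the
   same iteration. No value crosses an iteration boundary any more. */
void example2_transformed(double a[N + 2], double b[N + 2],
                          const double c[N + 2], const double d[N + 2])
{
    a[1] = a[1] + b[1];
    for (int i = 1; i <= N - 1; i++) {
        b[i + 1] = c[i] + d[i];
        a[i + 1] = a[i + 1] + b[i + 1];
    }
    b[N + 1] = c[N] + d[N];
}
```

Filling two copies of `a` and `b` with identical data, calling one function on each copy, and comparing elements 1 through 101 should show no differences.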
Example 3:
    for (i = 1; i <= 100; i = i + 1) {
        a[i+1] = a[i] + c[i];      //S1
        b[i+1] = b[i] + a[i+1];    //S2
    }
This loop is not parallel because it has cycles in the dependencies: S1 uses the value of a[i] computed by S1 in the previous iteration, and S2 uses the value of b[i] computed by S2 in the previous iteration, so both statements depend on themselves.
There are a number of techniques for converting loop-level parallelism into instruction-level parallelism. Basically, such techniques work by unrolling the loop.
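As a rough illustration of unrolling, the sketch below unrolls the parallel loop of Example 1 by a factor of four. The function name `add_unrolled4`, the unroll factor, and the 0-based indexing are assumptions made for the example. The four statements in the main body are independent of each other, which gives the hardware or compiler more instructions to overlap; a recurrence like the one in Example 3 would still execute serially after unrolling.

```c
#include <stddef.h>

/* Computes x[i] = x[i] + y[i] for i in [0, n), unrolled by four. */
void add_unrolled4(double *x, const double *y, size_t n)
{
    size_t i = 0;

    /* Main body: four independent element updates per trip. */
    for (; i + 4 <= n; i += 4) {
        x[i]     += y[i];
        x[i + 1] += y[i + 1];
        x[i + 2] += y[i + 2];
        x[i + 3] += y[i + 3];
    }

    /* Clean-up loop for the remaining n % 4 elements. */
    for (; i < n; i++)
        x[i] += y[i];
}
```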
DEPENDENCIES IN CODE

There are four types of dependence: true dependence, anti-dependence, output dependence and input dependence.

In order to preserve the sequential behaviour of a loop when it is run in parallel, true dependences must be preserved. Anti-dependences and output dependences can be dealt with by giving each process its own copy of the variables involved.
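The idea of giving each process its own copy of a variable can be sketched in C with a temporary that is either shared by all iterations or private to each one. The function names and the computation itself (squaring a sum) are hypothetical, chosen only to make the temporary `t` visible. In the shared version every iteration writes `t` (an output dependence) and must read it before the next iteration overwrites it (an anti-dependence); in the private version those dependences disappear and only the true dependence inside each iteration remains.

```c
#include <stddef.h>

/* Shared temporary: one copy of t is reused by every iteration, which
   creates anti- and output dependences between iterations. */
void square_sums_shared(const double *a, const double *b, double *c, size_t n)
{
    double t;                       /* one copy for all iterations */
    for (size_t i = 0; i < n; i++) {
        t = a[i] + b[i];
        c[i] = t * t;
    }
}

/* Private temporary: each iteration owns its copy of t, so iterations
   no longer interfere and could be assigned to different processes. */
void square_sums_private(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double t = a[i] + b[i];     /* private to this iteration */
        c[i] = t * t;
    }
}
```

Both functions produce the same `c` when run sequentially; the difference is that only the second form stays correct when iterations run at the same time.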
EXAMPLES

Example of true dependence
    S1: int a, b;
    S2: a = 2;
    S3: b = a + 40;
S2 ->T S3, meaning that S3 has a true dependence on S2 because S2 writes to the variable a, which S3 then reads.
EXAMPLES

Example of anti-dependence
    S1: int a, b = 40;
    S2: a = b - 38;
    S3: b = -1;
S2 ->A S3, meaning that S3 has an anti-dependence on S2 because S2 reads from the variable b before S3 writes to it.
EXAMPLES

Example of output dependence
    S1: int a, b = 40;
    S2: a = b - 38;
    S3: a = 2;
S2 ->O S3, meaning that S3 has an output dependence on S2 because both write to the variable a.
EXAMPLES

Example of input dependence
    S1: int a, b, c = 2;
    S2: a = c - 1;
    S3: b = c + 1;
S2 ->I S3, meaning that S2 and S3 have an input dependence because both read from the variable c. Since neither statement writes to c, this dependence places no constraint on their order.
CHAINING, CONVOYS AND CHIMES

• Chaining allows the results of one vector operation to be used directly as input to another vector operation.
• A convoy is a set of vector instructions that can potentially execute together. Only structural hazards cause separate convoys, as true dependences are handled via chaining within the same convoy.
• A chime is the unit of time taken to execute one convoy. For vectors of length n a chime is approximately n clock cycles; this approximation ignores the start-up cost.

The following VMIPS code executes in three chimes since there are three convoys.

    VMIPS code                 Convoys
    LV      V1,Rx              1. LV      V1,Rx
    MULVS.D V2,V1,F0              MULVS.D V2,V1,F0
    LV      V3,Ry              2. LV      V3,Ry
    ADDVV.D V4,V2,V3              ADDVV.D V4,V2,V3
    SV      V4,Ry              3. SV      V4,Ry
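The convoy rule can be sketched as a small C routine that looks only at which functional unit each vector instruction needs. Everything here is an assumption made for illustration: the `enum unit` encoding, the function names, a machine with exactly one load/store unit, one multiplier and one adder, and chaining that hides every true dependence. Under those assumptions a new convoy starts only when an instruction needs a unit that the current convoy already uses, and the chime approximation gives roughly convoys × vector length clock cycles.

```c
#include <stddef.h>

/* Hypothetical encoding of the functional unit an instruction needs. */
enum unit { UNIT_LOAD_STORE, UNIT_MULTIPLY, UNIT_ADD, UNIT_COUNT };

/* Counts convoys for a sequence of vector instructions. A structural
   hazard (the needed unit is already busy in this convoy) is the only
   reason to start a new convoy; true dependences are assumed chained. */
size_t count_convoys(const enum unit *instrs, size_t n)
{
    int busy[UNIT_COUNT] = {0};
    size_t convoys = 0;

    for (size_t i = 0; i < n; i++) {
        if (convoys == 0 || busy[instrs[i]]) {
            convoys++;                         /* open a new convoy */
            for (int u = 0; u < UNIT_COUNT; u++)
                busy[u] = 0;
        }
        busy[instrs[i]] = 1;
    }
    return convoys;
}

/* Chime approximation: one chime per convoy, each about vector_length
   clock cycles. Start-up overhead is ignored. */
size_t chime_cycles(size_t convoys, size_t vector_length)
{
    return convoys * vector_length;
}
```

Fed a load, multiply, load, add, store sequence, `count_convoys` returns 3, and with 64-element vectors `chime_cycles(3, 64)` estimates 192 clock cycles.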
