Data Level Parallelism in SIMD, Vector and GPU: by 19PW40 S. Sayana

Loop-level parallelism exploits parallelism between the iterations of a loop by distributing the iterations across multiple processors, speeding up the overall execution of a program. For a loop to be parallelized this way, it must be free of dependences between iterations, such as a loop-carried dependence, where a statement in one iteration depends on a statement in a previous iteration. Several kinds of dependence (true, anti, output, and input) can exist between statements in a loop, and techniques such as loop unrolling can remove dependences and enable parallel execution.


DATA LEVEL PARALLELISM IN
SIMD, VECTOR AND GPU


BY
19PW40
S.Sayana
DATA PARALLELISM

Data-level parallelism (also known as loop-level parallelism) is a form of parallel computing in which the data is distributed across different parallel processor nodes, and each node performs the same operation on its portion of the data.
LOOP LEVEL PARALLELISM

 Loop-level parallelism is a form of parallelism concerned with extracting parallel tasks from loops.
 It often arises in programs where data is stored in arrays and other random-access data structures.
 Multiple processing units iterate over the data structure, operating on some or all of the indices at the same time.
 It speeds up the overall execution time of the program.
LOOP LEVEL PARALLELISM

The simplest and most common way to increase the amount of parallelism available among instructions is to exploit parallelism among the iterations of a loop. This type of parallelism is often called loop-level parallelism.
DEPENDENCE IN LOOPS

Example 1:
for (i=1; i<=1000; i= i+1)
  x[i] = x[i] + y[i];
This is a parallel loop: every iteration can overlap with any other iteration, although within each individual iteration there is little opportunity for overlap.
Example 2
for (i=1; i<=100; i= i+1){
  a[i] = a[i] + b[i];         // S1
  b[i+1] = c[i] + d[i];       // S2
}
Is this loop parallel? If not, how can it be made parallel?
S1 uses the value b[i] that S2 computed in the previous iteration, so there is a loop-carried dependence from S2 to S1. Despite this dependence, the loop can be made parallel because the dependence is not circular.
LOOP CARRIED DEPENDENCY

 When a statement in one iteration of a loop depends in some way on a statement in a different iteration of the same loop, a loop-carried dependence exists.
 If a statement in one iteration of a loop depends only on a statement in the same iteration, the dependence is loop-independent.
CIRCULAR DEPENDENCY

 In Example 2, neither statement depends on itself; S1 depends on S2, but S2 does not depend on S1, so there is no cycle.
 A circular dependency exists when S1 depends on S2 and S2 also depends on S1.
A loop is parallel unless there is a cycle in the dependences, since the absence of a cycle means that the dependences impose only a partial ordering on the statements.
This allows us to replace the loop of Example 2 with the following equivalent code sequence, in which the dependence is no longer loop-carried:
a[1] = a[1] + b[1];
for (i=1; i<=99; i= i+1){
  b[i+1] = c[i] + d[i];
  a[i+1] = a[i+1] + b[i+1];
}
b[101] = c[100] + d[100];
Example 3:
for (i=1; i<=100; i= i+1){
  a[i+1] = a[i] + c[i];       //S1
  b[i+1] = b[i] + a[i+1];     //S2
}
This loop is not parallel because it has a cycle in its dependences: S1 depends on itself (a[i+1] uses the a[i] computed in the previous iteration), and S2 likewise depends on itself through b, as well as on S1.
There are a number of techniques for converting loop-level parallelism into instruction-level parallelism; basically, such techniques work by unrolling the loop.
DEPENDENCIES IN CODE

There are several types of dependence between the statements of a loop: true (flow) dependence, anti-dependence, output dependence, and input dependence.
To preserve the sequential behaviour of a loop when run in parallel, true dependences must be preserved. Anti-dependences and output dependences are name dependences and can be dealt with by giving each process its own copy of the variables involved.
EXAMPLES

Example of true dependence
S1: int a, b;
S2: a = 2;
S3: b = a + 40;
S2 ->T S3, meaning that S3 has a true dependence on S2 because S3 reads the variable a, which S2 writes.
EXAMPLES

Example of anti-dependence
S1: int a, b = 40;
S2: a = b - 38;
S3: b = -1;
S2 ->A S3, meaning that S3 has an anti-dependence on S2 because S3 writes to the variable b after S2 reads it.
EXAMPLES

Example of output dependence
S1: int a, b = 40;
S2: a = b - 38;
S3: a = 2;
S2 ->O S3, meaning that S3 has an output dependence on S2 because both write to the variable a, and S3's write must come last.
EXAMPLES

Example of input dependence
S1: int a, b, c = 2;
S2: a = c - 1;
S3: b = c + 1;
S2 ->I S3, meaning that S3 has an input dependence on S2 because both read the variable c. Input dependence does not constrain the execution order.
CHAINING, CONVOYS AND
CHIMES
 Chaining allows the result elements of one vector operation to be forwarded directly, as they are produced, to a dependent vector operation, so both can be in execution at once.
 A convoy is a set of vector instructions that can potentially execute together. Only structural hazards force instructions into separate convoys, since true dependences within a convoy are handled by chaining.
 A chime is the unit of time taken to execute one convoy; a sequence of m convoys executes in m chimes, about m × n clock cycles for vector length n, ignoring start-up overhead.
The following VMIPS code executes in three chimes since there are three convoys; for a vector length of 64, that is about 3 × 64 = 192 clock cycles:
/* VMIPS code */           /* convoys */
LV      V1,Rx              1. LV V1,Rx    MULVS.D V2,V1,F0
MULVS.D V2,V1,F0
LV      V3,Ry              2. LV V3,Ry    ADDVV.D V4,V2,V3
ADDVV.D V4,V2,V3
SV      V4,Ry              3. SV V4,Ry
