MPSoC Architectures: OpenMP
Introduction to OpenMP
• What is OpenMP?
  • Open specification for Multi-Processing
  • "Standard" API for defining multi-threaded shared-memory programs
    – www.openmp.org – talks, examples, forums, etc.
• High-level API
  • Preprocessor (compiler) directives (~80%)
  • Library calls (~19%)
  • Environment variables (~1%)
• OpenMP will:
  • Allow a programmer to separate a program into serial regions and parallel regions, rather than as T concurrently-executing threads
  • Hide stack management
  • Provide synchronization constructs
Outline
• Introduction
• Motivating example
• Parallel Programming is Hard
• Discussion
• specOMP
#include <pthread.h>
#include <stdio.h>

/* Thread body; the original slide omitted its definition, so this is a minimal stub. */
void *SayHello(void *arg) {
    printf("Hello!\n");
    return NULL;
}

int main() {
    pthread_attr_t attr;
    pthread_t threads[16];
    int tn;
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    for (tn = 0; tn < 16; tn++) {
        pthread_create(&threads[tn], &attr, SayHello, NULL);
    }
    for (tn = 0; tn < 16; tn++) {
        pthread_join(threads[tn], NULL);
    }
    return 0;
}
Motivation
Motivation – OpenMP
int main() {
    return 0;
}
Motivation – OpenMP
#include <omp.h>

int main() {
    omp_set_num_threads(16);   // request 16 threads for subsequent parallel regions
    return 0;
}
// parallel region (fork)
#pragma omp parallel
{
    printf("World");
}
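
Putting the pieces together, a minimal complete OpenMP version of the earlier pthread example might look like the sketch below (the "Hello!" string is assumed to match the pthread version):

#include <omp.h>
#include <stdio.h>

int main() {
    omp_set_num_threads(16);    // request 16 threads
    #pragma omp parallel        // fork: every thread executes the block
    {
        printf("Hello!\n");
    }                           // join: implicit barrier at the end of the region
    return 0;
}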
(fork–join execution model diagram)
– Nesting complications are handled "automagically" at compile-time
– Independent of the number of threads actually running
Example
• Control parallelism
• Threads perform differing functions
  – One thread for I/O, one for computation, etc., as sketched below
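
A sketch of such control parallelism using sections (do_io and do_computation are hypothetical functions):

#pragma omp parallel sections
{
    #pragma omp section
    do_io();             // one thread handles I/O
    #pragma omp section
    do_computation();    // another thread computes
}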
Memory Model
• Shared-memory communication
  • Threads cooperate by accessing shared variables
• The sharing is defined syntactically (illustrated below)
  • Any variable that is seen by two or more threads is shared
  • Any variable that is seen by only one thread is private
• Race conditions are possible
  • Use synchronization to protect against conflicts
  • Change how data is stored to minimize the synchronization needed
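
For example, the syntactic rule can be seen in a small sketch (n is an assumed variable):

int n = 100;                  // declared before the region: shared
#pragma omp parallel
{
    int local = 0;            // declared inside the region: private to each thread
    local += n;
}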
Structure
The problem
• A parallel region executes the same code in every thread, so the work is repeated as many times as there are threads
• How many threads do we have? Set with omp_set_num_threads(n)
• What is the use of repeating the same work n times in parallel? Use omp_get_thread_num() to distribute the work between threads, as sketched below
• D is shared between the threads; i and sum are private
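
A sketch of distributing the loop manually by thread id (D, N, and the per-thread sum are assumed from the slide's example):

#pragma omp parallel shared(D)
{
    int tid = omp_get_thread_num();        // this thread's id
    int nthreads = omp_get_num_threads();  // total threads in the team
    double sum = 0.0;                      // private: declared inside the region
    for (int i = tid; i < N; i += nthreads)
        sum += D[i];
    /* the per-thread sums would still have to be combined */
}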
Controlling Granularity
• #pragma omp parallel if (expression)
  • Can be used to disable parallelization in some cases (when the input is determined to be too small to be beneficially multithreaded)
• #pragma omp parallel num_threads (expression)
  • Controls the number of threads used for this parallel region (both clauses are combined in the sketch below)
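
For instance (the threshold 1000 is an arbitrary assumed value):

#pragma omp parallel if (n > 1000) num_threads(4)
{
    /* parallel only when n is large enough, with at most 4 threads */
}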
Sections
• The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of code are to be divided among the threads in the team.
• Independent SECTION directives are nested within a SECTIONS directive.
• Each SECTION is executed once by a thread in the team. Different sections may be executed by different threads. It is possible for a thread to execute more than one section if it is quick enough and the implementation permits it.
Example
#include <omp.h>
#define N 1000

int main()
{
    int i;
    float a[N], b[N], c[N], d[N];

    /* some initializations */
    for (i = 0; i < N; i++) {
        a[i] = i * 1.5;
        b[i] = i + 22.35;
    }

    #pragma omp parallel shared(a,b,c,d) private(i)
    {
        #pragma omp sections
        {
            #pragma omp section
            for (i = 0; i < N; i++)
                c[i] = a[i] + b[i];

            #pragma omp section
            for (i = 0; i < N; i++)
                d[i] = a[i] * b[i];
        } /* end of sections */
    } /* end of parallel region */
    return 0;
}
Data Sharing
• Shared-memory programming model
  • Most variables are shared by default
  • We can define a variable as private
• In OpenMP:
  – shared variables are shared
  – private variables are private
int i;
#pragma omp parallel for private(i)
for (i = 0; i < n; i++) { ... }
Thread private
• Similar to private, but defined per variable via #pragma omp threadprivate (see the sketch below)
• The declaration goes immediately after the variable definition and must be visible in all translation units
• Persistent between parallel sections
• Can be initialized from the master's copy with the copyin clause
• More efficient than private, but a global variable!
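
A minimal sketch of threadprivate (counter is a hypothetical variable):

int counter = 10;
#pragma omp threadprivate(counter)      // one persistent copy per thread

#pragma omp parallel copyin(counter)    // each copy initialized from the master's value
{
    counter += omp_get_thread_num();
}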
Synchronization
• What should the result be (assuming 2 threads)?

X = 0;
#pragma omp parallel
X = X + 1;
Synchronization
• 2 is the expected answer, but it can be 1 with unfortunate interleaving
• OpenMP assumes that the programmer knows what he is doing
• Regions of code that are marked to run in parallel are independent; if access collisions are possible, it is the programmer's responsibility to insert protection
Synchronization
• Many of the existing mechanisms for shared-memory programming
• OpenMP synchronization:
  • nowait (turn synchronization off!)
  • Single/Master execution
  • Critical sections, atomic updates
  • Ordered
  • Barriers
  • Flush (memory-subsystem synchronization)
  • Reduction (special case)
Single/Master
• #pragma omp single
  • Only one of the threads will execute the following block of code
  • The rest will wait for it to complete
  • Good for non-thread-safe regions of code (such as I/O)
  • Must be used inside a parallel region (including parallel for/sections)
Single/Master
• #pragma omp master
  • The following block will be executed by the master thread only
  • No synchronization involved
  • Applicable only inside parallel regions

#pragma omp parallel
{
    do_preprocessing();
    #pragma omp single
    read_input();
    #pragma omp master
    notify_input_consumed();
    do_processing();
}
Critical Sections
• #pragma omp critical [name]
  • Standard critical-section functionality
• Critical sections are global in the program
  • Can be used to protect a single resource in different functions
• Critical sections are identified by the name
  • All the unnamed critical sections are mutually exclusive throughout the program
  • All the critical sections having the same name are mutually exclusive between themselves
Critical Sections
int x = 0;
#pragma omp parallel shared(x)
{
    #pragma omp critical
    x++;
}
Ordered
• #pragma omp ordered statement
• Executes the statement in the sequential order of iterations
• Example:

#pragma omp parallel for ordered
for (j = 0; j < N; j++) {
    int result = j * j;
    #pragma omp ordered
    printf("computation(%d) = %d\n", j, result);
}
Barrier synchronization
• #pragma omp barrier
• Performs a barrier synchronization between all the threads in a team at the given point
• Example:

#pragma omp parallel
{
    int result = heavy_computation_part1();
    #pragma omp atomic
    sum += result;
    #pragma omp barrier
    heavy_computation_part2(sum);
}
Explicit Locking
• Can be used to pass lock variables around (unlike critical sections!)
• Can be used to implement more involved synchronization constructs
• Functions (see the sketch below):
  • omp_init_lock(), omp_destroy_lock(), omp_set_lock(), omp_unset_lock(), omp_test_lock()
  • The usual semantics
• Use #pragma omp flush to synchronize memory
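
A minimal locking sketch (shared_counter is a hypothetical shared variable):

omp_lock_t lock;
omp_init_lock(&lock);
#pragma omp parallel
{
    omp_set_lock(&lock);       // blocks until the lock is acquired
    shared_counter++;
    omp_unset_lock(&lock);     // releases the lock
}
omp_destroy_lock(&lock);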
Consistency Violation?
Reduction

for (j = 0; j < N; j++) {
    sum = sum + a[j] * b[j];
}

• How to parallelize this code?
  • sum is not private, but accessing it atomically is too expensive
  • Have a private copy of sum in each thread, then add them up
• Use the reduction clause! (see the sketch below)
  • #pragma omp parallel for reduction(+: sum)
  • An operator must be used: +, -, *, ...
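
The parallelized loop then looks like this (a, b, N, and sum as in the serial version above):

#pragma omp parallel for reduction(+: sum)
for (j = 0; j < N; j++) {
    sum = sum + a[j] * b[j];   // each thread accumulates a private sum; copies are combined at the end
}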
Synchronization Overhead
• Lost time waiting for locks
• Prefer to use structures that are as lock-free as possible!
Summary
• OpenMP is a compiler-based technique to create concurrent code from (mostly) serial code
• OpenMP can enable (easy) parallelization of loop-based code
• Lightweight syntactic language extensions
More Information
• www.openmp.org
  • OpenMP official site
• www.llnl.gov/computing/tutorials/openMP/
  • A handy OpenMP tutorial
• www.nersc.gov/nusers/help/tutorials/openmp/
  • Another OpenMP tutorial and reference
Backup Slides
Syntax, etc
OpenMP Syntax
• PARALLEL syntax
#pragma omp parallel [clause...] CR
    structured_block

Ex:
#pragma omp parallel
{
    printf("Hello!\n");
} // implicit barrier

Output (T=4):
Hello!
Hello!
Hello!
Hello!
OpenMP Syntax
• DO/for syntax (DO: Fortran, for: C)
#pragma omp for [clause...] CR
    for_loop

Ex:
#pragma omp parallel
{
    #pragma omp for private(i) shared(x) \
        schedule(static, x/N)
    for (i = 0; i < x; i++) printf("Hello!\n");
} // implicit barrier

Note: must reside inside a parallel section
OpenMP Syntax
More on clauses
• private() – a variable in the private list is private to each thread
• shared() – variables in the shared list are visible to all threads
  • Implies no synchronization, or even consistency!
• schedule() – determines how iterations will be divided among threads
  – schedule(static, C) – each thread will be given C iterations
    • Usually T*C = total number of iterations
  – schedule(dynamic) – each thread will be given additional iterations as needed
    • Often less efficient than a considered static allocation
• nowait – removes the implicit barrier from the end of the block (see the sketch below)
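
For instance, a sketch of nowait between two independent loops (a, b, f, g, and n are assumed):

#pragma omp parallel
{
    #pragma omp for nowait     // threads move on without waiting at the end of this loop
    for (int i = 0; i < n; i++) a[i] = f(a[i]);
    #pragma omp for            // the implicit barrier applies here as usual
    for (int i = 0; i < n; i++) b[i] = g(b[i]);
}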
OpenMP Syntax
Ex:
#pragma omp parallel for shared(x) private(i) \
    schedule(dynamic)
for (i = 0; i < x; i++) {
    printf("Hello!\n");
}

Example: AddMatrix
Files:
    (Makefile)
    addmatrix.c    // omp-parallelized
    matrixmain.c   // non-omp
    printmatrix.c  // non-omp
OpenMP Syntax
• ATOMIC syntax
#pragma omp atomic CR
    simple_statement

Ex:
#pragma omp parallel shared(x)
{
    #pragma omp atomic
    x++;
} // implicit barrier
OpenMP Syntax
• CRITICAL syntax
#pragma omp critical CR
    structured_block

Ex:
#pragma omp parallel shared(x)
{
    #pragma omp critical
    {
        // only one thread in here
    }
} // implicit barrier
OpenMP Syntax
• ATOMIC vs. CRITICAL
  • atomic protects a single simple update and can map to hardware atomic instructions; critical protects an arbitrary block using lock-style mutual exclusion (contrasted below)
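
A small sketch contrasting the two (x and y are assumed shared variables):

#pragma omp atomic
x++;                  // single memory update

#pragma omp critical
{                     // arbitrary structured block
    x++;
    y = x;
}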
OpenMP Syntax
• MASTER – only thread 0 executes a block; no implied synchronization
#pragma omp master CR
    structured_block
• SINGLE – only one thread executes a block; implicit barrier at the end (unless nowait)
#pragma omp single CR
    structured_block
OpenMP Syntax
• BARRIER
#pragma omp barrier CR
• Locks
  • Locks are provided through omp.h library calls
    – omp_init_lock()
    – omp_destroy_lock()
    – omp_test_lock()
    – omp_set_lock()
    – omp_unset_lock()
OpenMP Syntax
• FLUSH
#pragma omp flush CR
OpenMP Syntax
• Functions
  omp_set_num_threads()
  omp_get_num_threads()
  omp_get_max_threads()
  omp_get_num_procs()
  omp_get_thread_num()
  omp_set_dynamic()
  omp_{init, destroy, test, set, unset}_lock()