OpenMP Basics

OpenMP is designed for shared memory parallel programming. It uses compiler directives and library calls to parallelize loops and sections of code across multiple threads. Key points:
• OpenMP provides an easy way to parallelize existing serial code with little effort through compiler pragmas and directives.
• It is best suited to shared memory machines such as SMP systems; OpenMP codes run only on shared memory architectures, not on distributed memory ones.
• Common approaches include parallelizing loops and sections of code. Loop parallelization using directives like #pragma omp parallel for is very common.
• Care must be taken to avoid data dependencies between loop iterations, which cause undefined behavior if iterations execute out of order.


Introduction to OpenMP

Philip Blood
Scientific Specialist
Pittsburgh Supercomputing Center

Jeff Gardner (U. of Washington)


Shawn Brown (PSC)
Different types of parallel platforms: Distributed Memory

Different types of parallel platforms: Shared Memory
• SMP: Symmetric Multiprocessing
  – Identical processing units working from the same main memory
  – SMP machines are becoming more common in the everyday workplace
    • Dual-socket motherboards are very common, and quad-sockets are not uncommon
    • 2 and 4 core CPUs are now commonplace
    • Intel Larrabee: 12-48 cores in 2009-2010
• ASMP: Asymmetric Multiprocessing
  – Not all processing units are identical
  – Cell processor of the PS3
Parallel Programming Models
• Shared Memory
  – Multiple processors sharing the same memory space
• Message Passing
  – Users make calls that explicitly share information between execution entities
• Remote Memory Access
  – Processors can directly access memory on another processor
• These models are then used to build more sophisticated models
  – Loop Driven
  – Function Driven Parallel (Task-Level)
Shared Memory Programming
• SysV memory manipulation
  – One can create and manipulate shared memory spaces directly.
• Pthreads (POSIX Threads)
  – Lower-level Unix library for building multi-threaded programs
• OpenMP (www.openmp.org)
  – Protocol designed to provide automatic parallelization through compiler pragmas.
  – Mainly loop-driven parallelism
  – Best suited to desktop and small SMP computers
• Caution: Race Conditions
  – Occur when two threads change the same memory location at the same time.
Introduction
• OpenMP is designed for shared memory systems.
• OpenMP is easy to use
  – achieve parallelism through compiler directives
  – or the occasional function call
• OpenMP is a “quick and dirty” way of parallelizing a program.
• OpenMP is usually used on existing serial programs to achieve moderate parallelism with relatively little effort
Computational Threads
• Each processor has one thread assigned to it
• Each thread runs one copy of your program

Thread 0, Thread 1, Thread 2, ..., Thread n


OpenMP Execution Model
• In MPI, all threads are active all the time
• In OpenMP, execution begins only on the master thread. Child threads are spawned and released as needed.
  – Threads are spawned when the program enters a parallel region.
  – Threads are released when the program exits a parallel region.
OpenMP Execution Model
Parallel Region Example: For loop

Fortran:
!$omp parallel do
do i = 1, n
   a(i) = b(i) + c(i)
enddo

C/C++:
#pragma omp parallel for
for(i=1; i<=n; i++)
   a[i] = b[i] + c[i];

This comment or pragma tells the OpenMP compiler to spawn threads *and* distribute work among those threads. These actions are combined here, but they can be specified separately.
Pros of OpenMP
• Because it takes advantage of shared memory, the programmer does not need to worry (that much) about data placement
• Programming model is “serial-like” and thus conceptually simpler than message passing
• Compiler directives are generally simple and easy to use
• Legacy serial code does not need to be rewritten
Cons of OpenMP
• Codes can only be run in shared memory environments!
  – In general, shared memory machines beyond ~8 CPUs are much more expensive than distributed memory ones, so finding a shared memory system to run on may be difficult
• Compiler must support OpenMP
  – whereas MPI can be installed anywhere
  – However, gcc 4.2 now supports OpenMP
Cons of OpenMP
• In general, only moderate speedups can be achieved.
  – Because OpenMP codes tend to have serial-only portions, Amdahl’s Law prohibits substantial speedups

• Amdahl’s Law:
  F = Fraction of serial execution time that cannot be parallelized
  N = Number of processors

  Execution time = (F + (1 - F)/N) × (serial execution time)

• If you have big loops that dominate execution time, these are ideal targets for OpenMP
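
A quick worked example (the values F = 0.1 and N = 8 are chosen only for illustration, not taken from the slides):

\[
\text{Speedup} = \frac{1}{F + (1-F)/N} = \frac{1}{0.1 + 0.9/8} \approx 4.7,
\qquad
\lim_{N \to \infty} \text{Speedup} = \frac{1}{F} = 10 .
\]

Even with unlimited processors, a 10% serial fraction caps the speedup at 10×.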
Goals of this lecture
• Exposure to OpenMP
  – Understand where OpenMP may be useful to you now
  – Or perhaps 4 years from now when you need to parallelize a serial program, you will say, “Hey! I can use OpenMP.”
• Avoidance of common pitfalls
  – How to make your OpenMP actually get the same answer that it did in serial
  – A few tips on dramatically increasing the performance of OpenMP applications
Compiling and Running OpenMP

• Tru64: -mp
• SGI IRIX: -mp
• IBM AIX: -qsmp=omp
• Portland Group: -mp
• Intel: -openmp
• gcc (4.2): -fopenmp
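
For example, with gcc (a minimal sketch; the file and program names are placeholders):

gcc -fopenmp -O2 mycode.c -o mycode    # compile with OpenMP support (gcc 4.2+)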
Compiling and Running OpenMP
• The OMP_NUM_THREADS environment variable sets the number of threads the OpenMP program will have at its disposal.
• Example script:
#!/bin/tcsh
setenv OMP_NUM_THREADS 4
mycode < my.in > my.out
OpenMP Basics: 2 Approaches to Parallelism
• Divide loop iterations among threads (loop-level parallelism)
• Divide various sections of code between threads (functional parallelism)
We will focus mainly on loop-level parallelism in this lecture.
Sections: Functional parallelism
#pragma omp parallel
{
   #pragma omp sections
   {
      #pragma omp section
      block1
      #pragma omp section
      block2
   }
}
Image from: https://computing.llnl.gov/tutorials/openMP
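
A minimal compilable sketch of the same idea (the function names funcA and funcB are placeholders, not from the original slides):

#include <stdio.h>
#include <omp.h>

void funcA(void) { printf("block1 ran on thread %d\n", omp_get_thread_num()); }
void funcB(void) { printf("block2 ran on thread %d\n", omp_get_thread_num()); }

int main(void)
{
   #pragma omp parallel          /* spawn a team of threads                 */
   {
      #pragma omp sections       /* each section is executed by one thread  */
      {
         #pragma omp section
         funcA();
         #pragma omp section
         funcB();
      }                          /* implicit barrier at the end of sections */
   }
   return 0;
}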
Parallel DO/for: Loop level parallelism
Fortran:
!$omp parallel do
do i = 1, n
   a(i) = b(i) + c(i)
enddo

C/C++:
#pragma omp parallel for
for(i=1; i<=n; i++)
   a[i] = b[i] + c[i];

Image from: https://computing.llnl.gov/tutorials/openMP
Pitfall #1: Data dependencies
• Consider the following code:
a[0] = 1;
for(i=1; i<5; i++)
   a[i] = i + a[i-1];

• There are dependencies between loop iterations.
• Sections of loops split between threads will not necessarily execute in order
• Out-of-order loop execution will result in undefined behavior
Pitfall #1: Data dependencies
• 3 simple rules for data dependencies
  1. All assignments are performed on arrays.
  2. Each element of an array is assigned to by at most one iteration.
  3. No loop iteration reads array elements modified by any other iteration.
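
A minimal sketch contrasting a loop that satisfies all three rules with the dependent loop above (array names follow the slides' own examples):

/* Safe to parallelize: each iteration writes only a[i] and reads only
   b[i] and c[i], which no other iteration modifies.                   */
#pragma omp parallel for
for(i=0; i<n; i++)
   a[i] = b[i] + c[i];

/* NOT safe: iteration i reads a[i-1], which iteration i-1 writes. */
for(i=1; i<5; i++)
   a[i] = i + a[i-1];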
Avoiding dependencies by using Private Variables (Pitfall #1.5)
• Consider the following loop:
#pragma omp parallel for
for(i=0; i<n; i++){
   temp = 2.0*a[i];
   a[i] = temp;
   b[i] = c[i]/temp;
}
• By default, all threads share a common address space. Therefore, all threads will be modifying temp simultaneously
Avoiding dependencies by using Private Variables (Pitfall #1.5)
• The solution is to make temp a thread-private variable by using the “private” clause:
#pragma omp parallel for private(temp)
for(i=0; i<n; i++){
   temp = 2.0*a[i];
   a[i] = temp;
   b[i] = c[i]/temp;
}
Avoiding dependencies by using Private Variables (Pitfall #1.5)
• Default OpenMP behavior is for variables to be shared. However, sometimes you may wish to make the default private and explicitly declare your shared variables (but only in Fortran!):
!$omp parallel do default(private) shared(n,a,b,c)
do i=1,n
   temp = 2.0*a(i)
   a(i) = temp
   b(i) = c(i)/temp
enddo
!$omp end parallel do
Private variables
• Note that the loop iteration variable (e.g. i in the previous example) is private by default
• Caution: The value of any variable specified as private is undefined both upon entering and leaving the construct in which it is specified
• Use firstprivate and lastprivate clauses to retain values of variables declared as private
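
A minimal sketch of the firstprivate and lastprivate clauses (the variable names scale and last are illustrative; arrays a and b and the bound n are assumed to be declared as in the slides' examples):

int scale = 2;      /* initialized before the parallel region           */
int last;           /* will receive the value from the last iteration   */

#pragma omp parallel for firstprivate(scale) lastprivate(last)
for(i=0; i<n; i++){
   a[i] = scale * b[i];   /* firstprivate: each thread's copy of scale starts at 2  */
   last = a[i];           /* lastprivate: after the loop, last holds the value
                             written by the sequentially final iteration (i = n-1)  */
}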
Use of function calls within parallel loops
• In general, the compiler will not parallelize a loop that involves a function call unless it can guarantee that there are no dependencies between iterations.
  – sin(x) is OK, for example, if x is private.
• A good strategy is to inline function calls within loops. If the compiler can inline the function, it can usually verify lack of dependencies.
• System calls do not parallelize!!!
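
For instance, a call such as sin(x) is safe when x is private and the function has no side effects (a minimal sketch; dx, a, and n are assumed to be defined elsewhere):

#include <math.h>

#pragma omp parallel for private(x)   /* x is private; sin() has no side effects */
for(i=0; i<n; i++){
   x = i * dx;
   a[i] = sin(x);
}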


Pitfall #2: Updating shared variables simultaneously
• Consider the following serial code:
the_max = 0;
for(i=0; i<n; i++)
   the_max = max(myfunc(a[i]), the_max);
• This loop can be executed in any order; however, the_max is modified every loop iteration.
• Use the “critical” directive to specify code segments that can only be executed by one thread at a time:
#pragma omp parallel for private(temp)
for(i=0; i<n; i++){
   temp = myfunc(a[i]);
   #pragma omp critical
   the_max = max(temp, the_max);
}
Reduction operations
• Now consider a global sum:
for(i=0; i<n; i++)
   sum = sum + a[i];
• This can be done by defining “critical” sections, but for convenience, OpenMP also provides a reduction clause:
#pragma omp parallel for reduction(+:sum)
for(i=0; i<n; i++)
   sum = sum + a[i];
Reduction operations
• C/C++ reduction-able operators (and initial values):
  – +  (0)
  – -  (0)
  – *  (1)
  – &  (~0)
  – |  (0)
  – ^  (0)
  – && (1)
  – || (0)
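
As an illustration (not from the slides), a reduction using the && operator, whose private copies start at 1:

int all_positive = 1;   /* initial value for a && reduction is 1 (true) */

#pragma omp parallel for reduction(&&:all_positive)
for(i=0; i<n; i++)
   all_positive = all_positive && (a[i] > 0);
/* after the loop, all_positive is 1 only if every a[i] is positive */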
Pitfall #3: Parallel overhead
• Spawning and releasing threads results in significant overhead.
• Therefore, you want to make your parallel regions as large as possible
  – Parallelize over the largest loop that you can (even though it will involve more work to declare all of the private variables and eliminate dependencies)
  – Coarse granularity is your friend!
Separating “Parallel” and “For” directives to reduce overhead
• In the following example, threads are spawned only once, not once per loop:

C/C++:
#pragma omp parallel
{
   #pragma omp for
   for(i=0; i<maxi; i++)
      a[i] = b[i];

   #pragma omp for
   for(j=0; j<maxj; j++)
      c[j] = d[j];
}

Fortran:
!$omp parallel
!$omp do
do i=1,maxi
   a(i) = b(i)
enddo
!$omp end do !(optional)

!$omp do
do j=1,maxj
   c(j) = d(j)
enddo
!$omp end do !(optional)
!$omp end parallel !(required)
Use “nowait” to avoid barriers
• At the end of every loop is an implied barrier.
• Use “nowait” to remove the barrier at the end of the first loop:
#pragma omp parallel
{
   #pragma omp for nowait   /* barrier at the end of this loop removed by "nowait" */
   for(i=0; i<maxi; i++)
      a[i] = b[i];

   #pragma omp for
   for(j=0; j<maxj; j++)
      c[j] = d[j];
}
Use “nowait” to avoid barriers
In Fortran, “nowait” goes at the end of the loop:
!$omp parallel
!$omp do
do i=1,maxi
   a(i) = b(i)
enddo
!$omp end do nowait   ! barrier removed by the "nowait" clause
!$omp do
do j=1,maxj
   c(j) = d(j)
enddo
!$omp end do
!$omp end parallel
Other useful directives to avoid releasing and spawning threads
• #pragma omp master
  !$omp master ... !$omp end master
  – Denotes code within a parallel region that is executed only by the master thread
• #pragma omp single
  – Denotes code that will be performed by only one thread
  – Useful for overlapping serial segments with parallel computation.
• #pragma omp barrier
  – Sets a global barrier within a parallel region
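
A minimal sketch combining these directives inside one parallel region (the printf messages are illustrative only):

#include <stdio.h>
#include <omp.h>

int main(void)
{
   #pragma omp parallel
   {
      #pragma omp master     /* only the master thread runs this; no implied barrier */
      printf("master is thread %d\n", omp_get_thread_num());

      #pragma omp barrier    /* make every thread wait for the master block above */

      #pragma omp single     /* exactly one (arbitrary) thread runs this; implied barrier at the end */
      printf("single block ran on thread %d\n", omp_get_thread_num());
   }
   return 0;
}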
Thread stack
• Each thread has its own memory region called the thread stack
• This can grow to be quite large, so the default size may not be enough
• This can be increased (e.g. to 16 MB):
csh:
limit stacksize 16000; setenv KMP_STACKSIZE 16000000
bash:
ulimit -s 16000; export KMP_STACKSIZE=16000000
Useful OpenMP Functions
• void omp_set_num_threads(int num_threads)
  – Sets the number of OpenMP threads (overrides OMP_NUM_THREADS)
• int omp_get_thread_num()
  – Returns the number of the current thread
• int omp_get_num_threads()
  – Returns the total number of threads currently participating in a parallel region
  – Returns “1” if executed in a serial region
• For portability, surround these functions with #ifdef _OPENMP
• #include <omp.h>
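
A minimal sketch of the #ifdef _OPENMP portability pattern (the serial fallback values of 0 and 1 are a common convention, not mandated by the slides):

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>             /* only included when compiling with OpenMP */
#endif

int main(void)
{
   #pragma omp parallel      /* ignored by compilers without OpenMP support */
   {
      int tid = 0, nthreads = 1;   /* serial fallback values */
#ifdef _OPENMP
      tid = omp_get_thread_num();
      nthreads = omp_get_num_threads();
#endif
      printf("Thread %d of %d\n", tid, nthreads);
   }
   return 0;
}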
Optimization: Scheduling
• OpenMP partitions the workload into “chunks” for distribution among threads
• Default strategy is static:
  – Loop iterations 0-1: Chunk 0 → Thread 0
  – Loop iterations 2-3: Chunk 1 → Thread 1
  – Loop iterations 4-5: Chunk 2 → Thread 2
  – Loop iterations 6-7: Chunk 3 → Thread 3
Optimization: Scheduling
• This strategy has the least amount of overhead
• However, if not all iterations take the same amount of time, this simple strategy will lead to load imbalance.
(Figure: the same static chunk assignment as above, shown with iterations of unequal cost.)
Optimization: Scheduling
• OpenMP offers a variety of scheduling strategies:
  – schedule(static,[chunksize])
    • Divides workload into equal-sized chunks
    • Default chunksize is Nwork/Nthreads
      – Setting chunksize to less than this will result in chunks being assigned in an interleaved manner
    • Lowest overhead
    • Least optimal workload distribution
Optimization: Scheduling
  – schedule(dynamic,[chunksize])
    • Chunks are dynamically assigned to threads
    • Default chunksize is 1
    • Highest overhead
    • Optimal workload distribution
  – schedule(guided,[chunksize])
    • Starts with big chunks proportional to (number of unassigned iterations)/(number of threads), then makes them progressively smaller until chunksize is reached
    • Attempts to strike a balance between overhead and workload optimization
Optimization: Scheduling
  – schedule(runtime)
    • Scheduling can be selected at runtime using OMP_SCHEDULE
    • e.g. setenv OMP_SCHEDULE “guided, 100”
  – In practice, often use:
    • Default scheduling (static, large chunks)
    • Guided with default chunksize
  – Experiment with your code to determine the optimal strategy
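
A minimal sketch of selecting the schedule at run time (do_work() is a placeholder for a routine whose cost varies from iteration to iteration; a and n are assumed to be declared as in the earlier examples):

#pragma omp parallel for schedule(runtime)
for(i=0; i<n; i++)
   a[i] = do_work(i);

/* then pick the strategy without recompiling, e.g.:
   csh:   setenv OMP_SCHEDULE "guided, 100"
   bash:  export OMP_SCHEDULE="dynamic, 10"       */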
What we have learned
• How to compile and run OpenMP programs
• Private vs. shared variables
• Critical sections and reductions for updating scalar shared variables
• Techniques for minimizing thread spawning/exiting overhead
• Different scheduling strategies
Summary
• OpenMP is often the easiest way to achieve moderate parallelism on shared memory machines
• In practice, to achieve decent scaling, you will probably need to invest some effort in tuning your application.
• More information available at:
  – https://computing.llnl.gov/tutorials/openMP/
  – http://www.openmp.org
  – Using OpenMP, MIT Press, 2008
Hands-On
If you’ve finished parallelizing the Laplace code (or you want a break from MPI):

Go to www.psc.edu/~blood and click on OpenMPHands-On_PSC.pdf for introductory exercises and examples.
