Lecture: OpenMP

OpenMP is a specification for multi-platform shared memory parallel programming. It uses compiler directives, library routines, and environment variables to define parallel regions of code. OpenMP allows a programmer to separate serial and parallel regions of a program without managing threads directly. It provides constructs for parallelizing loops and synchronizing threads that hide complexity and allow portable parallel programming across platforms.

Parallel Programming with OpenMP

CS240A, T. Yang, 2013


Modified from Demmel/Yelick’s and Mary Hall’s Slides

1
Introduction to OpenMP
• What is OpenMP?
  • Open specification for Multi-Processing
  • “Standard” API for defining multi-threaded shared-memory programs
  • openmp.org – Talks, examples, forums, etc.

• High-level API
  • Preprocessor (compiler) directives ( ~ 80% )
  • Library Calls ( ~ 19% )
  • Environment Variables ( ~ 1% )

2
A Programmer’s View of OpenMP
• OpenMP is a portable, threaded, shared-memory programming specification with “light” syntax
  • Exact behavior depends on the OpenMP implementation!
  • Requires compiler support (C or Fortran)

• OpenMP will:
  • Allow a programmer to separate a program into serial regions and parallel regions, rather than reason in terms of T concurrently-executing threads
  • Hide stack management
  • Provide synchronization constructs

• OpenMP will not:
  • Parallelize automatically
  • Guarantee speedup
  • Provide freedom from data races
3
Motivation – OpenMP

#include <stdio.h>

int main() {
    // Do this part in parallel
    printf( "Hello, World!\n" );
    return 0;
}

4
Motivation – OpenMP

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(4);

    // Do this part in parallel
    #pragma omp parallel
    {
        printf( "Hello, World!\n" );
    }

    return 0;
}
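To build and run this example (a minimal sketch; the source file name is an assumption, and -fopenmp is the flag understood by GCC and Clang):

    gcc -fopenmp hello.c -o hello
    ./hello      # prints "Hello, World!" once per thread, i.e. four times here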

5
OpenMP parallel region construct
• Block of code to be executed by multiple threads in parallel
• Each thread executes the same code redundantly (SPMD)
  • Work within work-sharing constructs is distributed among the threads in a team

• Example with C/C++ syntax:

    #pragma omp parallel [ clause [ clause ] ... ] new-line
        structured-block

• clause can include the following:
    private (list)
    shared (list)
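A minimal sketch of a parallel region using both clauses (the variable names and values are illustrative assumptions):

    #include <stdio.h>
    #include <omp.h>

    int bigdata = 42;                        /* file-scope: shared by default */

    int main() {
        int tid;                             /* made private per thread below */

        #pragma omp parallel shared(bigdata) private(tid)
        {
            tid = omp_get_thread_num();      /* each thread gets its own tid */
            printf("thread %d of %d sees bigdata = %d\n",
                   tid, omp_get_num_threads(), bigdata);
        }
        return 0;
    }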
OpenMP Data Parallel Construct: Parallel Loop
• All pragmas begin: #pragma
• Compiler calculates loop bounds for each thread directly from serial source (computation decomposition)
• Compiler also manages data partitioning
• Synchronization also automatic (barrier)
Programming Model – Parallel Loops
• Requirement for parallel loops
  • No data dependencies (read/write or write/write pairs) between iterations!

• Preprocessor calculates loop bounds and divides iterations among parallel threads

    #pragma omp parallel for
    for( i=0; i < 25; i++ )
    {
        printf( "Foo" );
    }
8
OpenMP: Parallel Loops with Reductions
• OpenMP supports the reduction operation:

    sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (i=0; i < 100; i++) {
        sum += array[i];
    }

• Reduction ops and init() values (C and C++):

    +    0        bitwise &    ~0        logical &&    1
    -    0        bitwise |     0        logical ||    0
    *    1        bitwise ^     0
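A small sketch exercising two more of the operators above in a single loop, checking that all elements are positive while computing their product (the array contents are illustrative assumptions):

    #include <stdio.h>

    int main() {
        double vals[4] = { 1.5, 2.0, 0.5, 4.0 };    /* illustrative data */
        double prod = 1.0;                          /* init value for *  */
        int all_positive = 1;                       /* init value for && */

        #pragma omp parallel for reduction(*:prod) reduction(&&:all_positive)
        for (int i = 0; i < 4; i++) {
            prod = prod * vals[i];
            all_positive = all_positive && (vals[i] > 0.0);
        }

        printf("product = %g, all positive = %d\n", prod, all_positive);
        return 0;
    }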
Example: Trapezoid Rule for Integration

• Straight-line approximation:

    \int_a^b f(x)\,dx \approx \sum_{i=0}^{1} c_i f(x_i) = c_0 f(x_0) + c_1 f(x_1) = \frac{h}{2}\left[ f(x_0) + f(x_1) \right]

[Figure: f(x) and its straight-line approximation L(x) on the interval [x_0, x_1]]
Composite Trapezoid Rule
    \int_a^b f(x)\,dx = \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx

      \approx \frac{h}{2}\left[ f(x_0) + f(x_1) \right] + \frac{h}{2}\left[ f(x_1) + f(x_2) \right] + \cdots + \frac{h}{2}\left[ f(x_{n-1}) + f(x_n) \right]

      = \frac{h}{2}\left[ f(x_0) + 2 f(x_1) + \cdots + 2 f(x_i) + \cdots + 2 f(x_{n-1}) + f(x_n) \right], \qquad h = \frac{b-a}{n}

[Figure: f(x) sampled at equally spaced points x_0, x_1, x_2, x_3, x_4 with spacing h]
Serial algorithm for composite trapezoid rule

[Figure: trapezoids under f(x) at equally spaced points x_0 ... x_4 with spacing h]
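A minimal serial sketch of the composite trapezoid rule, assuming an illustrative integrand f and inputs a, b, n:

    #include <stdio.h>

    double f(double x) { return x * x; }        /* illustrative integrand */

    double trap(double a, double b, int n) {
        double h = (b - a) / n;
        double sum = (f(a) + f(b)) / 2.0;       /* endpoints weighted by 1/2 */
        for (int i = 1; i < n; i++)
            sum += f(a + i * h);                /* interior points weighted by 1 */
        return h * sum;
    }

    int main() {
        printf("approx = %f\n", trap(0.0, 1.0, 1024));
        return 0;
    }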
From Serial Code to Parallel Code
[Figure: f(x) at equally spaced points x_0 ... x_4 with spacing h]
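A minimal OpenMP sketch of the same computation, parallelizing the interior-point loop with a + reduction (names follow the serial sketch above and are assumptions):

    #include <stdio.h>

    double f(double x) { return x * x; }        /* illustrative integrand */

    double trap_omp(double a, double b, int n) {
        double h = (b - a) / n;
        double sum = (f(a) + f(b)) / 2.0;

        /* each thread accumulates a private partial sum; OpenMP combines them */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i < n; i++)
            sum += f(a + i * h);

        return h * sum;
    }

    int main() {
        printf("approx = %f\n", trap_omp(0.0, 1.0, 1024));
        return 0;
    }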
Programming Model – Loop Scheduling
• schedule clause determines how loop iterations are divided among the thread team
  • static([chunk]) divides iterations statically between threads
    • Each thread receives [chunk] iterations, rounding as necessary to account for all iterations
    • Default [chunk] is ceil( # iterations / # threads )
  • dynamic([chunk]) allocates [chunk] iterations per thread, allocating an additional [chunk] iterations when a thread finishes
    • Forms a logical work queue, consisting of all loop iterations
    • Default [chunk] is 1
  • guided([chunk]) allocates dynamically, but [chunk] is exponentially reduced with each allocation
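A short runnable sketch of the three clause forms (the chunk sizes, loop body, and work() helper are illustrative assumptions):

    #include <stdio.h>
    #define N 1000

    static double result[N];
    static void work(int i) { result[i] = i * 0.5; }    /* stand-in for real work */

    int main() {
        /* static: contiguous blocks of roughly N / num_threads iterations per thread */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++) work(i);

        /* dynamic: threads grab 4 iterations at a time from a shared work queue */
        #pragma omp parallel for schedule(dynamic, 4)
        for (int i = 0; i < N; i++) work(i);

        /* guided: like dynamic, but chunk sizes shrink as iterations are handed out */
        #pragma omp parallel for schedule(guided, 4)
        for (int i = 0; i < N; i++) work(i);

        printf("done: result[N-1] = %f\n", result[N - 1]);
        return 0;
    }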

14
Loop scheduling options

[Figure: loop scheduling options (chart not reproduced)]
Impact of Scheduling Decision
• Load balance
  • Same work in each iteration?
  • Processors working at same speed?
• Scheduling overhead
  • Static decisions are cheap because they require no run-time coordination
  • Dynamic decisions have overhead that is impacted by complexity and frequency of decisions
• Data locality
  • Particularly within cache lines for small chunk sizes
  • Also impacts data reuse on same processor
More loop scheduling attributes
• RUNTIME: The scheduling decision is deferred until runtime by the environment variable OMP_SCHEDULE. It is illegal to specify a chunk size for this clause.
• AUTO: The scheduling decision is delegated to the compiler and/or runtime system.
• NOWAIT / nowait: If specified, then threads do not synchronize at the end of the parallel loop.
• ORDERED: Specifies that the iterations of the loop must be executed as they would be in a serial program.
• COLLAPSE: Specifies how many loops in a nested loop should be collapsed into one large iteration space and divided according to the schedule clause (collapsed order corresponds to original sequential order).
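A sketch of how several of these clauses look in code; the loop bounds and arrays are illustrative assumptions, and schedule(runtime) picks up whatever OMP_SCHEDULE specifies:

    #include <stdio.h>
    #define N 64

    int main() {
        static double a[N][N], b[N];

        #pragma omp parallel
        {
            /* collapse(2): treat the i/j nest as one N*N iteration space;
               schedule(runtime): honor OMP_SCHEDULE;
               nowait: skip the implicit barrier at the end of this loop
               (safe here because the next loop touches different data) */
            #pragma omp for collapse(2) schedule(runtime) nowait
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    a[i][j] = i + 0.1 * j;

            #pragma omp for
            for (int i = 0; i < N; i++)
                b[i] = 2.0 * i;
        }

        printf("a[1][2] = %f, b[3] = %f\n", a[1][2], b[3]);
        return 0;
    }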
OpenMP environment variables
OMP_NUM_THREADS
 sets the number of threads to use during execution
 when dynamic adjustment of the number of threads is enabled, the value of this environment variable is the maximum number of threads to use
 For example,
    setenv OMP_NUM_THREADS 16            [csh, tcsh]
    export OMP_NUM_THREADS=16            [sh, ksh, bash]

OMP_SCHEDULE
 applies only to do/for and parallel do/for directives that have the schedule type RUNTIME
 sets schedule type and chunk size for all such loops
 For example,
    setenv OMP_SCHEDULE "GUIDED,4"       [csh, tcsh]
    export OMP_SCHEDULE="GUIDED,4"       [sh, ksh, bash]
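The thread count can also be set from inside the program through the runtime library; a minimal sketch (the chosen count is an arbitrary illustration):

    #include <stdio.h>
    #include <omp.h>

    int main() {
        omp_set_num_threads(8);                 /* takes precedence over OMP_NUM_THREADS */

        #pragma omp parallel
        {
            #pragma omp single
            printf("team size: %d\n", omp_get_num_threads());
        }
        return 0;
    }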
Programming Model – Data Sharing
• Parallel programs often employ two types of data
  • Shared data, visible to all threads, similarly named
  • Private data, visible to a single thread (often stack-allocated)

• PThreads:
  • Global-scoped variables are shared
  • Stack-allocated variables are private

• OpenMP:
  • shared variables are shared
  • private variables are private

    // shared, globals
    int bigdata[1024];

    void* foo(void* bar) {
        // private, stack
        int tid;

        #pragma omp parallel \
            shared ( bigdata ) \
            private ( tid )
        {
            /* Calculation goes here */
        }
    }
19
Programming Model - Synchronization
• OpenMP Synchronization

  • OpenMP Critical Sections
    • Named or unnamed
    • No explicit locks / mutexes

        #pragma omp critical
        {
            /* Critical code here */
        }

  • Barrier directives

        #pragma omp barrier

  • Explicit Lock functions
    • When all else fails – may require flush directive

        omp_set_lock( lock l );
        /* Code goes here */
        omp_unset_lock( lock l );

  • Single-thread regions within parallel regions
    • master, single directives

        #pragma omp single
        {
            /* Only executed once */
        }
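A small runnable sketch of these constructs, using a critical section and an explicit lock to protect two shared counters (the variable names are illustrative assumptions):

    #include <stdio.h>
    #include <omp.h>

    int main() {
        int hits_critical = 0, hits_lock = 0;
        omp_lock_t lock;
        omp_init_lock(&lock);

        #pragma omp parallel
        {
            /* unnamed critical section: one thread at a time */
            #pragma omp critical
            hits_critical++;

            /* explicit lock around the same kind of update */
            omp_set_lock(&lock);
            hits_lock++;
            omp_unset_lock(&lock);

            /* all threads wait here before continuing */
            #pragma omp barrier

            /* executed by exactly one thread */
            #pragma omp single
            printf("threads in team: %d\n", omp_get_num_threads());
        }

        omp_destroy_lock(&lock);
        printf("critical hits = %d, lock hits = %d\n", hits_critical, hits_lock);
        return 0;
    }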

20
Microbenchmark: Grid Relaxation (Stencil)

    for( t=0; t < t_steps; t++) {

        #pragma omp parallel for \
            shared(grid,x_dim,y_dim) private(x,y)
        for( x=0; x < x_dim; x++) {
            for( y=0; y < y_dim; y++) {
                grid[x][y] = /* avg of neighbors */
            }
        }
        // Implicit Barrier Synchronization

        temp_grid = grid;
        grid = other_grid;
        other_grid = temp_grid;
    }

21
Microbenchmark: Ocean

22
Microbenchmark: Ocean

23
OpenMP Summary
• OpenMP is a compiler-based technique to create concurrent code from (mostly) serial code
• OpenMP can enable (easy) parallelization of loop-based code
  • Lightweight syntactic language extensions

• OpenMP performs comparably to manually-coded threading
  • Scalable
  • Portable

• Not a silver bullet for all applications

25
More Information

• openmp.org
• OpenMP official site

• www.llnl.gov/computing/tutorials/openMP/
• A handy OpenMP tutorial

• www.nersc.gov/nusers/help/tutorials/openmp/
• Another OpenMP tutorial and reference

26
