OpenMP
with OpenMP
Doug Sondak
Boston University
Scientific Computing and Visualization
Office of Information Technology
[email protected]
Outline
• Introduction
• Basics
• Data Dependencies
• A Few More Basics
• Caveats & Compilation
• Coarse-Grained Parallelization
• Thread Control Directives
• Some Additional Functions
• Some Additional Clauses
• Nested Parallelism
• Locks
Introduction
• Types of parallel machines
– distributed memory
• each processor has its own memory address space
• variable values are independent
x = 2 on one processor, x = 3 on a different processor
• examples: linux clusters, Blue Gene/L
– shared memory
• also called Symmetric Multiprocessing (SMP)
• single address space for all processors
– If one processor sets x = 2 , x will also equal 2 on other
processors (unless specified otherwise)
• examples: IBM p-series, SGI Altix
Shared vs. Distributed Memory
[diagram: distributed-memory layout (separate memory per processor) vs. shared-memory layout (single memory for all processors)]
Shared vs. Distributed Memory (cont’d)
• Multiple processes
– Each processor (typically) performs an
independent task
• Multiple threads
– A process spawns additional tasks (threads)
with the same memory address space
What is OpenMP?
[diagram: program flow alternating between serial regions and loops that can be parallelized]
ifirst = 10;
for(i = 1; i <= imax; i++){
   i2 = 2*i;
   j[i] = ifirst + i2;
}
Shared vs. Private (3)
ifirst = 10;
#pragma omp parallel for private(i2)
for(i = 1; i <= imax; i++){
   i2 = 2*i;
   j[i] = ifirst + i2;
}
Shared vs. Private (5)
[diagram: memory before and after the additional thread is spawned - ifirst = 10 is shared, while each thread's private i2 (thread 0, thread 1) is initially undefined (???)]
Shared vs. Private (6)
[diagram: Thread 0 and Thread 1 both see the shared ifirst = 10; each thread's private copy of i2 is initially undefined (???)]
Data Dependencies
a[1] = 0;
a[2] = 1;
for(i = 3; i <= 100; i++){
   a[i] = a[i-1] + a[i-2];
}
Data Dependencies (3)
• parallelize on 2 threads
– thread 0 gets i = 3 to 51
– thread 1 gets i = 52 to 100
– look carefully at the calculation for i = 52 on
thread 1
• what will the values of a[i-1] and a[i-2] be?
Data Dependencies (4)
ifirst = 10;
#pragma omp parallel for \
   default(none) \
   shared(ifirst,imax,j) \
   private(i2)
for(i = 0; i < imax; i++){
   i2 = 2*i;
   j[i] = ifirst + i2;
}
Firstprivate
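A minimal C sketch of the firstprivate clause (ioff and the loop bounds are illustrative): each thread receives a private copy of ioff, initialized to the value it held before the loop.

ioff = 5;
#pragma omp parallel for firstprivate(ioff)
for(i = 0; i < imax; i++){
   j[i] = ioff + i;   /* every thread starts with its own ioff = 5 */
}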
Reduction
sum1 = 0.0;
for(i = 0; i < imax; i++){
   sum1 = sum1 + a[i];
}
Reduction (cont’d)
• Solution? – Reduction clause
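A minimal C sketch of the reduction clause applied to the sum loop above (a sketch, not necessarily the original slide's code):

sum1 = 0.0;
#pragma omp parallel for reduction(+:sum1)
for(i = 0; i < imax; i++){
   sum1 = sum1 + a[i];   /* each thread accumulates a private sum1;
                            partial sums are combined at the end    */
}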
Sections Example - Fortran
• the combined PARALLEL SECTIONS directive is the same as
!$OMP PARALLEL
!$OMP SECTIONS
!$OMP SECTION
call init_field(a)
!$OMP SECTION
call check_grid(x)
!$OMP END SECTIONS
!$OMP END PARALLEL
Sections Example - C
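A C sketch of the equivalent construct (assuming the same init_field and check_grid routines as the Fortran version above):

#pragma omp parallel
{
   #pragma omp sections
   {
      #pragma omp section
      init_field(a);     /* executed by one thread     */
      #pragma omp section
      check_grid(x);     /* executed by another thread */
   }
}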
Task Example - Fortran
!$OMP PARALLEL
!$OMP SINGLE
p => listhead
do while(associated(p))
!$OMP TASK
   call process(p)
!$OMP END TASK
   p => next(p)
enddo
!$OMP END SINGLE
!$OMP END PARALLEL
Task Example - C
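A C sketch of the same list traversal (listhead, process, and next are assumed to be defined as in the Fortran version):

#pragma omp parallel
{
   #pragma omp single
   {
      p = listhead;
      while(p){
         #pragma omp task firstprivate(p)   /* each task keeps its own copy of p */
         process(p);
         p = next(p);
      }
   }
}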
Atomic Example - Fortran
do i = 1, 10
!$OMP ATOMIC
   x(j(i)) = x(j(i)) + 1.0
enddo
Atomic Example - C
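A C sketch of the same update (x and j are assumed to be declared as in the Fortran version, and the loop is assumed to run inside a parallel region):

for(i = 0; i < 10; i++){
   #pragma omp atomic
   x[j[i]] += 1.0;    /* atomic protects each update of x[j[i]] */
}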
Dynamic Thread Assignment
int mydyn = 0;
omp_set_dynamic(mydyn);
Dynamic Thread Assignment (cont’d)
• Dynamic threading can also be turned on or off
by setting the environment variable
OMP_DYNAMIC to TRUE or FALSE
– call to omp_set_dynamic overrides the environment
variable
• The function omp_get_dynamic() returns a
value of “true” or “false,” indicating whether
dynamic threading is turned on or off.
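A minimal C sketch combining these calls (the printed messages are illustrative):

#include <omp.h>
#include <stdio.h>

int main(){
   omp_set_dynamic(0);          /* turn dynamic threading off */
   if( omp_get_dynamic() )      /* query the current setting  */
      printf("dynamic threading is on\n");
   else
      printf("dynamic threading is off\n");
   return 0;
}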
OMP_GET_MAX_THREADS
• integer function
• returns the maximum number of threads
available to a parallel region
• same result as omp_get_num_threads if
dynamic threading is turned off
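A minimal C sketch of querying this limit (the printed message is illustrative):

#include <omp.h>
#include <stdio.h>

int main(){
   /* maximum number of threads a parallel region may use */
   int nmax = omp_get_max_threads();
   printf("up to %d threads available\n", nmax);
   return 0;
}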
OMP_GET_NUM_PROCS
• integer function
• returns maximum number of processors in the
system
– indicates amount of hardware, not number of
available processors
• could be used to make sure enough
processors are available for specified number
of sections
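A minimal C sketch of the check suggested above (nsections is an assumed variable):

#include <omp.h>
#include <stdio.h>

int main(){
   int nsections = 4;                  /* assumed number of sections */
   int nprocs = omp_get_num_procs();   /* processors in the system   */
   if( nprocs < nsections )
      printf("only %d processors for %d sections\n", nprocs, nsections);
   return 0;
}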
OMP_IN_PARALLEL
• logical function
• tells whether or not it was called from within a
parallel region
• useful debugging device when using
“orphaned” directives
– example of orphaned directive: omp parallel in
“main,” omp do or omp for in routine called from main
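A minimal C sketch of this debugging use (mysub, the array a, and the warning message are illustrative):

#include <omp.h>
#include <stdio.h>

/* routine containing an "orphaned" work-sharing directive */
void mysub(double *a, int n){
   if( !omp_in_parallel() )
      printf("warning: mysub called outside a parallel region\n");
   #pragma omp for
   for(int i = 0; i < n; i++){
      a[i] = 0.0;
   }
}

int main(){
   double a[100];
   #pragma omp parallel
   mysub(a, 100);      /* omp for in mysub binds to this parallel region */
   return 0;
}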
Some Additional Clauses
Collapse
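A minimal C sketch of the collapse clause (b, n, m, and k are illustrative): the two nested loops are merged into a single iteration space that is divided among the threads.

#pragma omp parallel for collapse(2)
for(i = 0; i < n; i++){
   for(k = 0; k < m; k++){
      b[i][k] = i + k;   /* all n*m (i,k) pairs are shared out to the threads */
   }
}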
Schedule
thread   indices   # of indices
0        1-13      13
1        14-26     13
2        27-39     13
3        40-51     12
Schedule (3)
• number of indices doled out at a time to each
thread is called the chunk size
• can be modified with the SCHEDULE clause
!$omp do schedule(static,5)
!$omp do schedule(dynamic,5)
!$omp do schedule(runtime)
Lock Routines
integer(omp_lock_kind) :: mylock
omp_lock_t mylock;
Lock Routines (cont’d)
• omp_init_lock(mylock)
– initializes mylock
– must be called before mylock is used
– subroutine in Fortran
• omp_set_lock(mylock)
– gives ownership of mylock to calling thread
– other threads cannot execute code following call
until mylock is released
– subroutine in Fortran
Lock Routines (3)
• omp_test_lock(mylock)
– logical function
– if mylock is presently owned by another
thread, returns false
– if mylock is available, returns true and sets
lock, i.e., gives ownership to calling
thread
– be careful: omp_test_lock(mylock) does
more than just test the lock
Lock Routines (4)
• omp_unset_lock(mylock)
– releases ownership of mylock
– subroutine in Fortran
• omp_destroy_lock(mylock)
– call when you’re done with the lock
– complement to omp_init_lock(mylock)
– subroutine in Fortran
Lock Example
• an independent task takes a significant
amount of time, and it must be performed
serially
• another task, independent of the first,
may be performed in parallel
• improve parallel efficiency by doing both
tasks at the same time
• one thread acquires the lock and performs the
serial task
• other threads work on the parallel task until the
serial task is complete
Lock Example - Fortran
call omp_init_lock(mylock)
!$omp parallel
if(omp_test_lock(mylock)) then
   call long_serial_task
   call omp_unset_lock(mylock)
else
   do while(.not. omp_test_lock(mylock))
      call short_parallel_task
   enddo
   call omp_unset_lock(mylock)
endif
!$omp end parallel
call omp_destroy_lock(mylock)
Lock Example - C
omp_init_lock(&mylock);
#pragma omp parallel
{
   if(omp_test_lock(&mylock)){
      long_serial_task();
      omp_unset_lock(&mylock);
   }else{
      while(!omp_test_lock(&mylock)){
         short_parallel_task();
      }
      omp_unset_lock(&mylock);
   }
}
omp_destroy_lock(&mylock);
Survey
• Please fill out evaluation survey for this
tutorial at
http://scv.bu.edu/survey/tutorial_evaluation.html