This document provides an overview of using MPI to parallelize the solution of the Laplace partial differential equation across multiple processors. It describes distributing the work, data, and communication required. The serial Jacobi iteration method is presented, and its parallel implementation is demonstrated through domain decomposition, data distribution across processors, and use of MPI functions like Send, Recv, and Reduce to exchange boundary data between processors at each iteration. Sample C and Fortran code templates are provided to get started in implementing the parallel solution.


Parallel Port Example

April 24, 2002


Introduction
The objective of this lecture is to go over a simple problem that illustrates the use of the MPI
library to parallelize the solution of a partial differential equation (PDE).
The Laplace problem is a simple PDE that is found at the core of many applications. More
elaborate problems often have the same communication structure that we will discuss in
this class. Thus, we will use this example to introduce the fundamentals of how
communication patterns appear in more complex PDE problems.
This lecture will demonstrate message-passing techniques, among them how to:
• Distribute work
• Distribute data
• Communicate: since each processor has its own memory, data is not shared, and
communication becomes essential.
• Synchronize



Laplace Equation
The Laplace equation is:

    ∂²T/∂x² + ∂²T/∂y² = 0

We want to find T(x,y) subject to prescribed boundary conditions on the edges of the domain.



Laplace Equation
To find an approximate solution to the equation, define a square mesh (grid) of points at
which the values of T will be computed.



The Point Jacobi Iteration
The method known as “point Jacobi iteration” calculates the value of T(i,j) as
the average of the old values of T at the four neighboring points:

    T_new(i,j) = ( T(i-1,j) + T(i+1,j) + T(i,j-1) + T(i,j+1) ) / 4



The Point Jacobi Iteration
The iteration is repeated until the solution converges.

If we want to solve for T at 1000 x 1000 points, the grid itself needs to be of
dimension 1002 x 1002, since the algorithm to calculate T(i,j) requires
values of T at i-1, i+1, j-1, and j+1.



Serial Code Implementation
In the following, NR = number of rows and NC = number of columns (excluding the boundary
rows and columns).
The serial implementation of the Jacobi iteration is shown in the C and Fortran versions that follow.



Serial Version – C
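The C listing on these slides did not survive the transcript. Below is a minimal sketch of a serial
point Jacobi solver for this problem, assuming a 1000 x 1000 interior grid, a fixed number of
iterations read from standard input, and example boundary values; the names (t, told, NR, NC,
niter) follow the slides, but the original code differs in detail.

    #include <stdio.h>
    #include <math.h>

    #define NR 1000            /* interior rows    (assumed size) */
    #define NC 1000            /* interior columns (assumed size) */

    float t[NR + 2][NC + 2];    /* grid including boundary rows/columns */
    float told[NR + 2][NC + 2]; /* copy of the previous iteration       */

    int main(void)
    {
        int   i, j, iter, niter = 0;
        float dt;               /* maximum change in one iteration */

        /* maximum number of iterations, read once */
        if (scanf("%d", &niter) != 1)
            return 1;

        /* initialize: everything 0, right boundary column 100
           (example values; the actual boundary conditions are on the slides) */
        for (i = 0; i <= NR + 1; i++)
            for (j = 0; j <= NC + 1; j++)
                t[i][j] = (j == NC + 1) ? 100.0f : 0.0f;

        for (iter = 1; iter <= niter; iter++) {
            /* copy T into Told */
            for (i = 0; i <= NR + 1; i++)
                for (j = 0; j <= NC + 1; j++)
                    told[i][j] = t[i][j];

            /* point Jacobi update: average of the four old neighbors */
            dt = 0.0f;
            for (i = 1; i <= NR; i++)
                for (j = 1; j <= NC; j++) {
                    t[i][j] = 0.25f * (told[i - 1][j] + told[i + 1][j] +
                                       told[i][j - 1] + told[i][j + 1]);
                    dt = fmaxf(dt, fabsf(t[i][j] - told[i][j]));
                }
            printf("iteration %d, max change %f\n", iter, dt);
        }
        return 0;
    }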


Serial Version – Fortran

(The Fortran listing of the serial Jacobi code is not reproduced here.)


Parallel Version: Example Using 4 Processors
Recall that in the serial case the grid boundaries were:



Simplest Decomposition for Fortran Code



Simplest Decomposition for Fortran Code
A better distribution, from the point of view of communication optimization, is the following:

The program has a “local” view of the data.
The programmer has to have a “global” view of the data.
Simplest Decomposition for C Code



Simplest Decomposition for C Code
In the parallel case, we will break the grid up across 4 processors.
There is only one set of boundary values, but when we distribute the data, each
processor needs an extra row to hold its neighbor's boundary values.

The program has a “local” view of the data.
The programmer has to have a “global” view of the data.
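To make the layout concrete, here is a hedged sketch of how the local data might be declared on
each processor in C; the names (NRL, NC, NPES, t, told) and the static sizing are assumptions for
illustration, not the original template.

    #define NC   1000            /* global number of interior columns (assumed) */
    #define NPES 4               /* number of processors in this example        */
    #define NRL  (1000 / NPES)   /* local number of interior rows per processor */

    /* Local slice of the grid: NRL interior rows plus two extra rows.
     * Row 0 holds the row received from the neighbor above (or the global top
     * boundary on PE 0); row NRL+1 holds the row from the neighbor below
     * (or the global bottom boundary on the last PE). */
    float t[NRL + 2][NC + 2];
    float told[NRL + 2][NC + 2];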



Include Files
Fortran:
* (always declare all variables)
implicit none
INCLUDE 'mpif.h'

* Initialization and clean up (always check error codes):


call MPI_Init(ierr)
call MPI_Finalize(ierr)

C:
#include "mpi.h"
/* Initialization and clean up (always check error codes): */

stat = MPI_Init(&argc, &argv);


stat = MPI_Finalize();

Note: Check for MPI_SUCCESS

if (ierr .ne. MPI_SUCCESS) then
   ! do error processing
endif



Initialization
Serial version:

Parallel version:
Just for simplicity, we will distribute rows in C and columns in Fortran; this is easier because data
is stored by rows in C (row-major order) and by columns in Fortran (column-major order).
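As an illustration of the bookkeeping involved, here is a hedged C sketch of how a processor can
compute which global rows it owns; the helper name local_row_range and the even-split
assumption (NR divisible by npes) are mine, not the template's.

    /* Compute the range of global rows owned by processor `mype`
     * when NR interior rows are split evenly over `npes` processors.
     * Assumes NR is divisible by npes, as in the 1000-row / 4-PE example. */
    static void local_row_range(int NR, int npes, int mype,
                                int *first_row, int *last_row)
    {
        int NRL = NR / npes;              /* local number of rows        */
        *first_row = mype * NRL + 1;      /* 1-based, excludes boundary  */
        *last_row  = *first_row + NRL - 1;
    }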



Parallel Version: Boundary Conditions
Fortran Version

We need to know our processor number (MYPE) and how many PEs we are using.
Each processor will work on different data depending on MYPE.
Here are the boundary conditions in the serial code, where
NRL = local number of rows, NRL = NR/NPROC



Parallel C Version: Boundary Conditions

We need to know our processor number (MYPE) and how many PEs we are using. Each processor
will work on different data depending on MYPE.
Here are the boundary conditions in the serial code, where
NRL = local number of rows, NRL = NR/NPROC
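The boundary-condition code on this slide is not reproduced in the transcript. The following is a
minimal sketch of how boundary values might be set in the row-distributed C case, reusing the
declarations from the earlier layout sketch; the specific boundary values (0 and 100) are
placeholders, not the actual data.

    /* Apply boundary conditions on the local slice t[0..NRL+1][0..NC+1].
     * Every PE holds the left and right boundary columns of its own rows;
     * only PE 0 holds the global top row, only PE npes-1 the global bottom row. */
    for (int i = 0; i <= NRL + 1; i++) {
        t[i][0]      = 0.0f;            /* left boundary              */
        t[i][NC + 1] = 100.0f;          /* right boundary (assumed)   */
    }
    if (mype == 0)                      /* global top boundary row    */
        for (int j = 0; j <= NC + 1; j++)
            t[0][j] = 0.0f;
    if (mype == npes - 1)               /* global bottom boundary row */
        for (int j = 0; j <= NC + 1; j++)
            t[NRL + 1][j] = 100.0f;     /* assumed value              */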



Processor Information
Fortran:
Number of processors:
call MPI_Comm_size(MPI_COMM_WORLD, npes, ierr)
Processor Number:
call MPI_Comm_rank(MPI_COMM_WORLD, mype, ierr)
C:
Number of processors:
stat = MPI_Comm_size(MPI_COMM_WORLD, &npes);
Processor Number:
stat = MPI_Comm_rank(MPI_COMM_WORLD, &mype);



Maximum Number of Iterations
Only 1 PE has to do I/O (usually PE0).
Then PE0 (or root PE) will broadcast niter to all others. Use the
collective operation MPI_Bcast.
Fortran:

Here, the number of elements is how many values we are passing; in this case
only one: niter.
C:
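The broadcast call itself is missing from the transcript. A hedged C version, assuming niter is an
int and PE 0 is the root, is:

    /* Called by every rank with the same arguments; after the call,
     * all PEs hold the value of niter that PE 0 read. */
    stat = MPI_Bcast(&niter, 1, MPI_INT, 0, MPI_COMM_WORLD);

The Fortran call takes the same arguments plus a trailing ierr status argument.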



Main Loop
for (iter = 1; iter <= NITER; iter++) {
    Do averaging (each PE averages its own rows, e.g. rows 1 to 250 with 4 PEs)
    Copy T into Told
    /* This is where we use MPI communication calls: we need to
       exchange boundary data between processors */
    Send values down
    Send values up
    Receive values from above
    Receive values from below
    Find the max change
    Synchronize
}



Parallel Template: Send data up
Once the new T values have been calculated:
SEND
• All processors except processor 0 send their “first” row (in C) to their neighbor above
(mype - 1), as sketched below.
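A minimal sketch of this send, assuming the row-distributed layout sketched earlier (row 1 of the
local array t is contiguous in memory) and a hypothetical tag UP_TAG like the one on the later
Variations slide; whether the boundary columns are included in the count (NC or NC + 2) depends
on the actual template:

    if (mype != 0) {
        /* send my first interior row to the neighbor above (mype - 1) */
        MPI_Send(&t[1][0], NC + 2, MPI_FLOAT, mype - 1, UP_TAG, MPI_COMM_WORLD);
    }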



Parallel Template: Send data down
SEND
• All processors except the last one send their “last” row to their neighbor below (mype + 1).



Parallel Template: Receive from above
Receive
• All processors except PE0 receive from their neighbor above and unpack it in row 0.



Parallel Template: Receive from below
Receive
• All processors except processor (NPES-1) receive from the neighbor below and unpack it in
the last row, as sketched below.

Example: PE1 receives 2 messages – there is no guarantee of the order in which they will be
received.
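A hedged sketch of the two matching receives, using the same assumed layout and tags as the send
sketch (DOWN_TAG is a hypothetical tag for the downward sends); ghost row 0 is filled from
above and ghost row NRL+1 from below:

    MPI_Status status;

    if (mype != 0) {
        /* row sent down by the PE above lands in ghost row 0 */
        MPI_Recv(&t[0][0], NC + 2, MPI_FLOAT, mype - 1, DOWN_TAG,
                 MPI_COMM_WORLD, &status);
    }
    if (mype != npes - 1) {
        /* row sent up by the PE below lands in ghost row NRL+1 */
        MPI_Recv(&t[NRL + 1][0], NC + 2, MPI_FLOAT, mype + 1, UP_TAG,
                 MPI_COMM_WORLD, &status);
    }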
Parallel Template (C)

(The full C template listing, laplace.t3e.c, spans several slides and is not reproduced here.)
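Since the template listing did not survive the transcript, the sketch below is a compact, hedged
consolidation of the structure described on the preceding slides: local Jacobi averaging, a halo
exchange with MPI_Send/MPI_Recv, and a reduction of the maximum change. All specifics (NRL, NC,
the tags UP_TAG and DOWN_TAG, the example boundary values, the printout) are assumptions; the
real laplace.t3e.c differs in detail.

    #include <stdio.h>
    #include <math.h>
    #include "mpi.h"

    #define NR 1000                 /* global interior rows    (assumed) */
    #define NC 1000                 /* global interior columns (assumed) */
    #define UP_TAG   10             /* hypothetical message tags */
    #define DOWN_TAG 20

    int main(int argc, char *argv[])
    {
        int i, j, iter, niter = 0, mype, npes, NRL;
        float dt, dtg;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &mype);
        MPI_Comm_size(MPI_COMM_WORLD, &npes);
        NRL = NR / npes;                         /* assumes an even split of rows */

        float t[NRL + 2][NC + 2], told[NRL + 2][NC + 2];

        /* initialize: everything 0, right boundary column 100 (example values) */
        for (i = 0; i <= NRL + 1; i++)
            for (j = 0; j <= NC + 1; j++)
                t[i][j] = (j == NC + 1) ? 100.0f : 0.0f;

        /* PE0 reads the number of iterations and broadcasts it */
        if (mype == 0) scanf("%d", &niter);
        MPI_Bcast(&niter, 1, MPI_INT, 0, MPI_COMM_WORLD);

        for (iter = 1; iter <= niter; iter++) {
            for (i = 0; i <= NRL + 1; i++)       /* copy T into Told */
                for (j = 0; j <= NC + 1; j++)
                    told[i][j] = t[i][j];

            dt = 0.0f;                           /* point Jacobi averaging */
            for (i = 1; i <= NRL; i++)
                for (j = 1; j <= NC; j++) {
                    t[i][j] = 0.25f * (told[i-1][j] + told[i+1][j] +
                                       told[i][j-1] + told[i][j+1]);
                    dt = fmaxf(dt, fabsf(t[i][j] - told[i][j]));
                }

            /* halo exchange: send last row down, first row up, then receive */
            if (mype != npes - 1)
                MPI_Send(&t[NRL][0], NC + 2, MPI_FLOAT, mype + 1, DOWN_TAG, MPI_COMM_WORLD);
            if (mype != 0)
                MPI_Send(&t[1][0], NC + 2, MPI_FLOAT, mype - 1, UP_TAG, MPI_COMM_WORLD);
            if (mype != 0)                       /* from above into ghost row 0 */
                MPI_Recv(&t[0][0], NC + 2, MPI_FLOAT, mype - 1, DOWN_TAG,
                         MPI_COMM_WORLD, &status);
            if (mype != npes - 1)                /* from below into ghost row NRL+1 */
                MPI_Recv(&t[NRL + 1][0], NC + 2, MPI_FLOAT, mype + 1, UP_TAG,
                         MPI_COMM_WORLD, &status);

            /* global maximum change, gathered on PE0 */
            MPI_Reduce(&dt, &dtg, 1, MPI_FLOAT, MPI_MAX, 0, MPI_COMM_WORLD);
            if (mype == 0) printf("iteration %d, max change %f\n", iter, dtg);
        }

        MPI_Finalize();
        return 0;
    }

Note that the back-to-back blocking sends rely on MPI buffering these relatively small messages;
the MPI_Sendrecv variation shown later removes that assumption.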


Parallel Template (Fortran)

(The full Fortran template listing, laplace.t3e.f, spans several slides and is not reproduced here.)


Variations

if ( mype != 0 ) {
    up = mype - 1;
    MPI_Send( t, NC, MPI_FLOAT, up, UP_TAG, comm );
}

Alternatively:

up = mype - 1;
if ( mype == 0 ) up = MPI_PROC_NULL;
MPI_Send( t, NC, MPI_FLOAT, up, UP_TAG, comm );



Variations

if( mype.ne.0 ) then
   left = mype - 1
   call MPI_Send( t, NC, MPI_REAL, left, L_TAG, comm, ierr )
endif

Alternatively:

left = mype - 1
if( mype.eq.0 ) left = MPI_PROC_NULL
call MPI_Send( t, NC, MPI_REAL, left, L_TAG, comm, ierr )

Note: You may also MPI_Recv from MPI_PROC_NULL


Variations
Send and receive at the same time:
MPI_Sendrecv( … )
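MPI_Sendrecv combines a matching send and receive into one call, which also removes the reliance
on message buffering. A hedged C sketch for the upward exchange in the row-distributed example
(row 1 is sent up while ghost row NRL+1 is filled from below; the tag and count conventions are
assumptions, and MPI_PROC_NULL turns the edge cases into no-ops):

    MPI_Status status;
    int up   = (mype == 0)        ? MPI_PROC_NULL : mype - 1;
    int down = (mype == npes - 1) ? MPI_PROC_NULL : mype + 1;

    MPI_Sendrecv(&t[1][0],       NC + 2, MPI_FLOAT, up,   UP_TAG,
                 &t[NRL + 1][0], NC + 2, MPI_FLOAT, down, UP_TAG,
                 MPI_COMM_WORLD, &status);

A second, symmetric MPI_Sendrecv handles the downward exchange.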



Finding Maximum Change

Each PE can find its own maximum change dt.

To find the global change dtg in C:

    MPI_Reduce(&dt, &dtg, 1, MPI_FLOAT, MPI_MAX, PE0, comm);

To find the global change dtg in Fortran:

    call MPI_Reduce(dt, dtg, 1, MPI_REAL, MPI_MAX, PE0, comm, ierr)
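Only PE0 holds dtg after the reduce. If every PE needs the result, for example to decide together
whether to stop iterating, one option (an extension not shown on the slides, offered here as an
assumption about how the template might be used) is to follow the reduce with a broadcast of dtg,
or to replace it with MPI_Allreduce:

    /* Give every PE the global maximum change so all ranks can test convergence. */
    MPI_Allreduce(&dt, &dtg, 1, MPI_FLOAT, MPI_MAX, MPI_COMM_WORLD);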



Domain Decomposition



Data Distribution I
Domain Decomposition I

• All processors have the entire T array.
• Each processor works on its own part, TW, of T.
• After every iteration, all processors broadcast their TW to all other processors.
• Increased memory requirements.
• Increased number of operations.
Data Distribution II
Domain Decomposition II

• Each processor has a sub-grid.
• Communicate boundary values only.
• Reduced memory.
• Reduced communication.
• Have to keep track of neighbors in two directions.
Exercise
1. Copy the following parallel templates into your /tmp directory on jaromir:
/tmp/training/laplace/laplace.t3e.c
/tmp/training/laplace/laplace.t3e.f

2. These are template files; your job is to go into the sections marked "<<<<<<" in the source code
and add the necessary statements so that the code will run on 4 PEs.

Useful Web reference for this exercise:


To view a list of all MPI calls, with syntax and descriptions, access the Message Passing
Interface Standard at:
http://www-unix.mcs.anl.gov/mpi/www/
3. To compile the program, after you have modified it, rename the new programs laplace_mpi_c.c
and laplace_mpi_f.f and execute:
cc -o laplace_mpi_c -lmpi laplace_mpi_c.c
f90 -o laplace_mpi_f -lmpi laplace_mpi_f.f



Exercise
4. To run:
echo 200 | mpprun -n 4 ./laplace_mpi_c
echo 200 | mpprun -n 4 ./laplace_mpi_f

5. You can check your program against the solutions laplace_mpi_c.c and laplace_mpi_f.f.



Source Codes
The following are the C and Fortran templates that you need to parallelize for the Exercise.
laplace.t3e.c



Source Codes
laplace.t3e.f

