
A Proposal for Copyright to

LABORATORY MANUAL - PARALLEL PROGRAMMING

Submitted to

COPYRIGHT OFFICE
DEPARTMENT FOR PROMOTION OF INDUSTRY AND
INTERNAL TRADE
MINISTRY OF COMMERCE AND INDUSTRY
NEW DELHI-110078

Submitted by

Dr. S. Devi Mahalakshmi

Department of Computer Science and Engineering

MEPCO SCHLENK ENGINEERING COLLEGE (AUTONOMOUS)


SIVAKASI
MEPCO NAGAR, SIVAKASI – 626 005
(Via) VIRUDHUNAGAR
Email: [email protected] Web: http://www.mepcoeng.ac.in

June 2020
TABLE OF CONTENTS

1. Vision, Mission, PEO and PO
2. Course Syllabus
3. Course Outcomes - Programme Outcomes Mapping
4. Plan of Exercises
5. Instructions for Compilation and Execution of the Program
6. System Calls for Inter-process Communication
7. MPI Programming for Process Communication
8. Pthread Programming for Multithreaded Programming
9. OpenMP Programming for Concurrency


MEPCO SCHLENK ENGINEERING COLLEGE (AUTONOMOUS), SIVAKASI

Vision:

Envisioning a world led by our engineers, holding a beacon of hope and confidence for generations to come.

Mission:

To Produce Competent, Disciplined and Quality Engineers & Administrators through Service Par
Excellence.

Department of Computer Science and Engineering

Vision:

To become the centre of excellence in computer education and research and to create the platform
for industrial consultancy
Mission:

• To produce globally competent and quality computer professionals by educating them in computer concepts and techniques
• To facilitate the students to work with recent tools and technologies
• To mould the students by inculcating ethical values that contribute to societal ethics

Programme Educational Objectives (PEOs) – M.E Computer Science and Engineering

After 3 to 5 years of completing the M.E. Computer Science and Engineering programme, the post graduates will become:

PEO 1: Competent Computer/Software Engineer rendering expertise to industrial and societal needs in an effective manner
PEO 2: Sustained learner bringing out novel ideas by addressing research issues
PEO 3: Trainer/Philosopher guiding others towards the development of technology

Programme Outcomes (POs) – M.E Computer Science and Engineering
During the M.E. Computer Science and Engineering programme, the learners will acquire the ability to:

1. Apply knowledge of mathematics, science and information science in computer engineering at an advanced level
2. Design a computer system with components and processes of desired needs within realistic constraints such as economic, environmental, social, political, ethical, health and safety
3. Identify and modify the functions of the internals of computer components such as operating systems and compilers
4. Apply software engineering principles, techniques and tools in software development
5. Create, collect, process, view, organize, store, mine and retrieve data in both local and remote locations in a secure and effective manner
6. Design and conduct experiments, as well as analyze and interpret data, to lay a foundation for solving complex problems
7. Engage in life-long learning to acquire knowledge of contemporary issues to meet the challenges in the career
8. Apply the skills and techniques in computer engineering and inter-disciplinary domains for providing solutions in a global, economic, environmental, and societal context
9. Develop research skills and innovative ideas
10. Model real-world problems to address and share research issues
11. Share their knowledge and express their ideas in any technical forum
12. Present their ideas to prepare for a position to educate and guide others

PEOs – POs Mapping:

[Mapping matrix: PEO 1 maps to seven of the twelve Programme Outcomes, PEO 2 to three, and PEO 3 to four.]

Course Syllabus

Subject Name : PARALLEL PROGRAMMING LABORATORY


Prepared By : Dr. S. Devi Mahalakshmi
Approved By : Dr. K. Muneeswaran

Course Objective:
1. To learn the design of message passing paradigm using MPI
2. To explore the shared memory paradigm with Pthreads
3. To learn the implementation of shared memory paradigm with OpenMP.
4. To learn the GPU based parallel programming using OpenCL.

Course Outcome:
1. Implement message passing parallel programs using MPI framework
2. Implement shared memory parallel programs using Pthreads
3. Work with shared memory parallel programs using OpenMP
4. Design and develop OpenCL programs

Course Prerequisite:
Computer Organization, Multi-core Architecture, and any programming language such as C, C++, or Java

SYLLABUS FOR THE LAB:


• Message passing and Message matching in MPI
• Broadcast and reduction in MPI
• Working with MPI derived types
• Pthreads- synchronization using mutex and busy waiting
• Pthreads- synchronization using semaphores- Producer Consumer problem
• Pthreads-thread synchronization using barriers and condition variables
• Implementation of concurrent List with Pthread read write locks
• Reduction clause in OpenMP
• Program using parallel for directive in OpenMP
• Scheduling loops in OpenMP – Odd even transposition sort
• Synchronization in OpenMP - Producer Consumer problem
• Working with OpenCL buffers and image object
TOTAL: 45 PERIODS
REFERENCE BOOKS:
1. Peter S. Pacheco, “An Introduction to Parallel Programming”, Morgan Kaufmann, 2011.
2. A. Munshi, B. Gaster, T. G. Mattson, J. Fung, and D. Ginsburg, “OpenCL Programming Guide”, Addison Wesley, 2011.
3. M. J. Quinn, “Parallel Programming in C with MPI and OpenMP”, Tata McGraw Hill, 2003.
4. W. Gropp, E. Lusk, and R. Thakur, “Using MPI-2: Advanced Features of the Message Passing Interface”, MIT Press, 1999.
5. W. Gropp, E. Lusk, and A. Skjellum, “Using MPI: Portable Parallel Programming with the Message Passing Interface”, Second Edition, MIT Press, 1999.
6. B. Chapman, G. Jost, and Ruud van der Pas, “Using OpenMP”, MIT Press, 2008.
7. D. R. Butenhof, “Programming with POSIX Threads”, Addison Wesley, 1997.
8. B. Lewis and D. J. Berg, “Multithreaded Programming with Pthreads”, Sun Microsystems Press, 1998.

Course Outcomes–Programme Outcomes mapping
(3- Substantially, 2-Moderately, 1-Slightly)

For each Course Outcome (CO 1 to CO 5): Highest Cognitive Level - A; Mode of Delivery - 1, 3; Assessment Components (AC) - 8, 10 and 12, with weightages 0.3, 0.4 and 0.3 respectively; each CO maps substantially (level 3) to three Programme Outcomes.

*Mode of Delivery:
1. Oral presentation
2. Tutorial
3. Hands on/Demonstration
4. Seminar/Guest lecture
5. Videos
6. Field visit

**Assessment Methods:
1. Internal Test
2. Assignment
3. Course Seminar
4. Course Participation
5. Course Quiz
6. Demo
7. Case Study
8. Record Work
9. Lab Mini Project
10. Lab Model Exam
11. Project Review
12. Lab Observation
13. Poster Presentation
14. Group Discussion

PLAN OF EXERCISES

Ex. No. / Exercise / No. of Hours / CO Number

1. Simple C programs to review process creation, synchronization and IPC (3 hours)
2. Programs using message passing and message matching in MPI (3 hours, CO 1)
3. Programs using broadcast and reduction in MPI (3 hours, CO 1)
4. Working with MPI derived types (3 hours, CO 1)
5. Pthreads: synchronization using mutex and busy waiting (3 hours, CO 2)
6. Pthreads: synchronization using semaphores, Producer-Consumer problem (3 hours, CO 2)
7. Pthreads: thread synchronization using barriers and condition variables (3 hours, CO 2)
8. Implementation of concurrent list with Pthread read-write locks (3 hours, CO 2)
9. Reduction clause in OpenMP (3 hours, CO 3)
10. Program using the parallel for directive in OpenMP (3 hours, CO 3)
11. Scheduling loops in OpenMP, odd-even transposition sort (3 hours)
12. Synchronization in OpenMP, Producer-Consumer problem (3 hours, CO 3)
13. Working with OpenCL buffers and image object (3 hours, CO 4)

DETAILED PLAN OF EXERCISES

LU-1 Programs using Message passing and Message matching in MPI Period: 3

LU Outcomes CO Number:1
Design and implement Programs using Message passing and Message matching
in MPI
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Ping pong message passing C

2. Finding the global sum of N elements using P = 4 processors and then P = 8 processors. C
3. Implement message passing between a master process and N-1 slave processes. C

LU-2 Programs using Broadcast and reduction in MPI Period: 3

LU Objectives
Implement Programs using Broadcast and reduction in MPI
LU Outcomes CO Number:1
Design and implement programs using broadcast and reduction in MPI
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level

1. Finding the global sum of N elements using P = 4 processors and then P = 8 processors. C

2. Matrix vector multiplication C

3. Pi- calculation C

LU-3 Working with MPI derived types Period: 3

LU Outcomes CO Number:1
Design MPI programs using MPI derived types
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Find the percentage of 5 subject marks using MPI derived types C
2. Calculate interest using MPI derived types C

3. Calculate area using Trapezoidal rule using MPI derived types C

LU-4 Pthreads- synchronization using mutex and busy waiting. Period: 3


LU Objectives
Implement Programs using Pthread with mutex and busy waiting for
synchronization
LU Outcomes CO Number:1
Design and implement programs that use mutexes and busy waiting for thread synchronization
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Implement summation of N problem using mutex C
2. Implement Producer Consumer problem using mutex C

3. Calculate area using Trapezoidal rule using mutex C

LU-5 Pthreads- synchronization using semaphores- Producer Consumer Period: 3


problem
LU Outcomes CO Number:1
Implement Producer Consumer problem using semaphores
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Implement summation of N problem using semaphores C
2. Implement Producer Consumer problem using semaphores C

3. Calculate area using Trapezoidal rule C

LU-6 Pthreads-thread synchronization using barriers and condition Period: 3


variables
LU Outcomes CO Number:2
Design and implement programs in Pthreads with thread synchronization using
barriers and condition variables
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Implement sum of N elements using barriers for synchronization C
2. Implement sum of N elements using condition variables for synchronization C
3. Implement Pi calculation using barriers and condition variables for synchronization C

LU-7 Implementation of concurrent List with Pthread read write locks Period: 3
LU Outcomes CO Number:2
Construct concurrent List with Pthread read write locks

Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Implement concurrent linked list for maintaining student C
information (ordered and unordered)
2. Implement circular and doubly linked list C
3. Implement Queue using linked list C
4. Add two polynomials using linked lists C

LU-8 Reduction clause in OpenMP Period: 3

LU Outcomes CO Number:3
1. Work with Reduction clause in OpenMP
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Find the sum of N numbers using Reduction clause C
2. Calculate Pi using Reduction clause C
3. Find the factorial of N using Reduction clause C

LU-9 Program using parallel for directive in OpenMP Period: 3

LU Outcomes CO Number:4
1. Develop programs using parallel for directive in OpenMP
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Find the sum of N numbers using parallel for directive C
2. Calculate Pi using parallel for directive C
3. Count the number of positive elements in a list of n elements using the parallel for directive C

LU-10 Scheduling loops in OpenMP – Odd even transposition sort Period: 3

LU Outcomes CO Number:4
1. Implement programs using OpenMP and schedule the loops
2. Implement Odd even transposition sort

Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)


Sl.No Test Questions Level
1. Find the sum of series using static scheduling C
2. Perform Odd even transposition sort C

LU-11 Synchronization in OpenMP - Producer Consumer problem Period: 3

LU Outcomes CO Number:5
1. Design and implement programs with synchronization in OpenMP - Producer
Consumer problem
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Implement producer consumer problem C
2. Generate the Fibonacci series C
3. Calculate area using Trapezoidal rule C

LU-12 Working with OpenCL buffers and image object Period: 3


LU Outcomes CO Number:5

1. Design and implement programs using OpenCL buffers and image object
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Find the sum of 1000 elements using an OpenCL buffer C
2. Read and write images using image object C

Instructions for Compilation and Execution of the Program:

1. cc filename.c -c - for compilation only


2. cc filename.c - for compilation & linking
3. cc filename.c -o name2 - for compilation, linking and giving executable file named
name2
4. cc filename.c -lm - for compilation & linking with the math library (libm, needed for math.h functions)
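The MPI, Pthreads and OpenMP exercises later in this manual need a few additional commands. The following are typical invocations, assuming GCC and an MPICH or Open MPI installation (wrapper and flag names may vary slightly between systems):

5. mpicc filename.c -o name2 - compile and link an MPI program
   mpirun -np 4 ./name2 - run the MPI program with 4 processes
6. cc filename.c -o name2 -lpthread - compile & link a Pthreads program
7. cc filename.c -o name2 -fopenmp - compile an OpenMP program with GCC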

Creating a makefile
 • Create a file named makefile (no extension):
   i. $ vi makefile

output.exe: headimp.o headapp.o
        cc headimp.o headapp.o -o output.exe
headapp.o: headapp.c head.h
        cc -c headapp.c
headimp.o: headimp.c head.h
        cc -c headimp.c

(Each command line under a target must begin with a tab character.)
   ii. $ make
   iii. ./output.exe

System Calls for Inter process communication:

Process creation using fork

fork() - The fork system call creates a new process, called the child process, which runs concurrently with the process that invoked fork (the parent process).
After the child process is created, both processes execute the next instruction following the fork() call. The child receives its own copies of the parent's program counter, CPU registers and open file descriptors.
fork() takes no parameters and returns an integer value. The possible return values are:
 • Negative value: creation of a child process was unsuccessful.
 • Zero: returned in the newly created child process.
 • Positive value: returned to the parent (caller); the value contains the process ID of the newly created child process.

System calls
a. fork
Creates a child process that differs from the parent process only in its PID and
PPID
Prototype
pid_t fork(void)
 • On success, the PID of the child process is returned in the parent's thread of execution, and 0 is returned in the child's thread of execution.
 • On failure, -1 is returned in the parent's context and no child process is created.

b. getpid
Returns the process ID of the calling process.
Prototype
pid_t getpid(void)

c. getppid
Returns the process ID of the parent of the calling process.
Prototype
pid_t getppid(void)

d. exit
Cause normal program termination
Prototype
void exit(int status)
 • status is an integer between 0 and 255.
 • A process that exits with a status of zero did not encounter any problems; a process that exits with a non-zero status did have problems.

e. wait
Blocks the calling process until one of its child processes exits or a signal is
received.
Prototype
int wait(int *status)
 • status is a pointer to an integer where the UNIX system stores the value returned by the child process.
 • Returns the process ID of the process that ended.
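The following short C program is a minimal sketch of the calls described above: it creates one child with fork(), prints the PIDs with getpid()/getppid(), and has the parent wait() for the child.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();              /* create the child process */
    if (pid < 0) {                   /* negative value: fork failed */
        perror("fork");
        exit(1);
    } else if (pid == 0) {           /* zero: we are in the child */
        printf("Child: PID=%d, parent PID=%d\n", getpid(), getppid());
        exit(0);                     /* child exits with status 0 */
    } else {                         /* positive value: parent gets the child's PID */
        int status;
        wait(&status);               /* block until the child terminates */
        printf("Parent: child %d exited with status %d\n", pid, WEXITSTATUS(status));
    }
    return 0;
}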

IPC using Pipe


System call - pipe
The pipe() function creates a pipe, an object allowing unidirectional data flow: data written at one end can be read at the other.
Header File
#include <unistd.h>
Prototype
int pipe (int file_descriptors[2])
 • file_descriptors[2] is an array that pipe() fills with two file descriptors:
 • file_descriptors[0] is opened for reading,
 • file_descriptors[1] is opened for writing.

Algorithm using pipe for IPC

algorithm ipcPipe(string) {
    // string - data to be shared between the two processes
    create a pipe using the pipe() system call
    create a child process using fork()
    if (in the child process) {
        close the read end of the pipe
        write the string into the pipe using the write() system call
    }
    else {  // in the parent process
        close the write end of the pipe
        retrieve the data from the pipe using the read() system call
        display the data
    }
}
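A compact C sketch of this algorithm, with the child writing and the parent reading (error handling kept minimal):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    char buf[64];
    const char *msg = "hello through the pipe";

    if (pipe(fd) == -1) { perror("pipe"); exit(1); }

    pid_t pid = fork();
    if (pid == 0) {                       /* child: writer */
        close(fd[0]);                     /* close unused read end */
        write(fd[1], msg, strlen(msg) + 1);
        close(fd[1]);
        exit(0);
    } else {                              /* parent: reader */
        close(fd[1]);                     /* close unused write end */
        read(fd[0], buf, sizeof(buf));
        printf("Parent read: %s\n", buf);
        close(fd[0]);
        wait(NULL);
    }
    return 0;
}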

Inter Process communication using shared memory


Inter-process communication through shared memory lets two or more processes access a common region of memory; changes made by one process are immediately visible to the other processes that have attached the same segment.

System calls used


a. shmget
Allocates a shared memory segment .

Prototype
int shmget(key_t key, int size, int shmflg);

Field          Description
key            Specifies either IPC_PRIVATE or a system-wide unique key
size           Size of the new shared memory segment
shmflg         IPC_CREAT - create the segment if it doesn't already exist in the kernel;
               IPC_EXCL - when used with IPC_CREAT, fail if the segment already exists
return value   Shared memory segment identifier on success, -1 on error

b. shmat()
Attaches the shared memory segment identified by shmid to the address space
of the calling process
Header file
#include <sys/shm.h>
Prototype
void *shmat(int shmid, const void *shmaddr, int shmflg)

Field          Description
shmid          A unique positive integer created by a shmget system call and associated with a segment of shared memory
shmaddr        Points to the desired address of the shared memory segment
shmflg         Specifies a set of flags that indicate the specific shared memory conditions and options to implement.
               E.g. SHM_RDONLY - the segment is attached for reading and the process must have read permission for the segment;
               SHM_REMAP - the mapping of the segment should replace any existing mapping in the range starting at shmaddr and continuing for the size of the segment.
return value   Address at which the segment was attached to the process, or -1 on error

c. shmdt
The shmdt() function detaches from the calling process's data segment the
shared memory segment located at the address specified by shmaddr.
Header File
#include <sys/shm.h>
Prototype
int shmdt(const void *shmaddr)

Field          Description
shmaddr        The data segment start address of a shared memory segment
return value   If successful, shmdt() decrements the shm_nattch count associated with the shared memory segment and returns zero. On failure, it returns -1

d. shmctl
Perform shared memory control operations

Header File
#include <sys/shm.h>
Prototype
int shmctl(int shmid, int cmd, struct shmid_ds *buf);

Field          Description
shmid          A unique positive integer created by a shmget system call and associated with a segment of shared memory
cmd            Specifies one of IPC_STAT, IPC_SET, or IPC_RMID
buf            Points to the data structure used for sending or receiving data during execution of shared memory control operations
return value   If successful, shmctl() returns zero. On failure, it returns -1

Server Side

algorithm sharedMemoryServer(data) {
// data - content to be shared among processes
create a global data space
create a new shared memory segment using the shmget() system call;
attach the segment to the global data space using the shmat() system call;
write data into the global data space;
wait for the client to read the data space;
}
Client side

algorithm sharedMemoryClient( ) {
get the shared memory created by server using shmget( )
read the data from shared memory
process data
}
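The sketch below is a single C program that folds the server and client roles into a parent and a child process, purely to keep the example self-contained (IPC_PRIVATE and the one-second sleep are simplifications of the example, not part of the algorithm above):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

int main(void) {
    /* create a new 1 KB segment; IPC_PRIVATE keeps the example self-contained */
    int shmid = shmget(IPC_PRIVATE, 1024, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); exit(1); }

    if (fork() == 0) {                              /* child acts as the "client" */
        sleep(1);                                   /* crude wait for the parent to write */
        char *data = (char *)shmat(shmid, NULL, 0); /* attach the segment */
        printf("Client read: %s\n", data);
        shmdt(data);                                /* detach */
        exit(0);
    } else {                                        /* parent acts as the "server" */
        char *data = (char *)shmat(shmid, NULL, 0); /* attach the segment */
        strcpy(data, "shared memory message");      /* write into the segment */
        shmdt(data);
        wait(NULL);
        shmctl(shmid, IPC_RMID, NULL);              /* remove the segment */
    }
    return 0;
}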

MPI Programming for Process communication

Message Passing Interface


MPI is a library of routines that runs with standard C or Fortran programs, using commonly-available
operating system services to create parallel processes and exchange information among these processes.
MPI can also support distributed program execution on heterogeneous hardware. That is, you may
run a program that starts processes on multiple computer systems to work on the same problem. This is
useful with a workstation farm.
MPI Functions
• MPI_Init
Initializes the MPI execution environment. This function must be called in every MPI program, must
be called before any other MPI functions and must be called only once in an MPI program.
Syntax :
MPI_Init (&argc,&argv)
MPI_INIT (ierr)

• MPI_Comm_size
Returns the total number of MPI processes in the specified communicator, such as
MPI_COMM_WORLD. If the communicator is MPI_COMM_WORLD, then it represents the number of MPI
tasks available to your application.
Syntax :
MPI_Comm_size (comm,&size)
• MPI_Comm_rank
Returns the rank of the calling MPI process within the specified communicator. Initially, each process
will be assigned a unique integer rank between 0 and number of tasks - 1 within the communicator
MPI_COMM_WORLD. This rank is often referred to as a task ID. If a process becomes associated with
other communicators, it will have a unique rank within each of these as well.
Syntax :
MPI_Comm_rank (comm,&rank)

• MPI_Send
Basic blocking send operation. Routine returns only after the application buffer in the sending task is
free for reuse.
Syntax :
int MPI_Send(
    void*         msg_buf_p,
    int           msg_size,
    MPI_Datatype  msg_type,
    int           destination,
    int           tag,
    MPI_Comm      communicator
);

The first three arguments, msg_buf_p, msg_size, and msg_type, determine the contents of the
message.
The remaining arguments, destination, tag, and communicator, determine the destination of the
message. The fourth argument, destination, specifies the rank of the process that should receive the
message. The fifth argument, tag,is a nonnegative int. It can be used to distinguish messages that are
otherwise identical.

• MPI_Recv
Receive a message and block until the requested data is available in the application buffer in the
receiving task.
Syntax :
int MPI_Recv(
    void*         msg_buf_p,
    int           buf_size,
    MPI_Datatype  buf_type,
    int           source,
    int           tag,
    MPI_Comm      communicator,
    MPI_Status*   status_p
);

The first three arguments specify the memory available for receiving the message: msg_buf_p
points to the block of memory, buf_size determines the number of objects that can be stored in the
block, and buf_type indicates the type of the objects.

The next three arguments identify the message. The source argument specifies the process from
which the message should be received. The tag argument should match the tag argument of the
message being sent, and the communicator argument must match the communicator used by the
sending process. For status_p, the special MPI constant MPI_STATUS_IGNORE can be passed.
Message Matching
Suppose process q calls MPI_Send with
MPI_Send(send_buf_p, send_buf_sz, send_type, dest, send_tag, send_comm);
Also suppose that process r calls MPI_Recv with
MPI_Recv(recv_buf_p, recv_buf_sz, recv_type, src, recv_tag, recv_comm, &status);
Then the message sent by q with the above call to MPI_Send can be received by r with the call to
MPI_Recv if
 • recv_comm = send_comm,
 • recv_tag = send_tag,
 • dest = r, and
 • src = q.
These conditions aren’t quite enough for the message to be successfully received, however. The
parameters specified by the first three pairs of arguments, send_buf_p/recv_buf_p,
send_buf_sz/recv_buf_sz, send_type/recv_type, must specify compatible buffers. Most of the
time, the following rule will suffice:
 • If recv_type = send_type and recv_buf_sz >= send_buf_sz, then the message sent by q can be successfully received by r.

Algorithm for Ping Pong Communication


A ping-pong is a communication in which two messages are sent, first from process A to process B (ping) and then from process B to process A (pong).

• Pseudocode
Ping_Pong Communication :

If Process 0 :
Send a message to Process 1;
Receive the response from Process 1 and print it.
If Process 1 :
Receive the message from Process 0 and print it;
Send the acknowledgement back to Process 0.
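A minimal C sketch of this ping-pong exchange, assuming the program is run with exactly two processes (e.g. mpirun -np 2 ./a.out):

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank;
    char msg[64];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        strcpy(msg, "ping");
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);   /* ping */
        MPI_Recv(msg, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);       /* pong */
        printf("Process 0 received: %s\n", msg);
    } else if (rank == 1) {
        MPI_Recv(msg, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received: %s\n", msg);
        strcpy(msg, "pong");
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}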

MPI Functions for Collective Communications

• MPI_Reduce
MPI_Reduce combines the elements provided in the input buffer of each process in the group, using the operation specified by operator, and returns the combined value in the output buffer of the process with rank dest_process.
Syntax :
int MPI_Reduce(
    void*         input_data_p,
    void*         output_data_p,
    int           count,
    MPI_Datatype  datatype,
    MPI_Op        operator,
    int           dest_process,
    MPI_Comm      comm
);

• MPI_Allreduce
Combines values from all processes and distributes the result back to all processes.
Syntax :
int MPI_Allreduce(
    void*         input_data_p,
    void*         output_data_p,
    int           count,
    MPI_Datatype  datatype,
    MPI_Op        operator,
    MPI_Comm      comm
);

• MPI_Bcast
Broadcasts a message from the process with rank "source_proc" to all other processes of the communicator.
Syntax :
int MPI_Bcast(
    void*         data_p,
    int           count,
    MPI_Datatype  datatype,
    int           source_proc,
    MPI_Comm      comm
);

• MPI_Scatter
MPI_Scatter divides the data referenced by send_buf_p into comm_sz pieces.
Syntax :
int MPI_Scatter(
    void*         send_buf_p,
    int           send_count,
    MPI_Datatype  send_type,
    void*         recv_buf_p,
    int           recv_count,
    MPI_Datatype  recv_type,
    int           src_proc,
    MPI_Comm      comm
);


Algorithm for Global Sum estimation


• Pseudocode
Global Sum :
If Process 0
Get ‘n’ array elements from the user
Broadcast the size of the array to all the processes
Scatter the array elements among all the processes involved in communication
Compute the sum of ‘local_n’ elements in the array
Receive the local sums computed by all the other processes
Compute the global sum of all the local sums and print it.
Else
Receive scattered array elements from process 0
Compute the sum of ‘local_n’ elements
Send the computed local sum to process 0.
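The sketch below is a variant of this computation that uses the collective operations described above (MPI_Bcast, MPI_Scatter, MPI_Reduce) instead of explicit sends and receives; for brevity it assumes n is divisible by the number of processes and fills the array with sample values rather than reading them all from the user:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, comm_sz, n = 0;
    double *a = NULL, *local_a, local_sum = 0.0, global_sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (rank == 0) {                                /* process 0 reads the input size */
        printf("Enter n (divisible by %d): ", comm_sz);
        scanf("%d", &n);
        a = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) a[i] = i + 1;   /* sample data 1..n */
    }

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* broadcast the size */
    int local_n = n / comm_sz;
    local_a = malloc(local_n * sizeof(double));

    /* scatter the array so each process gets local_n elements */
    MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < local_n; i++) local_sum += local_a[i];

    /* reduce the local sums into a global sum on process 0 */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Global sum = %f\n", global_sum);
        free(a);
    }
    free(local_a);
    MPI_Finalize();
    return 0;
}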

MPI Functions derived data types

• MPI_Type_create_struct
MPI_Type_create_struct is used to build a derived datatype that consists of individual elements that
have different basic types.
Syntax :
int MPI_Type_create_struct(
int count,
int array_of_blocklengths[],
MPI_Aint array_of_displacements[],
MPI_Datatype array_of_types[],
MPI_Datatype* new_type_p

);

• MPI_Get_address
It returns the address of the memory location.
Syntax :
int MPI_Get_address(
void* location_p,
MPI_Aint* address_p
);

• MPI_Type_commit
It commits the MPI Derived datatype.
Syntax :
int MPI_Type_commit(
    MPI_Datatype*  new_mpi_t_p
);
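An illustrative sketch that builds a derived type for a small structure and broadcasts one instance of it; the struct and its field names (id, marks) are made up for the example:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank;
    struct { int id; double marks; } rec;          /* illustrative record */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int          blocklengths[2] = {1, 1};
    MPI_Datatype types[2]        = {MPI_INT, MPI_DOUBLE};
    MPI_Aint     displacements[2], base, addr;
    MPI_Datatype rec_type;

    /* displacements are measured from the start of the structure */
    MPI_Get_address(&rec, &base);
    MPI_Get_address(&rec.id, &addr);     displacements[0] = addr - base;
    MPI_Get_address(&rec.marks, &addr);  displacements[1] = addr - base;

    MPI_Type_create_struct(2, blocklengths, displacements, types, &rec_type);
    MPI_Type_commit(&rec_type);          /* the type must be committed before use */

    if (rank == 0) { rec.id = 1; rec.marks = 87.5; }
    MPI_Bcast(&rec, 1, rec_type, 0, MPI_COMM_WORLD);   /* send one record to everyone */
    printf("Process %d: id=%d marks=%.1f\n", rank, rec.id, rec.marks);

    MPI_Type_free(&rec_type);
    MPI_Finalize();
    return 0;
}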

Pthread Programming for multithreaded programming

POSIX Threads, usually referred to as Pthreads, is an execution model that exists independently
from a language, as well as a parallel execution model. It allows a program to control multiple different
flows of work that overlap in time. Each flow of work is referred to as a thread, and creation and control
over these flows is achieved by making calls to the POSIX Threads API, which provides functions for thread creation and synchronization.
Thread Creation
The pthread_create() function starts a new thread in the calling process.
int pthread_create(
pthread_t* thread_p,
const pthread_attr_t* attr_p,
void* (*start_routine)(void*),
void* arg_p
);

The function that’s started by pthread_create should have a prototype that looks something like this:
void* thread_function(void* args_p);
Thread Termination
The pthread_join() function waits for the thread specified by thread to terminate. If that thread
has already terminated, then pthread_join() returns immediately.
int pthread_join(
pthread_t thread,
void** ret_val_p
);

Busy Waiting
In busy waiting, a thread repeatedly tests a condition, but effectively, does no useful work until the
condition has the appropriate value. It may also be said to be "polling". When two or more processes
want to enter the same critical section, something has to be done to prevent more than one process from
entering it.
Mutex
Mutex is an abbreviation of mutual exclusion, and a mutex is a special type of variable that, together
with a couple of special functions, can be used to restrict access to a critical section to a single thread at
a time. Thus, a mutex can be used to guarantee that one thread “excludes” all other threads while it
executes the critical section.
• A variable of type pthread_mutex_t needs to be initialized by the system before it's used. This can be
done with a call
int pthread_mutex_init(
pthread_mutex_t* mutex_p,
const pthread_mutexattr_t* attr_p
);
• We won't make use of the second argument, so we'll just pass in NULL. When a Pthreads program
finishes using a mutex, it should call
int pthread_mutex_destroy(
pthread_mutex_t* mutex_p
);
• To gain access to a critical section, a thread calls
int pthread_mutex_lock(
pthread_mutex_t* mutex_p
);

• When a thread is finished executing the code in a critical section, it should call
int pthread_mutex_unlock(pthread_mutex_t* mutex_p);
The call to pthread_mutex_lock will cause the thread to wait until no other thread is in the critical
section, and the call to pthread_mutex_unlock notifies the system that the calling thread has completed
execution of the code in the critical section.
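A short sketch of mutex-protected summation of the first N integers; THREAD_COUNT, N and the shared variable total are assumptions of the example:

#include <stdio.h>
#include <pthread.h>

#define THREAD_COUNT 4
#define N 1000000

long long total = 0;                       /* shared variable updated in the critical section */
pthread_mutex_t mutex;

void *sum_part(void *rank_p) {
    long rank = (long)rank_p;
    long long my_n = N / THREAD_COUNT;
    long long first = rank * my_n, my_sum = 0;

    for (long long i = first; i < first + my_n; i++)
        my_sum += i + 1;                   /* local work needs no locking */

    pthread_mutex_lock(&mutex);            /* enter the critical section */
    total += my_sum;
    pthread_mutex_unlock(&mutex);          /* leave the critical section */
    return NULL;
}

int main(void) {
    pthread_t threads[THREAD_COUNT];
    pthread_mutex_init(&mutex, NULL);

    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&threads[t], NULL, sum_part, (void *)t);
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(threads[t], NULL);

    printf("Sum of 1..%d = %lld\n", N, total);
    pthread_mutex_destroy(&mutex);
    return 0;
}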

Pthread Semaphores

A semaphore is a variable or abstract data type that is used for controlling access, by multiple processes or threads, to a common resource in a concurrent system such as a multiprogramming operating system.
Syntax
 • int sem_init(sem_t* semaphore_p, int shared, unsigned initial_val);
   - sem_init() initializes the semaphore at the address pointed to by semaphore_p.
   - The initial_val argument specifies the initial value for the semaphore.
   - The shared argument indicates whether this semaphore is to be shared between the threads of a process, or between processes.
 • int sem_destroy(sem_t* semaphore_p);
   - Destroys the semaphore at the address pointed to by semaphore_p.
 • int sem_post(sem_t* semaphore_p);
 • int sem_wait(sem_t* semaphore_p);
   - A thread that executes sem_wait will block if the semaphore is 0. If the semaphore is non-zero, it will decrement the semaphore and proceed.

After executing the code in the critical section, a thread calls sem_post, which increments the semaphore, so that a thread waiting in sem_wait can proceed. The situation in which a thread cannot proceed until another thread has taken some action is sometimes called producer-consumer synchronization.
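A minimal producer-consumer sketch with a single shared slot, using one semaphore to signal "data available"; the single-slot buffer and the value produced are assumptions of the example:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

int shared_value;            /* single-slot "buffer" */
sem_t data_ready;            /* counts items the consumer may take */

void *producer(void *arg) {
    shared_value = 42;       /* produce an item */
    sem_post(&data_ready);   /* signal the consumer */
    return NULL;
}

void *consumer(void *arg) {
    sem_wait(&data_ready);   /* block until the producer has posted */
    printf("Consumed value %d\n", shared_value);
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    sem_init(&data_ready, 0, 0);          /* shared between threads, initial value 0 */

    pthread_create(&cons, NULL, consumer, NULL);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    sem_destroy(&data_ready);
    return 0;
}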

Barrier Point

A barrier is a point where the thread is going to wait for other threads and will proceed further only when
predefined number of threads reach the same barrier in their respective programs.

Pthread functions Used

pthread_mutex_init
int pthread_mutex_init (
pthread_mutex_t *mutex,
const pthread_mutexattr_t *attr
);

The pthread_mutex_init() function initialises the mutex referenced by mutex with attributes specified by
attr. If attr is NULL, the default mutex attributes are used. Upon successful initialisation, the state of the
mutex becomes initialised and unlocked.

pthread_cond_init()
int pthread_cond_init (
    pthread_cond_t *cond,
    const pthread_condattr_t *attr
);
The function pthread_cond_init() initialises the condition variable referenced by cond with attributes
referenced by attr. If attr is NULL, the default condition variable attributes are used; the effect is the
same as passing the address of a default condition variable attributes object. Upon successful
initialisation, the state of the condition variable becomes initialised.

Algorithm for Sum of ‘n’ Elements with barrier points


Main thread:
    Initialize the mutex, the condition variable and counter = 0
    Get the range value (n) from the user
    For thread = 0 to thread_count - 1
        Create the pthread
    For thread = 0 to thread_count - 1
        Perform the pthread join operation
    Calculate the pi value from the accumulated total
    Print the pi value

Each thread:
    For i = my_first to my_last
        sum += factor / (2 * i + 1)      // factor alternates between +1 and -1
    ans = sum * 4
    Lock the mutex
    total += ans
    Increment the counter whenever a thread finishes its task (counter++)
    If counter equals thread_count (all the threads have entered the barrier)
        Broadcast on the condition variable, indicating that all the threads may proceed
    Else
        Wait on the condition variable
    Unlock the mutex
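A compact C sketch of a condition-variable barrier of the kind used above; THREAD_COUNT is fixed and each thread simply prints before and after the barrier (the generation counter is an addition of the example that guards against spurious wakeups and lets the barrier be reused):

#include <stdio.h>
#include <pthread.h>

#define THREAD_COUNT 4

int counter = 0;                 /* threads that have reached the barrier in this round */
unsigned long generation = 0;    /* incremented each time the barrier opens */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;

void barrier(void) {
    pthread_mutex_lock(&mutex);
    unsigned long my_gen = generation;
    counter++;
    if (counter == THREAD_COUNT) {       /* last thread to arrive */
        counter = 0;                     /* reset for reuse */
        generation++;                    /* open the barrier */
        pthread_cond_broadcast(&cond);   /* wake all waiting threads */
    } else {
        while (my_gen == generation)     /* guards against spurious wakeups */
            pthread_cond_wait(&cond, &mutex);
    }
    pthread_mutex_unlock(&mutex);
}

void *worker(void *rank_p) {
    long rank = (long)rank_p;
    printf("Thread %ld before the barrier\n", rank);
    barrier();                           /* nobody passes until all threads arrive */
    printf("Thread %ld after the barrier\n", rank);
    return NULL;
}

int main(void) {
    pthread_t threads[THREAD_COUNT];
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(threads[t], NULL);
    return 0;
}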

Concurrent Data Structures : Pthread functions Used

pthread_rwlock_rdlock() / pthread_rwlock_wrlock()
The pthread_rwlock_rdlock() function shall apply a read lock to the read-write lock referenced by
rwlock. The calling thread acquires the read lock if a writer does not hold the lock and there are no
writers blocked on the lock.
Syntax
int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_init(pthread_rwlock_t *rwlock, const pthread_rwlockattr_t *attr);
int pthread_rwlock_unlock(pthread_rwlock_t *rwlock);
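A small sketch of how a read-write lock typically protects a shared structure; here a shared counter stands in for the concurrent list, and the reader/writer split is the point of the example:

#include <stdio.h>
#include <pthread.h>

int shared_list_size = 0;                  /* stands in for the concurrent list */
pthread_rwlock_t rwlock;

void *reader(void *arg) {
    pthread_rwlock_rdlock(&rwlock);        /* many readers may hold this at once */
    printf("Reader sees %d elements\n", shared_list_size);
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

void *writer(void *arg) {
    pthread_rwlock_wrlock(&rwlock);        /* writer gets exclusive access */
    shared_list_size++;                    /* e.g. insert into the list */
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

int main(void) {
    pthread_t r1, r2, w;
    pthread_rwlock_init(&rwlock, NULL);

    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r1, NULL, reader, NULL);
    pthread_create(&r2, NULL, reader, NULL);
    pthread_join(w, NULL);
    pthread_join(r1, NULL);
    pthread_join(r2, NULL);

    pthread_rwlock_destroy(&rwlock);
    return 0;
}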

OpenMP programming for concurrency

OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-
platform shared memory multiprocessing programming in C, C++, and Fortran, on most platforms,
processor architectures and operating systems, including Solaris, AIX, HP-UX, Linux, OS X, and Windows.
It consists of a set of compiler directives, library routines, and environment variables that influence run-
time behavior.

OpenMP Constructs Used

Reduction Clause

A reduction operator is a binary operation (such as addition or multiplication) and a reduction is a


computation that repeatedly applies the same reduction operator to a sequence of operands in order to
get a single result.

Syntax
reduction(<operator>: <variable list>)

Example
# pragma omp parallel num_threads(thread_count) \
    reduction(+: global_result)
global_result += Local_trap(a, b, n);

Program Implementation
Sum of ‘n’ Elements
• Procedure

Initialize thread_count, sum and gsum.
Get the size of the array and the elements from the user.
Assign independent work to the threads using the reduction clause.
For k = 0 to n-1
    sum = sum + a[k]
Each thread calculates a local sum; the reduction combines the local sums into the global sum.
Print the global sum.
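A runnable C sketch of this procedure using the reduction clause; the array size and contents are fixed here instead of being read from the user:

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    int a[N];
    long sum = 0;

    for (int k = 0; k < N; k++) a[k] = k + 1;   /* sample data 1..N */

    /* each thread accumulates a private copy of sum; the reduction
       clause adds the private copies into the shared sum at the end */
    # pragma omp parallel for reduction(+: sum)
    for (int k = 0; k < N; k++)
        sum += a[k];

    printf("Global sum = %ld\n", sum);          /* expected: N*(N+1)/2 = 500500 */
    return 0;
}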

Parallel For directive

The parallel for directive forks a team of threads to execute the following structured block. The structured
block following the parallel for directive must be a for loop. Furthermore, with the parallel for directive
the system parallelizes the for loop by dividing the iterations of the loop among the threads.

Legal Forms of Parallelizable for statements

Example
fibo[0] = fibo[1] = 1;
# pragma omp parallel for num_threads(thread_count)
for (i = 2; i < n; i++)
    fibo[i] = fibo[i-1] + fibo[i-2];

Note that this loop has a loop-carried dependence (fibo[i] depends on fibo[i-1] and fibo[i-2]), so although the compiler accepts the directive, the parallel results may be wrong; only loops whose iterations are independent should be parallelized with parallel for.
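For contrast, a dependence-free loop such as counting the positive elements of an array (one of the exercises above) parallelizes safely; a minimal sketch, with the array contents chosen arbitrarily for the example:

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void) {
    int a[N] = {3, -1, 4, -1, 5, -9, 2, 6};
    int count = 0;

    /* iterations are independent, so parallel for is safe;
       the reduction combines each thread's private count */
    # pragma omp parallel for reduction(+: count)
    for (int i = 0; i < N; i++)
        if (a[i] > 0)
            count++;

    printf("Positive elements: %d\n", count);   /* expected: 5 */
    return 0;
}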


