Copyright 107
Submitted to
COPYRIGHT OFFICE
DEPARTMENT FOR PROMOTION OF INDUSTRY AND
INTERNAL TRADE
MINISTRY OF COMMERCE AND INDUSTRY
NEW DELHI-110078
Submitted by
June 2020
TABLE OF CONTENTS
2. Course Syllabus
4. Plan of Exercises
Vision:
Envisioning a world led by our engineers, holding a beacon of hope and confidence for generations
to come.
Mission:
To Produce Competent, Disciplined and Quality Engineers & Administrators through Service Par
Excellence.
Vision:
To become the centre of excellence in computer education and research and to create the platform
for industrial consultancy
Mission:
To mould the students by inculcating the spirit of ethical values, contributing to societal ethics.
Programme Outcomes (POs) – M.E. Computer Science and Engineering
During the course of the M.E. programme in Computer Science and Engineering, the learners will
acquire the ability to:
1. Apply knowledge of mathematics, science and information science in computer engineering at an advanced level
2. Design a computer system with components and processes of desired needs within realistic constraints such as economic, environmental, social, political, ethical, health and safety
3. Identify and modify the functions of the internals of computer components such as operating systems and compilers
4. Apply software engineering principles, techniques and tools in software development
5. Create, collect, process, view, organize, store, mine and retrieve data in both local and remote locations in a secure and effective manner
6. Design and conduct experiments, as well as analyze and interpret data, to lay a foundation for solving complex problems
7. Engage in life-long learning to acquire knowledge of contemporary issues to meet the challenges of the career
8. Apply the skills and techniques of computer engineering and inter-disciplinary domains to provide solutions in a global, economic, environmental, and societal context
9. Develop research skills and innovative ideas
10. Model real-world problems to address and share research issues
11. Share their knowledge and express their ideas in any technical forum
12. Present their ideas and prepare for a position to educate and guide others
Course Syllabus
Course Objective:
1. To learn the design of the message passing paradigm using MPI
2. To explore the shared memory paradigm with Pthreads
3. To learn the implementation of the shared memory paradigm with OpenMP
4. To learn GPU-based parallel programming using OpenCL
Course Outcome:
1. Implement message passing parallel programs using MPI framework
2. Implement shared memory parallel programs using Pthreads
3. Work with shared memory parallel programs using OpenMP
4. Design and develop OpenCL programs
Course Prerequisite:
Computer Organization, Multi-core Architecture, and any programming language such as C, C++, or Java
3. M. J. Quinn, “Parallel programming in C with MPI and OpenMP”, Tata McGraw Hill, 2003.
4. W. Gropp, E. Lusk, and R. Thakur, “Using MPI-2: Advanced features of the message passing
interface”, MIT Press, 1999.
5. W. Gropp, E. Lusk, and A. Skjellum, “Using MPI: Portable parallel programming with the
message passing interface”, Second Edition, MIT Press, 1999.
6. B. Chapman, G. Jost, and Ruud van der Pas, “Using OpenMP”, MIT Press, 2008.
Course Outcomes–Programme Outcomes mapping
(3- Substantially, 2-Moderately, 1-Slightly)
*Mode of Delivery:
1. Oral presentation
2. Tutorial
3. Hands on/Demonstration
4. Seminar/Guest lecture
5. Videos
6. Field visit
**Assessment Methods:
1. Internal Test
2. Assignment
3. Course Seminar
4. Course Participation
5. Course Quiz
6. Demo
7. Case Study
8. Record Work
9. Lab Mini Project
10. Lab Model Exam
11. Project Review
12. Lab Observation
13. Poster Presentation
14. Group Discussion
PLAN OF EXERCISES
LU-1 Programs using Message passing and Message matching in MPI Period: 3
LU Outcomes CO Number:1
Design and implement Programs using Message passing and Message matching
in MPI
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Ping pong message passing C
LU Objectives
Implement Programs using Broadcast and reduction in MPI
LU Outcomes CO Number:1
Design Concurrent Linked list using coarse grained synchronization
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
2. Matrix vector multiplication C
3. Pi- calculation C
LU Outcomes CO Number:1
Design MPI programs using MPI derived types
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Find the percentage of 5 subject marks using MPI derived types C
2. Calculate interest using MPI derived types C
LU-7 Implementation of concurrent List with Pthread read write locks Period: 3
LU Outcomes CO Number:2
Construct concurrent List with Pthread read write locks
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Implement concurrent linked list for maintaining student C
information (ordered and unordered)
2. Implement circular and doubly linked list C
3. Implement Queue using linked list C
4. Add two polynomials using a linked list C
LU Outcomes CO Number:3
1. Work with Reduction clause in OpenMP
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Find the sum of N numbers using Reduction clause C
2. Calculate Pi using Reduction clause C
3. Find the factorial of N using Reduction clause C
LU Outcomes CO Number:4
1. Develop programs using parallel for directive in OpenMP
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Find the sum of N numbers using parallel for directive C
2. Calculate Pi using parallel for directive C
3. Count the number of positive elements in a list of n elements using the parallel for directive C
LU Outcomes CO Number:4
1. Implement programs using OpenMP and schedule the loops
2. Implement Odd even transposition sort
LU Outcomes CO Number:5
1. Design and implement programs with synchronization in OpenMP - Producer
Consumer problem
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Implement producer consumer problem C
2. Generate the Fibonacci series C
3. Calculate area using Trapezoidal rule C
1. Design and implement programs using OpenCL buffers and image object
Possible Assessment Questions: (Rating the level of questions – R, U, A, L, E, C)
Sl.No Test Questions Level
1. Find the sum of 1000 elements using OpenCL buffers C
2. Read and write images using image object C
Instructions for Compilation and Execution of the Program:
Creating a makefile
Create a file named makefile (no extension) and enter the rules below; each command line must begin with a tab.
i. $ vi makefile
output.exe: headimp.o headapp.o
	cc headimp.o headapp.o -o output.exe
headapp.o: headapp.c head.h
	cc -c headapp.c
headimp.o: headimp.c head.h
	cc -c headimp.c
ii. $ make
iii. $ ./output.exe
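For reference, a minimal sketch of the three source files this makefile assumes (head.h, headimp.c, headapp.c); the add() function used here is purely illustrative.

/* head.h - interface shared by the implementation and the application (illustrative) */
#ifndef HEAD_H
#define HEAD_H
int add(int a, int b);
#endif

/* headimp.c - implementation of the interface declared in head.h */
#include "head.h"
int add(int a, int b) { return a + b; }

/* headapp.c - application that uses the interface */
#include <stdio.h>
#include "head.h"
int main(void) {
    printf("sum = %d\n", add(2, 3));
    return 0;
}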
System Calls for Inter-process Communication:
fork() - The fork system call creates a new process, called the child process, which runs concurrently
with the process that invoked fork (the parent process).
After the child process is created, both processes execute the instruction following the fork() call. The
child process receives its own copy of the parent's program counter, CPU registers, and open file
descriptors.
fork() takes no parameters and returns an integer value. The possible return values are:
Negative value: creation of a child process was unsuccessful.
Zero: returned in the newly created child process.
Positive value: returned to the parent (caller); the value is the process ID of the newly created
child process.
System calls
a. fork
Creates a child process that differs from the parent process only in its PID and
PPID
Prototype
pid_t fork(void)
On success, the PID of the child process is returned in the parent's
thread of execution, and a 0 is returned in the child's thread of
execution.
On failure, a -1 will be returned in the parent's context, no child
process will be created.
b. getpid
Returns the process ID of the calling process.
Prototype
pid_t getpid(void)
c. getppid
Returns the process ID of the parent of the calling process.
Prototype
pid_t getppid(void)
d. exit
Cause normal program termination
Prototype
void exit(int status)
Where status is an integer between 0 and 255.
When a process exits with a status of zero, it did not encounter any problems; when a process
exits with a non-zero status, it did have problems.
e. wait
Blocks the calling process until one of its child processes exits or a signal is
received.
Prototype
int wait(int *status)
status is a pointer to an integer where the UNIX system stores the value returned by the child process.
wait() returns the process ID of the child process that ended.
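A minimal C sketch tying fork, getpid, getppid, exit, and wait together (the exit status 42 and the printed messages are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();              /* create a child process */
    if (pid < 0) {                   /* negative: fork failed */
        perror("fork");
        exit(1);
    } else if (pid == 0) {           /* zero: we are in the child */
        printf("child: pid=%d ppid=%d\n", getpid(), getppid());
        exit(42);                    /* terminate the child with a status */
    } else {                         /* positive: parent; pid holds the child's PID */
        int status;
        pid_t done = wait(&status);  /* block until the child exits */
        printf("parent: child %d exited with status %d\n",
               done, WEXITSTATUS(status));
    }
    return 0;
}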
algorithm ipcPipe(string) {
//string - data to be shared between the two processes
create a pipe using the pipe() system call;
create a child process using fork();
if (executing in the child process) {
close the unused read end of the pipe;
write the string into the pipe using the write( ) system call;
}
if (executing in the parent process) {
close the unused write end of the pipe;
retrieve the data from the pipe using the read( ) system call;
display the data;
}
}
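A minimal C sketch of this pipe exchange, with the child writing and the parent reading (the message text is illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];                       /* fd[0]: read end, fd[1]: write end */
    char buf[64];
    const char *msg = "hello from child";

    if (pipe(fd) == -1) { perror("pipe"); exit(1); }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }

    if (pid == 0) {                  /* child: writes into the pipe */
        close(fd[0]);                /* close the unused read end */
        write(fd[1], msg, strlen(msg) + 1);
        close(fd[1]);
        exit(0);
    } else {                         /* parent: reads from the pipe */
        close(fd[1]);                /* close the unused write end */
        read(fd[0], buf, sizeof(buf));
        printf("parent received: %s\n", buf);
        close(fd[0]);
        wait(NULL);                  /* reap the child */
    }
    return 0;
}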
a. shmget()
Allocates a new shared memory segment (or locates an existing one) and returns its identifier.
Header file
#include <sys/shm.h>
Prototype
int shmget(key_t key, int size, int shmflg);
Field Description
shmflg IPC_CREAT - create the segment if it doesn't already exist in the kernel.
IPC_EXCL - when used with IPC_CREAT, fail if the segment already exists.
b. shmat()
Attaches the shared memory segment identified by shmid to the address space
of the calling process
Header file
#include <sys/shm.h>
Prototype
void *shmat(int shmid, const void *shmaddr, int shmflg)
Field Description
shmflg Specifies a set of flags that indicate the specific shared memory conditions and options to implement.
E.g. SHM_RDONLY - the segment is attached for reading and the process must have read permission for the segment.
SHM_REMAP - indicates that the mapping of the segment should replace any existing mapping in the range starting at shmaddr and continuing for the size of the segment.
c. shmdt
The shmdt() function detaches from the calling process's data segment the
shared memory segment located at the address specified by shmaddr.
Header File
#include <sys/shm.h>
Prototype
int shmdt(const void *shmaddr)
Field Description
shmaddr The data segment start address of a shared memory segment.
d. shmctl
Perform shared memory control operations
Header File
#include <sys/shm.h>
Prototype
int shmctl(int shmid, int cmd, struct shmid_ds *buf);
Field Description
shmid A unique positive integer created by a shmget system call and associated with a segment of shared memory.
cmd Specifies one of IPC_STAT, IPC_SET, or IPC_RMID.
buf Points to the data structure used for sending or receiving data during execution of shared memory control operations.
return value If successful, shmctl() returns zero. On failure, it returns -1.
Server Side
algorithm sharedMemoryServer(data) {
// data - content to be shared among processes
create a new shared memory segment (global data space) using the shmget() system call;
attach the global data space using the shmat( ) system call;
write data into the global data space;
wait for the client to read the data space;
}
Client side
algorithm sharedMemoryClient( ) {
get the shared memory segment created by the server using shmget( );
attach it using shmat( ) and read the data from the shared memory;
process the data;
}
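A minimal C sketch of the server side under the steps above (the key value 1234, the segment size, and the message are illustrative; a matching client would call shmget with the same key, attach with shmat, and read):

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void) {
    key_t key = 1234;                                /* illustrative key shared with the client */
    int shmid = shmget(key, 1024, IPC_CREAT | 0666); /* create a 1 KB segment */
    if (shmid < 0) { perror("shmget"); return 1; }

    char *data = (char *) shmat(shmid, NULL, 0);     /* attach to our address space */
    if (data == (char *) -1) { perror("shmat"); return 1; }

    strcpy(data, "hello via shared memory");         /* write into the global data space */
    printf("server wrote: %s\n", data);

    /* ... wait for the client here, then detach and remove the segment ... */
    shmdt(data);
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}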
MPI Programming for Process communication
MPI_Comm_size
Returns the total number of MPI processes in the specified communicator, such as
MPI_COMM_WORLD. If the communicator is MPI_COMM_WORLD, then it represents the number of MPI
tasks available to your application.
Syntax :
MPI_Comm_size (comm,&size)
MPI_Comm_rank
Returns the rank of the calling MPI process within the specified communicator. Initially, each process
will be assigned a unique integer rank between 0 and number of tasks - 1 within the communicator
MPI_COMM_WORLD. This rank is often referred to as a task ID. If a process becomes associated with
other communicators, it will have a unique rank within each of these as well.
Syntax :
MPI_Comm_rank (comm,&rank)
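A minimal sketch showing how these two calls are typically used together (compile with mpicc and run with mpiexec):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;

    MPI_Init(&argc, &argv);                        /* start up MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);       /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);       /* this process's rank */

    printf("Process %d of %d\n", my_rank, comm_sz);

    MPI_Finalize();                                /* shut down MPI */
    return 0;
}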
MPI_Send
Basic blocking send operation. Routine returns only after the application buffer in the sending task is
free for reuse.
Syntax :
int MPI_Send(
void* msg_buf_p,
int msg_size,
MPI_Datatype msg_type,
int destination,
int tag,
MPI_Comm communicator
);
The first three arguments, msg_buf_p, msg_size, and msg_type, determine the contents of the
message.
The remaining arguments, destination, tag, and communicator, determine the destination of the
message. The fourth argument, destination, specifies the rank of the process that should receive the
message. The fifth argument, tag, is a nonnegative int. It can be used to distinguish messages that are
otherwise identical.
MPI_Recv
Receive a message and block until the requested data is available in the application buffer in the
receiving task.
Syntax :
int MPI_Recv(
void* msg_buf_p,
int buf_size,
MPI_Datatype buf_type,
int source,
int tag,
MPI_Comm communicator,
MPI_Status* status_p);
The first three arguments specify the memory available for receiving the message: msg_buf_p
points to the block of memory, buf_size determines the number of objects that can be stored in the
block, and buf_type indicates the type of the objects.
The next three arguments identify the message. The source argument specifies the process from
which the message should be received. The tag argument should match the tag argument of the
message being sent, and the communicator argument must match the communicator used by the
sending process. For status_p, the special MPI constant MPI_STATUS_IGNORE can be passed.
Message Matching
Suppose process q calls MPI_Send with
MPI_Send(send_buf_p, send_buf_sz, send_type, dest, send_tag, send_comm);
Also suppose that process r calls MPI_Recv with
MPI_Recv(recv_buf_p, recv_buf_sz, recv_type, src, recv_tag, recv_comm, &status);
Then the message sent by q with the above call to MPI_Send can be received by r with the call to
MPI_Recv if
recv_comm = send_comm,
recv_tag = send_tag,
dest = r, and
src = q.
These conditions aren’t quite enough for the message to be successfully received, however. The
parameters specified by the first three pairs of arguments, send_buf_p/recv_buf_p,
send_buf_sz/recv_buf_sz, send_type/recv_type, must specify compatible buffers. Most of the
time, the following rule will suffice:
If recv_type = send_type and recv_buf_sz >= send_buf_sz, then the message sent by q
can be successfully received by r.
Pseudocode
Ping_Pong Communication :
If Process 0 :
Send a message to Process 1;
Receive the response from Process 1 and print it.
If Process 1 :
Receive the message from Process 0 and print it;
Send the acknowledgement back to Process 0.
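A minimal C sketch of this ping-pong exchange between ranks 0 and 1 (the message strings and tag value are illustrative; run with at least two processes):

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank;
    char msg[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {                        /* process 0: ping, then wait for the pong */
        strcpy(msg, "ping");
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(msg, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 0 received: %s\n", msg);
    } else if (my_rank == 1) {                 /* process 1: receive, print, acknowledge */
        MPI_Recv(msg, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received: %s\n", msg);
        strcpy(msg, "pong");
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}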
MPI_Reduce
MPI_Reduce combines the elements provided in the input buffer of each process in the group, using
the operation operator, and returns the combined value in the output buffer of the process with rank
dest_process.
Syntax :
int MPI_Reduce(
void* input_data_p,
void* output_data_p,
int count,
MPI_Datatype datatype,
MPI_Op operator,
int dest_process,
MPI_Comm comm
);
MPI_Allreduce
Combines values from all processes and distributes the result back to all processes.
Syntax :
int MPI_Allreduce(
void* input_data_p,
void* output_data_p,
int count,
MPI_Datatype datatype,
MPI_Op operator,
MPI_Comm comm
);
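A minimal sketch of a global sum with MPI_Reduce, where each process contributes its rank as the local value (this choice of local value is illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int local_val = my_rank;                   /* each process's contribution */
    int global_sum = 0;

    /* combine local_val from every process with MPI_SUM; the result lands on rank 0 */
    MPI_Reduce(&local_val, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("Sum of ranks 0..%d = %d\n", comm_sz - 1, global_sum);

    MPI_Finalize();
    return 0;
}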
MPI_Bcast
Broadcasts a message from the process with rank "source_proc" to all other processes of the
communicator.
Syntax :
int MPI_Bcast(
void* data_p,
int count,
MPI_Datatype datatype,
int source_proc,
MPI_Comm comm
);
MPI_Scatter
MPI_Scatter divides the data referenced by send_buf_p into comm_sz pieces.
Syntax :
int MPI_Scatter(
void* send_buf_p,
int send_count,
MPI_Datatype send_type,
void* recv_buf_p,
int recv_count,
MPI_Datatype recv_type,
int src_proc,
MPI_Comm comm
);
MPI_Type_create_struct
MPI_Type_create_struct is used to build a derived datatype that consists of individual elements that
have different basic types.
Syntax :
int MPI_Type_create_struct(
int count,
int array_of_blocklengths[],
MPI_Aint array_of_displacements[],
MPI_Datatype array_of_types[],
MPI_Datatype* new_type_p
);
MPI_Get_address
It returns the address of the memory location referenced by location_p.
Syntax :
int MPI_Get_address(
void* location_p,
MPI_Aint* address_p
);
MPI_Type_commit
It commits the MPI Derived datatype.
Syntax :
int MPI_Type_commit(
MPI_Datatype* new_mpi_t_p
);
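A hedged sketch of building and using a derived datatype for a struct holding one int and one double, following the three calls above (the record layout and field names are illustrative):

#include <stdio.h>
#include <mpi.h>

/* illustrative record: one int and one double */
typedef struct { int id; double marks; } record_t;

int main(int argc, char *argv[]) {
    int my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    record_t rec = {0, 0.0};
    if (my_rank == 0) { rec.id = 7; rec.marks = 91.5; }

    int blocklengths[2] = {1, 1};
    MPI_Datatype types[2] = {MPI_INT, MPI_DOUBLE};
    MPI_Aint displacements[2], base, addr;

    MPI_Get_address(&rec, &base);               /* base address of the struct */
    MPI_Get_address(&rec.id, &addr);
    displacements[0] = addr - base;
    MPI_Get_address(&rec.marks, &addr);
    displacements[1] = addr - base;

    MPI_Datatype record_type;
    MPI_Type_create_struct(2, blocklengths, displacements, types, &record_type);
    MPI_Type_commit(&record_type);              /* make the type usable in communication */

    MPI_Bcast(&rec, 1, record_type, 0, MPI_COMM_WORLD);
    printf("rank %d: id=%d marks=%.1f\n", my_rank, rec.id, rec.marks);

    MPI_Type_free(&record_type);
    MPI_Finalize();
    return 0;
}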
Pthread Programming for multithreaded programming
POSIX Threads, usually referred to as Pthreads, is an execution model that exists independently
from a language, as well as a parallel execution model. It allows a program to control multiple different
flows of work that overlap in time. Each flow of work is referred to as a thread, and creation and control
over these flows is achieved by making calls to the POSIX Threads API, which provides functions for
thread creation and synchronization.
Thread Creation
The pthread_create() function starts a new thread in the calling process.
int pthread_create(
pthread_t* thread_p,
const pthread_attr_t* attr_p,
void* (*start_routine)(void*),
void* arg_p
);
The function that’s started by pthread_create should have a prototype that looks something like this:
void* thread_function(void* args_p);
Thread Termination
The pthread_join() function waits for the thread specified by thread to terminate. If that thread
has already terminated, then pthread_join() returns immediately.
int pthread_join(
pthread_t thread,
void** ret_val_p
);
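A minimal sketch using pthread_create and pthread_join (the thread count and printed message are illustrative):

#include <stdio.h>
#include <pthread.h>

#define THREAD_COUNT 4   /* illustrative number of threads */

void* hello(void* rank_p) {
    long my_rank = (long) rank_p;              /* rank passed by value inside the pointer */
    printf("Hello from thread %ld\n", my_rank);
    return NULL;
}

int main(void) {
    pthread_t threads[THREAD_COUNT];

    for (long t = 0; t < THREAD_COUNT; t++)    /* start the threads */
        pthread_create(&threads[t], NULL, hello, (void*) t);

    for (long t = 0; t < THREAD_COUNT; t++)    /* wait for each thread to finish */
        pthread_join(threads[t], NULL);

    return 0;
}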
Busy Waiting
In busy waiting, a thread repeatedly tests a condition but effectively does no useful work until the
condition has the appropriate value. This is also referred to as polling. When two or more processes
want to enter the same critical section, something has to be done to prevent more than one process from
entering it.
Mutex
Mutex is an abbreviation of mutual exclusion, and a mutex is a special type of variable that, together
with a couple of special functions, can be used to restrict access to a critical section to a single thread at
a time. Thus, a mutex can be used to guarantee that one thread “excludes” all other threads while it
executes the critical section.
A variable of type pthread_mutex_t needs to be initialized by the system before it's used. This can be
done with a call
int pthread_mutex_init(
pthread_mutex_t* mutex_p,
const pthread_mutexattr_t* attr_p
);
We won't make use of the second argument, so we'll just pass in NULL. When a Pthreads program
finishes using a mutex, it should call
int pthread_mutex_destroy(
pthread_mutex_t* mutex_p
);
To gain access to a critical section, a thread calls
int pthread_mutex_lock(
pthread_mutex_t* mutex_p
);
When a thread is finished executing the code in a critical section, it should call
int pthread_mutex_unlock(pthread_mutex_t* mutex_p);
The call to pthread_mutex_lock will cause the thread to wait until no other thread is in the critical
section, and the call to pthread_mutex_unlock notifies the system that the calling thread has completed
execution of the code in the critical section.
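A short sketch of protecting a shared counter with a mutex (the counter, thread count, and iteration count are illustrative):

#include <stdio.h>
#include <pthread.h>

#define THREAD_COUNT 4
#define INCREMENTS   100000

long counter = 0;                              /* shared data updated in the critical section */
pthread_mutex_t counter_mutex;

void* increment(void* arg) {
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_mutex);    /* enter the critical section */
        counter++;
        pthread_mutex_unlock(&counter_mutex);  /* leave the critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t threads[THREAD_COUNT];
    pthread_mutex_init(&counter_mutex, NULL);

    for (int t = 0; t < THREAD_COUNT; t++)
        pthread_create(&threads[t], NULL, increment, NULL);
    for (int t = 0; t < THREAD_COUNT; t++)
        pthread_join(threads[t], NULL);

    printf("counter = %ld (expected %d)\n", counter, THREAD_COUNT * INCREMENTS);
    pthread_mutex_destroy(&counter_mutex);
    return 0;
}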
Pthread Semaphores
A semaphore is a variable or abstract data type that is used for controlling access, by multiple
processes or threads, to a common resource in a concurrent system such as a multiprogramming operating system.
Syntax
int sem_init(sem_t* semaphore_p, int shared, unsigned initial_val);
- sem_init() initializes the semaphore at the address pointed to by semaphore_p.
- The initial_val argument specifies the initial value for the semaphore.
- The shared argument indicates whether this semaphore is to be shared between the threads of a
process, or between processes.
int sem_destroy(sem_t* semaphore_p);
- Destroys the semaphore at the address pointed to by semaphore_p.
int sem_post(sem_t* semaphore_p);
int sem_wait(sem_t* semaphore_p);
- A thread that executes sem_wait will block if the semaphore is 0. If the semaphore is non-zero,
it will decrement the semaphore and proceed.
After executing the code in the critical section, a thread calls sem_post, which increments the
semaphore, and a thread waiting in sem_wait can proceed. When a thread cannot proceed until another
thread has taken some action, this is sometimes called producer-consumer synchronization.
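A minimal sketch of this producer-consumer style signalling with an unnamed semaphore from <semaphore.h> (the shared value is illustrative):

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t data_ready;              /* counts how many items are ready to consume */
int shared_value = 0;          /* illustrative shared data */

void* producer(void* arg) {
    shared_value = 42;         /* produce the data */
    sem_post(&data_ready);     /* signal the consumer */
    return NULL;
}

void* consumer(void* arg) {
    sem_wait(&data_ready);     /* block until the producer has posted */
    printf("consumed %d\n", shared_value);
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    sem_init(&data_ready, 0, 0);   /* shared between threads (0), initial value 0 */

    pthread_create(&cons, NULL, consumer, NULL);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    sem_destroy(&data_ready);
    return 0;
}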
Barrier Point
A barrier is a point where a thread waits for other threads and proceeds further only when a
predefined number of threads have reached the same barrier in their respective programs.
pthread_mutex_init
int pthread_mutex_init (
pthread_mutex_t *mutex,
const pthread_mutexattr_t *attr
);
The pthread_mutex_init() function initialises the mutex referenced by mutex with attributes specified by
attr. If attr is NULL, the default mutex attributes are used. Upon successful initialisation, the state of the
mutex becomes initialised and unlocked.
pthread_cond_init()
int pthread_cond_init (
pthread_cond_t *cond,
const pthread_condattr_t *attr
);
The function pthread_cond_init() initialises the condition variable referenced by cond with attributes
referenced by attr. If attr is NULL, the default condition variable attributes are used; the effect is the
same as passing the address of a default condition variable attributes object. Upon successful
initialisation, the state of the condition variable becomes initialised.
pthread_rwlock_rdlock() / pthread_rwlock_wrlock()
The pthread_rwlock_rdlock() function applies a read lock to the read-write lock referenced by
rwlock. The calling thread acquires the read lock if a writer does not hold the lock and there are no
writers blocked on the lock. The pthread_rwlock_wrlock() function applies the corresponding write
lock, which is acquired only when no other thread holds the read-write lock.
Syntax
int pthread_rwlock_rdlock(
pthread_rwlock_t *rwlock
);
int pthread_rwlock_tryrdlock(
pthread_rwlock_t *rwlock
);
int pthread_rwlock_wrlock(
pthread_rwlock_t *rwlock
);
int pthread_rwlock_init(
pthread_rwlock_t *rwlock,
const pthread_rwlockattr_t *attr
);
int pthread_rwlock_unlock(
pthread_rwlock_t *rwlock
);
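A short sketch of read-write locking around a shared value (the shared integer stands in for the concurrent list of the exercise; it is illustrative only):

#include <stdio.h>
#include <pthread.h>

pthread_rwlock_t rwlock;
int shared_data = 0;                    /* illustrative shared value */

void* reader(void* arg) {
    pthread_rwlock_rdlock(&rwlock);     /* many readers may hold this at once */
    printf("reader sees %d\n", shared_data);
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

void* writer(void* arg) {
    pthread_rwlock_wrlock(&rwlock);     /* exclusive access for the writer */
    shared_data++;
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

int main(void) {
    pthread_t r1, r2, w;
    pthread_rwlock_init(&rwlock, NULL);

    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r1, NULL, reader, NULL);
    pthread_create(&r2, NULL, reader, NULL);

    pthread_join(w, NULL);
    pthread_join(r1, NULL);
    pthread_join(r2, NULL);

    pthread_rwlock_destroy(&rwlock);
    return 0;
}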
OpenMP programming for concurrency
OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-
platform shared memory multiprocessing programming in C, C++, and Fortran, on most platforms,
processor architectures and operating systems, including Solaris, AIX, HP-UX, Linux, OS X, and Windows.
It consists of a set of compiler directives, library routines, and environment variables that influence run-
time behavior.
Reduction Clause
Syntax
reduction(<operator>: <variable list>)
Example
# pragma omp parallel num_threads(thread_count) \
reduction(+: global_result)
global_result += Local_trap(a, b, n);
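A complete, minimal sketch of the reduction clause computing the sum of the first N integers (N and thread_count are illustrative; compile with -fopenmp):

#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000;            /* illustrative problem size */
    int thread_count = 4;          /* illustrative thread count */
    double sum = 0.0;

    /* each thread accumulates a private copy of sum; OpenMP combines them with + */
    #pragma omp parallel num_threads(thread_count) reduction(+: sum)
    {
        int my_rank = omp_get_thread_num();
        int threads = omp_get_num_threads();
        for (int i = my_rank + 1; i <= n; i += threads)
            sum += i;              /* this thread's share of 1..n */
    }

    printf("sum of 1..%d = %.0f\n", n, sum);
    return 0;
}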
Program Implementation
Sum of ‘n’ Elements
Procedure
The parallel for directive forks a team of threads to execute the following structured block. The structured
block following the parallel for directive must be a for loop. Furthermore, with the parallel for directive
the system parallelizes the for loop by dividing the iterations of the loop among the threads.
Example
fibo[0] = fibo[1] = 1;
# pragma omp parallel for num_threads(thread_count)
for (i = 2; i < n; i++)
fibo[i] = fibo[i-1] + fibo[i-2];
Note that this loop has a loop-carried dependence (each iteration uses the two previous results), so the
compiler will parallelize it but the computed values may be wrong; it is shown here to illustrate the
directive's syntax and this pitfall.
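A minimal sketch of the Sum of 'n' Elements implementation using the parallel for directive with a reduction (the array contents and thread count are illustrative):

#include <stdio.h>
#include <omp.h>

#define N 1000                 /* illustrative number of elements */

int main(void) {
    int a[N];
    long sum = 0;
    int thread_count = 4;      /* illustrative thread count */

    for (int i = 0; i < N; i++)
        a[i] = i + 1;          /* fill with 1..N */

    /* iterations of the loop are divided among the threads;
       the reduction clause combines each thread's partial sum */
    #pragma omp parallel for num_threads(thread_count) reduction(+: sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum of %d elements = %ld\n", N, sum);
    return 0;
}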