Handbook HPC 23-24

Savitribai Phule Pune University

Fourth Year of Computer Engineering (2019 Course)


410250: High Performance Computing
Teaching Scheme: TH: 3 Hours/Week
Credit: 3
Examination Scheme: In-Sem (TH): 30, End-Sem (TH): 70

Prerequisite Courses: Microprocessor (210254), Principles of Programming Languages (210255), Computer Networks and Security (310244)
Companion Course: Laboratory Practice V (410254)

Course Objectives:
• To understand different parallel programming models
• To analyze the performance and modeling of parallel programs
• To illustrate the various techniques used to parallelize an algorithm
• To implement parallel communication operations
• To describe the CUDA architecture and its components
• To understand the scope of parallel computing and its search algorithms

Course Outcomes:
CO1: Understand various parallel paradigms
CO2: Design and develop an efficient parallel algorithm to solve a given problem
CO3: Illustrate data communication operations on various parallel architectures
CO4: Analyze and measure the performance of modern parallel computing systems
CO5: Apply the CUDA architecture for parallel programming
CO6: Analyze the performance of HPC applications

Course Contents
Unit I: Introduction to Parallel Computing (07 Hours)
Introduction to Parallel Computing: Motivating Parallelism, Modern Processor: Stored-program
computer architecture, General-purpose Cache-based Microprocessor architecture. Parallel Programming
Platforms: Implicit Parallelism, Dichotomy of Parallel Computing Platforms, Physical Organization of
Parallel Platforms, Communication Costs in Parallel Machines. Levels of parallelism, Models: SIMD,
MIMD, SIMT, SPMD, Data Flow Models, Demand-driven Computation, Architectures: N-wide
superscalar architectures, multi-core, multi-threaded.
#Exemplar/Case Studies Case study: Multi-core System
*Mapping of Course Outcomes for Unit I - CO1
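The multi-core case study above can be made concrete with a small sketch. This is an illustrative example, not part of the official syllabus; the names `partial_sum` and `parallel_sum` are hypothetical. It uses the thread-backed `multiprocessing.dummy.Pool` so it runs anywhere; replacing it with `multiprocessing.Pool` executes the same SPMD-style code on separate cores.

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing API

def partial_sum(chunk):
    # Every worker runs the same function on its own slice of the data (SPMD).
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Block decomposition: split the data into one contiguous chunk per worker.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum(range(1000)))  # 499500
```

The structure (decompose, map to workers, combine) is the same pattern the unit's SIMD/MIMD/SPMD taxonomy classifies at the hardware level.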

Unit II: Parallel Algorithm Design (07 Hours)


Principles of Parallel Algorithm Design: Preliminaries, Decomposition Techniques, Characteristics of
Tasks and Interactions, Mapping Techniques for Load Balancing, Methods for Containing Interaction
Overheads, Parallel Algorithm Models: Data, Task, Work Pool and Master Slave Model, Complexities:
Sequential and Parallel Computational Complexity, Anomalies in Parallel Algorithms.
#Exemplar/Case Studies Foster's parallel algorithm design methodology.
(http://compsci.hunter.cuny.edu/~sweiss/course_materials/csci493.65/lecture_notes/chapter03.pdf)
*Mapping of Course Outcomes for Unit II CO2
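Two of the mapping techniques for load balancing named above, block and cyclic mapping, can be sketched in a few lines. This is an illustrative sketch under the assumption that tasks are identified by integer ids; the function names are my own, not from the syllabus.

```python
def block_mapping(tasks, p):
    """Assign contiguous blocks of tasks to each of p processes."""
    size = (len(tasks) + p - 1) // p
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]

def cyclic_mapping(tasks, p):
    """Deal tasks out round-robin across p processes."""
    return [tasks[i::p] for i in range(p)]

tasks = list(range(8))          # task ids 0..7
print(block_mapping(tasks, 2))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(cyclic_mapping(tasks, 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Cyclic mapping tends to balance load better when task sizes grow or shrink systematically with the task index, which is exactly the trade-off this unit's mapping techniques address.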

Unit III: Parallel Communication (07 Hours)


Basic Communication: One-to-All Broadcast, All-to-One Reduction, All-to-All Broadcast and Reduction,
All-Reduce and Prefix-Sum Operations, Collective Communication using MPI: Scatter, Gather, Broadcast,
Blocking and non-blocking MPI, All-to-All Personalized Communication, Circular Shift, Improving the
speed of some communication operations.
#Exemplar/Case Studies Monte-Carlo Pi computing using MPI
*Mapping of Course Outcomes for Unit III CO3
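The prefix-sum operation above can be sketched as a sequential simulation of the standard hypercube algorithm (no real message passing; `hypercube_prefix_sum` is a hypothetical name, and the node count must be a power of two).

```python
def hypercube_prefix_sum(values):
    """Simulate the hypercube prefix-sum: each 'node' holds one value."""
    p = len(values)          # number of nodes, assumed a power of two
    prefix = list(values)    # running prefix sum at each node
    subtotal = list(values)  # sum over the node's current subcube
    step = 1
    while step < p:
        new_sub = subtotal[:]
        for node in range(p):
            partner = node ^ step            # exchange along one hypercube dimension
            new_sub[node] = subtotal[node] + subtotal[partner]
            if partner < node:               # only lower-numbered partners contribute
                prefix[node] += subtotal[partner]
        subtotal = new_sub
        step <<= 1
    return prefix

print(hypercube_prefix_sum([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

The loop runs log2(p) communication steps, matching the cost analysis of the all-reduce and prefix-sum operations in this unit.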

Unit IV: Analytical Modeling of Parallel Programs (07 Hours)


Sources of Overhead in Parallel Programs, Performance Measures and Analysis: Amdahl's and Gustafson's
Laws, Speedup Factor and Efficiency, Cost and Utilization, Execution Rate and Redundancy, The Effect of
Granularity on Performance, Scalability of Parallel Systems, Minimum Execution Time and Minimum
Cost, Optimal Execution Time, Asymptotic Analysis of Parallel Programs. Matrix Computation:
Matrix-Vector Multiplication, Matrix-Matrix Multiplication.
#Exemplar/Case Studies The DAG Model of parallel computation
*Mapping of Course Outcomes for Unit IV CO4
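The two speedup laws named above lend themselves to a short worked example. A minimal sketch (function names are my own, not from the syllabus): Amdahl's law gives fixed-size speedup S = 1 / (f + (1 - f)/p) for serial fraction f on p processors, while Gustafson's law gives scaled speedup S = p - f(p - 1).

```python
def amdahl_speedup(f, p):
    """Amdahl's law: fixed problem size, serial fraction f, p processors."""
    return 1.0 / (f + (1.0 - f) / p)

def gustafson_speedup(f, p):
    """Gustafson's law: problem size scales with the number of processors."""
    return p - f * (p - 1)

# With 10% serial work on 8 processors the two laws diverge noticeably:
print(round(amdahl_speedup(0.1, 8), 2))     # 4.71
print(round(gustafson_speedup(0.1, 8), 2))  # 7.3
```

Amdahl's bound shows why the serial fraction dominates at scale; Gustafson's scaled view explains why larger problem sizes still profit from more processors.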

Unit V: CUDA Architecture (07 Hours)


Introduction to GPU: GPU Architecture overview, Introduction to CUDA C: CUDA programming model,
write and launch a CUDA kernel, Handling Errors, CUDA memory model, Manage communication and
synchronization, Parallel programming in CUDA-C.
#Exemplar/Case Studies GPU applications using SYCL and CUDA on NVIDIA
*Mapping of Course Outcomes for Unit V CO5
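The kernel-launch and memory-model topics above can be sketched in CUDA C. This is an illustrative sketch, not prescribed lab code: it assumes a CUDA-capable GPU and the CUDA toolkit, the array names are hypothetical, and per-call error checking (inspecting each `cudaMalloc`/`cudaMemcpy` return value and `cudaGetLastError` after the launch) is abbreviated for space.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: each thread adds one element pair.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 10;
    size_t bytes = n * sizeof(float);
    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = i; hb[i] = 2 * i; }

    float *da, *db, *dc;                              // device (GPU) memory
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch: grid of (n+255)/256 blocks, 256 threads per block.
    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost); // synchronizes with the kernel

    printf("c[10] = %.1f\n", hc[10]);                  // expect 30.0 (10 + 20)
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The host/device split, the `<<<grid, block>>>` launch arguments, and the explicit host-to-device copies correspond directly to the CUDA programming model and memory model items in this unit.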

Unit VI: High Performance Computing Applications (07 Hours)


Scope of Parallel Computing, Parallel Search Algorithms: Depth First Search (DFS), Breadth First
Search (BFS), Parallel Sorting: Bubble and Merge, Distributed Computing: Document classification,
Frameworks: Kubernetes, GPU Applications, Parallel Computing for AI/ML
#Exemplar/Case Studies Disaster detection and management/ Smart Mobility/ Urban planning
*Mapping of Course Outcomes for Unit VI CO6
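The parallel bubble-sort formulation above is usually taught as odd-even transposition sort. The following is an illustrative sequential simulation (the function name is my own): in a true parallel version, all compare-exchanges within one phase run concurrently on different nodes.

```python
def odd_even_transposition_sort(a):
    """Odd-even transposition sort: n phases of independent compare-exchanges."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        start = phase % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```

Since the pairs within a phase are disjoint, each phase is one parallel step, giving n steps on n processors versus the O(n^2) comparisons of sequential bubble sort.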
Learning Resources
Text Books:
1. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, "Introduction to Parallel
Computing", 2nd edition, Addison-Wesley, 2003, ISBN: 0-201-64865-2
2. Seyed H. Roosta, "Parallel Processing and Parallel Algorithms: Theory and Computation",
Springer-Verlag, 2000, ISBN 978-1-4612-7048-5, ISBN 978-1-4612-1220-1
3. John Cheng, Max Grossman, and Ty McKercher, "Professional CUDA C Programming", John Wiley &
Sons, Inc., ISBN: 978-1-118-73932-7
Reference Books:
1. Kai Hwang, "Scalable Parallel Computing", McGraw-Hill, 1998
2. George S. Almasi and Alan Gottlieb, "Highly Parallel Computing", The Benjamin/Cummings
Publishing Co., Inc.
3. Jason Sanders and Edward Kandrot, "CUDA by Example", Addison-Wesley, ISBN-13: 978-0-13-138768-3
4. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan Kaufmann Publishers,
ISBN 978-0-12-374260-5
5. Eleanor G. Rieffel and Wolfgang H. Polak, "Quantum Computing: A Gentle Introduction", MIT Press,
2011, ISBN 978-0-262-01506-6
6. Ajay D. Kshemkalyani and Mukesh Singhal, "Distributed Computing: Principles, Algorithms, and
Systems", Cambridge University Press, March 2011, ISBN: 9780521189842

e-Books:
1. http://prdrklaina.weebly.com/uploads/5/7/7/3/5773421/introduction_to_high_performance_computing_for_scientists_and_engineers.pdf
2. https://www.vssut.ac.in/lecture_notes/lecture1428643084.pdf

NPTEL/YouTube video lecture link


● https://nptel.ac.in/courses/106108055
● https://www.digimat.in/nptel/courses/video/106104120/L01.html
Sinhgad Technical Education Society's

RMD SINHGAD SCHOOL OF ENGINEERING, PUNE

Department of Computer Engineering
TEACHING PLAN
Academic Year: 2023-24 (Semester: VIII)
Course Title: High Performance Computing
Subject Code: 410250
Class: B.E. Division: A and B
Term: I
Date of commencement of classes: 11/12/2023
Date of conclusion of teaching: 19/04/2024
Lecture Schedule: 3 Hrs/Week
Practical/Tutorial Schedule: -
Examination Scheme: Theory: 100 M (In-Sem: 30 M, 1 Hr; End-Sem: 70 M, 2 Hrs 30 min); Term Work: -; Practical: -; Oral: -
Subject Teachers: Mrs. P. V. Kasture, Mrs. H. Kumbhar
Previous 3 Years' University Result (2020-21 / 2021-22 / 2022-23):

UNIT – I: Introduction to Parallel Computing (07 Hours)

Introduction to Parallel Computing: Motivating Parallelism, Modern Processor: Stored-program
computer architecture, General-purpose Cache-based Microprocessor architecture.
Parallel Programming Platforms: Implicit Parallelism, Dichotomy of Parallel Computing
Platforms, Physical Organization of Parallel Platforms, Communication Costs in Parallel
Machines. Levels of parallelism, Models: SIMD, MIMD, SIMT, SPMD, Data Flow Models,
Demand-driven Computation, Architectures: N-wide superscalar architectures, multi-core,
multi-threaded.

Lecture plan (planned topics per lecture; the actual date, topics covered, and reasons for deviation are filled in during the term):
1. Introduction to Parallel Computing: Motivating Parallelism
2. Modern Processor: Stored-program computer architecture, General-purpose Cache-based Microprocessor architecture
3. Parallel Programming Platforms: Implicit Parallelism, Dichotomy of Parallel Computing Platforms
4. Physical Organization of Parallel Platforms, Communication Costs in Parallel Machines
5. Levels of parallelism, Models: SIMD, MIMD, SIMT, SPMD
6. Data Flow Models, Demand-driven Computation
7. Architectures: N-wide superscalar architectures, multi-core, multi-threaded

Make up Classes:

Contents Beyond Syllabus:

UNIT – II: Parallel Algorithm Design (07 Hours)

Principles of Parallel Algorithm Design: Preliminaries, Decomposition Techniques, Characteristics
of Tasks and Interactions, Mapping Techniques for Load Balancing, Methods for Containing
Interaction Overheads, Parallel Algorithm Models: Data, Task, Work Pool and Master Slave Model,
Complexities: Sequential and Parallel Computational Complexity, Anomalies in Parallel Algorithms.

Lecture plan (planned topics per lecture; the actual date, topics covered, and reasons for deviation are filled in during the term):
8. Principles of Parallel Algorithm Design: Preliminaries
9. Decomposition Techniques
10. Characteristics of Tasks and Interactions, Mapping Techniques for Load Balancing
11. Methods for Containing Interaction Overheads
12. Parallel Algorithm Models: Data, Task, Work Pool and Master Slave Model
13. Complexities: Sequential and Parallel Computational Complexity
14. Anomalies in Parallel Algorithms

Make up Classes:

Contents Beyond Syllabus:

UNIT – III: Parallel Communication (07 Hours)

Basic Communication: One-to-All Broadcast, All-to-One Reduction, All-to-All Broadcast and
Reduction, All-Reduce and Prefix-Sum Operations, Collective Communication using MPI: Scatter,
Gather, Broadcast, Blocking and non-blocking MPI, All-to-All Personalized Communication,
Circular Shift, Improving the speed of some communication operations.

Lecture plan (planned topics per lecture; the actual date, topics covered, and reasons for deviation are filled in during the term):
15. Basic Communication: One-to-All Broadcast, All-to-One Reduction
16. All-to-All Broadcast and Reduction
17. All-Reduce and Prefix-Sum Operations
18. Collective Communication using MPI: Scatter
19. Gather, Broadcast, Blocking and non-blocking MPI
20. All-to-All Personalized Communication
21. Circular Shift, Improving the speed of some communication operations

Make up Classes:

Contents Beyond Syllabus:


UNIT – IV: Analytical Modeling of Parallel Programs (07 Hours)

Sources of Overhead in Parallel Programs, Performance Measures and Analysis: Amdahl's
and Gustafson's Laws, Speedup Factor and Efficiency, Cost and Utilization, Execution Rate
and Redundancy, The Effect of Granularity on Performance, Scalability of Parallel
Systems, Minimum Execution Time and Minimum Cost, Optimal Execution Time,
Asymptotic Analysis of Parallel Programs. Matrix Computation: Matrix-Vector
Multiplication, Matrix-Matrix Multiplication.

Lecture plan (planned topics per lecture; the actual date, topics covered, and reasons for deviation are filled in during the term):
22. Sources of Overhead in Parallel Programs
23. Performance Measures and Analysis: Amdahl's and Gustafson's Laws
24. Speedup Factor and Efficiency, Cost and Utilization, Execution Rate and Redundancy
25. The Effect of Granularity on Performance, Scalability of Parallel Systems
26. Minimum Execution Time and Minimum Cost, Optimal Execution Time
27. Asymptotic Analysis of Parallel Programs
28. Matrix Computation: Matrix-Vector Multiplication, Matrix-Matrix Multiplication

Make up Classes:

Contents Beyond Syllabus:

UNIT – V: CUDA Architecture (07 Hours)

Introduction to GPU: GPU Architecture overview, Introduction to CUDA C: CUDA programming
model, write and launch a CUDA kernel, Handling Errors, CUDA memory model,
Manage communication and synchronization, Parallel programming in CUDA-C.

Lecture plan (planned topics per lecture; the actual date, topics covered, and reasons for deviation are filled in during the term):
29. Introduction to GPU: GPU Architecture overview
30. Introduction to CUDA C
31. CUDA programming model, write and launch a CUDA kernel
32. Handling Errors in CUDA C
33. CUDA memory model
34. Manage communication and synchronization
35. Parallel programming in CUDA-C

Make up Classes:

Contents Beyond Syllabus:

UNIT – VI: High Performance Computing Applications (07 Hours)

Scope of Parallel Computing, Parallel Search Algorithms: Depth First Search (DFS), Breadth First
Search (BFS), Parallel Sorting: Bubble and Merge, Distributed Computing: Document
classification, Frameworks: Kubernetes, GPU Applications, Parallel Computing for AI/ML.

Lecture plan (planned topics per lecture; the actual date, topics covered, and reasons for deviation are filled in during the term):
36. Scope of Parallel Computing, Parallel Search Algorithms
37. Depth First Search (DFS)
38. Breadth First Search (BFS)
39. Parallel Sorting: Bubble Sort
40. Parallel Sorting: Merge Sort
41. Distributed Computing: Document classification, Frameworks: Kubernetes
42. GPU Applications, Parallel Computing for AI/ML

Make up Classes:

Contents Beyond Syllabus:

Date: Name and Sign of Subject Teacher Head of Department

SUMMARY
No. of lectures allotted by university: 42

Total no. of lectures conducted:

Percentage of syllabus covered:

Total no. of makeup classes: 00

Date: Name and Sign of Subject Teacher Head of Department


Subject: High Performance Computing

BE Computer – SEM II

Course outcomes from SPPU syllabus


On successful completion of this course, the learner will be able to:

CO1: To understand different parallel programming models

CO2: To analyze the performance and modeling of parallel programs

CO3: To illustrate the various techniques used to parallelize an algorithm

CO4: To implement parallel communication operations

CO5: To describe the CUDA architecture and its components

CO6: To understand the scope of parallel computing and its search algorithms

PO of Computer Engineering Department


PO1- Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals,
and an engineering specialization to the solution of complex engineering problems.
PO2- Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.
PO3- Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.
PO4- Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.
PO5- Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations.
PO6- The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional
engineering practice.
PO7- Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
PO8- Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.
SINHGAD TECHNICAL EDUCATION SOCIETY'S
RMD SINHGAD SCHOOL OF ENGINEERING, WARJE-58
DEPARTMENT OF COMPUTER ENGINEERING

Unit Test I, 2023-24 (A Set)

Subject: High Performance Computing
Duration: 1 Hour. Maximum Marks: 30
Instructions: Solve any one question from Q.1 or Q.2, and any one from Q.3 or Q.4.

Q.1 a) Write a note on: Bus-Based Networks, Crossbar Networks, Fully Connected Networks, Meshes, and Tree-Based Networks. [6 marks, CO3, PO1]
    b) Explain the dichotomy of parallel computing platforms. [5 marks, CO3, PO1]
    c) Explain the following architectures in detail: i) N-wide superscalar architecture, ii) multi-threaded architecture. [4 marks, CO3, PO1]
OR
Q.2 a) Write about the scope of parallelism. [6 marks, CO3, PO1]
    b) Explain the communication model of parallel platforms. [5 marks, CO3, PO1]
    c) Write a short note on the motivation for parallelism. [4 marks, CO3, PO1]

Q.3 a) Explain the principles of parallel algorithm design. [6 marks, CO4, PO1]
    b) Explain the characteristics of tasks and interactions. [5 marks, CO4, PO2]
    c) Explain decomposition, tasks, and dependency graphs. [4 marks, CO4, PO2]
OR
Q.4 a) Explain parallel algorithm models. [6 marks, CO4, PO1]
    b) Explain the different methods for containing interaction overheads. [5 marks, CO4, PO1]
    c) Explain, with a suitable example for each: 1) recursive decomposition, 2) data decomposition. [4 marks, CO4, PO1]
SINHGAD TECHNICAL EDUCATION SOCIETY'S
RMD SINHGAD SCHOOL OF ENGINEERING, WARJE-58
DEPARTMENT OF COMPUTER ENGINEERING

Unit Test I, 2023-24 (B Set)

Subject: High Performance Computing
Duration: 1 Hour. Maximum Marks: 30
Instructions: Solve any one question from Q.1 or Q.2, and any one from Q.3 or Q.4.

Q.1 a) Write a note on: Bus-Based Networks, Crossbar Networks, Fully Connected Networks, Meshes, and Tree-Based Networks. [6 marks, CO3, PO3]
    b) Differentiate between NUMA and UMA. [5 marks, CO3, PO1]
    c) Explain multi-core processor architecture. [4 marks, CO3, PO1]
OR
Q.2 a) Explain the basic working principle of a VLIW processor. [6 marks, CO3, PO1]
    b) Explain the physical organization of parallel platforms. [5 marks, CO3, PO1]
    c) Explain N-wide superscalar architecture. [4 marks, CO3, PO1]

Q.3 a) Explain recursive decomposition with a suitable example. [6 marks, CO4, PO1]
    b) Explain the characteristics of tasks with respect to: task generation, task sizes, and the size of data associated with tasks. [5 marks, CO4, PO2]
    c) Explain randomized block distribution and hierarchical mappings. [4 marks, CO4, PO2]
OR
Q.4 a) What are the different mapping techniques for load balancing? [6 marks, CO4, PO1]
    b) Explain the different methods for containing interaction overheads. [5 marks, CO4, PO1]
    c) Explain decomposition, tasks, and dependency graphs. [4 marks, CO4, PO1]
SINHGAD TECHNICAL EDUCATION SOCIETY'S
RMD SINHGAD SCHOOL OF ENGINEERING, WARJE-58
DEPARTMENT OF COMPUTER ENGINEERING

Unit Test II, 2023-24 (A Set)

Subject: High Performance Computing
Duration: 1 Hour. Maximum Marks: 30
Instructions: Solve any one question from Q.1 or Q.2, and any one from Q.3 or Q.4.

Q.1 a) Describe the following for the one-to-all broadcast and all-to-one reduction communication operations: linear array, mesh, hypercube. [6 marks, CO3, PO3]
    b) Explain the prefix-sum operation for an eight-node hypercube. [5 marks, CO3, PO1]
    c) Explain the concepts of scatter and gather. [4 marks, CO3, PO1]
OR
Q.2 a) What is the all-to-all broadcast communication operation? Explain all-to-all broadcast on an eight-node ring with step-wise diagrams (show the first two steps and the last communication step). [6 marks, CO3, PO1]
    b) Explain the circular shift operation on mesh and hypercube networks. [5 marks, CO3, PO1]
    c) Explain in detail blocking and non-blocking communication using MPI. [4 marks, CO3, PO1]

Q.3 a) Explain the various sources of overhead in parallel systems. [6 marks, CO4, PO1]
    b) Write a note on minimum and cost-optimal execution time. [5 marks, CO4, PO2]
    c) Explain the effects of granularity on the performance of a parallel system. [4 marks, CO4, PO2]
OR
Q.4 a) Explain a parallel matrix-matrix multiplication algorithm with an example. [6 marks, CO4, PO3]
    b) Explain the different performance metrics for parallel systems. [5 marks, CO4, PO1]
    c) Explain "scaling down (downsizing)" a parallel system with an example. [4 marks, CO4, PO1]
SINHGAD TECHNICAL EDUCATION SOCIETY'S
RMD SINHGAD SCHOOL OF ENGINEERING, WARJE-58
DEPARTMENT OF COMPUTER ENGINEERING

Unit Test II, 2023-24 (B Set)

Subject: High Performance Computing
Duration: 1 Hour. Maximum Marks: 30
Instructions: Solve any one question from Q.1 or Q.2, and any one from Q.3 or Q.4.

Q.1 a) Explain the broadcast and reduce operations with the help of diagrams. [6 marks, CO3, PO3]
    b) Write a short note on all-to-one reduction with a suitable example. [5 marks, CO3, PO1]
    c) Write a note on: total exchange on a ring and mesh. [4 marks, CO3, PO1]
OR
Q.2 a) Explain the all-reduce and prefix-sum operations. [6 marks, CO3, PO1]
    b) Explain all-to-all personalized communication and its applications. [5 marks, CO3, PO1]
    c) How can the speed of communication operations be improved? [4 marks, CO3, PO1]

Q.3 a) Explain the performance metrics for parallel systems. [6 marks, CO4, PO1]
    b) Explain matrix-vector multiplication. [5 marks, CO4, PO2]
    c) What are the scalability characteristics of parallel programs? [4 marks, CO4, PO2]
OR
Q.4 a) What are the different partitioning techniques used in matrix-vector multiplication? [6 marks, CO4, PO3]
    b) Explain Cannon's algorithm for matrix-matrix multiplication in detail. [5 marks, CO4, PO3]
    c) Explain the sources of overhead in parallel systems. [4 marks, CO4, PO3]
SINHGAD TECHNICAL EDUCATION SOCIETY'S
RMD SINHGAD SCHOOL OF ENGINEERING, WARJE-58
DEPARTMENT OF COMPUTER ENGINEERING

Preliminary Examination 2023-24 (A Set)

Subject: High Performance Computing
Duration: 2½ Hours. Maximum Marks: 70
Instructions: Solve one question from each pair: Q.1 or Q.2, Q.3 or Q.4, Q.5 or Q.6, Q.7 or Q.8.

Q.1 a) Explain with a diagram one-to-all broadcast on an eight-node ring using the recursive doubling technique, with node 0 as the source of the broadcast. Also explain all-to-one reduction with node 0 as the destination. [7 marks, CO3, PO3]
    b) Explain the scatter and gather communication operations with diagrams. [6 marks, CO3, PO1]
    c) Explain the circular shift operation. [4 marks, CO3, PO1]
OR
Q.2 a) What is the all-to-all broadcast communication operation? Explain all-to-all broadcast on an eight-node ring with step-wise diagrams (show the first two steps and the last communication step). [7 marks, CO3, PO2]
    b) Explain in detail blocking and non-blocking communication using MPI. [6 marks, CO3, PO1]
    c) Write a short note on the prefix-sum operation. [4 marks, CO3, PO1]

Q.3 a) Explain the various sources of overhead in parallel systems. [7 marks, CO4, PO1]
    b) What is granularity? What are the effects of granularity on the performance of parallel systems? [6 marks, CO4, PO2]
    c) Explain "scaling down (downsizing)" a parallel system with an example. [4 marks, CO4, PO2]
OR
Q.4 a) Explain a parallel matrix-matrix multiplication algorithm with an example. [7 marks, CO4, PO3]
    b) Explain the different performance metrics for parallel systems. [6 marks, CO4, PO1]
    c) What are the scalability characteristics of parallel programs? [4 marks, CO4, PO1]

Q.5 a) What is a kernel in CUDA? What is a kernel launch? Explain the arguments that can be specified in a kernel launch. [8 marks, CO5, PO3]
    b) Draw the diagram of the CUDA memory hierarchy. [6 marks, CO5, PO1]
    c) Describe the processing flow of a CUDA-C program with a diagram. [4 marks, CO5, PO2]
OR
Q.6 a) What is CUDA? Explain the different programming languages supported in CUDA. Discuss any three applications of CUDA. [8 marks, CO5, PO2]
    b) Justify parallel programming in CUDA-C. [6 marks, CO5, PO1]
    c) Explain the following terms in CUDA: device, host, device code, kernel. [4 marks, CO5, PO1]

Q.7 a) Explain the different communication strategies in parallel best-first tree search. [8 marks, CO6, PO2]
    b) Explain odd-even transposition in bubble sort using the parallel formulation. [6 marks, CO6, PO2]
    c) Explain the parallel depth-first search algorithm in detail. [4 marks, CO6, PO1]
OR
Q.8 a) What are the issues in sorting on parallel computers? Explain with an appropriate example. [8 marks, CO6, PO1]
    b) What is Kubernetes? Explain its features and applications. [6 marks, CO6, PO2]
    c) Write a short note on: 1) parallel merge sort, 2) GPU applications. [4 marks, CO6, PO2]
SINHGAD TECHNICAL EDUCATION SOCIETY'S
RMD SINHGAD SCHOOL OF ENGINEERING, WARJE-58
DEPARTMENT OF COMPUTER ENGINEERING

Preliminary Examination 2023-24 (B Set)

Subject: High Performance Computing
Duration: 2½ Hours. Maximum Marks: 70
Instructions: Solve one question from each pair: Q.1 or Q.2, Q.3 or Q.4, Q.5 or Q.6, Q.7 or Q.8.

Q.1 a) Briefly explain one-to-all broadcast and all-to-one reduction on an eight-node hypercube. How is the cost of communication found for one-to-all broadcast on an eight-node hypercube? [7 marks, CO3, PO3]
    b) Explain the different approaches to communication operations. What is the total exchange method? [6 marks, CO3, PO1]
    c) Explain improving the speed of some communication operations. [4 marks, CO3, PO1]
OR
Q.2 a) With suitable diagrams and examples, explain all-to-all broadcast and all-to-all reduction. [7 marks, CO3, PO2]
    b) Explain the scatter and gather communication operations with diagrams. [6 marks, CO3, PO1]
    c) Write a short note on the prefix-sum operation. [4 marks, CO3, PO1]

Q.3 a) Explain the performance metrics of parallel systems. [7 marks, CO4, PO2]
    b) What is granularity? What are the effects of granularity on the performance of parallel systems? [6 marks, CO4, PO2]
    c) Explain the sources of overhead in parallel systems. [4 marks, CO4, PO1]
OR
Q.4 a) Explain a parallel matrix-matrix multiplication algorithm with an example. [7 marks, CO4, PO1]
    b) Write a note on minimum and cost-optimal execution time. [6 marks, CO4, PO1]
    c) What are the scalability characteristics of parallel programs? [4 marks, CO4, PO1]

Q.5 a) What is CUDA? Draw and explain the CUDA architecture in detail. [8 marks, CO5, PO1]
    b) Justify parallel programming in CUDA-C. [6 marks, CO5, PO2]
    c) What is a kernel in CUDA? What is a kernel launch? [4 marks, CO5, PO2]
OR
Q.6 a) Explain how a CUDA-C program executes at the kernel level, with an example. [8 marks, CO5, PO2]
    b) Describe CUDA communication and synchronization along with the relevant CUDA-C functions. [6 marks, CO5, PO1]
    c) Write a short note on: managing GPU memory. [4 marks, CO5, PO1]

Q.7 a) What are the issues in sorting on parallel computers? Explain with an appropriate example. [8 marks, CO6, PO2]
    b) Explain the terms, with examples: i) bitonic sequence, ii) bitonic sort, iii) bitonic merge, iv) bitonic split. [6 marks, CO6, PO1]
    c) Explain parallel depth-first search with an example. [4 marks, CO6, PO2]
OR
Q.8 a) Explain odd-even transposition in bubble sort using the parallel formulation. Give one step-wise example solution using odd-even transposition. [8 marks, CO6, PO2]
    b) What is Kubernetes? Explain its features and applications. [6 marks, CO6, PO1]
    c) Indicate the sorting issues on parallel computers. [4 marks, CO6, PO1]
