
NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

GPU Architectures and Programming
Assignment - Week 3

TYPE OF QUESTION: Objective

Number of questions: 10
Total marks: 10 × 1 = 10

QUESTION 1:
How are CUDA threads invoked to execute a kernel from the host?
Options:
A) Using a loop structure
B) With the <<<...>>> execution configuration syntax
C) By specifying thread IDs in the main function
D) Automatically by the GPU scheduler
Answer:
B) With the <<<...>>> execution configuration syntax
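For illustration, a minimal sketch of the launch pattern (the kernel name doubleAll and the configuration values are hypothetical):

#include <cuda_runtime.h>

__global__ void doubleAll(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // grid-wide thread index
    if (i < n) data[i] *= 2.0f;                     // each thread handles one element
}

int main() {
    float *d_data;
    cudaMalloc((void**)&d_data, 1024 * sizeof(float));
    // The host invokes the kernel with the <<<blocks, threadsPerBlock>>> syntax:
    doubleAll<<<4, 256>>>(d_data, 1024);  // 4 blocks x 256 threads = 1024 threads
    cudaDeviceSynchronize();              // wait for the kernel to finish
    cudaFree(d_data);
    return 0;
}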
QUESTION 2:
What is the purpose of the threadIdx built-in variable in a CUDA kernel?
Options:
A) Provides a random number
B) Identifies the current CUDA block
C) Gives the total number of threads
D) Provides a unique identifier for each thread
Answer:
D) Provides a unique identifier for each thread
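A short sketch (kernel name hypothetical): threadIdx identifies a thread within its own block, and combining it with blockIdx and blockDim yields a grid-wide unique index:

#include <cstdio>

__global__ void showIds() {
    int local  = threadIdx.x;                            // index within this block
    int global = blockIdx.x * blockDim.x + threadIdx.x;  // unique index across the grid
    printf("block %d, local %d, global %d\n", blockIdx.x, local, global);
}

int main() {
    showIds<<<2, 4>>>();      // 2 blocks x 4 threads: prints 8 distinct ids
    cudaDeviceSynchronize();
    return 0;
}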
QUESTION 3:
Any function that is launched by the host and executed on the GPU as a kernel should be qualified
with which keyword?
Options:
A) __device__
B) __host__
C) __kernel__
D) __global__
Answer:
D) __global__
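A brief sketch contrasting the qualifiers (function names hypothetical): __global__ marks a kernel launched from the host and executed on the device, while __device__ marks a helper callable only from device code:

__device__ float square(float x) {            // callable only from device code
    return x * x;
}

__global__ void squareAll(float *v, int n) {  // launched from the host via <<<...>>>
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = square(v[i]);
}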
QUESTION 4:
What does the <<<1, N>>> syntax signify in the kernel invocation VecAdd<<<1, N>>>(A, B, C)?
Options:
A) 1 block of threads, N threads per block
B) N blocks of threads, 1 thread per block
C) N blocks with variable thread count
D) 1 thread per block, 1 block in total
Answer:
A) 1 block of threads, N threads per block
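A minimal sketch of the VecAdd launch from the question, assuming the device arrays are already allocated and N does not exceed the device's threads-per-block limit (typically 1024):

#define N 256

__global__ void VecAdd(const float *A, const float *B, float *C) {
    int i = threadIdx.x;              // single block, so threadIdx.x alone is unique
    C[i] = A[i] + B[i];
}

int main() {
    float *d_A, *d_B, *d_C;
    cudaMalloc((void**)&d_A, N * sizeof(float));
    cudaMalloc((void**)&d_B, N * sizeof(float));
    cudaMalloc((void**)&d_C, N * sizeof(float));
    VecAdd<<<1, N>>>(d_A, d_B, d_C);  // 1 block of N threads: thread i computes C[i]
    cudaDeviceSynchronize();
    return 0;
}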

QUESTION 5:
Given a GPU with 10 streaming multiprocessors, each supporting a maximum of 1024 threads per SM, and a
CUDA kernel is launched with a block size of 128 threads, calculate the maximum number of active blocks
on the GPU.
Options:
A. 80
B. 100
C. 200
D. 1280
Answer:
A. 80
Detailed Solution:
Maximum active blocks per SM = Threads per SM / Threads per block = 1024 / 128 = 8
Maximum active blocks on GPU = Blocks per SM × Number of SMs = 8 × 10 = 80

QUESTION 6:
Calculate the execution time (in seconds) for a CUDA kernel that processes 8192 elements with a block size
of 128 threads and an average execution time of 2 milliseconds per block, considering that only one SM is
available on the target GPU for executing the blocks.
Options:
A. 0.512 seconds
B. 0.256 seconds
C. 1.024 seconds
D. 0.128 seconds
Answer:
D. 0.128 seconds
Detailed Solution:
Number of blocks = 8192 elements / 128 threads per block = 64 blocks. With a single SM executing
the blocks one after another, execution time = 64 blocks × 2 ms per block = 128 ms = 0.128 seconds.
QUESTION 7:
Given a CUDA kernel with a grid size of 2 blocks and 256 threads per block, calculate the total number of
threads launched by the kernel.
Options:
A. 256
B. 512
C. 1024
D. 4096
Answer:
B. 512
Detailed Solution:
Total threads launched = Number of blocks × Threads per block = 2 × 256 = 512
QUESTION 8:
What is the CUDA function call required to copy an array h_A from the CPU memory to the GPU
memory, where it is known as d_A?
Options:
A. cudaMemcpy(h_A, d_A, size, cudaMemcpyHostToDevice);
B. cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
C. cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost);
D. cudaMemcpy(d_A, h_A, size, cudaMemcpyDeviceToHost);
Answer:
B. cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
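A hedged sketch of the surrounding host code (the array length is illustrative): cudaMemcpy takes the destination first, then the source, then the direction flag:

int main() {
    const size_t size = 256 * sizeof(float);
    float h_A[256] = {0};              // host array (contents illustrative)

    float *d_A;
    cudaMalloc((void**)&d_A, size);    // allocate device memory for d_A

    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);  // host -> device
    // ... launch kernels that read or write d_A ...
    cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost);  // results back to host

    cudaFree(d_A);
    return 0;
}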

QUESTION 9:
Which of the following options is true regarding the matrix multiplication kernel in the code shown
below:

// d_M and d_N are the N x N input matrices; d_P is the product matrix.
__global__ void MatrixMulKernel(float *d_M, float *d_N, float *d_P, int N) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row of d_P
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column of d_P
    if ((i < N) && (j < N)) {
        float Pvalue = 0.0;
        for (int k = 0; k < N; ++k) {
            Pvalue += d_M[i*N + k] * d_N[k*N + j];
        }
        d_P[i*N + j] = Pvalue;
    }
}
Options:
A. The kernel iterates over each element of the output matrix (d_P) parallelly and calculates its
value using a nested loop that iterates over the corresponding row of the first matrix (d_M)
and the corresponding column of the second matrix (d_N) sequentially.
B. The kernel iterates over each element of the output matrix (d_P) sequentially and calculates
its value using a nested loop that iterates over the corresponding row of the first matrix
(d_M) and the corresponding column of the second matrix (d_N) parallelly.
C. The computation of individual elements in the product matrix d_P can be carried out
parallelly using threads along a different dimension than the ones used for the parallel
computation of the entire product matrix.
D. The computation of individual elements in the product matrix d_P can be carried out
parallelly using threads along one of the same dimensions as the ones used for the parallel
computation of the entire product matrix.
Answer:
A. The kernel iterates over each element of the output matrix (d_P) parallelly and calculates its
value using a nested loop that iterates over the corresponding row of the first matrix (d_M) and
the corresponding column of the second matrix (d_N) sequentially.
Detailed Solution: Each thread computes exactly one element of d_P, so the output elements are
computed in parallel across threads, while the loop over k inside each thread accumulates the dot
product of a row of d_M and a column of d_N sequentially.
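A hedged host-side sketch of launching this kernel over a 2D grid, assuming d_M, d_N, and d_P are already-allocated device matrices (the block dimensions are illustrative):

int N = 1024;                               // matrix dimension (illustrative)
dim3 block(16, 16);                         // 16 x 16 = 256 threads per block
dim3 grid((N + block.x - 1) / block.x,      // enough blocks to cover every column
          (N + block.y - 1) / block.y);     // and every row of d_P
MatrixMulKernel<<<grid, block>>>(d_M, d_N, d_P, N);
cudaDeviceSynchronize();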
QUESTION 10:
Which of the following statements regarding CUDA memory allocation is false?
Options:
A. It is possible to allocate memory in a CUDA device kernel for an integer array.
B. It is possible to allocate memory in a CUDA device kernel by passing the pointer to an
integer array.
C. An array created inside a CUDA device kernel cannot be directly dereferenced in the host
side.
D. An array created inside a CUDA device kernel can be copied to another CUDA device
kernel by calling the function cudaMemcpy using the flag cudaMemcpyDeviceToDevice.
Answer: B. It is possible to allocate memory in a CUDA device kernel by passing the pointer to an
integer array.
Detailed Solution: It is not possible to allocate memory in a CUDA device kernel by passing the
pointer to an integer array directly; the pointer must first be typecast to a void pointer before
it is passed.
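For reference, a hedged sketch of option A: allocating an integer array inside a device kernel with the in-kernel malloc (available on devices of compute capability 2.0 and later); note the explicit cast of the returned void pointer:

#include <cstdio>

__global__ void allocInKernel(int n) {
    // In-kernel malloc returns void*; it must be cast to int* to be used here.
    int *arr = (int*)malloc(n * sizeof(int));
    if (arr != NULL) {
        for (int k = 0; k < n; ++k) arr[k] = k;   // fill the array
        printf("thread %d allocated %d ints\n", threadIdx.x, n);
        free(arr);                                // release the device-side allocation
    }
}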
