GPU_Assignment-3_Solution
GPU Architectures
and Programming
Assignment – Week 3
Number of questions: 10
Total marks: 10 × 1 = 10
QUESTION 1:
How are CUDA threads invoked to execute a kernel from the host?
Options:
A) Using a loop structure
B) With the <<<...>>> execution configuration syntax
C) By specifying thread IDs in the main function
D) Automatically by the GPU scheduler
Answer:
B) With the <<<...>>> execution configuration syntax
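To make the syntax concrete, here is a minimal sketch of a host-side kernel launch (the kernel name `myKernel` and the launch sizes are illustrative, not from the question):

```cuda
__global__ void myKernel(int *data) {
    // ... device code ...
}

int main() {
    int *d_data = nullptr;           // assume allocated with cudaMalloc
    // Launch 4 blocks of 256 threads each from the host
    myKernel<<<4, 256>>>(d_data);
    cudaDeviceSynchronize();         // wait for the kernel to finish
    return 0;
}
```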
QUESTION 2:
What is the purpose of the threadIdx built-in variable in a CUDA kernel?
Options:
A) Provides a random number
B) Identifies the current CUDA block
C) Gives the total number of threads
D) Provides a unique identifier for each thread
Answer:
D) Provides a unique identifier for each thread
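As a sketch (kernel name is hypothetical), `threadIdx.x` is unique only within its block; combined with `blockIdx.x` and `blockDim.x` it gives each thread a globally unique index:

```cuda
__global__ void writeIds(int *out, int n) {
    // threadIdx.x identifies this thread within its block;
    // the full expression yields a globally unique index
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}
```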
QUESTION 3:
Any function that is launched by the host and executed on the GPU (i.e., a kernel) should be qualified
with which keyword?
Options:
A) __device__
B) __host__
C) __kernel__
D) __global__
Answer:
D) __global__
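For contrast, a brief sketch of the three CUDA function qualifiers (function names are illustrative):

```cuda
// Launched by the host, executed on the device: a kernel
__global__ void kernelFn(float *x) { }

// Callable only from device code, runs on the device
__device__ float deviceFn(float x) { return 2.0f * x; }

// Compiled for both host and device
__host__ __device__ float bothFn(float x) { return x + 1.0f; }
```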
QUESTION 4:
What does the <<<1, N>>> syntax signify in the kernel invocation VecAdd<<<1, N>>>(A, B, C)?
Options:
A) 1 block of threads, N threads per block
B) N blocks of threads, 1 thread per block
C) N blocks with variable thread count
D) 1 thread per block, 1 block in total
Answer:
A) 1 block of threads, N threads per block
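A hedged sketch of what such a `VecAdd` kernel typically looks like (the body is illustrative; the question only gives the launch line):

```cuda
// One block of N threads; each thread adds one element pair.
__global__ void VecAdd(const float *A, const float *B, float *C) {
    int i = threadIdx.x;   // valid as an index only because there is a single block
    C[i] = A[i] + B[i];
}

// Host side: 1 block, N threads per block
// (N must not exceed the per-block thread limit, typically 1024)
// VecAdd<<<1, N>>>(A, B, C);
```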
QUESTION 5:
Given a GPU with 10 streaming multiprocessors, each supporting a maximum of 1024 threads per SM, and a
CUDA kernel is launched with a block size of 128 threads, calculate the maximum number of active blocks
on the GPU.
Options:
A. 80
B. 100
C. 200
D. 1280
Answer:
A. 80
Detailed Solution:
Maximum active blocks per SM = Threads per SM / Threads per block = 1024 / 128 = 8
Maximum active blocks on GPU = Blocks per SM × Number of SMs = 8 × 10 = 80
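The arithmetic can be checked with a short Python calculation (variable names are ours, values from the question):

```python
threads_per_sm = 1024   # maximum resident threads per SM
num_sms = 10            # streaming multiprocessors on the GPU
block_size = 128        # threads per block

blocks_per_sm = threads_per_sm // block_size     # 1024 / 128 = 8
max_active_blocks = blocks_per_sm * num_sms      # 8 * 10 = 80
print(max_active_blocks)
```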
QUESTION 6:
Calculate the execution time (in seconds) for a CUDA kernel that processes 8192 elements with a block size
of 128 threads and an average execution time of 2 milliseconds per block, considering that only one SM is
available on the target GPU for executing the blocks.
Options:
A. 0.512 seconds
B. 0.256 seconds
C. 1.024 seconds
D. 0.128 seconds
Answer:
D. 0.128 seconds
Detailed Solution:
Number of blocks = 8192 / 128 = 64
With a single SM, the blocks run one after another, so
Execution time = Number of blocks × Average time per block = 64 × 2 ms = 128 ms = 0.128 seconds
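The same calculation in Python (variable names are ours, values from the question):

```python
num_elements = 8192
block_size = 128
time_per_block_ms = 2.0   # average execution time per block

num_blocks = num_elements // block_size                 # 8192 / 128 = 64
total_time_s = num_blocks * time_per_block_ms / 1000.0  # 64 * 2 ms = 0.128 s
print(total_time_s)
```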
QUESTION 7:
Given a CUDA kernel with a grid size of 2 blocks and 256 threads per block, calculate the total number of
threads launched by the kernel.
Options:
A. 256
B. 512
C. 1024
D. 4096
Answer:
B. 512
Detailed Solution:
Total threads launched = Grid size (number of blocks) × Threads per block = 2 × 256 = 512
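Verified with a one-line Python calculation (names are ours, values from the question):

```python
grid_size = 2            # number of blocks in the grid
threads_per_block = 256

total_threads = grid_size * threads_per_block   # 2 * 256 = 512
print(total_threads)
```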
QUESTION 8:
What is the CUDA function call required to copy an array h_A from the CPU memory to the GPU
memory, where it is known as d_A?
Options:
A. cudaMemcpy(h_A, d_A, size, cudaMemcpyHostToDevice);
B. cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
C. cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost);
D. cudaMemcpy(d_A, h_A, size, cudaMemcpyDeviceToHost);
Answer:
B. cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
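In context, the call usually appears as part of an allocate/copy/free sequence like the sketch below (`N` and `h_A` are assumed to be defined on the host; error checking omitted for brevity):

```cuda
float *d_A = nullptr;
size_t size = N * sizeof(float);                     // N assumed defined
cudaMalloc(&d_A, size);                              // allocate device memory
cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);  // copy h_A to the GPU as d_A
// ... launch kernels that use d_A ...
cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost);  // copy results back
cudaFree(d_A);                                       // release device memory
```

Note the argument order: destination first, then source, matching option B.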
QUESTION 9:
Which of the following options is true regarding the matrix multiplication kernel in the code shown
below:
// d_M and d_N are the input matrices, N is the row and column size,
// and d_P is the product matrix
__global__ void MatrixMulKernel(float *d_M, float *d_N, float *d_P, int N) {