govind_6
Practical-06
Aim: Write a simple CUDA program to print “Hello World!”
1. Thread
Description: The smallest unit of execution in CUDA; each thread runs the kernel code with its own thread ID.
2. Block
Description: A group of threads that execute together and share resources.
o Threads within a block can share data through shared memory and synchronize using __syncthreads().
3. Grid
Description: A collection of blocks launched for a single kernel.
o Each block operates independently but can work on a portion of the data (see the sketch after this list).
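The following minimal sketch (illustrative, not part of the original practical) shows how these IDs combine: each thread can derive a unique global index from blockIdx, blockDim, and threadIdx.
#include<stdio.h>
// Illustrative kernel: each thread computes its unique global index
// from its block and thread coordinates.
__global__ void show_ids()
{
printf("block %d, thread %d -> global index %d\n",
blockIdx.x, threadIdx.x, blockIdx.x * blockDim.x + threadIdx.x);
}
int main(){
show_ids<<<2,4>>>(); // grid of 2 blocks, 4 threads per block
cudaDeviceSynchronize(); // wait for all GPU printf output to flush
return 0;
}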
Memory Management
cudaMalloc() – Allocates memory on the GPU.
cudaMemcpy() – Copies data between host (CPU) and device (GPU) memory.
cudaFree() – Releases previously allocated GPU memory.
1. Host-to-Device Setup
Allocate memory on the GPU.
Transfer input data from the CPU (host) to the GPU (device).
2. Kernel Launch
Write and launch a kernel (a function that runs on the GPU).
Specify the number of blocks and threads per block for parallel execution.
3. Device-to-Host Cleanup
Copy results back from the GPU to the CPU.
Free GPU memory to avoid leaks. (A sketch of this full workflow appears below.)
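The sketch below walks through all three steps with a simple element-wise doubling kernel; the kernel name and array size are illustrative assumptions, not part of the original practical.
#include<stdio.h>
#include<cuda_runtime_api.h>
// Illustrative kernel: doubles each element of the array in place.
__global__ void double_elements(int *d_arr, int n)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n) d_arr[i] *= 2;
}
int main(){
const int n = 8;
int h_arr[8] = {1,2,3,4,5,6,7,8};
int *d_arr;
cudaMalloc((void**)&d_arr, n * sizeof(int)); // 1. allocate memory on the GPU
cudaMemcpy(d_arr, h_arr, n * sizeof(int), cudaMemcpyHostToDevice); // 1. host -> device copy
double_elements<<<1, n>>>(d_arr, n); // 2. kernel launch: 1 block, n threads
cudaMemcpy(h_arr, d_arr, n * sizeof(int), cudaMemcpyDeviceToHost); // 3. device -> host copy
cudaFree(d_arr); // 3. free GPU memory to avoid leaks
for (int i = 0; i < n; i++) printf("%d ", h_arr[i]);
printf("\n");
return 0;
}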
Source code:
%%writefile govind_6.cu
#include<stdio.h>
#include<cuda.h>
#include<cuda_runtime_api.h>
__global__ void ker()
{
printf("\n Hello World from GPU!");
}
int main(){
ker<<<1,1>>>();
cudaDeviceSynchronize();
}
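In Colab, the file written by %%writefile can be compiled and run with NVIDIA's nvcc compiler, for example !nvcc govind_6.cu -o govind_6 followed by !./govind_6 (this assumes the notebook is attached to a GPU runtime).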
Output:
 Hello World from GPU!
Source code (printing thread and block IDs):
#include<stdio.h>
#include<cuda.h>
#include<cuda_runtime_api.h>
__global__ void ker()
{
printf("\n Hello World from GPU! thread %d of Block %d", threadIdx.x, blockIdx.x);
}
int main(){
ker<<<3,2>>>();
cudaDeviceSynchronize();
}
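With <<<3,2>>>, the launch creates a grid of 3 blocks with 2 threads each, so six threads execute the kernel and six messages are printed; their ordering is not guaranteed, because blocks run independently of one another.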
Output:
Conclusion:
This simple CUDA program demonstrates the parallel execution model of CUDA:
1. Parallelism:
o Multiple threads execute the kernel simultaneously, each printing a unique message.
2. Synchronization:
o cudaDeviceSynchronize() makes the host wait for all GPU threads to finish, so every message appears before the program exits.
3. Scalability:
o The program structure scales naturally with more threads and blocks, showcasing
CUDA's hierarchical execution model.