
Faculty of Engineering & Technology
Subject Name: High Performance Computing
Subject Code: 303105356
B-Tech-CSE (AI) 3rd Year 6th Semester

Practical-06
Aim: Write a simple CUDA program to print “Hello World!”

What is CUDA Programming?


CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming
model developed by NVIDIA. It enables developers to harness the power of NVIDIA GPUs for
general-purpose computing (GPGPU) tasks. CUDA provides a set of extensions to the C/C++
programming languages, along with bindings for other languages such as Python and Fortran,
enabling developers to write programs that execute efficiently on GPUs.

GPU Programming Layers:

1. Thread
 Description: The smallest unit of execution in CUDA.

o Each thread performs a portion of the computation.

o Threads have their own registers and access to local memory.

o A thread is identified by its threadIdx.

2. Block
 Description: A group of threads that execute together and share resources.

o Threads within a block can share data through shared memory and synchronize using __syncthreads().

o A block is identified by its blockIdx.

o The number of threads in a block is defined by blockDim.

3. Grid
 Description: A collection of blocks.

o Each block operates independently but can work on a portion of the data.

o A grid is identified by its gridDim (number of blocks in the grid).
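
Putting these three layers together: inside a kernel, each thread can combine blockIdx, blockDim, and threadIdx to compute a unique global index. The program below is a minimal sketch of this pattern (the kernel name global_index_demo and the 2-block, 4-thread launch configuration are illustrative assumptions, not part of the assignment):

#include <stdio.h>

// Each thread derives a unique global index from its block and thread indices.
__global__ void global_index_demo()
{
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global id %d\n",
           blockIdx.x, threadIdx.x, global_id);
}

int main()
{
    global_index_demo<<<2, 4>>>();  // gridDim.x = 2 blocks, blockDim.x = 4 threads
    cudaDeviceSynchronize();        // wait for the GPU printf output
    return 0;
}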

4. Data/Memory Hierarchy
 Description: Different memory types used to manage data efficiently.

o Registers: Fastest memory, private to each thread.

o Shared Memory: Shared among threads within a block (low latency).

o Global Memory: Accessible to all threads but has high latency.
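
As a rough illustration of this hierarchy, the sketch below stages data from global memory into shared memory, keeps its index in a register, and writes the result back to global memory (the kernel name reverse_in_block and the 8-element array are illustrative assumptions):

#include <stdio.h>

#define N 8  // illustrative block size

// Reverse an array within one block, touching all three memory levels.
__global__ void reverse_in_block(int *data)   // data resides in global memory
{
    __shared__ int tile[N];                   // shared memory: visible to the whole block
    int i = threadIdx.x;                      // local variable: held in a register
    tile[i] = data[i];                        // global -> shared
    __syncthreads();                          // wait until every thread has loaded
    data[i] = tile[N - 1 - i];                // shared -> global, reversed
}

int main()
{
    int h[N] = {0, 1, 2, 3, 4, 5, 6, 7}, *d;
    cudaMalloc(&d, N * sizeof(int));
    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);
    reverse_in_block<<<1, N>>>(d);
    cudaMemcpy(h, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) printf("%d ", h[i]);  // prints: 7 6 5 4 3 2 1 0
    cudaFree(d);
    return 0;
}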

Common CUDA Functions:

Memory Management
 cudaMalloc() – Allocates memory on the GPU.

 cudaMemcpy() – Copies data between host (CPU) and device (GPU).

 cudaFree() – Frees GPU memory.

 cudaMemset() – Initializes or resets memory on the GPU.
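
A typical host-side sequence using these four calls might look like the following sketch (the 16-element buffer is an arbitrary assumption, and the kernel launch is elided):

int main()
{
    const int n = 16;
    int h[16] = {0};
    int *d = NULL;

    cudaMalloc(&d, n * sizeof(int));                            // allocate GPU memory
    cudaMemset(d, 0, n * sizeof(int));                          // zero it on the device
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);  // host -> device
    // ... launch kernels that operate on d here ...
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);  // device -> host
    cudaFree(d);                                                // release GPU memory
    return 0;
}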

Kernel Launch and Execution


 <<<...>>> – Syntax for launching CUDA kernels.

 global – Specifies a GPU kernel function.

 device – Indicates a GPU function callable from other device functions.

 host – Marks a CPU function.
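
A small sketch showing how these qualifiers fit together (the function names square and fill_squares are illustrative assumptions):

#include <stdio.h>

__device__ int square(int x)            // GPU helper, callable only from device code
{
    return x * x;
}

__global__ void fill_squares(int *out)  // GPU kernel, launched from the host
{
    out[threadIdx.x] = square(threadIdx.x);
}

__host__ int main()                     // __host__ is the default for CPU code
{
    int h[8], *d;
    cudaMalloc(&d, sizeof(h));
    fill_squares<<<1, 8>>>(d);          // <<<blocks, threads>>> launch syntax
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 8; i++) printf("%d ", h[i]);  // prints: 0 1 4 9 16 25 36 49
    cudaFree(d);
    return 0;
}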

Steps of CUDA Program Execution:

1. Host-to-Device Setup
 Allocate memory on the GPU.
 Transfer data from the CPU (host) to the GPU (device).
2. Kernel Launch
 Write and launch a kernel (function that runs on the GPU).
 Specify the number of threads and blocks for parallel execution.

3. Device-to-Host Cleanup
 Copy results back from the GPU to the CPU.
 Free GPU memory to avoid leaks.
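
Together, the three steps look like the following sketch, which increments each element of a small array on the GPU (the kernel name add_one and the 4-element array are illustrative assumptions):

#include <stdio.h>

__global__ void add_one(int *a)   // the kernel launched in step 2
{
    a[threadIdx.x] += 1;
}

int main()
{
    const int n = 4;
    int h[4] = {10, 20, 30, 40}, *d;

    // Step 1: host-to-device setup
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    // Step 2: kernel launch (1 block of n threads)
    add_one<<<1, n>>>(d);

    // Step 3: device-to-host cleanup
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < n; i++) printf("%d ", h[i]);  // prints: 11 21 31 41
    return 0;
}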


1) Simple CUDA program: Hello World!

Source code:
%%writefile govind_6.cu
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime_api.h>

// Kernel: runs on the GPU and prints a message.
__global__ void ker()
{
    printf("\nHello World from GPU!");
}

int main() {
    ker<<<1, 1>>>();            // launch 1 block containing 1 thread
    cudaDeviceSynchronize();    // wait for the kernel (and its printf) to finish
    return 0;
}
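
In a Colab or Jupyter notebook, the file written by %%writefile can then be compiled with NVIDIA's nvcc compiler and executed, for example:

!nvcc govind_6.cu -o govind_6
!./govind_6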

Output:

2) Simple CUDA program with multiple blocks and threads


Source code:
%%writefile govind_6.cu
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime_api.h>

// Kernel: each thread prints its own thread and block index.
__global__ void ker()
{
    printf("\nHello World from GPU! thread %d of Block %d",
           threadIdx.x, blockIdx.x);
}

int main() {
    ker<<<3, 2>>>();            // 3 blocks x 2 threads = 6 messages
    cudaDeviceSynchronize();    // wait for all GPU printf output
    return 0;
}
Output:

Conclusion:
These simple CUDA programs demonstrate the parallel execution model of CUDA:

1. Parallelism:

o Multiple threads execute the kernel simultaneously, each printing a unique message.

2. Synchronization:

o Calling cudaDeviceSynchronize() makes the host wait until the kernel finishes, so the GPU's printf output appears before the program exits.

3. Scalability:

o The program structure scales naturally with more threads and blocks, showcasing
CUDA's hierarchical execution model.

Name: KISTIPATI GOVINDREDDY
Enrollment No: 2203031240607
Division: 6AI8
