
GPU Programming

Course Code: CSGG3018


Instructor: AMIT GURUNG
Email: [email protected]

Welcome

Jan – May, 2025


Course overview

Course Code   Course Name       L   T   P   C
CSGG3018      GPU Programming   3   0   0   3

Total Units to be Covered: 6    Total Contact Hours: 34

Prerequisite(s): Basics of programming; basic knowledge of computer architecture.
Course Objectives

• The objective of this course is to provide deep knowledge of
parallel programming with GPU architectures and APIs, together
with their practical applications.
Course Outcomes
On completion of this course, the students will be able to

CO1: Describe GPU computer architecture, GPU programming
environments, and data parallelism.
CO2: Explore the data-parallel execution model and CUDA memories.
CO3: Elaborate the data-parallelism concepts in OpenCL and OpenACC,
and compare OpenACC with CUDA.
CO4: Illustrate programs that solve problems and execute them on a GPU.
Recommendations

Textbooks

1. IBM ICE Publications


Modes of Evaluation
Quiz / Assignment / Presentation / Extempore / Written Examination

• Examination Scheme

Components      IA    Mid Sem   End Sem   Total
Weightage (%)   30    20        50        100

Internal Assessment
What is Concurrent Programming?
Concurrent programming is the practice of executing multiple tasks
or processes simultaneously.
• Key Concepts:
• Tasks can interact or run independently.
• Focuses on tasks appearing to run at the same time (not on how the
hardware is used).
• Applications:
• Real-time systems (e.g., operating systems).
• Gaming and multimedia.
• Data processing.
Concurrency vs. Parallelism

Concurrency                                 Parallelism
Tasks progress simultaneously               Tasks execute simultaneously
(interleaved on one or more cores).         (truly at the same instant).
May involve task switching (single core).   Requires multiple processors/cores.
Example: Multi-threading.                   Example: GPU computations.

Why is Concurrency Important?
1. Performance:
• Allows better utilization of resources.
2. Responsiveness:
• Keeps systems responsive during long-running tasks (via task
switching).
3. Scalability:
• Handles large datasets and complex computations efficiently.
4. Modern Hardware:
• Exploits multi-core CPUs and GPUs for better performance.
Real-World Examples of Concurrency

• Web Servers:
• Handle multiple client requests simultaneously.
• Video Games:
• Render graphics, play audio, and handle user input
concurrently.
• Data Analytics:
• Process data streams in real time.
• Autonomous Systems:
• Sensor data processing and decision-making in
parallel.
Key Concepts in Concurrency
• Threads:
• Lightweight processes that run independently and can
share resources.
• Synchronization:
• Mechanisms to ensure threads access shared data safely.
• Example: Locks prevent simultaneous access (only a single thread
holds the lock at a time); see the sketch below.
• Example: Semaphores control thread access to resources (managing
access by multiple threads at once).
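A minimal sketch of a lock in action, using C and Pthreads (assuming a
POSIX system; error handling omitted for brevity). Two threads increment
a shared counter, and the mutex guarantees only one updates it at a time:

    #include <pthread.h>
    #include <stdio.h>

    long counter = 0;                                  /* shared data */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* protects counter */

    void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* only one thread enters at a time */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* always 200000 with the lock */
        return 0;
    }

Compile with gcc -pthread. Without the lock/unlock pair, the two threads'
increments would interleave and some updates would be lost.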
Key Concepts in Concurrency
• Deadlocks:
• Happen when two or more tasks wait for each other indefinitely.
• Example: Task A locks Resource 1 and waits for Resource 2, while
Task B locks Resource 2 and waits for Resource 1.
• Prevention: Acquire locks in a consistent global order, or implement
timeout mechanisms.
• Race Conditions:
• Occur when multiple threads modify shared data without proper
synchronization, leading to unpredictable outcomes.
• Example: Two threads incrementing a shared counter simultaneously.
• Solution: Use atomic operations or locks to synchronize access (see
the sketch below).
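The atomic-operation fix can be sketched with C11 atomics (a minimal
illustration, not the course's code). Each fetch-and-add is an
indivisible read-modify-write, so no updates are lost even without a lock:

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    atomic_long counter = 0;    /* atomic shared counter */

    void *worker(void *arg) {
        for (int i = 0; i < 100000; i++)
            atomic_fetch_add(&counter, 1);  /* indivisible increment */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* with a plain 'long' and 'counter++' the result would be
           unpredictable; with atomics it is always 200000 */
        printf("counter = %ld\n", atomic_load(&counter));
        return 0;
    }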
Concurrency in GPU Programming
• GPUs are designed for:
• Massive parallelism.
• Executing thousands of threads
simultaneously.
• Why GPUs need concurrency:
• Efficiently handle compute-intensive tasks.
• Parallelize tasks like matrix operations, image
processing, etc.
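A minimal CUDA sketch of this style of parallelism: one thread per array
element performs a vector addition. Names and sizes are illustrative, and
error checking is omitted:

    #include <cstdio>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
        if (i < n)                                      /* guard stray threads */
            c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;            /* ~1M elements */
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);     /* unified memory, for brevity */
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;  /* enough blocks to cover n */
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();          /* wait for the GPU to finish */

        printf("c[0] = %f\n", c[0]);      /* expect 3.0 */
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Here roughly a million threads are launched at once; the GPU schedules
them across its cores, which is exactly the massive parallelism described
above.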
Tools and Frameworks
1. CPU-based Concurrency:
• POSIX Threads (Pthreads, from the Portable Operating System
Interface standard):
• Provides a standard API for creating and managing
threads.
• Offers flexibility but requires careful handling of
synchronization.
• OpenMP:
• Simplifies parallel programming with compiler
directives.
• Ideal for loop-level parallelism in shared-memory systems (see the
sketch below).
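A minimal OpenMP sketch of loop-level parallelism in C (compile with
-fopenmp on GCC/Clang; the loop bound and the work done are illustrative):

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        const int n = 1000000;
        static double a[1000000], b[1000000];
        double sum = 0.0;

        /* the directive splits the loop iterations across threads;
           reduction(+:sum) gives each thread a private partial sum
           that is combined safely at the end */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++) {
            a[i] = i * 0.5;
            b[i] = i * 2.0;
            sum += a[i] + b[i];
        }
        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }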
Tools and Frameworks
2. GPU-based Concurrency:
• CUDA (Compute Unified Device Architecture):
• NVIDIA’s framework for parallel programming on GPUs.
• OpenCL (Open Computing Language):
• A portable framework for programming across GPUs and
other accelerators.
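For comparison with the CUDA sketch earlier, the same element-wise
addition written as an OpenCL kernel (in OpenCL C). The host-side setup
(platform, device, command queue, buffers) is omitted here:

    /* vector_add.cl: each work-item processes one element,
       the OpenCL counterpart of a CUDA thread */
    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *c,
                          const int n)
    {
        int i = get_global_id(0);   /* global work-item index */
        if (i < n)
            c[i] = a[i] + b[i];
    }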
Summary on Concurrent Programming
• Concurrency enables multitasking and efficient use of
resources.
• Key concepts include threads, synchronization, deadlocks,
and race conditions.
• GPUs exploit concurrency to achieve massive parallelism.
• Understanding concurrency is essential for GPU
programming.
Parallel Programming
Parallel Programming
• Parallel programming allows multiple computations to run
simultaneously, improving speed and efficiency.
• Applications include scientific simulations, data analytics, and
machine learning.
• Key Concepts
• Task Parallelism: Dividing the problem into tasks processed
concurrently.
• Data Parallelism: Processing large datasets by distributing data
across cores.
• Synchronization: Managing dependencies between tasks.
Parallel programming workflow:
e.g., the master-worker model

(Figure: two variants of the master-worker model, Model 1 and Model 2;
a code sketch follows.)
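One way to realize the master-worker model with C and Pthreads (a minimal
sketch; the static block partition and the doubling of elements stand in
for a real work queue and real work):

    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 4
    #define N 1000

    double data[N], result[N];

    /* each worker processes the chunk assigned to it by the master */
    void *worker(void *arg) {
        int id = *(int *)arg;
        int chunk = N / NWORKERS;
        for (int i = id * chunk; i < (id + 1) * chunk; i++)
            result[i] = data[i] * 2.0;   /* stand-in for real work */
        return NULL;
    }

    int main(void) {
        pthread_t threads[NWORKERS];
        int ids[NWORKERS];
        for (int i = 0; i < N; i++) data[i] = i;

        /* master: partition the work and launch the workers */
        for (int i = 0; i < NWORKERS; i++) {
            ids[i] = i;
            pthread_create(&threads[i], NULL, worker, &ids[i]);
        }
        /* master: wait for the workers and collect the results */
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(threads[i], NULL);

        printf("result[999] = %f\n", result[999]);  /* expect 1998.0 */
        return 0;
    }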
Parallel Programming
Parallel Architectures
• Parallel programming models can be classified broadly into
two areas:
• Process interaction [shared memory (e.g., multicore CPUs) vs.
distributed memory (e.g., clusters)]
• Problem decomposition (data/task parallelism)
Parallel Programming
Process interaction
• Process interaction relates to the mechanisms by
which parallel processes are able to
communicate with each other.
• The most common forms of interaction are
shared memory and message passing, but
interaction can also be implicit (invisible to the
programmer).
Parallel Programming
Shared Memory
• Shared memory is an efficient means of passing data
between processes.
• Parallel processes share a global address space that
they read and write to asynchronously.
• Asynchronous concurrent access can lead to race conditions.
• Solution: mechanisms such as locks, semaphores, and monitors can
be used to avoid these.
Parallel Programming
Message Passing
• In a message-passing model, parallel
processes exchange data through
passing messages to one another.
• It can be asynchronous, where a message
can be sent before the receiver is ready, or
synchronous, where the receiver must be
ready.
Parallel Programming
• Programming Models
• Thread-based (e.g., Pthreads, OpenMP).
• Message passing (e.g., MPI, the Message Passing Interface, for
distributed memory): a programming interface for communication
between processes in a parallel computing environment (see the
sketch below).
• Data-Parallel (e.g., CUDA, OpenCL).
• Libraries: BLAS, TensorFlow (support for parallelism).
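A minimal sketch of the message-passing model using MPI in C (assuming an
MPI implementation such as Open MPI; run with at least two processes,
e.g. mpirun -np 2 ./a.out):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */

        if (rank == 0) {
            int value = 42;
            /* send one int to process 1 (message tag 0) */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int value;
            /* blocking receive: returns once the message has arrived */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("process 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }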
History of Graphics Processors

Assignment uploaded in LMS


Graphics Processing Units (GPUs)
• Architecture
• Specialized for data-parallel tasks like matrix
multiplication, making them ideal for graphics and
deep learning.
• Thousands of smaller cores compared to CPUs.
• Applications Beyond Graphics
• AI/ML training, high-performance computing (HPC),
cryptocurrency mining.
General-Purpose GPUs (GPGPUs)
• Definition
• GPUs used for tasks beyond graphics, such as
scientific computing and simulations.
• Programming GPGPUs
• CUDA (NVIDIA), OpenCL (vendor-neutral).
• Libraries: cuDNN for deep learning, Thrust for
parallel programming (see the sketch below).
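As a taste of library-level GPGPU programming, a minimal Thrust sketch
(compiled with nvcc; the vector size is illustrative). The reduction runs
in parallel on the GPU without writing any kernel by hand:

    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>
    #include <cstdio>

    int main() {
        /* fill a vector with 1.0f directly in GPU memory */
        thrust::device_vector<float> d(1 << 20, 1.0f);

        /* parallel sum on the GPU, no explicit kernel needed */
        float sum = thrust::reduce(d.begin(), d.end(), 0.0f);

        printf("sum = %f\n", sum);   /* expect 1048576.0 */
        return 0;
    }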
Comparison: CPU vs GPU

(Figure: simplified block diagrams of a CPU and a GPU, contrasting a few
large CPU cores with many small GPU cores.)
Comparison: CPU vs GPU
• CPU Characteristics
• Few powerful cores.
• Optimized for sequential processing.
• Suitable for task-switching and latency-sensitive tasks.
• GPU Characteristics
• Thousands of smaller cores.
• Designed for parallelism and throughput.
• Best for tasks like image rendering, simulations, and neural
network training.
• Example Use Cases
• CPUs: Operating systems, databases.
• GPUs: Graphics, AI model training.
Heterogeneous Computing
• Definition
• Combining CPUs, GPUs, and other processors for
optimal workload distribution.
• Examples
• Hybrid systems like NVIDIA DGX or Intel Xeon with
integrated GPUs.
• Benefits include higher efficiency, cost-effectiveness, and
power savings.
• NVIDIA DGX (Deep GPU Xceleration) is a series of servers and workstations
designed by NVIDIA, primarily geared towards deep learning applications
through general-purpose computing on GPUs (GPGPU).
Programming GPUs using
CUDA/OpenCL/OpenACC
• CUDA
• Proprietary to NVIDIA GPUs.
• Features: Kernels, shared memory, warp-based parallelism.
• OpenCL
• Open standard supporting CPUs, GPUs, FPGAs.
• Portable but less optimized than CUDA.
• OpenACC
• High-level directives for parallelism (see the sketch below).
• Ideal for researchers who need quick solutions without diving
into low-level programming.
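A minimal OpenACC sketch in C (compile with an OpenACC compiler such as
NVIDIA's nvc with -acc; sizes are illustrative). A single directive is
enough to ask the compiler to offload the loop:

    #include <stdio.h>

    int main(void) {
        const int n = 1000000;
        static float a[1000000], b[1000000], c[1000000];
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        /* the directive asks the compiler to parallelize the loop on an
           accelerator; data movement is handled automatically here */
        #pragma acc parallel loop
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];

        printf("c[0] = %f\n", c[0]);   /* expect 3.0 */
        return 0;
    }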
References

https://en.wikipedia.org/

https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf
