GPU Programming Slides 1
Welcome
Basics of Programming
Prerequisite(s): Basic knowledge of Computer Architecture
Course Objectives
Textbooks
Internal Assessment
What is Concurrent Programming?
Concurrent programming is the practice of
executing multiple tasks or processes
simultaneously.
• Key Concepts:
• Tasks can interact or run independently.
• Focuses on tasks appearing to run at the same time (not on how
the hardware is used).
• Applications:
• Real-time systems (e.g., operating systems).
• Gaming and multimedia.
• Data processing.
Concurrency vs. Parallelism
• Concurrency: multiple tasks make progress over overlapping time
periods, possibly interleaved on a single core.
• Parallelism: multiple tasks execute at literally the same instant on
separate hardware (e.g., multiple cores).
Examples of concurrent and parallel workloads:
• Web Servers:
• Handle multiple client requests simultaneously.
• Video Games:
• Render graphics, play audio, and handle user input
concurrently.
• Data Analytics:
• Process data streams in real time.
• Autonomous Systems:
• Sensor data processing and decision-making in
parallel.
Key Concepts in Concurrency
• Threads:
• Lightweight processes that run independently and can
share resources.
• Synchronization:
• Mechanisms to ensure threads access shared data safely.
• Example: Locks prevent simultaneous access (only a single
thread holds the lock at a time).
• Example: Semaphores control thread access to resources
(allowing a bounded number of threads at once).
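To make these primitives concrete, here is a minimal Pthreads sketch (illustrative, not from the original slides): a mutex serializes increments of a shared counter, and a POSIX semaphore caps how many threads may use a resource at once. All names and counts are assumptions of this sketch.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define NUM_THREADS 4

long counter = 0;                 /* shared data */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
sem_t resource_slots;             /* bounds concurrent resource users */

void *worker(void *arg) {
    sem_wait(&resource_slots);    /* acquire one of the available slots */

    pthread_mutex_lock(&lock);    /* only a single thread may enter */
    counter++;                    /* safe: access is serialized */
    pthread_mutex_unlock(&lock);

    sem_post(&resource_slots);    /* release the slot */
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    sem_init(&resource_slots, 0, 2);   /* at most 2 threads at once */

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    printf("counter = %ld\n", counter);  /* always NUM_THREADS */
    sem_destroy(&resource_slots);
    return 0;
}
```

With two slots in the semaphore, at most two workers overlap in the resource section, while the mutex guarantees the counter ends at exactly NUM_THREADS.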
Key Concepts in Concurrency
• Deadlocks:
• Occur when two or more tasks wait for each other indefinitely.
• Example: Task A locks Resource 1 and waits for Resource 2, while
Task B locks Resource 2 and waits for Resource 1.
• Prevention: Use a consistent order for locking resources or implement
timeout mechanisms.
• Race Conditions:
• Occur when multiple threads modify shared data without proper
synchronization, leading to unpredictable outcomes.
• Example: Two threads incrementing a shared counter simultaneously.
• Solution: Use atomic operations or locks to synchronize access.
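A minimal sketch of the counter race described above, together with the atomic fix (illustrative; the iteration count and names are assumptions). The unsynchronized total typically varies from run to run because increments are lost.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define ITERS 1000000

long racy_counter = 0;            /* unsynchronized shared data */
atomic_long safe_counter = 0;     /* C11 atomic counter */

void *worker(void *arg) {
    for (int i = 0; i < ITERS; i++) {
        racy_counter++;                        /* read-modify-write race */
        atomic_fetch_add(&safe_counter, 1);    /* indivisible update */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* racy_counter is usually < 2*ITERS (lost updates);
       safe_counter is always exactly 2*ITERS. */
    printf("racy = %ld, safe = %ld\n", racy_counter, (long)safe_counter);
    return 0;
}
```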
Concurrency in GPU Programming
• GPUs are designed for:
• Massive parallelism.
• Executing thousands of threads
simultaneously.
• Why GPUs need concurrency:
• Efficiently handle compute-intensive tasks.
• Parallelize tasks like matrix operations, image
processing, etc.
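As an illustration of this massive parallelism, here is a minimal CUDA vector-addition sketch (an illustrative example, not taken from the slides): one GPU thread handles one array element, and roughly a million threads are launched. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread processes exactly one element of the arrays.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                  // about one million elements
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);           // unified memory for brevity
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);  // ~1M threads
    cudaDeviceSynchronize();                // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);            // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```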
Tools and Frameworks
1. CPU-based Concurrency:
• POSIX Threads (Pthreads; POSIX stands for Portable Operating
System Interface):
• Provides a standard API for creating and managing
threads.
• Offers flexibility but requires careful handling of
synchronization.
• OpenMP:
• Simplifies parallel programming with compiler
directives.
• Ideal for loop-level parallelism in shared-memory systems.
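A minimal OpenMP sketch of loop-level parallelism (illustrative, not from the slides): a single directive distributes the loop iterations across cores, and the reduction clause keeps the shared sum race-free. Compile with, e.g., gcc -fopenmp.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    double sum = 0.0;
    /* One directive parallelizes the loop across available cores;
       the reduction clause gives each thread a private partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        y[i] += 2.0 * x[i];
        sum  += y[i];
    }

    printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```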
Tools and Frameworks
2. GPU-based Concurrency:
• CUDA (Compute Unified Device Architecture):
• NVIDIA’s framework for parallel programming on GPUs.
• OpenCL (Open Computing Language):
• A portable framework for programming across GPUs and
other accelerators.
Summary of Concurrent Programming
• Concurrency enables multitasking and efficient use of
resources.
• Key concepts include threads, synchronization, deadlocks,
and race conditions.
• GPUs exploit concurrency to achieve massive parallelism.
• Understanding concurrency is essential for GPU
programming.
Parallel Programming
• Parallel programming allows multiple computations to run
simultaneously, improving speed and efficiency.
• Applications include scientific simulations, data analytics, and
machine learning.
• Key Concepts
• Task Parallelism: Dividing the problem into tasks processed
concurrently.
• Data Parallelism: Processing large datasets by distributing data
across cores.
• Synchronization: Managing dependencies between tasks.
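The contrast between task and data parallelism can be sketched with OpenMP (an illustrative framework choice; the slides do not prescribe one). The helper functions are hypothetical placeholders.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000

void load_data(void)  { /* e.g., read input (placeholder) */ }
void write_logs(void) { /* e.g., flush stats (placeholder) */ }

int main(void) {
    int a[N];

    /* Task parallelism: independent tasks run on different threads. */
    #pragma omp parallel sections
    {
        #pragma omp section
        load_data();
        #pragma omp section
        write_logs();
    }

    /* Data parallelism: the same operation applied to partitioned data. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * i;

    printf("a[10] = %d\n", a[10]);
    return 0;
}
```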
Parallel programming workflow: e.g., the master-worker model.
[Figure: master-worker workflow diagrams, Model 1 and Model 2]
Parallel Programming
Parallel Architectures
• Parallel programming models can be classified broadly
into two areas:
• Process interaction [Shared memory (e.g., multicore
CPUs) vs. Distributed memory (e.g., clusters)]
• Problem decomposition (data/task parallelism)
Parallel Programming
Process interaction
• Process interaction relates to the mechanisms by
which parallel processes are able to
communicate with each other.
• The most common forms of interaction are
shared memory and message passing, but
interaction can also be implicit (invisible to the
programmer).
Parallel Programming
Shared Memory
• Shared memory is an efficient means of passing data
between processes.
• Parallel processes share a global address space that
they read and write to asynchronously.
• Asynchronous concurrent access can lead to race
conditions.
• Solution: mechanisms such as locks, semaphores, and
monitors can be used to avoid them.
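As an illustration of a monitor-style mechanism (a Pthreads sketch, not part of the slides), the following pairs a mutex with a condition variable so a consumer sleeps until a producer has published shared data:

```c
#include <pthread.h>
#include <stdio.h>

/* Monitor-style pairing of a mutex and a condition variable. */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
int data = 0, has_data = 0;

void *producer(void *arg) {
    pthread_mutex_lock(&m);
    data = 42;                    /* write shared state under the lock */
    has_data = 1;
    pthread_cond_signal(&ready);  /* wake the waiting consumer */
    pthread_mutex_unlock(&m);
    return NULL;
}

void *consumer(void *arg) {
    pthread_mutex_lock(&m);
    while (!has_data)                   /* guard against spurious wakeups */
        pthread_cond_wait(&ready, &m);  /* atomically unlock and sleep */
    printf("consumed %d\n", data);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```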
Parallel Programming
Message Passing
• In a message-passing model, parallel
processes exchange data by passing
messages to one another.
• It can be asynchronous, where a message
can be sent before the receiver is ready, or
synchronous, where the receiver must be
ready.
Parallel Programming
• Programming Models
• Thread-based (e.g., Pthreads, OpenMP).
• Message Passing (e.g., MPI, the Message Passing Interface, for
distributed memory): a standard interface for communication between
processes in a parallel computing environment (see the sketch after this list).
• Data-Parallel (e.g., CUDA, OpenCL).
• Libraries: BLAS, TensorFlow (support for parallelism).
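A minimal MPI point-to-point sketch for the message-passing item above (illustrative; compile with mpicc and run with, e.g., mpirun -np 2): process 0 sends an integer that process 1 receives with a blocking call.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */

    if (rank == 0) {
        int value = 42;
        /* Blocking send to process 1, message tag 0. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```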
History of Graphics Processors
[Figures: CPU vs. GPU chip layouts; simplified view of a GPU]
Comparison: CPU vs GPU
• CPU Characteristics
• Few powerful cores.
• Optimized for sequential processing.
• Suitable for task-switching and latency-sensitive tasks.
• GPU Characteristics
• Thousands of smaller cores.
• Designed for parallelism and throughput.
• Best for tasks like image rendering, simulations, and neural
network training.
• Example Use Cases
• CPUs: Operating systems, databases.
• GPUs: Graphics, AI model training.
Heterogeneous Computing
• Definition
• Combining CPUs, GPUs, and other processors for
optimal workload distribution.
• Examples
• Hybrid systems like NVIDIA DGX or Intel Xeon with
integrated GPUs.
• Benefits include higher efficiency, cost-effectiveness, and
power savings.
• NVIDIA DGX (Deep GPU Xceleration) is a series of servers and workstations
designed by NVIDIA, primarily geared towards accelerating deep learning applications
through general-purpose computing on graphics processing units (GPGPU).
Programming GPUs using
CUDA/OpenCL/OpenACC
• CUDA
• Proprietary to NVIDIA GPUs.
• Features: Kernels, shared memory, warp-based parallelism.
• OpenCL
• Open standard supporting CPUs, GPUs, FPGAs.
• Portable, but typically less optimized than CUDA on NVIDIA GPUs.
• OpenACC
• High-level directives for parallelism.
• Ideal for researchers who need quick solutions without diving
into low-level programming.
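A minimal OpenACC sketch (illustrative; compilable with an OpenACC compiler such as NVIDIA's nvc with -acc): a single directive asks the compiler to offload the loop to an accelerator and manage the data movement itself.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* The directive offloads the loop to an accelerator (e.g., a GPU);
       copyin/copy clauses describe the required data movement. */
    #pragma acc parallel loop copyin(x) copy(y)
    for (int i = 0; i < N; i++)
        y[i] += 2.0f * x[i];

    printf("y[0] = %f\n", y[0]);  /* expect 4.0 */
    return 0;
}
```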