GPU Programming Slides 1
Welcome
Basics of Programming
Prerequisite(s): Basic knowledge of Computer Architecture
Course Objectives
Textbooks
Internal Assessment
What is Concurrent Programming?
Concurrent programming is the practice of
executing multiple tasks or processes
simultaneously.
• Key Concepts:
• Tasks can interact or run independently.
• Focuses on tasks appearing to run at the same time (not on how
the hardware is used).
• Applications:
• Real-time systems (e.g., operating systems).
• Gaming and multimedia.
• Data processing.
Concurrency vs. Parallelism
• Concurrency: multiple tasks make progress over overlapping time
periods, possibly interleaved on a single core.
• Parallelism: multiple tasks execute at literally the same instant on
separate hardware (e.g., multiple cores).
Examples of concurrent and parallel workloads:
• Web Servers:
• Handle multiple client requests simultaneously.
• Video Games:
• Render graphics, play audio, and handle user input
concurrently.
• Data Analytics:
• Process data streams in real time.
• Autonomous Systems:
• Sensor data processing and decision-making in
parallel.
Key Concepts in Concurrency
• Threads:
• Lightweight processes that run independently and can
share resources.
• Synchronization:
• Mechanisms to ensure threads access shared data safely.
• Example: Locks prevent simultaneous access (only a single
thread holds the lock at a time).
• Example: Semaphores control thread access to resources
(allowing a bounded number of threads at once).
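To make these primitives concrete, here is a minimal Pthreads sketch (illustrative, not from the original slides): a mutex serializes increments of a shared counter, and a POSIX semaphore caps how many threads may use a resource at once. All names and counts are assumptions of this sketch.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define NUM_THREADS 4

long counter = 0;                 /* shared data */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
sem_t resource_slots;             /* bounds concurrent resource users */

void *worker(void *arg) {
    sem_wait(&resource_slots);    /* acquire one of the available slots */

    pthread_mutex_lock(&lock);    /* only a single thread may enter */
    counter++;                    /* safe: access is serialized */
    pthread_mutex_unlock(&lock);

    sem_post(&resource_slots);    /* release the slot */
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    sem_init(&resource_slots, 0, 2);   /* at most 2 threads at once */

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    printf("counter = %ld\n", counter);  /* always NUM_THREADS */
    sem_destroy(&resource_slots);
    return 0;
}
```

With two slots in the semaphore, at most two workers overlap in the resource section, while the mutex guarantees the counter ends at exactly NUM_THREADS.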
Key Concepts in Concurrency
• Deadlocks:
• Occur when two or more tasks wait for each other indefinitely.
• Example: Task A locks Resource 1 and waits for Resource 2, while
Task B locks Resource 2 and waits for Resource 1.
• Prevention: Use a consistent order for locking resources or implement
timeout mechanisms.
• Race Conditions:
• Occur when multiple threads modify shared data without proper
synchronization, leading to unpredictable outcomes.
• Example: Two threads incrementing a shared counter simultaneously.
• Solution: Use atomic operations or locks to synchronize access.
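A minimal sketch of the counter race described above, together with the atomic fix (illustrative; the iteration count and names are assumptions). The unsynchronized total typically varies from run to run because increments are lost.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define ITERS 1000000

long racy_counter = 0;            /* unsynchronized shared data */
atomic_long safe_counter = 0;     /* C11 atomic counter */

void *worker(void *arg) {
    for (int i = 0; i < ITERS; i++) {
        racy_counter++;                        /* read-modify-write race */
        atomic_fetch_add(&safe_counter, 1);    /* indivisible update */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* racy_counter is usually < 2*ITERS (lost updates);
       safe_counter is always exactly 2*ITERS. */
    printf("racy = %ld, safe = %ld\n", racy_counter, (long)safe_counter);
    return 0;
}
```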
Concurrency in GPU Programming
• GPUs are designed for:
• Massive parallelism.
• Executing thousands of threads
simultaneously.
• Why GPUs need concurrency:
• Efficiently handle compute-intensive tasks.
• Parallelize tasks like matrix operations, image
processing, etc.
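As an illustration of this massive parallelism, here is a minimal CUDA vector-addition sketch (an illustrative example, not taken from the slides): one GPU thread handles one array element, and roughly a million threads are launched. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread processes exactly one element of the arrays.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                  // about one million elements
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);           // unified memory for brevity
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);  // ~1M threads
    cudaDeviceSynchronize();                // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);            // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```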
Tools and Frameworks
1. CPU-based Concurrency:
• POSIX Threads (Pthreads; POSIX stands for Portable Operating
System Interface):
• Provides a standard API for creating and managing
threads.
• Offers flexibility but requires careful handling of
synchronization.
• OpenMP:
• Simplifies parallel programming with compiler
directives.
• Ideal for loop-level parallelism in shared-memory systems.
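A minimal OpenMP sketch of loop-level parallelism (illustrative, not from the slides): a single directive distributes the loop iterations across cores, and the reduction clause keeps the shared sum race-free. Compile with, e.g., gcc -fopenmp.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    double sum = 0.0;
    /* One directive parallelizes the loop across available cores;
       the reduction clause gives each thread a private partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        y[i] += 2.0 * x[i];
        sum  += y[i];
    }

    printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```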
Tools and Frameworks
2. GPU-based Concurrency:
• CUDA (Compute Unified Device Architecture):
• NVIDIA’s framework for parallel programming on GPUs.
• OpenCL (Open Computing Language):
• A portable framework for programming across GPUs and
other accelerators.
Summary of Concurrent Programming
• Concurrency enables multitasking and efficient use of
resources.
• Key concepts include threads, synchronization, deadlocks,
and race conditions.
• GPUs exploit concurrency to achieve massive parallelism.
• Understanding concurrency is essential for GPU
programming.
Parallel Programming
• Parallel programming allows multiple computations to run
simultaneously, improving speed and efficiency.
• Applications include scientific simulations, data analytics, and
machine learning.
• Key Concepts
• Task Parallelism: Dividing the problem into tasks processed
concurrently.
• Data Parallelism: Processing large datasets by distributing data
across cores.
• Synchronization: Managing dependencies between tasks.
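The contrast between task and data parallelism can be sketched with OpenMP (an illustrative framework choice; the slides do not prescribe one). The helper functions are hypothetical placeholders.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000

void load_data(void)  { /* e.g., read input (placeholder) */ }
void write_logs(void) { /* e.g., flush stats (placeholder) */ }

int main(void) {
    int a[N];

    /* Task parallelism: independent tasks run on different threads. */
    #pragma omp parallel sections
    {
        #pragma omp section
        load_data();
        #pragma omp section
        write_logs();
    }

    /* Data parallelism: the same operation applied to partitioned data. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * i;

    printf("a[10] = %d\n", a[10]);
    return 0;
}
```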
Parallel programming workflow: e.g., the master-worker model.
[Figure: master-worker workflow diagrams, Model 1 and Model 2]
Parallel Programming
Parallel Architectures
• Parallel programming models can be classified broadly
into two areas:
• Process interaction [Shared memory (e.g., multicore
CPUs) vs. Distributed memory (e.g., clusters)]
• Problem decomposition (data/task parallelism)
Parallel Programming
Process interaction
• Process interaction relates to the mechanisms by
which parallel processes are able to
communicate with each other.
• The most common forms of interaction are
shared memory and message passing, but
interaction can also be implicit (invisible to the
programmer).
Parallel Programming
Shared Memory
• Shared memory is an efficient means of passing data
between processes.
• Parallel processes share a global address space that
they read and write to asynchronously.
• Asynchronous concurrent access can lead to race
conditions.
• Solution: mechanisms such as locks, semaphores, and
monitors can be used to avoid them.
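As an illustration of a monitor-style mechanism (a Pthreads sketch, not part of the slides), the following pairs a mutex with a condition variable so a consumer sleeps until a producer has published shared data:

```c
#include <pthread.h>
#include <stdio.h>

/* Monitor-style pairing of a mutex and a condition variable. */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
int data = 0, has_data = 0;

void *producer(void *arg) {
    pthread_mutex_lock(&m);
    data = 42;                    /* write shared state under the lock */
    has_data = 1;
    pthread_cond_signal(&ready);  /* wake the waiting consumer */
    pthread_mutex_unlock(&m);
    return NULL;
}

void *consumer(void *arg) {
    pthread_mutex_lock(&m);
    while (!has_data)                   /* guard against spurious wakeups */
        pthread_cond_wait(&ready, &m);  /* atomically unlock and sleep */
    printf("consumed %d\n", data);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```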
Parallel Programming
Message Passing
• In a message-passing model, parallel
processes exchange data by passing
messages to one another.
• It can be asynchronous, where a message
can be sent before the receiver is ready, or
synchronous, where the receiver must be
ready.
Parallel Programming
• Programming Models
• Thread-based (e.g., Pthreads, OpenMP).
• Message Passing (e.g., MPI, the Message Passing Interface, for
distributed memory): a standard interface for communication between
processes in a parallel computing environment (see the sketch after this list).
• Data-Parallel (e.g., CUDA, OpenCL).
• Libraries: BLAS, TensorFlow (support for parallelism).
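A minimal MPI point-to-point sketch for the message-passing item above (illustrative; compile with mpicc and run with, e.g., mpirun -np 2): process 0 sends an integer that process 1 receives with a blocking call.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */

    if (rank == 0) {
        int value = 42;
        /* Blocking send to process 1, message tag 0. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```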
History of Graphics Processors
[Figures: CPU vs. GPU chip layouts; simplified view of a GPU]
Comparison: CPU vs GPU
• CPU Characteristics
• Few powerful cores.
• Optimized for sequential processing.
• Suitable for task-switching and latency-sensitive tasks.
• GPU Characteristics
• Thousands of smaller cores.
• Designed for parallelism and throughput.
• Best for tasks like image rendering, simulations, and neural
network training.
• Example Use Cases
• CPUs: Operating systems, databases.
• GPUs: Graphics, AI model training.
Heterogeneous Computing
• Definition
• Combining CPUs, GPUs, and other processors for
optimal workload distribution.
• Examples
• Hybrid systems like NVIDIA DGX or Intel Xeon with
integrated GPUs.
• Benefits include higher efficiency, cost-effectiveness, and
power savings.
• NVIDIA DGX (Deep GPU Xceleration) is a series of servers and workstations
designed by NVIDIA, primarily geared towards accelerating deep learning applications
through general-purpose computing on graphics processing units (GPGPU).
Programming GPUs using
CUDA/OpenCL/OpenACC
• CUDA
• Proprietary to NVIDIA GPUs.
• Features: Kernels, shared memory, warp-based parallelism.
• OpenCL
• Open standard supporting CPUs, GPUs, FPGAs.
• Portable, but typically less optimized than CUDA on NVIDIA GPUs.
• OpenACC
• High-level directives for parallelism.
• Ideal for researchers who need quick solutions without diving
into low-level programming.
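A minimal OpenACC sketch (illustrative; compilable with an OpenACC compiler such as NVIDIA's nvc with -acc): a single directive asks the compiler to offload the loop to an accelerator and manage the data movement itself.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* The directive offloads the loop to an accelerator (e.g., a GPU);
       copyin/copy clauses describe the required data movement. */
    #pragma acc parallel loop copyin(x) copy(y)
    for (int i = 0; i < N; i++)
        y[i] += 2.0f * x[i];

    printf("y[0] = %f\n", y[0]);  /* expect 4.0 */
    return 0;
}
```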