Parallel Programming Models For Real-Time Graphics
Aaron Lefohn
Intel Corporation
Hardware Resources
- Core
- Execution context
- SIMD functional units
- On-chip memory
CPU-GPU System-on-a-Chip
Abstraction
Abstraction enables portability and system optimization
E.g., dynamic load balancing, SIMD utilization, producer-consumer
Execution Definitions
Execution context
The state required to execute an instruction stream: instruction pointer, registers, etc.
(aka thread)
Work
A logically related set of instructions executed in a single execution context
(aka shader, instance of a kernel, task)
Concurrent execution
Multiple units of work that may execute simultaneously
(because they are logically independent)
Parallel execution
Multiple units of work whose execution contexts are guaranteed to be live simultaneously
(because you want them to be for locality, synchronization, etc)
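The distinction is easiest to see in code. A minimal sketch (in Python for brevity; names are illustrative): concurrent work items are independent, so running them one after another is still correct, while a producer and consumer that communicate must have both execution contexts live at the same time.

```python
import queue
import threading

# Concurrent: these work items are logically independent, so a scheduler
# may run them simultaneously -- but running them one after another is
# equally correct.
results = []

def work(i):
    results.append(i * i)

for i in range(4):
    work(i)  # serial execution of concurrent work is still valid

# Parallel: producer and consumer communicate through a queue, so both
# execution contexts must be live at the same time for the program to
# make progress.
channel = queue.Queue()
received = []

def producer():
    channel.put(42)

def consumer():
    received.append(channel.get())  # blocks until the producer runs

t_consumer = threading.Thread(target=consumer)
t_producer = threading.Thread(target=producer)
t_consumer.start()
t_producer.start()
t_consumer.join()
t_producer.join()
```

If the consumer ran alone to completion before the producer started, it would block forever — which is exactly why parallel execution guarantees both contexts are live.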
Beyond Programmable Shading Course, ACM SIGGRAPH 2011
Synchronization
Synchronization between execution contexts
- Enables inter-context communication
- Restricts when work is permitted to execute
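Both roles can be shown with one primitive. A sketch (Python, illustrative names): an event both communicates between contexts and restricts when the waiting work may execute.

```python
import threading

gate = threading.Event()
log = []

def waiter():
    gate.wait()              # restricted: may not execute past here until gate is set
    log.append("after gate")

t = threading.Thread(target=waiter)
t.start()
log.append("before gate")
gate.set()                   # communicate: release the waiting context
t.join()
```

The ordering of `log` is deterministic even though two contexts run: synchronization, not luck, enforces it.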
What is abstracted?
Cores, execution contexts, SIMD functional units, memory hierarchy
What is abstracted?
Nothing (ignoring preemption)
- CPU: launch a pthread per hardware execution context
- GPU: persistent threads
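The pthread-per-context and persistent-threads mechanisms share one shape: one long-lived worker per hardware execution context, each looping over a shared work pool. A sketch (Python threads standing in for pthreads; names are illustrative):

```python
import os
import queue
import threading

num_contexts = os.cpu_count() or 1   # one worker per hardware execution context
work = queue.Queue()
for i in range(32):
    work.put(i)

results = []
results_lock = threading.Lock()

def worker():
    # Persistent-thread style: each context loops, pulling work items
    # from the shared queue until none remain.
    while True:
        try:
            item = work.get_nowait()
        except queue.Empty:
            return
        with results_lock:
            results.append(item * 2)

threads = [threading.Thread(target=worker) for _ in range(num_contexts)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because nothing is abstracted, the program itself owns load balancing — here, a shared queue — rather than the runtime.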
What is abstracted?
Mechanisms
- CPU: Intel SPMD Program Compiler (ISPC)
- SPMD combined with other abstractions: OpenCL (some implementations), Intel Array Building Blocks
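The SPMD idea can be simulated in a few lines: one scalar program runs once per SIMD lane, with a lane index and lane count telling each instance which data to touch (mirroring ISPC's `programIndex`/`programCount` idiom). A Python simulation, not the ISPC API:

```python
PROGRAM_COUNT = 4  # hypothetical SIMD width

def spmd_kernel(program_index, program_count, data, out):
    # The same program for every lane; the lane index and count
    # (as in ISPC's programIndex/programCount) pick this instance's share.
    for i in range(program_index, len(data), program_count):
        out[i] = data[i] + 1

data = list(range(8))
out = [0] * len(data)
for lane in range(PROGRAM_COUNT):  # lanes execute in lockstep on real SIMD units
    spmd_kernel(lane, PROGRAM_COUNT, data, out)
```

The compiler, not the programmer, maps the lanes onto SIMD functional units — that is the resource this model abstracts.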
Execution
- Concurrent execution of many (likely different) units of work
- Work runs in a single execution context
What is abstracted?
- Cores and execution contexts
- Not abstracted: SIMD functional units or memory hierarchy
- Synchronization: between tasks
    void myTask()
    {
        // Task body
    }

    void main()
    {
        for( i = 0 to NumTasks - 1 ) {
            spawn myTask();
        }
        sync;
        // More work
    }
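The spawn/sync pseudocode above maps directly onto a fork-join task pool. A runnable sketch (Python's `concurrent.futures` standing in for a task system; the task body is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, wait

NUM_TASKS = 8
results = [0] * NUM_TASKS

def my_task(i):
    results[i] = i * i  # illustrative task body

with ThreadPoolExecutor() as pool:  # cores and execution contexts abstracted away
    futures = [pool.submit(my_task, i) for i in range(NUM_TASKS)]  # "spawn"
    wait(futures)                    # "sync": wait for all spawned tasks
    # "More work" runs here, after the sync point
```

The pool decides how many contexts run the tasks and in what order — the program only expresses which units of work are independent.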
Summary of Concepts
- Abstraction: when a parallel programming model abstracts a HW resource, code written in that programming model scales across architectures with varying amounts of that resource
- Concurrency provides scalability and portability
- Parallel execution permits explicit communication and capturing locality
- Synchronization enables inter-context communication and restricts when work is permitted to execute
Conclusions
Current real-time rendering programming uses a mix of data-, task-, and pipeline-parallel programming (and conventional threads as a means to an end)
Acknowledgements
- Tim Foley and Matt Pharr at Intel
- Mike Houston at AMD
- Kayvon Fatahalian at CMU
- The Advanced Rendering Technology research team, Pete Baker, Aaron Coday, and Elliot Garbus at Intel
References
- GPU-inspired compute languages: DX11 DirectCompute, OpenCL (CPU+GPU+), CUDA
  - The Fusion APU Architecture: A Programmer's Perspective (Ben Gaster), http://developer.amd.com/afds/assets/presentations/2901_final.pdf
- Task systems (CPU and CPU+GPU+): Cilk, Thread Building Blocks (TBB), Grand Central Dispatch (GCD), ConcRT, Task Parallel Library
- Conventional CPU thread programming: Pthreads
- GPU task systems and persistent threads (i.e., conventional thread programming on the GPU):
  - Aila et al., Understanding the Efficiency of Ray Traversal on GPUs, High Performance Graphics 2009
  - Tzeng et al., Task Management for Irregular-Parallel Workloads on the GPU, High Performance Graphics 2010
  - Parker et al., OptiX: A General Purpose Ray Tracing Engine, SIGGRAPH 2010
- Additional input (concepts, terminology, patterns, etc.):
  - Foley, Parallel Programming for Graphics, Beyond Programmable Shading SIGGRAPH 2009
  - Beyond Programmable Shading CS448s Stanford course
  - Fatahalian, Running Code at a Teraflop: How a GPU Shader Core Works, Beyond Programmable Shading SIGGRAPH 2009-2010
  - Keutzer et al., A Design Pattern Language for Engineering (Parallel) Software: Merging the PLPP and OPL projects, ParaPLoP 2010