Basic Parallel Programming Methods
Parallel Programming for Speed-up
Single-Core Sharpen Demonstrations
Scaling and Bottlenecks
Compiler Optimization
1 - Simple and Effective: turn on compiler optimization (~3x)
– Turn on higher levels of optimization
– Level 3 optimization: -O3 for gcc or g++
– Highest is -O4, but requires feedback optimization
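As a concrete illustration (sharpen.c is a hypothetical file name), the optimization level is selected with a single build flag:

    # baseline, no optimization
    gcc -O0 -o sharpen sharpen.c
    # level 3 optimization; often ~3x for compute-bound loops
    gcc -O3 -o sharpen sharpen.c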
SIMD Vector Instructions
2 - Simple and Sometimes Effective: turn on NEON SIMD (~1.f x)
– Turn on SIMD (NEON) instruction generation on ARM A-Series
– Flynn's taxonomy: SIMD = Single Instruction, Multiple Data
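A minimal sketch of NEON SIMD written with C intrinsics (assuming arm_neon.h and array lengths that are a multiple of 4); the same effect can come from compiler auto-vectorization with, e.g., gcc -O3 -mfpu=neon on 32-bit ARM:

    #include <arm_neon.h>

    /* Add two float arrays four lanes at a time: one NEON instruction
     * operates on four data elements (SIMD). n must be a multiple of 4. */
    void vec_add(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i += 4) {
            float32x4_t va = vld1q_f32(&a[i]);   /* load 4 floats */
            float32x4_t vb = vld1q_f32(&b[i]);
            vst1q_f32(&c[i], vaddq_f32(va, vb)); /* add and store 4 at once */
        }
    }

Each vaddq_f32 adds four single-precision lanes in one instruction, which is exactly the Single Instruction, Multiple Data cell of Flynn's taxonomy.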
Using Multiple Cores
3 - Harder and Mostly Effective: Grid to Map and Reduce (~3.2x)
– Shared Memory POSIX Threads
~70x
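A minimal map/reduce sketch with shared-memory POSIX threads (the thread count, array size, and squared-sum "work" are hypothetical stand-ins): each thread maps over its own slice of the grid, then the main thread reduces the partial results.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 4096

    static double data[N];
    static double partial[NTHREADS];

    /* Map: each thread reduces its own slice of the grid. */
    static void *worker(void *arg)
    {
        long t = (long)arg;
        long lo = t * (N / NTHREADS), hi = lo + (N / NTHREADS);
        double sum = 0.0;
        for (long i = lo; i < hi; i++)
            sum += data[i] * data[i];   /* stand-in for real per-element work */
        partial[t] = sum;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < N; i++) data[i] = 1.0;
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);
        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {   /* Reduce: join and combine */
            pthread_join(tid[t], NULL);
            total += partial[t];
        }
        printf("total = %f\n", total);
        return 0;
    }

Build with gcc -O3 -pthread. Because each thread writes only its own partial[t] slot, no mutex is needed until the final reduce.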
Theoretical Speed-Up – Linear at Best
[Figure: speed-up vs. number of cores; ideal speed-up is linear, actual speed-up is < linear]
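The "linear at best" bound is Amdahl's law (not stated explicitly on the slide); with s the serial fraction of the work and p the number of cores:

    S(p) = \frac{1}{\,s + \frac{1 - s}{p}\,} \;\le\; \frac{1}{s}

With no serial fraction (s = 0) this reduces to S(p) = p, i.e., linear; any s > 0 pushes the curve below linear and flattens it at 1/s, so s = 0.1 caps the speed-up at 10x on any number of cores.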
Parallel Processing Speed-up
Grid Data Processing Speed-up
1. Multi-Core, Multi-threaded, Macro-blocks/Frames (see the macro-block sketch after this list)
2. SIMD, Vector Instructions Operating over Large Words (Many Times Instruction Set Size)
3. Co-Processor Operates in Parallel to CPU(s)
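For item 1 above, a sketch of gridding a frame into macro-blocks (the 16x16 size is a hypothetical choice, as in common video codecs); each block is an independent unit of work that can be mapped to any core or handed to a co-processor:

    #define MB 16   /* macro-block edge length (hypothetical) */

    /* Visit each macro-block of a W x H frame. Each call to
     * process_block() is independent, so blocks can be distributed
     * across threads, cores, or a co-processor. */
    void for_each_block(unsigned char *frame, int W, int H,
                        void (*process_block)(unsigned char *, int, int, int))
    {
        for (int by = 0; by < H; by += MB)
            for (int bx = 0; bx < W; bx += MB)
                process_block(frame, W, bx, by);  /* e.g., sharpen this block */
    }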
Conceptual View of Hardware Resources
Three-Space View of CPU-bound HPC vs. RT or Fair Utilization
Goal is to fully use all resources to scale!
Requirements:
– CPU Margin?
– I/O Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?
[Figure: utilization cube with axes CPU-Use, I/O-Use, and Memory-Use; the origin is high-margin, while the upper right front corner is low-margin, i.e., CPU-, I/O-, and memory-bound at once. CPU + I/O + Memory Bound?! – Bad day!]
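One Linux-specific way to read two of the three margins at run time (a sketch using the standard getrusage call; the I/O-Use axis would need additional sources such as /proc, not shown here):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) == 0) {
            /* CPU-Use axis: user + system time consumed so far */
            printf("user CPU: %ld.%06ld s\n",
                   (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
            printf("sys  CPU: %ld.%06ld s\n",
                   (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
            /* Memory-Use axis: peak resident set size (KB on Linux) */
            printf("max RSS:  %ld KB\n", ru.ru_maxrss);
        }
        return 0;
    }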
Copyright © 2019 University of Colorado