Pipeline Optimization Techniques
Overview
Locating the bottleneck
Performance measurements
Optimizations
Balancing the pipeline
Other optimizations: multi-processing, parallel processing

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil” – Donald Knuth
Make it run first, then optimize
But only optimize where it makes any difference
Pipeline Optimization: Process to maximize the rendering speed, then allow stages that are not bottlenecks to consume as much time as the bottleneck.
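
The definition above can be illustrated with a small sketch. It assumes an idealized pipeline in which the three stages overlap fully, so the slowest stage alone sets the frame time. The `StageTimes` struct, the function names, and the millisecond timings are hypothetical and are not taken from the slides.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame timings in milliseconds for the three stages.
struct StageTimes {
    double app_ms;   // application (CPU) stage
    double geom_ms;  // geometry stage
    double rast_ms;  // rasterizer stage
};

enum class Stage { App, Geom, Rast };

// Idealized model: with fully overlapping stages, the slowest stage
// determines the time per frame.
inline double frame_time_ms(const StageTimes& t) {
    assert(t.app_ms >= 0.0 && t.geom_ms >= 0.0 && t.rast_ms >= 0.0);
    return std::max({t.app_ms, t.geom_ms, t.rast_ms});
}

// The bottleneck is the stage with the largest per-frame time.
inline Stage find_bottleneck(const StageTimes& t) {
    if (t.app_ms >= t.geom_ms && t.app_ms >= t.rast_ms) return Stage::App;
    if (t.geom_ms >= t.rast_ms) return Stage::Geom;
    return Stage::Rast;
}

// Slack: extra time a stage could spend per frame before it would
// take as long as the current bottleneck.
inline double slack_ms(double stage_ms, const StageTimes& t) {
    return frame_time_ms(t) - stage_ms;
}
```

With timings of 4 ms (APP), 10 ms (GEOM), and 6 ms (RAST), GEOM is the bottleneck, and under this model APP could take on up to 6 ms of extra work, such as more accurate collision detection, without lowering the frame rate.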
ITCS 4010/5010: Game Engine Design, Pipeline Optimization

Application (CPU) Stage the Bottleneck?
Use top, osview command on Unix, Task Manager on Windows.
If app uses (near) 100% of CPU time, then very likely application is the bottleneck
Using a code profiler is safer.
Make CPU do less work (e.g., turn off collision-detection)
Replace glVertex and glNormal with glColor
Makes the geometry and rasterizer do almost nothing
No vertices to transform, no normals to compute lighting for, no triangles to rasterize
If performance does not change, program is CPU-bound, or CPU-limited

Geometry Stage the Bottleneck?
Trickiest stage to test
Why? Change in geometry workload usually changes application and rasterizer workload.
Number of light sources only affects geometry stage:
◦ Disable light sources (vertex shaders can make this simple).
◦ If performance goes up, then geometry is bottleneck, and program transform-limited
Alternately, enable all light sources; if performance stays the same, geometry stage NOT the bottleneck
Alternately, test CPU and rasterizer instead
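
The tests above all follow one pattern: time a frame, reduce the workload of one stage, and time it again. A minimal sketch of that pattern follows. The function names and the 5% noise `threshold` are assumptions for illustration, and `render_frame` stands in for whatever draws one frame in the application.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock time per frame, in milliseconds, over `frames` calls
// to a caller-supplied (hypothetical) render_frame function.
inline double average_frame_ms(const std::function<void()>& render_frame,
                               int frames) {
    assert(frames > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < frames; ++i) {
        render_frame();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        clock::now() - start;
    return elapsed.count() / frames;
}

// Returns true if reducing one stage's workload made frames noticeably
// faster, which suggests that stage is the bottleneck. `threshold` is an
// assumed tolerance for measurement noise, not a value from the slides.
inline bool workload_is_bottleneck(double baseline_ms, double reduced_ms,
                                   double threshold = 0.05) {
    assert(baseline_ms > 0.0);
    return (baseline_ms - reduced_ms) / baseline_ms > threshold;
}
```

A possible use is to measure once with lighting enabled and once with the light sources disabled, then pass both averages to `workload_is_bottleneck`. Because OpenGL calls can return before the GPU has finished, `render_frame` would likely need to end with a buffer swap or `glFinish()` for the timings to be meaningful.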

Illustrating Optimization
[Figure: bar chart of per-stage frame time, before and after optimization]
Height of bar: time it takes for that stage for one frame
Highest bar is bottleneck
After optimization: bottleneck has moved to APP
No use in optimizing GEOM, turn to optimizing APP instead

Application Stage Optimization
Initial Steps:
◦ Turn on optimization flags in compiler
◦ Use code profilers to show where the majority of time is spent
◦ This is time-consuming stuff
Strategy 1: Efficient code
◦ Use fewer instructions
◦ Use more efficient instructions
◦ Recode algorithmically
Strategy 2: Efficient memory access
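
The slide names Strategy 2 without giving details. One common reading, assumed here, is to touch memory in the order it is laid out so that caches are used well. The sketch below sums the same grid in two traversal orders. The function names and the row-by-row storage layout are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols grid stored row by row in one contiguous vector.
// The inner loop visits consecutive addresses.
inline float sum_row_order(const std::vector<float>& grid,
                           std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    float total = 0.0f;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += grid[r * cols + c];
        }
    }
    return total;
}

// Visits the same elements, but the inner loop jumps `cols` elements at a
// time, so consecutive accesses land far apart in memory.
inline float sum_column_order(const std::vector<float>& grid,
                              std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    float total = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += grid[r * cols + c];
        }
    }
    return total;
}
```

Both functions visit every element exactly once. On grids much larger than the cache, the row-order version is typically faster, although the size of the gap depends on the hardware and should be measured with a profiler.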

Code Optimization Tricks
SIMD instruction sets perfect for vector ops
◦ 2-4 operations in parallel
◦ SSE, SSE2, 3DNow! are examples
Division is an expensive operation
◦ Between 4-39 times slower than most other instructions
◦ Good usage example: vector normalization:
Instead of
v = (vx/d, vy/d, vz/d)
Do
d = √(v · v), f = 1/d, v = v ∗ f
On some CPUs there are low-precision versions of the reciprocal (1/x) and the reciprocal square root (1/√x)

Code Optimization Tricks (contd)
Conditional branches are generally expensive
◦ Avoid if-then-else if possible
◦ Sometimes branch prediction on CPUs works remarkably well
Math functions (sin, cos, tan, sqrt, exp, etc.) are expensive
◦ Rough approximation might be sufficient
◦ Can use first few terms in Taylor series
Inline code is good (avoids function calls)
float (32 bits) is faster than double (64 bits); less data is sent down the pipeline
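
The vector normalization advice can be sketched as follows, taking d to be the vector length √(v · v). The `Vec3` type and the function names are hypothetical. The second function replaces three divisions with one division and three multiplications.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 3D vector type for illustration.
struct Vec3 {
    float x, y, z;
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Straightforward version: one square root and three divisions.
inline Vec3 normalize_with_divisions(const Vec3& v) {
    const float d = std::sqrt(dot(v, v));
    assert(d > 0.0f);  // a zero-length vector cannot be normalized
    return {v.x / d, v.y / d, v.z / d};
}

// Version from the slide: compute the reciprocal once, then multiply.
// One square root, one division, three multiplications.
inline Vec3 normalize_with_reciprocal(const Vec3& v) {
    const float d = std::sqrt(dot(v, v));
    assert(d > 0.0f);
    const float f = 1.0f / d;
    return {v.x * f, v.y * f, v.z * f};
}
```

The two versions can differ in the last bit of each component, because multiplying by a rounded reciprocal is not exactly the same as dividing. For rendering purposes that difference is usually negligible.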
Code Optimization Tricks (contd)
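
As one illustration of the Taylor series suggestion, the sketch below approximates sin(x) with its first three nonzero terms about zero. The function name and the accepted input range are assumptions. A real engine would also need range reduction for larger angles.

```cpp
#include <cassert>
#include <cmath>

// Approximates sin(x) as x - x^3/6 + x^5/120, the first three nonzero
// Taylor terms about 0. Assumed to be used only for |x| up to about pi/2.
inline float sin_taylor(float x) {
    assert(std::fabs(x) <= 1.6f);
    const float x2 = x * x;
    // Horner form of the polynomial: x * (1 - x2/6 * (1 - x2/20)).
    return x * (1.0f - x2 * (1.0f / 6.0f) * (1.0f - x2 * (1.0f / 20.0f)));
}
```

The error is dominated by the first omitted term, x^7/5040. It is tiny near zero and grows to roughly 0.005 near π/2, so whether this is a sufficient "rough approximation" depends on the use.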
Geometry Stage: Optimization
Depth Complexity
Overall Optimization: General Techniques

Balancing the Pipeline
Increase number of triangles (affects all stages)
More lights, more expensive (geometry)
More realistic animation, more accurate collision detection (application)
More expensive texture filtering, blending, etc. (rasterizer)
If not fill-limited, increase window size
Note: there are FIFOs between stages (and at many other places too) to smooth out idleness of stages
More techniques in text.

Multiprocessing
Use this if application is bottleneck, and is affordable
Two major ways: (1) Multiprocessor pipelining, (2) Parallel processing
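
Of the two multiprocessing approaches, parallel processing is the simpler to sketch: the application-stage work for one frame is split into independent chunks that run on separate threads. The `Object` type, the update rule, and the function names below are hypothetical, and the sketch assumes the objects can be updated independently of each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical application-stage work item: one moving object.
struct Object {
    float position;
    float velocity;
};

// Advances objects in the half-open index range [begin, end).
inline void update_range(std::vector<Object>& objects, std::size_t begin,
                         std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) {
        objects[i].position += objects[i].velocity * dt;
    }
}

// Parallel processing: split the objects into contiguous chunks and
// update each chunk on its own thread, then wait for all threads.
inline void update_parallel(std::vector<Object>& objects, float dt,
                            unsigned thread_count) {
    assert(thread_count > 0);
    const std::size_t n = objects.size();
    const std::size_t chunk = (n + thread_count - 1) / thread_count;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for further threads
        workers.emplace_back(update_range, std::ref(objects), begin, end, dt);
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```

Each thread writes only to its own index range, so no locking is needed here. Creating threads every frame has a cost of its own, so a real engine would more likely keep a pool of worker threads alive. Multiprocessor pipelining, in which different processors handle different sub-stages of the application work, is not shown.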
Summary