Pipeline Optimization Techniques

The document discusses pipeline optimization techniques for 3D graphics pipelines. It begins by stating that premature optimization should be avoided and performance should be measured before optimizing. The key steps are to locate the bottleneck stage, measure performance, and optimize that stage. Two common techniques for locating bottlenecks are reducing the workload of individual stages and measuring the impact on performance. Once identified, the bottleneck stage should be optimized through techniques like efficient coding practices, algorithm improvements, and leveraging SIMD instructions before optimizing other non-bottleneck stages.

Uploaded by

biswajit biswal
Copyright
© All Rights Reserved

Overview

• Locating the bottleneck
• Performance measurements
• Optimizations
• Balancing the pipeline
• Other optimizations: multi-processing, parallel processing

Pipeline Optimization

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”
- Donald Knuth

• Make it run first, then optimize
• But only optimize where it makes any difference
• Pipeline optimization: a process to maximize the rendering speed, then allow stages that are not bottlenecks to consume as much time as the bottleneck.
ITCS 4010/5010: Game Engine Design, Pipeline Optimization

Pipeline Optimization

• Stages execute in parallel
• The slowest stage is always the bottleneck of the pipeline
• The bottleneck determines throughput (i.e., maximum speed)
• The bottleneck is the average bottleneck over a frame
• Cannot measure intra-frame bottlenecks easily
• Bottlenecks can change over a frame
• Most important: find the bottleneck, then optimize that stage!

Locating the Bottleneck

• Two bottleneck location techniques:
• Technique 1:
◦ Make a certain stage work less
◦ If performance is better, then that stage is the bottleneck
• Technique 2:
◦ Make the other two stages work less or (better) not at all
◦ If performance is the same, then the stage not included above is the bottleneck
• Complication: the bus between CPU and graphics card may be the bottleneck (not a typical stage)
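Both techniques come down to comparing a baseline frame rate with the frame rate after a workload change. The sketch below is one way to write that decision down. The function names and the 5% noise tolerance are assumptions made for illustration; the slides do not prescribe a threshold.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical tolerance: changes below 5% are treated as measurement noise.
constexpr double kTolerance = 0.05;

// Relative change in frame rate after a workload change.
double relativeChange(double fpsBaseline, double fpsChanged) {
    assert(fpsBaseline > 0.0 && fpsChanged > 0.0);
    return (fpsChanged - fpsBaseline) / fpsBaseline;
}

// Technique 1: the tested stage was given less work.
// If the frame rate improved, that stage is the bottleneck.
bool isBottleneckTechnique1(double fpsBaseline, double fpsStageReduced) {
    return relativeChange(fpsBaseline, fpsStageReduced) > kTolerance;
}

// Technique 2: the other two stages were given less (or no) work.
// If the frame rate stayed the same, the untouched stage is the bottleneck.
bool isBottleneckTechnique2(double fpsBaseline, double fpsOthersReduced) {
    return std::fabs(relativeChange(fpsBaseline, fpsOthersReduced)) <= kTolerance;
}
```

For example, a jump from 60 to 90 frames per second after reducing one stage's work would mark that stage as the bottleneck under Technique 1.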

Application (CPU) Stage the Bottleneck?

• Use the top or osview command on Unix, Task Manager on Windows.
• If the app uses (near) 100% of CPU time, then very likely the application is the bottleneck
• Using a code profiler is safer.
• Make the CPU do less work (e.g., turn off collision detection)
• Replace glVertex and glNormal with glColor
• Makes the geometry and rasterizer do almost nothing
• No vertices to transform, no normals to compute lighting for, no triangles to rasterize
• If performance does not change, the program is CPU-bound, or CPU-limited

Geometry Stage the Bottleneck?

• Trickiest stage to test
• Why? A change in geometry workload usually changes application and rasterizer workload.
• Number of light sources only affects the geometry stage:
◦ Disable light sources (vertex shaders can make this simple).
◦ If performance goes up, then geometry is the bottleneck, and the program is transform-limited
• Alternately, enable all light sources; if performance stays the same, the geometry stage is NOT the bottleneck
• Alternately, test CPU and rasterizer instead
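Every test above relies on measuring performance before and after a change. A minimal timing sketch follows, assuming the renderer can be wrapped in a callable; `averageFrameMs` and `renderFrame` are hypothetical names, not part of any graphics API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average time per frame in milliseconds over `frames` calls to renderFrame.
// With a real renderer the callable should also wait for the GPU to finish
// (for example by calling glFinish in OpenGL), otherwise only the time to
// submit commands is measured.
double averageFrameMs(const std::function<void()>& renderFrame, int frames) {
    assert(frames > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < frames; ++i) {
        renderFrame();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / frames;
}
```

Averaging over many frames matches the point that the bottleneck is an average over a frame and that single-frame readings are noisy.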


Rasterizer Stage the Bottleneck?

• The easiest, and fastest to test
• Simply decrease the size of the window you render to
◦ Does not change app. or geometry workload
◦ But rasterizer needs to fill fewer pixels
◦ If the performance goes up, then the program is “fill-limited” or “fill-bound”
• Make the rasterizer work less: turn off texturing, fog, blending, depth buffering, etc. (if your architecture has performance penalties for these)

Optimization

• Optimize the bottleneck stage
• Only put in enough effort so that the bottleneck stage moves
• Did you get enough performance?
◦ Yes! Quit optimizing
◦ No! Continue optimizing the (possibly new) bottleneck
• If close to the maximum speed of the system, might need to turn to acceleration techniques (spatial data structures, occlusion culling, etc.)

Illustrating Optimization

• Height of bar: time it takes for that stage for one frame
• Highest bar is the bottleneck
• After optimization: bottleneck has moved to APP
• No use in optimizing GEOM, turn to optimizing APP instead

Application Stage Optimization

• Initial steps:
◦ Turn on optimization flags in the compiler
◦ Use code profilers, which show where the majority of time is spent
◦ This is time-consuming stuff
• Strategy 1: Efficient code
◦ Use fewer instructions
◦ Use more efficient instructions
◦ Recode algorithmically
• Strategy 2: Efficient memory access
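The bar-chart picture of per-stage times can be sketched in a few lines: because the stages run in parallel, the frame time is the height of the tallest bar. The struct, function names, and millisecond figures in the usage note are illustrative assumptions only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Time each stage needs for one frame, in milliseconds (the bar heights).
struct StageTimes {
    double app;
    double geom;
    double rast;
};

// The stages run in parallel, so the slowest one sets the frame time.
double frameTimeMs(const StageTimes& t) {
    assert(t.app >= 0.0 && t.geom >= 0.0 && t.rast >= 0.0);
    return std::max({t.app, t.geom, t.rast});
}

// Name of the stage with the highest bar.
std::string bottleneck(const StageTimes& t) {
    if (t.app >= t.geom && t.app >= t.rast) return "APP";
    if (t.geom >= t.rast) return "GEOM";
    return "RAST";
}
```

With made-up times of 20, 30 and 10 ms for APP, GEOM and RAST, the bottleneck is GEOM. Cutting GEOM to 12 ms moves the bottleneck to APP at 20 ms, and cutting GEOM further to 5 ms leaves the frame time unchanged, which is why further GEOM work is wasted.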


Application: Code Optimization Tricks

• SIMD instruction sets are perfect for vector ops
◦ 2-4 operations in parallel
◦ SSE, SSE2, 3DNow! are examples
• Division is an expensive operation
◦ Between 4 and 39 times slower than most other instructions
◦ Good usage example: vector normalization. Instead of
v = (vx/d, vy/d, vz/d)
do
d = √(v · v), f = 1/d, v = v ∗ f
• On some CPUs there are low-precision versions of the reciprocal (1/x) and the reciprocal square root (1/√x)

Code Optimization Tricks (contd)

• Conditional branches are generally expensive
◦ Avoid if-then-else if possible
◦ Sometimes branch prediction on CPUs works remarkably well
• Math functions (sin, cos, tan, sqrt, exp, etc.) are expensive
◦ Rough approximation might be sufficient
◦ Can use first few terms in Taylor series
• Inline code is good (avoids function calls)
• float (32 bits) is faster than double (64 bits); less data is sent down the pipeline
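Two of these tricks are easy to show side by side: normalizing a vector with one division instead of three, and approximating sin with the first terms of its Taylor series. This is a sketch under stated assumptions: `Vec3`, `normalize` and `sinTaylor` are illustrative names, and the three-term series is only reasonable for small |x|.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// Normalize with a single division: compute f = 1/d once, then multiply.
Vec3 normalize(const Vec3& v) {
    const float d = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(d > 0.0f);           // a zero-length vector cannot be normalized
    const float f = 1.0f / d;   // the only division
    return {v.x * f, v.y * f, v.z * f};
}

// sin(x) ~ x - x^3/6 + x^5/120, written so that only multiplications by
// compile-time constants remain. Accuracy degrades quickly as |x| grows.
float sinTaylor(float x) {
    constexpr float kInv6 = 1.0f / 6.0f;
    constexpr float kInv20 = 1.0f / 20.0f;
    const float x2 = x * x;
    return x * (1.0f - x2 * kInv6 * (1.0f - x2 * kInv20));
}
```

Whether either version is actually faster depends on the compiler and CPU, so it is worth measuring rather than assuming.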
Code Optimization Tricks (contd)

• Compiler optimization is hard to predict: --counter vs. counter--
• Use const in C and C++ to help the compiler with optimization
• The following often incur overhead:
◦ Dynamic casting (C++)
◦ Virtual methods
◦ Inherited constructors
◦ Passing structs by value

Memory Optimization

• Memory hierarchies (caches) in modern computers: primary, secondary caches.
• Bad memory access patterns can ruin performance
• Not really about using less memory, though that can help
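Of the overhead sources listed for C++, passing structs by value is the simplest to make visible. The sketch below counts copies through a hypothetical `Mesh` type whose copy constructor increments a counter; the type and both function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mesh type that records how often it is copied.
struct Mesh {
    std::vector<float> vertices;
    static int copies;

    explicit Mesh(std::size_t n) : vertices(n, 0.0f) {
        assert(n > 0);
    }
    Mesh(const Mesh& other) : vertices(other.vertices) {
        ++copies;  // every copy duplicates the whole vertex array
    }
};
int Mesh::copies = 0;

// By value: the caller's mesh is copied on every call.
std::size_t vertexCountByValue(Mesh m) {
    return m.vertices.size();
}

// By const reference: no copy, and const tells the compiler (and the reader)
// that the function does not modify the mesh.
std::size_t vertexCountByConstRef(const Mesh& m) {
    return m.vertices.size();
}
```

Calling the by-value version with an existing mesh raises `Mesh::copies` by one, while the const-reference version leaves it unchanged.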


Memory Optimization Tricks

• Sequential access: store data in order in memory:
◦ Tex Coords #0, Position #0, Tex Coords #1, Position #1, Tex Coords #2, Position #2, etc.
• Cache prefetching is good, but hard to control
• malloc() and free() may be slow: consider using a custom storage allocator - allocate memory to a pool at startup

Memory Optimization Tricks (contd)

• Align data with the size of a cache line
◦ Example: on most Pentiums, the cache line size is 32 bytes
◦ Now, assume that it takes 30 bytes to store a vertex
◦ Padding with another 2 bytes to 32 bytes will likely perform better.
• Following pointers (linked list) is expensive (if memory is allocated arbitrarily)
◦ Does not make good use of the coherence that caches usually exploit
◦ That is, the address after the one we just used is likely to be used soon
◦ Paper by Smits on ray tracing shows this.
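The alignment and pool-allocation tricks can be sketched together. Everything here is an assumption for illustration: the `Vertex` layout (20 bytes of payload rather than the 30 in the example), the 32-byte line size taken from the Pentium example, and the `Pool` class, which is one simple design among many.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interleaved vertex: 20 bytes of payload, padded by alignas to
// a full 32-byte cache line so that no vertex straddles two lines.
struct alignas(32) Vertex {
    float texCoord[2];
    float position[3];
};
static_assert(sizeof(Vertex) == 32, "one vertex per 32-byte cache line");

// Fixed-capacity pool: all storage is allocated once at startup, and
// acquire/release only push and pop indices instead of calling malloc/free.
template <typename T>
class Pool {
public:
    explicit Pool(std::size_t capacity) : slots_(capacity) {
        for (std::size_t i = capacity; i > 0; --i) {
            free_.push_back(i - 1);  // slot 0 ends up on top of the stack
        }
    }

    // Returns a free slot, or nullptr when the pool is exhausted.
    T* acquire() {
        if (free_.empty()) return nullptr;
        const std::size_t i = free_.back();
        free_.pop_back();
        return &slots_[i];
    }

    // Hands a slot obtained from acquire() back to the pool.
    void release(T* p) {
        assert(p >= slots_.data() && p < slots_.data() + slots_.size());
        free_.push_back(static_cast<std::size_t>(p - slots_.data()));
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> slots_;            // contiguous storage, cache friendly
    std::vector<std::size_t> free_;   // stack of free slot indices
};
```

Because the slots are contiguous, objects handed out one after another tend to sit next to each other in memory, which is the coherence that the linked-list bullet says arbitrary allocation loses.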

Geometry Stage: Optimization

• Geometry stage does per-vertex ops
◦ Best way to optimize: use triangle strips!!!
• Lighting optimization:
◦ Spot lights expensive, point lights cheaper, directional lights cheapest
◦ Disable lighting if possible
◦ Use as few light sources as possible
◦ If you use 1/d² falloff, then if d > 10 (example), disable the light

Geometry Stage: Optimization (contd)

• Normals must be normalized to get correct lighting
◦ Normalize them as a preprocess, and disable normalizing if possible
• Lighting can be computed for both sides of a triangle; disable if not needed.
• If light sources are static with respect to geometry, and the material is only diffuse
◦ Precompute lighting on CPU
◦ Send only precomputed colors (not normals)
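Precomputing diffuse lighting for a static directional light amounts to evaluating max(0, n · l) once per vertex on the CPU and storing the result. The sketch below assumes unit-length normals and a unit-length direction toward the light; `precomputeDiffuse` and `lightContributes` are hypothetical helper names, and the distance cutoff is passed in rather than fixed at the example value of 10.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 {
    float x, y, z;
};

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Per-vertex diffuse intensity for a static directional light.
// Assumes the normals were normalized in a preprocess and toLight is unit length.
std::vector<float> precomputeDiffuse(const std::vector<Vec3>& normals,
                                     const Vec3& toLight) {
    assert(std::fabs(dot(toLight, toLight) - 1.0f) < 1e-3f);
    std::vector<float> intensity;
    intensity.reserve(normals.size());
    for (const Vec3& n : normals) {
        intensity.push_back(std::max(0.0f, dot(n, toLight)));
    }
    return intensity;
}

// With 1/d^2 falloff, a light beyond some cutoff distance contributes so
// little that it can be disabled for that object.
bool lightContributes(float distance, float cutoff) {
    return distance <= cutoff;
}
```

The stored intensities would then be multiplied into the material color and sent as vertex colors, so the normals no longer need to go down the pipeline.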


Raster Stage: Optimization

• Rasterizer stage does per-pixel ops
• Simple optimization: turn on backface culling if possible
• Turn off Z-buffering if possible:
◦ Example: after screen clear, draw large background polygon
◦ Using polygon-aligned BSP trees
• Draw in front-to-back order
• Try disabling features: texture filtering mode, fog, blending, multisampling

Raster Stage: Optimization (contd)

• To make rasterization faster, need to rasterize fewer (or cheaper) pixels:
◦ Make window smaller
◦ Render to a smaller texture, and then enlarge texture onto screen
• Depth complexity is the number of times a pixel has been written to
◦ Good for understanding behaviour of application
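Depth complexity can be estimated offline by counting writes per pixel. The sketch below stands in for a real rasterizer with axis-aligned rectangles, which is purely an assumption for illustration; it counts every write, as if depth testing were off.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rasterized primitive: a half-open pixel
// rectangle covering [x0, x1) by [y0, y1).
struct Rect {
    int x0, y0, x1, y1;
};

// Average number of writes per pixel over a width-by-height framebuffer.
double averageDepthComplexity(int width, int height,
                              const std::vector<Rect>& rects) {
    assert(width > 0 && height > 0);
    std::vector<int> writes(static_cast<std::size_t>(width) * height, 0);
    for (const Rect& r : rects) {
        // Clip the rectangle to the framebuffer.
        const int x0 = std::max(r.x0, 0), x1 = std::min(r.x1, width);
        const int y0 = std::max(r.y0, 0), y1 = std::min(r.y1, height);
        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                ++writes[static_cast<std::size_t>(y) * width + x];
            }
        }
    }
    long long total = 0;
    for (int w : writes) total += w;
    return static_cast<double>(total) / (static_cast<double>(width) * height);
}
```

A value well above 1 means many pixels are filled more than once, which is where a smaller window, front-to-back drawing, or culling is likely to pay off.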

Depth Complexity

[Figure]

Overall Optimization: General Techniques

• Reduce number of primitives, e.g., using polygon simplification algorithms
• Preprocess geometry and data for the particular architecture
• Turn off features not in use such as:
◦ Depth buffering, Blending, Fog, Texturing


Overall Optimization (contd)

• Minimize state changes by grouping objects
◦ Example: objects with the same texture should be rendered together
• If all pixels are always drawn, avoid color buffer clear
• Frame buffer reads are expensive
• Display lists may work faster
• Precompile a list of primitives for faster rendering
• OpenGL API supports this

Balancing the Pipeline

• The bottleneck stage sets the frame rate
• The other two stages will be idle for some time
• Also, to sync with monitor, there might be idle time for all stages
• Exploit this time to make quality of images better if possible
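Grouping objects by texture can be sketched as a sort over the draw list followed by a count of how often the bound texture changes. The `DrawCall` struct and both function names are hypothetical; a real renderer would group by whatever state is expensive to switch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw-list entry: which texture an object needs and which mesh it draws.
struct DrawCall {
    int textureId;  // assumed non-negative
    int meshId;
};

// Number of texture binds needed when the calls are issued in list order.
int countTextureBinds(const std::vector<DrawCall>& calls) {
    int binds = 0;
    int bound = -1;  // sentinel: no texture bound yet
    for (const DrawCall& c : calls) {
        assert(c.textureId >= 0);
        if (c.textureId != bound) {
            bound = c.textureId;
            ++binds;
        }
    }
    return binds;
}

// Group calls that share a texture; stable_sort keeps the original order
// within each group.
void groupByTexture(std::vector<DrawCall>& calls) {
    std::stable_sort(calls.begin(), calls.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return a.textureId < b.textureId;
                     });
}
```

For a list that alternates between two textures across four objects, grouping cuts the binds from four to two. Note that this ordering can conflict with front-to-back drawing, so the two goals may need to be traded off.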

Balancing the Pipeline (contd)

• Increase number of triangles (affects all stages)
• More lights, more expensive (geometry)
• More realistic animation, more accurate collision detection (application)
• More expensive texture filtering, blending, etc. (rasterizer)
• If not fill-limited, increase window size
• Note: there are FIFOs between stages (and at many other places too) to smooth out idleness of stages
• More techniques in text.

Multiprocessing

• Use this if application is bottleneck, and is affordable
• Two major ways: (1) Multiprocessor pipelining, (2) Parallel processing


Summary

• Pipeline optimization is no substitute for good algorithms!
• Do optimization as a last step.
• Primarily for products that should be shipped
• Most often good to use triangle strips!
