Benchmarking C++ Code - Bryce Adelstein Lelbach - CppCon 2015
Benchmarking C++ Code - Bryce Adelstein Lelbach - CppCon 2015
CALIFORNIA
Benchmarking C++ Code
CppCon 2015
The Problem with Performance
Source: http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html
Source: http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html
10
Statistical Best Practices
14
Case Study: CFD AMR Scaling
AMR Test, Strong-Scaling
Process:
• Form a hypothesis: how do you expect performance to change?
• Come up with a test to determine if your hypothesis is right.
• Gather data.
• Statistically analyze data.
• Draw conclusions.
𝑓 = 𝑎𝐴 ± 𝑏𝐵 𝜎𝑓 = 𝑎2 𝜎𝐴2 + 𝑏2 𝜎𝐵2
𝜎𝐴 2 𝜎𝐵 2
𝑓 = 𝐴𝐵 or 𝑓 = 𝐴/𝐵 𝜎𝑓 ≈ 𝑓 +
𝐴 𝐵
25
Case Study: CFD AMR Scaling
AMR Test, Strong-Scaling (with uncertainty)
28
Example: Boost.Accumulators
int main()
{
accumulator_set<
double, stats<tag::count, tag::mean, tag::median, tag::variance>
> acc;
acc(42);
int main()
{
accumulator_set<
double, stats<tag::count, tag::mean, tag::median, tag::variance>
> acc;
acc(42);
auto n = count(acc);
auto stdev = std::sqrt(variance(acc)*(n/(n-1.0)));
40
Case Study: HPX CS Overhead
Context Switching Overhead (95% CI), Intel Sandybridge
46
Time-Based Benchmarking
Source: cppreference.com
Clock Description
system_clock Wall clock time from the system-
wide realtime clock.
steady_clock Monotonic clock that will never be
adjusted.
high_resolution_clock The clock with the shortest tick
period available.
Time-Based Benchmarking
51
Example: high_resolution_timer
struct high_resolution_timer
{
high_resolution_timer() : start_time_(take_time_stamp()) {}
void restart()
{ start_time_ = take_time_stamp(); }
protected:
static std::uint64_t take_time_stamp()
{
return std::chrono::duration_cast<std::chrono::nanoseconds>
(std::chrono::steady_clock::now().time_since_epoch()).count();
}
private:
std::uint64_t start_time_;
};
53
Memory Benchmarking
Non-Time-Based Benchmarking
55
Example: Instrumenting operator new
struct A {
static std::size_t allocated;
std::size_t A::allocated = 0;
60
Intel VTune Amplifier
Source: https://software.intel.com/en-us/intel-advisor-xe/
Source: https://software.intel.com/en-us/intel-advisor-xe/
70
Write Performance Tests
Challenges:
• Implementing automated performance testing that follow the kind
of best practices we’ve been talking about requires a lot of
machinery.
• If your performance tests are stateful (rely on the results of
previous automated tests), you need even more machinery.
• You need more than just the machinery to run the tests - you
also need automated analysis to determine whether the test has
failed.
Copyright (C) 2015 Bryce Adelstein lelbach 71
Write Performance Tests
Analyze Implement
Debugging
Test
The test is executed to
see if the problem has
been solved.
Analyze Implement
Optimization
Test
The test is executed,
and statistical data is
recorded.