
UNIVERSITY OF CALIFORNIA
Benchmarking C++ Code

Bryce Adelstein Lelbach aka wash <[email protected]>


Computer Architecture Group, Computing Research Division

CppCon 2015
The Problem with Performance

Problem: Code seg faults

• We solve this type of problem with an iterative workflow: Analyze →
  Implement → Test (the debugging cycle).
  – Analyze: debuggers and tools are used to learn more about the problem.
  – Implement: a potential fix is implemented; prior fixes may be reverted.
  – Test: the test is executed to see if the problem has been solved.
• We know when we’re done; we can easily get a “yes/no” answer during the
  testing phase.
  – Usually, there’s no random error when testing for this type of problem
    (excluding race conditions).

Copyright (C) 2015 Bryce Adelstein lelbach 3


The Problem with Performance

Problem: Code is slow

• The same Analyze → Implement → Test cycle applies (now an optimization
  cycle), but the Analyze and Test boxes are marked “???”:
  – Implement: a potential fix is implemented; prior fixes may be reverted.
  – Analyze: ???
  – Test: ???
• Producing a “yes/no” answer during the testing phase is more difficult.
  – Performance is not a Boolean quantity.
     It is often unclear when the problem is “fixed”.
     You never really finish optimizing.
  – Performance data is subject to random error due to natural variability.

Copyright (C) 2015 Bryce Adelstein lelbach 4


What is Performance?

How do we define performance, anyway?

• Not “fast”, but “fast enough”.
• Real-world metrics:
  – Ex: simulation-years/day
• Roofline:
  – Ex: FLOP/s
• Deadline:
  – Ex: takes 50 milliseconds

You need to be able to come up with meaningful definitions for performance.

Copyright (C) 2015 Bryce Adelstein lelbach 5


Sources of Error

Observational Error: The difference between what you measure and the true
result.
• Random Error: Errors caused by natural variance.
• Systematic Error: Errors caused by an inaccuracy – usually constant or
  proportional to the true result.

Observational error is unavoidable. Meaningful performance analysis must
account for error.
• E.g. a statistical testing approach.

Copyright (C) 2015 Bryce Adelstein lelbach 6


Variance

Computers can reproduce answers, not performance.

• Hardware jitter
  – Instruction pipelines: The pipeline fill level has an effect on the
    execution time of an instruction.
  – Difference in CPU/memory bus clocks: The CPU clock frequency differs from
    the memory bus clock frequency, so the CPU sometimes has to wait for
    memory accesses to synchronize.
  – CPU frequency scaling and power management: These features cause
    heterogeneities in processing power.
  – Shared hardware caches: Caches shared between multiple cores/threads are
    subject to variance due to concurrent use.

Source: http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html

Copyright (C) 2015 Bryce Adelstein lelbach 7


Variance

Computers can reproduce answers, not performance.

• Larger memory segments may have variance in access times due to physical
  distance from the CPU.
• Additionally, OS activity can cause non-determinism.
  – Some hardware interrupts require OS handling immediately after delivery.
  – Migration of non-pinned processes between CPUs can defeat per-CPU
    heuristics and affect performance.

Observer Effect: All forms of instrumentation change the results.

Source: http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html

Copyright (C) 2015 Bryce Adelstein lelbach 8


Variance

Source: Cy Chan, John Bachan

Copyright (C) 2015 Bryce Adelstein lelbach 9


Statistical Best Practices

10
Statistical Best Practices

Statistics: A great way to lie to yourself.

Copyright (C) 2015 Bryce Adelstein lelbach 11


Statistical Best Practices

Statistics: A way to extract conclusions from your data

Copyright (C) 2015 Bryce Adelstein lelbach 12


Statistical Best Practices

Statistics: The science of data…
– collection
– analysis
– interpretation
– presentation

Copyright (C) 2015 Bryce Adelstein lelbach 13


Case Study: CFD AMR Scaling

Statistical Best Practices

14
Case Study: CFD AMR Scaling
AMR Test, Strong-Scaling

Copyright (C) 2015 Bryce Adelstein lelbach 15


Case Study: CFD AMR Scaling
AMR Test, Strong-Scaling (with uncertainty)

Copyright (C) 2015 Bryce Adelstein lelbach 16


Case Study: CFD AMR Scaling
AMR Test, Strong-Scaling (with uncertainty)

Copyright (C) 2015 Bryce Adelstein lelbach 17


Statistical Best Practices

Process:
• Form a hypothesis: how do you expect performance to change?
• Come up with a test to determine if your hypothesis is right.
• Gather data.
• Statistically analyze data.
• Draw conclusions.

Copyright (C) 2015 Bryce Adelstein lelbach 18


Statistical Best Practices

Come up with a test to determine if your hypothesis is right.
• Identify independent/dependent/control variables.
• Determine what relevant metric you’ll use (metric will be derived
from dependent variables).
• Consider the assumptions you’re making:
– Assumptions about independence of variables.
– Assumptions about distribution of samples.
 Usually we assume a normal distribution.

Copyright (C) 2015 Bryce Adelstein lelbach 19


Gathering Data

Amortizing: When measuring “small” events, we often measure by amortization
to reduce the observer effect.
• E.g. time an N-iteration for loop and divide by N to get the
amortized time per iteration.
high_resolution_timer t; // Start timing.

for (std::size_t i = 0; i < N; ++i)

A[i] = A[i] + B[i] * C[i];

double time_per_iteration = t.elapsed() / N;

• We treat this as one sample, not N samples.

Copyright (C) 2015 Bryce Adelstein lelbach 20


Gathering Data

Sampling: Each independent measurement we take is a sample.
• Samples are representative of the “population” (AKA the true
performance).
• Our goal is to gather samples in sufficient quantity and quality to
be representative of the population.

It’s crucial to sample both within one execution of the test and across
multiple executions of the test.
• Gathering data across multiple executions gives a better
representation of system noise.
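A sketch of what this can look like in practice (the kernel, sizes, and names
below are my own illustration, not from the slides): each amortized loop is
one sample, many samples are collected within one execution, and the whole
program would then be run several times to capture run-to-run noise.

#include <chrono>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::size_t const N = 1 << 20;  // iterations per sample (amortization)
    std::size_t const K = 50;       // samples within this execution
    std::vector<double> A(N, 1.0), B(N, 2.0), C(N, 3.0);
    std::vector<double> samples;

    for (std::size_t k = 0; k < K; ++k)
    {
        auto const start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < N; ++i)
            A[i] = A[i] + B[i] * C[i];
        auto const stop = std::chrono::steady_clock::now();

        // One amortized sample per loop, not N samples.
        samples.push_back(
            std::chrono::duration<double>(stop - start).count() / N);
    }

    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= samples.size();

    double var = 0.0;
    for (double s : samples) var += (s - mean) * (s - mean);
    var /= (samples.size() - 1); // corrected (sample) variance

    // Print A[0] so the benchmarked loop cannot be optimized away.
    std::cout << A[0] << "\nPer-iteration time: " << mean
              << " +/- " << std::sqrt(var) << " s\n";
}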

Copyright (C) 2015 Bryce Adelstein lelbach 21


Gathering Data

Running “hot” vs “cold”.

• Often, you need to make sure that both your test as a whole (e.g. each
  execution) and the particular region you’re measuring (e.g. each sample)
  are not running “cold” on the CPU.
  – I/O, caching and branch prediction may work against you if you’re running
    cold.

You can address this by doing some warmup executions/iterations before you
start measuring.
• E.g. don’t measure the first execution or the first few iterations.

Copyright (C) 2015 Bryce Adelstein lelbach 22


Uncertainty

Uncertainty: A representation of the amount of error in a measurement.
• Instrument uncertainty: the inherent amount of uncertainty in an
  instrument.
  – Ex: if your clock ticks in microseconds, it has an instrument uncertainty
    of +/- 500 nanoseconds (half the unit of measurement).
• The sample standard deviation of a set of samples is a frequently used
  estimate of the uncertainty of the average of the samples.

When dealing with derived metrics that use averaged data, you can formulate a
derived uncertainty based on the uncertainties of the averaged data.

Copyright (C) 2015 Bryce Adelstein lelbach 23


Uncertainty

Given uncorrelated averaged data A and B with standard deviations σ_A and
σ_B, and constants a and b:

Function               Standard Deviation
f = aA                 σ_f = a·σ_A
f = aA ± bB            σ_f = sqrt( a²·σ_A² + b²·σ_B² )
f = A·B or f = A/B     σ_f ≈ f · sqrt( (σ_A/A)² + (σ_B/B)² )
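For instance (a made-up illustration, not from the slides): if a kernel takes
A = 2.0 ms with σ_A = 0.1 ms and performs B = 4.0 MFLOP with σ_B = 0.2 MFLOP,
the derived rate f = B/A = 2.0 GFLOP/s has
σ_f ≈ f · sqrt( (0.1/2.0)² + (0.2/4.0)² ) ≈ 0.14 GFLOP/s.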

Copyright (C) 2015 Bryce Adelstein lelbach 24


Case Study: CFD AMR Scaling

Statistical Best Practices

25
Case Study: CFD AMR Scaling
AMR Test, Strong-Scaling (with uncertainty)

Copyright (C) 2015 Bryce Adelstein lelbach 26


Case Study: CFD AMR Scaling
AMR Test, Walltime

Copyright (C) 2015 Bryce Adelstein lelbach 27


Example: Boost.Accumulators

Statistical Best Practices

28
Example: Boost.Accumulators

“Boost.Accumulators provides accumulators to which numbers can be added to
get, for example, the mean or the standard deviation.”
– The Boost C++ Libraries, Boris Schäling

Copyright (C) 2015 Bryce Adelstein lelbach 29


Example: Boost.Accumulators
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
#include <cmath>
#include <iostream>

using namespace boost::accumulators;

int main()
{
    accumulator_set<
        double, stats<tag::count, tag::mean, tag::median, tag::variance>
    > acc;

    acc(42);

    // ... Accumulate data ...

    auto stdev = std::sqrt(variance(acc));

    std::cout << "Mean:   " << mean(acc)   << "\n"
              << "Median: " << median(acc) << "\n"
              << "Stdev:  " << stdev       << "\n";
}

Copyright (C) 2015 Bryce Adelstein lelbach 30


Example: Boost.Accumulators

Copyright (C) 2015 Bryce Adelstein lelbach 31


Example: Boost.Accumulators

Copyright (C) 2015 Bryce Adelstein lelbach 32


Example: Boost.Accumulators

Two different forms of standard deviation:

• Uncorrected, takes the standard deviation of an entire population:

    σ = sqrt( (1/n) · Σ_{i=1..n} (x_i − μ)² )

• Corrected, takes the standard deviation of a sample of a population:

    σ = sqrt( (1/(n−1)) · Σ_{i=1..n} (x_i − μ)² )

Copyright (C) 2015 Bryce Adelstein lelbach 33


Example: Boost.Accumulators
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
#include <cmath>
#include <iostream>

using namespace boost::accumulators;

int main()
{
    accumulator_set<
        double, stats<tag::count, tag::mean, tag::median, tag::variance>
    > acc;

    acc(42);

    // ... Accumulate data ...

    // tag::variance is the uncorrected (population) variance; rescale by
    // n/(n-1) to get the corrected (sample) standard deviation.
    auto n = count(acc);
    auto stdev = std::sqrt(variance(acc) * (n / (n - 1.0)));

    std::cout << "Mean:   " << mean(acc)   << "\n"
              << "Median: " << median(acc) << "\n"
              << "Stdev:  " << stdev       << "\n";
}

Copyright (C) 2015 Bryce Adelstein lelbach 34


Gathering Data

Process for collecting good data:

• Take individual measurements in your code. Use amortization if relevant.
• Accumulate multiple measurements and uncertainty estimations in code.
• Gather results from multiple executions of the test, and recompute
  uncertainty estimations (a sketch follows below).
  – Given two averages, μ_1 and μ_2 (and a combined average μ), of n_1 and
    n_2 data points, with sample standard deviations σ_1 and σ_2, the
    combined sample standard deviation of both datasets is:

    σ = sqrt( ( n_1²σ_1² + n_2²σ_2² − n_2σ_1² − n_2σ_2² − n_1σ_1² − n_1σ_2²
                + n_1n_2σ_1² + n_1n_2σ_2² + n_1n_2(μ_1 − μ_2)² )
              / ( (n_1 + n_2 − 1)(n_1 + n_2) ) )
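A minimal sketch of combining the statistics of two runs using the formula
above (the struct, function, and numbers are my own illustration):

#include <cmath>
#include <cstddef>
#include <iostream>

struct stats { std::size_t n; double mean, stdev; };

// Combine the sample mean and sample standard deviation of two datasets.
stats combine(stats a, stats b)
{
    double const n1 = double(a.n), n2 = double(b.n);
    double const s1 = a.stdev * a.stdev, s2 = b.stdev * b.stdev;
    double const d  = a.mean - b.mean;

    double const mean = (n1 * a.mean + n2 * b.mean) / (n1 + n2);

    double const var =
        ( n1 * n1 * s1 + n2 * n2 * s2
        - n2 * s1 - n2 * s2 - n1 * s1 - n1 * s2
        + n1 * n2 * s1 + n1 * n2 * s2
        + n1 * n2 * d * d )
        / ((n1 + n2 - 1.0) * (n1 + n2));

    return { a.n + b.n, mean, std::sqrt(var) };
}

int main()
{
    stats run1{100, 1.50, 0.10}; // made-up per-run results
    stats run2{100, 1.55, 0.12};
    stats both = combine(run1, run2);
    std::cout << both.mean << " +/- " << both.stdev << "\n";
}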

Copyright (C) 2015 Bryce Adelstein lelbach 35


Confidence Intervals

Confidence Interval: A way to describe the amount of uncertainty associated
with a sample of a population.
• Constructed from three pieces of information:
  – Confidence level (r) - e.g. 90%, 95%, 99%.
  – Statistical data, including sample size (n).
  – Uncertainty for the data (σ).

    CI = z·σ / sqrt(n)

• z is the critical value. For large sample sizes, you can look this up in a
  table. For small sample sizes, use the Student’s t inverse cumulative
  distribution function:

    z = T_inv(1 − r, n − 1)
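A sketch of computing a confidence interval with Boost.Math’s Student’s t
distribution (the sample values are made up; this sketch uses the usual
two-sided quantile 1 − (1 − r)/2 as the critical value):

#include <boost/math/distributions/students_t.hpp>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> samples = {1.02, 0.98, 1.05, 0.99, 1.01, 1.03};
    std::size_t const n = samples.size();

    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= n;

    double var = 0.0;
    for (double s : samples) var += (s - mean) * (s - mean);
    var /= (n - 1);                        // corrected (sample) variance
    double const stdev = std::sqrt(var);

    double const r = 0.95;                 // confidence level
    boost::math::students_t dist(n - 1.0);
    double const z = boost::math::quantile(dist, 1.0 - (1.0 - r) / 2.0);

    double const ci = z * stdev / std::sqrt(double(n));
    std::cout << mean << " +/- " << ci << " (95% CI)\n";
}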

Copyright (C) 2015 Bryce Adelstein lelbach 36


Confidence Intervals

One of the useful things you can do with confidence intervals is determine
the correct sample size, based on an initial “pilot” set of samples.
• Given a margin of error e_m, a critical value z, an uncertainty σ, and a
  mean μ:

    n = ( z·σ / (e_m·μ) )²

• If this calculation indicates an unreasonably large sample size is needed,
  the experiment may need to be redesigned.
• Typically, if your uncertainties are big relative to your data (mean and
  standard deviation have the same magnitude), there is too much noise to get
  meaningful results from your data.
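For example (made-up numbers, not from the slides): with a pilot mean
μ = 10 ms, σ = 1 ms, a 95% critical value z ≈ 1.96, and a desired relative
margin of error e_m = 2%, n = (1.96 · 1 / (0.02 · 10))² ≈ 96, so roughly a
hundred samples would be needed.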

Copyright (C) 2015 Bryce Adelstein lelbach 37


Confidence Intervals

Meaning of confidence intervals


• If the true performance lies outside of the 95% confidence
interval, then an event occurred which had a probability of 5% or
less of happening.
• A 95% confidence interval does not mean that 95% of the data
lies within the interval.
• A confidence interval isn’t a range of plausible values for a
sample mean. It can be interpreted as an estimate of plausible
values for the population.

Copyright (C) 2015 Bryce Adelstein lelbach 38


Confidence Intervals

Copyright (C) 2015 Bryce Adelstein lelbach 39


Case Study: HPX CS Overhead

Statistical Best Practices

40
Case Study: HPX CS Overhead
Context Switching Overhead (95% CI), Intel Sandybridge

Copyright (C) 2015 Bryce Adelstein lelbach 41


Case Study: HPX CS Overhead
Context Switching Overhead (UNC), Intel Sandybridge

Copyright (C) 2015 Bryce Adelstein lelbach 42


Mean-Median Test

Normality test: Tests to determine if a data-set fits a normal distribution
well.
• There are graphical (QQ plot), informal/back-of-the-envelope and rigorous
  normality tests.

The mean (μ), median (m) and mode of normally distributed data should be the
same, so compute:

    |μ − m| / max(μ, m)

• This gives you the relative difference between the mean and median (a
  percentage represented as a decimal). If this is larger than 1%, your data
  is probably not normally distributed.
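A small sketch of this back-of-the-envelope check (my own helper, not the
HPX test code):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Relative mean-median difference as a quick normality check.
bool probably_normal(std::vector<double> samples)
{
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= samples.size();

    std::sort(samples.begin(), samples.end());
    std::size_t const n = samples.size();
    double const median = (n % 2) ? samples[n / 2]
                                  : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);

    double const rel = std::fabs(mean - median) / std::max(mean, median);
    return rel <= 0.01; // > 1% suggests the data is not normally distributed
}

int main()
{
    std::vector<double> data = {1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97};
    std::cout << (probably_normal(data) ? "probably normal\n"
                                        : "probably not normal\n");
}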
Copyright (C) 2015 Bryce Adelstein lelbach 43
Case Study: HPX CS Overhead
Context Switching Overhead (Mean-Median Test), Intel Sandybridge

Copyright (C) 2015 Bryce Adelstein lelbach 44


Case Study: HPX CS Overhead
Context Switching Overhead (Scatter), Intel Sandybridge

Copyright (C) 2015 Bryce Adelstein lelbach 45


Time-Based Benchmarking

46
Time-Based Benchmarking

We have access to a few different clocksources for benchmarking on modern
(x86) CPUs:
• System-wide high-resolution clock:
– Monotonic, frequency-stable, higher latency and overhead.
– Resolution is in nanoseconds.
– Times can be passed between threads.
– *nix, this is accessed via clock_gettime reading CLOCK_MONOTONIC.
– Windows, this is accessed via QueryPerformanceCounter/Frequency.
– Suitable for measuring most events (microseconds and up).

Copyright (C) 2015 Bryce Adelstein lelbach 47


Time-Based Benchmarking

We have access to three different clocksources for benchmarking on modern
(x86) CPUs:
• Timestamp Counter (TSC):
– Monotonic, lower latency and overhead.
– Resolution is in CPU cycles (with caveats), tick is in base clock cycles.
 All newer (4-5 year old) CPUs guarantee a constant TSC frequency, even if
the CPU frequency changes (e.g. frequency scaling, Intel Turbo mode).
 Constant TSC frequency == timing data is not representative of # of cycles
executed.
 Ticks with the base clock, which runs at 100 or 133 MHz (depending on the
microarchitecture).
– Assembly instruction(s) for reading this counter.
– Cycle counts are thread-specific.
– Suitable for measuring short events (cycles to minutes).
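A minimal sketch of reading the TSC via the compiler intrinsic (my own
example, not from the slides; GCC/Clang header shown, MSVC exposes the same
__rdtsc in <intrin.h>; serializing instructions such as cpuid/lfence are
omitted for brevity):

#include <cstdint>
#include <iostream>
#include <x86intrin.h> // __rdtsc on GCC/Clang

int main()
{
    std::uint64_t const start = __rdtsc();

    volatile double x = 0.0;       // volatile so the loop is not removed
    for (int i = 0; i < 1000; ++i)
        x = x + 1.0;

    std::uint64_t const stop = __rdtsc();

    // Elapsed ticks of the (constant-rate) TSC; with frequency scaling this
    // is not necessarily the number of core cycles executed.
    std::cout << "TSC ticks: " << (stop - start) << "\n";
}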

Copyright (C) 2015 Bryce Adelstein lelbach 48


<chrono>

Standard facilities for manipulating dates and times, introduced in C++11.
• Three types:
– Duration: A span of time, defined as some number of ticks of some
time unit.
– Time Point: A duration of time that has passed since the epoch of a
specific clock.
– Clocks: An object with a starting point and a tick rate, which can be
queried for the current time.

<chrono> is the best way to measure durations that are microsecond magnitude
or larger.

Source: cppreference.com

Copyright (C) 2015 Bryce Adelstein lelbach 49


<chrono>

Clock                   Description
system_clock            Wall clock time from the system-wide realtime clock.
steady_clock            Monotonic clock that will never be adjusted.
high_resolution_clock   The clock with the shortest tick period available.

Copyright (C) 2015 Bryce Adelstein lelbach 50


Example: high_resolution_timer

Time-Based Benchmarking

51
Example: high_resolution_timer
#include <chrono>
#include <cstdint>

struct high_resolution_timer
{
    high_resolution_timer() : start_time_(take_time_stamp()) {}

    void restart()
    { start_time_ = take_time_stamp(); }

    // Return elapsed time in seconds.
    double elapsed() const
    { return double(take_time_stamp() - start_time_) * 1e-9; }

    std::uint64_t elapsed_nanoseconds() const
    { return take_time_stamp() - start_time_; }

  protected:
    static std::uint64_t take_time_stamp()
    {
        return std::chrono::duration_cast<std::chrono::nanoseconds>
            (std::chrono::steady_clock::now().time_since_epoch()).count();
    }

  private:
    std::uint64_t start_time_;
};
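A brief usage sketch (my own, mirroring the amortization example from earlier
in the deck):

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::size_t const N = 1 << 20;
    std::vector<double> A(N, 1.0), B(N, 2.0), C(N, 3.0);

    high_resolution_timer t; // Start timing.

    for (std::size_t i = 0; i < N; ++i)
        A[i] = A[i] + B[i] * C[i];

    double const time_per_iteration = t.elapsed() / N;

    // Print A[0] so the loop cannot be optimized away.
    std::cout << A[0] << " " << time_per_iteration << " s/iteration\n";
}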

Copyright (C) 2015 Bryce Adelstein lelbach 52


Non-Time-Based Benchmarking

53
Memory Benchmarking

Approaches to instrumenting memory allocation:


• What do we want to look at?
– Objects (allocated/deallocated)
– Memory (total, per object size, per object type)
• External tools:
– googleperftools/TCMalloc (MALLOCSTATS)
– MemTrack
• Overload operator new/delete
– Writing a member operator new/delete is a great technique for tracking
memory performance for a specific object.
– I suggest a static member variable to store the performance data; if
you need thread safety, use thread-local storage and accumulate
afterwards.

Copyright (C) 2015 Bryce Adelstein lelbach 54


Example: Instrumenting operator new

Non-Time-Based Benchmarking

55
Example: Instrumenting operator new

#include <cstddef>
#include <new>

struct A {
    static std::size_t allocated; // number of A objects allocated so far

    static void* operator new(std::size_t sz)
    {
        allocated += sz / sizeof(A);
        return ::operator new(sz);
    }
    static void* operator new[](std::size_t sz)
    {
        allocated += sz / sizeof(A);
        return ::operator new(sz);
    }
};

std::size_t A::allocated = 0;
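A short usage sketch (my own): the counter can be inspected in a test after
exercising the code under measurement.

#include <iostream>

int main()
{
    A* one  = new A;
    A* many = new A[10];

    // Typically 11; array new may add a small allocation-size overhead that
    // skews the count slightly.
    std::cout << "A objects allocated: " << A::allocated << "\n";

    delete   one;
    delete[] many;
}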

Copyright (C) 2015 Bryce Adelstein lelbach 56


Counting Copies/Moves

When we started transitioning the HPX codebase to support move semantics a
few years ago, we wrote some tests to make sure we got it right (a sketch
follows below).
• We passed mock objects that count copies/moves through our
framework and looked at the results.
• Once we were confident our interfaces were doing things right
(minimizing the number of copies, etc), we wrote unit tests to
verify the move/copy counts wouldn’t change.
• Especially important for us – HPX is an asynchronous
programming framework, so there are places where we duplicate
data to facilitate asynchrony.
– We wanted to ensure we only copied async() arguments once.
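A minimal sketch of such a mock object (my own illustration, not the HPX test
code): pass it through the interface under test, then assert on the counters.

#include <cassert>
#include <cstddef>
#include <utility>

// Counts how often instances are copied and moved.
struct copy_move_counter
{
    static std::size_t copies;
    static std::size_t moves;

    copy_move_counter() = default;
    copy_move_counter(copy_move_counter const&) { ++copies; }
    copy_move_counter(copy_move_counter&&) noexcept { ++moves; }
    copy_move_counter& operator=(copy_move_counter const&) { ++copies; return *this; }
    copy_move_counter& operator=(copy_move_counter&&) noexcept { ++moves; return *this; }
};

std::size_t copy_move_counter::copies = 0;
std::size_t copy_move_counter::moves  = 0;

void sink(copy_move_counter) {} // stand-in for the interface under test

int main()
{
    copy_move_counter c;
    sink(std::move(c));
    assert(copy_move_counter::copies == 0); // regression test: no copies
    assert(copy_move_counter::moves  == 1);
}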

Copyright (C) 2015 Bryce Adelstein lelbach 57


Hardware Performance Counters

x86 processors have a diverse set of hardware performance counters.
• Pros:
– Low overhead.
– Very diverse and descriptive information.
• Cons:
– Microarchitecture specific.
– Some counters are estimations, or suffer from inaccuracies
(overcounting, etc).
– You need very specialized knowledge to use these for performance
analysis.
 Fortunately, there’s an awesome tool which has this knowledge baked into it.

Copyright (C) 2015 Bryce Adelstein lelbach 58


Hardware Performance Counters

Low-level frameworks for accessing hardware counters from within your code:
• Linux: PAPI framework
• Windows: Performance Counter framework
• Mac: kpc.h

There are some external sampling-based profiling tools that provide access to
this information.
• Ex: Intel VTune Amplifier.

Copyright (C) 2015 Bryce Adelstein lelbach 59


Performance Analysis Tools

60
Intel VTune Amplifier

Sampling-based profiling tool: runs your application and collects “snapshots”
of performance metrics while your program is running.
• Works on Intel processors, Windows/Linux/Mac OS X/Android,
not tied to any particular compiler.
• Requires no code changes to use.
• Multiple data sources: timers, hardware performance counters
and operating system metrics.
• Performance data can be viewed per function or at
assembly/source code granularity.
• Analyzes everything: kernel calls, sub-processes, threads, etc.

Copyright (C) 2015 Bryce Adelstein lelbach 61


Intel VTune Amplifier

Sampling-based profiling tool: runs your application and collects “snapshots”
of performance metrics while your program is running.
• Provides built-in analysis passes which derive useful, higher-
level performance metrics from micro-architecture specific raw
hardware counters.
• Also supports user-defined analysis passes.
• Support for instrumenting parallel and distributed code.
– Built-in support for OS-threading frameworks.
– Built-in support for OpenMP, MPI and Intel TBB.
– Provides an instrumentation API which parallel programming
frameworks can use to inform the profiler about their threading and
concurrency data structures.

Copyright (C) 2015 Bryce Adelstein lelbach 62


Intel VTune Amplifier

Sampling-based profiling tool: runs your application and collects “snapshots”
of performance metrics while your program is running.
• Powerful GUI.
– Standalone Windows/Linux/Mac GUI as well as integration with Visual
Studio and Eclipse.
– Data can be collected remotely via the command line interface and
then fed into the GUI.
– Great interface for filtering data (e.g. focusing in on just one section of
the program’s execution).
– Built-in analysis passes contain a lot of information about how to
interpret results.

Copyright (C) 2015 Bryce Adelstein lelbach 63


Intel VTune Amplifier

Copyright (C) 2015 Bryce Adelstein lelbach 64


Intel VTune Amplifier

Copyright (C) 2015 Bryce Adelstein lelbach 65


Intel VTune Amplifier

Copyright (C) 2015 Bryce Adelstein lelbach 66


Intel VTune Amplifier

Copyright (C) 2015 Bryce Adelstein lelbach 67


Intel Vectorization Adviser

New tool in Intel Parallel Studio XE 2016: Vectorization Adviser.
• Integrates the Intel compiler’s vectorization reports into the GUI
performance profiling framework.

Source: https://software.intel.com/en-us/intel-advisor-xe/

Copyright (C) 2015 Bryce Adelstein lelbach 68


Intel Vectorization Adviser

SSE4.1 SSE4.2 AVX AVX2

Source: https://software.intel.com/en-us/intel-advisor-xe/

Copyright (C) 2015 Bryce Adelstein lelbach 69


Write Performance Tests

70
Write Performance Tests

Idea: Let’s write unit and regression tests for performance, just like we do
for correctness.

Challenges:
• Implementing automated performance tests that follow the kind of best
  practices we’ve been talking about requires a lot of machinery.
• If your performance tests are stateful (rely on the results of
previous automated tests), you need even more machinery.
• You need more than just the machinery to run the tests - you
also need automated analysis to determine whether the test has
failed.
Copyright (C) 2015 Bryce Adelstein lelbach 71
Write Performance Tests

The debugging cycle: Analyze → Implement → Test.
– Analyze: Debuggers and tools are used to learn more about the problem.
– Implement: A potential fix is implemented. Prior fixes may be reverted.
– Test: The test is executed to see if the problem has been solved.

Copyright (C) 2015 Bryce Adelstein lelbach 72


Write Performance Tests

The optimization cycle: Analyze → Implement → Test.
– Analyze: Statistical data is analyzed to determine if the test passed.
– Implement: A potential fix is implemented. Prior fixes may be reverted.
– Test: The test is executed, and statistical data is recorded.

Copyright (C) 2015 Bryce Adelstein lelbach 73


Stateful Performance Tests

Stateful performance tests: performance benchmarks which yield results that
cannot be interpreted without contextual information.
• Output: absolute values.
• Most of your existing benchmarks are already stateful.
• To automate these tests, the current performance (e.g. of trunk) needs to
  be compared against some prior results. There are two options for doing
  this:
– Automated build system stores prior results for comparison. Requires
more machinery, but allows you to track performance over time.
– Automated build system checks out and builds an older version of the
code to compare against.

Copyright (C) 2015 Bryce Adelstein lelbach 74


Stateful Performance Tests

Copyright (C) 2015 Bryce Adelstein lelbach 75


Stateless Performance Tests

Stateless performance tests: performance benchmarks which test for a
performance “failure” and can provide a “yes/no” answer without external
data.
• Output: relative values.

The idea is to compare different implementation options which you believe to
have a performance impact (see the sketch after the examples below).
• Ex: Lockfree queue vs lock-based queue.
• Ex: Recomputing data locally vs overhead for sharing.
• Ex: Algorithmic complexity testing.
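A minimal sketch of a stateless test (my own illustration): time two
implementation options in the same run and assert on their ratio, so no
stored baseline is needed.

#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <map>
#include <unordered_map>

// Time a callable and return seconds elapsed.
template <typename F>
double time_it(F f)
{
    auto const start = std::chrono::steady_clock::now();
    f();
    auto const stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main()
{
    std::size_t const N = 100000;

    double const ordered = time_it([&] {
        std::map<int, int> m;
        for (std::size_t i = 0; i < N; ++i) m[int(i)] = int(i);
    });

    double const hashed = time_it([&] {
        std::unordered_map<int, int> m;
        for (std::size_t i = 0; i < N; ++i) m[int(i)] = int(i);
    });

    // Relative check: we expect the hash map to win for this workload.
    // (A real test would take many samples and compare them statistically.)
    if (!(hashed < ordered))
    {
        std::cerr << "performance regression: unordered_map slower than map\n";
        return EXIT_FAILURE;
    }
    std::cout << "ok: " << ordered / hashed << "x speedup\n";
}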

Copyright (C) 2015 Bryce Adelstein lelbach 76


Stateless Performance Tests

Whenever you face a design trade-off with performance implications, write a
stateless test!

Copyright (C) 2015 Bryce Adelstein lelbach 77


Summary

• Take a scientific approach to performance benchmarking.
  – Hypothesize, design a test, run the test, analyze, draw conclusions.
• Manage and test your assumptions about your tests.
• Collect a statistically significant quantity of data.
• Measure and propagate error.
• Develop unit and regression tests for performance.

Copyright (C) 2015 Bryce Adelstein lelbach 78


UNIVERSITY OF CALIFORNIA
