Welford’s method for computing variance – The Mindful Programmer
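The definitions of the sample variance and standard deviation,

$$
s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}, \qquad s = \sqrt{s^2},
$$

where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ is the sample mean, can be
converted directly into an algorithm that computes the variance and standard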
deviation in two passes: compute the mean in one pass over the data, and then
do a second pass to compute the squared differences from the mean. Doing
two passes is not ideal though, and it can be impractical in some settings. For
example, if the samples are generated by a random simulation it may be
prohibitively expensive to store samples just so you can do a second pass over
them.
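A minimal Python sketch of the two-pass approach, computed straight from the definition. The name `two_pass_variance` is our own, and the samples are assumed to be a sequence (such as a list) that can be traversed twice:

```python
import math


def two_pass_variance(samples):
    """Sample variance computed straight from the definition, in two passes."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # First pass: the mean.
    mean = sum(samples) / n
    # Second pass: the sum of squared differences from the mean.
    ss = sum((x - mean) ** 2 for x in samples)
    return ss / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
var = two_pass_variance(data)  # mean is 5, squared differences sum to 32, so 32/7
std = math.sqrt(var)
```

The second pass is the reason the samples have to be kept in memory.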
An easy computation gives us the following identity, which suggests a method for
computing the variance in a single pass by simply accumulating the sums of
$x_i$ and $x_i^2$:

$$
s^2 = \frac{\sum_{i=1}^{N} (x_i^2 - 2\bar{x}x_i + \bar{x}^2)}{N - 1}
    = \frac{\sum_{i=1}^{N} x_i^2 - 2N\bar{x}^2 + N\bar{x}^2}{N - 1}
    = \frac{\sum_{i=1}^{N} x_i^2 - N\bar{x}^2}{N - 1}
$$
https://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/ 1/3
def variance(samples):
    # Naive one-pass formula: accumulate the sum and the sum of squares.
    N = len(samples)
    total = 0.0
    sumsq = 0.0
    for x in samples:
        total += x
        sumsq += x**2
    mean = total / N
    return (sumsq - N*mean**2) / (N - 1)
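The identity is exact algebra. One way to see that any trouble must come from floating-point arithmetic, not from the derivation, is to evaluate both sides with Python's `fractions.Fraction`, which does exact rational arithmetic. The helper names `definition_ss` and `shortcut_ss` are hypothetical, chosen for this sketch:

```python
from fractions import Fraction


def definition_ss(xs):
    """Sum of squared differences from the mean, straight from the definition."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs)


def shortcut_ss(xs):
    """Sum of squares minus N times the squared mean (the one-pass identity)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) - n * mean * mean


# Exact rational inputs, so no rounding happens anywhere.
xs = [Fraction(10**9 + d) for d in (4, 7, 13, 16)]
exact_def = definition_ss(xs)
exact_shortcut = shortcut_ss(xs)
print(exact_def, exact_shortcut)  # both numerators of s**2 are exactly 90
```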
Now that you’ve seen it, never use this method to compute variance.
This is one of those cases where a mathematically simple approach turns out
to give wrong results for being numerically unstable. In simple cases the
algorithm will seem to work fine, but eventually you will find a dataset that
exposes the problem with the algorithm. If the variance is small compared to
the square of the mean, computing the difference leads to catastrophic
cancellation, where significant leading digits are eliminated and the result has
a large relative error. In fact, you may even compute a negative variance,
which is mathematically impossible.
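A small experiment makes the failure concrete. The data below are hypothetical: four samples with a mean around $10^9$ and a true sample variance of exactly 30. The naive formula has to subtract two sums near $4 \cdot 10^{18}$, where adjacent doubles are 512 apart, so the true difference of 90 cannot survive. The function names are our own:

```python
def naive_variance(samples):
    """One-pass formula from sum(x) and sum(x**2); numerically unstable."""
    n = len(samples)
    total = 0.0
    sumsq = 0.0
    for x in samples:
        total += x
        sumsq += x * x
    mean = total / n
    return (sumsq - n * mean * mean) / (n - 1)


def two_pass_variance(samples):
    """Reference: mean first, then squared differences from the mean."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / (n - 1)


# Hypothetical data: 4, 7, 13, 16 shifted by 1e9 (true sample variance: 30).
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("two-pass:", two_pass_variance(data))  # 30.0
print("naive:   ", naive_variance(data))     # not 30: the relevant digits were rounded away
```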
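Welford’s method sidesteps the subtraction of two large, nearly equal sums by
updating the mean and the sum of squared differences incrementally, one sample
at a time. Write $\bar{x}_N$ and $s_N^2$ for the mean and variance of the first
$N$ samples. Updating the mean is straightforward:
$\bar{x}_N = \bar{x}_{N-1} + (x_N - \bar{x}_{N-1})/N$. For the sum of squared
differences, look at how it changes when the $N$-th sample is added:

$$
\begin{aligned}
(N-1)s_N^2 - (N-2)s_{N-1}^2
  &= \sum_{i=1}^{N} (x_i - \bar{x}_N)^2 - \sum_{i=1}^{N-1} (x_i - \bar{x}_{N-1})^2 \\
  &= (x_N - \bar{x}_N)^2 + \sum_{i=1}^{N-1} \left( (x_i - \bar{x}_N)^2 - (x_i - \bar{x}_{N-1})^2 \right) \\
  &= (x_N - \bar{x}_N)^2 + \sum_{i=1}^{N-1} (x_i - \bar{x}_N + x_i - \bar{x}_{N-1})(\bar{x}_{N-1} - \bar{x}_N) \\
  &= (x_N - \bar{x}_N)^2 + (\bar{x}_N - x_N)(\bar{x}_{N-1} - \bar{x}_N) \\
  &= (x_N - \bar{x}_N)(x_N - \bar{x}_{N-1})
\end{aligned}
$$

The step that removes the remaining sum uses
$\sum_{i=1}^{N-1} x_i = (N-1)\bar{x}_{N-1} = N\bar{x}_N - x_N$, and the last step
factors out $x_N - \bar{x}_N$.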
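The final line of the derivation is easy to check numerically. The sketch below compares $(N-1)s_N^2 - (N-2)s_{N-1}^2$ with $(x_N - \bar{x}_N)(x_N - \bar{x}_{N-1})$ for every prefix of some random data. It uses exact rational arithmetic, so any disagreement would indicate an algebra mistake rather than rounding. The helper names `sum_sq_dev` and `update_term` are hypothetical:

```python
import random
from fractions import Fraction


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """(len(xs) - 1) * s**2, i.e. the sum of squared differences from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def update_term(xs):
    """(x_N - mean_N) * (x_N - mean_{N-1}) for the last sample x_N of xs."""
    x_n = xs[-1]
    return (x_n - mean(xs)) * (x_n - mean(xs[:-1]))


rng = random.Random(0)
xs = [Fraction(rng.randint(-1000, 1000), rng.randint(1, 50)) for _ in range(20)]
for n in range(2, len(xs) + 1):
    lhs = sum_sq_dev(xs[:n]) - sum_sq_dev(xs[:n - 1])
    assert lhs == update_term(xs[:n])
print("identity holds for every prefix")
```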
This means we can compute the variance in a single pass using the following
algorithm:
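def variance(samples):
    # Welford's method: update the mean and the sum of squared differences together.
    N = len(samples)
    M = 0.0  # running mean of the first k samples
    S = 0.0  # running (k-1) * variance of the first k samples
    for k, x in enumerate(samples, start=1):
        oldM = M
        M = M + (x - M) / k
        S = S + (x - M) * (x - oldM)
    return S / (N - 1)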
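Welford's update needs only the count, the running mean, and S. That means it also works when samples arrive one at a time and are never stored, as in the random-simulation setting mentioned at the start. Here is a minimal sketch of such an accumulator as a Python class. The class and method names (`RunningStats`, `push`, `variance`, `stddev`) are hypothetical:

```python
import math


class RunningStats:
    """Welford-style running mean and sample variance; samples are not stored."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.s = 0.0  # sum of squared differences from the current mean

    def push(self, x):
        self.count += 1
        old_mean = self.mean
        self.mean += (x - old_mean) / self.count
        self.s += (x - self.mean) * (x - old_mean)

    def variance(self):
        if self.count < 2:
            raise ValueError("need at least two samples")
        return self.s / (self.count - 1)

    def stddev(self):
        return math.sqrt(self.variance())


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.push(x)
```

Calling `push` once per simulated value keeps memory use constant, no matter how many samples are drawn.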
To check that this method really does work better than the one derived earlier,
you can run an experiment or analyze it theoretically. John D. Cook found the
accuracy of this method comparable to that of the two-pass method derived
directly from the definition
<http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/>,
while the results from the earlier one-pass algorithm were, as expected,
useless.
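A rough version of such an experiment takes only a few lines. It is not a reproduction of Cook's experiment. The data are hypothetical: the values 4, 7, 13, 16 shifted by increasingly large offsets, so the true sample variance is 30 at every offset. The sketch prints what each of the three methods returns:

```python
def two_pass(samples):
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / (n - 1)


def naive_one_pass(samples):
    n = len(samples)
    total = sumsq = 0.0
    for x in samples:
        total += x
        sumsq += x * x
    mean = total / n
    return (sumsq - n * mean * mean) / (n - 1)


def welford(samples):
    mean = s = 0.0
    for k, x in enumerate(samples, start=1):
        old_mean = mean
        mean += (x - old_mean) / k
        s += (x - mean) * (x - old_mean)
    return s / (len(samples) - 1)


results = {}
for offset in (1e3, 1e6, 1e9, 1e12):
    data = [offset + d for d in (4.0, 7.0, 13.0, 16.0)]
    results[offset] = (two_pass(data), naive_one_pass(data), welford(data))
    tp, nv, wf = results[offset]
    print(f"offset {offset:.0e}: two-pass {tp:.6g}, naive {nv:.6g}, Welford {wf:.6g}")
```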