
4/2/24, 23:14 Welford’s method for computing variance – The Mindful Programmer

The Mindful Programmer <https://jonisalonen.com/>: Articles on computing, mathematics, art, and anything in between

Welford’s method for computing variance


Joni <https://jonisalonen.com/author/joni/>
2013/11/11 <https://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/>

The standard deviation <http://en.wikipedia.org/wiki/Standard_deviation> is a measure of
how much a dataset differs from its mean; it tells us how dispersed the data
are. A dataset that’s pretty much clumped around a single point would have a
small standard deviation, while a dataset that’s all over the map would have a
large standard deviation.

Given a sample $x_1, \dots, x_N$, the standard deviation is defined as the square root
of the variance:

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}, \qquad s = \sqrt{s^2}$$

Here $\bar{x}$ is the mean of the sample: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$. The definition can be
converted directly into an algorithm that computes the variance and standard
deviation in two passes: compute the mean in one pass over the data, and then
do a second pass to compute the squared differences from the mean. Doing
two passes is not ideal though, and it can be impractical in some settings. For
example, if the samples are generated by a random simulation it may be
prohibitively expensive to store samples just so you can do a second pass over
them.
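
As a point of reference, the two-pass approach described above might be sketched in Python as follows. The function name `two_pass_variance` and the sample data are illustrative assumptions, not part of the original article, and the sketch assumes the samples are already stored in a list.

```python
import math


def two_pass_variance(samples):
    """Sample variance computed directly from the definition (two passes)."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    # First pass: compute the mean.
    mean = sum(samples) / n
    # Second pass: accumulate squared differences from the mean.
    return sum((x - mean) ** 2 for x in samples) / (n - 1)


# Hypothetical example data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
variance = two_pass_variance(data)
print("variance:", variance)
print("standard deviation:", math.sqrt(variance))
```

Note that the function needs `len(samples)` and iterates twice, which is exactly why it cannot be fed from a stream of values that are not stored.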

An easy computation gives us the following identity, which suggests a method for
computing the variance in a single pass by simply accumulating the sums of
$x_i$ and $x_i^2$:

$$s^2 = \frac{\sum_{i=1}^{N} (x_i^2 - 2\bar{x}x_i + \bar{x}^2)}{N - 1} = \frac{\sum x_i^2 - 2N\bar{x}^2 + N\bar{x}^2}{N - 1} = \frac{\sum x_i^2 - N\bar{x}^2}{N - 1}$$

Pseudocode for a one-pass variance computation could then look like:


variance(samples):
    # N is the number of samples
    sum := 0
    sumsq := 0
    for x in samples:
        sum := sum + x
        sumsq := sumsq + x**2
    mean := sum/N
    return (sumsq - N*mean**2)/(N-1)

Now that you’ve seen it, do not use this method to compute variance ever.
This is one of those cases where a mathematically simple approach turns out
to give wrong results because it is numerically unstable. In simple cases the
algorithm will seem to work fine, but eventually you will find a dataset that
exposes the problem with the algorithm. If the variance is small compared to
the square of the mean, computing the difference leads to catastrophic
cancellation, where significant leading digits are eliminated and the result has
a large relative error. In fact, you may even compute a negative variance,
which is mathematically impossible.
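
The cancellation problem can be reproduced with a small experiment. The following Python sketch is an illustration under assumptions: the function names and the dataset (four values near 10^9 whose exact sample variance is 30) are chosen here and do not come from the original article.

```python
def naive_variance(samples):
    """One-pass sum-of-squares formula; numerically unstable."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in samples:
        n += 1
        total += x
        total_sq += x ** 2
    mean = total / n
    return (total_sq - n * mean ** 2) / (n - 1)


def two_pass_variance(samples):
    """Reference implementation taken directly from the definition."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / (n - 1)


# Small variance, huge mean: the squares are around 1e18, where
# double-precision floats can no longer resolve differences of size ~100.
shifted = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("two-pass:", two_pass_variance(shifted))  # exact answer is 30
print("naive:   ", naive_variance(shifted))     # far from 30
```

With these inputs the two accumulated quantities `total_sq` and `n * mean ** 2` agree in nearly all of their significant digits, so their difference consists mostly of rounding error.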

Welford’s method is a usable single-pass method for computing the
variance. It can be derived by looking at the differences between the sums of
squared differences for N and N-1 samples. It’s really surprising how simple
the difference turns out to be:
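$$
\begin{aligned}
&(N - 1)s_N^2 - (N - 2)s_{N-1}^2 \\
&= \sum_{i=1}^{N} (x_i - \bar{x}_N)^2 - \sum_{i=1}^{N-1} (x_i - \bar{x}_{N-1})^2 \\
&= (x_N - \bar{x}_N)^2 + \sum_{i=1}^{N-1} \left( (x_i - \bar{x}_N)^2 - (x_i - \bar{x}_{N-1})^2 \right) \\
&= (x_N - \bar{x}_N)^2 + \sum_{i=1}^{N-1} (x_i - \bar{x}_N + x_i - \bar{x}_{N-1})(\bar{x}_{N-1} - \bar{x}_N) \\
&= (x_N - \bar{x}_N)^2 + (\bar{x}_N - x_N)(\bar{x}_{N-1} - \bar{x}_N) \\
&= (x_N - \bar{x}_N)(x_N - \bar{x}_N - \bar{x}_{N-1} + \bar{x}_N) \\
&= (x_N - \bar{x}_N)(x_N - \bar{x}_{N-1})
\end{aligned}
$$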
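
The identity just derived can be spot-checked with exact rational arithmetic, so that rounding plays no role. This is only a sanity check on hypothetical data, not a proof; the helper name `sum_sq_dev` and the sample values are assumptions made for illustration.

```python
from fractions import Fraction


def sum_sq_dev(xs):
    """Sum of squared differences from the mean, i.e. (N - 1) * s_N^2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Hypothetical sample values, stored as exact fractions.
xs = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]

identity_holds = True
for n in range(2, len(xs) + 1):
    mean_n = sum(xs[:n]) / n                 # mean of the first n samples
    mean_prev = sum(xs[:n - 1]) / (n - 1)    # mean of the first n - 1 samples
    lhs = sum_sq_dev(xs[:n]) - sum_sq_dev(xs[:n - 1])
    rhs = (xs[n - 1] - mean_n) * (xs[n - 1] - mean_prev)
    identity_holds = identity_holds and (lhs == rhs)

print("identity holds for every prefix:", identity_holds)
```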

This means we can compute the variance in a single pass using the following
algorithm:


variance(samples):
    # M is the running mean, S the running sum of squared differences
    M := 0
    S := 0
    for k from 1 to N:
        x := samples[k]
        oldM := M
        M := M + (x-M)/k
        S := S + (x-M)*(x-oldM)
    return S/(N-1)
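
A possible Python rendering of the pseudocode above is sketched below. It assumes the samples arrive as any iterable, so the count is tracked inside the loop instead of being known up front. The names `welford_variance` and `simulated_samples`, and the simulation parameters, are hypothetical and used only for illustration.

```python
import random


def welford_variance(samples):
    """Single-pass sample variance using Welford's update."""
    count = 0
    mean = 0.0
    sum_sq = 0.0  # running sum of squared differences from the current mean
    for x in samples:
        count += 1
        old_mean = mean
        mean += (x - mean) / count
        sum_sq += (x - mean) * (x - old_mean)
    if count < 2:
        raise ValueError("at least two samples are required")
    return sum_sq / (count - 1)


def simulated_samples(n, seed=0):
    """Hypothetical simulation: yields values one at a time, storing none."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(1e9, 1.0)


# The generator is consumed exactly once; no second pass is needed.
print(welford_variance(simulated_samples(10_000)))
```

Because the function only iterates once and keeps three numbers of state, it works on generators and streams where a two-pass method would require storing every sample.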

To see that this method does work better than the one derived earlier, you can
run an experiment or analyze it theoretically. John D. Cook found the
accuracy of this method comparable to the accuracy of the two-pass
method <http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/>
derived directly from the definition, while the results from the previous
one-pass algorithm were found to be useless, as expected.
