
Variance and Standard Deviation

This document discusses variance and standard deviation as measures of the spread of data around the mean. It provides formulas for calculating population variance (σ²) and standard deviation (σ) using the entire population, as well as sample variance (s²) and standard deviation (s) using a sample. The sample variance is calculated by summing the squared deviations from the mean and dividing by n − 1, rather than n, to provide an unbiased estimate of the population variance. Examples are provided to demonstrate calculating variance and standard deviation by hand.

Uploaded by Harsh kumar
Copyright © All Rights Reserved

Variance and Standard Deviation

We need a measure of the distribution or spread of data around an expected value (either x̄ or μ). Variance and standard deviation provide such measures.

Formulas and rationale for these measures are described in the next Procedure display. Then, examples and guided exercises show how to compute and interpret these measures.

As we will see later, the formulas for variance and standard deviation differ slightly, depending on whether we are using a sample or the entire population.

Measures of Variation: The Sample Variance
• Average (approximately) of squared deviations of values from the mean
• Sample variance:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Where X̄ = arithmetic mean
n = sample size
Xᵢ = ith value of the variable X

SAMPLE VARIANCE

• A shortcut formula for the sample variance:

$$S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

• Where S² is the sample variance
• n is the total number of values in the sample
• xᵢ is the value of the i-th observation.
• Σ represents a summation

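As a quick check that the defining formula and the shortcut formula agree, here is a minimal Python sketch. The function names are illustrative and not from the slides; the data are the advertising figures used in an exercise later in this deck, for which both formulas should give S² = 0.363.

```python
def sample_variance_defining(xs):
    """Defining formula: S^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Shortcut formula: S^2 = [sum(x^2) - (sum(x))^2 / n] / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Advertising data from the exercise in this deck
advertising = [2.5, 1.3, 1.4, 1.0, 2.0]
v1 = sample_variance_defining(advertising)
v2 = sample_variance_shortcut(advertising)
print(round(v1, 4), round(v2, 4))  # both approximately 0.363
```

The shortcut saves work by hand, but in floating-point code it can lose precision when the values are large relative to their spread, which is one reason numerical code often prefers the deviation form.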
Measures of Variation: The Sample Standard Deviation

• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
• Sample standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

Population Variance
• In practice, the population variance usually cannot be computed directly because the entire population is not ordinarily observed.
• An analogous measure of variability may be determined with sample data.
• This is referred to as the sample variance.

SAMPLE VARIANCE

• Notice that the sample variance is defined as the sum of the squared deviations divided by n − 1.
• Sample variance is computed to estimate the population variance.
• An unbiased estimate of the population variance may be obtained by defining the sample variance as the sum of the squared deviations divided by n − 1 rather than by n.
• Defining sample variance as the mean squared deviation from the sample mean (dividing by n) tends to underestimate the population variance.

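The underestimation claim can be checked by simulation. The sketch below draws many samples of size 5 from a hypothetical normal population with σ² = 4 and averages both versions of the variance; the population, sample size, trial count, and seed are arbitrary choices for illustration, not values from the slides.

```python
import random
import statistics

# Hypothetical setup: normal population with sigma = 2, so sigma^2 = 4
random.seed(0)
TRUE_VAR = 4.0
n, trials = 5, 20000

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    m = statistics.fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    sum_div_n += ss / n                     # "mean squared deviation" version
    sum_div_n_minus_1 += ss / (n - 1)       # sample variance as defined above

avg_div_n = sum_div_n / trials
avg_div_n_minus_1 = sum_div_n_minus_1 / trials
print(f"divide by n:     {avg_div_n:.3f}  (theory: {TRUE_VAR * (n - 1) / n})")
print(f"divide by n - 1: {avg_div_n_minus_1:.3f}  (theory: {TRUE_VAR})")
```

With these settings the n − 1 average should land close to 4, while the divide-by-n average lands close to 4 × 4/5 = 3.2, reflecting the bias factor (n − 1)/n.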
POPULATION/SAMPLE STANDARD DEVIATION

• Compute the sample standard deviation of advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0
• Compute the sample standard deviation of sales data: 264, 116, 165, 101 and 209

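One way to check answers to this exercise is Python's standard-library `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor described above. This is a verification sketch, not part of the original slides.

```python
import statistics

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]
sales = [264, 116, 165, 101, 209]

# statistics.variance / statistics.stdev divide by n - 1 (sample formulas)
s_adv = statistics.stdev(advertising)
s_sales = statistics.stdev(sales)
print(f"advertising: s^2 = {statistics.variance(advertising):.4f}, s = {s_adv:.4f}")
print(f"sales:       s^2 = {statistics.variance(sales):.1f}, s = {s_sales:.2f}")
```

This should give s² = 0.363 and s ≈ 0.60 for advertising (mean 1.64), and s² = 4513.5 and s ≈ 67.18 for sales (mean 171).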
The population mean

• The population mean is the sum of the values in the population divided by the population size, N:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

where μ = population mean
N = population size
Xᵢ = ith value of the variable X

Population Variance σ²

• Average of squared deviations of values from the mean
• Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where μ = population mean
N = population size
Xᵢ = ith value of the variable X

Population Standard Deviation σ

• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the population variance
• Has the same units as the original data
• Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

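To contrast the population formulas (divisor N) with the sample formulas (divisor n − 1), the sketch below uses `statistics.pvariance` and `statistics.pstdev` on a small hypothetical population of eight values that does not come from the slides.

```python
import statistics

# Hypothetical population of N = 8 values (not from the slides)
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.fmean(population)          # 40 / 8 = 5.0
sigma2 = statistics.pvariance(population)  # divides by N: 32 / 8 = 4
sigma = statistics.pstdev(population)      # sqrt(4) = 2.0

# For contrast: treating the same values as a sample divides by n - 1
s2 = statistics.variance(population)       # 32 / 7, about 4.571

print(mu, sigma2, sigma, round(s2, 3))
```

The sum of squared deviations (32) is the same in both cases; only the divisor changes.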
POPULATION / SAMPLE STANDARD DEVIATION

• The standard deviation is the positive square root of the variance:

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $S = \sqrt{S^2}$

Compute the standard deviations of advertising and sales.

Sample vs population parameters

Measure              Population (parameter)   Sample (statistic)
Mean                 μ                        X̄
Variance             σ²                       S²
Standard deviation   σ                        S

Variance and Standard Deviation
Procedure:

Variance and Standard Deviation
In statistics, the sample standard deviation and sample variance are used to describe the spread of data about the mean x̄.

The next example shows how to find these quantities by using the defining formulas.

As you will discover, for “hand” calculations, the computation formulas for s² and s are much easier to use.

Variance and Standard Deviation
However, the defining formulas for s² and s emphasize the fact that the variance and standard deviation are based on the differences between each data value and the mean.

Example – Sample Standard Deviation (Defining Formula)

Compute the variance and standard deviation of the calories.

Standard Deviation

Ex: The wholesale prices of a commodity for seven consecutive days in a month are as follows:

Days:                       1    2    3    4    5    6    7
Commodity price/quintal:  240  260  270  245  255  286  264

Calculate the variance and standard deviation.

Variance = 206 (the sum of squared deviations from the mean, 1442, divided by n = 7).

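The slide's answer of 206 corresponds to dividing the sum of squared deviations by n = 7 (the population form). The sketch below reproduces that step and, for comparison only, the n − 1 version; the variable names are illustrative.

```python
import math
import statistics

prices = [240, 260, 270, 245, 255, 286, 264]  # days 1..7

mean = statistics.fmean(prices)             # 1820 / 7 = 260.0
ss = sum((p - mean) ** 2 for p in prices)   # sum of squared deviations = 1442
var_n = ss / len(prices)                    # 1442 / 7 = 206 (the slide's answer)
sd_n = math.sqrt(var_n)                     # about 14.35

# If the seven days were treated as a sample instead, divide by n - 1
var_n_minus_1 = ss / (len(prices) - 1)      # 1442 / 6, about 240.33

print(mean, ss, var_n, round(sd_n, 2), round(var_n_minus_1, 2))
```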
Example – Sample Standard Deviation (Defining Formula)

Big Blossom Greenhouse was commissioned to develop an extra large rose for the Rose Bowl Parade.

A random sample of blossoms from Hybrid A bushes yielded the following diameters (in inches) for mature peak blooms.

2 3 3 8 10 10

Use the defining formula to find the sample variance and standard deviation.

Example – Solution
Several steps are involved in computing the variance and standard deviation. A table will be helpful (see Table).

Diameters of Rose Blossoms (in inches)

Since n = 6, we take the sum of the entries in Column I of the Table and divide by 6 to find the mean x̄.

Using this value for x̄, we obtain Column II. Square each value in Column II to obtain Column III, and then add the values in Column III.

To get the sample variance, divide the sum of Column III by n − 1. Since n = 6, n − 1 = 5.

Now obtain the sample standard deviation by taking the square root of the variance.
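The Table itself did not survive extraction. The sketch below rebuilds the three columns from the listed diameters, assuming (from the description of the steps) that Column II is x − x̄ and Column III is (x − x̄)².

```python
import math

diameters = [2, 3, 3, 8, 10, 10]   # Column I: x
n = len(diameters)
mean = sum(diameters) / n           # 36 / 6 = 6.0

print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
col3_total = 0.0
for x in diameters:
    dev = x - mean                  # Column II (assumed to be x - mean)
    sq = dev ** 2                   # Column III (assumed to be (x - mean)^2)
    col3_total += sq
    print(f"{x:>4} {dev:>9.1f} {sq:>13.1f}")

s2 = col3_total / (n - 1)           # 70 / 5 = 14.0
s = math.sqrt(s2)                   # about 3.74
print(f"sum of Column III = {col3_total}, s^2 = {s2}, s = {s:.2f}")
```

Under these assumptions the mean is 6 inches, the Column III total is 70, s² = 14, and s ≈ 3.74 inches.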

Variance and Standard Deviation
In most applications of statistics, we work with a random
sample of data rather than the entire population of all
possible data values.

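Variance and Standard Deviation
However, if we have data for the entire population, we can compute the population mean μ, population variance σ², and population standard deviation σ (lowercase Greek letter sigma) using the following formulas:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}, \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
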
Variance and Standard Deviation
We note that the formula for μ is the same as the formula for x̄ (the sample mean) and that the formulas for σ² and σ are the same as those for s² and s (sample variance and sample standard deviation), except that the population size N is used instead of n − 1.

Also, μ is used instead of x̄ in the formulas for σ² and σ.

In the formulas for s and σ, we use n − 1 to compute s and N to compute σ. Why?

The reason is that N (capital letter) represents the population size, whereas n (lowercase letter) represents the sample size.
Variance and Standard Deviation
Since a random sample usually will not contain extreme data values (large or small), we divide by n − 1 in the formula for s to make s a little larger than it would have been had we divided by n.

Courses in advanced theoretical statistics show that dividing by n − 1 makes s² an unbiased estimate of the population variance σ².

In fact, s² is called the unbiased estimate for σ² (s itself, being a square root, slightly underestimates σ on average). If we have the population of all data values, then extreme data values are, of course, present, so we divide by N instead of N − 1.

Variance and Standard Deviation
Comment
The computation formula for the population standard deviation is

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N}}$$

Standard Deviation

Example:

For a group of 50 male workers, the mean and standard deviation of their monthly wages are Rs. 6300 and Rs. 900, respectively. For a group of 40 female workers, these are Rs. 5400 and Rs. 600, respectively. Find the standard deviation of monthly wages for the combined group of workers.
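The combined standard deviation can be sketched with the usual pooled formula σ² = Σ nᵢ(σᵢ² + dᵢ²) / Σ nᵢ, where dᵢ is the difference between group i's mean and the combined mean. This sketch assumes each group's standard deviation was computed with divisor nᵢ (the population form), which is the convention that formula relies on. The helper name `combine_groups` is hypothetical; the same helper is also applied to the two-trial-set and X/Y organization examples that appear later in this deck.

```python
import math


def combine_groups(groups):
    """Combine (n, mean, sd) tuples into an overall (mean, sd).

    Pooled formula: sigma^2 = sum(n_i * (sd_i^2 + d_i^2)) / sum(n_i),
    with d_i = mean_i - combined_mean. Assumes each sd_i used divisor n_i.
    """
    total_n = sum(n for n, _, _ in groups)
    combined_mean = sum(n * m for n, m, _ in groups) / total_n
    combined_var = sum(
        n * (sd ** 2 + (m - combined_mean) ** 2) for n, m, sd in groups
    ) / total_n
    return combined_mean, math.sqrt(combined_var)


# Male and female workers (Rs.)
wages = combine_groups([(50, 6300, 900), (40, 5400, 600)])
# Two sets of trials (minutes), from a later example in this deck
trials = combine_groups([(32, 80, 16), (8, 100, 25)])
# Organizations X and Y: variances 100 and 841 correspond to sds 10 and 29
orgs = combine_groups([(550, 1260, 10), (600, 1348.5, 29)])

for label, (m, sd) in [("wages", wages), ("trials", trials), ("X+Y", orgs)]:
    print(f"{label}: mean = {m:.2f}, sd = {sd:.2f}")
```

For the wage example this gives a combined mean of Rs. 5900 and a combined standard deviation of Rs. 900. The trials combine to 84 minutes with a standard deviation of about 19.84 minutes, and organizations X and Y together give a mean of about Rs. 1306.17 with a standard deviation of about Rs. 49.41.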

Variance / Standard Deviation

For grouped data:

Standard Deviation

Calculate standard deviation from the following data:

Marks:            10  20  30  40  50  60  70
No. of students:   6   5  12   3   5   4   5

Marks (x)  Students (f)    fx   d = x − 37    d²     fd²
   10           6           60      −27       729    4374
   20           5          100      −17       289    1445
   30          12          360       −7        49     588
   40           3          120        3         9      27
   50           5          250       13       169     845
   60           4          240       23       529    2116
   70           5          350       33      1089    5445
  Sum          40         1480                       14840

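The worked table stops at Σfd² = 14840. The sketch below reproduces the table's sums and finishes the calculation under the assumption that the divisor is N = Σf (the population form commonly used for frequency distributions); that final step is not shown on the slide.

```python
import math

marks = [10, 20, 30, 40, 50, 60, 70]   # x
students = [6, 5, 12, 3, 5, 4, 5]      # f

N = sum(students)                                        # 40
mean = sum(f * x for f, x in zip(students, marks)) / N   # 1480 / 40 = 37.0
sum_fd2 = sum(f * (x - mean) ** 2 for f, x in zip(students, marks))  # 14840

# Assumption: divide by N (population form) for the frequency distribution
variance = sum_fd2 / N        # 371.0
sd = math.sqrt(variance)      # about 19.26
print(N, mean, sum_fd2, variance, round(sd, 2))
```

If the 40 students were instead treated as a sample, dividing by N − 1 = 39 would give a slightly larger variance of about 380.5.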
Standard Deviation

Compute the standard deviation from the following data.

Expenditure (Rs):   50–100  100–150  150–200  200–250  250–300
No. of families:        20       10       30        5       10

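For class intervals, each class is usually represented by its midpoint. The sketch below follows that convention and, as in the previous frequency-table example, assumes the divisor N = Σf; the solution slide for this exercise did not survive extraction, so these are computed values, not the slide's.

```python
import math

# Class intervals (Rs) and frequencies from the slide
classes = [(50, 100), (100, 150), (150, 200), (200, 250), (250, 300)]
families = [20, 10, 30, 5, 10]

# Represent each class by its midpoint: 75, 125, 175, 225, 275
mids = [(lo + hi) / 2 for lo, hi in classes]

N = sum(families)                                            # 75
mean = sum(f * m for f, m in zip(families, mids)) / N        # 11875 / 75
# Assumption: divisor N (population form)
variance = sum(f * (m - mean) ** 2 for f, m in zip(families, mids)) / N
sd = math.sqrt(variance)
print(f"N = {N}, mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```

Under these assumptions the mean expenditure is about Rs 158.33 and the standard deviation about Rs 64.98.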
Example – Sample Standard Deviation (Defining Formula)

Ex:

The mean of 5 observations is 15 and the variance is 9. If two more observations having values −3 and 10 are combined with these 5 observations, what will be the new mean and variance of the 7 observations?

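A standard approach is to recover Σx and Σx² from the given mean and variance, add the new observations, and recompute. The sketch below assumes the variance 9 was computed with divisor n (so Σx² = n(σ² + x̄²)) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

n, mean, var = 5, 15, 9
new_values = [-3, 10]

# Recover the raw sums (assumes the variance used divisor n)
sum_x = n * mean                         # 75
sum_x2 = n * (var + mean ** 2)           # 5 * (9 + 225) = 1170

# Add the two new observations
n2 = n + len(new_values)                 # 7
sum_x += sum(new_values)                 # 82
sum_x2 += sum(v * v for v in new_values) # 1170 + 9 + 100 = 1279

new_mean = Fraction(sum_x, n2)                   # 82/7
new_var = Fraction(sum_x2, n2) - new_mean ** 2   # 2229/49
print(new_mean, float(new_mean), new_var, float(new_var))
```

This gives a new mean of 82/7 ≈ 11.71 and a new variance of 2229/49 ≈ 45.49.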
Standard deviation
Ex: A study of the age of 100 persons grouped into intervals 20–22, 22–24, 24–26, ... revealed the mean age and standard deviation to be 32.02 and 13.18, respectively. While checking, it was discovered that the observation 57 was misread as 27. Calculate the correct mean age and standard deviation.

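The same "recover the raw sums" idea handles a misread observation: rebuild Σx and Σx², swap the wrong value for the right one, and recompute. This sketch assumes the reported standard deviation used divisor n.

```python
import math

n, mean, sd = 100, 32.02, 13.18
wrong, right = 27, 57

# Recover the raw sums (assumes sd was computed with divisor n)
sum_x = n * mean                      # 3202
sum_x2 = n * (sd ** 2 + mean ** 2)    # 119899.28

# Replace the misread value with the correct one
sum_x += right - wrong                # 3232
sum_x2 += right ** 2 - wrong ** 2     # + 3249 - 729

correct_mean = sum_x / n              # 32.32
correct_sd = math.sqrt(sum_x2 / n - correct_mean ** 2)
print(f"corrected mean = {correct_mean:.2f}, corrected sd = {correct_sd:.2f}")
```

Under this assumption the corrected mean is 32.32 years and the corrected standard deviation is about 13.40 years.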
Coefficient of Variation
A disadvantage of the standard deviation as a comparative measure of variation is that it depends on the units of measurement.

This means that it is difficult to use the standard deviation to compare measurements from different populations.

For this reason, statisticians have defined the coefficient of variation, which expresses the standard deviation as a percentage of the sample or population mean.

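Coefficient of Variation

$$CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)}, \qquad CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)}$$

Notice that the numerator and denominator in the definition of CV have the same units, so CV itself has no units of measurement.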
Coefficient of Variation
This gives us the advantage of being able to directly compare the variability of two different populations using the coefficient of variation.

The set of data for which the coefficient of variation is low is said to be more uniform (consistent) or more homogeneous (stable).

Example – Coefficient of Variation
The Trading Post on Grand Mesa is a small, family-run store in a remote part of Colorado. The Grand Mesa region contains many good fishing lakes, so the Trading Post sells spinners (a type of fishing lure).

The store has a very limited selection of spinners. In fact, the Trading Post has only eight different types of spinners for sale. The prices (in dollars) are

2.10 1.95 2.60 2.00 1.85 2.25 2.15 2.25

Since the Trading Post has only eight different kinds of spinners for sale, we consider the eight data values to be the population.
Example – Coefficient of Variation
(a) Use a calculator with appropriate statistics keys to verify that for the Trading Post data, μ ≈ $2.14 and σ ≈ $0.22.

Solution:
Since the computation formulas for x̄ and μ are identical, most calculators provide the value of x̄ only.

Use the output of this key for μ. The computation formulas for the sample standard deviation s and the population standard deviation σ are slightly different.

Be sure that you use the key for σ (sometimes designated as σn or σx).

Example – Coefficient of Variation
(b) Compute the CV of prices for the Trading Post and comment on the meaning of the result.

Solution:

Example – Solution
Interpretation
The coefficient of variation can be thought of as a measure of the spread of the data relative to the average of the data.

Since the Trading Post is very small, it carries a small selection of spinners that are all priced similarly.

The CV tells us that the standard deviation of the spinner prices is only 10.28% of the mean.

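A short sketch of parts (a) and (b), treating the eight prices as the population (divisor N = 8, via `statistics.pstdev`). It also shows that the 10.28% figure comes from the rounded values $0.22 and $2.14; the unrounded values give a slightly smaller CV.

```python
import statistics

# The eight spinner prices, treated as the population
prices = [2.10, 1.95, 2.60, 2.00, 1.85, 2.25, 2.15, 2.25]

mu = statistics.fmean(prices)
sigma = statistics.pstdev(prices)        # population SD, divisor N = 8
cv_exact = sigma / mu * 100
# Using the rounded values $0.22 and $2.14, as in the slide's answer
cv_rounded = round(sigma, 2) / round(mu, 2) * 100

print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"CV from unrounded values: {cv_exact:.2f}%")
print(f"CV from rounded values:   {cv_rounded:.2f}%")
```

Expect μ ≈ 2.1438 and σ ≈ 0.2171, a CV of about 10.13% from unrounded values and 10.28% from the rounded ones. Either way the conclusion, that prices vary little relative to their mean, is the same.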
Example
Ex: The weekly sales of two products A and B were recorded as given below:
Product A: 59 75 27 63 27 28 56
Product B: 150 200 125 310 330 250 225
Find out which of the two shows greater fluctuation in sales.

Solution: For comparing the fluctuation in sales of two products, we will prefer to calculate the coefficient of variation for both products.

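The slides compute the standard deviations with an assumed mean (A = 56 and A = 225); shifting the data by a constant does not change the spread, so direct computation gives the same result. The sketch below uses the population divisor by default (an assumption) and checks that the conclusion also holds with the n − 1 divisor. The helper name `cv_percent` is hypothetical.

```python
import statistics

product_a = [59, 75, 27, 63, 27, 28, 56]
product_b = [150, 200, 125, 310, 330, 250, 225]


def cv_percent(xs, population=True):
    """Coefficient of variation in percent; divisor N by default (assumption)."""
    sd = statistics.pstdev(xs) if population else statistics.stdev(xs)
    return sd / statistics.fmean(xs) * 100


cv_a = cv_percent(product_a)
cv_b = cv_percent(product_b)
print(f"CV(A) = {cv_a:.2f}%, CV(B) = {cv_b:.2f}%")
print("A fluctuates more" if cv_a > cv_b else "B fluctuates more")
```

With divisor N this gives CV(A) ≈ 38.86% and CV(B) ≈ 31.17%, matching the slide's conclusion that product A's sales fluctuate more.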
Example
Product A: Let A = 56 be the assumed mean of sales for
product A

Example

Product B: Let A = 225 be the assumed mean of sales for product B

Example

Since the coefficient of variation for product A is more than that of product B, the sales fluctuation in case of product A is higher.

Example
From the analysis of monthly wages paid to employees in two service organizations X and Y, the following results were obtained:

(a) Which organization pays a larger amount as monthly wages?
(b) In which organization, X or Y, is there greater variability in individual wages?
(c) What are the measures of (i) average monthly wages and (ii) standard deviation in the distribution of individual wages of all workers in the two organizations taken together?
Solution: (a) To find out which organization, X or Y, pays the larger amount of monthly wages, we have to compare the total wages:
Example
(a)

Organization Y pays a larger amount as monthly wages as compared to organization X.

(b)

Since CV for X is greater than CV for Y, organization X has greater variability in individual wages.

Example
(c)

Example – Solution
From the analysis of monthly wages paid to workers in two organizations X and Y, the following results were obtained:

Obtain the average wages and the variability in individual wages of all the workers in the two organizations taken together.
Solution:
Example
The number of employees, average daily wages per employee, and the variance of daily wages per employee for two factories are given below:

(a) In which factory is there greater variation in the distribution of daily wages per employee?
(b) Suppose in Factory B the wages of an employee were wrongly noted as Rs. 120 instead of Rs. 100. What would be the correct variance for Factory B?

Example
32 trials of a process to finish a certain job revealed the
following information:
Mean time taken to complete the job = 80 minutes
Standard deviation = 16 minutes
Another set of 8 trials gave a mean time of 100 minutes and a standard deviation of 25 minutes.
Find the combined mean and standard deviation.

Solution:
Example
An analysis of production rejects resulted in the following observations:

Calculate the mean and standard deviation.

Solution:

Example
From the analysis of monthly wages paid to employees in two service organizations X and Y, the following results were obtained:

(a) Which organization pays a larger amount as monthly wages?
(b) In which organization is there greater variability in individual wages of all the wage earners taken together?

Example
From the analysis of monthly wages paid to workers in two organizations X and Y, the following results were obtained:

                                              X         Y
Number of wage-earners                      550       600
Average monthly wages (Rs.)                1260    1348.5
Variance of distribution of wages (Rs.²)    100       841

Obtain the average wages and the variability in individual wages of all the workers in the two organizations taken together.

Example
The following set of data is from a sample of n = 5:
7 4 9 8 2
a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and coefficient of variation.

The following set of data is from a sample of n = 5:
7 −5 −8 7 9
a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and coefficient of variation.
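Answers to these two exercises can be checked with the sketch below. Because the data are described as samples, it uses the n − 1 divisor. The helper `describe` is hypothetical; it reports "no mode" (an empty list) when every value occurs once, rather than using `statistics.mode`, which in recent Python versions returns the first value in that case.

```python
import statistics
from collections import Counter


def describe(xs):
    """Summary measures for a sample (variance and sd use divisor n - 1)."""
    counts = Counter(xs)
    top = max(counts.values())
    # No mode when every value occurs exactly once
    modes = [] if top == 1 else sorted(v for v, c in counts.items() if c == top)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return {
        "mean": mean,
        "median": statistics.median(xs),
        "mode": modes,
        "range": max(xs) - min(xs),
        "variance": statistics.variance(xs),
        "sd": s,
        "cv_percent": s / mean * 100,
    }


first = describe([7, 4, 9, 8, 2])
second = describe([7, -5, -8, 7, 9])
print(first)
print(second)
```

For the first set this gives mean 6, median 7, no mode, range 7, s² = 8.5, s ≈ 2.92 and CV ≈ 48.6%. For the second set: mean 2, median 7, mode 7, range 17, s² = 62, s ≈ 7.87 and CV ≈ 394%. That last CV is hard to interpret, since the data include negative values and the mean is small relative to the spread.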
