measures of variability
Prof. S. P. Bansal
Principal Investigator, Vice Chancellor
Maharaja Agrasen University, Baddi
Prof. Yoginder Verma
Co-Principal Investigator, Pro-Vice Chancellor
Central University of Himachal Pradesh, Kangra, H.P.
Module Title
Measures of Dispersion: Mean Absolute Deviation, Standard Deviation,
Variance, Coefficient of Variation
Module Id 7
Objectives Introduction
Range
Mean Absolute Deviation
Computation of Mean Deviation
Characteristics of mean deviation
Uses of mean deviation
Standard Deviation
Computation of Standard Deviation
Characteristics of Standard Deviation
Uses of Standard Deviation
Quartile deviation or Semi Inter-Quartile range
Variance
Relative measures of dispersion
Coefficient of dispersion
Coefficient of variation
Standard error
Expression for the standard error of mean
Probable error
Summary
Self-check exercise with solutions
Keywords Range, Mean Deviation, Standard Deviation, Variance, Quartile Deviation,
Semi Inter Quartile Range, Standard Error, Probable Error
Module 7: Measures of Dispersion: Mean Absolute Deviation, Standard Deviation, Variance,
Coefficient of Variation
Introduction
Range: Definition, computation of range, merits and demerits of range estimation
Mean Absolute Deviation: Definition, computation of mean deviation, characteristics of mean deviation,
uses of mean deviation
Standard Deviation: Definition, computation of standard deviation, characteristics of standard deviation,
uses of standard deviation, Quartile Deviation
Variance: Definition, computation of variance
Relative measures of dispersion: Coefficient of dispersion, Coefficient of variation
Standard Error: Definition, Expression for the standard error of mean
Probable Error
Summary
Learning Objectives:
Range
Mean Absolute Deviation
Computation of Mean Deviation
Characteristics of mean deviation
Uses of mean deviation
Standard Deviation
Computation of Standard Deviation
Characteristics of Standard Deviation
Uses of Standard Deviation
Quartile deviation or Semi Inter Quartile range
Variance
Relative measures of dispersion
Coefficient of dispersion
Coefficient of variation
Standard error
Expression for the standard error of mean
Probable error
1. Introduction
Any measure of central tendency, or average, has its own limitations. It gives us an idea
only of the central value around which the observations tend to lie, but it gives no idea
of the way in which they are distributed. There can be a number of series that have the
same mean but differ from one another in the pattern in which their observations are
distributed. To illustrate this point, consider the following series.
Series A 9 9 9 9 9 9 9
Series B 6 7 8 9 10 11 12
Series C 1 2 4 5 11 13 27
Series D 3 15
In the above series, the arithmetic mean of every series is 9, but the pattern in which the
observations are distributed differs from series to series. In series A the mean is 9 and
all the observations are the same. In series B the mean is also 9, and the observations
range from 6 to 12, so they are scattered but not widely. In series C the mean is again 9,
but the observations are widely scattered, ranging from 1 to 27. In series D there are only
two observations, and their mean is 9.
From the above example it is clear that, to understand a series, a study of the extent of
scattering of the observations, that is, of their dispersion, is essential along with the
study of central tendency. The following measures of dispersion are in common use.
2. Range
2.1. Definition:
The range is the simplest measure of dispersion. It is the difference between the highest
and lowest terms of a series of observations.
2.2. Computation of Range:
$$\text{Range} = X_H - X_L$$
where $X_H$ = highest variate value
and $X_L$ = lowest variate value
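A minimal Python sketch of the range computation, applied to Series A to D from the introduction. It prints each series' arithmetic mean (9 in every case) next to its range, which is what tells the series apart. The helper name `value_range` is illustrative only, chosen to avoid shadowing Python's built-in `range`.

```python
from statistics import fmean

# Series A to D from the introduction: same mean, different scatter.
series = {
    "A": [9, 9, 9, 9, 9, 9, 9],
    "B": [6, 7, 8, 9, 10, 11, 12],
    "C": [1, 2, 4, 5, 11, 13, 27],
    "D": [3, 15],
}


def value_range(values):
    """Range = highest variate value (X_H) minus lowest variate value (X_L)."""
    return max(values) - min(values)


for name, values in series.items():
    # Print series name, arithmetic mean and range side by side.
    print(name, fmean(values), value_range(values))
```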
3. Mean Deviation
3.1. Definition:
If the deviations of all the observations from their mean are calculated, their algebraic
sum is zero. Because this sum is always zero, averaging the signed deviations tells us
nothing about the spread. To overcome this difficulty, the deviations are added
irrespective of their plus or minus signs, and then the average is calculated. The
deviations taken without sign are known as absolute deviations, and the mean of these
absolute deviations is called the mean deviation. If the deviations are calculated from the
mean, the measure is called the mean deviation about the mean. In fact, the mean
deviation can be calculated about any average (for example, the median); in that case the
absolute deviations are taken from that average.
3.2. Computation of mean deviation:
$$\text{Mean deviation about the mean} = \frac{1}{N}\sum |x| = \frac{1}{N}\sum |X - \bar{X}|$$
where $x = X - \bar{X}$ = deviation from the mean,
$|x|$ = absolute deviation,
$N$ = number of observations.
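The following Python sketch checks, on Series C from the introduction, that the signed deviations from the mean cancel out, and then computes the mean deviation about the mean and, since any average may be used, about the median. The helper `mean_deviation` is a hypothetical name used only for illustration.

```python
from statistics import fmean, median


def mean_deviation(values, centre):
    """Average of the absolute deviations |X - centre| (ungrouped data)."""
    return sum(abs(x - centre) for x in values) / len(values)


series_c = [1, 2, 4, 5, 11, 13, 27]
mean_c = fmean(series_c)                       # arithmetic mean = 9.0

signed_sum = sum(x - mean_c for x in series_c)
print(signed_sum)                              # 0.0: signed deviations cancel

md_about_mean = mean_deviation(series_c, mean_c)
md_about_median = mean_deviation(series_c, median(series_c))
print(md_about_mean, md_about_median)
```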
Example:
Classes   Frequency (f)   Mid-value (X)   x = X − X̄   fx     f|x|
0-10      1               5               -22         -22    22
10-20     3               15              -12         -36    36
20-30     5               25              -2          -10    10
30-40     4               35              +8          +32    32
40-50     2               45              +18         +36    36
Total     15              -               -           0      136
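Calculations:
Here $\bar{X} = \sum fX / N = 405/15 = 27$, which gives the deviations $x$ in the table.
$$\text{Mean deviation about the mean} = \frac{1}{N}\sum f|x| = \frac{1}{15} \times 136 = 9.07$$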
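The same grouped calculation as a short Python sketch, using the class mid-values and frequencies from the table above. The function names are hypothetical; as in the table, each class is represented by its mid-value.

```python
# Class mid-values (X) and frequencies (f) from the example table.
mid_values = [5, 15, 25, 35, 45]
freqs = [1, 3, 5, 4, 2]


def grouped_mean(xs, fs):
    """Arithmetic mean of a frequency distribution: sum(fX) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def grouped_mean_deviation(xs, fs):
    """Mean deviation about the mean: (1/N) * sum f|X - mean|, N = sum(f)."""
    n = sum(fs)
    m = grouped_mean(xs, fs)
    return sum(f * abs(x - m) for x, f in zip(xs, fs)) / n


print(grouped_mean(mid_values, freqs))                       # 27.0
print(round(grouped_mean_deviation(mid_values, freqs), 2))   # 9.07
```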
4. Standard Deviation
4.1. Definition:
Calculation of the standard deviation is also based on the deviations from the arithmetic
mean. In the case of the mean deviation, the difficulty that the deviations from the
arithmetic mean always sum to zero is overcome by ignoring their plus or minus signs.
Here, it is overcome instead by squaring the deviations, averaging the squares, and taking
the square root of that average. The standard deviation is thus defined by the following
expression:
$$\text{S.D.} = \sqrt{\frac{\sum (X-\mu)^2}{N}} \qquad \ldots (1)$$
where $X$ = an observation or variate value,
$\mu$ = arithmetic mean of the population,
$N$ = number of given observations.
According to expression (1), the population mean µ is required for finding the standard
deviation (S.D.) of a given set of observations. Generally, µ is not known. It is therefore
replaced by X̄, the mean of the given set of observations, and the S.D. of the given data is
then given by
$$\text{S.D.} = \sqrt{\frac{\sum (X-\bar{X})^2}{N}} \qquad \ldots (2)$$
It should be noted that formula (2) gives the S.D. of the given set of data, which is itself
treated as the population with µ = X̄. We shall therefore call this S.D. the 'population S.D.'
Thus,
$$\text{Population S.D.}\ (\sigma) = \sqrt{\frac{\sum (X-\bar{X})^2}{N}} \qquad \ldots (3)$$
and, for a frequency distribution with $N = \sum f$,
$$\text{Population S.D.}\ (\sigma) = \sqrt{\frac{\sum f(X-\bar{X})^2}{N}} \qquad \ldots (4)$$
Sample S.D.: When the given data are not a population but a sample drawn from a large
population, the population mean µ is not known. In its place we use X̄, the estimate of µ
obtained from the sample observations. As a result, we cannot calculate the population S.D.
(σ); instead, we calculate its estimate (S). The estimates of the population parameters µ and
σ are represented as follows:
X̄ = estimate of µ
S = estimate of σ
The best estimate (S) of the population S.D. (σ) is given by
$$S\ (\text{sample S.D.}) = \sqrt{\frac{\sum f(X-\bar{X})^2}{N-1}} \qquad \ldots (5)$$
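Python's standard `statistics` module provides both versions: `pstdev` divides by N, as in formula (3), and `stdev` divides by N − 1, as in formula (5) with every f = 1. The sketch below applies them to Series B of the introduction, whose squared deviations from the mean 9 sum to 28.

```python
from statistics import pstdev, stdev

series_b = [6, 7, 8, 9, 10, 11, 12]

sigma = pstdev(series_b)   # divisor N: data treated as the whole population
s = stdev(series_b)        # divisor N - 1: data treated as a sample
print(sigma, s)
```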
4.2. Computation of Standard Deviation
For computing the S.D., in every case we first have to calculate the arithmetic mean, which
adds to the labour of calculation. To avoid this, the short-cut method may be used, in which
any value of the variate is chosen as the arbitrary mean and the standard deviation is then
calculated by the following process.
Suppose A is the arbitrary mean and d is the deviation of a variate value from A,
i.e., d = X − A. Then
$$\sum f(X-\bar{X})^2 = \sum fd^2 - \frac{(\sum fd)^2}{N}$$
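The short-cut identity above is stated without proof; the missing algebra is short. Writing $\bar{d} = \sum fd / N$ for the average of the d values, we have $\bar{X} = A + \bar{d}$, hence $X - \bar{X} = d - \bar{d}$, and with $\sum f = N$:

$$
\begin{aligned}
\sum f(X-\bar{X})^2 &= \sum f(d-\bar{d})^2 \\
&= \sum fd^2 - 2\bar{d}\sum fd + \bar{d}^2 \sum f \\
&= \sum fd^2 - \frac{2(\sum fd)^2}{N} + \frac{(\sum fd)^2}{N} \\
&= \sum fd^2 - \frac{(\sum fd)^2}{N}
\end{aligned}
$$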
For this we require the columns of d, fd and fd². When the class intervals are of equal
width i and A is taken as one of the mid-values, every figure in the column of d contains
i as a common factor. After taking out this factor, the columns become d/i, fd/i and
fd²/i². With the help of these, the values of ∑f(X − X̄)² and the S.D. are calculated as
given below:
$$\sum fd = i \times \sum \frac{fd}{i}$$
$$\sum fd^2 = i^2 \times \sum \frac{fd^2}{i^2}$$
$$\sum f(X-\bar{X})^2 = i^2 \times \left\{\sum \frac{fd^2}{i^2} - \frac{\left(\sum \frac{fd}{i}\right)^2}{N}\right\}$$
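If we use the symbol D for d/i, the above expressions can be written as
$$\sum fd = i \times \sum fD$$
$$\sum fd^2 = i^2 \times \sum fD^2$$
$$\sum f(X-\bar{X})^2 = i^2 \times \left\{\sum fD^2 - \frac{(\sum fD)^2}{N}\right\}$$
$$\text{S.D.} = i \times \sqrt{\frac{1}{N}\left\{\sum fD^2 - \frac{(\sum fD)^2}{N}\right\}}$$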
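A minimal Python sketch of the short-cut (step-deviation) method above. It assumes equal class widths and uses the frequency distribution of the mean-deviation example (mid-values 5 to 45, frequencies 1, 3, 5, 4, 2) with A = 25 and i = 10; the function name `step_deviation_sd` is hypothetical.

```python
import math


def step_deviation_sd(mid_values, freqs, a, i):
    """Short-cut (step-deviation) method for grouped data.

    a: arbitrary mean A, i: common class width.
    Returns (sum of f(X - mean)^2, population S.D., sample S.D.).
    """
    n = sum(freqs)
    steps = [(x - a) / i for x in mid_values]                  # D = d / i
    sum_fD = sum(f * d for f, d in zip(freqs, steps))          # sum fD
    sum_fD2 = sum(f * d * d for f, d in zip(freqs, steps))     # sum fD^2
    sum_sq = i ** 2 * (sum_fD2 - sum_fD ** 2 / n)              # sum f(X - mean)^2
    return sum_sq, math.sqrt(sum_sq / n), math.sqrt(sum_sq / (n - 1))


ss, sigma, s = step_deviation_sd([5, 15, 25, 35, 45], [1, 3, 5, 4, 2], a=25, i=10)
print(round(ss, 2), round(sigma, 2), round(s, 2))
```

Choosing A at the middle class keeps the D values small integers, which is the whole point of the hand method; in code the saving is irrelevant, but the result must agree with the direct formula.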
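Example:
Calculation of S.D. for the frequency distribution of the mean-deviation example, taking the
arbitrary mean A = 25 and class width i = 10, so that D = (X − 25)/10:
Classes   f    X    D    fD   fD²
0-10      1    5    -2   -2   4
10-20     3    15   -1   -3   3
20-30     5    25   0    0    0
30-40     4    35   1    4    4
40-50     2    45   2    4    8
Total     15   -    -    3    19
Here,
$$\sum f(X-\bar{X})^2 = i^2 \times \left\{\sum fD^2 - \frac{(\sum fD)^2}{N}\right\} = 10^2 \times \left[19 - \frac{3^2}{15}\right] = 100 \times \frac{276}{15} = 1840$$
$$\text{Population S.D.} = \sqrt{\frac{\sum f(X-\bar{X})^2}{N}} \quad (\text{here } \mu = \bar{X}), \qquad \sigma = \sqrt{\frac{1840}{15}} = 11.08$$
$$\text{Sample S.D.} = \sqrt{\frac{\sum f(X-\bar{X})^2}{N-1}} = \sqrt{\frac{1840}{14}}, \qquad S = 11.46$$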
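As a cross-check that does not use the short-cut, the sketch below expands the frequency table into 15 individual observations (assuming each observation sits at its class mid-value, as the grouped formulas do) and lets Python's `statistics` module compute the mean, the direct sum of squares and both standard deviations.

```python
from statistics import fmean, pstdev, stdev

mid_values = [5, 15, 25, 35, 45]
freqs = [1, 3, 5, 4, 2]

# Repeat each mid-value f times to recover the individual observations.
observations = [x for x, f in zip(mid_values, freqs) for _ in range(f)]

mean = fmean(observations)                             # 27.0
sum_sq = sum((x - mean) ** 2 for x in observations)    # direct sum f(X - mean)^2
sigma = pstdev(observations)                           # divisor N
s = stdev(observations)                                # divisor N - 1
print(mean, sum_sq, round(sigma, 2), round(s, 2))
```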
5. Variance
Variance is the square of the standard deviation:
$$\text{Variance} = (\text{S.D.})^2$$
The variance of a population is generally represented by the symbol σ², and its unbiased
estimate calculated from a sample by the symbol S².
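The claim that S² is an unbiased estimate of σ² can be checked exactly on a small, hypothetical population. The sketch below assumes sampling with replacement (so that draws are independent), enumerates every possible sample of size 2 from the population {2, 4, 6, 8}, and averages the sample variances computed with divisor N − 1 (`statistics.variance`) and with divisor N (`statistics.pvariance`).

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [2, 4, 6, 8]          # hypothetical tiny population
sigma2 = pvariance(population)     # population variance sigma^2

# Every ordered sample of size 2 drawn with replacement (all equally likely).
samples = list(product(population, repeat=2))

mean_s2 = fmean(variance(s) for s in samples)    # average of S^2 (divisor N - 1)
mean_pv = fmean(pvariance(s) for s in samples)   # average with divisor N
print(sigma2, mean_s2, mean_pv)
```

On average S² hits σ² exactly, while the divisor-N version falls short by the factor (N − 1)/N.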
6. Relative Measures of Dispersion
The measures of dispersion studied so far are absolute measures of dispersion: they are
expressed in the same units as the observations, e.g., grams, cm, meters, hectares, etc.
When we have to compare the dispersions of two or more distributions, it is not proper to
compare their absolute measures of dispersion, because the distributions may differ from
one another:
(i) with respect to their averages,
(ii) with respect to their dispersions,
(iii) with respect to both their averages and their dispersions,
(iv) with respect to their units.
In such cases the absolute measures are not comparable, and the comparison is made with the
help of relative measures of dispersion.
6.1. Coefficient of Dispersion
It is computed by the following expression:
$$\text{Coefficient of dispersion} = \frac{\text{Measure of dispersion}}{\text{Related measure of central tendency}}$$
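The general formula can be made concrete with the grouped example of the mean-deviation section. The plain-Python sketch below takes the arithmetic mean as the related measure of central tendency and divides the mean deviation and the population S.D. by it; pairing each measure with the mean is an assumption made for illustration.

```python
# Grouped example from the mean-deviation section.
mid_values = [5, 15, 25, 35, 45]
freqs = [1, 3, 5, 4, 2]
n = sum(freqs)

mean = sum(f * x for x, f in zip(mid_values, freqs)) / n
md = sum(f * abs(x - mean) for x, f in zip(mid_values, freqs)) / n
sd = (sum(f * (x - mean) ** 2 for x, f in zip(mid_values, freqs)) / n) ** 0.5

coeff_md = md / mean   # coefficient of dispersion based on mean deviation
coeff_sd = sd / mean   # coefficient of dispersion based on standard deviation
print(round(coeff_md, 3), round(coeff_sd, 3))
```

Both ratios are unit-free, so they can be compared across series measured in different units.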
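6.2. Coefficient of Variation
It is the standard deviation expressed as a percentage of the arithmetic mean:
$$\text{C.V.} = \frac{\text{S.D.}}{\text{A.M.}} \times 100$$
Being a percentage, it is used to compare the variability of two or more series. A smaller
coefficient of variation indicates greater consistency.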
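A short Python sketch of the C.V. for Series B and C of the introduction, which share the mean 9. It assumes the population S.D. (divisor N) is the S.D. intended in the formula; with the sample S.D. both values change by the same factor, so the ordering does not. The function name is hypothetical.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """C.V. = S.D. / A.M. x 100, using the population S.D. (divisor N)."""
    return pstdev(values) / fmean(values) * 100


series_b = [6, 7, 8, 9, 10, 11, 12]
series_c = [1, 2, 4, 5, 11, 13, 27]

cv_b = coefficient_of_variation(series_b)
cv_c = coefficient_of_variation(series_c)
print(round(cv_b, 1), round(cv_c, 1))   # smaller C.V. = more consistent series
```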
7. Standard Error
7.1. Definition
The standard deviation of the sampling distribution of a statistic (estimate) is known as
the standard error of that statistic (estimate).
If we take all possible samples of the same size from the population and obtain the
sampling distribution of their means, it can be proved that the mean of this sampling
distribution is the population mean, and its standard deviation is the standard error of
the mean. Since it is not possible to draw and study all possible samples, we estimate the
standard error from a single sample. If S is the standard deviation of a sample of size N,
the estimate of the standard error of the mean is given by
$$\text{S.E.}(\bar{X}) = \frac{S}{\sqrt{N}}$$
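A minimal sketch of the single-sample estimate, assuming S is the sample S.D. with divisor N − 1 (formula 5) and treating Series B of the introduction as a sample of size 7; `standard_error_of_mean` is a hypothetical helper name.

```python
import math
from statistics import stdev


def standard_error_of_mean(sample):
    """Estimated S.E. of the mean: S / sqrt(N), S = sample S.D. (divisor N - 1)."""
    return stdev(sample) / math.sqrt(len(sample))


series_b = [6, 7, 8, 9, 10, 11, 12]   # treated here as a sample of size 7
se = standard_error_of_mean(series_b)
print(round(se, 4))
```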
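7.2. Expression for the standard error of mean
Let there be a sample of N observations X₁, X₂, X₃, …, X_N drawn independently at random
from a population whose variance is σ².
Now, the mean is
$$\bar{X} = \frac{1}{N}(X_1 + X_2 + X_3 + \cdots + X_N)$$
The variance of the mean is
$$V(\bar{X}) = \frac{1}{N^2}\, V(X_1 + X_2 + X_3 + \cdots + X_N) = \frac{1}{N^2}\left[V(X_1) + V(X_2) + \cdots + V(X_N)\right]$$
where the second step uses the fact that, for independent observations, the variance of a
sum is the sum of the variances. Since $V(X_1) = V(X_2) = V(X_3) = \cdots = V(X_N) = \sigma^2$,
$$V(\bar{X}) = \frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N}$$
and therefore
$$\text{S.E. of } \bar{X} = \frac{\sigma}{\sqrt{N}}$$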
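The result V(X̄) = σ²/N can be checked exactly on a small hypothetical population by listing every possible sample. The sketch assumes sampling with replacement, so the draws are independent as the derivation requires, and compares the S.D. of all sample means with σ/√N.

```python
import math
from itertools import product
from statistics import fmean, pstdev

population = [2, 4, 6, 8]    # hypothetical population
N = 2                        # sample size
sigma = pstdev(population)   # population S.D. = sqrt(5)

# Means of all ordered samples of size N drawn with replacement.
sample_means = [fmean(s) for s in product(population, repeat=N)]

print(fmean(sample_means))     # 5.0: mean of sampling distribution = population mean
print(pstdev(sample_means))    # S.D. of the sample means (the standard error)
print(sigma / math.sqrt(N))    # sigma / sqrt(N)
```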
8. Probable Error
The quartile deviation of the sampling distribution of means is known as the probable
error. When that sampling distribution is normal, it is 0.67449 times the standard error:
P.E. = 0.67449 × (S.E.)
Three times the probable error is roughly twice the standard error (3 × 0.67449 ≈ 2.02).
This measure of dispersion has no particular advantage and moreover involves the
troublesome factor 0.67449. This is why it has gone out of use and has given place to the
standard error.
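The factor 0.67449 is the upper quartile point of the standard normal distribution: half of the area under the normal curve lies within ±0.67449 standard deviations of the mean. The sketch below recovers it with Python's `statistics.NormalDist` (Python 3.8 or later) and checks the "three P.E. is about two S.E." rule of thumb; `probable_error` is a hypothetical helper that assumes a normal sampling distribution.

```python
from statistics import NormalDist

# Upper quartile of the standard normal: 75% of the area lies below z.
z = NormalDist().inv_cdf(0.75)
print(round(z, 5))        # 0.67449


def probable_error(standard_error):
    """P.E. = 0.67449 x S.E. (assumes a normal sampling distribution)."""
    return z * standard_error


print(round(3 * z, 3))    # 2.023: three P.E. expressed in units of S.E.
```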
9. Summary
This module provides an overview of the techniques used to measure the extent of variation,
or deviation (also called the degree of variation), of each value in a data set from a
measure of central tendency, usually the mean or median. Such statistical techniques are
called measures of dispersion (or variation). A small dispersion among the values in a data
set indicates that the data are clustered closely around the mean; the mean is therefore
considered representative of the data, i.e., it is a reliable average. Conversely, a large
dispersion indicates that the mean is not reliable, i.e., it is not representative of the
data. Two or more symmetrical distributions may have the same variation but differ greatly
in their A.M.; on the other hand, two or more sets of data may have the same A.M. but
differ in variation.
10. Self-check exercise with solution
Q.1. The following data give the number of passengers travelling by airplane from one city
to another in one week:
115, 122, 129, 113, 119, 124, 132, 120, 110, 116
Calculate the mean and standard deviation, and determine the percentage of cases that lie
between (i) µ ± σ, (ii) µ ± 2σ and (iii) µ ± 3σ. What percentage of cases lie outside these
limits?
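Solution:
Calculation of Mean and Standard Deviation
X      X − X̄    (X − X̄)²
115    -5       25
122    2        4
129    9        81
113    -7       49
119    -1       1
124    4        16
132    12       144
120    0        0
110    -10      100
116    -4       16
$$\mu = \frac{\sum X}{N} = \frac{1200}{10} = 120 \qquad \text{and} \qquad \sigma^2 = \frac{\sum (X-\bar{X})^2}{N} = \frac{436}{10} = 43.6$$
so that $\sigma = \sqrt{43.6} = 6.60$.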
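The figures in this solution, including the percentages that follow, can be verified with a short Python sketch. It treats the ten observations as the whole population (divisor N, as in the solution) and counts the values in the closed intervals µ ± kσ for k = 1, 2, 3; `share_within` is a hypothetical helper name.

```python
from statistics import fmean, pstdev

passengers = [115, 122, 129, 113, 119, 124, 132, 120, 110, 116]
mu = fmean(passengers)        # 120.0
sigma = pstdev(passengers)    # sqrt(43.6), about 6.60


def share_within(values, centre, spread, k):
    """Percentage of values in the closed interval centre +/- k * spread."""
    lo, hi = centre - k * spread, centre + k * spread
    inside = [v for v in values if lo <= v <= hi]
    return 100 * len(inside) / len(values)


for k in (1, 2, 3):
    pct = share_within(passengers, mu, sigma, k)
    print(f"mu +/- {k} sigma: {pct}% within, {100 - pct}% outside")
```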
The percentages of cases lying within the given limits are as follows:
Interval                                    Values within interval                              Within   Outside
µ ± σ = 120 ± 6.60 = 113.40 to 126.60       115, 116, 119, 120, 122, 124                        60%      40%
µ ± 2σ = 120 ± 2(6.60) = 106.80 to 133.20   110, 113, 115, 116, 119, 120, 122, 124, 129, 132    100%     nil
Note that 113 lies just below the lower limit 113.40, so it falls outside µ ± σ.