Chapter Four: Measures of Dispersion (Variation) : Abebuabebaw
CHAPTER FOUR:
4.1 Introduction
Just as central tendency can be measured by a number in the form of an average, the amount of variation
(dispersion, spread, or scatter) among the values in a data set can also be measured. The measures of
central tendency show that the major part of the values in a data set tends to concentrate around a
central value, called an average, with the remaining values scattered (distributed) on either side of that value.
But these measures do not reveal how the values are dispersed (spread or scattered) on each side of the
central value. The dispersion of values is indicated by the extent to which these values tend to spread over
an interval rather than cluster closely around an average.
The term dispersion is generally used in two senses. Firstly, dispersion refers to the variation of the items
among themselves. If all the items of a series have the same value, there is no variation among
the items of the series. Secondly, dispersion refers to the variation of the items around an average. If
the differences between the values of the items and the average are large, the dispersion is high; if, on the
other hand, the differences between the values of the items and the average are small, the dispersion is
low. Thus, dispersion is defined as the scatteredness or spread of the individual items in a given series.
[email protected]
Relative measures of dispersion: A relative measure of dispersion is the ratio of a measure of absolute
dispersion to an appropriate average or the selected items of the data.
$$\text{Relative measure of dispersion} = \frac{\text{Absolute measure of dispersion}}{\text{Average}}$$
Range is the simplest measure of dispersion. It is defined as the difference between the largest and
smallest values in a given set of data. Its formula is:
$$R = L - S$$
where R = range, L = largest value in the data set, and S = smallest value in the data set.
For a grouped frequency distribution, the range can be taken as:
• The difference between the upper class limit of the last class and the lower class limit of the first
class, or
• The difference between the largest class mark and the smallest class mark, or
• The difference between the upper class boundary of the last class and the lower class boundary
of the first class.
The range is used to describe quantities such as the maximum change in daily temperature, rainfall, etc. When the
sample size is small, it can be an adequate measure of variation. It is commonly used in quality control.
$$\text{Relative Range } (RR) = \frac{L - S}{L + S}$$
Example 4.1: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15. Find the range
and relative range
$$\text{Range} = L - S = 35 - 15 = 20$$
$$RR = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = 0.4$$
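The two formulas can be checked with a few lines of Python. This is only a minimal sketch; the function names `value_range` and `relative_range` are illustrative choices, not notation from the chapter.

```python
def value_range(values):
    """Range R = L - S (largest value minus smallest value)."""
    return max(values) - min(values)


def relative_range(values):
    """Relative range RR = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


marks = [20, 35, 25, 30, 15]      # marks of the five students in Example 4.1
print(value_range(marks))         # 20
print(relative_range(marks))      # 0.4
```

Because RR is a ratio of two quantities in the same unit, it is unit-free, which is what makes it usable for comparing data sets measured on different scales.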
Example 4.2: Find out range and relative range of the following given data.
Solution: Here, L = 30 and S = 5, so
$$\text{Range} = 30 - 5 = 25, \qquad RR = \frac{30 - 5}{30 + 5} = 0.7143.$$
Inter-quartile range and quartile deviation are other measures of dispersion. The difference between the
upper quartile ($Q_3$) and the lower quartile ($Q_1$) is called the inter-quartile range. Symbolically,
$$IQR = Q_3 - Q_1$$
The inter-quartile range covers the dispersion of the middle 50% of the items of the series. Quartile deviation,
also called the semi-inter-quartile range, is half of the difference between the upper and lower quartiles; that
is, half of the inter-quartile range. Its formula is
$$\text{Quartile Deviation } (QD) = \frac{Q_3 - Q_1}{2}$$
The relative measure of quartile deviation, also called the coefficient of quartile deviation (CQD), is
defined as:
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Example 4.3: Find inter-quartile range, quartile deviation and coefficient of quartile deviation from the
following data.
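Solution: First arrange the data in ascending order: 15, 18, 20, 24, 27, 28, 30.
$$Q_1 = \text{size of } \left(\frac{n+1}{4}\right)^{th} \text{ item} = \text{size of } \left(\frac{7+1}{4}\right)^{th} \text{ item} = \text{size of the } 2^{nd} \text{ item} = 18$$
$$Q_3 = \text{size of } 3\left(\frac{n+1}{4}\right)^{th} \text{ item} = \text{size of } 3\left(\frac{7+1}{4}\right)^{th} \text{ item} = \text{size of the } 6^{th} \text{ item} = 28$$
$$IQR = Q_3 - Q_1 = 28 - 18 = 10$$
$$QD = \frac{Q_3 - Q_1}{2} = \frac{28 - 18}{2} = 5$$
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28 - 18}{28 + 18} = 0.217$$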
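The same steps can be sketched in Python. The helper `quartile` below is a hypothetical name, and it uses the k(n+1)/4 position rule from this example; linear interpolation for fractional positions is an assumption added here, since the example only involves whole-number positions. Statistical libraries often use other quartile conventions and may give slightly different values.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) as the size of the k(n+1)/4-th ordered item.

    Fractional positions are resolved by linear interpolation between the
    two neighbouring ordered items (an assumption of this sketch).
    """
    data = sorted(values)
    n = len(data)
    pos = k * (n + 1) / 4              # 1-based position in the ordered data
    lower = int(pos)                   # whole part of the position
    frac = pos - lower                 # fractional part of the position
    if lower < 1:
        return data[0]
    if lower >= n:
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


data = [15, 18, 20, 24, 27, 28, 30]    # ordered data from Example 4.3
q1, q3 = quartile(data, 1), quartile(data, 3)
iqr = q3 - q1                          # inter-quartile range
qd = iqr / 2                           # quartile deviation
cqd = (q3 - q1) / (q3 + q1)            # coefficient of quartile deviation
print(q1, q3, iqr, qd, round(cqd, 3))  # 18.0 28.0 10.0 5.0 0.217
```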
Example 4.4: Find inter-quartile range, quartile deviation and coefficient of quartile deviation from the
following data
Marks              2    3    4    5    6    7    8    9
No. of students   10   11   12   13    5   12    7    5
Solution:
Marks              2    3    4    5    6    7    8    9
No. of students   10   11   12   13    5   12    7    5
CF                10   21   33   46   51   63   70   75 = N
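$$Q_1 = \left(\frac{N+1}{4}\right)^{th} \text{ item} = \left(\frac{75+1}{4}\right)^{th} \text{ item} = 19^{th} \text{ item} = 3$$
$$Q_3 = 3\left(\frac{N+1}{4}\right)^{th} \text{ item} = 3\left(\frac{75+1}{4}\right)^{th} \text{ item} = 57^{th} \text{ item} = 7$$
$$IQR = Q_3 - Q_1 = 7 - 3 = 4$$
$$QD = \frac{Q_3 - Q_1}{2} = \frac{7 - 3}{2} = 2$$
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{7 - 3}{7 + 3} = 0.4$$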
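For a discrete frequency distribution, the quartile is read off the cumulative frequency (CF) column: it is the first value whose CF reaches the required position. A minimal sketch of that lookup follows; `fd_quartile` is a hypothetical helper, the values are assumed to be listed in ascending order, and a fractional position is simply assigned to the first value whose CF reaches it (a simplification).

```python
from itertools import accumulate


def fd_quartile(values, freqs, k):
    """Value of the k(N+1)/4-th item in a discrete frequency distribution.

    `values` must be in ascending order; `freqs` are the matching frequencies.
    """
    total = sum(freqs)                          # N
    position = k * (total + 1) / 4              # position of the required item
    for value, cum_freq in zip(values, accumulate(freqs)):
        if cum_freq >= position:                # first CF that reaches the position
            return value
    return values[-1]


marks = [2, 3, 4, 5, 6, 7, 8, 9]                # data from Example 4.4
students = [10, 11, 12, 13, 5, 12, 7, 5]
q1 = fd_quartile(marks, students, 1)
q3 = fd_quartile(marks, students, 3)
print(q1, q3, q3 - q1, (q3 - q1) / 2, (q3 - q1) / (q3 + q1))  # 3 7 4 2.0 0.4
```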
Remark: QD and CQD are based only on the middle 50% of the observations.
Merits of QD
Demerits of QD
➢ It is not based on all the items (it ignores 50% of the items, i.e., the first 25% and the last 25%).
➢ It is greatly influenced by sampling fluctuations.
➢ It is not amenable to algebraic manipulations.
The mean deviation (MD) measures the average deviation of a set of observations about their central
value, generally the mean or the median, ignoring the plus/minus sign of the deviations. In other words,
the mean deviation of a set of items is defined as the arithmetic mean of the absolute
deviations from a given average. Depending on the type of average used, we have different mean
deviations.
❖ The mean deviation of a sample of n observations $X_1, X_2, \ldots, X_n$ (individual series) is given by
$$MD = \frac{\sum |X_i - A|}{n}$$
where $|X_i - A|$ denotes the absolute value of the deviation. Generally, the arithmetic mean and the median are
used in calculating the mean deviation. So $A$ stands for the average used for calculating $MD$; that is,
$A = \tilde{X}$ (median) or $A = \bar{X}$ (mean).
❖ In the case of discrete data arranged in a frequency distribution (FD) and of continuous grouped data,
the formula for MD becomes
$$MD = \frac{\sum f_i |X_i - A|}{n}$$
where $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
1. The mean deviation about the arithmetic mean is, therefore, given by
$$MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n}$$ for ungrouped data (individual series), and
$$MD(\bar{X}) = \frac{\sum f_i |X_i - \bar{X}|}{n}$$ for discrete data arranged in an FD and for a grouped continuous frequency
distribution; where $X_i$ is the value for discrete data arranged in an FD and the class mark of the ith class
for continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
2. The mean deviation about the median is given by
$$MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n}$$ for ungrouped data (individual series), and
$$MD(\tilde{X}) = \frac{\sum f_i |X_i - \tilde{X}|}{n}$$ for discrete data arranged in an FD and for a grouped continuous frequency
distribution, with $X_i$, $f_i$ and $n$ defined as above.
3. The mean deviation about the mode is given by
$$MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n}$$ for ungrouped data (individual series), and
$$MD(\hat{X}) = \frac{\sum f_i |X_i - \hat{X}|}{n}$$ for discrete data arranged in an FD and for a grouped continuous frequency
distribution, with $X_i$, $f_i$ and $n$ defined as above.
$|X_i - \bar{X}|$:     2    2    1    1    1    0    1    1    2    3     Total = 14
$|X_i - \tilde{X}|$:   1.5  1.5  0.5  0.5  0.5  0.5  1.5  1.5  2.5  3.5   Total = 14
$|X_i - \hat{X}|$:     1    1    0    0    0    1    2    2    3    4     Total = 14
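Since the data are ungrouped, the mean deviations about the mean, median and mode are:
$$MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} = \frac{14}{10} = 1.4$$
$$MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n} = \frac{14}{10} = 1.4$$
$$MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n} = \frac{14}{10} = 1.4$$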
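The three mean deviations can be sketched in Python as follows. The data list is an assumption: it is a hypothetical set of ten values chosen so that its absolute deviations match the rows tabulated above (mean 6, median 5.5, mode 5); the original data listing is not reproduced here. `mean_deviation` is an illustrative name.

```python
from statistics import mean, median, mode


def mean_deviation(values, center):
    """MD = sum(|Xi - A|) / n for an individual series, with A = center."""
    return sum(abs(x - center) for x in values) / len(values)


# Hypothetical data, consistent with the tabulated absolute deviations.
data = [4, 4, 5, 5, 5, 6, 7, 7, 8, 9]

md_mean = mean_deviation(data, mean(data))        # about the mean (6)
md_median = mean_deviation(data, median(data))    # about the median (5.5)
md_mode = mean_deviation(data, mode(data))        # about the mode (5)
print(md_mean, md_median, md_mode)                # 1.4 1.4 1.4
```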
Merits of 𝑴𝑫
Demerit of 𝑴𝑫
➢ It does not take into account the signs of the deviations of the items from the average.
Remark: Of all the mean deviations taken about different averages or any arbitrary value, the mean
deviation about the median has the smallest value.
The relative measure of mean deviation, also called the coefficient of mean deviation (CMD), is obtained by
dividing the mean deviation by the particular average used in computing it. Thus,
➢ CMD about the mean is given by:
$$CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}}$$ where $MD(\bar{X})$ is the mean deviation calculated about the arithmetic mean.
➢ CMD about the median is given by:
$$CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}}$$ in which case MD is calculated about the median of the observations.
Example 4.6: Calculate the coefficient of mean deviation about the mean, median and mode for the data
in Example 4.5 above.
Solution:
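$$CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}} = \frac{1.4}{6} = 0.23$$
$$CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}} = \frac{1.4}{5.5} = 0.25$$
$$CMD(\hat{X}) = \frac{MD(\hat{X})}{\hat{X}} = \frac{1.4}{5} = 0.28$$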
Like the mean deviation, the variance is based on all observations in a set of data. But the
variance is the average of the squared deviations from the mean. Recall that the sum of squared deviations is
a minimum only when taken from the mean. Squared deviations are easier to manipulate mathematically than
absolute deviations. Thus, if we average the squared deviations from the mean and take the square root
of the result (to compensate for the fact that the deviations were squared), we obtain the standard
deviation. This overcomes the limitation of the mean deviation.
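If the values $x_i$ have frequencies $f_i$ (i = 1, 2, …, m), then the sample variance is given by:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{m} f_i (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{m} f_i x_i^2 - n\bar{x}^2\right]$$
where, for a grouped frequency distribution, $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the
ith class and $n = \sum f_i$.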
The Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the units were
also squared. To get the units back the same as the original data values, the square root must be taken.
➢ Population Standard Deviation ($\sigma$)
Example 4.7: Find the sample variance and standard deviation of:
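xi   2   4   5   6   8
fi   2   2   3   1   2
With $n = \sum f_i = 10$ and $\bar{x} = \frac{49}{10} = 4.9$,
$$S^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[279 - 10\left(\frac{49}{10}\right)^2\right] = \frac{1}{9}(38.9) = 4.32, \quad \text{and } S = \sqrt{4.32} = 2.08.$$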
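The computational formula for frequency data can be sketched in Python. The function name `sample_variance_fd` is a hypothetical choice; the code applies the shortcut form with divisor n − 1 to the data of Example 4.7.

```python
from math import sqrt


def sample_variance_fd(values, freqs):
    """S^2 = [sum(f*x^2) - n*xbar^2] / (n - 1) for values with frequencies."""
    n = sum(freqs)                                           # n = sum of f_i
    xbar = sum(f * x for x, f in zip(values, freqs)) / n     # weighted mean
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))  # sum of f_i * x_i^2
    return (sum_fx2 - n * xbar ** 2) / (n - 1)


x = [2, 4, 5, 6, 8]                  # values from Example 4.7
f = [2, 2, 3, 1, 2]                  # their frequencies
s2 = sample_variance_fd(x, f)
s = sqrt(s2)
print(round(s2, 2), round(s, 2))     # 4.32 2.08
```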
Example 4.8: Find the sample variance and standard deviation for the distribution:
C.I.    1-5   6-10   11-15   16-20
Freq.    4     1      2       3
Solution: In a continuous F.D., xi is the class mark representing the ith class.
C.I.     xi    fi    fi xi    fi xi^2
1-5       3     4     12        36
6-10      8     1      8        64
11-15    13     2     26       338
16-20    18     3     54       972
where $n = \sum f_i = 10$, $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{100}{10} = 10$ and $\sum f_i x_i^2 = 1410$, so that
$$S^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[1410 - 10(10)^2\right] = \frac{410}{9} = 45.56,$$
$$S = \sqrt{45.56} = 6.75.$$
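For a continuous frequency distribution the only extra step is replacing each class by its class mark. The sketch below assumes the class limits 1-5, 6-10, 11-15 and 16-20 from the solution table, and uses the definitional (deviation) form of the variance rather than the shortcut form, to show that both give the same result.

```python
from math import sqrt

classes = [(1, 5), (6, 10), (11, 15), (16, 20)]     # class limits
freqs = [4, 1, 2, 3]                                # class frequencies

# Class mark = midpoint of the lower and upper class limits.
marks = [(low + high) / 2 for low, high in classes]

n = sum(freqs)
xbar = sum(f * x for x, f in zip(marks, freqs)) / n
# Definitional form: S^2 = sum(f * (x - xbar)^2) / (n - 1)
variance = sum(f * (x - xbar) ** 2 for x, f in zip(marks, freqs)) / (n - 1)
std_dev = sqrt(variance)
print(marks, xbar, round(variance, 2), round(std_dev, 2))
# [3.0, 8.0, 13.0, 18.0] 10.0 45.56 6.75
```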
1. If a constant is added to (or subtracted from) all the values, the variance remains the same; i.e., $\mathrm{Var}(x_i \pm k) = \mathrm{Var}(x_i)$.
Example 4.9 Consider the 6 sample values xi: 54,52,53,50,51, and 52.
2. If each and every value is multiplied by a non-zero constant (k), the standard deviation is multiplied by $|k|$ (and the variance by $k^2$).
3. Both the variance and the standard deviation give more weight to extreme values and less to
those which are near to the mean.
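The effect of shifting and scaling the data can be checked numerically on the six sample values of Example 4.9. This is a small sketch under stated assumptions: the constants 50 and 3 are arbitrary illustrative choices, and `sample_variance` is a hypothetical helper using the n − 1 divisor. It relies on the standard results that adding a constant leaves the variance unchanged and that multiplying by k multiplies the standard deviation by |k|.

```python
from math import isclose, sqrt


def sample_variance(values):
    """S^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


x = [54, 52, 53, 50, 51, 52]            # sample values from Example 4.9
shifted = [v - 50 for v in x]           # subtract a constant (50 is arbitrary)
scaled = [3 * v for v in x]             # multiply by k = 3 (arbitrary)

print(sample_variance(x))               # 2.0
print(sample_variance(shifted))         # 2.0, unchanged by the shift
print(isclose(sqrt(sample_variance(scaled)), 3 * sqrt(sample_variance(x))))  # True
```

Subtracting a convenient constant first is also a practical trick: the shifted values 4, 2, 3, 0, 1, 2 are much easier to square by hand than the original ones.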
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure is
known as the coefficient of variation (CV).
Of course, the standard deviation is an absolute measure of dispersion that expresses the variation in the same
unit as the original data, but it cannot be the sole basis for comparing two distributions. For instance, if
we have a standard deviation of 10 and a mean of 5, the values vary by an amount twice as large as the
mean itself. If, on the other hand, we have a standard deviation of 10 and a mean of 5000, the variation
relative to the mean is insignificant. Therefore, we cannot know the dispersion of a set of data until we
know the standard deviation, the mean, and how the standard deviation compares with the mean.
Coefficient of variation is used in such problems where we want to compare the variability of two or more
different series. Coefficient of variation is the ratio of the standard deviation to the arithmetic mean,
usually expressed in percent.
$$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$$
                      Mathematics Department   Chemistry Department
Mean score                     85                       65
Standard deviation             25                       12
Compare the relative dispersions of the two departments' scores using an appropriate measure.
Solution:
Mathematics Department: $$CV = \frac{S}{\bar{x}} \times 100 = \frac{25}{85} \times 100 = 29.41\%$$
Chemistry Department: $$CV = \frac{S}{\bar{x}} \times 100 = \frac{12}{65} \times 100 = 18.46\%$$
Interpretation: Since the CV of Mathematics Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the distribution of
Mathematics students’ scores compared with that of Chemistry students.
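The comparison above can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is illustrative; the inputs are the summary figures for the two departments.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100, expressed in percent."""
    return std_dev / mean * 100


cv_math = coefficient_of_variation(25, 85)    # Mathematics Department
cv_chem = coefficient_of_variation(12, 65)    # Chemistry Department
print(f"{cv_math:.2f}% {cv_chem:.2f}%")       # 29.41% 18.46%

# The series with the larger CV is the more variable one relative to its mean.
more_variable = "Mathematics" if cv_math > cv_chem else "Chemistry"
print(more_variable)                          # Mathematics
```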
4.4 Standard Scores (Z-Scores)
A standard score for a sample value in a data set is obtained by subtracting the mean of the data set from
the value and dividing the result by the standard deviation of the data set. The standard score (z-score)
tells us how many standard deviations a specific value lies above (positive z-score) or below (negative
z-score) the mean of the data set.
For a population:
$$Z = \frac{X - \mu}{\sigma}$$
For a sample:
$$Z = \frac{X - \bar{X}}{S}$$
Example 4.11: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10
Solution:
$\bar{X} = 8$ and $S = 3.8173$; thus $Z = \frac{14 - 8}{3.8173} \approx 1.57$.
The data value of 14 is located 1.57 standard deviations above the mean 8 because the z-score is
positive.
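The calculation in Example 4.11 can be sketched with Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation (divisor n − 1). The helper name `z_score` is an illustrative choice.

```python
from statistics import mean, stdev


def z_score(x, values):
    """Sample z-score: (x - sample mean) / sample standard deviation."""
    return (x - mean(values)) / stdev(values)


data = [3, 8, 6, 14, 4, 12, 7, 10]      # sample from Example 4.11
print(mean(data))                       # 8
print(round(stdev(data), 4))            # 3.8173
print(round(z_score(14, data), 2))      # 1.57
```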
Example 4.12: Suppose that a student scored 66 in Statistics and 80 in Mathematics. A summary of the
scores in the two courses is given below.
Course         Average score    Standard deviation of the score
Statistics          51                    12
Mathematics         72                    16
In which course did the student score better as compared to his classmates?
Solution:
Z-score of the student in Statistics: $$Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$$
Z-score of the student in Mathematics: $$Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$$
From these two standard scores, we can conclude that the student scored better, relative to his classmates,
in the Statistics course than in the Mathematics course.
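When only the course mean and standard deviation are known, the comparison reduces to computing one z-score per course and picking the larger. A minimal sketch, with the dictionary layout and the name `z_from_summary` as illustrative assumptions:

```python
def z_from_summary(x, mu, sigma):
    """Z = (X - mu) / sigma, using a known mean and standard deviation."""
    return (x - mu) / sigma


# (student's score, course average, course standard deviation)
courses = {"Statistics": (66, 51, 12), "Mathematics": (80, 72, 16)}

z = {name: z_from_summary(*summary) for name, summary in courses.items()}
best = max(z, key=z.get)                # course with the highest z-score
print(z)                                # {'Statistics': 1.25, 'Mathematics': 0.5}
print(best)                             # Statistics
```

Note that the raw score is higher in Mathematics (80 against 66), yet the standardized comparison favours Statistics; this is exactly the situation z-scores are designed for.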
The measures of central tendency and variation discussed in the previous sections do not reveal the entire story
about a frequency distribution. Two distributions may have the same mean and standard deviation but
differ in shape. A further description of their characteristics is therefore necessary, and it is
provided by measures of skewness and kurtosis.
4.5.1 Moments
Moments are statistical tools used in statistical investigation. The moments of a distribution are the
arithmetic means of the various powers of the deviations of the items from some number. In our course, we
shall use them in the study of the skewness and kurtosis of statistical distributions.
The rth moment about the origin for an individual series is
$$M_r = \frac{\sum X_i^r}{n}$$
where r = 0, 1, 2, 3, …
The rth moment about the origin for a grouped or an ungrouped frequency distribution is
$$M_r = \frac{\sum f_i X_i^r}{n}$$
where $f_i$ is the frequency of $X_i$, and $X_i$ is the class midpoint in the case of a grouped frequency distribution or the
class value in the case of an ungrouped frequency distribution.
The rth moment about the mean for an individual series is
$$M'_r = \frac{\sum (X_i - \bar{X})^r}{n}$$
The rth moment about the mean for a grouped or an ungrouped frequency distribution is
$$M'_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$$
where $f_i$ is the frequency of $X_i$, and $X_i$ is the class midpoint in the case of a grouped frequency distribution or the
class value in the case of an ungrouped frequency distribution.
The rth moment about an arbitrary constant $A$ for an individual series is
$$M'_r = \frac{\sum (X_i - A)^r}{n}$$
The rth moment about an arbitrary constant $A$ for a grouped or an ungrouped frequency distribution is
$$M'_r = \frac{\sum f_i (X_i - A)^r}{n}.$$
Example 4.13: Find the first four moments about the mean for the following individual series
𝑋𝑖 : 3 6 8 10 18
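Solution: n = 5 and $\bar{X} = \frac{45}{5} = 9$.
S.No    Xi    (Xi − X̄)    (Xi − X̄)^2    (Xi − X̄)^3    (Xi − X̄)^4
1        3      -6           36           -216           1296
2        6      -3            9            -27             81
3        8      -1            1             -1              1
4       10       1            1              1              1
5       18       9           81            729           6561
Total   45       0          128            486           7940
Thus,
$$M'_1 = \frac{0}{5} = 0, \quad M'_2 = \frac{128}{5} = 25.6, \quad M'_3 = \frac{486}{5} = 97.2, \quad M'_4 = \frac{7940}{5} = 1588.$$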
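The first four moments about the mean for this series can be computed directly from the definition. A minimal Python sketch, where `central_moment` is an illustrative name and the divisor is n, as in the moment formulas of this section:

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((Xi - Xbar)^r) / n."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** r for x in values) / n


x = [3, 6, 8, 10, 18]                       # series from Example 4.13
moments = [central_moment(x, r) for r in range(1, 5)]
print(moments)                              # [0.0, 25.6, 97.2, 1588.0]
```

The first moment about the mean is always zero, because the deviations from the mean sum to zero; the second is the variance computed with divisor n.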
4.5.2 Skewness
Note that: In moderately skewed distributions the averages have the following
relationship.
A measure of skewness gives a numerical expression for the degree and the direction of asymmetry in a distribution.
It gives information about the shape of the distribution and the degree of variation on either side of the
central value. The three most commonly used measures of skewness are Pearson's coefficient of
skewness, Bowley's coefficient of skewness and the coefficient of skewness based on moments.
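Pearson's coefficient of skewness:
$$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$
Coefficient of skewness based on moments:
$$\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{M'_3}{\sigma^3}$$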
𝛼3 > 0, ➔ the distribution is positively skewed/skewed to the right, i.e., mode < median < mean
➔smaller observations are more frequent than larger observations. i.e., the majority of
α3 < 0,➔ the distribution is negatively skewed/skewed to the left. i.e., mean < median < mode
➔smaller observations are less frequent than larger observations. i.e., the majority of
4.5.3 Kurtosis
b/ $\alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{5.8}{1.6^2} = 2.26 < 3$ ➔ the curve is platykurtic.
Example 4.14: Find the coefficient of skewness and the coefficient of kurtosis for the above
example 4.13.
Solution:
i) $$\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{97.2}{(25.6)^{3/2}} = \frac{97.2}{129.527} = 0.75$$
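The moment-based coefficients can be sketched in Python for the series of Example 4.13. The function names are illustrative, and the kurtosis coefficient is computed from the same form $\alpha_4 = M'_4 / (M'_2)^2$ used earlier in this section; comparing it with 3 to label the curve follows that same convention.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((Xi - Xbar)^r) / n."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** r for x in values) / n


def moment_skewness(values):
    """alpha_3 = M'_3 / (M'_2)^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def moment_kurtosis(values):
    """alpha_4 = M'_4 / (M'_2)^2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


x = [3, 6, 8, 10, 18]                       # series from Example 4.13
alpha3 = moment_skewness(x)
alpha4 = moment_kurtosis(x)
print(round(alpha3, 2))                     # 0.75, positive: skewed to the right
print("platykurtic" if alpha4 < 3 else "not platykurtic")
```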