
Chapter Four: Measures of Dispersion (Variation) : Abebuabebaw

This chapter discusses measures of dispersion, which quantify how spread out or varied the values in a data set are. There are absolute measures of dispersion, such as range, which are expressed in the same units as the original data, and relative measures, such as coefficient of variation, which are ratios that allow comparison across data sets with different units. Range is defined as the difference between the largest and smallest values, while quartile deviation measures the spread of the middle 50% of values using the interquartile range. Measures of dispersion are important for judging the reliability of central tendency measures, controlling variability, and comparing groups.

Uploaded by

Yohannis Reta

Chapter four: MEASURES OF DISPERSION (VARIATION)

CHAPTER FOUR:

MEASURES OF DISPERSION (VARIATION)

4.1 Introduction

Just as central tendency can be measured by a number in the form of an average, the amount of variation (dispersion, spread, or scatter) among the values in a data set can also be measured. Measures of central tendency show that most values in a data set concentrate around a central value, called the average, with the remaining values scattered on either side of it. But these measures do not reveal how the values are dispersed on each side of the central value. The dispersion of values is indicated by the extent to which they spread over an interval rather than cluster closely around an average.
The term dispersion is generally used in two senses. First, dispersion refers to the variation of the items among themselves. If all the items of a series have the same value, there is no variation among them. Second, dispersion refers to the variation of the items around an average. If the differences between the items and the average are large, the dispersion is high; if the differences are small, the dispersion is low. Thus, dispersion is defined as the scatter or spread of the individual items in a given series.

After studying this chapter, you should be able to:

✓ Explain the meaning of measures of dispersion

✓ Compare two or more sets of data using relative measures of dispersion.


✓ Apply the Z-score to find out the relative standing of values.
✓ Explain measures of skewness and kurtosis.
Objectives of measuring Variation:
✓ To judge the reliability of measures of central tendency
✓ To control variability itself.
✓ To compare two or more groups of numbers in terms of their variability.
✓ To make further statistical analysis.
4.2 Absolute and Relative Measures of Dispersion

Absolute measures of dispersion: An absolute measure is expressed in the same statistical unit as the original data, such as kilograms or tonnes. These measures are suitable for comparing the variability of two distributions whose variables are expressed in the same units and have averages of about the same size. They are not suitable for comparing the variability of two distributions whose variables are expressed in different units.

Relative measures of dispersion: A relative measure of dispersion is the ratio of a measure of absolute
dispersion to an appropriate average or the selected items of the data.

Relative measures of dispersion are classified as follows:

• Based on selected items: coefficient of range and coefficient of quartile deviation.

• Based on all items: coefficient of mean deviation and coefficient of standard deviation or coefficient of variation.

4.3 Types of Measures of Variation

4.3.1 The Range and Relative Range


Range is the simplest measure of dispersion. It is defined as the difference between the largest and smallest values in a given set of data. Its formula is:

𝑅 =𝐿−𝑆

Where R=Range, L= Largest value in a given set of data, S= smallest value in a given set of data.

For a continuous grouped distribution, the range may be obtained as:

• The difference between upper class limit of the last class and the lower class limit of the first
class, or

• The difference between the largest class mark and the smallest class mark, or

• The difference between the upper class boundary of the last class and the lower class boundary
of the first class.

The range is used to describe quantities such as the maximum change in daily temperature or rainfall. When the sample size is small, it can be an adequate measure of variation. It is commonly used in quality control.

The relative measure of range, also called the coefficient of range, is defined as

$$\text{Relative Range } (RR) = \frac{L - S}{L + S}$$

Example 4.1: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15. Find the range and relative range.

Solution: Here, $L = 35$ and $S = 15$.

$$\text{Range} = L - S = 35 - 15 = 20$$

$$RR = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = 0.4$$

Example 4.2: Find the range and relative range of the following data.

Size        5-10   11-15   16-20   21-25   26-30
Frequency   4      9       15      30      40

Solution: Here,

L = upper class limit of the last class = 30

S = lower class limit of the first class = 5

$$\text{Range} = 30 - 5 = 25, \qquad RR = \frac{30 - 5}{30 + 5} = 0.7143$$
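
The two range examples above can be checked with a few lines of Python. This is only a minimal sketch; the helper name `value_range` is an illustrative assumption and not part of the chapter.

```python
def value_range(values):
    """Return (range, relative range): R = L - S and RR = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    return spread, spread / (largest + smallest)


# Example 4.1: raw marks of five students.
marks = [20, 35, 25, 30, 15]
r_marks, rr_marks = value_range(marks)

# Example 4.2: for grouped data only the extreme class limits matter,
# so we pass the lower limit of the first class and the upper limit of the last.
r_grouped, rr_grouped = value_range([5, 30])

print(r_marks, round(rr_marks, 4))
print(r_grouped, round(rr_grouped, 4))
```

For grouped data the function is given only the two extreme class limits, which mirrors the first of the three grouped-data definitions listed earlier.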

Merits of the Range

➢ It is well-defined, easy to compute and simple to understand.


➢ It helps in giving an idea about the variation, just by giving the lowest value and the greatest
value of variable.

Demerits of the Range

➢ It is not based on all observations of the series.


➢ It can’t be calculated in case of open-ended distribution.
➢ It is affected by sampling fluctuation.
➢ It is affected by extreme values in the series.

4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation

Inter-quartile range and quartile deviation are other measures of dispersion. The difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$) is called the inter-quartile range. Symbolically,

$$\text{Inter-Quartile Range } (IQR) = Q_3 - Q_1$$

The inter-quartile range covers the dispersion of the middle 50% of the items of the series. Quartile deviation, also called the semi-inter-quartile range, is half of the difference between the upper and lower quartiles, that is, half of the inter-quartile range. Its formula is

$$\text{Quartile Deviation } (QD) = \frac{Q_3 - Q_1}{2}$$

The relative measure of quartile deviation, also called the coefficient of quartile deviation (CQD), is defined as:

$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Example 4.3: Find the inter-quartile range, quartile deviation and coefficient of quartile deviation from the following data.

28, 18, 20, 24, 27, 30, 15

Solution: First arrange the data in ascending order: 15, 18, 20, 24, 27, 28, 30.

$$Q_1 = \text{size of the } \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item} = \text{size of the } \left(\frac{7+1}{4}\right)^{\text{th}} \text{ item} = \text{size of the 2nd item} = 18 \text{ marks}$$

$$Q_3 = \text{size of the } 3\left(\frac{n+1}{4}\right)^{\text{th}} \text{ item} = \text{size of the } 3\left(\frac{7+1}{4}\right)^{\text{th}} \text{ item} = \text{size of the 6th item} = 28 \text{ marks}$$

$$IQR = Q_3 - Q_1 = 28 - 18 = 10$$

$$QD = \frac{Q_3 - Q_1}{2} = \frac{28 - 18}{2} = 5$$

$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28 - 18}{28 + 18} = 0.217$$
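
The quartile calculations in Example 4.3 can be reproduced with the short sketch below. The function name `quartile` is hypothetical, and the linear interpolation used when the position k(n+1)/4 is not a whole number is an assumption added for completeness; the chapter's examples only use whole-number positions.

```python
def quartile(sorted_values, k):
    """k-th quartile as the item at 1-based position k * (n + 1) / 4.

    Fractional positions are handled by linear interpolation between
    neighbouring items (an assumption; the chapter uses whole positions).
    """
    n = len(sorted_values)
    position = k * (n + 1) / 4
    whole = int(position)          # integer part of the position
    fraction = position - whole    # fractional part of the position
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    below, above = sorted_values[whole - 1], sorted_values[whole]
    return below + fraction * (above - below)


data = sorted([28, 18, 20, 24, 27, 30, 15])
q1, q3 = quartile(data, 1), quartile(data, 3)
iqr = q3 - q1                  # inter-quartile range
qd = iqr / 2                   # quartile deviation
cqd = (q3 - q1) / (q3 + q1)    # coefficient of quartile deviation
print(q1, q3, iqr, qd, round(cqd, 3))
```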

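Example 4.4: Find the inter-quartile range, quartile deviation and coefficient of quartile deviation from the following data.

Marks             2    3    4    5    6    7    8    9
No. of students   10   11   12   13   5    12   7    5

Solution: First compute the cumulative frequencies (CF).

Marks             2    3    4    5    6    7    8    9
No. of students   10   11   12   13   5    12   7    5
CF                10   21   33   46   51   63   70   75 = N

$$Q_1 = \text{size of the } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{size of the } \left(\frac{75+1}{4}\right)^{\text{th}} \text{ item} = \text{size of the 19th item} = 3$$

The 19th item lies beyond the first 10 items but within the first 21, so it has the value 3.

$$Q_3 = \text{size of the } 3\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{size of the } 3\left(\frac{75+1}{4}\right)^{\text{th}} \text{ item} = \text{size of the 57th item} = 7$$

The 57th item lies beyond the first 51 items but within the first 63, so it has the value 7.

$$IQR = Q_3 - Q_1 = 7 - 3 = 4$$

$$QD = \frac{Q_3 - Q_1}{2} = \frac{7 - 3}{2} = 2$$

$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{7 - 3}{7 + 3} = 0.4$$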

Remark: Q.D or CQD includes only the middle 50% of the observation.

Merits of QD

➢ It is well-defined, easy to compute and simple to understand.


➢ It helps in studying the middle 50% item in the series.
➢ It is not affected by the extreme items.
➢ It is useful in measuring variations in the case of open-ended distributions.

Demerits of QD

➢ It is not based on all the items (it ignores 50% items, i.e., the first 25% and the last 25%).
➢ It is greatly influenced by sampling fluctuations.
➢ It is not amenable to algebraic manipulations.

4.3.3 The Mean Deviation and Coefficient of Mean Deviation

The mean deviation (MD) measures the average deviation of a set of observations about their central
value, generally the mean or the median, ignoring the plus/minus sign of the deviations. In other words
the mean deviation of a set of items is defined as the arithmetic mean of the values of the absolute
deviations from a given average. Depending up on the type of averages used we have different mean
deviations.
❖ The mean deviation of a sample of n observations $x_1, x_2, \ldots, x_n$ (individual series) is given as

$$MD = \frac{\sum |X_i - A|}{n}$$

where $|X_i - A|$ denotes the absolute value of the deviation. Generally, the arithmetic mean and the median are used in calculating the mean deviation, so $A$ stands for the average used for calculating $MD$. That is, $A = \text{median } (\tilde{X})$ or $A = \text{mean } (\bar{X})$.

❖ For discrete data arranged in a frequency distribution (FD) and for continuous grouped data, the formula for MD becomes

$$MD = \frac{\sum f_i |X_i - A|}{n}$$

where $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
1. The mean deviation about the arithmetic mean is, therefore, given by

$$MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n}$$ for ungrouped data (individual series).

$$MD(\bar{X}) = \frac{\sum f_i |X_i - \bar{X}|}{n}$$ for discrete data arranged in FD and a grouped continuous frequency distribution; where $X_i$ is the value for discrete data arranged in FD and the class mark of the ith class for continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.

Steps to calculate M.D for (𝑋̅)


▪ Find the arithmetic mean, 𝑋̅
▪ Find the deviations of each reading from 𝑋̅
▪ Find the arithmetic mean of the deviations, ignoring sign.
2. The mean deviation about the median is also given by

$$MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n}$$ for ungrouped data (individual series).

$$MD(\tilde{X}) = \frac{\sum f_i |X_i - \tilde{X}|}{n}$$ for discrete data arranged in FD and a grouped continuous frequency distribution; where $X_i$ is the value for discrete data arranged in FD and the class mark of the ith class for continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.

Steps to calculate M.D (𝑋̃ )


▪ Find the median, 𝑋̃
▪ Find the deviations of each reading from 𝑋̃
▪ Find the arithmetic mean of the deviations, ignoring sign.

3. The mean deviation about the mode is also given by

$$MD(\hat{x}) = \frac{\sum |X_i - \hat{x}|}{n}$$ for ungrouped data (individual series).

$$MD(\hat{x}) = \frac{\sum f_i |X_i - \hat{x}|}{n}$$ for discrete data arranged in FD and a grouped continuous frequency distribution; where $X_i$ is the value for discrete data arranged in FD and the class mark of the ith class for continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.

Steps to calculate M.D (x̂)

▪ Find the mode, x̂
▪ Find the deviations of each reading from x̂
▪ Find the arithmetic mean of the deviations, ignoring sign.
Example 4.5
The following are the numbers of visits made by ten mothers to the local doctor's surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4. Find the mean deviation about the mean, median and mode.
Solution:
First calculate the three averages:
$\bar{X} = 6$, $\tilde{X} = 5.5$, $\hat{x} = 5$
Then take the absolute deviations of each observation from these averages.

x_i              4     4     5     5     5     6     7     7     8     9     Total
|x_i - mean|     2     2     1     1     1     0     1     1     2     3     14
|x_i - median|   1.5   1.5   0.5   0.5   0.5   0.5   1.5   1.5   2.5   3.5   14
|x_i - mode|     1     1     0     0     0     1     2     2     3     4     14

Since the data are ungrouped, the mean deviations about the mean, median and mode are:

$$MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} = \frac{14}{10} = 1.4$$

$$MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n} = \frac{14}{10} = 1.4$$

$$MD(\hat{x}) = \frac{\sum |X_i - \hat{x}|}{n} = \frac{14}{10} = 1.4$$
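
As a cross-check of Example 4.5, the sketch below computes the mean deviation about each of the three averages with Python's standard `statistics` module. The helper name `mean_deviation` is an illustrative assumption.

```python
from statistics import mean, median, mode


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations of values from center."""
    return sum(abs(x - center) for x in values) / len(values)


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

md_mean = mean_deviation(visits, mean(visits))      # about the mean (6)
md_median = mean_deviation(visits, median(visits))  # about the median (5.5)
md_mode = mean_deviation(visits, mode(visits))      # about the mode (5)

print(md_mean, md_median, md_mode)
```

That the three results coincide is a feature of this particular data set and not a general rule.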

Merits of 𝑴𝑫

➢ It is well-defined, easy to compute and simple to understand.


➢ It is based on all observations.
➢ It is not greatly affected by the extreme items.
➢ It can be calculated by using any average.

Demerit of 𝑴𝑫


➢ It does not take in to account the signs of the deviations of items from the average.

Remark: Of all the mean deviations taken about different averages or any arbitrary value, the mean
deviation about the median has the smallest value.

Coefficient of mean deviation (CMD):

The relative measure of mean deviation, also called the coefficient of mean deviation is obtained by
dividing mean deviation by the particular average used in computing mean deviation. Thus,

➢ CMD about the arithmetic mean is given by:

$$CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}}$$

where MD is the mean deviation calculated about the arithmetic mean.

➢ CMD about the median is given by:

$$CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}}$$

in which case MD is calculated about the median of the observations.

➢ CMD about the mode is given by:

$$CMD(\hat{x}) = \frac{MD(\hat{x})}{\hat{x}}$$

in which case MD is calculated about the mode of the observations.

Example 4.6: Calculate the coefficient of mean deviation about the mean, median and mode for the data
in Example 4.5 above.
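Solution:

$$CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}} = \frac{1.4}{6} = 0.23$$

$$CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}} = \frac{1.4}{5.5} = 0.25$$

$$CMD(\hat{x}) = \frac{MD(\hat{x})}{\hat{x}} = \frac{1.4}{5} = 0.28$$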

4.3.4 The Variance, Standard Deviation and Coefficient of Variation

Variance and Standard Deviation

Like the mean deviation, the variance is based on all observations in a set of data. But the variance is the average of the squared deviations from the mean. Recall that the sum of squared deviations is a minimum only when the deviations are taken from the mean. Squared deviations are also easier to manipulate mathematically than absolute deviations. Thus, if we average the squared deviations from the mean and take the square root of the result (to compensate for the fact that the deviations were squared), we obtain the standard deviation. This overcomes the limitation of the mean deviation.


Population Variance (𝝈𝟐 )


If we divide the sum of squared deviations from the mean by the number of values in the population, we get the population variance. This variance is the "average squared deviation from the mean".
• For ungrouped data (individual series)

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{1}{N}\left[\sum_{i=1}^{N} X_i^2 - N\mu^2\right]$$

where $\mu$ is the population arithmetic mean and $N$ is the total number of observations in the population.

• For discrete data arranged in FD and for continuous grouped data

$$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i X_i^2 - N\mu^2\right]$$

where $\mu$ is the population arithmetic mean, $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $N = \sum f_i$.
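
The second (computational) form of each variance formula follows from expanding the square. A short derivation for the ungrouped case, using only the definitions above and the fact that $\sum X_i = N\mu$:

```latex
\begin{aligned}
\sum_{i=1}^{N} (X_i - \mu)^2
  &= \sum_{i=1}^{N} X_i^2 - 2\mu \sum_{i=1}^{N} X_i + N\mu^2 \\
  &= \sum_{i=1}^{N} X_i^2 - 2\mu (N\mu) + N\mu^2 \\
  &= \sum_{i=1}^{N} X_i^2 - N\mu^2
\end{aligned}
```

Dividing both sides by $N$ gives the two equivalent expressions for $\sigma^2$. The grouped case works the same way with each term weighted by $f_i$.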


Sample Variance (𝑺𝟐 )
One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of a sample statistic is to estimate the corresponding population parameter, and a sample variance computed with divisor n tends to underestimate the population variance. To offset this, the sum of the squared deviations is divided by one less than the sample size.
• For ungrouped data

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]$$

where $\bar{x}$ is the sample arithmetic mean and $n$ is the total number of observations in the sample.

• For discrete data arranged in FD

If the values $x_i$ have frequencies $f_i$ ($i = 1, 2, \ldots, m$), then the sample variance is given by:

$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{m} f_i (x_i - \bar{x})^2 = \frac{1}{n - 1}\left[\sum f_i x_i^2 - n\bar{x}^2\right]$$

where $n = \sum f_i$.

• For continuous grouped data

$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum f_i x_i^2 - n\bar{x}^2\right]$$

where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
The Standard Deviation
There is a problem with variances. Recall that the deviations were squared, which means that the units were also squared. To get back to the units of the original data values, the square root must be taken.
➢ Population Standard Deviation ($\sigma$)

$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.

➢ Sample Standard Deviation ($S$)

$S = \sqrt{S^2}$, where $S^2$ is the sample variance.

Example 4.7: Find the sample variance and standard deviation of:

x_i   2   4   5   6   8
f_i   2   2   3   1   2

Solution: Prepare the following table:

x_i    f_i   f_i x_i   x_i^2   f_i x_i^2
2      2     4         4       8
4      2     8         16      32
5      3     15        25      75
6      1     6         36      36
8      2     16        64      128
Sum    10    49                279

Thus, $n = \sum f_i = 10$, $\sum f_i x_i = 49$ and $\sum f_i x_i^2 = 279$.

$$S^2 = \frac{1}{n - 1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[279 - 10\left(\frac{49}{10}\right)^2\right] = \frac{1}{9}(38.9) = 4.32$$

and $S = \sqrt{4.32} = 2.08$.
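
The computational formula used in this example is easy to check in code. The following is a minimal sketch; the function name `sample_variance_freq` is hypothetical, and for grouped data the `values` argument is assumed to hold the class marks.

```python
from math import sqrt


def sample_variance_freq(values, freqs):
    """Sample variance of a frequency distribution.

    Uses S^2 = (sum(f * x^2) - n * mean^2) / (n - 1), with n = sum(f).
    """
    n = sum(freqs)
    total = sum(f * x for x, f in zip(values, freqs))
    total_sq = sum(f * x * x for x, f in zip(values, freqs))
    mean = total / n
    return (total_sq - n * mean ** 2) / (n - 1)


# Data of Example 4.7 (discrete values with frequencies).
var_47 = sample_variance_freq([2, 4, 5, 6, 8], [2, 2, 3, 1, 2])
sd_47 = sqrt(var_47)
print(round(var_47, 2), round(sd_47, 2))
```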

Example 4.8: Find the sample variance and standard deviation for the distribution:

C.I     1-5   6-10   11-15   16-20
Freq.   4     1      2       3

Solution: In a continuous F.D., $x_i$ is the class mark representing the ith class.

C.I      x_i   f_i   f_i x_i   f_i x_i^2
1-5      3     4     12        36
6-10     8     1     8         64
11-15    13    2     26        338
16-20    18    3     54        972
Total          10    100       1410

Here $n = \sum f_i = 10$, $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{100}{10} = 10$ and $\sum f_i x_i^2 = 1410$, so that

$$S^2 = \frac{1}{n - 1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[1410 - 10(10)^2\right] = \frac{410}{9} = 45.56$$

$$S = \sqrt{45.56} = 6.75$$

Properties of Variance & Standard Deviation

1. If a constant is added to (or subtracted from) all the values, the variance remains the same; i.e., for any constant k, $V(x_i \pm k) = V(x_i)$.

Example 4.9: Consider the 6 sample values $x_i$: 54, 52, 53, 50, 51 and 52.

The sample variance is $V(x_i) = 2$. Now, subtract 50 from each value to get $y_i$: 4, 2, 3, 0, 1, 2. The variance of this new series is also 2, i.e., $V(x) = V(y) = 2$.

2. If each and every value is multiplied by a non-zero constant (k), the standard deviation is multiplied by $|k|$ and the variance is multiplied by $k^2$; i.e., $V(kx_i) = k^2 V(x_i)$.

3. Both the variance and the standard deviation give more weight to extreme values and less to
those which are near to the mean.
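
The first two properties can be verified numerically on the data of Example 4.9. The sketch below uses `variance` and `stdev` from Python's standard `statistics` module, both of which use the n − 1 divisor; the multiplier −3 is an arbitrary choice for illustration.

```python
from statistics import stdev, variance

x = [54, 52, 53, 50, 51, 52]

# Property 1: subtracting a constant leaves the variance unchanged.
shifted = [value - 50 for value in x]

# Property 2: multiplying by k scales the variance by k^2
# and the standard deviation by |k| (here k = -3).
k = -3
scaled = [k * value for value in x]

print(variance(x), variance(shifted))
print(variance(scaled), stdev(scaled), abs(k) * stdev(x))
```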

Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure is
known as the coefficient of variation (CV).

The standard deviation expresses variation in the same unit as the original data, but it cannot be the sole basis for comparing two distributions. For instance, if we have a standard deviation of 10 and a mean of 5, the values vary by an amount twice as large as the mean itself. If, on the other hand, we have a standard deviation of 10 and a mean of 5000, the variation relative to the mean is insignificant. Therefore, we cannot judge the dispersion of a set of data until we know the standard deviation, the mean, and how the standard deviation compares with the mean.
The coefficient of variation is used where we want to compare the variability of two or more different series. It is the ratio of the standard deviation to the arithmetic mean, usually expressed in percent.

$$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$$

For population data:

$$CV = \frac{\sigma}{\mu} \times 100\%$$

where $\sigma$ is the population standard deviation and $\mu$ is the population mean.

For sample data:

$$CV = \frac{S}{\bar{x}} \times 100\%$$

where $S$ is the sample standard deviation and $\bar{x}$ is the sample mean.


Remark: A distribution having less coefficient of variation is said to be less variable or more consistent
or more uniform or more homogeneous.
Example 4.10: Last semester, the students of Mathematics and Chemistry Departments took Introduction
to Statistics course. At the end of the semester, the following information was recorded.

Department Mathematics Chemistry

Mean score 85 65

Standard deviation 25 12

Compare the relative dispersions of the two departments’ scores using the appropriate way.
Solution:

Mathematics Department: $$CV = \frac{S}{\bar{x}} \times 100 = \frac{25}{85} \times 100 = 29.41\%$$

Chemistry Department: $$CV = \frac{S}{\bar{x}} \times 100 = \frac{12}{65} \times 100 = 18.46\%$$
Interpretation: Since the CV of Mathematics Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the distribution of
Mathematics students’ scores compared with that of Chemistry students.
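
The comparison in Example 4.10 can be sketched in a few lines. The function name `coefficient_of_variation` and the dictionary layout are illustrative assumptions.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in percent: (standard deviation / mean) * 100."""
    return std_dev / mean * 100


# (mean score, standard deviation) for each department, from Example 4.10.
departments = {"Mathematics": (85, 25), "Chemistry": (65, 12)}

cv = {name: coefficient_of_variation(sd, mean)
      for name, (mean, sd) in departments.items()}

# The department with the larger CV is relatively more variable.
most_variable = max(cv, key=cv.get)
print({name: round(value, 2) for name, value in cv.items()}, most_variable)
```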
4.4 Standard Scores (Z-Scores)

A standard score for a value in a data set is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set. The standard score (z-score) tells us how many standard deviations a specific value lies above (positive z-score) or below (negative z-score) the mean of the data set.

Z-score computed from the population

$$Z = \frac{X - \mu}{\sigma}$$

Z-score computed from the sample

$$Z = \frac{X - \bar{X}}{S}$$

Example 4.11: What is the Z-score for the value of 14 in the following sample data set?

3 8 6 14 4 12 7 10

Solution:

$\bar{X} = 8$ and $S = 3.8173$; thus $$Z = \frac{14 - 8}{3.8173} \approx 1.57$$

The data value of 14 is located 1.57 standard deviations above the mean of 8, because the z-score is positive.
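
Example 4.11 can be reproduced with the standard `statistics` module, whose `stdev` function uses the sample (n − 1) divisor. This is a minimal sketch; the helper name `z_score` is an assumption.

```python
from statistics import mean, stdev


def z_score(value, data):
    """Number of sample standard deviations that value lies from the mean of data."""
    return (value - mean(data)) / stdev(data)


sample = [3, 8, 6, 14, 4, 12, 7, 10]
z = z_score(14, sample)
print(round(stdev(sample), 4), round(z, 2))
```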

Example 4.12: Suppose that a student scored 66 in Statistics and 80 in Mathematics. The summary of the scores in the two courses is given below.

Course        Average score   Standard deviation of the scores
Statistics    51              12
Mathematics   72              16

In which course did the student score better compared with his classmates?
Solution:

Z-score of the student in Statistics: $$Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$$

Z-score of the student in Mathematics: $$Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$$

From these two standard scores, we can conclude that, relative to his classmates, the student scored better in Statistics than in Mathematics.

4.5 Moments, Skewness and Kurtosis

The measures of central tendency and variation discussed in the previous sections do not reveal the entire story about a frequency distribution. Two distributions may have the same mean and standard deviation but differ in shape. A further description of their characteristics is therefore necessary, and it is provided by measures of skewness and kurtosis.

4.5.1 Moments

Moments are statistical tools used in statistical investigation. The moments of a distribution are the arithmetic means of the various powers of the deviations of items from some number. In this course, we shall use them in the study of the skewness and kurtosis of a statistical distribution.

Moments about the origin

$$M_r = \frac{\sum X_i^r}{n}$$

where $r = 0, 1, 2, 3, \ldots$

Moments about the origin for a grouped or ungrouped frequency distribution are

$$M_r = \frac{\sum f_i X_i^r}{n}$$

where $f_i$ is the frequency of $X_i$; $X_i$ is the midpoint in the case of a grouped frequency distribution or the class value in the case of an ungrouped frequency distribution.

Note that: $M_1 = \bar{X}$, $M_0 = 1$


Moments about the Mean (Central Moments)

$$M_r' = \frac{\sum (X_i - \bar{X})^r}{n}$$

Moments about the mean for a grouped or ungrouped frequency distribution are

$$M_r' = \frac{\sum f_i (X_i - \bar{X})^r}{n}$$

where $f_i$ is the frequency of $X_i$; $X_i$ is the midpoint in the case of a grouped frequency distribution or the class value in the case of an ungrouped frequency distribution.

Note that: $M_2'$ is the variance computed with divisor $n$; it equals the sample variance $S^2$ only if the divisor $n$ is replaced by $n - 1$.

Moments about any arbitrary constant $A$

$$M_r' = \frac{\sum (X_i - A)^r}{n}$$

Moments about any arbitrary constant $A$ for a grouped or ungrouped frequency distribution are

$$M_r' = \frac{\sum f_i (X_i - A)^r}{n}$$

Example 4.13: Find the first four moments about the mean for the following individual series.

$X_i$: 3, 6, 8, 10, 18

Solution: $n = 5$ and $\bar{X} = \frac{45}{5} = 9$.

S.No    X_i   (X_i - mean)   (X_i - mean)^2   (X_i - mean)^3   (X_i - mean)^4
1       3     -6             36               -216             1296
2       6     -3             9                -27              81
3       8     -1             1                -1               1
4       10    1              1                1                1
5       18    9              81               729              6561
Total   45    0              128              486              7940

Thus,

$$M_1' = \frac{\sum (X_i - 9)}{5} = 0, \qquad M_2' = \frac{\sum (X_i - 9)^2}{5} = \frac{128}{5} = 25.6$$

$$M_3' = \frac{\sum (X_i - 9)^3}{5} = \frac{486}{5} = 97.2, \qquad M_4' = \frac{\sum (X_i - 9)^4}{5} = \frac{7940}{5} = 1588$$
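
The four central moments of Example 4.13 can be recomputed with the sketch below, which applies the definition of moments about the mean for an individual series. The function name `central_moment` is an illustrative assumption.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean)^r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


series = [3, 6, 8, 10, 18]
moments = [central_moment(series, r) for r in range(1, 5)]
print(moments)
```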

4.5.2 Skewness

Skewness refers to lack of symmetry (or departure from symmetry) in a distribution.

➢ A skewed frequency distribution is one that is not symmetrical.


➢ Skewness is concerned with the shape of the curve not size.
A distribution is said to be symmetrical when the values are distributed evenly on both sides of the mean (the distribution of the data below the mean mirrors that above the mean). In a symmetrical distribution, the mean, median and mode coincide (i.e., mean = median = mode).
Positively skewed distribution: if the value of mean is greater than the mode, skewness is said to be
positive. In a positively skewed distribution mean is greater than the mode and the median lies
somewhere in between mean and mode. A positively skewed distribution contains some values that are
much larger than the majority of other observations.
Negatively Skewed distribution: if the value of mode is greater than the mean, skewness is said to be
negative. In a negatively skewed distribution mode is greater than the mean and the median lies in
between mean and mode. The mean is pulled towards the low-valued item (that is, to the left). A
negatively skewed distribution contains some values that are much smaller than the majority of
observations.


Note that: In moderately skewed distributions the averages have the following approximate relationship.

(Mean − Mode) ≈ 3(Mean − Median)

How to check the presence of skewness in a distribution?

Skewness is present in the data if:

i) the graph is not symmetrical.


ii) the mean, median and mode do not coincide.
iii) the sum of positive and negative deviations from the median is not zero.
iv) the frequencies are not similarly distributed on either side of the mode.

Measures of skewness (𝜶𝟑 )

A measure of skewness gives a numerical expression for the extent and direction of asymmetry in a distribution.
It gives information about the shape of the distribution and the degree of variation on either side of the
central value. The three most commonly used measures of skewness are Pearson’s coefficient of
skewness, Bowley’s coefficient of skewness and coefficient of skewness based on moments.

1. Pearson’s coefficient of skewness (Pearsonian coefficient of skewness)

The skewness of the distribution can be measured by Pearson’s coefficient of skewness ($\alpha_3$), for which the formula is given below:

$$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

2. Bowley’s Coefficient of Skewness

Bowley’s coefficient of skewness is based on quartiles. The formula for calculating the coefficient of skewness is:

$$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$

3. Moment Coefficient of Skewness

The moment coefficient of skewness is based on moments. The formula for calculating the coefficient of skewness is:

$$\alpha_3 = \frac{M_3'}{(M_2')^{3/2}} = \frac{M_3'}{\sigma^3}$$

where $M_r' = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^r}{n}$.

The shape of the curve is determined by the value of $\alpha_3$:

• $\alpha_3 > 0$ ➔ the distribution is positively skewed (skewed to the right), i.e., mode < median < mean. Smaller observations are more frequent than larger observations, i.e., the majority of the observations have a value below the average.

• $\alpha_3 = 0$ ➔ the distribution is symmetric, i.e., mean = mode = median.

• $\alpha_3 < 0$ ➔ the distribution is negatively skewed (skewed to the left), i.e., mean < median < mode. Smaller observations are less frequent than larger observations, i.e., the majority of the observations have a value above the average.

4.5.3 Kurtosis

Kurtosis is a measure of the peakedness of a distribution. The degree of kurtosis of a distribution is measured relative to the peakedness of a normal curve. If a curve is more peaked than the normal curve it is called ‘leptokurtic’; if it is flatter than the normal curve it is called ‘platykurtic’ or flat-topped. The normal curve itself is known as ‘mesokurtic’.


Measures of Kurtosis (𝜶𝟒 )

The moment coefficient of kurtosis:

$$\alpha_4 = \frac{M_4'}{(M_2')^2} = \frac{M_4'}{\sigma^4}$$

The peakedness depends on the value of $\alpha_4$:

• $\alpha_4 > 3$ ➔ the curve is leptokurtic,
• $\alpha_4 = 3$ ➔ the curve is mesokurtic,
• $\alpha_4 < 3$ ➔ the curve is platykurtic.

Example: Based on the following data:

$M_0' = 1$, $M_1' = -0.6$, $M_2' = 1.6$, $M_3' = -2.4$, $M_4' = 5.8$
a/ Find the coefficient of skewness and discuss the distribution type.
b/ Find the coefficient of kurtosis and discuss the distribution type.
Solution:

a/ $$\alpha_3 = \frac{M_3'}{(M_2')^{3/2}} = \frac{-2.4}{1.6^{3/2}} = -1.19 < 0$$ ➔ the distribution is negatively skewed.

b/ $$\alpha_4 = \frac{M_4'}{(M_2')^2} = \frac{5.8}{1.6^2} \approx 2.27 < 3$$ ➔ the curve is platykurtic.

Example 4.14: Find the coefficient of skewness and the coefficient of kurtosis for Example 4.13 above.
Solution:

i) $$\alpha_3 = \frac{M_3'}{(M_2')^{3/2}} = \frac{97.2}{(25.6)^{3/2}} = \frac{97.2}{129.527} = 0.75$$

➔ the distribution is positively skewed.

ii) $$\alpha_4 = \frac{M_4'}{(M_2')^2} = \frac{1588}{25.6^2} = 2.42$$

➔ the curve is platykurtic.
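
The moment coefficients of skewness and kurtosis for the series of Example 4.13 can be checked with the sketch below. The function name `moment_coefficients` and the classification strings are illustrative assumptions; the formulas are the moment-based ones given in this section, with divisor n.

```python
def moment_coefficients(values):
    """Return (alpha_3, alpha_4) computed from moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5   # alpha_3 = M3' / (M2')^(3/2)
    kurtosis = m4 / m2 ** 2     # alpha_4 = M4' / (M2')^2
    return skewness, kurtosis


alpha3, alpha4 = moment_coefficients([3, 6, 8, 10, 18])

if alpha3 > 0:
    shape = "positively skewed"
elif alpha3 < 0:
    shape = "negatively skewed"
else:
    shape = "symmetric"

if alpha4 > 3:
    peak = "leptokurtic"
elif alpha4 < 3:
    peak = "platykurtic"
else:
    peak = "mesokurtic"

print(round(alpha3, 2), shape, round(alpha4, 2), peak)
```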
