Dispersion
Central tendency is one characteristic of a distribution. Measures of central tendency give an idea of
the central value or location of the distribution. But central tendency is not the only characteristic of a
distribution. Two distributions may differ despite having the same central value. For example, the
data set comprising the values 0, 10 and 20 has 10 as its mean and median. The mean and median
of the series 5, 10, 15 are also 10. But the deviations of these values from their mean are not the same. The deviation
of observations from their mean is called dispersion. A measure of dispersion or variation is a measure
of the extent to which individual values vary or deviate from the central value. Such a measure
indicates how well the central value represents the observations.
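The two series mentioned above can be compared numerically. The following is a minimal Python sketch, not part of the original text; the helper name `summarize` is hypothetical, and `statistics.pstdev` (which divides by n) is used as the measure of spread.

```python
import statistics

def summarize(values):
    """Return (mean, median, range, population SD) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        max(values) - min(values),
        statistics.pstdev(values),  # divides by n, not n - 1
    )

series_a = [0, 10, 20]
series_b = [5, 10, 15]

for name, series in (("A", series_a), ("B", series_b)):
    mean, median, value_range, sd = summarize(series)
    print(f"Series {name}: mean={mean}, median={median}, "
          f"range={value_range}, SD={sd:.3f}")
```

Both series share the same mean and median, but the first is spread more widely around that common centre.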
Range :
Range is the absolute difference between the highest and lowest observations of a distribution. When
the data form a frequency distribution arranged in order of magnitude, the range is the absolute difference
between the mid-values of the last class and the first class.
Symbolically: Range = | Xmax - Xmin | = | XM - XL |
Range is the simplest, and a crude, measure of dispersion, since it is based on the two extreme observations
only.
Advantages of Range :
• It is very easy to understand and easy to calculate.
• It gives us a quick idea about the variability of a set of data.
• It is based on the extreme observations only and no detailed information is required.
• It is the simplest of all measures of dispersion.
Disadvantages of Range :
• It is very much affected by the extreme values.
• It provides us with the idea of only two extreme values in a set of data.
• It cannot be computed for a data set having open-ended class intervals.
Uses of Range :
• Range is used in weather forecasting, e.g., for the percentage of humidity in the air.
• It is used in reporting daily market price of commodities.
• It is used in statistical quality control.
Quartile Deviation :
The quartile deviation is half of the difference between the upper quartile (Q3) and the lower quartile
(Q1):
$$QD = \frac{Q_3 - Q_1}{2}$$
It is also known as the semi-interquartile range.
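As an illustration of the definition, here is a small Python sketch for ungrouped data. It is not from the original text: the function name is hypothetical, the observations are made up, and `method="inclusive"` is only one of several quartile conventions, so other conventions may give slightly different values.

```python
import statistics

def quartile_deviation(values):
    """Semi-interquartile range (Q3 - Q1) / 2 of ungrouped data."""
    # Assumption: quartiles follow the "inclusive" convention of the
    # standard library; textbooks may use a different rule.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # made-up observations
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # same data, one extreme value

print(quartile_deviation(data))
print(quartile_deviation(with_outlier))
```

Replacing the largest observation by an extreme value leaves the result unchanged here, which is the sense in which the quartile deviation ignores extreme values.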
Advantages of Quartile Deviation :
• It is an easily understood, location-based measure.
• It is superior to other measures in the sense that extreme values cannot affect the quartile
deviation.
• For distributions with open-ended class intervals, the quartile deviation can still be computed
when no other measure of dispersion can.
Disadvantages of Quartile Deviation :
• It is not a good measure of dispersion because it does not measure the deviation from any central
value of the distribution.
• It is not based upon all the observations.
• It is more affected by sampling fluctuations.
• It is not suitable for further algebraic treatment.
Uses of Quartile Deviation :
• Quartile deviation is a location-based measure and can be profitably used where a rough estimate
of the variation is desired.
• It is a suitable measure of dispersion when the frequency distribution has open-ended class
interval.
Measures of Dispersion 3
Mean Deviation :
The arithmetic mean of the absolute deviations of the given observations from a central value is
called the mean deviation; it can be measured from the mean, the median or the mode.
The mean deviation of a distribution having observations x1, x2, ......., xn may be defined as follows :
• Mean deviation from the mean, or simply mean deviation :
$$MD(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
In the case of a frequency distribution
$$MD(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{x}| \;;\quad \text{where } N = \sum_{i=1}^{n} f_i$$
Example 4.1 :
Computing the mean deviation of the daily wages of a group of farm labourers (given in Example 3.1): The
mean, median and mode are, respectively, $\bar{x} = 66.40$, $M_e = 66.43$, $M_o = 66.67$.
Mean deviation from mean: $MD(\bar{x}) = \frac{1}{N}\sum f_i\,|x_i - \bar{x}| = \frac{512.0}{100} = 5.12$
Mean deviation from median: $MD(M_e) = \frac{1}{N}\sum f_i\,|x_i - M_e| = \frac{511.4}{100} = 5.114$
Mean deviation from mode: $MD(M_o) = \frac{1}{N}\sum f_i\,|x_i - M_o| = \frac{506.6}{100} = 5.066$
$$\sum |x_i - M_e| = [(M_e - x_1) + (M_e - x_2) + \dots + (M_e - x_k)] + [(M_e - x_{k+1}) + (M_e - x_{k+2}) + \dots + (M_e - x_n)] + [(x_{n+1} - M_e) + (x_{n+2} - M_e) + \dots + (x_{2n} - M_e)] \quad \dots (1)$$
Again, the sum of the absolute deviations of the observations from any other value xk is given by
$$\sum |x_i - x_k| = [(x_k - x_1) + (x_k - x_2) + \dots + (x_k - x_k) + (x_{k+1} - x_k) + (x_{k+2} - x_k) + \dots + (x_n - x_k)] + (x_{n+1} - x_k) + (x_{n+2} - x_k) + \dots + (x_{2n} - x_k) \quad \dots (2)$$
[Note : This theorem is always true for ungrouped data but may not always be true for grouped frequency
distribution; Ref: example 4.1 above].
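The note above appears to concern the result that the mean deviation is smallest when measured from the median. A rough numerical check for ungrouped data is sketched below in Python; the function name and the observations are illustrative assumptions, not taken from the text.

```python
import statistics

def mean_deviation(values, center):
    """Mean of the absolute deviations of `values` from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

data = [2, 4, 7, 9, 15, 20]  # made-up ungrouped observations

md_from_median = mean_deviation(data, statistics.median(data))
md_from_mean = mean_deviation(data, statistics.mean(data))

print(f"MD from median = {md_from_median:.4f}")
print(f"MD from mean   = {md_from_mean:.4f}")
```

For this sample the deviation from the median is the smaller of the two. A single example is only a sanity check, not a proof.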
Standard Deviation :
The arithmetic mean of the squares of the deviations of the observations of a series from their mean is
known as the variance. The positive square root of the variance is called the standard deviation. The variance is denoted
by $\sigma^2$ and the standard deviation by $\sigma$. The standard deviation, therefore, may be defined as the root
mean square deviation from the mean.
For a set of observations x1, x2, ........ xn the standard deviation is computed as
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
For frequency distributions,
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2} \;;\quad \text{where } N = \sum_{i=1}^{n} f_i$$
The root-mean-square deviation from an arbitrary value a is denoted by s and is computed as
$$s = \sqrt{\frac{1}{N}\sum f_i (x_i - a)^2}$$
Standard Error : The standard deviation of the sampling distribution of a statistic (say, the mean) is
known as the standard error. It is denoted by SE.
Let x1, x2, ......... , xn be the observations of a sample of size n; the standard error of the mean is given by
$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ = population standard deviation and $\bar{x}$ = sample mean (statistic).
Advantages of Standard Deviation :
• It is rigidly defined.
• It is based upon all the observations.
• It is less affected by sampling fluctuation.
• It is suitable for further algebraic treatment.
• The standard deviation of the combined series can be obtained if the number of observations, mean
and standard deviation in each series are known.
Disadvantages of Standard Deviation :
• It is not readily comprehensible; its computation requires a good deal of time and some knowledge of
mathematics.
• It is affected by the extreme values.
• It cannot be computed in case of distributions having open-ended class interval.
Uses of Standard Deviation :
Standard deviation is the most useful measure of dispersion, and its use is highly
desirable in advanced statistical work. Sampling, correlation analysis, the normal curve of errors, and the
comparison of the variability and uniformity of two sets of data, all of which are of great use in
statistical work, are analysed in terms of the standard deviation.
Thus standard deviation is the most important measure of dispersion.
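$$\sigma_u = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i\left(\frac{x_i - a}{h} - \frac{\bar{x} - a}{h}\right)^2}$$
$$= \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i\left(\frac{x_i - a - \bar{x} + a}{h}\right)^2}$$
$$= \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i\left(\frac{x_i - \bar{x}}{h}\right)^2} = \frac{1}{h}\sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$$
$$= \frac{1}{h}\,\sigma_x$$
$$\therefore \sigma_x = h\,\sigma_u$$
This implies that the standard deviation is independent of change of origin but not of scale.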
Proof:
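Let x1, x2, ........ , xn be the values of n observations with corresponding frequencies f1, f2, ...... ,
fn. Also let $\bar{x}$ be the arithmetic mean of the observations.
We have, $\sigma_x = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2}$ and $s = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - a)^2}$
or, $$N s^2 = \sum_{i=1}^{n} f_i (x_i - a)^2 = \sum_{i=1}^{n} f_i \left[(x_i - \bar{x}) + (\bar{x} - a)\right]^2$$
$$= \sum f_i (x_i - \bar{x})^2 + 2\sum f_i (x_i - \bar{x})(\bar{x} - a) + \sum f_i (\bar{x} - a)^2$$
$$= N\sigma_x^2 + 2(\bar{x} - a)\sum f_i (x_i - \bar{x}) + \sum f_i (\bar{x} - a)^2$$
$$= N\sigma_x^2 + 2(\bar{x} - a)\times 0 + \text{a non-negative quantity} \quad [\text{since } \textstyle\sum f_i (x_i - \bar{x}) = 0]$$
$$\therefore N s^2 = N\sigma_x^2 + \text{a non-negative quantity}$$
$$\therefore N s^2 \ge N\sigma_x^2$$
$$\therefore s^2 \ge \sigma_x^2$$
i.e., $s \ge \sigma_x$. Proved.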
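We have, $$\sigma^2 = \frac{1}{2}\sum_{i=1}^{2} (x_i - \bar{x})^2 = \frac{1}{2}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2\right]$$
$$= \frac{1}{2}\left[\left(x_1 - \frac{x_1 + x_2}{2}\right)^2 + \left(x_2 - \frac{x_1 + x_2}{2}\right)^2\right]$$
$$= \frac{1}{2}\left[\left(\frac{x_1 - x_2}{2}\right)^2 + \left(\frac{x_2 - x_1}{2}\right)^2\right]$$
$$\therefore \sigma^2 = \left(\frac{x_1 - x_2}{2}\right)^2 = \left(\frac{|x_1 - x_2|}{2}\right)^2$$
$$\therefore \sigma = \frac{|x_1 - x_2|}{2} = \frac{1}{2}\,|x_1 - x_2| = \text{Half of range.}$$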
• Working Formula of Standard Deviation:
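Here,
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + \sum \bar{x}^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$$
$$= \sum x_i^2 - 2\,\frac{\sum x_i}{n}\left(\sum x_i\right) + n\left(\frac{\sum x_i}{n}\right)^2$$
$$= \sum x_i^2 - \frac{2\left(\sum x_i\right)^2}{n} + \frac{\left(\sum x_i\right)^2}{n}$$
$$= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$
$$\therefore \sigma = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2} = \sqrt{\frac{1}{n}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right]} = \sqrt{\frac{1}{n}\sum x_i^2 - \left(\frac{\sum x_i}{n}\right)^2}$$
In case of grouped data
$$\sigma = \sqrt{\frac{1}{N}\sum f_i x_i^2 - \left(\frac{\sum f_i x_i}{N}\right)^2} \;;\quad \text{where } N = \sum_{i=1}^{n} f_i$$
$$= \sqrt{\frac{1}{N}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{N}\right]}$$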
Proof :
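Let $\bar{x}$ = mean and R = range of the n observations x1, x2, ........ , xn.
Since the range is the difference between the highest and lowest observations of the distribution, it
will be greater than the absolute deviation of any observation from the mean,
i.e., $R > |x_i - \bar{x}|$
We have, $$\sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$
$$= \frac{1}{n}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\right]$$
$$< \frac{1}{n}\left[R^2 + R^2 + \dots + R^2\right] = \frac{nR^2}{n} = R^2$$
$$\therefore \sigma^2 < R^2$$
$$\therefore \sigma < R \quad \text{Proved.}$$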
Proof :
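By definition, $\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2$ and $MD(\bar{x}) = \frac{1}{N}\sum f_i\,|x_i - \bar{x}|$
Let $|x_i - \bar{x}| = z_i$. Since the variance of z cannot be negative, $\frac{1}{N}\sum f_i (z_i - \bar{z})^2 \ge 0$, so that
$$\frac{1}{N}\sum f_i z_i^2 + \bar{z}^2 - 2\bar{z}^2 \ge 0$$
$$\frac{1}{N}\sum f_i z_i^2 - \bar{z}^2 \ge 0$$
$$\frac{1}{N}\sum f_i z_i^2 - \left(\frac{1}{N}\sum f_i z_i\right)^2 \ge 0$$
$$\frac{1}{N}\sum f_i z_i^2 \ge \left(\frac{1}{N}\sum f_i z_i\right)^2$$
$$\frac{1}{N}\sum f_i\,|x_i - \bar{x}|^2 \ge \left(\frac{1}{N}\sum f_i\,|x_i - \bar{x}|\right)^2$$
$$\sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2} \ge \frac{1}{N}\sum f_i\,|x_i - \bar{x}| \quad [\text{since } |x_i - \bar{x}|^2 = (x_i - \bar{x})^2]$$
$$\therefore \sigma \ge MD(\bar{x}) \quad \text{Proved.}$$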
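$$\sigma^2 = \frac{1}{n}\left(1^2 + 2^2 + 3^2 + \dots + n^2\right) - \left[\frac{1}{n}\left(1 + 2 + 3 + \dots + n\right)\right]^2$$
$$= \frac{1}{n}\cdot\frac{n(n+1)(2n+1)}{6} - \left[\frac{1}{n}\cdot\frac{n(n+1)}{2}\right]^2$$
$$= \frac{1}{n}\left[\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4}\right]$$
$$= \frac{2n(n+1)(2n+1) - 3n(n+1)^2}{12n}$$
$$= (n+1)\,\frac{2n(2n+1) - 3n(n+1)}{12n}$$
$$= (n+1)\,\frac{n^2 - n}{12n} = (n+1)\,\frac{n(n-1)}{12n}$$
$$= (n+1)\,\frac{(n-1)}{12} = \frac{n^2 - 1}{12}$$
$$\therefore \sigma = \sqrt{\frac{n^2 - 1}{12}}$$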
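The closed form just derived, for the standard deviation of the first n natural numbers, can be checked against a direct computation. The Python sketch below is an illustration only; the function name is hypothetical.

```python
import math
import statistics

def sd_first_n_naturals(n):
    """Closed form sqrt((n^2 - 1) / 12) for the SD of 1, 2, ..., n."""
    return math.sqrt((n * n - 1) / 12)

for n in (2, 5, 10, 100):
    direct = statistics.pstdev(list(range(1, n + 1)))  # divides by n
    print(n, round(direct, 6), round(sd_first_n_naturals(n), 6))
```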
• Standard Deviation of Combined Series.
Let $x_{1i}$ (i = 1, 2, ......., n1) and $x_{2j}$ (j = 1, 2, ........, n2) be two series with means $\bar{x}_1$ and $\bar{x}_2$ and variances
$\sigma_1^2$ and $\sigma_2^2$ respectively. The mean of the combined series is given by
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{N} \;;\quad \text{where } N = n_1 + n_2$$
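By definition,
$$\sigma_1^2 = \frac{1}{n_1}\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 \quad\text{and}\quad \sigma_2^2 = \frac{1}{n_2}\sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2$$
$$\therefore n_1\sigma_1^2 = \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 \quad\text{and}\quad n_2\sigma_2^2 = \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2$$
The variance of the combined series is
$$\sigma^2 = \frac{1}{N}\sum_{k=1}^{N} (x_k - \bar{x})^2$$
or, $$N\sigma^2 = \sum_{k=1}^{N} (x_k - \bar{x})^2 = \sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x})^2$$
Now, $$\sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 = \sum \left[(x_{1i} - \bar{x}_1) + (\bar{x}_1 - \bar{x})\right]^2$$
$$= \sum \left[(x_{1i} - \bar{x}_1)^2 + (\bar{x}_1 - \bar{x})^2 + 2(x_{1i} - \bar{x}_1)(\bar{x}_1 - \bar{x})\right]$$
$$= \sum (x_{1i} - \bar{x}_1)^2 + \sum (\bar{x}_1 - \bar{x})^2 + 2(\bar{x}_1 - \bar{x})\sum (x_{1i} - \bar{x}_1)$$
$$= n_1\sigma_1^2 + \sum d_1^2 + 2d_1\sum (x_{1i} - \bar{x}_1) \;;\quad \text{putting } d_1 = \bar{x}_1 - \bar{x}$$
$$= n_1\sigma_1^2 + n_1 d_1^2 + 0 \;;\quad [\text{since } \textstyle\sum (x_{1i} - \bar{x}_1) = 0]$$
$$= n_1(\sigma_1^2 + d_1^2)$$
Similarly, we get $\sum_{j=1}^{n_2} (x_{2j} - \bar{x})^2 = n_2(\sigma_2^2 + d_2^2)$, where $d_2 = \bar{x}_2 - \bar{x}$
$$\therefore N\sigma^2 = n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)$$
$$\therefore \sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{N} \quad \dots (i)$$
$$= \frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{n_1 + n_2}$$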
Alternative Way :
$$d_1 = \bar{x}_1 - \bar{x} = \bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{n_2(\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}$$
Similarly, $$d_2 = \frac{n_1(\bar{x}_2 - \bar{x}_1)}{n_1 + n_2}$$
Putting the values of d1 and d2 in (i) we get, after simplification,
$$\sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2}{(n_1 + n_2)^2}\,(\bar{x}_1 - \bar{x}_2)^2 \quad \dots (ii)$$
i.e., $$\sigma^2 = \frac{\sum n_i\sigma_i^2 + \sum n_i d_i^2}{\sum n_i}$$
where $n_i$ is the size of the ith set, $\bar{x}_i$ and $\sigma_i^2$ are the mean and variance respectively of the ith set, $d_i =
\bar{x}_i - \bar{x}$, and $\sigma^2$ is the variance of the combined set.
Example 4.2 :
The frequency distribution of the weights of tomatoes (Example 2.2) is reproduced below :
Weights         : 50-60  60-70  70-80  80-90  90-100  100-110  110-120
No. of tomatoes :   5      9     13     20     19        9        5
Calculate the standard deviation by the direct method and the indirect method.
Solution :
Direct Method :
Class       Frequency   Mid value of   fi xi     fi xi²
interval    fi          class xi
50-60         5           55             275      15125
60-70         9           65             585      38025
70-80        13           75             975      73125
80-90        20           85            1700     144500
90-100       19           95            1805     171475
100-110       9          105             945      99225
110-120       5          115             575      66125
Total       N=80                        6860     607600
Standard deviation $$\sigma = \sqrt{\frac{1}{N}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{N}\right]}$$
$$= \sqrt{\frac{1}{80}\left[607600 - \frac{(6860)^2}{80}\right]} = \sqrt{\frac{19355}{80}}$$
= 15.554
Indirect Method :
[We change the origin to x = 85 and scale by dividing by 10]
Class       Mid value of   Frequency   ui = (xi − 85)/10   fi ui   fi ui²
interval    class xi       fi
50-60         55             5            -3               -15      45
60-70         65             9            -2               -18      36
70-80         75            13            -1               -13      13
80-90         85            20             0                 0       0
90-100        95            19             1                19      19
100-110      105             9             2                18      36
110-120      115             5             3                15      45
Total                      N=80                              6     194
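$$\sigma_u = \sqrt{\frac{1}{N}\left[\sum f_i u_i^2 - \frac{\left(\sum f_i u_i\right)^2}{N}\right]}$$
$$= \sqrt{\frac{1}{80}\left[194 - \frac{(6)^2}{80}\right]} = \sqrt{\frac{1}{80}(193.55)} = 1.5554$$
Since $\sigma_x = h\,\sigma_u$ with h = 10, this gives $\sigma_x = 10 \times 1.5554 = 15.554$, in agreement with the direct method.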
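Both methods of Example 4.2 can be reproduced in a few lines. The following Python sketch is an illustration, not part of the original solution; the helper name `grouped_sd` is hypothetical, and the data are the class mid-values and frequencies from the tables above.

```python
import math

# Grouped data from Example 4.2: class mid-values and frequencies.
mid_values = [55, 65, 75, 85, 95, 105, 115]
frequencies = [5, 9, 13, 20, 19, 9, 5]

def grouped_sd(values, freqs):
    """Population SD of grouped data via the working formula."""
    n_total = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, values))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, values))
    return math.sqrt((sum_fx2 - sum_fx ** 2 / n_total) / n_total)

# Direct method.
sd_direct = grouped_sd(mid_values, frequencies)

# Indirect method: u = (x - a) / h with a = 85, h = 10; then sigma_x = h * sigma_u.
a, h = 85, 10
coded = [(x - a) / h for x in mid_values]
sd_indirect = h * grouped_sd(coded, frequencies)

print(f"direct   = {sd_direct:.3f}")
print(f"indirect = {sd_indirect:.3f}")
```

The two results agree because the standard deviation is unaffected by the change of origin and is simply multiplied by h when the scale is restored.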
Example 4.3 :
A student, while calculating the mean and standard deviation of 20 observations, obtained a mean of 68 and
a standard deviation of 8. At the time of checking it was found that he had copied 96 instead of 69. What would
be the actual values of the mean and standard deviation ?
Again we know, $\sigma^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$
$$\therefore \sum x_i^2 = n(\sigma^2 + \bar{x}^2) = 20\,(8^2 + 68^2) = 93760$$
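The remaining steps of Example 4.3 are not reproduced in the text above. One way to finish the calculation, sketched here in Python as an assumption about the intended method, is to correct the two totals (the sum of the observations and the sum of their squares) and recompute. All variable names are illustrative.

```python
import math

n = 20
wrong_mean, wrong_sd = 68, 8
wrong_value, right_value = 96, 69

# Totals implied by the (incorrect) summary figures.
wrong_sum = n * wrong_mean                             # sum of x_i
wrong_sum_sq = n * (wrong_sd ** 2 + wrong_mean ** 2)   # sum of x_i^2 (93760)

# Replace the miscopied value by the right one in both totals.
sum_x = wrong_sum - wrong_value + right_value
sum_x2 = wrong_sum_sq - wrong_value ** 2 + right_value ** 2

mean = sum_x / n
sd = math.sqrt(sum_x2 / n - mean ** 2)
print(f"corrected mean = {mean:.2f}, corrected SD = {sd:.3f}")
```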
Example 4.4:
Two sets of data having 200 and 250 observations have means 25 and 15 respectively and standard
deviations 3 and 4 respectively. If the two sets are combined, what will be the mean and standard
deviation of the combined set?
Here, n1 = 200, $\bar{x}_1$ = 25, $\sigma_1$ = 3
n2 = 250, $\bar{x}_2$ = 15, $\sigma_2$ = 4
Let the mean and standard deviation of the combined set be $\bar{x}$ and $\sigma$ respectively.
We know, the combined mean for two sets of observations is
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{200 \times 25 + 250 \times 15}{200 + 250} = \frac{8750}{450} = 19.44$$
Again, the combined standard deviation for two sets of observations is
$$\sigma = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2}{(n_1 + n_2)^2}\,(\bar{x}_1 - \bar{x}_2)^2}$$
$$= \sqrt{\frac{5800}{450} + \frac{5000000}{202500}} = \sqrt{12.89 + 24.69} = 6.13$$
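The combined mean and standard deviation of Example 4.4 can also be obtained from the form that uses the deviations d1 and d2 of each set's mean from the combined mean. The Python sketch below is illustrative; the function name `combine` is hypothetical.

```python
import math

def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and population SD of two series pooled together."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean  # deviations from the combined mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return mean, math.sqrt(variance)

mean, sd = combine(200, 25, 3, 250, 15, 4)
print(f"combined mean = {mean:.2f}, combined SD = {sd:.2f}")
```

Using the d1, d2 form rather than the simplified two-set formula makes the function easy to extend to more than two sets.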
Thus, $$CR = \frac{x_m - x_l}{x_m + x_l} \times 100\%$$
where xm = the highest value of the data
xl = the lowest value of the data
• Coefficient of Quartile Deviation : When the difference between Q3 and Q1 is divided by their sum and
expressed as a percentage, we get the coefficient of quartile deviation (C.Q.D.).
Thus, $$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$$
where Q3 and Q1 are the upper and lower quartiles respectively.
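The two relative measures above translate directly into code. This is a minimal Python sketch with hypothetical function names; the quartile values passed to the second function are made up for illustration.

```python
def coefficient_of_range(values):
    """(highest - lowest) / (highest + lowest), as a percentage."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest) * 100

def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1), as a percentage."""
    return (q3 - q1) / (q3 + q1) * 100

print(coefficient_of_range([5, 10, 15]))
print(coefficient_of_range([0, 10, 20]))
print(coefficient_of_quartile_deviation(q1=30, q3=50))
```

Because both coefficients are ratios, they carry no unit and can be compared across data sets measured in different units.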
[Note : For comparing the variability of two series, we calculate the C.V. for each series. The series having
the greater C.V. is said to be more variable (unstable) than the other, and the series having the smaller C.V. is said
to be more consistent (stable/homogeneous) than the other. Thus C.V. is of great practical significance
and is the best measure for comparing the variability of two or more series.]
4.3. Moments :
Moments are constants which are used to determine some characteristics (e.g., nature, shape etc.) of
frequency distributions. Moments about the mean are called central moments and those about an arbitrary
value (other than the mean) are known as raw moments.
If x1, x2, ......., xn occur with frequencies f1, f2, ......, fn respectively, then the rth central moment is given
by
$$\mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r$$
The raw moments about the origin are
$$\mu'_2 = \frac{1}{N}\sum f_i x_i^2, \quad \mu'_3 = \frac{1}{N}\sum f_i x_i^3, \quad \mu'_4 = \frac{1}{N}\sum f_i x_i^4 \quad \text{etc.}$$
• Relation Between Central Moments and Raw Moments :
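(rth central moment in terms of raw moments)
$$\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$$
$$= \frac{1}{N}\sum f_i \left[x_i^r - \binom{r}{1} x_i^{r-1}\bar{x} + \binom{r}{2} x_i^{r-2}\bar{x}^2 - \binom{r}{3} x_i^{r-3}\bar{x}^3 + \dots + (-1)^{r-1}\binom{r}{r-1} x_i\,\bar{x}^{r-1} + (-1)^r \bar{x}^r\right]$$
$$= \frac{1}{N}\sum f_i x_i^r - \binom{r}{1}\frac{1}{N}\sum f_i x_i^{r-1}\,\bar{x} + \binom{r}{2}\frac{1}{N}\sum f_i x_i^{r-2}\,\bar{x}^2 - \binom{r}{3}\frac{1}{N}\sum f_i x_i^{r-3}\,\bar{x}^3 + \dots$$
Since $\bar{x} = \mu'_1$, this gives
$$\mu_r = \mu'_r - \binom{r}{1}\mu'_{r-1}\,\mu'_1 + \binom{r}{2}\mu'_{r-2}\,(\mu'_1)^2 - \binom{r}{3}\mu'_{r-3}\,(\mu'_1)^3 + \dots$$
• When, r = 2
$$\mu_2 = \mu'_2 - \binom{2}{1}\mu'_{2-1}\,\mu'_1 + \binom{2}{2}\mu'_{2-2}\,(\mu'_1)^2$$
$$= \mu'_2 - 2(\mu'_1)^2 + (\mu'_1)^2 \quad \left[\text{Since } \mu'_0 = \frac{1}{N}\sum f_i x_i^0 = 1\right]$$
$$= \mu'_2 - (\mu'_1)^2$$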
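• When, r = 3
$$\mu_3 = \mu'_3 - \binom{3}{1}\mu'_{3-1}\,\mu'_1 + \binom{3}{2}\mu'_{3-2}\,(\mu'_1)^2 - \binom{3}{3}\mu'_{3-3}\,(\mu'_1)^3$$
$$= \mu'_3 - 3\mu'_2\mu'_1 + 3\mu'_1(\mu'_1)^2 - \mu'_0(\mu'_1)^3$$
$$= \mu'_3 - 3\mu'_2\mu'_1 + 3(\mu'_1)^3 - (\mu'_1)^3$$
$$= \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$$
• When, r = 4
$$\mu_4 = \mu'_4 - \binom{4}{1}\mu'_{4-1}\,\mu'_1 + \binom{4}{2}\mu'_{4-2}\,(\mu'_1)^2 - \binom{4}{3}\mu'_{4-3}\,(\mu'_1)^3 + \binom{4}{4}\mu'_{4-4}\,(\mu'_1)^4$$
$$= \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 4\mu'_1(\mu'_1)^3 + \mu'_0(\mu'_1)^4$$
$$= \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$$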
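The binomial expansion behind these relations can be written as one general routine. The Python sketch below is an illustration, with a hypothetical function name; it takes the raw moments about the origin (orders 0 upward) and returns the central moments. The input values are the raw moments of the coded variable u from Example 4.5 (sums 10, 120, 10 and 360 divided by N = 100).

```python
from math import comb

def central_from_raw(raw):
    """Convert raw moments about the origin into central moments.

    raw[k] is the k-th raw moment, with raw[0] == 1.
    The r-th entry of the result is the r-th central moment.
    """
    mean = raw[1]
    return [
        sum((-1) ** k * comb(r, k) * raw[r - k] * mean ** k for k in range(r + 1))
        for r in range(len(raw))
    ]

# Raw moments of u in Example 4.5, orders 0 to 4.
raw_u = [1, 0.1, 1.2, 0.1, 3.6]
central_u = central_from_raw(raw_u)
print([round(m, 4) for m in central_u])
```

For r = 2, 3 and 4 the sum inside the function reduces to the three special cases written out above.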
$$\mu_{r(u)} = \frac{1}{h^r}\,\mu_{r(x)}$$
$$\therefore \mu_{r(x)} = h^r\,\mu_{r(u)}$$
Hence, moments are independent of origin but dependent on
scale. Proved.
• Sheppard's Correction for Moments :
In calculating the moments of a grouped frequency distribution, we assume that all the values within
a class interval are concentrated at the mid-value of the class interval. If the distribution is symmetrical or
moderately asymmetrical and the class intervals are small (not greater than 1/20th of the range), this
assumption is approximately true. In general, however, the assumption does not hold exactly, and some error, called
grouping error, creeps into the calculation of the moments.
W.F. Sheppard proposed that if
(i) the frequency distribution is continuous and
(ii) the frequency tapers off to zero at both ends of the distribution,
the effect of grouping at the
mid-points of the intervals can be corrected by the following formulae, known as Sheppard's
Corrections :
$$\mu_2 \text{ (corrected)} = \mu_2 - \frac{h^2}{12}$$
$$\mu_3 \text{ (corrected)} = \mu_3$$
$$\mu_4 \text{ (corrected)} = \mu_4 - \frac{h^2}{2}\,\mu_2 + \frac{7}{240}\,h^4$$
where h is the length of the class interval.
Example 4.5 : The wages per hour of 100 farm labourers are given below :
Wages (Taka)     : 0-5   5-10   10-15   15-20   20-25
No. of labourers : 10    15     40      25      10
Compute the first four central moments (use Sheppard's correction for the 2nd and 4th central moments) :
Solution :
Wages    No. of        Mid value   ui = (xi − 12.5)/5   fi ui   fi ui²   fi ui³   fi ui⁴
(Tk.)    labourers fi  xi
0-5        10            2.5          -2                 -20      40      -80      160
5-10       15            7.5          -1                 -15      15      -15       15
10-15      40           12.5           0                   0       0        0        0
15-20      25           17.5           1                  25      25       25       25
20-25      10           22.5           2                  20      40       80      160
Total     100                                             10     120       10      360
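$$\mu'_{1(u)} = \frac{1}{N}\sum f_i u_i = \frac{1}{100} \times 10 = 0.1$$
$$\mu'_{2(u)} = \frac{1}{N}\sum f_i u_i^2 = \frac{1}{100} \times 120 = 1.2$$
$$\mu'_{3(u)} = \frac{1}{N}\sum f_i u_i^3 = \frac{1}{100} \times 10 = 0.1$$
$$\mu'_{4(u)} = \frac{1}{N}\sum f_i u_i^4 = \frac{1}{100} \times 360 = 3.6$$
Now, $$\mu_{2(u)} = \mu'_{2(u)} - \left(\mu'_{1(u)}\right)^2 = 1.2 - (0.1)^2 = 1.19$$
$$\mu_{3(u)} = \mu'_{3(u)} - 3\mu'_{2(u)}\,\mu'_{1(u)} + 2\left(\mu'_{1(u)}\right)^3$$
= 0.1 - 3(1.2)(0.1) + 2(0.1)³ = -0.258
$$\mu_{4(u)} = \mu'_{4(u)} - 4\mu'_{3(u)}\,\mu'_{1(u)} + 6\mu'_{2(u)}\left(\mu'_{1(u)}\right)^2 - 3\left(\mu'_{1(u)}\right)^4$$
= 3.6 - 4(0.1)(0.1) + 6(1.2)(0.1)² - 3(0.1)⁴
= 3.6 - 0.04 + 0.072 - 0.0003 = 3.6317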
First Four Central Moments of the Original Variable:
$\mu_{1(x)} = 0$
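The remaining moments of the original variable are not shown in the text above. Assuming the scale relation (the rth central moment of x equals h to the power r times that of u) and Sheppard's corrections as stated earlier, with class width h = 5, they could be computed as in the following Python sketch. Variable names are illustrative.

```python
h = 5  # class width in Example 4.5

# Central moments of the coded variable u, as computed in the example.
mu_u = {2: 1.19, 3: -0.258, 4: 3.6317}

# Scale back to the original variable: mu_r(x) = h**r * mu_r(u).
mu_x = {r: h ** r * value for r, value in mu_u.items()}

# Sheppard's corrections (the third moment needs none).
mu2_corrected = mu_x[2] - h ** 2 / 12
mu4_corrected = mu_x[4] - (h ** 2 / 2) * mu_x[2] + (7 / 240) * h ** 4

print(f"mu2(x) = {mu_x[2]:.4f}, corrected = {mu2_corrected:.4f}")
print(f"mu3(x) = {mu_x[3]:.4f}")
print(f"mu4(x) = {mu_x[4]:.4f}, corrected = {mu4_corrected:.4f}")
```

The correction to the fourth moment uses the uncorrected second moment, as in the formula.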
Proof*:
Let us consider the following symmetrical continuous frequency distribution with equal class
intervals (x1 < x2 < ...... < xn+1):
[*Adapted with minor modification from an unpublished article by M. Amirul Islam providing a theoretical
proof.]
Since the distribution is symmetrical, we will have f1 = fn, f2 = fn-1, .........., fk-1 = fk+1 and fk will be the
highest frequency. Let h be the width of each class interval.
From the traditional formula of the mode we get,
$$M_o = L_o + \frac{f_o - f_1}{2f_o - f_1 - f_2} \times h$$
$$= x_k + \frac{h\,(f_k - f_{k-1})}{2f_k - f_{k-1} - f_{k+1}}$$
[Putting Lo = xk, fo = fk, f1 = fk-1 and f2 = fk+1]
$$= x_k + \frac{h\,(f_k - f_{k-1})}{2f_k - 2f_{k-1}} \;;\quad [\text{since } f_{k-1} = f_{k+1}]$$
$$= x_k + \frac{h}{2}$$
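Since the distribution is symmetrical we get,
f1 + f2 + ........ + fk-1 + ½fk = ½fk + fk+1 + fk+2 + ........ + fn
$$\therefore \frac{N}{2} = f_1 + f_2 + \dots + f_{k-1} + \frac{1}{2} f_k, \quad \text{where } N = \sum_{i=1}^{n} f_i$$
$$\text{or, } \frac{N}{2} = F_{k-1} + \frac{1}{2} f_k \quad \dots (1)$$
Hence the median is
$$M_e = x_k + \frac{\frac{N}{2} - F_{k-1}}{f_k} \times h$$
[putting Lm = xk, fm = fk and Fm = Fk-1]
$$= x_k + \frac{\left(F_{k-1} + \frac{1}{2} f_k\right) - F_{k-1}}{f_k} \times h$$
$$= x_k + \frac{\frac{1}{2} f_k}{f_k} \times h$$
$$= x_k + \frac{h}{2}$$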
We know, for a frequency distribution
Arithmetic mean, $$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i y_i \quad \dots (2)$$
For a symmetric distribution
$$\sum_{i=1}^{n} f_i y_i = f_1 y_1 + f_2 y_2 + \dots + f_{k-1} y_{k-1} + f_k y_k + f_{k+1} y_{k+1} + \dots + f_n y_n$$
$$= (f_1 y_1 + f_n y_n) + (f_2 y_2 + f_{n-1} y_{n-1}) + \dots + (f_{k-1} y_{k-1} + f_{k+1} y_{k+1}) + f_k y_k$$
$$= f_1 (y_1 + y_n) + f_2 (y_2 + y_{n-1}) + \dots + f_{k-1} (y_{k-1} + y_{k+1}) + f_k y_k \quad \dots (3)$$
[since f1 = fn, f2 = fn-1, ........., fk-1 = fk+1]
[Fig. 4.1. Frequency curve with $\bar{x}$, Mo and Me marked.]
For distributions of moderate skewness, there is an empirical relationship among the mean, median
and mode:
Mean - Mode = 3(Mean - Median)
or, $\bar{x}$ - Mo = 3($\bar{x}$ - Me)
The positions of the arithmetic mean, median and mode of moderately asymmetrical distributions are
shown in Fig. 4.2 and Fig. 4.3.
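[Fig. 4.2. Frequency curve with positions marked in the order Mo, Me, $\bar{x}$.]
[Fig. 4.3. Frequency curve with positions marked in the order $\bar{x}$, Me, Mo.]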
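Karl Pearson's β and γ Co-efficients :
Karl Pearson defined the following co-efficients, based upon the first four central moments :
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad\text{and}\quad \beta_2 = \frac{\mu_4}{\mu_2^2}$$
$$\gamma_1 = \pm\sqrt{\beta_1} \quad\text{and}\quad \gamma_2 = \beta_2 - 3$$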
Measures of Skewness :
We may compare the nature, shape and size of two or more frequency distributions with the help of
measures of skewness. The difference between the mean and the mode is considered as a measure of skewness. If
$\bar{x}$ > Mo the skewness is said to be positive and if $\bar{x}$ < Mo, the skewness is said to be negative. The skewness of
distributions having different units of measurement cannot be compared with the help of absolute measures
of skewness. That is why relative measures of skewness are widely used.
In case it is not possible to find the mode, or if a distribution has more than one mode, the following
formula is used to measure skewness :
$$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{s.d.}} = \frac{3(\bar{x} - M_e)}{\sigma}$$
(2) Bowley's formula
$$S_k = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
$$= \frac{Q_3 + Q_1 - 2M_e}{Q_3 - Q_1}$$
where Q1, Q2 and Q3 are the 1st, 2nd and 3rd quartiles respectively.
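The median-based formula and Bowley's formula can be expressed as two small functions. The Python sketch below is illustrative only: the function names are hypothetical and the summary figures passed in are made up.

```python
def pearson_median_skewness(mean, median, sd):
    """3 * (mean - median) / s.d."""
    return 3 * (mean - median) / sd

def bowley_skewness(q1, q2, q3):
    """(Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Made-up summary figures.
print(pearson_median_skewness(mean=52.0, median=50.0, sd=8.0))
print(bowley_skewness(q1=40, q2=50, q3=65))  # upper quartile farther from the median
print(bowley_skewness(q1=40, q2=50, q3=60))  # quartiles equidistant from the median
```

Bowley's measure is zero when the quartiles are equidistant from the median and positive when the upper quartile lies farther away than the lower one.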
As both $\beta_1$ and $\beta_2$ are always non-negative, the above formula cannot indicate whether the
skewness is positive or negative. In such a case the nature of the distribution depends upon the sign of
µ3. If µ3 is positive, the skewness is considered to be positive, and if µ3 is negative the skewness is
treated as negative.
4.5. Kurtosis:
Like skewness, kurtosis is also an important shape characteristic of a frequency distribution. Two
distributions may both be symmetrical and have the same variability as measured by the standard
deviation, yet they may be relatively more or less flat-topped compared to the normal curve (discussed in Chapter
VII). This relative flatness of the top, or the degree of peakedness, is called kurtosis and is measured by $\beta_2$.
For the normal distribution, $\beta_2 = 3$. Hence the quantity $\gamma_2 = \beta_2 - 3$ is known as the excess of kurtosis or simply kurtosis.
On the basis of kurtosis, frequency curves are divided into the following three categories :
1) Leptokurtic ; a curve having a high peak.
2) Platykurtic ; a curve which is flat-topped.
3) Mesokurtic ; a curve which is neither too peaked nor too
flat-topped.
If a distribution has
(i) $\beta_2 > 3$, it is called leptokurtic
(ii) $\beta_2 < 3$, it is called platykurtic
(iii) $\beta_2 = 3$ (i.e., $\gamma_2 = 0$), it is called mesokurtic
Example 4.6 : A distribution of short-term agricultural credit disbursement from 10 branches of a bank is
given below -
Amount of credit (Lac Taka) : 0-5   5-10   10-15   15-20   20-25
No. of branches             :  1     2      4       2       1
Calculate the first four central moments and the co-efficients of skewness and kurtosis, and thus comment on the
shape and nature of the distribution.
Solution :
Amount of   No. of     Mid      fi xi   xi − x̄   fi(xi − x̄)   fi(xi − x̄)²   fi(xi − x̄)³   fi(xi − x̄)⁴
credit      branches   value
(lac Tk.)   fi         xi
0-5           1         2.5      2.5     -10       -10           100          -1000          10000
5-10          2         7.5     15.0      -5       -10            50           -250           1250
10-15         4        12.5     50.0       0         0             0              0              0
15-20         2        17.5     35.0       5        10            50            250           1250
20-25         1        22.5     22.5      10        10           100           1000          10000
Total       N=10               125.0       0         0           300              0          22500
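$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{1}{10} \times 125.0 = 12.5$$
$$\mu_1 = \frac{1}{N}\sum f_i (x_i - \bar{x}) = \frac{1}{10} \times (0) = 0 \quad (\mu_1 = 0 \text{ always})$$
$$\mu_2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 = \frac{1}{10} \times (300) = 30.0$$
$$\mu_3 = \frac{1}{N}\sum f_i (x_i - \bar{x})^3 = \frac{1}{10} \times (0) = 0$$
and $$\mu_4 = \frac{1}{N}\sum f_i (x_i - \bar{x})^4 = \frac{1}{10} \times (22500) = 2250.0$$
Now, $$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0}{(30)^3} = 0$$
Coefficient of skewness $$S_k = \frac{(\beta_2 + 3)\sqrt{\beta_1}}{2(5\beta_2 - 6\beta_1 - 9)} = 0$$
Hence the distribution is symmetrical.
Again $$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{2250}{(30)^2} = 2.5 < 3$$
$\gamma_2 = \beta_2 - 3 = 2.5 - 3 = -0.5$.
Since $\gamma_2 < 0$, the curve is platykurtic.
The distribution is symmetrical and platykurtic.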
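The figures of Example 4.6 can be reproduced from the mid-values and frequencies. The Python sketch below is an illustration rather than part of the original solution; the helper name `central_moment` is hypothetical.

```python
# Grouped data from Example 4.6: class mid-values and frequencies.
mid_values = [2.5, 7.5, 12.5, 17.5, 22.5]
frequencies = [1, 2, 4, 2, 1]

n_total = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, mid_values)) / n_total

def central_moment(r):
    """r-th central moment of the grouped data."""
    return sum(f * (x - mean) ** r for f, x in zip(frequencies, mid_values)) / n_total

mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma2 = beta2 - 3

shape = "leptokurtic" if gamma2 > 0 else "platykurtic" if gamma2 < 0 else "mesokurtic"
print(f"mean={mean}, mu2={mu2}, mu3={mu3}, mu4={mu4}")
print(f"beta1={beta1}, beta2={beta2}, gamma2={gamma2} -> {shape}")
```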