
Lecture Note 3

Dispersion, Nature and Shape of Frequency Distribution

Central tendency is one characteristic of a distribution. Measures of central tendency give an idea of the central value, or location, of the distribution. But central tendency is not the only characteristic of a distribution: two distributions may differ despite having the same central value. For example, the data set comprising the values 0, 10 and 20 has 10 as its mean and median, and the mean and median of the series 5, 10, 15 are also 10. But the deviations of these values from their mean are not the same. The deviation of observations from their mean is called dispersion. A measure of dispersion, or variation, measures the extent to which individual values deviate from the central value, and so gives a precise idea of how representative the central value is.

Characteristics of an Ideal Measure of Dispersion:


The following are the requisites for an ideal measure of dispersion:
● It should be rigidly defined.
● It should be easy to understand and easy to calculate.
● It should be based on all the observations.
● It should be suitable for further algebraic treatment.
● It should be least affected by sampling fluctuation.
● It should be least affected by extreme values.

Importance of Measuring Dispersion:


Dispersion is an important character of distribution. Measures of dispersion are widely used for the
accurate and efficient analysis of data. The importance of measuring dispersion can be pointed out as
follows:
● A measure of dispersion is needed to judge how representative the observations of a
distribution are; the representativeness of the mean cannot be judged without knowledge of the
dispersion.
● Measures of dispersion help control the deviation of data.
● Measures of dispersion give a comparative picture of different distributions.
● Measures of dispersion help control the quality of industrial products.
● Measures of dispersion are important for time series data such as rainfall and temperature, where
central values are less important.

Measures of Dispersion may be divided in two broad types:

(a) Absolute Measures and


(b) Relative Measures.

(a) Absolute Measures:

1. Range, 2. Quartile Deviation, 3. Mean Deviation and 4. Standard Deviation


● Absolute measures of dispersion will retain the unit of measurement of the variable.
(b) Relative Measures:

1. Co-efficient of Range, 2. Co-efficient of quartile deviation,


3. Co-efficient of mean deviation, and 4. Co-efficient of variation.
● Relative measures of dispersion have no unit because these are the ratio of absolute measures
and the corresponding values.
Absolute Measures of Dispersion:
Range:
Range is the absolute difference between the highest and lowest observations of a distribution. For
a frequency distribution arranged in order of magnitude, the range is the absolute difference
between the mid-values of the last class and the first class.
Symbolically: Range = Xmax − Xmin = XM − XL
Range is the simplest and a crude measure of dispersion. Range is based on two extreme
observations only.
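As a minimal sketch in Python (the wage figures below are made up purely for illustration), the range is just the difference of the two extremes:

```python
def data_range(xs):
    """Range = Xmax - Xmin: the absolute difference between the extremes."""
    return max(xs) - min(xs)

# Hypothetical daily wages, for illustration only
wages = [52, 61, 66, 70, 74, 81]
print(data_range(wages))  # 81 - 52 = 29
```

Because only `max` and `min` are consulted, a single outlier changes the result completely, which is exactly the weakness listed below.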
Advantages of Range:
● It is very easy to understand and easy to calculate.
● It gives us a quick idea about the variability of a set of data.
● It is based on the two extreme observations only, so no detailed information is required.
● It is the simplest of all measures of dispersion.
Disadvantages of Range:
● It is very much affected by the extreme values.
● It provides us with the idea of only two extreme values in a set of data.
● It cannot be computed for data set having open ended class interval.
Uses of Range:
● Range is used to forecast the weather, the percentage of humidity in the air for weather
forecasting.
● It is used in reporting daily market price of commodities.
● It is used in statistical quality control.
Quartile Deviation:
The quartile deviation is half of the difference between the upper quartile (Q3) and the lower quartile
(Q1):

QD = (Q3 − Q1)/2

It is also known as the semi-interquartile range.
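A short sketch using the standard library; note that `statistics.quantiles` defaults to the "exclusive" quartile convention, which is only one of several, so results on small samples may differ slightly from hand-computed textbook quartiles:

```python
import statistics

def quartile_deviation(xs):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    q1, _q2, q3 = statistics.quantiles(xs, n=4)  # default method='exclusive'
    return (q3 - q1) / 2

print(quartile_deviation([1, 2, 3, 4, 5, 6, 7]))  # Q1=2, Q3=6 -> 2.0
```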


Advantages of Quartile Deviation:
● It is a very easily understandable location based measure.
● It is superior to other measures in the sense that the extreme values cannot affect the quartile
deviation.
● For distributions with open-ended class intervals, where most other measures cannot be
computed, the quartile deviation still can be.
Disadvantages of Quartile Deviation :
● It is not a good measure of dispersion because it does not measure the deviation from any central
value of the distribution.
● It is not based upon all the observations.
● It is more affected by sampling fluctuations.
● It is not suitable for further algebraic treatment.
Uses of Quartile Deviation:
● Quartile deviation is a location-based measure and can be profitably used where a rough estimate
of the variation is desired.
● It is a suitable measure of dispersion when the frequency distribution has open-ended class
interval.
Mean Deviation:
The arithmetic mean of the absolute deviations of the given observations from their central value is
called mean deviation; it can be measured from mean, median and mode.
The mean deviation of a distribution having observations x1, x2, ..., xn may be defined as follows:
● Mean deviation from mean, or simply mean deviation:

MD(x̄) = (1/n) Σ |xi − x̄|

In the case of a frequency distribution, MD(x̄) = (1/N) Σ fi|xi − x̄| ; where N = Σfi.

● Mean deviation from median:

MD(Me) = (1/n) Σ |xi − Me|

In the case of a frequency distribution, MD(Me) = (1/N) Σ fi|xi − Me|.

● Mean deviation from mode:

MD(Mo) = (1/n) Σ |xi − Mo|

In the case of a frequency distribution, MD(Mo) = (1/N) Σ fi|xi − Mo|.
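The two forms of the mean deviation (raw observations and frequency distribution) can be sketched directly from the definitions above:

```python
def mean_deviation(xs, center):
    """(1/n) * sum(|x_i - center|); center may be the mean, median, or mode."""
    return sum(abs(x - center) for x in xs) / len(xs)

def mean_deviation_freq(mids, freqs, center):
    """Frequency-distribution form: (1/N) * sum(f_i * |x_i - center|)."""
    return sum(f * abs(x - center) for x, f in zip(mids, freqs)) / sum(freqs)

print(mean_deviation([0, 10, 20], 10))  # (10 + 0 + 10) / 3
```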

Advantages of Mean Deviation:


● It is based on all the observations
● It is rigidly defined and easy to understand.
● It is not affected by the extreme values
● It is suitable for comparative discussion.

Disadvantages of Mean Deviation:


● It cannot be computed for open-ended class intervals.
● It is not amenable to further algebraic treatment.
● It is seldom used in statistical decision making.
Example 1.
● Computing the mean deviation of the daily wages of a group of farm labourers (given in example 3.1): The
mean, median and mode are respectively x̄ = 66.40, Me = 66.43, Mo = 66.67.

Mid value (xi)   fi    |xi−x̄|   |xi−Me|   |xi−Mo|   fi|xi−x̄|   fi|xi−Me|   fi|xi−Mo|
52.5             5     13.9     13.93     14.17     69.5       69.65       70.85
57.5             10    8.9      8.93      9.17      89.0       89.30       91.70
62.5             25    3.9      3.93      4.17      97.5       98.25       104.25
67.5             35    1.1      1.07      0.83      38.5       37.45       29.05
72.5             15    6.1      6.07      5.83      91.5       91.05       87.45
77.5             7     11.1     11.07     10.83     77.7       77.49       75.81
82.5             3     16.1     16.07     15.83     48.3       48.21       47.49
Total            100                                512.0      511.40      506.60

● Mean deviation from mean: MD(x̄) = 512.0/100 = 5.12
● Mean deviation from median: MD(Me) = 511.40/100 = 5.114
● Mean deviation from mode: MD(Mo) = 506.60/100 = 5.066


Theorem 1. Mean Deviation from the Median is the Minimum.
Standard Deviation:
The arithmetic mean of the squares of deviations of the observations of a series from their mean is
known as variance. The positive square root of variance is called standard deviation. The variance is
denoted by σ2 and standard deviation is denoted by σ. Standard deviation, therefore, may be defined as the
root mean square deviation from the mean.

For a set of observations x1, x2, ..., xn the standard deviation is computed as

σ = √[(1/n) Σ (xi − x̄)²]

For frequency distributions,

σ = √[(1/N) Σ fi(xi − x̄)²] ; where N = Σfi

The root mean square deviation from an arbitrary value a is denoted by s and is computed as

s = √[(1/N) Σ fi(xi − a)²]

Standard Error: The standard deviation of the sampling distribution of a statistic (say, the mean) is
known as the standard error. It is denoted by SE.
Let x1, x2, ..., xn be the observations of a sample of size n; the standard error of the mean is given by

SE(x̄) = σ/√n ; where σ = population standard deviation, x̄ = sample mean (statistic)
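The definitions above translate directly into a short sketch (population standard deviation, i.e. dividing by n rather than n − 1):

```python
import math

def std_dev(xs):
    """Population SD: root mean square deviation from the mean."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def standard_error(sigma, n):
    """SE of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

print(std_dev([5, 10, 15]))    # sqrt(50/3) ~ 4.082
print(standard_error(8, 16))   # 8 / 4 = 2.0
```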


Advantages of Standard Deviation:
● It is rigidly defined.
● It is based upon all the observations.
● It is less affected by sampling fluctuation.
● It is suitable for further algebraic treatments.
● The standard deviation of the combined series can be obtained if the number of observations,
mean and standard deviation in each series are known.
Disadvantages of Standard Deviation:
● It is not readily comprehensible.
● It is affected by the extreme values.
● It cannot be computed in case of distributions having open-ended class interval.
Uses of Standard Deviation:
Standard deviation is the most useful measure of dispersion. The use of standard deviation is highly
desirable in advanced statistical works. Sampling and analysis of data have got their basis on standard
deviation. Sampling, correlation analysis, the normal curve of errors, comparing variability and
uniformity of two sets of data which are of great use in statistical works, are analysed in terms of standard
deviation.
Thus standard deviation is the most important measure of dispersion.

Difference between Mean Deviation and Standard Deviation:


● In computing the mean deviation (MD) we ignore the signs of the deviations, but in computing the
standard deviation (SD) the deviations are squared, so the signs need not be ignored.
● MD can be computed from mean, Median or Mode but in computing SD we consider only the
deviations from the mean.
● MD is not suitable for further algebraic treatment but SD is suitable for further algebraic
treatment.
Some Properties of Standard Deviation:
1. Standard deviation is independent of change of origin but not of scale.
2. Standard deviation is the least possible root mean square deviation.
3. For two observations, standard deviation is the half of the range.

1. Standard Deviation is Independent of Change of Origin but not of Scale.


Proof.
Let x1, x2, ..., xn be the mid-values of the classes of a frequency distribution and let f1, f2, ..., fn be
their corresponding frequencies. Also let

ui = (xi − a)/h, i.e., xi = a + hui ;

where u is the changed variable, a the new origin and h the scale.
Then x̄ = a + hū, so that xi − x̄ = h(ui − ū).
Now the standard deviation of x in terms of the new variable u is

σx² = (1/N) Σ fi(xi − x̄)² = (1/N) Σ fih²(ui − ū)² = h²σu²

⇒ σx = h σu

This implies that standard deviation is independent of change of origin but not of scale.

2. Standard Deviation is the Least Possible Root Mean Square Deviation.

Proof:
Let x1, x2, ..., xn be the values of n observations with corresponding frequencies f1, f2, ..., fn,
and let x̄ be the arithmetic mean of the observations.
We have, Nσ² = Σ fi(xi − x̄)² and N = Σfi.

The mean square deviation from an arbitrary value a is given by

Ns² = Σ fi(xi − a)² = Σ fi(xi − x̄ + x̄ − a)²

or, Ns² = Σ fi(xi − x̄)² + 2(x̄ − a) Σ fi(xi − x̄) + N(x̄ − a)²

Since Σ fi(xi − x̄) = 0,

∴ Ns² = Nσ² + N(x̄ − a)² = Nσ² + positive quantity

∴ Ns² ≥ Nσ²

⇒ s² ≥ σ²
i.e., σ ≤ s. Proved.

3. For two Observations, Standard Deviation is the half of the Range.


Proof:

Let x1 and x2 be two observations. Then,

x̄ = (x1 + x2)/2

We have,

σ² = (1/2)[(x1 − x̄)² + (x2 − x̄)²] = (1/2)[((x1 − x2)/2)² + ((x2 − x1)/2)²] = ((x1 − x2)/2)²

∴ σ = |x1 − x2|/2 = half of the range.
● Working Formula of Standard Deviation:
Here,

σ² = (1/n) Σ (xi − x̄)² = (1/n) Σ xi² − x̄²

In the case of grouped data,

σ² = (1/N) Σ fixi² − x̄² ; where N = Σfi
● Standard Deviation of the first n Natural Numbers.

The first n natural numbers are 1, 2, 3, ..., n, with mean x̄ = (n + 1)/2 and Σxi² = n(n + 1)(2n + 1)/6.

Variance, σ² = (1/n) Σ xi² − x̄² = (n + 1)(2n + 1)/6 − (n + 1)²/4 = (n² − 1)/12

∴ σ = √[(n² − 1)/12]
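The closed form can be checked numerically against a direct computation on 1, 2, ..., n:

```python
def natural_variance(n):
    """Closed form: Var(1..n) = (n^2 - 1) / 12."""
    return (n * n - 1) / 12

def direct_variance(n):
    """Direct check via the working formula on 1..n."""
    xs = range(1, n + 1)
    mean = (n + 1) / 2
    return sum(x * x for x in xs) / n - mean ** 2

print(natural_variance(10))  # 99/12 = 8.25
```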
● Standard Deviation of Combined Series.

Let x1i (i = 1, 2, ..., n1) and x2j (j = 1, 2, ..., n2) be two series with means x̄1, x̄2 and variances σ1², σ2²
respectively. Then the combined standard deviation is given by

σ = √{[n1(σ1² + d1²) + n2(σ2² + d2²)] / N} ................(1)

where d1 = x̄1 − x̄, d2 = x̄2 − x̄ ; x̄ = (n1x̄1 + n2x̄2)/N is the combined mean and N = n1 + n2.

Alternative way:

d1 = x̄1 − x̄ = x̄1 − (n1x̄1 + n2x̄2)/N = n2(x̄1 − x̄2)/N

Similarly, d2 = n1(x̄2 − x̄1)/N.

Putting the values of d1 and d2 in (1) we get, after simplification,

σ = √{[n1σ1² + n2σ2² + n1n2(x̄1 − x̄2)²/N] / N}
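Formula (1) can be sketched as a small helper that returns both the combined mean and the combined standard deviation:

```python
import math

def combined_mean_sd(n1, m1, s1, n2, m2, s2):
    """Combined mean and SD of two series, using d_i = mean_i - combined mean."""
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n
    d1, d2 = m1 - m, m2 - m
    var = (n1 * (s1 ** 2 + d1 ** 2) + n2 * (s2 ** 2 + d2 ** 2)) / n
    return m, math.sqrt(var)
```

Example 4 below uses exactly this computation with n1 = 200, (x̄1, σ1) = (25, 5) and n2 = 250, (x̄2, σ2) = (3, 4).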

Example2.

The frequency distribution of the weight of tomato (Example 2.2) is reproduced below :
Weights :       50-60  60-70  70-80  80-90  90-100  100-110  110-120
No. of tomato : 5      9      13     20     19      9        5
Calculate standard deviation by direct method and indirect method.
Solution :
Direct Method:

Class interval   Mid value of class xi   frequency fi   fixi    fixi²
50-60            55                      5              275     15125
60-70            65                      9              585     38025
70-80            75                      13             975     73125
80-90            85                      20             1700    144500
90-100           95                      19             1805    171475
100-110          105                     9              945     99225
110-120          115                     5              575     66125
Total                                    N=80           6860    607600

Mean, x̄ = 6860/80 = 85.75

Standard deviation, σ = √[(1/N) Σ fixi² − x̄²] = √(607600/80 − 85.75²) = √241.9375 = 15.554
Indirect Method:
[We change the origin to a = 85 and the scale by dividing by h = 10, i.e., ui = (xi − 85)/10.]

Class interval   Mid value of class xi   frequency fi   ui    fiui   fiui²
50-60            55                      5              -3    -15    45
60-70            65                      9              -2    -18    36
70-80            75                      13             -1    -13    13
80-90            85                      20             0     0      0
90-100           95                      19             1     19     19
100-110          105                     9              2     18     36
110-120          115                     5              3     15     45
Total                                    N=80                 6      194

σu = √[(1/N) Σ fiui² − ((1/N) Σ fiui)²] = √(194/80 − (6/80)²) = √2.419375 = 1.5554

∴ σx = hσu = 10 × 1.5554 = 15.554

[Note: The second method is generally known as the short-cut method. But in the present age of
electronic calculators it is no longer a short cut; rather, it is lengthier and more time-consuming. That
is why the method is termed here an indirect method. However, the method is sometimes useful when
the observations of the distribution are large.]

Example 3.
A student, while calculating the mean and standard deviation of 20 observations, obtained the mean as 68
and the standard deviation as 8. At the time of checking it was found that he had copied 96 instead of 69.
What would be the actual values of the mean and standard deviation?

Solution: Here, n = 20, x̄ = 68 and σ = 8

We know, Σxi = nx̄ = 20 × 68 = 1360
Since the student copied 96 instead of 69, the actual sum of the observations is
Σxi = 1360 − 96 + 69 = 1333

∴ Actual mean, x̄ = 1333/20 = 66.65

Again we know, Σxi² = n(σ² + x̄²) = 20(8² + 68²) = 93760

But the actual Σxi² = 93760 − 96² + 69² = 89305

∴ The actual standard deviation is

σ = √(89305/20 − 66.65²) = √23.0275 = 4.80 (app.)
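The correction logic of this example can be replayed as a short sketch, working entirely from the identities Σxi = nx̄ and Σxi² = n(σ² + x̄²):

```python
import math

n, wrong_mean, wrong_sd = 20, 68, 8
wrong_sum = n * wrong_mean                          # sum(x_i) = n * mean = 1360
wrong_sum_sq = n * (wrong_sd**2 + wrong_mean**2)    # sum(x_i^2) = n(sigma^2 + mean^2)
true_sum = wrong_sum - 96 + 69                      # swap the miscopied value out
true_sum_sq = wrong_sum_sq - 96**2 + 69**2
true_mean = true_sum / n
true_sd = math.sqrt(true_sum_sq / n - true_mean**2)
print(true_mean, round(true_sd, 2))  # 66.65 4.8
```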
Example 4.
The mean and standard deviation of two sets of data having 200 and 250 observations are (25, 5) and
(3, 4) respectively. If the two sets are combined together, what will be the mean and standard deviation?

Solution: Given that,

n1 = 200, x̄1 = 25, σ1 = 5

n2 = 250, x̄2 = 3, σ2 = 4

Let the mean and standard deviation of the combined set be x̄ and σ respectively.
We know, the combined mean for two sets of observations is

x̄ = (n1x̄1 + n2x̄2)/(n1 + n2) = (200 × 25 + 250 × 3)/450 = 5750/450 = 12.78 (app.)

Again, with d1 = x̄1 − x̄ = 12.22 and d2 = x̄2 − x̄ = −9.78, the combined standard deviation for the two sets
of observations is

σ = √{[n1(σ1² + d1²) + n2(σ2² + d2²)]/(n1 + n2)} = √{[200(25 + 149.38) + 250(16 + 95.61)]/450}
  = √139.51 = 11.81 (app.)
Relative Measures of Dispersion

● Co-efficient of Range: When the range is divided by the sum of the highest and lowest items of the
data and expressed in percentage, we get the coefficient of range (CR).

Thus, CR = [(xm − xl)/(xm + xl)] × 100
where xm = the highest value of the data
xl = the lowest value of the data

● Coefficient of Quartile Deviation : When the difference of Q3 and Q1 is divided by their sum
and expressed in percentage, we get the coefficient of quartile deviation (C.Q.D).

Thus, CQD = [(Q3 − Q1)/(Q3 + Q1)] × 100
where Q3 and Q1 are the upper and lower quartiles respectively.

● Co-efficient of Mean Deviation:

CMD based upon mean = [MD(x̄)/x̄] × 100

CMD based upon median = [MD(Me)/Me] × 100

CMD based upon mode = [MD(Mo)/Mo] × 100


● Coefficient of Variation :
Coefficient of variation of a set of data is the ratio of the standard deviation to mean expressed as
percentage.

Thus, C.V. = (σ/x̄) × 100


[Note: For comparing the variability of two series, we calculate the C.V. for each series. The series
with the greater C.V. is said to be more variable (unstable), and the series with the smaller C.V. is said
to be more consistent (stable/homogeneous). Thus the C.V. is of great practical significance and is the
best measure for comparing the variability of two or more series.]
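Because the C.V. scales the spread by the mean, two hypothetical series with the same absolute spread but different means compare as the note describes:

```python
import math

def coefficient_of_variation(xs):
    """C.V. = (sigma / mean) * 100; unit-free, so series in different units compare."""
    m = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return sigma / m * 100

# Two made-up series with identical absolute spread but different means
a = [0, 10, 20]      # mean 10
b = [90, 100, 110]   # mean 100
print(coefficient_of_variation(a) > coefficient_of_variation(b))  # True: a is less consistent
```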

Moments:
Moments are constants which are used to determine certain characteristics (e.g., nature, shape etc.) of
frequency distributions. Moments about the mean are called central moments, and those about an arbitrary
value (other than the mean) are known as raw moments.

Central Moment:

If x1, x2, ..., xn occur with frequencies f1, f2, ..., fn respectively, then the rth central moment is given by

µr = (1/N) Σ fi(xi − x̄)^r ; where N = Σfi ; r = 0, 1, 2, 3, 4, etc.

In particular, when r = 0, µ0 = (1/N) Σ fi(xi − x̄)⁰ = 1.

1st central moment (r = 1): µ1 = (1/N) Σ fi(xi − x̄) = x̄ − x̄ = 0
[µ1 for any distribution is zero]

2nd central moment (r = 2): µ2 = (1/N) Σ fi(xi − x̄)² = σ²
[the 2nd central moment µ2 is the variance]

3rd central moment (r = 3): µ3 = (1/N) Σ fi(xi − x̄)³

4th central moment (r = 4): µ4 = (1/N) Σ fi(xi − x̄)⁴, etc.


Raw Moment:
The rth raw moment about any arbitrary value a is defined as

µr′ = (1/N) Σ fi(xi − a)^r

The rth raw moment about the origin (a = 0) is

µr′ = (1/N) Σ fixi^r

● When r = 1, µ1′ = (1/N) Σ fixi = x̄
[The first raw moment about the origin is the arithmetic mean], etc.

Moments are Independent of Change of Origin but not of Scale.

Proof: Let x1, x2, ..., xn be the mid-values of the classes of a frequency distribution and let f1, f2, ..., fn be
their corresponding frequencies.
Now the rth central moment of x is

µr(x) = (1/N) Σ fi(xi − x̄)^r

We change the origin and scale of x such that

ui = (xi − a)/h ⇒ xi = a + hui and x̄ = a + hū

Now, for the new variate u, we have

µr(x) = (1/N) Σ fi(a + hui − a − hū)^r = h^r (1/N) Σ fi(ui − ū)^r = h^r µr(u)

Hence, moments are independent of change of origin but dependent on
scale. Proved.
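The central-moment formula above can be sketched directly, and the first two moments behave as the text states (µ1 = 0 always, µ2 = σ²):

```python
def central_moment(xs, freqs, r):
    """mu_r = (1/N) * sum(f_i * (x_i - mean)^r)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(xs, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(xs, freqs)) / n

vals, fs = [0, 10, 20], [1, 1, 1]
print(central_moment(vals, fs, 1))  # 0.0 (mu_1 is always zero)
print(central_moment(vals, fs, 2))  # 200/3, the variance
```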
Example 5. The wages per hour of 100 farm labourers are given below:
Wages (Taka) :    0-5  5-10  10-15  15-20  20-25
No. of labourers : 10   15    40     25     10
Compute the first four central moments.

Solution:

Wages (Tk.)   No. of labourers fi   Mid value xi   ui=(xi−12.5)/5   fiui   fiui²   fiui³   fiui⁴
0-5           10                    2.5            -2               -20    40      -80     160
5-10          15                    7.5            -1               -15    15      -15     15
10-15         40                    12.5           0                0      0       0       0
15-20         25                    17.5           1                25     25      25      25
20-25         10                    22.5           2                20     40      80      160
Total         100                                                   10     120     10      360

Now, the raw moments of u about the origin are
µ1′ = 10/100 = 0.1, µ2′ = 120/100 = 1.2, µ3′ = 10/100 = 0.1, µ4′ = 360/100 = 3.6

Central moments of u:
µ1(u) = 0
µ2(u) = µ2′ − µ1′² = 1.2 − (0.1)² = 1.19
µ3(u) = µ3′ − 3µ2′µ1′ + 2µ1′³ = 0.1 − 3(1.2)(0.1) + 2(0.1)³ = −0.258
µ4(u) = µ4′ − 4µ3′µ1′ + 6µ2′µ1′² − 3µ1′⁴ = 3.6 − 4(0.1)(0.1) + 6(1.2)(0.1)² − 3(0.1)⁴
      = 3.6 − 0.04 + 0.072 − 0.0003 = 3.6317

First Four Central Moments of the Original Variable (h = 5):

µ1 = 0

µ2 = h²µ2(u) = 25(1.19) = 29.75

µ3 = h³µ3(u) = 125(−0.258) = −32.25

µ4 = h⁴µ4(u) = 625(3.6317) = 2269.8125

Skewness:
Skewness means lack of symmetry; for an asymmetric distribution it is the departure from symmetry.
The coefficient of skewness is denoted by β1.
Symmetrical Distribution:
A distribution is said to be symmetrical if the frequencies are symmetrically distributed about the
mean. For symmetrical distributions the values equi-distant from mean have equal frequency. For
example, the following distribution is symmetrical about its mean 4.
x: 0 1 2 3 4 5 6 7 8
f: 12 14 16 18 20 18 16 14 12
Again for symmetrical distribution mean = mode = median.
● A distribution is said to be skewed if:
i) the mean, median and mode fall at different points;

ii) the curve drawn with the help of the given data is not symmetrical but elongated more to one side.
Skewness may be positive or negative. Skewness is said to be positive if the frequency curve is more
elongated to the right side. In this case the mean of the distribution lies to the right of (is greater than)
the mode,
i.e., x̄ > Me > Mo.
On the other hand, the skewness is negative if the frequency curve is more elongated to the left side.
In this case the mean of the distribution lies to the left of (is less than) the mode,
i.e., Mo > Me > x̄.
For distributions of moderate skewness, there is an empirical relationship among the mean, median and
mode:
Mean − Mode = 3(Mean − Median)
or, x̄ − Mo = 3(x̄ − Me)

Measures of Skewness:
We may compare the nature, shape and size of two or more frequency distributions with the help of
measures of skewness. The difference between the mean and the mode is considered a measure of skewness.
If x̄ > Me the skewness is said to be positive, and if x̄ < Me the skewness is said to be negative. The
skewness of distributions having different units of measurement cannot be compared with the help of
absolute measures of skewness; that is why relative measures of skewness are widely used.

Relative Measures of Skewness :

(1) Karl Pearson's formula:

Sk = (x̄ − Mo)/σ

In case it is not possible to find the mode, or if a distribution has more than one mode, the following
formula is used to measure skewness:

Sk = 3(x̄ − Me)/σ

(2) Bowley's formula:

Sk = (Q3 + Q1 − 2Q2)/(Q3 − Q1)

where Q1, Q2 and Q3 are the 1st, 2nd and 3rd quartiles respectively.

(3) Kelley's formula:

Sk = (P90 + P10 − 2P50)/(P90 − P10) ; where P10, P50 and P90 are the 10th, 50th and 90th percentiles.

(4) Coefficient of skewness based upon moments:

β1 = µ3²/µ2³ ; γ1 = ±√β1

As β1, being a squared quantity, is always non-negative, the above formula cannot indicate whether the
skewness is positive or negative. In such cases the nature of the distribution depends on the value of
µ3: if µ3 is positive the skewness is taken to be positive, and if µ3 is negative the skewness is taken
to be negative.
Kurtosis:
Like skewness, kurtosis is an important shape characteristic of a frequency distribution. Two
distributions may both be symmetrical and may have the same variability as measured by the standard
deviation, yet one may be relatively more or less flat-topped than the normal curve. This relative flatness
of the top, or degree of peakedness, is called kurtosis and is measured by β2. For the normal distribution,
β2 = 3; hence the quantity β2 − 3 is known as the excess of kurtosis, or simply kurtosis. On the basis of
kurtosis, frequency curves are divided into the following three categories:
1) Leptokurtic: a curve having a high peak.
2) Platykurtic: a curve which is flat-topped.
3) Mesokurtic: a curve which is neither too peaked nor too flat-topped.
For the normal distribution, β2 = 3 and γ2 = 0. Kurtosis is measured by γ2 = β2 − 3.

If a distribution has
(i) β2> 3, it is called leptokurtic
(ii) β2< 3, it is called platykurtic
(iii) β2 = 3, it is called mesokurtic
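The moment-based coefficients β1 and β2 can be sketched from the central-moment definition; as a check, the wage distribution of Example 5 (µ2 = 29.75, µ3 = −32.25, µ4 = 2269.8125) comes out platykurtic:

```python
def beta_coefficients(xs, freqs):
    """beta1 = mu3^2 / mu2^3 and beta2 = mu4 / mu2^2, from the central moments."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(xs, freqs)) / n
    def mu(r):
        return sum(f * (x - mean) ** r for x, f in zip(xs, freqs)) / n
    mu2, mu3, mu4 = mu(2), mu(3), mu(4)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

# Mid-values and frequencies from Example 5
b1, b2 = beta_coefficients([2.5, 7.5, 12.5, 17.5, 22.5], [10, 15, 40, 25, 10])
print(b2 < 3)  # True: platykurtic (gamma2 = beta2 - 3 < 0)
```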
Karl Pearson's β and γ Co-efficients:
Karl Pearson defined the following co-efficients, based upon the first four central moments:

β1 = µ3²/µ2³ and β2 = µ4/µ2²

γ1 = ±√β1 and γ2 = β2 − 3

You might also like