lec3
Central tendency is one characteristic of a distribution. Measures of central tendency give an idea of
the central value or location of the distribution. But central tendency is not the only characteristic of a
distribution. Two distributions may differ even though they have the same central value. For example, the
data set comprising the values 0, 10 and 20 has 10 as its mean and median. Again, the mean and median
of the series 5, 10, 15 are also 10. But the deviations of these values from their mean are not the same. The
deviation of observations from their mean is called dispersion. A measure of dispersion or variation is
a measure of the extent of variation or deviation of individual values from the central value. Such a
measure indicates how well the central value represents the data.
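The comparison of the two series above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the population standard deviation (`pstdev`) is used here as one possible measure of dispersion.

```python
import statistics

series_a = [0, 10, 20]
series_b = [5, 10, 15]

for name, data in (("A", series_a), ("B", series_b)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    # Population standard deviation: root of the mean squared deviation from the mean.
    sd = statistics.pstdev(data)
    print(name, mean, median, round(sd, 3))
```

Both series share the same mean and median, but the first is spread twice as widely around that value.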
Root mean-square deviation from an arbitrary value a is denoted by s and is computed as,
$$s = \sqrt{\frac{1}{N}\sum f_i (x_i - a)^2}$$ ; where $N = \sum f_i$
Standard Error: The standard deviation of the sampling distribution of a statistic (say mean) is
known as standard error. It is denoted by SE.
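Let x1, x2, ..., xn be the observations of a sample of size n. The standard error of the mean is given by
$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
where σ is the standard deviation.
Effect of change of origin and scale: Let x1, x2, ..., xn be the values of a variable with f1, f2, ..., fn
their corresponding frequencies and also let, $u_i = \frac{x_i - a}{h}$ ; where ui, a and h are the changed variable, origin
and scale respectively. Then $x_i = a + h u_i$ and $\bar{x} = a + h\bar{u}$, so that
$$\sigma_x^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 = \frac{h^2}{N}\sum f_i (u_i - \bar{u})^2 = h^2 \sigma_u^2$$
⇒ σx = h σu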
This implies that standard deviation is independent of change of origin but not of scale.
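Proof (that σ ≤ s, i.e., the standard deviation is the least root mean-square deviation):
Let x1, x2, ..., xn be the values of n observations with corresponding frequencies f1, f2, ..., fn, and let $N = \sum f_i$.
Also let $\bar{x}$ be the arithmetic mean of the observations.
We have, $\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2$ and $s^2 = \frac{1}{N}\sum f_i (x_i - a)^2$
$$N s^2 = \sum f_i \{(x_i - \bar{x}) + (\bar{x} - a)\}^2 = \sum f_i (x_i - \bar{x})^2 + 2(\bar{x} - a)\sum f_i (x_i - \bar{x}) + N(\bar{x} - a)^2$$
or, $N s^2 = N\sigma^2 + N(\bar{x} - a)^2$, since $\sum f_i (x_i - \bar{x}) = 0$
∴ $N s^2 \ge N\sigma^2$
⇒ $s^2 \ge \sigma^2$
i.e., σ ≤ s Proved.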
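The inequality σ ≤ s can also be checked numerically. The sketch below is only an illustration: the helper `rms_deviation` and the small frequency distribution are assumptions made for this example and do not come from the text.

```python
import math

def rms_deviation(values, freqs, a):
    """Root mean-square deviation of a frequency distribution about the value a."""
    n_total = sum(freqs)
    return math.sqrt(sum(f * (x - a) ** 2 for x, f in zip(values, freqs)) / n_total)

# Hypothetical frequency distribution, used only for illustration.
values = [2, 4, 6, 8]
freqs = [1, 3, 4, 2]

mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
sigma = rms_deviation(values, freqs, mean)  # taking a = mean gives the standard deviation

for a in (0, 3, mean, 7, 10):
    s = rms_deviation(values, freqs, a)
    print(a, round(s, 4), s >= sigma)
```

For every choice of a the printed flag is True, and s equals σ only when a is the mean.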
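Standard deviation of two observations: for two observations x1 and x2,
We have,
$\bar{x} = \frac{x_1 + x_2}{2}$ , $\sigma^2 = \frac{1}{2}\{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2\} = \frac{(x_1 - x_2)^2}{4}$
∴ $\sigma = \frac{|x_1 - x_2|}{2}$ = Half of range.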
● Working Formula of Standard Deviation:
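Here,
$$\sum f_i (x_i - \bar{x})^2 = \sum f_i x_i^2 - N\bar{x}^2$$
Variance,
$$\sigma^2 = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2$$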
● Standard Deviation of Combined Series.
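Let x1i (i = 1, 2, ..., n1) and x2j (j = 1, 2, ..., n2) be two series with means $\bar{x}_1$, $\bar{x}_2$ and variances
$\sigma_1^2$, $\sigma_2^2$ respectively. Then the variance of the combined series is
$$\sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2} \qquad ................(1)$$
where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$ and
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{N}$$ ; where N = n1 + n2
Alternative way:
$$d_1 = \bar{x}_1 - \bar{x} = \bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{N} = \frac{n_2(\bar{x}_1 - \bar{x}_2)}{N}$$
Similarly,
$$d_2 = \frac{n_1(\bar{x}_2 - \bar{x}_1)}{N}$$
Putting the values of d1 and d2 in (1) we get after simplification
$$\sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{N} + \frac{n_1 n_2(\bar{x}_1 - \bar{x}_2)^2}{N^2}$$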
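As a rough check of the combined-series result, the sketch below pools two small series and compares the pooled standard deviation with the value obtained from the group sizes, means and standard deviations alone. The function name `combined_mean_sd` and the two data series are hypothetical, and the formula coded is the standard combined-variance form with d1 and d2 measured from the combined mean.

```python
import math
import statistics

def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two series from their summaries."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1 = mean1 - mean  # deviation of the first mean from the combined mean
    d2 = mean2 - mean  # deviation of the second mean from the combined mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return mean, math.sqrt(variance)

# Hypothetical series, used only to check the formula.
x1 = [4, 7, 9, 10]
x2 = [12, 15, 18, 20, 25]

mean_c, sd_c = combined_mean_sd(
    len(x1), statistics.mean(x1), statistics.pstdev(x1),
    len(x2), statistics.mean(x2), statistics.pstdev(x2),
)
pooled = x1 + x2
print(mean_c, statistics.mean(pooled))
print(sd_c, statistics.pstdev(pooled))
```

The two printed pairs agree up to rounding error, which is what the formula predicts.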
Example 2.
The frequency distribution of the weight of tomato (Example 2.2) is reproduced below :
Weights :          50-60   60-70   70-80   80-90   90-100   100-110   110-120
No. of tomatoes :      5       9      13      20       19         9         5
Calculate standard deviation by direct method and indirect method.
Solution :
Direct Method:
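Standard deviation
$$\sigma = \sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2} = \sqrt{\frac{607600}{80} - \left(\frac{6860}{80}\right)^2} = \sqrt{241.9375}$$
= 15.554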
Indirect Method:
[We change the origin to x = 85 and scale by dividing by 10]
Class       Mid value   Frequency   ui = (xi − 85)/10   fi·ui   fi·ui²
interval    xi          fi
50-60        55          5          -3                  -15      45
60-70        65          9          -2                  -18      36
70-80        75         13          -1                  -13      13
80-90        85         20           0                    0       0
90-100       95         19           1                   19      19
100-110     105          9           2                   18      36
110-120     115          5           3                   15      45
Total                   N = 80                            6     194
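$$\sigma_u = \sqrt{\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2} = \sqrt{\frac{194}{80} - \left(\frac{6}{80}\right)^2}$$
= 1.5554
∴ σx = hσu = 10 × 1.5554 = 15.554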
[Note: The second method is generally known as the short-cut method. But in the present age of
electronic calculators it is no longer a short-cut; rather, it is lengthier and more time-consuming. That
is why the method is termed an indirect method here. However, the method is sometimes useful when
the observations of the distribution are large.]
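Both methods of Example 2 can be reproduced in a few lines. This is a minimal sketch in plain Python, with variable names chosen for illustration; it uses the class mid-values and frequencies of the tomato data, with origin 85 and scale 10 for the indirect method.

```python
import math

mids = [55, 65, 75, 85, 95, 105, 115]   # class mid-values
freqs = [5, 9, 13, 20, 19, 9, 5]        # class frequencies
N = sum(freqs)

# Direct method: work with the mid-values themselves.
mean_x = sum(f * x for x, f in zip(mids, freqs)) / N
sd_direct = math.sqrt(sum(f * x * x for x, f in zip(mids, freqs)) / N - mean_x ** 2)

# Indirect method: change origin to a = 85 and scale by h = 10.
a, h = 85, 10
u = [(x - a) / h for x in mids]
mean_u = sum(f * ui for ui, f in zip(u, freqs)) / N
sd_u = math.sqrt(sum(f * ui * ui for ui, f in zip(u, freqs)) / N - mean_u ** 2)
sd_indirect = h * sd_u

print(round(sd_direct, 3), round(sd_u, 4), round(sd_indirect, 3))
```

Both routes give the same standard deviation, since changing the origin has no effect and the scale factor is restored by multiplying by h.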
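Example 3.
A student, while calculating the mean and standard deviation of 20 observations, obtained the mean as 68 and
the standard deviation as 8. At the time of checking it was found that he had copied 96 instead of 69. What would
be the actual values of the mean and standard deviation ?
We know, $\sum x_i = n\bar{x}$ = 20 × 68 = 1360
Since the student copied 96 instead of 69, the actual sum of the observations is
Σxi = 1360 – 96 + 69 = 1333
∴ Actual mean = 1333/20 = 66.65
Again we know,
$\sum x_i^2 = n(\sigma^2 + \bar{x}^2)$ = 20 (8² + 68²) = 93760
∴ Actual $\sum x_i^2$ = 93760 – 96² + 69² = 89305
∴ Actual standard deviation $= \sqrt{\frac{89305}{20} - (66.65)^2} = \sqrt{23.0275}$ = 4.80 (app.)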
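The correction in Example 3 can be traced step by step in code. The sketch below is illustrative only (the variable names are not from the text): it recovers Σx and Σx² from the reported mean and standard deviation, swaps the miscopied value 96 for the correct value 69, and recomputes both statistics.

```python
import math

n = 20
reported_mean, reported_sd = 68, 8
wrong_value, right_value = 96, 69

# Totals implied by the reported (incorrect) statistics.
sum_x = n * reported_mean                           # 1360
sum_x2 = n * (reported_sd ** 2 + reported_mean ** 2)  # 93760

# Replace the miscopied observation by the correct one.
corrected_sum_x = sum_x - wrong_value + right_value
corrected_sum_x2 = sum_x2 - wrong_value ** 2 + right_value ** 2

corrected_mean = corrected_sum_x / n
corrected_sd = math.sqrt(corrected_sum_x2 / n - corrected_mean ** 2)
print(corrected_mean, round(corrected_sd, 2))
```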
Example 4.
The mean and standard deviation of two sets of data having 200 and 250 observations are (25, 5) and
(3, 4) respectively. If the two sets are combined together, what will be the mean and standard deviation?
n1 = 200
n2 = 250
Let the mean and standard deviation of the combined set be $\bar{x}$ and $\sigma$ respectively.
We know, the combined mean for two sets of observations is
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
● Coefficient of Range : When the range is divided by the sum of the highest and lowest items of the
data and expressed in percentage we get the coefficient of range (CR).
Thus, $CR = \frac{x_m - x_l}{x_m + x_l} \times 100$
where xm = the highest value of the data
xl = the lowest value of the data
● Coefficient of Quartile Deviation : When the difference of Q3 and Q1 is divided by their sum
and expressed in percentage, we get the coefficient of quartile deviation (CQD).
Thus, $CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$
where Q3 and Q1 are the upper and lower quartiles respectively.
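Both coefficients can be computed directly from a data set. The sketch below uses hypothetical data that is not taken from the text, and it relies on `statistics.quantiles` with its default "exclusive" method; other quartile conventions may give slightly different values of Q1 and Q3.

```python
import statistics

# Hypothetical observations, used only for illustration.
data = [12, 15, 18, 20, 22, 25, 30, 35, 40, 50, 60]

x_l, x_m = min(data), max(data)
coeff_range = (x_m - x_l) / (x_m + x_l) * 100

# With n=4, statistics.quantiles returns the three quartiles [Q1, Q2, Q3].
q1, q2, q3 = statistics.quantiles(data, n=4)
coeff_qd = (q3 - q1) / (q3 + q1) * 100

print(round(coeff_range, 2), round(coeff_qd, 2))
```

Because both coefficients are ratios, they carry no unit and can be compared across data sets measured in different units.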
Moments:
Moments are constants that are used to determine some characteristics (e.g., nature, shape etc.) of
frequency distributions. Moments about the mean are called central moments and those about an arbitrary
value (other than the mean) are known as raw moments.
Central Moment:
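If x1, x2, ..., xn occur with frequencies f1, f2, ..., fn, respectively, then the rth central moment is given by
$$\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$$ ; where $N = \sum f_i$
In particular, $\mu_0 = \frac{1}{N}\sum f_i (x_i - \bar{x})^0 = 1$ ; when r = 0
$\mu_1 = \frac{1}{N}\sum f_i (x_i - \bar{x}) = 0$ ; when r = 1, and $\mu_2 = \sigma^2$ ; when r = 2
Raw Moment:
The rth raw moment about an arbitrary value a is $\mu'_r = \frac{1}{N}\sum f_i (x_i - a)^r$
● when, r = 1 and a = 0, $\mu'_1 = \frac{1}{N}\sum f_i x_i = \bar{x}$
[First raw moment about the origin is the arithmetic mean]
etc.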
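Let $u_i = \frac{x_i - a}{h}$ ==> $x_i = a + h u_i$ and $\bar{x} = a + h\bar{u}$
Now, for the new variate u, we have
$$\mu_r(x) = \frac{1}{N}\sum f_i (x_i - \bar{x})^r = \frac{h^r}{N}\sum f_i (u_i - \bar{u})^r = h^r \mu_r(u)$$
Hence, central moments are independent of origin but dependent on
scale. Proved.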
Example 5. The wages per hour of 100 farm labourers are given below :
Wages (Taka) : 0-5 5-10 10-15 15-20 20-25
No. of labourers : 10 15 40 25 10
Compute first four central moments.
Solution :
Wages     No. of          Mid value   ui = (xi − 12.5)/5   fi·ui   fi·ui²   fi·ui³   fi·ui⁴
(Tk.)     labourers fi    xi
0-5        10              2.5        -2                   -20      40      -80      160
5-10       15              7.5        -1                   -15      15      -15       15
10-15      40             12.5         0                     0       0        0        0
15-20      25             17.5         1                    25      25       25       25
20-25      10             22.5         2                    20      40       80      160
Total     100                                               10     120       10      360
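Now, the raw moments of u about the origin are
$\mu'_1 = \frac{10}{100} = 0.1$, $\mu'_2 = \frac{120}{100} = 1.2$, $\mu'_3 = \frac{10}{100} = 0.1$, $\mu'_4 = \frac{360}{100} = 3.6$
The central moments of u are
$\mu_1(u) = 0$
$\mu_2(u) = \mu'_2 - \mu'^2_1$ = 1.2 - (0.1)² = 1.19
$\mu_3(u) = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1$ = 0.1 - 3(1.2)(0.1) + 2(0.1)³ = -0.258
$\mu_4(u) = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu'^2_1 - 3\mu'^4_1$ = 3.6 - 4(0.1)(0.1) + 6(1.2)(0.1)² - 3(0.1)⁴ = 3.6317
Hence the central moments of x, with h = 5, are
$\mu_1 = 0$
$\mu_2 = h^2\mu_2(u)$ = 5² (1.19) = 29.75
$\mu_3 = h^3\mu_3(u)$ = 5³ (-0.258) = -32.25
$\mu_4 = h^4\mu_4(u)$ = 5⁴ (3.6317) = 2269.8125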
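The central moments of Example 5 can be cross-checked by computing them directly from the class mid-values, without the change of origin and scale. This is a minimal sketch; the helper `central_moment` is a name chosen for this illustration.

```python
mids = [2.5, 7.5, 12.5, 17.5, 22.5]   # class mid-values of the wage classes
freqs = [10, 15, 40, 25, 10]          # number of labourers in each class
N = sum(freqs)

mean = sum(f * x for x, f in zip(mids, freqs)) / N

def central_moment(r):
    """r-th central moment of the grouped data about its mean."""
    return sum(f * (x - mean) ** r for x, f in zip(mids, freqs)) / N

moments = {r: central_moment(r) for r in (1, 2, 3, 4)}
for r, value in moments.items():
    print(r, value)
```

The direct computation agrees with the values 0, 29.75, -32.25 and 2269.8125 found through the variate u.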
Skewness:
Skewness means lack of symmetry. For an asymmetric distribution it is the departure from symmetry.
The coefficient of skewness is denoted by β1.
Symmetrical Distribution:
A distribution is said to be symmetrical if the frequencies are symmetrically distributed about the
mean. For symmetrical distributions the values equi-distant from mean have equal frequency. For
example, the following distribution is symmetrical about its mean 4.
x: 0 1 2 3 4 5 6 7 8
f: 12 14 16 18 20 18 16 14 12
Again for symmetrical distribution mean = mode = median.
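The symmetrical distribution above can be verified with Python's standard `statistics` module. The sketch expands the frequency table into individual observations (a convenience chosen for this illustration) and also checks that the third central moment is zero, as expected under symmetry.

```python
import statistics

x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
f = [12, 14, 16, 18, 20, 18, 16, 14, 12]

# Expand the frequency table into a flat list of observations.
data = [xi for xi, fi in zip(x, f) for _ in range(fi)]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Third central moment; it vanishes for a symmetrical distribution.
mu3 = sum((v - mean) ** 3 for v in data) / len(data)
print(mean, median, mode, mu3)
```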
● A distribution is said to be skewed if -
i) Mean, median and mode fall at different points.
iii) The curve drawn with the help of the given data is not symmetrical but elongated more to one side.
Skewness may be positive or negative. Skewness is said to be positive if the frequency curve is more
elongated to the right side. In this case mean of the distribution lies at the right of (or greater than) the
mode.
i.e., $\bar{x}$ > Me > Mo.
On the other hand, the skewness is negative if the frequency curve is more elongated to the left side.
In this case mean of the distribution lies at the left of (or less than) the mode.
i.e., Mo > Me > $\bar{x}$.
For distributions of moderate skewness, there is an empirical relationship among the mean, median and
mode that,
Mean - Mode = 3(Mean - Median)
or, $\bar{x}$ - Mo = 3($\bar{x}$ - Me)
Measures of Skewness:
We may compare the nature, shape and size of two or more frequency distributions with the help of
measures of skewness. The difference between mean and mode is considered as a measure of skewness. If
$\bar{x}$ > Me the skewness is said to be positive and if $\bar{x}$ < Me, the skewness is said to be negative. Skewness of
distributions having different units of measurement cannot be compared with the help of absolute
measures of skewness. That is why relative measures of skewness are widely used.
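(2) Bowley's formula :
$$Sk = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
where Q1, Q2 and Q3 are the 1st, 2nd and 3rd quartiles respectively.
(3) Kelly's formula :
$$Sk = \frac{P_{90} + P_{10} - 2P_{50}}{P_{90} - P_{10}}$$
where P10, P50 and P90 are the 10th, 50th and 90th percentiles respectively.
(4) Formula based on moments :
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$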
As both β1 and β2 are always non-negative, the above formula cannot indicate as to whether the
skewness is positive or negative. In such case the nature of the distribution will depend upon the value of
µ3. If µ3 is positive, the skewness is considered to be positive and if µ3 is negative the skewness is also
treated to be negative.
Kurtosis:
Like skewness, kurtosis is also an important shape characteristic of a frequency distribution. Two
distributions may both be symmetrical and have the same variability as measured by the standard
deviation, yet they may be relatively more or less flat-topped compared to the normal curve. This relative flatness
of the top or the degree of peakedness is called kurtosis and is measured by β2. For the normal distribution, β2
= 3. Hence the quantity β2 - 3 is known as excess of kurtosis or simply kurtosis. On the basis of kurtosis,
frequency curves are divided into the following three categories :
1) Leptokurtic : a curve having a high peak.
2) Platykurtic : a curve which is flat-topped.
3) Mesokurtic : a curve which is neither too peaked nor too flat-topped.
For the normal distribution, β2 = 3 and γ2 = 0. Kurtosis is measured by γ2 = β2 - 3.
If a distribution has
(i) β2 > 3, it is called leptokurtic
(ii) β2 < 3, it is called platykurtic
(iii) β2 = 3, it is called mesokurtic
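Karl Pearson's β and γ Coefficients :
Karl Pearson defined the following coefficients, based upon the first four central moments :
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \beta_2 = \frac{\mu_4}{\mu_2^2}, \quad \gamma_1 = \pm\sqrt{\beta_1} \quad \text{and} \quad \gamma_2 = \beta_2 - 3$$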
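As an illustration, the coefficients can be evaluated for the central moments obtained in Example 5 (29.75, -32.25 and 2269.8125). The sketch assumes the standard definitions β1 = μ3²/μ2³ and β2 = μ4/μ2², and it gives γ1 the sign of μ3; the labels `skewness` and `shape` are names chosen for this example.

```python
import math

# Central moments of the wage distribution in Example 5.
mu2, mu3, mu4 = 29.75, -32.25, 2269.8125

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma1 = math.copysign(math.sqrt(beta1), mu3)  # sign taken from the third moment
gamma2 = beta2 - 3

skewness = "positive" if mu3 > 0 else "negative" if mu3 < 0 else "none"
if beta2 > 3:
    shape = "leptokurtic"
elif beta2 < 3:
    shape = "platykurtic"
else:
    shape = "mesokurtic"

print(round(beta1, 4), round(beta2, 4), round(gamma1, 4), round(gamma2, 4))
print(skewness, shape)
```

Under these assumptions the wage distribution comes out slightly negatively skewed and platykurtic.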