Statistics - Theory Notes
An average, or central value, of a statistical series is the value of the variable which describes the characteristics of the entire distribution.
The following are the five measures of central tendency.
(1) Arithmetic mean (2) Geometric mean (3) Harmonic mean (4) Median (5) Mode
Arithmetic Mean.
Arithmetic mean is the most important of the mathematical means.
According to Horace Secrist,
“The arithmetic mean is the amount secured by dividing the sum of values of the items in a series by their number.”
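(i) Direct method : If $x_1, x_2, \dots, x_n$ are n values of a variate x, then
$$\bar{x} = \frac{\text{Sum of the series}}{\text{Number of terms}} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
If $x_1, x_2, \dots, x_n$ occur with frequencies $f_1, f_2, \dots, f_n$ respectively, then the arithmetic mean $\bar{x}$ is given by
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}.$$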
(ii) Short cut method : Arithmetic mean $\bar{x} = A + \frac{\sum f(x - A)}{\sum f}$
Where A = assumed mean, f = frequency and x – A = deviation of each item from the assumed mean.
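The two methods above can be compared in a minimal Python sketch. It assumes values and frequencies are passed as plain lists, and the function names are illustrative rather than taken from the notes. For any choice of the assumed mean A, both methods must return the same value. The data are the distribution from the first worked solution in these notes, with y = 8: values 1 to 5 with frequencies 4, 5, 8, 1, 2.

```python
def mean_direct(xs, fs):
    """Direct method: x_bar = sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_shortcut(xs, fs, assumed_mean):
    """Short-cut method: x_bar = A + sum(f*(x - A)) / sum(f)."""
    A = assumed_mean
    return A + sum(f * (x - A) for x, f in zip(xs, fs)) / sum(fs)


xs = [1, 2, 3, 4, 5]
fs = [4, 5, 8, 1, 2]
print(round(mean_direct(xs, fs), 6))       # 2.6
print(round(mean_shortcut(xs, fs, 3), 6))  # 2.6
```

The short-cut form only reduces hand arithmetic. In code the two are interchangeable.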
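Properties of arithmetic mean
(i) The algebraic sum of the deviations from the mean is zero: $\sum_{i=1}^{n} f_i(x_i - \bar{x}) = 0$, $\bar{x}$ being the mean of the distribution.
(ii) The sum of the squares of the deviations of a set of values is minimum when taken about the mean.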
@aakashallen
(iii) Mean of the composite series : If $\bar{x}_i$ (i = 1, 2, ..., k) are the means of k component series of sizes $n_i$ (i = 1, 2, ..., k) respectively, then the mean $\bar{x}$ of the composite series obtained on combining the component series is given by the formula
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} n_i\bar{x}_i}{\sum_{i=1}^{k} n_i}.$$
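The composite-mean formula can be sketched in Python, assuming group sizes and means arrive as two parallel lists. The helper `missing_group_mean` is hypothetical: it rearranges the same formula to recover one unknown group mean. The check uses the numbers of Example 2 (70 boys averaging 75, class average 72).

```python
def composite_mean(sizes, means):
    """Mean of a combined series: sum(n_i * xbar_i) / sum(n_i)."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def missing_group_mean(total_size, total_mean, sizes, means):
    """Mean of the single group whose average is unknown, found by
    rearranging the composite-mean formula."""
    rest = total_size - sum(sizes)  # size of the unknown group
    known = sum(n * m for n, m in zip(sizes, means))
    return (total_size * total_mean - known) / rest


print(composite_mean([70, 30], [75, 65]))       # 72.0
print(missing_group_mean(100, 72, [70], [75]))  # 65.0
```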
Geometric Mean.
If $x_1, x_2, x_3, \dots, x_n$ are n values of a variate x, all of them positive, then the geometric mean (G.M.) is given by
$$\text{G.M.} = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}, \quad \log(\text{G.M.}) = \frac{1}{n}(\log x_1 + \log x_2 + \dots + \log x_n).$$
In case of a frequency distribution, the G.M. of n values $x_1, x_2, \dots, x_n$ of a variate x occurring with frequencies $f_1, f_2, \dots, f_n$ is given by $\text{G.M.} = (x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n})^{1/N}$, where $N = f_1 + f_2 + \dots + f_n$.
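The logarithmic form above can be used directly, which avoids overflow when the product of many values is large. The following is a minimal sketch with an illustrative function name. Python's standard library also offers `statistics.geometric_mean` (Python 3.8+) for the unweighted case.

```python
import math


def geometric_mean(xs, fs=None):
    """G.M. = (x1^f1 * ... * xn^fn)^(1/N), computed via
    log G.M. = sum(f_i * log x_i) / N. Every x_i must be positive."""
    if fs is None:
        fs = [1] * len(xs)
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs positive values")
    N = sum(fs)
    log_gm = sum(f * math.log(x) for x, f in zip(xs, fs)) / N
    return math.exp(log_gm)


print(geometric_mean([4, 8, 16]))         # about 8
print(geometric_mean([2, 8], fs=[1, 2]))  # (2 * 8 * 8) ** (1/3), about 5.04
```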
Harmonic Mean.
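The harmonic mean of n items $x_1, x_2, \dots, x_n$ is defined as
$$\text{H.M.} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}.$$
If the items occur with frequencies $f_1, f_2, f_3, \dots, f_n$ respectively, then
$$\text{H.M.} = \frac{f_1 + f_2 + f_3 + \dots + f_n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_n}{x_n}}.$$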
Note : A.M. gives more weightage to larger values whereas G.M. and H.M. give more weightage to smaller values.
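A short sketch of the weighted H.M. formula, checked against the ordering A.M. ≥ G.M. ≥ H.M. for positive values. The function name is illustrative. The values 4, 8, 16 are those of Example 4, whose H.M. is 48/7.

```python
import math


def harmonic_mean(xs, fs=None):
    """H.M. = sum(f_i) / sum(f_i / x_i); no x_i may be zero."""
    if fs is None:
        fs = [1] * len(xs)
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


values = [4, 8, 16]
am = sum(values) / len(values)                # 28/3, about 9.33
gm = math.prod(values) ** (1 / len(values))  # about 8
hm = harmonic_mean(values)                   # 48/7, about 6.857
print(am, gm, hm)
```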
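Solution: (c) We know that, Mean $= \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$, i.e.
$$2.6 = \frac{1 \times 4 + 2 \times 5 + 3 \times y + 4 \times 1 + 5 \times 2}{4 + 5 + y + 1 + 2} = \frac{28 + 3y}{12 + y},$$
or $31.2 + 2.6y = 28 + 3y$, or $0.4y = 3.2$, so $y = 8$.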
Example: 2 In a class of 100 students there are 70 boys whose average marks in a subject are 75. If the average marks of the complete class are 72, then what are the average marks of the girls?
(a) 73 (b) 65 (c) 68 (d) 74
Solution: (b) Let the average marks of the girl students be x. The number of girls is 100 − 70 = 30, so
$$72 = \frac{70 \times 75 + 30x}{100}, \quad\text{i.e.,}\quad x = \frac{7200 - 5250}{30} = 65.$$
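Example: 3 If the mean of the set of numbers $x_1, x_2, x_3, \dots, x_n$ is $\bar{x}$, then the mean of the numbers $x_i + 2i$, $1 \le i \le n$, is
Solution: (b) We know that $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, i.e., $\sum_{i=1}^{n} x_i = n\bar{x}$. Therefore
$$\frac{\sum_{i=1}^{n}(x_i + 2i)}{n} = \frac{\sum_{i=1}^{n} x_i + 2\sum_{i=1}^{n} i}{n} = \frac{n\bar{x} + 2(1 + 2 + \dots + n)}{n} = \frac{n\bar{x} + 2 \cdot \frac{n(n+1)}{2}}{n} = \bar{x} + (n + 1).$$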
Example: 4 The harmonic mean of 4, 8, 16 is
(a) 6.4 (b) 6.7 (c) 6.85 (d) 7.8
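Solution: (c) H.M. of 4, 8, 16 $= \frac{3}{\frac{1}{4} + \frac{1}{8} + \frac{1}{16}} = \frac{3}{7/16} = \frac{48}{7} = 6.857\ldots$, which matches option (c).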
Example: 5 The average of n numbers $x_1, x_2, x_3, \dots, x_n$ is M. If $x_n$ is replaced by $x'$, then the new average is
(a) $M - x_n + x'$ (b) $\frac{nM - x_n + x'}{n}$ (c) $\frac{(n-1)M + x'}{n}$ (d) $\frac{M - x_n + x'}{n}$
Solution: (b) $M = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$, i.e., $nM = x_1 + x_2 + x_3 + \dots + x_{n-1} + x_n$.
So $nM - x_n = x_1 + x_2 + x_3 + \dots + x_{n-1}$ and $nM - x_n + x' = x_1 + x_2 + x_3 + \dots + x_{n-1} + x'$.
Hence, new average $= \frac{nM - x_n + x'}{n}$.
Example: 6 Mean of 100 items is 49. It was discovered that three items which should have been 60, 70, 80 were wrongly read as 40, 20, 50
respectively. The correct mean is
(a) 48 (b) 82½ (c) 50 (d) 80
Solution: (c) Sum of 100 items = 49 × 100 = 4900
Sum of items added = 60 + 70 + 80 = 210
Sum of items replaced = 40 + 20 + 50 = 110
New sum = 4900 + 210 − 110 = 5000
Correct mean = 5000/100 = 50
Median.
Median is defined as the value of the item or observation above and below which lie an equal number of observations, i.e., the median is the central value of the set of observations when all the observations are arranged in ascending or descending order.
(iii) For grouped or continuous distributions : In this case, the following formula can be used.
(a) For a series in ascending order, $\text{Median} = l + \frac{\frac{N}{2} - C}{f} \times i$
Where l = Lower limit of the median class
f = Frequency of the median class
N = The sum of all frequencies
i = The width of the median class
C = The cumulative frequency of the class preceding the median class.
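The median formula can be sketched in Python under a few assumptions: classes of equal width, given by their lower limits in ascending order. The function name is illustrative. The loop finds the median class as the first class whose cumulative frequency reaches N/2. C is then the cumulative frequency before that class.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous series with equal class width i.

    lower_limits: lower limit of each class, in ascending order
    width: the common class width i
    freqs: frequency f of each class
    Uses Median = l + ((N/2 - C) / f) * i.
    """
    N = sum(freqs)
    if N == 0:
        raise ValueError("total frequency must be positive")
    half = N / 2
    C = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if f > 0 and C + f >= half:  # first class whose c.f. reaches N/2
            return l + (half - C) / f * width
        C += f
    raise ValueError("median class not found")


# Classes 0-10, 10-20, 20-30, 30-40 with frequencies 1, 3, 4, 2
print(grouped_median([0, 10, 20, 30], 10, [1, 3, 4, 2]))  # 22.5
```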
Just as the median divides a distribution into two equal parts, the quartiles, quintiles, deciles and percentiles divide the distribution into 4, 5, 10 and 100 equal parts respectively. The jth quartile is given by $Q_j = l + \frac{\frac{jN}{4} - C}{f} \times i$; $j = 1, 2, 3$, where l, f and C now refer to the class containing the jth quartile. $Q_1$ is the lower quartile, $Q_2$ is the median and $Q_3$ is called the upper quartile.
(5) Percentile : Percentiles divide the total frequency N into a hundred equal parts:
$$P_k = l + \frac{\frac{kN}{100} - C}{f} \times i,$$
where k = 1, 2, 3, 4, 5, ..., 99.
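The quartile and percentile formulas share one pattern: locate the class containing the (kN/parts)th item, then interpolate within it. Below is a minimal sketch, assuming equal class widths. `partition_value` is an illustrative name, not something from the notes.

```python
def partition_value(lower_limits, width, freqs, k, parts):
    """k-th partition value of a continuous series with equal class width.

    parts=4 gives the quartile Q_k, parts=10 the decile D_k and
    parts=100 the percentile P_k, using l + ((k*N/parts - C) / f) * i.
    """
    if not 0 < k < parts:
        raise ValueError("k must satisfy 0 < k < parts")
    N = sum(freqs)
    target = k * N / parts
    C = 0  # cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if f > 0 and C + f >= target:
            return l + (target - C) / f * width
        C += f
    raise ValueError("empty distribution")


lows, freqs = [0, 10, 20, 30], [1, 3, 4, 2]
print(partition_value(lows, 10, freqs, 1, 4))     # Q1 = 15.0
print(partition_value(lows, 10, freqs, 3, 4))     # Q3 = 28.75
print(partition_value(lows, 10, freqs, 90, 100))  # P90 = 35.0
```

With k = 2 and parts = 4 (or k = 50 and parts = 100) the same function reproduces the median formula.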
Height (in cm) 150 152 154 155 156 160 161
Number of students 8 4 3 7 3 12 4
Cumulative frequency 8 12 15 22 25 37 41
Here, the total number of items is 41, i.e., an odd number. Hence, the median is the $\left(\frac{41 + 1}{2}\right)$th, i.e., the 21st item.
From the cumulative frequency table, we find that the median, i.e., the 21st item, is 155.
Example: 8 The median of a set of 9 distinct observations is 20.5. If each of the largest 4 observations of the set is increased by 2, then the median of the new set
(a) Is increased by 2 (b) Is decreased by 2
(c) Is two times the original median (d) Remains the same as that of the original set
Solution: (d) Here n = 9, so the median is the $\left(\frac{9 + 1}{2}\right)$th = 5th term. Since only the last four observations are increased by 2, the 5th observation, which is the median, remains unchanged.
C = Total of all frequencies preceding median class = 50
i = Width of class interval of median class = 10
Required median $= l + \frac{\frac{N}{2} - C}{f} \times i = 30 + \frac{\frac{159}{2} - 50}{45} \times 10 = 30 + \frac{295}{45} = 36.555\ldots \approx 36.56$.
Mode.
Mode : The mode or modal value of a distribution is that value of the variable for which the frequency is maximum. For a continuous series, the mode is calculated as
$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$
Where, $l_1$ = The lower limit of the modal class
$f_1$ = The frequency of the modal class
$f_0$ = The frequency of the class preceding the modal class
$f_2$ = The frequency of the class succeeding the modal class
i = The size of the modal class.
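The modal-class formula can be sketched in Python, assuming equal class widths. The function name and the two edge-case rules are assumptions, not taken from the notes:
- a missing neighbour frequency ($f_0$ or $f_2$) at either end is treated as 0;
- with tied maximum frequencies, the first such class is used.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = l1 + ((f1 - f0) / (2*f1 - f0 - f2)) * i, equal class widths.

    Assumption: if the modal class is the first or last class, the missing
    neighbour frequency f0 or f2 is taken as 0. With tied maximum
    frequencies, the first such class is used (max() keeps the first).
    """
    m = max(range(len(freqs)), key=lambda j: freqs[j])  # modal class index
    f1 = freqs[m]
    f0 = freqs[m - 1] if m > 0 else 0
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("2*f1 - f0 - f2 is zero; the formula is undefined")
    return lower_limits[m] + (f1 - f0) / denom * width


print(grouped_mode([0, 10, 20, 30], 10, [1, 3, 4, 2]))  # about 23.33
```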
Symmetric distribution : A distribution is said to be symmetric if the values of its mean, mode and median coincide. In a symmetric distribution, frequencies are symmetrically distributed on both sides of the centre point of the frequency curve.
[Figure: frequency curves showing the relative positions of the mean, median and mode]
A distribution which is not symmetric is called a skewed distribution. In a moderately asymmetrical distribution, the interval between the mean and the median is approximately one-third of the interval between the mean and the mode, i.e., Mean − Mode = 3(Mean − Median), or Mode = 3 Median − 2 Mean. This is known as the empirical relation.
Example: 11 Consider the following statements
(1) Mode can be computed from histogram
(2) Median is not independent of change of scale
(3) Variance is independent of change of origin and scale
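Which of these is/are correct?
(a) (1), (2) and (3) (b) Only (2) (c) Only (1) and (2) (d) Only (1)
Solution: (c) The mode can be located from a histogram, so (1) is correct. Multiplying every observation by k multiplies the median by k, so the median depends on the scale and (2) is correct. Variance is unaffected by a change of origin but is multiplied by k² under a change of scale, so (3) is wrong.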
Important Tips
Some points about arithmetic mean
• Of all types of averages, the arithmetic mean is the most commonly used.
• It is based upon all observations.
• If the number of observations is very large, it is a more accurate and more reliable basis for comparison.
Some points about geometric mean
• It is based on all items of the series.
• It is most suitable for constructing index numbers, average ratios, percentages etc.
• G.M. cannot be calculated if the size of any of the items is zero or negative.
Some points about H.M.
• It is based on all items of the series.
• It is useful in problems involving rates, ratios, time etc.
• For positive values, A.M. ≥ G.M. ≥ H.M.; for two positive numbers, also (G.M.)² = (A.M.)(H.M.).
Some points about median
• It is an appropriate average in dealing with qualitative data, like intelligence, wealth etc.
• The sum of the deviations of the items from median, ignoring algebraic signs, is less than the sum from any other point.
Some points about mode
• It is not based on all items of the series.
• Compared with other averages, the mode is affected to a large extent by fluctuations of sampling.
• It is not suitable in cases where the relative importance of items has to be considered.
Example: 12 For a slightly asymmetric distribution, the mean and median are 5 and 6 respectively. What is its mode? [DCE 1998]
(a) 5 (b) 6 (c) 7 (d) 8
Solution: (d) We know that
Mode = 3Median – 2Mean
= 3(6) – 2(5) = 8
Example: 13 A pie chart is to be drawn for representing the following data
Items of expenditure Number of families
Education 150
Food and clothing 400
House rent 40
Electricity 250
Miscellaneous 160
The value of the central angle for food and clothing would be [NDA 1998]
(a) 90° (b) 2.8° (c) 150° (d) 144°
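Solution: (d) Total number of families = 150 + 400 + 40 + 250 + 160 = 1000, so the required angle for food and clothing $= \frac{400}{1000} \times 360^\circ = 144^\circ$.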
Measures of Dispersion.
The degree to which numerical data tend to spread about an average value is called the dispersion of the data. The four measures of dispersion are
(1) Range (2) Mean deviation (3) Standard deviation (4) Square deviation
(1) Range : It is the difference between the values of extreme items in a series. Range = Xmax – Xmin
The coefficient of range (scatter) $= \frac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}}$.
Range is not a measure of central tendency; it is a measure of dispersion. Range is widely used in statistical series relating to quality control in production.
(i) Inter-quartile range : We know that quartiles are the magnitudes of the items which divide the distribution into four
equal parts. The inter-quartile range is found by taking the difference between third and first quartiles and is given by the formula
Inter-quartile range $= Q_3 - Q_1$
Where Q1 = First quartile or lower quartile and Q3 = Third quartile or upper quartile.
(ii) Percentile range : This is measured by the following formula
Percentile range $= P_{90} - P_{10}$
Where P90 = 90th percentile and P10 = 10th percentile.
Percentile range is considered better than range as well as inter-quartile range.
(iii) Quartile deviation or semi inter-quartile range : It is one-half of the difference between the third quartile and the first quartile, i.e., $\text{Q.D.} = \frac{Q_3 - Q_1}{2}$, and the coefficient of quartile deviation $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$.
Where $Q_3$ is the third or upper quartile and $Q_1$ is the first or lower quartile.
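The range-based measures above can be collected in one small sketch. The function name is illustrative. The quartiles are passed in rather than computed, because the rule for locating quartiles in ungrouped data differs between textbooks. The data are hypothetical and chosen so that the (n + 1)/4 rule gives $Q_1 = 10$ and $Q_3 = 19$.

```python
def range_measures(data, q1, q3):
    """Range-based measures of dispersion for an individual series.

    q1 and q3 are supplied by the caller, because the rule used to locate
    quartiles in ungrouped data varies between textbooks.
    """
    x_max, x_min = max(data), min(data)
    return {
        "range": x_max - x_min,
        "coefficient_of_range": (x_max - x_min) / (x_max + x_min),
        "inter_quartile_range": q3 - q1,
        "quartile_deviation": (q3 - q1) / 2,
        "coefficient_of_qd": (q3 - q1) / (q3 + q1),
    }


data = [5, 10, 12, 15, 17, 19, 25]  # hypothetical, already sorted, n = 7
# With the (n + 1)/4 rule, Q1 is the 2nd item and Q3 the 6th item.
r = range_measures(data, q1=data[1], q3=data[5])
print(r)
```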
(2) Mean deviation : The arithmetic average of the deviations (all taken as positive) from the mean, median or mode is known as the mean deviation.
(i) Mean deviation from ungrouped data (or individual series)
$$\text{Mean deviation} = \frac{\sum |x - M|}{n}$$
Where $|x - M|$ is the modulus of the deviation of the variate from M (the mean, median or mode) and n is the number of terms.
(ii) Mean deviation from continuous series : First, find the mean M from which the deviations are to be taken. Then find the deviation $d_M = |x - M|$ of each variate (the mid value of each class) from the mean M so obtained.
Next, multiply these deviations by the corresponding frequencies to get the products $f \cdot d_M$, and then find the sum $\sum f d_M$ of these products.
Lastly, use the formula
$$\text{Mean deviation} = \frac{\sum f|x - M|}{n} = \frac{\sum f d_M}{n}, \quad\text{where } n = \sum f.$$
Important Tips
Mean coefficient of dispersion $= \frac{\text{Mean deviation from the mean}}{\text{Mean}}$
Median coefficient of dispersion $= \frac{\text{Mean deviation from the median}}{\text{Median}}$
Mode coefficient of dispersion $= \frac{\text{Mean deviation from the mode}}{\text{Mode}}$
In general, mean deviation (M.D.) always stands for mean deviation about median.
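A minimal sketch of mean deviation about any chosen point M, covering both individual and continuous series, together with the coefficients of dispersion listed above. The function name and the individual series are hypothetical. The continuous series reuses classes 0-40 with frequencies 1, 3, 4, 2 (mid values 5 to 35, mean 22). On this data the deviation about the median comes out smaller than about the mean, in line with the earlier tip on the median.

```python
from statistics import mean, median


def mean_deviation(xs, about, fs=None):
    """Mean deviation = sum(f * |x - M|) / sum(f), taken about the point M."""
    if fs is None:
        fs = [1] * len(xs)
    return sum(f * abs(x - about) for x, f in zip(xs, fs)) / sum(fs)


data = [2, 4, 6, 8, 30]           # hypothetical individual series
m, md = mean(data), median(data)  # 10 and 6
md_mean = mean_deviation(data, m)     # 8.0
md_median = mean_deviation(data, md)  # 6.4
print(md_mean, md_median)
print("mean coefficient of dispersion:", md_mean / m)       # 0.8
print("median coefficient of dispersion:", md_median / md)  # about 1.067

# Continuous series: x = class mid values, M = mean of the series (22 here)
md_grouped = mean_deviation([5, 15, 25, 35], 22, [1, 3, 4, 2])
print(md_grouped)  # 7.6
```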
(3) Standard deviation : Standard deviation (or S.D.) is the square root of the arithmetic mean of the squares of the deviations of the various values from their arithmetic mean. It is generally denoted by σ, read as sigma.
(i) Coefficient of standard deviation : To compare the dispersion of two frequency distributions, a relative measure of standard deviation is computed. It is known as the coefficient of standard deviation and is given by
Coefficient of S.D. $= \frac{\sigma}{\bar{x}}$, where $\bar{x}$ is the A.M.
(ii) Standard deviation from individual series
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$
where, $\bar{x}$ = The arithmetic mean of the series
N = The total frequency.
(iii) Standard deviation from continuous series
$$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$$
where, $\bar{x}$ = Arithmetic mean of the series
$x_i$ = Mid value of the class
$f_i$ = Frequency of the corresponding $x_i$
$N = \sum f$ = The total frequency
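Short cut method
(i) $\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$ (ii) $\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$
where, d = x − A = Deviation from the assumed mean A
f = Frequency of the item
$N = \sum f$ = Sum of frequencies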
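The direct formula and short-cut formula (i) can be checked against each other in a short sketch with illustrative function names. The data are the class mid values and frequencies of Example 18, whose S.D. is 9. The short-cut result does not depend on the assumed mean A.

```python
import math


def sd_direct(xs, fs):
    """sigma = sqrt(sum(f * (x - x_bar)^2) / N)."""
    N = sum(fs)
    xbar = sum(f * x for x, f in zip(xs, fs)) / N
    return math.sqrt(sum(f * (x - xbar) ** 2 for x, f in zip(xs, fs)) / N)


def sd_shortcut(xs, fs, A):
    """sigma = sqrt(sum(f d^2)/N - (sum(f d)/N)^2), with d = x - A."""
    N = sum(fs)
    d = [x - A for x in xs]
    first = sum(f * di for di, f in zip(d, fs)) / N
    second = sum(f * di * di for di, f in zip(d, fs)) / N
    return math.sqrt(second - first ** 2)


mids = [5, 15, 25, 35]
freqs = [1, 3, 4, 2]
print(sd_direct(mids, freqs), sd_shortcut(mids, freqs, A=25))  # 9.0 9.0
```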
Variance.
The square of standard deviation is called the variance.
Coefficient of standard deviation and coefficient of variation : The coefficient of standard deviation is the ratio of the S.D. to the A.M., i.e., $\frac{\sigma}{\bar{x}}$.
Coefficient of variation = coefficient of S.D. × 100 $= \frac{\sigma}{\bar{x}} \times 100$.
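Variance of the combined series : If $n_1, n_2$ are the sizes, $\bar{x}_1, \bar{x}_2$ the means and $\sigma_1, \sigma_2$ the standard deviations of two series, then
$$\sigma^2 = \frac{1}{n_1 + n_2}\left[n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right]$$
Where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$ and $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$.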
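A minimal sketch that applies the combined-variance formula to two hypothetical series and compares the result with the variance of the pooled data. It also computes the coefficient of variation. The variance uses divisor N, as in these notes. All function names are illustrative.

```python
import math


def pvar(xs):
    """Variance with divisor N, as used in these notes."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)


def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """sigma^2 = [n1(sigma1^2 + d1^2) + n2(sigma2^2 + d2^2)] / (n1 + n2)."""
    xbar = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - xbar, mean2 - xbar
    return (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / (n1 + n2)


def coefficient_of_variation(xs):
    """C.V. = (sigma / x_bar) * 100."""
    return math.sqrt(pvar(xs)) / (sum(xs) / len(xs)) * 100


a = [2, 4, 6]         # hypothetical series 1
b = [10, 12, 14, 16]  # hypothetical series 2
pooled = combined_variance(len(a), sum(a) / len(a), pvar(a),
                           len(b), sum(b) / len(b), pvar(b))
print(pooled, pvar(a + b))          # both about 23.84
print(coefficient_of_variation(a))  # about 40.8
```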
Important Tips
Range is widely used in statistical series relating to quality control in production.
Standard deviation ≤ Range, i.e., variance ≤ (Range)².
Skewness.
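“Skewness” measures the lack of symmetry. It is measured by the coefficient
$$\gamma_1 = \frac{\frac{1}{n}\sum (x_i - \mu)^3}{\left\{\frac{1}{n}\sum (x_i - \mu)^2\right\}^{3/2}},$$
where μ is the mean of the observations.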
The distribution is skewed if,
(i) Mean ≠ Median ≠ Mode
(ii) Quartiles are not equidistant from the median and
(iii) The frequency curve is stretched more to one side than to the other.
(1) Distribution : There are three types of distributions.
(i) Symmetric distribution : When $\gamma_1 = 0$, the distribution is said to be symmetric (the normal distribution is of this type). In this case
Mean = Median = Mode
(ii) Positively skewed distribution : When $\gamma_1 > 0$, the distribution is said to be positively skewed. In this case
Mean > Median > Mode
(iii) Negatively skewed distribution : When $\gamma_1 < 0$, the distribution is said to be negatively skewed. In this case
Mean < Median < Mode
(2) Measures of skewness
(i) Absolute measures of skewness : Various measures of skewness are
(a) $S_K = M - M_d$ (b) $S_K = M - M_o$ (c) $S_K = Q_3 + Q_1 - 2M_d$
where $M_d$ = median, $M_o$ = mode, M = mean
Absolute measures of skewness are not useful for comparing two series; therefore relative measures of skewness are used, as they are pure numbers.
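The moment coefficient $\gamma_1$ and the absolute measure $S_K = M - M_d$ can be computed together in a small sketch. The sign of $\gamma_1$ then gives the three cases listed above. Function names and data are hypothetical, and a tolerance of 1e-12 is assumed for treating $\gamma_1$ as zero.

```python
from statistics import median


def moment_skewness(xs):
    """gamma_1 = m3 / m2**1.5, where m_r = (1/n) * sum((x - mean)**r)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("all observations are equal; skewness is undefined")
    return m3 / m2 ** 1.5


def describe(xs, tol=1e-12):
    """Return (gamma_1, S_K = M - M_d, type of distribution)."""
    g1 = moment_skewness(xs)
    sk = sum(xs) / len(xs) - median(xs)  # absolute measure (a)
    if abs(g1) < tol:
        kind = "symmetric"
    elif g1 > 0:
        kind = "positively skewed"
    else:
        kind = "negatively skewed"
    return g1, sk, kind


print(describe([1, 2, 3, 10]))       # gamma_1 about 1.02, S_K = 1.5
print(describe([1, 2, 3]))           # gamma_1 = 0, symmetric
print(describe([-10, -3, -2, -1]))   # mirror image: negatively skewed
```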
Mean deviation $= \frac{\sum |x_i - M|}{n} = \frac{\sum |x_i - 47|}{10} = \frac{13 + 9 + 5 + 3 + 1 + 1 + 7 + 8 + 16 + 23}{10} = \frac{86}{10} = 8.6$
Example: 15 The S.D. of a set of data is 6. When each observation is increased by 1, the S.D. of the new data is
(a) 5 (b) 7 (c) 6 (d) 8
Solution: (c) The S.D. and variance of data are not changed when each observation is increased (or decreased) by the same constant.
Example: 16 In a series of 2n observations, half of them equal a and the remaining half equal −a. If the standard deviation of the observations is 2, then |a| equals
(a) $\frac{\sqrt{2}}{n}$ (b) $\sqrt{2}$ (c) 2 (d) $\frac{1}{n}$
Solution: (c) The observations are a, a, ..., a (n times) and −a, −a, ..., −a (n times), so the mean is 0 and
$$\text{S.D.} = \sqrt{\frac{n(a - 0)^2 + n(-a - 0)^2}{2n}} = \sqrt{\frac{na^2 + na^2}{2n}} = \sqrt{a^2} = |a|.$$
Hence $|a| = 2$.
Example: 17 If μ is the mean of the distribution $(y_i, f_i)$, then $\sum f_i(y_i - \mu) =$
(a) M.D. (b) S.D. (c) 0 (d) Relative frequency
Solution: (c) We have $\sum f_i(y_i - \mu) = \sum f_i y_i - \mu\sum f_i = \sum f_i y_i - \frac{\sum f_i y_i}{\sum f_i}\sum f_i = 0$.
Example: 18 What is the standard deviation of the following series
Measurements 0-10 10-20 20-30 30-40
Frequency 1 3 4 2
(a) 81 (b) 7.6 (c) 9 (d) 2.26
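Solution: (c) Take A = 25 and class width c = 10, and let $u_i = \frac{y_i - A}{c}$, where $y_i$ is the mid value of the class.
Class     Frequency (f)   Mid value (y)   u    fu    fu²
0-10      1               5               −2   −2    4
10-20     3               15              −1   −3    3
20-30     4               25              0    0     0
30-40     2               35              1    2     2
Total     10                                   −3    9
$$\sigma^2 = c^2\left[\frac{\sum f_i u_i^2}{\sum f_i} - \left(\frac{\sum f_i u_i}{\sum f_i}\right)^2\right] = 10^2\left[\frac{9}{10} - \left(\frac{-3}{10}\right)^2\right] = 100 \times \frac{90 - 9}{100} = 81$$
Hence σ = 9.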
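Example: 19 In an experiment with 15 observations on x, the following results were available: $\sum x^2 = 2830$, $\sum x = 170$. One observation that was 20 was found to be wrong and was replaced by the correct value 30. Then the corrected variance is
Solution: Corrected $\sum x = 170 - 20 + 30 = 180$ and corrected $\sum x^2 = 2830 - 20^2 + 30^2 = 3330$, so the corrected variance $= \frac{3330}{15} - \left(\frac{180}{15}\right)^2 = 222 - 144 = 78$.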
$Q_3$ = size of $\left(\frac{3(n+1)}{4}\right)$th item = size of 6th item = 19
Then Q.D. $= \frac{Q_3 - Q_1}{2} = \frac{19 - 10}{2} = 4.5$
Example: 21 Karl Pearson’s coefficient of skewness of a distribution is 0.32. Its S.D. is 6.5 and its mean is 39.6. Then the median of the distribution is given by
(a) 28.61 (b) 38.81 (c) 29.13 (d) 28.31
Solution: (b) We know that $S_k = \frac{M - M_o}{\sigma}$, where M = Mean, $M_o$ = Mode, σ = S.D.
i.e., $0.32 = \frac{39.6 - M_o}{6.5}$, so $M_o = 39.6 - 2.08 = 37.52$. We also know that $M_o = 3\,\text{Median} - 2\,\text{Mean}$, so
$37.52 = 3(\text{Median}) - 2(39.6)$, giving Median $= \frac{116.72}{3} \approx 38.91$, which is closest to option (b).
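Example: 22 The S.D. of a variate x is σ. The S.D. of the variate $\frac{ax + b}{c}$, where a, b, c are constants, is
(a) $\frac{a}{c}\sigma$ (b) $\left|\frac{a}{c}\right|\sigma$ (c) $\frac{a^2}{c^2}\sigma$ (d) None of these
Solution: (b) Let $y = \frac{ax + b}{c}$, i.e., $y = \frac{a}{c}x + \frac{b}{c}$, i.e., $y = Ax + B$, where $A = \frac{a}{c}$, $B = \frac{b}{c}$.
Then $\bar{y} = A\bar{x} + B$, so $y - \bar{y} = A(x - \bar{x})$ and $(y - \bar{y})^2 = A^2(x - \bar{x})^2$.
Summing, $\sum (y - \bar{y})^2 = A^2 \sum (x - \bar{x})^2$, i.e., $n\sigma_y^2 = A^2 n\sigma_x^2$, so $\sigma_y^2 = A^2\sigma_x^2$ and $\sigma_y = |A|\sigma_x = \left|\frac{a}{c}\right|\sigma$.
Thus, new S.D. $= \left|\frac{a}{c}\right|\sigma$.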