Statistics - Chapter Notes
1. DEFINITION :
2. DATA :
A measure of central value gives a rough idea of where the data points are centred. Mean, median and mode
are the three measures of central tendency.
(A) MEAN :
The mean is the most common measure of central tendency and the one that can be mathematically
manipulated. It is defined as the average of a distribution and is equal to $\sum X / N$. Simply, the mean is
computed by summing all the scores in the distribution ($\sum X$) and dividing that sum by the total number of
scores ($N$).
Page # 14
www.rancho.in
(I) Arithmetic mean of individual series (Ungrouped data) :
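If the series in this case is $x_1, x_2, x_3, \ldots, x_n$, then the arithmetic mean $\bar{x}$ is given by
$$\bar{x} = \frac{\text{Sum of the series}}{\text{Number of terms}}, \quad \text{i.e.,} \quad \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i .$$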
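For values $x_i$ occurring with frequencies $f_i$,
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{N} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i , \quad \text{where } N = \sum_{i=1}^{n} f_i .$$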
$$\text{Arithmetic mean } (\bar{x}) = A + \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A),$$
where $A$ = assumed mean, $f$ = frequency and $x - A$ = deviation of each item from the assumed mean.
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} .$$
Properties of arithmetic mean :
If each of the values of a variable 'X' is increased or decreased by some constant k, then the arithmetic mean
is also increased or decreased by k.
Similarly, when the values of the variable 'X' are multiplied/divided by a constant, say k, the arithmetic mean is also
multiplied/divided by the same quantity k.
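Both properties can be checked numerically. The sketch below is illustrative only: the sample values and the constant `k` are made up and do not come from these notes.

```python
from statistics import mean

# Hypothetical sample, chosen only for illustration.
data = [4, 8, 15, 16, 23, 42]
k = 5

shifted = [x + k for x in data]  # every value increased by k
scaled = [x * k for x in data]   # every value multiplied by k

# The mean moves by k in the first case and is multiplied by k in the second.
print(mean(shifted) == mean(data) + k)
print(mean(scaled) == mean(data) * k)
```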
Illustration :
The mean weight of 150 persons in a group is 60 kg. The mean weight of men in the group is 70 kg and
that of the women is 55 kg. Find the number of men and women.
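Sol. Number of persons = 150; their mean weight = 60 kg;
mean weight of men ($\bar{x}_1$) = 70 kg and mean weight of women ($\bar{x}_2$) = 55 kg.
Let $n_1$ be the number of men and $n_2$ the number of women, so that $n_1 + n_2 = 150$. Then
$$60 = \frac{70 n_1 + 55 n_2}{150}$$
or $14 n_1 + 11 n_2 = 1800$
or $14 n_1 + 11(150 - n_1) = 1800$
or $3 n_1 = (1800 - 1650) = 150$
or $n_1 = 50$ and $n_2 = 100$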
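The same pair of equations can be solved in a few lines. This is a minimal sketch; the function name `split_group` is hypothetical, and it simply solves $n_1 + n_2 = n$ together with $n_1 \bar{x}_1 + n_2 \bar{x}_2 = n \bar{x}$.

```python
def split_group(total, mean_all, mean_a, mean_b):
    """Return the two group sizes given the overall mean and both group means."""
    n_a = total * (mean_all - mean_b) / (mean_a - mean_b)
    return n_a, total - n_a

# Values from the illustration: 150 persons, overall mean 60 kg, men 70 kg, women 55 kg.
men, women = split_group(150, 60, 70, 55)
print(men, women)
```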
Illustration :
Find the mean of the following data :
Marks obtained        10-20   20-30   30-40   40-50   50-60   60-70   70-80
Number of students      2       3       8      14       8       3       2
Sol. Method-1 :
Marks obtained   Number of students (f_i)   Mid-points (x_i)   f_i x_i
10-20                     2                       15               30
20-30                     3                       25               75
30-40                     8                       35              280
40-50                    14                       45              630
50-60                     8                       55              440
60-70                     3                       65              195
70-80                     2                       75              150
Total                    40                                      1800
$$N = \sum_{i=1}^{7} f_i = 40, \qquad \sum_{i=1}^{7} f_i x_i = 1800$$
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{7} f_i x_i = \frac{1800}{40} = 45$$
Method-2 :
Assumed mean $a = \frac{10 + 80}{2} = 45$, $h = 10$

Marks obtained   Number of students (f_i)   Mid-points (x_i)   d_i = (x_i - 45)/10   f_i d_i
10-20                     2                       15                  -3               -6
20-30                     3                       25                  -2               -6
30-40                     8                       35                  -1               -8
40-50                    14                       45                   0                0
50-60                     8                       55                   1                8
60-70                     3                       65                   2                6
70-80                     2                       75                   3                6
Total                    40                                                             0

$$\bar{x} = a + \frac{\sum_{i=1}^{7} f_i d_i}{N} \times h = 45 + \frac{0}{40} \times 10 = 45$$
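Both methods can be reproduced with a short script. This is a sketch using the mid-points and frequencies of the illustration; the function names are hypothetical.

```python
mid_points = [15, 25, 35, 45, 55, 65, 75]
freqs = [2, 3, 8, 14, 8, 3, 2]

def grouped_mean(xs, fs):
    """Direct method: sum of f_i * x_i divided by N."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def grouped_mean_step(xs, fs, a, h):
    """Step-deviation method with assumed mean a and class width h."""
    n = sum(fs)
    sum_fd = sum(f * (x - a) / h for x, f in zip(xs, fs))
    return a + sum_fd / n * h

print(grouped_mean(mid_points, freqs))
print(grouped_mean_step(mid_points, freqs, a=45, h=10))
```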
(B) MEDIAN :
(a) Definition : The median is the score that divides the distribution into halves; half of the scores are above
the median and half are below it when the data are arranged in numerical order. The median is also
referred to as the score at the 50th percentile in the distribution.
Calculation of median :
(i) Individual series : If the data is raw, arrange in ascending or descending order. Let n be the number of
observations.
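If n is odd, Median = value of the $\left(\frac{n+1}{2}\right)^{\text{th}}$ item.
If n is even, Median = $\frac{1}{2}\left[\text{value of the } \left(\frac{n}{2}\right)^{\text{th}} \text{ item} + \text{value of the } \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ item}\right]$.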
(ii) Discrete series : In this case, we first find the cumulative frequencies of the variable arranged in ascending
or descending order, and the median is given by Median = $\left(\frac{n}{2} + 1\right)^{\text{th}}$ observation, where $n$ is the total (final cumulative)
frequency.
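(iii) For grouped or continuous distributions : In this case, the following formula can be used.
$$\text{Median} = l + \frac{\frac{N}{2} - C}{f} \times i$$
where $l$ = lower limit of the median class, $f$ = frequency of the median class, $C$ = cumulative frequency of the class preceding the median class, $i$ = width of the median class and $N$ = total frequency.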
(b) Quartile : As the median divides a distribution into two equal parts, similarly the quartiles, quintiles, deciles
and percentiles divide the distribution respectively into 4, 5, 10 and 100 equal parts. The jth quartile is
given by
$$Q_j = l + \frac{j\frac{N}{4} - C}{f} \times i, \qquad j = 1, 2, 3.$$
Illustration :
The marks obtained by 10 students in an examination are 22, 26, 14, 30, 18, 11, 35, 41, 12, 32. What
is the median mark?
Sol. Number of students (n) = 10 and marks obtained by them = 22, 26, 14, 30, 18, 11, 35, 41, 12, 32
Arranging the given marks in the ascending order, we get 11, 12, 14, 18, 22, 26, 30, 32, 35, 41.
Since the number of students is even, the median of their marks
= Arithmetic mean of the $\left(\frac{10}{2}\right)^{\text{th}}$ and $\left(\frac{10 + 2}{2}\right)^{\text{th}}$ marks
= Arithmetic mean of the 5th and 6th marks
= $\frac{22 + 26}{2}$ = 24 Ans.
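The odd/even rule for an individual series translates directly into code. A minimal sketch (the function name `median_of` is hypothetical), applied to the marks of this illustration:

```python
def median_of(values):
    """Median of raw data using the odd/even rule for an individual series."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]                    # the ((n+1)/2)-th item, counting from 1
    return (s[n // 2 - 1] + s[n // 2]) / 2  # mean of the (n/2)-th and (n/2 + 1)-th items

marks = [22, 26, 14, 30, 18, 11, 35, 41, 12, 32]
print(median_of(marks))
```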
Illustration :
Calculate the median of the following data:
Here, $N = 60$. So $\frac{N}{2} = 30$.
The cumulative frequency just greater than $\frac{N}{2} = 30$ is 40 and the corresponding class is 40-50.
So, 40-50 is the median class.
$l = 40$, $f = 20$, $h = 10$, $F = 20$
Now, Median = $l + \frac{\frac{N}{2} - F}{f} \times h = 40 + \frac{30 - 20}{20} \times 10 = 45$ Ans.
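The grouped-median formula is easy to evaluate directly. The sketch below assumes only the quantities quoted in the illustration (l = 40, N = 60, F = 20, f = 20, h = 10); the function name is hypothetical.

```python
def grouped_median(l, n, c, f, h):
    """Median = l + ((N/2 - C) / f) * h for the median class of grouped data."""
    return l + (n / 2 - c) / f * h

# l: lower limit of the median class, n: total frequency,
# c: cumulative frequency before the median class, f: its frequency, h: class width.
print(grouped_median(l=40, n=60, c=20, f=20, h=10))
```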
(C) MODE :
Mode is the most frequent score in the distribution. A distribution where a single score is most frequent
has one mode and is called unimodal. When there are ties for the most frequent score, the distribution is
bimodal if two scores tie or multimodal if more than two scores tie.
Mode for continuous series
$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times i$$
Where, $l_1$ = The lower limit of the modal class
$f_1$ = The frequency of the modal class
$f_0$ = The frequency of the class preceding the modal class
$f_2$ = The frequency of the class succeeding the modal class
$i$ = The size of the modal class.
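As a sketch of how the formula is applied, the code below reuses the marks distribution from the earlier mean illustration (classes 10-20 to 70-80). Treating a missing neighbouring class as frequency 0 is an assumption of this sketch, and the function name is hypothetical.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l1 + (f1 - f0) / (2*f1 - f0 - f2) * i for the modal class."""
    k = freqs.index(max(freqs))                     # position of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # class before (assumed 0 if absent)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # class after (assumed 0 if absent)
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

lower_limits = [10, 20, 30, 40, 50, 60, 70]
freqs = [2, 3, 8, 14, 8, 3, 2]
print(grouped_mode(lower_limits, freqs, width=10))
```

For this symmetric distribution the mode coincides with the mean of 45 found earlier.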
Symmetric distribution :
A distribution is a symmetric distribution if the values of mean, mode and median coincide. In a symmetric
distribution frequencies are symmetrically distributed on both sides of the centre point of the frequency
curve.
Positively skewed :
A distribution is positively skewed when it has a tail extending out to the right (larger numbers). When a
distribution is positively skewed, the mean is greater than the median, reflecting the fact that the mean is
sensitive to each score in the distribution and is subject to large shifts when the sample is small and
contains extreme scores.
Negatively skewed :
A negatively skewed distribution has an extended tail pointing to the left (smaller numbers) and reflects
bunching of numbers in the upper part of the distribution with fewer scores at the lower end of the
measurement scale.
In a moderately asymmetric distribution, the interval between the mean and the median is approximately
one-third of the interval between the mean and the mode, i.e., we have the following empirical relation
between them:
Empirical formula : mode = 3 median - 2 mean
$$\text{Coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
where $\sigma$ is the standard deviation.
Measures of variability provide information about the degree to which individual scores are clustered
about or deviate from the average value in a distribution.
The degree to which numerical data tend to spread about an average value is called the dispersion of the
data. The four measures of dispersion are
(i) Range (ii) Mean deviation
(iii) Variance (iv) Standard deviation
Important Note :
(a) A small value for a measure of dispersion indicates that the data are clustered closely (the mean is therefore
representative of the data).
(b) A large value of dispersion indicates that the mean is not reliable (it is not representative of the data).
(i) Range :
The simplest measure of variability to compute and understand is the range. The range is the difference
between the highest and lowest score in a distribution. Because it is based solely on the most extreme
scores in the distribution and does not fully reflect the pattern of variation within a distribution, the range
is a very limited measure of variability.
$$\text{Coefficient of range} = \frac{L - S}{L + S}$$
L = Largest value
S = Smallest value
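A short sketch of both quantities, applied here to the ten marks from the earlier median illustration (the choice of data set and the function name are only for illustration):

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) = (L - S, (L - S) / (L + S))."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    return spread, spread / (largest + smallest)

marks = [22, 26, 14, 30, 18, 11, 35, 41, 12, 32]
spread, coefficient = range_and_coefficient(marks)
print(spread, round(coefficient, 3))
```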
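(a) Mean deviation from ungrouped data (or individual series)
$$\text{Mean deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - M| ,$$
where $\sum_{i=1}^{n} |x_i - M|$ is the sum of the moduli of the deviations of the variate from the average $M$ (mean, median
or mode).
For a frequency distribution in which $x_i$ occurs with frequency $f_i$,
$$\text{Mean deviation} = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - M| , \quad \text{where } N = \sum_{i=1}^{n} f_i .$$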
Illustration :
The scores of a batsman in ten innings are : 38, 70, 48, 34, 42, 55, 63, 46, 54, 44. Find the mean
deviation about the median.
Sol. Arranging the data in ascending order, we have
34, 38, 42, 44, 46, 48, 54, 55, 63, 70
Here n = 10. So, median is the A.M. of 5th and 6th observations.
Median, $M = \frac{46 + 48}{2} = 47$
Calculation of Mean Deviation
x_i      |d_i| = |x_i - 47|
38          9
70         23
48          1
34         13
42          5
55          8
63         16
46          1
54          7
44          3
Total    Σ|d_i| = 86
$$\text{M.D.} = \frac{1}{n}\sum |d_i| = \frac{86}{10} = 8.6 \text{ Ans.}$$
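The calculation can be verified with Python's standard `statistics.median`. This is only a check of the arithmetic in this illustration:

```python
from statistics import median

scores = [38, 70, 48, 34, 42, 55, 63, 46, 54, 44]

m = median(scores)                         # mean of the 5th and 6th ordered scores
total_dev = sum(abs(x - m) for x in scores)
mean_dev = total_dev / len(scores)
print(m, total_dev, mean_dev)
```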
Illustration :
Calculate the mean deviation from the median of the following data:
Age       16-20   21-25   26-30   31-35   36-40   41-45   46-50   51-55
Number      5       6      12      14      26      12      16       9
The given data do not form a continuous frequency distribution, but they can be converted into one
by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit of every class.
Sol. Calculation of Mean Deviation from Median
Here, $N = 100$. So $\frac{N}{2} = 50$.
The cumulative frequency just greater than $\frac{N}{2} = 50$ is 63 and the corresponding class is 35.5-40.5.
So, 35.5-40.5 is the median class.
$l = 35.5$, $f = 26$, $h = 5$, $C = 37$
Now, Median = $l + \frac{\frac{N}{2} - C}{f} \times h = 35.5 + \frac{50 - 37}{26} \times 5 = 38$ Ans.
Mean Deviation from median = $\frac{\sum f_i |d_i|}{N} = \frac{735}{100} = 7.35$ Ans.
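The figures 38 and 735 can be reproduced from the age table. The sketch below assumes the class boundaries 15.5, 20.5, ..., 55.5 obtained by the 0.5 adjustment and takes deviations of the class mid-points from the median.

```python
freqs = [5, 6, 12, 14, 26, 12, 16, 9]
h = 5
lower = [15.5 + h * k for k in range(len(freqs))]  # lower class boundaries
n = sum(freqs)

# Median class: first class whose cumulative frequency exceeds N/2.
cum = 0
for l, f in zip(lower, freqs):
    if cum + f > n / 2:
        median = l + (n / 2 - cum) / f * h
        break
    cum += f

mids = [l + h / 2 for l in lower]                  # class mid-points 18, 23, ..., 53
sum_f_dev = sum(f * abs(x - median) for x, f in zip(mids, freqs))
mean_dev = sum_f_dev / n
print(median, sum_f_dev, mean_dev)
```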
(iii) Variance or Var(X) or $\sigma^2$ :
The variance is a measure based on the deviations of individual scores from the mean. As noted in the
definition of the mean, however, simply summing the deviations will result in a value of 0. To get around
this problem, the variance is based on squared deviations of scores about the mean. When the deviations
are squared, the rank order and relative distance of scores in the distribution are preserved while negative
values are eliminated. Then, to control for the number of subjects in the distribution, the sum of the
squared deviations, $\sum (X - \bar{X})^2$, is divided by $N$ (population). The average of the squared
deviations is called the variance.
$$\text{Var}(X) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2$$
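$$\text{Var}(X) = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{X})^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2 , \quad \text{where } N = \sum_{i=1}^{n} f_i$$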
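$$\text{Var}(X) = h^2\left[\frac{1}{N}\sum f_i u_i^2 - \left(\frac{1}{N}\sum f_i u_i\right)^2\right], \quad \text{where } u_i = \frac{x_i - A}{h}$$
and $A$ is the assumed mean.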
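The definition and the shortcut form of the variance give the same number. The sketch below uses a made-up data set (not from these notes) and compares both with the standard library's `statistics.pvariance`, which computes the population variance.

```python
from statistics import pvariance

def variance_definition(xs):
    """Average of squared deviations about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def variance_shortcut(xs):
    """Mean of squares minus square of the mean."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values for illustration
print(variance_definition(data), variance_shortcut(data), pvariance(data))
```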
(iv) Standard Deviation :
The standard deviation (s or $\sigma$) is defined as the positive square root of the variance. The variance is a
measure in squared units and has little meaning with respect to the data. Thus, the standard deviation is
a measure of variability expressed in the same units as the data. The standard deviation is very much like
a mean or an "average" of these deviations.
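If two groups of $n_1$ and $n_2$ observations have means $\bar{x}_1$ and $\bar{x}_2$ and
standard deviations $\sigma_1$ and $\sigma_2$, then the mean $\bar{x}$ and the standard deviation $\sigma$ of the $n_1 + n_2$ observations,
taken together, are
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
$$\sigma^2 = \frac{1}{n_1 + n_2}\left[n_1\left(\sigma_1^2 + d_1^2\right) + n_2\left(\sigma_2^2 + d_2^2\right)\right]$$
where $d_1 = \bar{x} - \bar{x}_1$, $d_2 = \bar{x} - \bar{x}_2$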
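The combined formulas can be checked against a direct calculation on the pooled data. In this sketch the two groups are made-up values, and `fmean` and `pstdev` are the standard library's mean and population standard deviation.

```python
from math import isclose, sqrt
from statistics import fmean, pstdev

def combined_mean_sd(n1, m1, s1, n2, m2, s2):
    """Mean and standard deviation of two groups taken together."""
    m = (n1 * m1 + n2 * m2) / (n1 + n2)
    d1, d2 = m - m1, m - m2
    var = (n1 * (s1 ** 2 + d1 ** 2) + n2 * (s2 ** 2 + d2 ** 2)) / (n1 + n2)
    return m, sqrt(var)

# Hypothetical groups, used only to compare against pooling the raw data.
a = [1, 2, 3, 4]
b = [10, 12, 14]
m, s = combined_mean_sd(len(a), fmean(a), pstdev(a), len(b), fmean(b), pstdev(b))
print(isclose(m, fmean(a + b)), isclose(s, pstdev(a + b)))
```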
Illustration :
Calculate the mean and standard deviation of first n natural numbers.
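Sol. Here $x_i = i$; $i = 1, 2, \ldots, n$. Let $\bar{X}$ be the mean and $\sigma$ be the S.D. Then,
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i=1}^{n} i = \frac{1}{n}(1 + 2 + 3 + \cdots + n)$$
$$\bar{X} = \frac{n(n+1)}{2n} = \frac{n+1}{2}$$
and
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2 = \frac{1}{n}\left(1^2 + 2^2 + \cdots + n^2\right) - \left(\frac{n+1}{2}\right)^2$$
$$\sigma^2 = \frac{n(n+1)(2n+1)}{6n} - \left(\frac{n+1}{2}\right)^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2 - 1}{12}$$
Hence $\sigma = \sqrt{\frac{n^2 - 1}{12}}$ Ans.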
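A quick numerical check of the closed forms $(n+1)/2$ and $(n^2 - 1)/12$ for a few values of n, using the standard library's population variance (a sketch, not part of the derivation):

```python
from math import isclose
from statistics import fmean, pvariance

def natural_number_stats(n):
    """Closed-form mean and variance of 1, 2, ..., n."""
    return (n + 1) / 2, (n * n - 1) / 12

for n in (2, 5, 10, 100):
    numbers = list(range(1, n + 1))
    mean, var = natural_number_stats(n)
    print(n, isclose(mean, fmean(numbers)), isclose(var, pvariance(numbers)))
```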
Illustration :
The mean and variance of 7 observations are 8 and 16 respectively. If 5 of the observations are
2, 4, 10, 12, 14, find the remaining two observations.
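Sol. Let x and y be the remaining two observations. Then,
Mean = 8
$\Rightarrow \frac{2 + 4 + 10 + 12 + 14 + x + y}{7} = 8 \Rightarrow 42 + x + y = 56$
$\Rightarrow x + y = 14$ .....(i)
Variance = 16
$\Rightarrow \frac{1}{7}(2^2 + 4^2 + 10^2 + 12^2 + 14^2 + x^2 + y^2) - (\text{Mean})^2 = 16$
$\Rightarrow \frac{1}{7}(4 + 16 + 100 + 144 + 196 + x^2 + y^2) - 64 = 16 \Rightarrow 460 + x^2 + y^2 = 7 \times 80$
$\Rightarrow x^2 + y^2 = 100$ .....(ii)
Now, $(x + y)^2 + (x - y)^2 = 2(x^2 + y^2)$
$\Rightarrow 196 + (x - y)^2 = 200 \Rightarrow (x - y)^2 = 4 \Rightarrow x - y = \pm 2$ .....(iii)
Solving (i) and (iii), the remaining two observations are 6 and 8.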
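The two conditions $x + y = 14$ and $x^2 + y^2 = 100$ can also be solved programmatically through the identity $(x - y)^2 = 2(x^2 + y^2) - (x + y)^2$. A sketch, with a hypothetical helper name:

```python
from math import sqrt

def two_from_sum_and_squares(s, q):
    """Find x >= y with x + y = s and x^2 + y^2 = q."""
    diff_sq = 2 * q - s * s  # equals (x - y)^2
    if diff_sq < 0:
        raise ValueError("no real solution")
    d = sqrt(diff_sq)
    return (s + d) / 2, (s - d) / 2

known = [2, 4, 10, 12, 14]
n, mean, var = 7, 8, 16
s = n * mean - sum(known)                              # required x + y
q = n * (var + mean ** 2) - sum(v * v for v in known)  # required x^2 + y^2
x, y = two_from_sum_and_squares(s, q)
print(s, q, x, y)
```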
Illustration :
Find the variance and standard deviation for the following distribution:
Classes      30-40   40-50   50-60   60-70   70-80   80-90   90-100
Frequency      3       7      12      15       8       3       2
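Taking $A = 65$, $h = 10$ and $y_i = \frac{x_i - 65}{10}$, we get $N = 50$, $\sum f_i y_i = -15$ and $\sum f_i y_i^2 = 105$.
Therefore $\bar{x} = A + \frac{\sum f_i y_i}{N} \times h = 65 - \frac{15}{50} \times 10 = 62$
Variance $\sigma^2 = \frac{h^2}{N^2}\left[N \sum f_i y_i^2 - \left(\sum f_i y_i\right)^2\right] = \frac{(10)^2}{(50)^2}\left[50 \times 105 - (-15)^2\right] = \frac{1}{25}[5250 - 225] = 201$
Standard deviation $\sigma = \sqrt{201} \approx 14.18$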
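The step-deviation sums and the resulting mean and variance can be recomputed from the class table. The sketch assumes class mid-points 35, 45, ..., 95 with A = 65 and h = 10, as in the solution.

```python
from math import sqrt

mids = [35, 45, 55, 65, 75, 85, 95]
freqs = [3, 7, 12, 15, 8, 3, 2]
a, h = 65, 10                                # assumed mean and class width

n = sum(freqs)
ys = [(x - a) // h for x in mids]            # step deviations -3, ..., 3
sum_fy = sum(f * y for f, y in zip(freqs, ys))
sum_fy2 = sum(f * y * y for f, y in zip(freqs, ys))

mean = a + sum_fy / n * h
variance = h ** 2 * (n * sum_fy2 - sum_fy ** 2) / n ** 2
print(n, sum_fy, sum_fy2, mean, variance, round(sqrt(variance), 2))
```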
Illustration :
The mean and standard deviation of 20 observations are found to be 10 and 2 respectively. On rechecking,
it was found that an observation 8 was incorrect. Calculate the correct mean and standard deviation in
each of the following cases:
(i) If the wrong item is omitted.
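Sol. We have $n = 20$, $\bar{X} = 10$ and $\sigma = 2$
$\bar{X} = \frac{1}{n}\sum x_i \Rightarrow \sum x_i = n\bar{X} = 20 \times 10 = 200 \Rightarrow$ Incorrect $\sum x_i = 200$
and $\sigma = 2 \Rightarrow \sigma^2 = 4 \Rightarrow \frac{1}{n}\sum x_i^2 - (\text{Mean})^2 = 4$
$\Rightarrow \frac{1}{20}\sum x_i^2 - 100 = 4 \Rightarrow \sum x_i^2 = 104 \times 20 \Rightarrow$ Incorrect $\sum x_i^2 = 2080$
(i) When 8 is omitted from the data.
If 8 is omitted from the data, then 19 observations are left.
Now Incorrect $\sum x_i = 200 \Rightarrow$ Correct $\sum x_i + 8 = 200 \Rightarrow$ Correct $\sum x_i = 192$
and Incorrect $\sum x_i^2 = 2080 \Rightarrow$ Correct $\sum x_i^2 + 8^2 = 2080 \Rightarrow$ Correct $\sum x_i^2 = 2016$
Correct mean = $\frac{192}{19} \approx 10.11$
Correct variance = $\frac{1}{19}(\text{Correct } \sum x_i^2) - (\text{Correct mean})^2$
Correct variance = $\frac{2016}{19} - \left(\frac{192}{19}\right)^2$
Correct variance = $\frac{38304 - 36864}{361} = \frac{1440}{361}$
Correct standard deviation = $\sqrt{\frac{1440}{361}} = \frac{12\sqrt{10}}{19} \approx 1.997$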
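The correction for an omitted observation only needs the running totals $\sum x_i$ and $\sum x_i^2$. A minimal sketch with a hypothetical function name:

```python
from math import sqrt

def drop_observation(n, mean, sd, wrong):
    """Mean and standard deviation after removing one observation."""
    total = n * mean                       # sum of x_i
    total_sq = n * (sd ** 2 + mean ** 2)   # sum of x_i^2
    total -= wrong
    total_sq -= wrong ** 2
    m = n - 1
    new_mean = total / m
    new_var = total_sq / m - new_mean ** 2
    return new_mean, sqrt(new_var)

new_mean, new_sd = drop_observation(n=20, mean=10, sd=2, wrong=8)
print(round(new_mean, 3), round(new_sd, 3))
```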
The coefficient of variation is C.V. = $\frac{\sigma}{\bar{X}} \times 100$, where $\sigma$ and $\bar{X}$ are the standard deviation and mean of the data.
For comparing the variability of two series, we calculate the coefficient of variation for each series. The
series having the greater C.V. is said to be more variable or, conversely, less consistent, less uniform, less
stable or less homogeneous than the other, and the series having the lesser C.V. is said to be more consistent
(or homogeneous) than the other.
Illustration :
The following values are calculated in respect of heights and weights of the students of a section of
Class XI :
            Height          Weight
Mean        162.6 cm        52.36 kg
Variance    127.69 cm^2     23.1361 kg^2
Can we say that the weights show greater variation than the heights ?
Sol. To compare the variability, we have to calculate their coefficients of variation
Given Variance of height = 127.69 cm$^2$
Therefore Standard deviation of height = $\sqrt{127.69}$ cm = 11.3 cm
Also Variance of weight = 23.1361 kg$^2$
Therefore Standard deviation of weight = $\sqrt{23.1361}$ kg = 4.81 kg
Now, the coefficients of variation (C.V.) are given by
(C.V.) in heights = $\frac{\text{Standard Deviation}}{\text{Mean}} \times 100 = \frac{11.3}{162.6} \times 100 = 6.95$
and (C.V.) in weights = $\frac{4.81}{52.36} \times 100 = 9.18$
Clearly C.V. in weights is greater than the C.V. in heights
Therefore, we can say that weights show more variability than heights.
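The comparison can be reproduced from the given means and variances. A sketch, with a hypothetical function name:

```python
from math import sqrt

def coefficient_of_variation(variance, mean):
    """C.V. = (standard deviation / mean) * 100."""
    return sqrt(variance) / mean * 100

cv_height = coefficient_of_variation(127.69, 162.6)
cv_weight = coefficient_of_variation(23.1361, 52.36)
print(round(cv_height, 2), round(cv_weight, 2), cv_weight > cv_height)
```

Because the C.V. is unit-free, heights in cm and weights in kg can be compared directly.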
IMPORTANT DEFINITIONS :
1. Raw Data :
Data collected in original form.
2. Frequency :
The number of times a certain value or class of values occurs.
3. Frequency Distribution :
The organization of raw data in table form with classes and frequencies.
7. Class Limits :
Separate one class in a grouped frequency distribution from another. The limits could actually appear in
the data and have gaps between the upper limit of one class and the lower limit of the next.
8. Class Boundaries :
Separate one class in a grouped frequency distribution from another. The boundaries have one more
decimal place than the raw data and therefore do not appear in the data. There is no gap between the
upper boundary of one class and the lower boundary of the next class. The lower class boundary is
found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding
0.5 units to the upper class limit.
9. Class Width :
The difference between the upper and lower boundaries of any class. The class width is also the difference
between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is
not the difference between the upper and lower limits of the same class.
14. Histogram :
A graph which displays the data by using vertical bars of various heights to represent frequencies.
The horizontal axis can be either the class boundaries, the class marks, or the class limits.
16. Ogive :
A frequency polygon of the cumulative frequency or the relative cumulative frequency. The vertical axis
is the cumulative frequency or relative cumulative frequency. The horizontal axis is the class boundaries.
The graph always starts at zero at the lowest class boundary and will end up at the total frequency
(for a cumulative frequency) or 1.00 (for a relative cumulative frequency).