Lecture III Probability and Statistics
A numerical summary for a set of data is referred to as a statistic if the data set is a sample and
a parameter if the data set is the entire population.
Numerical summaries are categorized as measures of location and measures of spread.
Measures of location can further be classified into measures of central tendency and measures
of relative positioning (quantiles).
Example 2 The grades of a student on six examinations were 84, 91, 72, 68, 91 and 72. Find
the arithmetic mean.
The arithmetic mean is
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} f_i x_i = \frac{1(84) + 2(91) + 2(72) + 1(68)}{1 + 2 + 2 + 1} = \frac{478}{6} = 79.67$
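The calculation above can be checked with a short Python sketch. The variable names (`values`, `freqs`, `weighted_total`) are illustrative and do not come from the notes.

```python
# Distinct grades and how often each occurs (from Example 2).
values = [84, 91, 72, 68]
freqs = [1, 2, 2, 1]

n = sum(freqs)  # total number of grades
weighted_total = sum(f * x for f, x in zip(freqs, values))
mean = weighted_total / n

print(round(mean, 2))  # 79.67
```

Summing the raw list `[84, 91, 72, 68, 91, 72]` and dividing by 6 gives the same value; the frequency form simply groups repeated grades.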
Exercise
1 Find the mean of 9, 3, 4, 2, 1, 5, 8, 4, 7, 3
2 A sample of 5 executives received the following amounts of bonus last year: sh 14,000, sh
15,000, sh 17,000, sh 16,000 and sh y. Find the value of y if the average bonus for the
5 executives is sh 15,400.
(2) $\sum_{i=1}^{n} (x_i - a)^2$ is minimum if and only if $a = \bar{x}$.
(3) If $n_1$ numbers have mean $\bar{x}_1$, $n_2$ numbers have mean $\bar{x}_2$, …, $n_k$ numbers have
mean $\bar{x}_k$, then the mean of all the numbers, called the combined mean, is given by
$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}$
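As a quick illustration of the combined mean, the sketch below compares the formula with the mean of the pooled observations. The two groups are hypothetical data chosen for the illustration, and `combined_mean` is a helper name introduced here.

```python
def combined_mean(sizes, means):
    """Return sum(n_i * mean_i) / sum(n_i) for groups given by size and mean."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Hypothetical data: two groups of observations.
group_a = [2, 4, 6]   # mean 4
group_b = [10, 20]    # mean 15

sizes = [len(group_a), len(group_b)]
means = [sum(group_a) / len(group_a), sum(group_b) / len(group_b)]

pooled = group_a + group_b
print(combined_mean(sizes, means))  # 8.4
print(sum(pooled) / len(pooled))    # 8.4
```

Both lines print the same value, since weighting each group mean by its size recovers the total of all observations.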
Median
It’s the value below which and above which half of the observations fall when ranked in order
of size. The position of the median term is given by the $\left(\frac{n+1}{2}\right)$th value.
NB: if n is even, we average the middle 2 terms.
For grouped data, the median is estimated using the formula
$\text{Median} = LCB + \frac{\frac{n+1}{2} - Cf_a}{f} \times i$
where LCB, f and i are the lower class boundary, frequency and class interval of the median
class. $Cf_a$ is the cumulative frequency of the class above the median class.
Remark: A disadvantage of the median is that it is not sensitive to changes in the data.
Mode
It’s the value occurring most frequently in a data set. If each observation occurs the same
number of times, then there is no mode. When 2 or more observations occur most frequently in
a data set, the data is said to be multimodal.
For ungrouped data it’s very easy to pick out the mode. However, if the data is grouped, the mode
is estimated using the formula
$\text{Mode} = LCB + \frac{f - f_a}{2f - f_a - f_b} \times i$
where LCB, f and i are the lower class boundary, frequency and class interval of the modal
class. $f_a$ and $f_b$ are the frequencies of the classes above and below the modal class respectively.
Remark: The Empirical Relation between the Mean, Median and Mode
Mean − Mode = 3 (Mean − Median)
The above relation holds approximately for unimodal frequency curves which are moderately asymmetrical.
Example 1 Find the median and mode for the data: 19, 13, 18, 14, 12, 25, 11, 17, 16, 23, 19.
Solution
Sorted data: 11, 12, 13, 14, 16, 17, 18, 19, 19, 23, 25.
n = 11, thus Median = $\left(\frac{11+1}{2}\right)$th value = 6th value = 17
Mode=19 since it appears most frequently in this data set as compared to other observations.
Example 2 Find the median and mode of the data: 2, 4, 8, 7, 9, 4, 6, 10, 8, and 5.
Solution
Array: 2, 4, 4, 5, 6, 7, 8, 8, 9, 10
Mode: 4 and 8, i.e. the data is bimodal.
n = 10, thus Median = $\left(\frac{10+1}{2}\right)$th value = 5.5th value = $\frac{6 + 7}{2}$ = 6.5
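The two examples above can be reproduced with Python's standard `statistics` module. This is only a checking sketch; `multimode` requires Python 3.8 or later, and the list names are illustrative.

```python
import statistics

example_1 = [19, 13, 18, 14, 12, 25, 11, 17, 16, 23, 19]
example_2 = [2, 4, 8, 7, 9, 4, 6, 10, 8, 5]

# median() sorts the data and averages the two middle values when n is even.
print(statistics.median(example_1))     # 17
print(statistics.median(example_2))     # 6.5

# multimode() returns every value that attains the highest frequency.
print(statistics.multimode(example_1))  # [19]
print(statistics.multimode(example_2))  # [4, 8]
```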
Example 3
Estimate the mean, median and mode for the following frequency distribution:
Class 5-9 10-14 15-19 20-24 25-29 30-34 35-39
Freq 5 12 32 40 16 9 6
Solution
Boundaries    4.5-9.5  9.5-14.5  14.5-19.5  19.5-24.5  24.5-29.5  29.5-34.5  34.5-39.5
Mid pts (x)   7        12        17         22         27         32         37
Frequency (f) 5        12        32         40         16         9          6
xf            35       144       544        880        432        288        222
CF            5        17        49         89         105        114        120
Mean $\bar{x} = \frac{\sum fx}{n} = \frac{35 + 144 + \cdots + 222}{120} = \frac{2545}{120} = 21.2083$
n = 120, thus the median is the 60.5th value. The median class is 19.5-24.5, thus
$\text{Median} = LCB + \frac{\frac{n+1}{2} - Cf_a}{f} \times i = 19.5 + \frac{60.5 - 49}{40} \times 5 = 20.9375$
The modal class (class with the highest frequency) is 19.5-24.5, therefore
$\text{Mode} = LCB + \frac{f - f_a}{2f - f_a - f_b} \times i = 19.5 + \frac{40 - 32}{80 - 32 - 16} \times 5 = 20.75$
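The three grouped-data estimates in Example 3 can be reproduced with the sketch below. It assumes equal class widths, and the variable names are introduced here for illustration only.

```python
# Class boundaries and frequencies from Example 3.
lower_bounds = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
freqs = [5, 12, 32, 40, 16, 9, 6]
width = 5  # class interval i (assumed equal for all classes)
n = sum(freqs)

# Mean: sum of (midpoint * frequency) divided by n.
midpoints = [lb + width / 2 for lb in lower_bounds]
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Median class: first class whose cumulative frequency reaches (n + 1) / 2.
position = (n + 1) / 2
cum = 0  # cumulative frequency of the classes before the current one
for k, f in enumerate(freqs):
    if cum + f >= position:
        break
    cum += f
# Here cum equals Cf_a and k indexes the median class.
median = lower_bounds[k] + (position - cum) * width / freqs[k]

# Modal class: class with the highest frequency.
m = freqs.index(max(freqs))
f_a = freqs[m - 1] if m > 0 else 0
f_b = freqs[m + 1] if m < len(freqs) - 1 else 0
mode = lower_bounds[m] + (freqs[m] - f_a) * width / (2 * freqs[m] - f_a - f_b)

print(round(mean, 4), median, mode)  # 21.2083 20.9375 20.75
```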
Exercise
1. Find the mean, median and mode for the following data: 9, 3, 4, 2, 9, 5, 8, 4, 7, 4
2. Find the mean, median and mode of 1, 2, 2, 3, 4, 4, 5, 5, 5, 5, 7, 8, 8 and 9
3. The number of goals scored in 15 hockey matches is shown in the table.
No of goals 1 2 3 4 5
No of matches 2 1 5 3 4
Calculate the mean number of goals scored.
4. The table shows the heights of 30 students in a class. Calculate an estimate of the mean, mode
and median height.
Height (cm)     140<x<144  144<x<148  148<x<152  152<x<156  156<x<160  160<x<164
No of students  4          5          8          7          5          1
5. Estimate the mean, median and mode for the following frequency distribution:
Class 1-4 5-8 9-12 13-16 17-20 21-24
frequency 10 14 20 16 12 8
Example 1 Consider the following table with marks obtained by two students James (mark
x) and John (mark y). The weights are to be used in determining who joins the engineering
course whose requirement is a weighted mean of 58% on the four subjects below;
Subject Maths English History Physics Total
Mark x 25 87 83 30 225
Mark y 70 45 35 75 225
Weight 3.6 2.3 1.5 2.6 10
Now $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{492.6}{10} = 49.26$ and $\bar{y}_w = \frac{\sum w_i y_i}{\sum w_i} = \frac{603}{10} = 60.3$
Clearly John qualifies but James does not.
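The weighted means for James and John can be verified with a small sketch. The helper `weighted_mean` and the variable names are introduced here for illustration; the marks, weights and the 58% requirement are taken from the table above.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

weights = [3.6, 2.3, 1.5, 2.6]  # Maths, English, History, Physics
james = [25, 87, 83, 30]
john = [70, 45, 35, 75]
required = 58

for name, marks in (("James", james), ("John", john)):
    score = weighted_mean(marks, weights)
    verdict = "qualifies" if score >= required else "does not qualify"
    print(name, round(score, 2), verdict)
```

Both students have the same unweighted total (225), so the decision depends entirely on the weights: John's higher marks fall in the heavily weighted subjects.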
Example 2 If a final examination is weighted 4 times as much as a quiz, a midterm examination
3 times as much as a quiz, and a student has a final examination grade of 80, a midterm
examination grade of 95 and quiz grades of 90, 65 and 70, the mean grade is
$\bar{X} = \frac{1(90) + 1(65) + 1(70) + 3(95) + 4(80)}{1 + 1 + 1 + 3 + 4} = \frac{830}{10} = 83$.
Question A tycoon has 3 house girls whom he pays Ksh 4,000 each per month, 2 watchmen
whom he pays Ksh 5,000 each and some garden men who receive Ksh 7,000 each. If he pays
out an average of Ksh 5,700 per month to these people, find the number of garden men.
Question A student’s grades in laboratory, lecture, and recitation parts of a computer course
were 71, 78, and 89, respectively.
(a) If the weights accorded these grades are 2, 4, and 5, respectively, what is the average grade?
(b) What is the average grade if equal weights are used?
For grouped data, one has
$GM = \sqrt[n]{x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}} = \sqrt[n]{\prod_{i=1}^{n} x_i^{f_i}}$, where $n = \sum f_i$.
If n is too large, the nth root might be a challenge. In this case, the geometric mean is computed
through logarithms, and one has
$\log GM = \frac{1}{n} \log \prod_{i=1}^{n} x_i^{f_i} = \frac{1}{n} \sum_{i=1}^{n} f_i \log x_i$
Taking the antilog, one obtains GM, the geometric mean.
Example
Yearly percentage growth of urban population in Kenya from 1965 to 1970 is given below;
year 1965 1966 1967 1968 1969 1970
%Increase 8.35 17.91 33.08 40.32 35.8 47.67
Obtain the average percentage growth rate of urban population from 1965 to 1970 in Kenya.
Solution
Geometric Mean should be used because the measurements are in percentages.
$GM = \sqrt[6]{8.35 \times 17.91 \times 33.08 \times 40.32 \times 35.80 \times 47.67} = 26.42405$
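The geometric mean of the six growth figures can be computed either directly or through logarithms. The sketch below does both so the two routes can be compared; it uses `math.prod`, which requires Python 3.8 or later, and the variable names are illustrative.

```python
import math

growth = [8.35, 17.91, 33.08, 40.32, 35.80, 47.67]
n = len(growth)

# Direct form: nth root of the product.
gm_direct = math.prod(growth) ** (1 / n)

# Logarithmic form: average the logs, then take the antilog.
gm_log = math.exp(sum(math.log(x) for x in growth) / n)

print(round(gm_direct, 2), round(gm_log, 2))  # 26.42 26.42
```

The logarithmic form avoids forming a very large product, which is the practical reason for using it when n is large.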
Question Below are the yearly percentage profits made by a company in six successive years:
52.22 46.59 21.36 30.17 22.87 17.77. Determine the mean percentage profit made by the
company in the last six years.
For grouped data we have
$HM = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}} = \frac{n}{\sum \frac{f_i}{x_i}}$, where $n = \sum f_i$.
Note: This type of mean cannot be used when any observation $x_1, x_2, \ldots, x_n$ is zero.
Example A production machine is programmed to run for four hours only per day. In the first
hour, its output is 40 units/h, in the second hour its output is 60 units/h, in the third hour the
output is 70 units/h while in the fourth hour the output is 30 units/h. Determine the average
output per hour of the machine.
Solution
Using Harmonic mean one has;
$HM = \frac{4}{\frac{1}{40} + \frac{1}{60} + \frac{1}{70} + \frac{1}{30}} = 44.8 \approx 45$ units/hr.
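The harmonic mean in this example can be checked as follows. The sketch computes it from the definition and with `statistics.harmonic_mean` from the standard library; the exact value is 44.8, which the notes round to 45. The variable names are illustrative.

```python
import statistics

rates = [40, 60, 70, 30]  # units/h in each of the four hours

# Definition: n divided by the sum of reciprocals.
hm_manual = len(rates) / sum(1 / r for r in rates)

# The standard library provides the same calculation.
hm_library = statistics.harmonic_mean(rates)

print(round(hm_manual, 1), round(hm_library, 1))  # 44.8 44.8
```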