Lecture 1
i) Population: A group of individual persons, objects or items from which samples are taken.
ii) Sample: A sample is a subset of the population. It is a finite part of the
population whose properties are studied to gain information about the whole
population.
iii) Survey/study population: This is the finite population from which we will select
our samples (the 100 stones).
iv) Population characteristic: this is the aspect of the population we wish to measure.
In this case, it is the weight of the pebbles.
v) Sampling unit: the individual unit we are sampling. In this case, it is an individual
pebble.
vi) Sampling frame: A list of all sampling units in the survey/study population. In this
case, it is a list containing the stone numbers 1 to 100.
vii) Census: A survey consisting of every member of the population. A census would
involve weighing all 100 stones.
Reasons for sampling
Cost
Sampling a fraction of the population is cheaper (cost effective) than conducting
a census.
Sampling rather than conducting a census saves time.
Some populations are only partly accessible.
Some populations are very large.
Accuracy: measurements on a small sample can be made more carefully than on a whole population.
Sampling Methods
Sampling is the act, process or technique of selecting a suitable sample, or a representative
part of the population, for the purpose of determining parameters or characteristics of the
whole population.
i) Accessibility sampling: the most easily obtained observations are chosen.
ii) Judgment sampling: the experimenter chooses the sample based on what he
or she thinks is a representative sample.
iii) Quota sampling: this typically combines accessibility and judgment sampling.
iv) Random sampling: members of the sample are chosen at random. There are
two basic types of random sampling.
Simple random sampling: this is random sampling without
replacement. Each population member is either not in the sample or in it exactly once.
Simple random sampling gives equal probability of selection to every
permitted (unordered) sample of a given size.
Unrestricted random sampling: this is random sampling with
replacement. All population members are available for each
random selection, so a population member may be in the sample more
than once.
v) Stratified sampling:
Sometimes subpopulations within an entire population vary considerably. In this
case, it is advantageous to divide the population into subpopulations called strata
and then perform simple random sampling within each stratum. This is known
as stratified sampling.
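The random sampling schemes above can be sketched in Python with the standard `random` module. This is only an illustration: the helper names, the fixed seed and the split of the 100 stones into "light" and "heavy" strata are assumptions made for the example, not part of the lecture.

```python
import random

def simple_random_sample(population, k, rng):
    """Sampling without replacement: each member appears at most once."""
    return rng.sample(population, k)

def unrestricted_random_sample(population, k, rng):
    """Sampling with replacement: a member may be drawn more than once."""
    return rng.choices(population, k=k)

def stratified_sample(strata, k_per_stratum, rng):
    """Simple random sample of size k_per_stratum inside every stratum."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)            # fixed seed so the sketch is repeatable
stones = list(range(1, 101))      # sampling frame: stone numbers 1 to 100

srs = simple_random_sample(stones, 10, rng)
urs = unrestricted_random_sample(stones, 10, rng)

# Hypothetical strata: this split into "light" and "heavy" stones is made up.
strata = {"light": stones[:60], "heavy": stones[60:]}
strat = stratified_sample(strata, 5, rng)

print(srs)
print(urs)
print(strat)
```

The simple random sample never repeats a stone, whereas the unrestricted sample may. In the stratified version every stratum is guaranteed to be represented.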
Types of data
Descriptive statistics
Presentation of:
Tables
Frequency tables, cumulative frequency tables and stem-and-leaf plots
Graphs
Histograms, frequency polygons, cumulative frequency polygons
Let $X_{\max}$ = largest observation and $X_{\min}$ = lowest observation.
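Example 1
i) $n = 100$, find $c$, the smallest integer with $2^c \geq 100$:
$2^1 = 2 \ngeq 100$
$2^2 = 4 \ngeq 100$
$2^3 = 8 \ngeq 100$
$2^4 = 16 \ngeq 100$
$2^5 = 32 \ngeq 100$
$2^6 = 64 \ngeq 100$
$2^7 = 128 \geq 100$
$c$ = number of classes = 7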
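The search for the number of classes can be written as a short loop. The function name `number_of_classes` is an assumption chosen for this sketch.

```python
def number_of_classes(n):
    """Return the smallest integer c with 2**c >= n."""
    c = 1
    while 2 ** c < n:
        c += 1
    return c

print(number_of_classes(100))  # 7, because 2**6 = 64 < 100 <= 128 = 2**7
```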
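iii) Define the class limits (lower limit $L_i$ and upper limit $\mathcal{U}_i$):
$\mathcal{U}_i = L_i + w$
Class mark: $X_i = \frac{L_i + \mathcal{U}_i}{2}, \quad i = 1, 2, 3, \dots, c$
Relative frequency: $\mathcal{F}_i = \frac{n_i}{n}$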
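Frequency table:
Class | Interval | Class mark ($X_i$) | Cumulative frequency | Relative frequency | Freq
1 | $[L_1 - \mathcal{U}_1[$ | $X_1$ | $n_1$ | $\mathcal{F}_1$ | $n_1$
2 | $[L_2 - \mathcal{U}_2[$ | $X_2$ | $n_1 + n_2$ | $\mathcal{F}_2$ | $n_2$
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
$c$ | $[L_c - \mathcal{U}_c[$ | $X_c$ | $n_1 + n_2 + \dots + n_c$ | $\mathcal{F}_c$ | $n_c$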
Example
50 , 55 , 58 , 60 , 62 , 63 , 64 , 69 , 70 , 72 , 77 , 80 , 84
Stems Leaves
5 0 5 8
6 0 2 3 4 9
7 0 2 7
8 0 4
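A stem-and-leaf plot like the one above can be built by splitting each value into its tens digit (stem) and units digit (leaf). The sketch below assumes two-digit whole-number data. The function name `stem_and_leaf` is chosen for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [50, 55, 58, 60, 62, 63, 64, 69, 70, 72, 77, 80, 84]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```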
Numerical or Descriptive Measures
Mean, $\bar{X}$ ($X$ bar):
Sample data $X_1, X_2, \dots, X_n$
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Median
Median observation refers to the middle observation or value when the data are
arranged in increasing sequence.
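Ordered data: $X_{\min} = X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)} = X_{\max}$
$$M_d = \begin{cases} X_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}$$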
Example
2, 4, 7, 11, 15
$$\bar{X} = \frac{39}{5} = 7.8$$
Variance
$$Var(X) = S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
$$S^2 = \frac{(2 - 7.8)^2 + (4 - 7.8)^2 + (7 - 7.8)^2 + (11 - 7.8)^2 + (15 - 7.8)^2}{5-1} = \frac{110.8}{4} = 27.7$$
Standard deviation $S$
$$S = \sqrt{S^2}$$
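As a quick check of the worked example, the same quantities can be computed with Python's standard `statistics` module, whose `variance` and `stdev` functions use the divisor $n - 1$. This is a sketch for verification only.

```python
import statistics

data = [2, 4, 7, 11, 15]

mean = statistics.mean(data)          # 39 / 5
median = statistics.median(data)      # middle value of the sorted data
variance = statistics.variance(data)  # sample variance, divisor n - 1
std_dev = statistics.stdev(data)      # square root of the sample variance

print(mean, median, variance, round(std_dev, 2))
```

For this sample the standard deviation is $\sqrt{27.7} \approx 5.26$.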
Measure of skewness
$$S_k = \text{coefficient of skewness} = \frac{3(\bar{X} - M_d)}{S}$$
ii) If $S_k$ is less than 0 ($\bar{X} <$ median), then the data are skewed to the left.
iii) If $S_k$ is greater than 0 ($\bar{X} >$ median), then the data are skewed to the right.
COEFFICIENT OF VARIATION
$$CV = \frac{S}{\bar{X}}$$
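The two coefficients can be evaluated for a small sample as follows. The function names are assumptions for this sketch, and the values 2, 4, 7, 11, 15 are used purely as illustrative input.

```python
import statistics

def pearson_skewness(data):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    return 3 * (mean - median) / statistics.stdev(data)

def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(data) / statistics.mean(data)

sample = [2, 4, 7, 11, 15]  # illustrative values
sk = pearson_skewness(sample)
cv = coefficient_of_variation(sample)
print(round(sk, 3), round(cv, 3))
```

Here the mean (7.8) exceeds the median (7), so the coefficient of skewness is positive and the sample is skewed to the right.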
GROUPED DATA
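Class | Interval | Class mark $x_i$ | Frequency $n_i$ | Cumulative frequency $CF_i$ | Relative frequency $n_i/n$ | $n_i x_i$
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
$k$ | $[L_k - \mathcal{U}_k[$ | $x_k$ | $n_k$ | $CF_k$ | $n_k/n$ | $n_k x_k$
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
$c$ | $[L_c - \mathcal{U}_c[$ | $x_c$ | $n_c$ | $CF_c$ | $n_c/n$ | $n_c x_c$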
Mean:
Sample mean:
$$\bar{X} = \frac{n_1 X_1 + n_2 X_2 + \dots + n_c X_c}{n} = \frac{1}{n}\sum_{i=1}^{c} n_i X_i$$
Variance ($S^2$):
Sample variance:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{c} n_i (X_i - \bar{X})^2$$
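Median ($M_d$)
Locate the median class $k$, the first class whose cumulative frequency reaches $\frac{n}{2}$. Then
$$M_d = L_k + \frac{w}{n_k}\left(\frac{n}{2} - CF_{k-1}\right)$$
where $CF_k$ is the cumulative frequency for the $k$th class and $n$ is the sample size.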
Mode ($M_o$)
Locate the modal class (the class having the highest frequency). If we let $k$ be the modal
class, then
$$M_o = L_k + w\left(\frac{d_1}{d_1 + d_2}\right)$$
where
$d_1 = n_k - n_{k-1}$
$d_2 = n_k - n_{k+1}$
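The grouped-data formulas for the mean, variance, median and mode can be collected into one function. The sketch below assumes equal class widths and treats a missing neighbour of the modal class as having frequency 0. The function name and the three-class table used to exercise it are hypothetical.

```python
def grouped_stats(lower_limits, width, freqs):
    """Mean, variance, median and mode from a frequency table (equal widths)."""
    n = sum(freqs)
    marks = [low + width / 2 for low in lower_limits]  # class marks
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)

    # Median class: first class whose cumulative frequency reaches n / 2.
    cum = 0  # cumulative frequency of the classes before the current one
    for k, f in enumerate(freqs):
        if cum + f >= n / 2:
            median = lower_limits[k] + width / f * (n / 2 - cum)
            break
        cum += f

    # Modal class: class with the highest frequency (the first one if tied).
    k = freqs.index(max(freqs))
    before = freqs[k - 1] if k > 0 else 0              # assumption: 0 if absent
    after = freqs[k + 1] if k + 1 < len(freqs) else 0  # assumption: 0 if absent
    d1, d2 = freqs[k] - before, freqs[k] - after
    mode = lower_limits[k] + width * d1 / (d1 + d2)
    return mean, variance, median, mode

# Hypothetical table: classes [0, 10[, [10, 20[, [20, 30[ with frequencies 2, 5, 3.
mean, variance, median, mode = grouped_stats([0, 10, 20], 10, [2, 5, 3])
print(mean, round(variance, 2), median, mode)
```

In this made-up table the mean, median and mode all fall at 16, inside the middle class, which is the class with the highest frequency.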
Example
The following data represent the examination scores obtained by 100 students in the
MA110 course.
40 41 42 44 45 46 46 46 47 47 47 47
48 48 49 50 50 50 51 51 52 52 52 52
52 52 53 53 53 53 53 53 54 54 54 55
55 55 55 56 56 56 56 56 57 57 57 57
57 57 57 57 57 57 57 58 58 58 58 58
58 58 59 59 59 59 60 60 60 60 61 61
61 61 61 62 62 62 63 63 63 63 64 64
64 65 65 66 66 67 67 67 67 68 68 69
70 71 72 74
a. Construct a stem-and-leaf plot and use it to determine the mode and the
median for the above ungrouped data.
Stem | Leaves
4 | 0 1 2 4 5 6 6 6 7 7 7 7 8 8 9
5 | 0 0 0 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 5 5 5 5 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 9 9 9 9
6 | 0 0 0 0 1 1 1 1 1 2 2 2 3 3 3 3 4 4 4 5 5 6 6 7 7 7 7 8 8 9
7 | 0 1 2 4
Summary
Stem | 4 | 5 | 6 | 7 | Total
Leaves | 15 | 51 | 30 | 4 | 100
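Mode ($M_o$) = 57% (57 is the most frequent score; it occurs 11 times).
Median ($M_d$) = 57% (the 50th and 51st ordered observations are both 57).
Roughly, 50% of the students scored above 57% and 50% of the students scored below
or equal to 57%.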
Answers
Range $= X_{\max} - X_{\min} = 74 - 40 = 34$
Number of classes: $2^c \geq n$, $n = 100$
$\therefore c = 7$ classes
NB: The frequency polygon starts and ends with a frequency of zero.
$L_1 \leq 40$, thus we can have 35 as our first limit, which implies that
$\mathcal{U}_1 = L_1 + \omega = 35 + 5 = 40$
Note that the table below is the absolute cumulative frequency table because it has
both the frequency and the cumulative frequency columns.
$$\bar{x} = \frac{1}{100} \times 5745 = 57.45$$
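The frequency table behind these figures can be rebuilt from the raw scores. The sketch below counts the 100 scores in half-open classes of width 5, numbering the classes from $[40, 45[$. That numbering is an assumption, though it agrees with the $L_4 = 55$ used in the worked mode and median. The function name `frequency_table` is also an assumption.

```python
from collections import Counter

# The 100 MA110 scores, written as {score: number of students with that score}.
score_counts = {
    40: 1, 41: 1, 42: 1, 44: 1, 45: 1, 46: 3, 47: 4, 48: 2, 49: 1,
    50: 3, 51: 2, 52: 6, 53: 6, 54: 3, 55: 4, 56: 5, 57: 11, 58: 7, 59: 4,
    60: 4, 61: 5, 62: 3, 63: 4, 64: 3, 65: 2, 66: 2, 67: 4, 68: 2, 69: 1,
    70: 1, 71: 1, 72: 1, 74: 1,
}
scores = list(Counter(score_counts).elements())

def frequency_table(values, first_lower, width, n_classes):
    """Count values in half-open classes [L, L + width[ and return table rows."""
    rows, cumulative = [], 0
    for i in range(n_classes):
        low = first_lower + i * width
        high = low + width
        freq = sum(1 for v in values if low <= v < high)
        cumulative += freq
        rows.append((low, high, (low + high) / 2, freq, cumulative))
    return rows

table = frequency_table(scores, first_lower=40, width=5, n_classes=7)
for low, high, mark, freq, cf in table:
    print(f"[{low} - {high}[  x_i={mark}  n_i={freq}  CF_i={cf}")

total = sum(freq * mark for _, _, mark, freq, _ in table)
print(total, total / len(scores))
```

The sum of $n_i x_i$ over the seven classes comes to 5745, which gives the grouped mean of 57.45.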
(35, 0), (40, 4), (45, 11), (50, 20), (55, 31);
[Figure: histogram and frequency polygon of the examination scores; horizontal axis: class limits from 35 to 80; vertical axis: frequency.]
[Figure: cumulative frequency (CF) polygon; horizontal axis: upper class limits from 40 to 80; vertical axis: cumulative frequency from 10 to 100; median ≈ 57 read from the graph.]
Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{c} n_i x_i = \frac{1}{100}(5745) = 57.45$$
Mode ($M_o$)
$k$ is the index of the modal class, i.e. $k = 4$
$d_1 = n_4 - n_3 = 31 - 20 = 11$
$d_2 = n_4 - n_5 = 31 - 19 = 12$
$$M_o = L_k + \omega\left(\frac{d_1}{d_1 + d_2}\right) = 55 + 5\left(\frac{11}{11 + 12}\right) = 57.39$$
Median (𝑀𝑑 )
Find the median class from the table using the following procedure.
*The modal class will not always be the median class; in our example, it is just a
coincidence.
We find the median class by comparing $\frac{n}{2}$ with the cumulative frequencies $CF_i$. Since $n = 100$, $\frac{n}{2} = 50$. So the median
class is the class that reaches 50 (for the first time) in the cumulative frequency column,
that is
Class 1 = 4
Class 2 = 15
Class 3 = 35
Class 4 = 66
Thus class 4 is the median class because it reaches 50 for the first time.
$\therefore k = 4$
$$M_d = L_k + \frac{w}{n_k}\left(\frac{n}{2} - CF_{k-1}\right)$$
$$M_d = L_4 + \frac{w}{n_4}\left(\frac{n}{2} - CF_3\right) = 55 + \frac{5}{31}\left(\frac{100}{2} - 35\right) = 57.42$$
Variance ($S^2$)
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{c} n_i (X_i - \bar{X})^2 = \frac{1}{100-1}(4974.75) = \frac{1}{99}(4974.75) = 50.25$$
Standard deviation
$$S = \sqrt{S^2} = \sqrt{50.25} = 7.09$$
Coefficient of skewness
$$S_k = \frac{3(\bar{X} - M_d)}{S} = \frac{3(57.45 - 57.42)}{7.09} \approx 0.01 \approx 0 \text{ (rounding off)}$$
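Q11 (worksheet)
Class | $[L_i - \mathcal{U}_i[$ | Frequency $n_i$ | Cumulative frequency $CF_i$ | Relative cumulative frequency | Class mark $x_i$ | $n_i x_i$ | $n_i (x_i - \bar{x})^2$
1 | [10 − 12[ | 9 | 9 | 9/50 | 11 | 99 | 53.5824
2 | [12 − 14[ | 26 | 35 | 35/50 | 13 | 338 | 5.0336
3 | [14 − 16[ | 10 | 45 | 45/50 | 15 | 150 | 24.336
4 | [16 − 18[ | 5 | 50 | 50/50 | 17 | 85 | 63.368
Total | | | | | | 672 | 146.32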
Mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{c} n_i x_i = \frac{1}{50}(672) = \$13.44$$
Variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{c} n_i (x_i - \bar{x})^2 = \frac{1}{50-1}(146.32) = \frac{1}{49}(146.32) = 2.986$$
Mode
$$M_o = L_k + \omega\left(\frac{d_1}{d_1 + d_2}\right)$$
$L_2 = 12$
$d_1 = 26 - 9 = 17$, $d_2 = 26 - 10 = 16$
$$M_o = 12 + 2\left(\frac{17}{17 + 16}\right) = \$13.03$$
Median
$$M_d = L_k + \frac{\omega}{n_k}\left(\frac{n}{2} - CF_{k-1}\right)$$
$CF \geq \frac{n}{2}$ for the first time in class 2, thus $k = 2$
$L_k = 12$, $\omega = 2$, $n_k = 26$, $CF_{2-1} = 9$
$$M_d = 12 + \frac{2}{26}\left(\frac{50}{2} - 9\right) = 12 + \frac{1}{13}(25 - 9) = 13.23$$
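13. Find sample size $n$, $n = 300$
$$\frac{n(\text{SOT})}{\text{sample size}} = \frac{100}{300} = \frac{1}{3} = 0.\overline{3}$$
$$\frac{n(\text{female})}{\text{sample size}} = \frac{82}{300} = \frac{41}{150} = 0.27\overline{3}$$
$$\frac{n(\text{degree})}{\text{sample size}} = \frac{235}{300} = 0.78\overline{3}$$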
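These proportions can be checked with exact fractions. The dictionary keys simply reuse the group labels from the working above; the variable names are otherwise assumptions for this sketch.

```python
from fractions import Fraction

n = 300  # sample size
counts = {"SOT": 100, "female": 82, "degree": 235}

# Exact proportion of the sample in each group, reduced to lowest terms.
shares = {group: Fraction(count, n) for group, count in counts.items()}
for group, share in shares.items():
    print(group, share, round(float(share), 3))
```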