Chapter 2 - Descriptive Statistics
AMIRUDDIN AB AZIZ
LEARNING OUTCOMES
1. Able to organise and represent qualitative and quantitative
data using an appropriate analysis tool.
Table 1: Laptop models used by students in Class A
Table 2: Income and expenses for shops in Ayu Mall
Graphical Method for Qualitative Data
Figure 1.2: Number of Students in KRBB in 2018 (faculties FKK, FKM, FSG)
Figure 1.3: Number of Students in KRBB, multiple bar chart (faculties FKK, FKM, FSG; years 2016–2018)
Figure 1.4: Sales in Ayu Trading (y-axis: number of sales; Products A, B, C; years 2016–2018)
Figure 1.5: Percentage sales in Ayu Trading, percentage component bar chart (y-axis: sales percentage; Products A, B, C; years 2016–2018)
Graphical Method for Quantitative Data
Class width = 6 – 4 = 2
GRAPH OF FREQUENCY DISTRIBUTIONS
1. Histogram
• A graph that displays the data using adjacent vertical bars of various
heights to represent the frequencies of the classes.
• Guidelines for constructing a histogram:
a) Draw the x-axis and the y-axis. The x-axis represents the class boundaries
and the y-axis represents the frequency.
b) Using the frequency as the height, draw a vertical bar for each class;
adjacent bars touch each other (no gaps).
Example: Based on Example 2 in slide 12, draw a histogram for the given
data.
Figure 1.6: Histogram of service year distribution for employees at Company A (Example 2); x-axis: service year (class boundaries 0.5, 4.5, 8.5, 12.5, 16.5, 20.5, 24.5, 28.5); y-axis: number of employees
2. Frequency Polygon
A graph that displays the data using lines that connect points plotted for
the frequencies at the midpoints of the classes. The frequencies are represented
by the heights of the points.
Example: Based on Example 2 in slide 12, draw a frequency polygon for the
given data.
Figure 1.7: Frequency polygon of service year distribution for employees at Company A (Example 2); x-axis: service year (class midpoints 2.5, 6.5, 10.5, 14.5, 18.5, 22.5, 26.5); y-axis: number of employees
3. Ogive
• A curve (also known as a cumulative frequency graph) drawn from the
cumulative frequency distribution by joining, with smooth lines, the
points marked above the upper class boundaries at heights equal to
the cumulative frequencies of the respective classes.
Example: Based on Example 2 in slide 12, draw an ogive for the given data.
Figure 1.8: Ogive of service year distribution for employees at Company A (Example 2); x-axis: service year (class boundaries 0.5, 4.5, 8.5, 12.5, 16.5, 20.5, 24.5, 28.5); y-axis: number of employees
Note: For data values in the hundreds, such as 325, the stem is 32 and the leaf is 5.
An insurance company researcher conducted a survey on the
number of car thefts in a large city over a period of 30 days. The raw
data are shown below. Construct a stem-and-leaf plot for the given
data.
22 32 21 20 29 28 47 36 23 27 45 26 22 37 43
49 29 38 35 42 27 21 33 39 45 35 23 48 36 25
Stem | Leaf
2 | 0 1 1 2 2 3 3 5 6 7 7 8 9 9
3 | 2 3 5 5 6 6 7 8 9
4 | 2 3 5 5 7 8 9
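The stem-and-leaf plot for the car-theft data can be checked with a short Python sketch. It assumes non-negative whole numbers and takes the last digit as the leaf and the remaining digits as the stem (so 325 gives stem 32, leaf 5). The function name `stem_and_leaf` is illustrative, not from any library.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group non-negative whole numbers into {stem: sorted leaves}; the leaf is the last digit."""
    plot = defaultdict(list)
    for value in sorted(data):           # sorting first keeps stems and leaves in ascending order
        stem, leaf = divmod(value, 10)   # e.g. 325 -> stem 32, leaf 5
        plot[stem].append(leaf)
    return dict(plot)


# Car-theft data from the example (30 days)
thefts = [22, 32, 21, 20, 29, 28, 47, 36, 23, 27, 45, 26, 22, 37, 43,
          49, 29, 38, 35, 42, 27, 21, 33, 39, 45, 35, 23, 48, 36, 25]

for stem, leaves in stem_and_leaf(thefts).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Stems with no leaves are simply skipped in this sketch; a hand-drawn plot would normally still list them.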
2.2 Measures of Central Tendency
1. Rearrange the data in ascending order: 15, 17, 18, 19, 20, 22, 25 (n = 7)
2. Position = $\frac{n+1}{2} = \frac{7+1}{2} = 4$, so the median is the 4th value, 19.
Example 4
Find the median for the following data.
205, 150, 125, 180, 215, 175, 150, 140
1. Rearrange the data in ascending order: 125, 140, 150, 150, 175, 180, 205, 215 (n = 8)
2. Position = $\frac{8+1}{2} = 4.5$
3. Median = $\frac{150 + 175}{2} = 162.5$
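A minimal Python sketch of the $(n+1)/2$ position rule used in the two median examples. The helper name `median_by_position` is hypothetical; the standard-library `statistics.median` is printed alongside as a cross-check.

```python
import statistics


def median_by_position(data):
    """Median via the (n + 1)/2 position rule: average the two middle values when n is even."""
    ordered = sorted(data)
    pos = (len(ordered) + 1) / 2        # 1-based position, e.g. 4.0 or 4.5
    lower = ordered[int(pos) - 1]       # value at the whole-number part of the position
    if pos.is_integer():                # odd n: a single middle value
        return lower
    upper = ordered[int(pos)]           # even n: next value up
    return (lower + upper) / 2


odd = [15, 17, 18, 19, 20, 22, 25]
even = [205, 150, 125, 180, 215, 175, 150, 140]   # Example 4
print(median_by_position(odd), median_by_position(even))
print(statistics.median(odd), statistics.median(even))
```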
Example 5
Find the median for the following data.
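Class | Frequency (f) | Cumulative frequency
5.5 – 10.5 | 3 | 3
10.5 – 15.5 | 5 | 8
15.5 – 20.5 | 4 | 12 (median class)
20.5 – 25.5 | 5 | 17
25.5 – 30.5 | 3 | 20
Median position = $\frac{\sum f}{2} = \frac{20}{2} = 10$th value, which lies in the median class 15.5 – 20.5.
Median = $15.5 + \left(\frac{\frac{20}{2} - 8}{4}\right) \times 5 = 18$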
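Example 5 uses the usual interpolation formula for the median of grouped data, $\text{Median} = L + \left(\frac{\sum f/2 - F}{f_m}\right) C$. In this sketch's (assumed) notation, $L$ is the lower boundary of the median class, $F$ the cumulative frequency before it, $f_m$ its frequency and $C$ the class size. A small Python version, with an illustrative function name:

```python
def grouped_median(lower_boundaries, freqs, class_size):
    """Median = L + ((n/2 - F) / f_m) * C for a grouped frequency table."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    target = n / 2                      # median position, n/2
    cum_before = 0                      # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cum_before + f >= target:    # first class whose cumulative frequency reaches n/2
            return lower + (target - cum_before) / f * class_size
        cum_before += f


# Example 5: classes 5.5-10.5, ..., 25.5-30.5, class size 5
print(grouped_median([5.5, 10.5, 15.5, 20.5, 25.5], [3, 5, 4, 5, 3], 5))  # 18.0
```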
3. Mode
• The most frequent data value in a data set.
• A data set may have no mode (it is wrong to say the mode is zero, since zero
can be a data value) or more than one mode.
• Mode for ungrouped data: the data value that occurs most often.
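A hedged Python sketch of finding the mode(s) of ungrouped data. It follows the idea that a data set can have several modes or none; as an assumed convention, "no mode" (an empty list) is returned when every value occurs exactly once. The standard library's `statistics.multimode` would instead return every value in that case.

```python
from collections import Counter


def modes(data):
    """Return all modes in ascending order; [] means the data set has no mode."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                 # assumed convention: every value occurs once -> no mode
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([205, 150, 125, 180, 215, 175, 150, 140]))  # Example 4 data -> [150]
print(modes([15, 13, 6, 5, 12, 50, 22, 18]))            # Example 8 data -> []
print(modes([1, 1, 2, 2, 3]))                           # made-up bimodal list -> [1, 2]
```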
2.3 Measures of Location
• There are three quartiles, which divide the ordered data into four equal parts: the first quartile (Q1), second quartile (Q2) and third quartile (Q3).
• Q1 is the value one quarter of the way through the ordered data set. Position of Q1 = $\frac{n+1}{4}$
25% of all the observations are less than Q1 and the other 75% are greater than Q1.
• Q2 is the median. Position of Q2 = $\frac{2(n+1)}{4}$
• Q3 is the value three quarters of the way through the ordered data set. Position of Q3 = $\frac{3(n+1)}{4}$
75% of all the observations are less than Q3 and the other 25% are greater than Q3.
a) Ungrouped data:
Step 1 : Arrange the data in ascending order.
Step 2 : Position of Q1 is $\frac{n+1}{4}$, position of Q3 is $\frac{3(n+1)}{4}$.
Step 3 : Find Q1 and/or Q3.
Example 8
Find Q1, Q2, and Q3 for the following data: 15, 13, 6, 5, 12, 50, 22, 18
ANSWER
Rearrange the data in ascending order:
5, 6, 12, 13, 15, 18, 22, 50 (n = 8)
Position of $Q_1 = \frac{8+1}{4} = 2.25$ (between 6 and 12): $Q_1 = 6 + 0.25(12 - 6) = 7.5$
Position of $Q_2 = \frac{2(8+1)}{4} = 4.5$ (between 13 and 15): $Q_2 = \frac{13 + 15}{2} = 14$
Position of $Q_3 = \frac{3(8+1)}{4} = 6.75$ (between 18 and 22): $Q_3 = 18 + 0.75(22 - 18) = 21$
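The quartile steps for ungrouped data can be sketched in Python as follows: locate position $k(n+1)/4$ in the ordered data and interpolate linearly between the two neighbouring values, as Example 8 does. `quartile_by_position` is a hypothetical helper, and it assumes the position falls between 1 and n (roughly n ≥ 3). The standard library's `statistics.quantiles` with its default `'exclusive'` method uses the same $p(n+1)$ rule, so it serves as a cross-check.

```python
import statistics


def quartile_by_position(data, k):
    """Q_k (k = 1, 2, 3) at position k(n + 1)/4 of the ordered data, interpolating linearly."""
    ordered = sorted(data)
    pos = k * (len(ordered) + 1) / 4    # 1-based position, e.g. 2.25
    whole = int(pos)                     # position of the lower neighbour
    frac = pos - whole                   # how far to move toward the next value
    if frac == 0:
        return ordered[whole - 1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + frac * (upper - lower)


data = [15, 13, 6, 5, 12, 50, 22, 18]   # Example 8
print([quartile_by_position(data, k) for k in (1, 2, 3)])   # [7.5, 14.0, 21.0]
print(statistics.quantiles(data, n=4))                       # [7.5, 14.0, 21.0]
```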
b) For grouped data:
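Step 1 : Find the cumulative frequency.
Step 2 : Position of Q1 is $\frac{\sum f}{4}$, position of Q3 is $\frac{3\sum f}{4}$.
Step 3 : Find the quartiles:
$$Q_1 = L_{Q_1} + \left(\frac{\frac{\sum f}{4} - \sum f_{Q_1 - 1}}{f_{Q_1}}\right) C \qquad Q_3 = L_{Q_3} + \left(\frac{\frac{3\sum f}{4} - \sum f_{Q_3 - 1}}{f_{Q_3}}\right) C$$
where $\sum f$ = total frequency = n
C = class size
$L_{Q_1}$ = lower class boundary of the lower quartile class ($L_{Q_3}$ likewise for the upper quartile class)
$\sum f_{Q_1 - 1}$ = cumulative frequency of the class before the lower quartile class ($\sum f_{Q_3 - 1}$ likewise)
$f_{Q_1}$ = frequency of the lower quartile class ($f_{Q_3}$ likewise)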
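A minimal Python sketch of the grouped-data quartile formula. As an illustration it is applied to the frequency table of Example 12 (Section 2.4, class size 10), where it reproduces Q1 = 36 and Q3 ≈ 55.96. The function name and argument layout are assumptions made for this sketch.

```python
def grouped_quartile(lower_boundaries, freqs, class_size, k):
    """Q_k = L + ((k*Σf/4 - F) / f_Q) * C for k = 1 (lower) or 3 (upper quartile)."""
    position = k * sum(freqs) / 4           # Σf/4 for Q1, 3Σf/4 for Q3
    cum_before = 0                          # F: cumulative frequency before the quartile class
    for lower, f in zip(lower_boundaries, freqs):
        if cum_before + f >= position:      # first class whose cumulative frequency reaches the position
            return lower + (position - cum_before) / f * class_size
        cum_before += f
    raise ValueError("position lies beyond the table")


# Example 12 table: lower class boundaries and frequencies, class size 10
lowers = [29.5, 39.5, 49.5, 59.5, 69.5]
freqs = [25, 16, 12, 7, 5]
q1 = grouped_quartile(lowers, freqs, 10, 1)
q3 = grouped_quartile(lowers, freqs, 10, 3)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))   # 36.0 55.96 19.96
```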
Comment : 25% of the students' marks are less than 55.75 and the other 75%
are more than 55.75.
Position of upper quartile = $\frac{3\sum f}{4} = 33$
$Q_3 = 69.5 + \left(\frac{33 - 28}{10}\right) \times 10 = 74.5$
Comment : 75% of the students' marks are less than 74.5 and the other 25%
are more than 74.5.
2.4 Measures of Dispersion
Example pg 67
• Measures of dispersion measure the spread of the data values.
• These measures include the range, inter-quartile range, quartile deviation, variance
and standard deviation.
• Measures of dispersion are important because two groups of data may have the
same central value but different variability.
Group A: 20 30 40 50 60 70 Group B: 40 42 43 45 55
$\text{mean}_A = \frac{20 + 30 + 40 + 50 + 60 + 70}{6} = 45$ $\text{mean}_B = \frac{40 + 42 + 43 + 45 + 55}{5} = 45$
The average marks for Group A and Group B are the same,
but the data for Group A are more dispersed than those for Group B.
a) Range – The difference between the largest and the smallest values in the data set.
Range for grouped data = upper class boundary of the last class − lower class boundary of the first class
Example pg 67
range𝐴 = 70 − 20 = 50 range𝐵 = 55 − 40 = 15
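A quick Python check of the Group A and Group B example: both groups share the same mean, but their ranges differ. The dictionary layout is only for illustration.

```python
from statistics import mean

groups = {
    "A": [20, 30, 40, 50, 60, 70],
    "B": [40, 42, 43, 45, 55],
}

for name, values in groups.items():
    data_range = max(values) - min(values)   # largest value minus smallest value
    print(f"Group {name}: mean = {mean(values)}, range = {data_range}")
```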
Example pg 67
Class | Class boundary | Frequency
35 – 39 | 34.5 – 39.5 | 5
40 – 44 | 39.5 – 44.5 | 7
45 – 49 | 44.5 – 49.5 | 9
50 – 54 | 49.5 – 54.5 | 2
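Standard deviation (ungrouped data):
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]}$$
Standard deviation (grouped data):
$$s = \sqrt{\frac{1}{\sum f - 1}\sum f(x - \bar{x})^2} = \sqrt{\frac{1}{\sum f - 1}\left[\sum f x^2 - \frac{(\sum f x)^2}{\sum f}\right]}$$
Variance = $s^2$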
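The two forms of the ungrouped standard deviation formula should agree. This Python sketch computes both (with illustrative helper names) for the Group A and Group B data from the start of Section 2.4 and compares them with `statistics.stdev`, which also uses the $n - 1$ divisor.

```python
import math
import statistics


def sd_definitional(xs):
    """s = sqrt( Σ(x - x̄)² / (n - 1) )"""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


def sd_computational(xs):
    """s = sqrt( (Σx² - (Σx)²/n) / (n - 1) ), the shortcut form."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


group_a = [20, 30, 40, 50, 60, 70]
group_b = [40, 42, 43, 45, 55]
for g in (group_a, group_b):
    print(round(sd_definitional(g), 4), round(sd_computational(g), 4), round(statistics.stdev(g), 4))
```

Group A's larger standard deviation quantifies the "more dispersed" statement made for the two groups.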
Review Question pg 82
The number of sales made last month by each of the 14 members of the sales
staff of a company was recorded:
Data (in ascending order): 68, 77, 101, 108, 112, 121, 123, 124, 127, 128, 130, 133, 137, 139
Find a) the mean b) the standard deviation c) the range d) the quartile deviation.
a) mean = $\frac{1628}{14} = 116.29$
b) $s = \sqrt{\frac{1}{14 - 1}\left[195300 - \frac{1628^2}{14}\right]} = 21.46$
c) range = 139 − 68 = 71
d) quartile deviation = $\frac{Q_3 - Q_1}{2} = \frac{130.75 - 106.25}{2} = 12.25$
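A Python check of the Review Question answers. The quartiles use `statistics.quantiles` with its default `'exclusive'` method, which places Q1 and Q3 at positions $(n+1)/4$ and $3(n+1)/4$ as in Section 2.3.

```python
import math
import statistics

# Review Question data (14 sales staff), in ascending order
sales = [68, 77, 101, 108, 112, 121, 123, 124, 127, 128, 130, 133, 137, 139]

n = len(sales)
sum_x = sum(sales)                       # Σx
sum_x2 = sum(x * x for x in sales)       # Σx²

mean_sales = sum_x / n
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))   # computational SD formula
data_range = max(sales) - min(sales)
q1, _, q3 = statistics.quantiles(sales, n=4)          # default 'exclusive' method: position p(n + 1)
quartile_deviation = (q3 - q1) / 2

print(round(mean_sales, 2), round(s, 2), data_range, q1, q3, quartile_deviation)
```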
• If two or more data sets have equal means, the standard deviation can
be used to compare their dispersion.
• If two or more data sets have different means and standard
deviations, their dispersion should be compared using the coefficient of
variation:
$$\text{Coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100\% = \frac{s}{\bar{x}} \times 100\%$$
Example pg 72
 | Set A | Set B | Set C
Mean, $\bar{x}$ | 75 | 75 | 75
Standard deviation, $s$ | 10 | 15 | 7
➢ Set B has the greatest dispersion because it has the largest standard deviation.
➢ Set C is the most consistent (least dispersed) because it has the smallest standard
deviation.
Example 10
 | Number of cars sold | Commissions (RM)
Mean, $\bar{x}$ | 87 | 20900
Standard deviation, $s$ | 5 | 3092
Based on the data above, compare the variation in the number of
cars sold with the variation in the commissions received.
Number of cars sold: Coefficient of variation = $\frac{s}{\bar{x}} \times 100 = \frac{5}{87} \times 100 = 5.7\%$
Commissions: Coefficient of variation = $\frac{s}{\bar{x}} \times 100 = \frac{3092}{20900} \times 100 = 14.8\%$
Conclusion
➢ Commissions are more variable than sales.
➢ Sales are more consistent than commissions.
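A small Python check of the coefficient-of-variation arithmetic in Example 10. `coefficient_of_variation` is an illustrative helper name, not a library function.

```python
def coefficient_of_variation(sd, mean):
    """CV = s / x̄ × 100, expressed in percent."""
    return sd / mean * 100


cars = coefficient_of_variation(5, 87)
commissions = coefficient_of_variation(3092, 20900)
print(f"cars sold: {cars:.1f}%  commissions: {commissions:.1f}%")
```

Because the CV is unit-free, it can compare a count (cars) with an amount in RM, which the raw standard deviations cannot.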
Example 11
Grouped Data with Frequency
From the table below, find the range, interquartile range, quartile deviation,
variance and standard deviation.
No. of computers (x) | 0 | 1 | 2 | 3 | 4 | Total
Frequency (f) | 6 | 8 | 10 | 4 | 2 | Σf = 30
fx | 6(0) = 0 | 8(1) = 8 | 10(2) = 20 | 4(3) = 12 | 2(4) = 8 | Σfx = 48
fx² | 6(0²) = 0 | 8(1²) = 8 | 10(2²) = 40 | 4(3²) = 36 | 2(4²) = 32 | Σfx² = 116
Cum. frequency | 6 | 14 | 24 | 28 | 30 |
1) Range = 4 − 0 = 4
2) IQR = $Q_3 - Q_1 = 2 - 1 = 1$
3) QD = $\frac{Q_3 - Q_1}{2} = \frac{1}{2} = 0.5$
4) Variance, $s^2 = \frac{1}{30 - 1}\left[116 - \frac{48^2}{30}\right] = 1.3517$
5) SD = $\sqrt{1.3517} = 1.1626$
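A Python sketch that reproduces Example 11 from the frequency table. For the quartiles it expands the table into the 30 raw values (an approach chosen for this sketch) and applies `statistics.quantiles`, whose default `'exclusive'` method uses the $(n+1)p$ position rule.

```python
import math
import statistics

x = [0, 1, 2, 3, 4]      # number of computers
f = [6, 8, 10, 4, 2]     # frequency

n = sum(f)                                            # Σf = 30
sum_fx = sum(fi * xi for fi, xi in zip(f, x))         # Σfx = 48
sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, x))   # Σfx² = 116

variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)      # grouped computational formula
sd = math.sqrt(variance)

# Expand the table into the 30 raw values, then locate Q1 and Q3 by position
values = [xi for xi, fi in zip(x, f) for _ in range(fi)]
q1, _, q3 = statistics.quantiles(values, n=4)

print(max(x) - min(x), q3 - q1, (q3 - q1) / 2, round(variance, 4), round(sd, 4))
```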
Example 12
Grouped Data
From the table below, find the range, interquartile range, quartile deviation,
variance and standard deviation.
Time | Class boundary | Midpoint (x) | Frequency (f) | Cumulative frequency | fx² | fx
30 – 39 | 29.5 – 39.5 | 34.5 | 25 | 25 (1 – 25) | 25(34.5²) | 25(34.5)
40 – 49 | 39.5 – 49.5 | 44.5 | 16 | 41 (26 – 41) | 16(44.5²) | 16(44.5)
50 – 59 | 49.5 – 59.5 | 54.5 | 12 | 53 (42 – 53) | 12(54.5²) | 12(54.5)
60 – 69 | 59.5 – 69.5 | 64.5 | 7 | 60 (54 – 60) | 7(64.5²) | 7(64.5)
70 – 79 | 69.5 – 79.5 | 74.5 | 5 | 65 (61 – 65) | 5(74.5²) | 5(74.5)
Total | | | Σf = 65 | | Σfx² = 153956.25 | Σfx = 3052.5
1) R = 79.5 − 29.5 = 50
2) IQR = $Q_3 - Q_1 = 55.96 - 36 = 19.96$
3) QD = $\frac{Q_3 - Q_1}{2} = \frac{19.96}{2} = 9.98$
4) Variance, $s^2 = \frac{1}{65 - 1}\left[153956.25 - \frac{3052.5^2}{65}\right] = 165.7212$
5) SD = $\sqrt{165.7212} = 12.8733$
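A Python sketch that recomputes the Example 12 sums, variance and standard deviation from the class midpoints and frequencies, using the grouped computational formula with the $\sum f - 1$ divisor.

```python
import math

mid = [34.5, 44.5, 54.5, 64.5, 74.5]   # class midpoints
f = [25, 16, 12, 7, 5]                  # frequencies

n = sum(f)                                             # Σf = 65
sum_fx = sum(fi * xi for fi, xi in zip(f, mid))        # Σfx
sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, mid))  # Σfx²

variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
print(sum_fx, sum_fx2, round(variance, 4), round(math.sqrt(variance), 4))
```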
2.5 Measures of Skewness
• A frequency distribution can take many shapes.
• A histogram or frequency polygon can be used to determine the shape
of a distribution.
• The shape of a distribution can also be determined using the mean, median and
mode:
➢ Distribution skewed to the right (positively skewed): mode < median < mean
➢ Normal (symmetric) distribution: mean = median = mode
➢ Distribution skewed to the left (negatively skewed): mean < median < mode
Group A | Group B
Mean = 47.5 | Mean = 45
Standard deviation = 10.84 | Standard deviation = 5.25
Median = 50 | Median = 44
Mode = 50 | Mode = 45
Pearson's coefficient of skewness, $PCS = \frac{\text{mean} - \text{mode}}{s}$
Group A: $PCS = \frac{47.5 - 50}{10.84} = -0.231$ Group B: $PCS = \frac{45 - 45}{5.25} = 0$
Conclusion: The data for Group A are negatively skewed (skewed to the left).
The data for Group B have PCS = 0, so by this measure they are symmetric (not skewed).
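A Python sketch of the skewness calculation above, using Pearson's first (mode-based) coefficient, $(\text{mean} - \text{mode})/s$. The helper names and the three-way labelling by sign are illustrative choices for this sketch.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / s."""
    return (mean - mode) / sd


def describe(pcs):
    """Label the shape from the sign of the coefficient."""
    if pcs > 0:
        return "positively skewed (skewed to the right)"
    if pcs < 0:
        return "negatively skewed (skewed to the left)"
    return "symmetric"


for name, (m, mo, s) in {"A": (47.5, 50, 10.84), "B": (45, 45, 5.25)}.items():
    pcs = pearson_skewness_mode(m, mo, s)
    print(name, round(pcs, 3), describe(pcs))
```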