CHAPTER 2 - Descriptive Statistics
CHAPTER 2
DESCRIBING DATA & NUMERICAL METHODS
ORGANIZING & GRAPHING DATA
NUMERICAL METHODS
5) Standard Deviation
6) Variance
P r e p a r e d B y : N O R A S L I L Y S A R K A M © | 35
___________________________________________Chapter 2: Describing Data & Numerical Methods
• Frequency distribution is a table consisting of rows and columns with the purpose to
EXAMPLE 1
A car dealer in Kuala Lumpur makes the sales for the following types of cars in the
EXAMPLE 2
Construct a frequency distribution table for the blood types of 21 staff members in a company,
as given below.
A B B A O O A
B O A O A B A
B O A A B B B
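The tally for Example 2 can be checked with a few lines of code. This is only an illustrative sketch: the names `blood_types` and `frequency_table` are not from the text, and the data are simply the 21 observations above typed in row by row.

```python
from collections import Counter

# Blood types of the 21 staff members from Example 2, row by row.
blood_types = (
    "A B B A O O A "
    "B O A O A B A "
    "B O A A B B B"
).split()

def frequency_table(observations):
    """Return {category: frequency}, sorted by category name."""
    counts = Counter(observations)
    return dict(sorted(counts.items()))

table = frequency_table(blood_types)
for category, frequency in table.items():
    print(f"{category:<10}{frequency:>5}")
print(f"{'TOTAL':<10}{sum(table.values()):>5}")
```

The last line prints the grand total, which should match the number of staff (21) as a quick check that no observation was missed.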
• STEPS:
(2) Find the total for each category & the total for all categories.
(3) Calculate the percentage for each category (round to the nearest whole
number):
\[ \text{Percentage} = \frac{\text{Total for each category}}{\text{Total for all categories}} \times 100 \]
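As a rough illustration of the percentage step, the sketch below applies the formula to the blood-type counts obtained from Example 2 (A = 8, B = 8, O = 5). The function name `category_percentages` is an assumption made for this example.

```python
def category_percentages(totals):
    """Percentage share of each category, rounded to the nearest whole number."""
    grand_total = sum(totals.values())
    return {name: round(value / grand_total * 100) for name, value in totals.items()}

# Category totals taken from the blood-type data of Example 2.
totals = {"A": 8, "B": 8, "O": 5}
percentages = category_percentages(totals)
print(percentages)  # {'A': 38, 'B': 38, 'O': 24}
```

Because each percentage is rounded separately, the rounded values will not always add up to exactly 100; here they happen to.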
EXAMPLE 3
NZ Holdings’ current assets (RM million) for the year 2000 are given in the table below.
Construct a pie chart for the information given and give a brief comment.
Cash 720
Others 860
Stocks
Cash
Others
TOTAL
STEP 2
• This bar chart is frequently used in newspapers, magazines, companies’ annual reports &
• It uses the lengths of horizontal bars (in a horizontal bar chart) or vertical columns (in
(1) Label the horizontal axis (categories) and vertical axis (with appropriate scale).
(2) Construct a rectangle over each category with the height of the rectangle equal
to the number of objects in that category. The base of each rectangle should
(3) Leave space between each category on the horizontal axis to distinguish
EXAMPLE 4
Given below are data showing the quarterly profit (in RM '000) for XYZ Company
for the year 2005. In the space provided below, draw a vertical and a horizontal bar chart
to represent the quarterly profit. Give a brief comment on the charts drawn.
EXAMPLE 5
Given below are data showing the quarterly profit (in RM ‘000) for companies A, B and
C for the year 2008. Draw a multiple bar chart to represent this data & give comments.
• If the components are converted into percentages, then the percentage stacked bar
• STEPS:
(2) Calculate the percentage (if you are asked for percentage component bar chart)
EXAMPLE 6
A study has been undertaken to determine if there is a relationship between the place
of residence & ownership of foreign made cars. A random sample of 500 car owners was
STEP 2
TOTAL
Draw a component bar chart & percentage component bar chart for the above data in the space
EXERCISE 1
Information on the number of students enrolled in four diploma programs at a private college
in a particular semester was recorded. The following table shows the number of students by
gender.
                     GENDER
PROGRAM              MALE    FEMALE
Computer Science     54      120
Mathematics          77      108
Statistics           89      114
Actuarial Science    64      88
Present the above information using a stacked bar chart and comment on it.
simultaneously.
EXAMPLE 7
A car manufacturer might be interested to know whether colour preference for a car
is independent of gender.
The contingency table above shows that men prefer black cars while women prefer red
cars. In addition, we can conclude that men dislike blue cars while women dislike green
cars.
EXAMPLE 8
A survey of 155 respondents taken randomly from a housing
park finds that 65 are male, and 25 of these male respondents are
unmarried. In total, 92 of the respondents are married. Construct a 2×2 table
for the above information. What percentage of the total respondents are
married females?
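One way to reason through Example 8 is to fill in the missing cells by subtraction from the given totals. The sketch below does exactly that; the function name `two_by_two` and the dictionary layout are assumptions chosen for illustration, not part of the text.

```python
def two_by_two(total, male, male_unmarried, married):
    """Fill in every cell of a gender x marital-status table from the given totals."""
    female = total - male
    male_married = male - male_unmarried
    female_married = married - male_married
    female_unmarried = female - female_married
    return {
        "male": {"married": male_married, "unmarried": male_unmarried},
        "female": {"married": female_married, "unmarried": female_unmarried},
    }

# Figures stated in Example 8.
table = two_by_two(total=155, male=65, male_unmarried=25, married=92)
pct_married_female = table["female"]["married"] / 155 * 100

print(table)
print(f"{pct_married_female:.1f}% of all respondents are married females")
```

Summing the four cells should return the survey total of 155, which is a useful check on the subtraction.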
• Quantitative data are normally summarized in tabular form & we use a frequency table to
• Quantitative data can be divided into two types: ungrouped & grouped data.
UNGROUPED DATA
• The data give information on each member of the population or sample individually.
GROUPED DATA
• This plot separates data entries into leading digits (stem) & trailing digits (leaf).
• GUIDELINES:
1) Split each value into 2 sets of digits, the first set of digits is the stem and the
2) List all the possible stem digits from the lowest to highest.
3) For each value in mass data, write down the leaf numbers on the line labeled by
EXAMPLE 9
3.4 4.5 2.3 2.7 3.8 5.9 3.4 4.7 2.4 4.1 3.6 5.1
STEP 1
Arrange the data in ascending order as follows:
2.3 2.4 2.7 3.4 3.4 3.6 3.8 4.1 4.5 4.7 5.1 5.9
STEP 2
Then, draw a vertical line to separate the stem (at left) and leaf values (at right) as
follows:
Stem | Leaf
2    | 3 4 7
3    | 4 4 6 8
4    | 1 5 7
5    | 1 9
Note: 2 | 4 means 2.4
Based on the stem-and-leaf plot above, we can say that the data are approximately symmetric (normally distributed).
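The guidelines above can be sketched in code for the one-decimal data of Example 9. This is a minimal illustration, assuming every value has exactly one decimal place so that the whole-number part is the stem and the first decimal digit is the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group one-decimal values by leading digit (stem) and trailing digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):
        # round() guards against floating-point noise such as 5.9 * 10 = 59.00000000000001
        stem, leaf = divmod(round(value * 10), 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [3.4, 4.5, 2.3, 2.7, 3.8, 5.9, 3.4, 4.7, 2.4, 4.1, 3.6, 5.1]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting first means the leaves on each stem come out in ascending order, as in Step 1 of the example.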
measures the lack of symmetry in a data distribution. (Refer text book page 83 for
• SHAPE OF DISTRIBUTION
1) Skewed to the left: Mean < Median < Mode
2) Skewed to the right: Mean > Median > Mode
3) Normally distributed/Symmetric
EXERCISE 2
The statistics marks of 30 students taken randomly from one final exam result are as follows.
Construct a stem-and-leaf display for the data below and comment on the shape of the distribution.
75 68 62 82 80 55 91 65 71 84
52 72 77 63 84 92 60 53 45 80
58 60 74 72 54 64 68 70 62 58
• Frequency distribution is a summary table where the data are grouped into a number
of classes.
• Objective: To obtain the number of responses associated with the different values of
the variable.
• The frequency of an observation is the number of times the observation has occurred.
• For ungrouped data, the frequency distribution is a table consisting of the observed
EXAMPLE 10
3 2 2 3 2 4 4 1 2 2
4 3 2 0 2 2 1 3 3 1
• The value 1 occurs 3 times, thus the frequency for value 1 is 3. Likewise, 2 occurs 8 times &
Class (x)    Frequency (f)
0            1
1            3
2            8
3            5
4            3
• A frequency table summarizes the data collected by forming intervals of values &
• Disadvantage of grouped data: Some information is lost when data are grouped into
several class intervals. For instance, if it is known that there are six observations in
an interval labeled 15-20, one cannot say whether they are all at one end of the
• GUIDELINES:
1) Class intervals should be mutually exclusive (classes should be clearly defined &
must not overlap).
3) There should be neither too few nor too many classes (5 ≤ number of classes ≤ 15).
4) Finally, frequency of each class is indicated in the frequency table. Note that
• CLASS LIMITS of a class are the highest & lowest values of a class. Every class has
• CLASS MIDPOINT is also known as class mark. Every class has a class midpoint.
• CLASS BOUNDARIES are the values at which each class joins the
next class. Each class has 2 class boundaries, the upper class boundary & lower class
without class boundaries. The class boundary is given by the midpoint of the upper
limit of one class and the lower limit of the next class.
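The boundary rule just stated can be sketched as follows, using the class limits of the daily-sales table in Example 13. Several details here are assumptions for illustration: the function name `class_summary`, the use of the same gap below the first class and above the last class, the midpoint taken as the average of the two limits, and the width taken as the difference between the two boundaries.

```python
def class_summary(limits):
    """For consecutive (lower, upper) class limits, return boundaries, midpoint and width."""
    # Gap between the upper limit of one class and the lower limit of the next
    # (needs at least two classes; assumed equal throughout the table).
    gap = limits[1][0] - limits[0][1]
    summary = []
    for lower, upper in limits:
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        summary.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "midpoint": (lower + upper) / 2,
            "width": upper_boundary - lower_boundary,
        })
    return summary

# Class limits of the daily-sales table in Example 13.
limits = [(121, 136), (137, 152), (153, 168), (169, 184), (185, 200)]
summary = class_summary(limits)
for row in summary:
    print(row)
```

For the first class this gives boundaries 120.5 and 136.5, so the upper boundary of one class coincides with the lower boundary of the next.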
• When ungrouped data are given & you are asked to construct the frequency
distribution, & the class width/size is not mentioned, the CLASS SIZE is calculated
as:
Where NO. OF CLASS is given by:
\[ k = \frac{\log n}{\log 2} \]
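A small sketch of the number-of-classes rule, applied to the 30 marks of Exercise 2. Two things here are assumptions rather than statements from the text: k is rounded up to the next whole number, and the class width is taken as the range divided by k (also rounded up). `math.log2(n)` is used because it equals log n / log 2.

```python
import math

def number_of_classes(n):
    """k = log n / log 2, rounded up to a whole number of classes (rounding is assumed)."""
    return math.ceil(math.log2(n))

def class_width(values):
    """Assumed rule: range divided by the number of classes, rounded up."""
    k = number_of_classes(len(values))
    return math.ceil((max(values) - min(values)) / k)

# Marks of the 30 students in Exercise 2.
marks = [75, 68, 62, 82, 80, 55, 91, 65, 71, 84,
         52, 72, 77, 63, 84, 92, 60, 53, 45, 80,
         58, 60, 74, 72, 54, 64, 68, 70, 62, 58]

k = number_of_classes(len(marks))
width = class_width(marks)
print(k, width)
```

With n = 30 the rule gives about 4.9, so 5 classes under the rounding assumption; the range of 47 then suggests a width of 10.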
BOUNDARIES. For this type of frequency table, the upper class limit of a class is not
equal to the lower class limit of the next class. The class intervals do not overlap.
• FREQUENCY TABLE 2: The class boundaries can be constructed for the above table
as follows.
this type of frequency table, the upper class limit of a class is equal to the lower class
• Frequency Table 3 above can also be constructed with open-ended classes, as follows.
EXAMPLE 11
a) Class boundary for the 2nd class is 600.5 to less than 800.5
EXERCISE 3
Calculate the class limits, class boundaries, class width and class midpoint for all the
classes based on the data in Example 11 for your understanding.
EXERCISE 4
Construct a frequency distribution table based on the data given below, using class limits.
Data on home runs hit by Major League Baseball teams during the 2002 season
SOLUTION:
2.10 HISTOGRAM
represent frequencies.
• The horizontal axis represents the random variable while the vertical axis represents the
EXAMPLE 12
Data below shows the weight of 100 honeydews produced from Farm X. Draw a
histogram for the data in the space provided below.
Weight ('00 g)    Frequency
4 – 6             4
6 – 8             9
8 – 10            34
10 – 12           25
12 – 14           28
EXERCISE 5
Data below shows the employees’ age (in years) in Company A. Draw a histogram for the data
• Two additional classes with zero frequencies are added to the 2 ends of the histogram.
Thus, the 2 ends of the frequency polygon are connected to the horizontal axis.
EXAMPLE 13
Daily sales (in RM) of 35 hawkers taken randomly in a town are shown in the table below.
Draw a histogram and frequency polygon to represent the data.
Daily Sales (RM)    No. of Hawkers
121 – 136           7
137 – 152           6
153 – 168           11
169 – 184           6
185 – 200           5
• There are 2 types of cumulative frequency distributions, “less than” (frequently used)
• The frequencies up to the upper boundary of each class interval are progressively added
EXAMPLE 14
For the data below, calculate the cumulative frequency, relative frequency and
EXERCISE 6
The following information shows the relative frequency distribution table of the monthly
telephone bills (in RM) for November 2008, spent by 200 households in Shah Alam.
• Ogive is drawn based on the data from a cumulative frequency table or data from
• Ogives for relative frequencies are used when two cumulative distributions with
EXAMPLE 15
Step 2: Find the lower limit of each class (if the class intervals do not overlap).
Step 3: Find the upper limit of the last class, so that you can plot the total
frequency. Then, create another table as follows. This step is not strictly necessary if
you understand how to read the cumulative frequencies against their
respective limits (since we use a “less than” ogive), but it makes the ogive easier
to plot.
Service Years Cumulative Frequency
Less than 0.5 0
Less than 4.5 16
Less than 8.5 36
Less than 12.5 64
Less than 16.5 88
Less than 20.5 104
Less than 24.5 115
Less than 28.5 120
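The "less than" table above can be reproduced by progressively adding the class frequencies. In the sketch below, the class frequencies are not quoted from the text; they are inferred as the differences between successive cumulative values in the table, and the variable names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries from the table above, with the class frequencies
# implied by the differences between successive cumulative values.
upper_boundaries = [4.5, 8.5, 12.5, 16.5, 20.5, 24.5, 28.5]
frequencies = [16, 20, 28, 24, 16, 11, 5]

cumulative = list(accumulate(frequencies))          # running totals
total = cumulative[-1]
relative_cumulative = [c / total for c in cumulative]

print(f"Less than {0.5:<5} {0:>4} {0:>7.1%}")        # starting point of the ogive
for boundary, cum, rel in zip(upper_boundaries, cumulative, relative_cumulative):
    print(f"Less than {boundary:<5} {cum:>4} {rel:>7.1%}")
```

The third column gives the cumulative relative frequency, which is what would be plotted for a relative-frequency ogive.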
EXERCISE 7
The following data show the monthly income (RM) of 50 fishermen in a village. Draw a less than
ogive. Hence, find the percentage of fishermen having income more than RM340.
Monthly Income (RM)    No. of Fishermen
300 < 350 4
350 < 400 13
400 < 450 18
450 < 500 10
500 < 550 2
550 < 600 3
NUMERICAL METHODS
MEASURES OF CENTRAL TENDENCY
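MEASUREMENT: MEAN (\(\bar{x}\))
Mean is the average of the data values.
UNGROUPED DATA:
\[ \bar{x} = \frac{\sum x}{n} \]
GROUPED DATA:
\[ \bar{x} = \frac{\sum fx}{\sum f} \]
Step 1: Find midpoint (x)
Step 2: Calculate fx
Step 3: Find total of f and total of fx

MEASUREMENT: MEDIAN (\(\tilde{x}\))
UNGROUPED DATA:
\[ \text{Location of } \tilde{x} = \frac{n+1}{2} \]
GROUPED DATA:
\[ \tilde{x} = L_m + \left( \frac{\frac{n}{2} - \sum f_{m-1}}{f_m} \right) \times C \]

MEASUREMENT: MODE (\(\hat{x}\))
UNGROUPED DATA:
Count the number of times each value occurs; the mode is the value that occurs most frequently.
GROUPED DATA:
\[ \hat{x} = L + \left( \frac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \right) \times C \]
Estimating Mode from Histogram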
MEASURES OF POSITION
SECOND QUARTILE (Q2)
UNGROUPED DATA:
\[ \text{Location of } Q_2 = \frac{n+1}{2} \]
Step 3: Find the value of Q2 according to the location.
Note:
If n = ODD, Q2 is in the middle of the data.
If n = EVEN, Q2 is the average of the 2 middle numbers.
GROUPED DATA:
Q2 = MEDIAN (Refer to how to calculate the median for grouped data.)
Q1 = First quartile means 25% of the total data is less than the first quartile and 75% of the total data is more than
the first quartile.
Q3 = Third quartile means 75% of the total data is less than the third quartile and 25% of the total data is more than
the third quartile.
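[Figure: number line marking Q1, Q2 and Q3. 25% of the data lie below Q1 and 75% above; 50% lie below Q2 and 50% above; 75% lie below Q3 and 25% above.]
MEASURES OF DISPERSION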
RANGE
UNGROUPED DATA: Range = Largest Value – Smallest Value
GROUPED DATA: Range = Upper Boundary of Highest Class – Lower Boundary of Lowest Class

INTERQUARTILE RANGE
UNGROUPED DATA: IR = Q3 – Q1
GROUPED DATA: IR = Q3 – Q1

QUARTILE DEVIATION / SEMI-INTERQUARTILE RANGE
UNGROUPED DATA: QD = ½ (Q3 – Q1)
GROUPED DATA: QD = ½ (Q3 – Q1)
1) The more spread out or dispersed the data are, the larger will be the range, the interquartile range, the
quartile deviation, the variance & standard deviation.
2) The more clustered the data are, the smaller will be the range, the interquartile range, the quartile
deviation, the variance & standard deviation.
3) If all data values are the same, that is, there is no variation in the data, the range, the interquartile range,
the quartile deviation, the variance & standard deviation will be equal to zero.
4) The range, the interquartile range, the quartile deviation, the variance & standard deviation can never be
negative.
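VARIANCE (\(S^2\))
UNGROUPED DATA:
\[ S^2 = \frac{1}{n-1} \left[ \sum x^2 - \frac{(\sum x)^2}{n} \right] \]
GROUPED DATA:
\[ S^2 = \frac{1}{n-1} \left[ \sum fx^2 - \frac{(\sum fx)^2}{n} \right] \]
MEAN DEVIATION
\[ \text{Mean Deviation} = \frac{\sum |x - \bar{x}|}{n} \]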
\[ \text{Skewness} = \frac{\bar{x} - \hat{x}}{s} \quad \text{OR} \quad \text{Skewness} = \frac{3(\bar{x} - \tilde{x})}{s} \]
• If MEAN > MEDIAN > MODE : The distribution is skewed to the right.
• If MEAN < MEDIAN < MODE : The distribution is skewed to the left.
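To see how the two skewness coefficients behave, the sketch below evaluates them for the ten values listed in Example 16. It assumes that the first formula uses the mode and the second the median (the usual Pearson coefficients), and that s is the sample standard deviation with an n − 1 divisor; the function name `pearson_skewness` is illustrative.

```python
import statistics

def pearson_skewness(values):
    """Return ((mean - mode) / s, 3 * (mean - median) / s) for a list of values."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)    # assumes a single most frequent value
    s = statistics.stdev(values)      # sample standard deviation (n - 1 divisor)
    return (mean - mode) / s, 3 * (mean - median) / s

data = [5, 3, 8, 12, 18, 20, 24, 25, 8, 2]
by_mode, by_median = pearson_skewness(data)
print(f"Using the mode:   {by_mode:.3f}")
print(f"Using the median: {by_median:.3f}")
```

Both coefficients come out positive for this data set, which agrees with the rule above: here the mean (12.5) exceeds the median (10), which exceeds the mode (8), so the distribution is skewed to the right.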
EXAMPLE 16
5 3 8 12 18 20 24 25 8 2
(1) Mean
(6) Variance
(8) Range
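Hand calculations for the ungrouped measures in Example 16 can be cross-checked with a short script. This is a sketch rather than a worked solution: it covers only the mean, median, mode, variance, standard deviation and range, and it uses the computational form of the sample variance (n − 1 divisor) given earlier in the chapter.

```python
import statistics

data = [5, 3, 8, 12, 18, 20, 24, 25, 8, 2]
n = len(data)

mean = sum(data) / n
median = statistics.median(data)   # average of the 2 middle values when n is even
mode = statistics.mode(data)       # most frequent value

# Sample variance: (sum of x^2 - (sum of x)^2 / n) / (n - 1)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
std_dev = variance ** 0.5

data_range = max(data) - min(data)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, standard deviation={std_dev:.2f}, range={data_range}")
```

The variance from the computational formula should agree with `statistics.variance(data)`, which uses the definitional form with the same n − 1 divisor.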
EXAMPLE 17
Data below shows the monthly income (in RM) for employees at Suria Company.
Monthly Income (in RM)    No. of Employees (f)
700 - 799                 8
800 - 899                 13
900 - 999                 14
1000 - 1099               10
1100 - 1199               25
1200 - 1299               4
(1) Mean
(4) Variance
(6) Range
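The grouped-data formulas for the mean, median and mode can be sketched for the income table of Example 17. Some interpretations here are assumptions, since the text does not define the symbols: L_m and L are taken as the lower boundaries of the median and modal classes, C as the class size, f_1 as the frequency of the class before the modal class and f_2 as the frequency of the class after it.

```python
from itertools import accumulate

limits = [(700, 799), (800, 899), (900, 999),
          (1000, 1099), (1100, 1199), (1200, 1299)]
freq = [8, 13, 14, 10, 25, 4]

n = sum(freq)
width = limits[1][0] - limits[0][0]                 # class size C
lower_boundaries = [lo - 0.5 for lo, _ in limits]   # limits are 1 apart
midpoints = [(lo + hi) / 2 for lo, hi in limits]

# Mean: sum(f * x) / sum(f), with x the class midpoint
mean = sum(f * x for f, x in zip(freq, midpoints)) / n

# Median: first class whose cumulative frequency reaches n / 2
cumulative = list(accumulate(freq))
m = next(i for i, c in enumerate(cumulative) if c >= n / 2)
before = cumulative[m - 1] if m > 0 else 0
median = lower_boundaries[m] + (n / 2 - before) / freq[m] * width

# Mode: class with the highest frequency and its two neighbours
k = freq.index(max(freq))
f0 = freq[k]
f1 = freq[k - 1] if k > 0 else 0
f2 = freq[k + 1] if k < len(freq) - 1 else 0
mode = lower_boundaries[k] + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * width

print(f"mean={mean:.2f}, median={median:.2f}, mode={mode:.2f}")
```

Under these assumptions the median class is 1000 - 1099 and the modal class is 1100 - 1199, so the script is a way to check which class each formula should be applied to before doing the arithmetic by hand.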
EXERCISE 8
The table below represents the record high temperatures in Fahrenheit for each of the 50
states.
Temperature Frequency
100 – 104 2
105 – 109 8
110 – 114 18
115 – 119 13
120 – 124 7
125 – 129 1
130 – 134 1
(a) Calculate the measures of central tendency and hence interpret the values obtained.
(d) Draw a less than ogive and find the value of the first and third quartile. Interpret the
values obtained.
• CV compares the standard deviation & mean for a distribution & converts this value to
percent.
\[ CV = \frac{s}{\bar{x}} \times 100 \]
• We can also say that distribution B is more consistent (or less dispersed or more stable)
than distribution A.
• A larger relative variation implies less consistency, while smaller relative variation
EXAMPLE 18
During the first six months of 2009, the mean share price of Company A was RM1.90
with a standard deviation of RM0.50, while the mean share price of Company B was
RM8.00 with a standard deviation of RM0.85. Which company’s share price is more
consistent?
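The comparison in Example 18 can be sketched by applying the CV formula to each company. The function name `coefficient_of_variation` is an illustrative choice; the figures are those stated in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = s / mean * 100, expressed as a percentage."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(0.50, 1.90)   # Company A
cv_b = coefficient_of_variation(0.85, 8.00)   # Company B

# The smaller CV indicates the more consistent share price.
more_consistent = "Company A" if cv_a < cv_b else "Company B"
print(f"CV(A) = {cv_a:.2f}%, CV(B) = {cv_b:.2f}%, more consistent: {more_consistent}")
```

Note that Company B has the larger standard deviation in ringgit but the smaller CV, which is why the comparison is made on relative rather than absolute variation.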
• Box plot provides a useful graphical presentation of data using minimum, maximum, first quartile (Q1), third
quartile (Q3) and median.
[Figure: box plot labelled Minimum, Q1, Median, Q3, Maximum]
• Vertical line inside the box represents the location of the median value.
• Vertical line at the left-hand side of the box represents the location of Q1.
• Vertical line at the right-hand side of the box represents the location of Q3.
• The extreme end of the line (a whisker) connecting to the left-hand side of the box is the location of the
smallest value.
• The extreme end of the line (a whisker) connecting to the right-hand side of the box is the location of the
largest value.
• Normally distributed → Median is in the middle of the box & whiskers are of equal length.
• Negatively skewed → The whisker & the rectangular box are longer on the left-hand side.
[Figure: negatively skewed box plot labelled Minimum, Q1, Median, Q3, Maximum]
• Positively skewed → The whisker & the rectangular box are longer on the right-hand side.
[Figure: positively skewed box plot labelled Minimum, Q1, Median, Q3, Maximum]
EXAMPLE:
Construct a box-and-whisker plot for the data below.
12 18 20 34 8 42 30 58 40
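The five values needed for this box plot can be sketched with Python's `statistics.quantiles`. The "exclusive" method places the quartiles at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 in the sorted data; using the same kind of location rule for Q1 and Q3 as the chapter gives for Q2 is an assumption, and other quartile conventions can give slightly different values.

```python
import statistics

data = [12, 18, 20, 34, 8, 42, 30, 58, 40]

# Quartiles at positions (n + 1) / 4, (n + 1) / 2 and 3 (n + 1) / 4 of the sorted data.
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

five_number_summary = {
    "minimum": min(data),
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "maximum": max(data),
}
print(five_number_summary)
```

With nine values the quartile positions are 2.5, 5 and 7.5, so Q1 and Q3 are each the average of two neighbouring sorted values while the median is the fifth value.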
TUTORIAL 2
Please do all the questions listed below & show your calculations clearly.
REVIEW QUESTIONS 3
Text Book Page 64
REVIEW QUESTIONS 4
Text Book Page 93
REVIEW QUESTIONS 5
Text Book Page 127