58859266.K InterpData AUS
Interpreting Data
Curriculum Ready
ACMSP: 248, 249, 250, 278, 283
www.mathletics.com
Different lists of data have different properties. This unit is focused on the results and conclusions that
can be found from these different properties.
I used to think:
The median of a data set is the middle score. Does this mean that the number of scores greater than the
median is the same as the number of scores less than the median?
If the median splits the data in two halves, what do you think "quartiles" do?
The "range" of data is the difference between the highest and lowest score. What is "interquartile range"?
Basic Statistics
Data is just a list of numbers called 'scores' or 'results'. The basic statistics that can be found from these scores are the
mean, median or mode. (These are also called "measures of central tendency")
• The mean is the average score. The symbol for the mean is $\bar{x}$. It is found using the formula $\bar{x} = \frac{\sum fx}{\sum f}$.
• The mode is the score with the highest frequency. This is the score that occurs the most often.
• The median is the middle score when the scores are arranged in ascending order.
• The cumulative frequency (cf) is the sum of the frequencies for all scores less than or equal to that score.
Also remember that the symbol $\sum$ (called "sigma") means 'sum of', so when $\sum x$ is written, it means the 'sum of the scores'.
Here is an example.
A group of people's height was measured (in cm) and the results were written in this table
Score (x)   Frequency (f)   Cumulative frequency (cf)   fx = f × x
110         3               3                           3 × 110 = 330
112         5               3 + 5 = 8                   5 × 112 = 560
113         10              8 + 10 = 18                 10 × 113 = 1130
115         9               18 + 9 = 27                 9 × 115 = 1035
116         8               27 + 8 = 35                 8 × 116 = 928

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{3983}{35} = 113.8$ cm
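The table calculation above can be checked with a short script. This is only a sketch: the dictionary `heights` and the function name `frequency_summary` are names chosen here for illustration and are not part of the workbook.

```python
# Frequency table from the height example: score (cm) -> frequency.
heights = {110: 3, 112: 5, 113: 10, 115: 9, 116: 8}

def frequency_summary(table):
    """Return (sum of f, sum of fx, mean, list of (score, cumulative frequency))."""
    total_f = 0
    total_fx = 0
    cumulative = []
    for score in sorted(table):
        f = table[score]
        total_f += f                  # running total of frequencies = cf
        total_fx += f * score         # running total of f × x
        cumulative.append((score, total_f))
    return total_f, total_fx, total_fx / total_f, cumulative

n, sum_fx, mean, cf = frequency_summary(heights)
print(n, sum_fx, round(mean, 1))  # 35 3983 113.8
print(cf)
```

The last cumulative frequency is always equal to the total number of scores, which is a quick way to check a completed table.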
A histogram is a column graph based on data. A polygon is made up of straight lines joining the centres of
these columns.
[Figure: a frequency histogram with its frequency polygon. Horizontal axis: Score (x), from 2 to 10. Leave half a column on either side.]
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2 \times 2 + 6 \times 4 + 9 \times 6 + 7 \times 7 + 8 \times 9 + 3 \times 10}{35} = 6.7$ (1 d.p.)
A cumulative frequency histogram can also be joined. The polygon joins the right corners of the histogram.
A cumulative frequency polygon is also called an "ogive".
[Figure: a cumulative frequency histogram with its cumulative frequency polygon. Horizontal axis: Score (x), from 2 to 10.]
Take the cumulative frequency of the last score. This means that there are 10 scores in total.
Since there were 10 scores, the median is the average of the two middle scores (in positions 5 and 6):
$\therefore \text{median} = \frac{\text{score in 5th position} + \text{score in 6th position}}{2} = \frac{6 + 7}{2} = 6.5$
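Finding the score in a given position by reading down the cumulative frequency column can be sketched in code. The function names `score_at` and `median_from_table` are made up for this illustration, and the data used is the height table from the earlier worked example (35 scores, so the median is the 18th score).

```python
def score_at(table, position):
    """Return the score in the given position (counting from 1) of the ordered data."""
    running = 0
    for score in sorted(table):
        running += table[score]       # cumulative frequency so far
        if position <= running:
            return score
    raise ValueError("position is larger than the number of scores")

def median_from_table(table):
    n = sum(table.values())
    if n % 2 == 1:                    # odd number of scores: one middle score
        return score_at(table, (n + 1) // 2)
    # even number of scores: average the two middle scores
    return (score_at(table, n // 2) + score_at(table, n // 2 + 1)) / 2

heights = {110: 3, 112: 5, 113: 10, 115: 9, 116: 8}
print(median_from_table(heights))  # 113
```

The 18th score is 113 because the cumulative frequency first reaches 18 at the score 113.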
1. A group of people were asked how many languages they speak and this table was partly completed.
2. A group of people were asked how many movies they had seen in the last year. The diagram below shows the
frequency polygon for the results.
[Figure: frequency polygon titled "Number of movies seen". Vertical axis: Frequency (f), from 0 to 9. Horizontal axis: Movies (x), from 20 to 25.]
a Use the diagram to complete the table below.
$\sum f =$          $\sum fx =$
b What is the mean (to 2 decimal places if necessary)?
3. A group of people were asked their age and this frequency histogram was produced.
[Figure: frequency histogram titled "Different Ages". Vertical axis: Frequency (f), from 0 to 6. Horizontal axis: Ages (x), from 30 to 35.]
a Complete the table below.
$\sum f =$          $\sum fx =$
b Complete the polygon on the diagram.
The median is the middle score of the data (or the average of the two middle scores). This means that 50% of the
scores are less than or equal to the median. Quartiles work the same way.
Quartiles
$\therefore Q_2 = \frac{3 + 3}{2} = 3$
c Find the lower quartile Q1.
Since there are 20 scores, Q1 will be the average of the scores in the 5th and 6th positions. From the table, the score in the 5th position is 2, and the score in the 6th position is 2.
$\therefore Q_1 = \frac{2 + 2}{2} = 2$
d Find the upper quartile Q3.
Since there are 20 scores, Q3 will be the average of the scores in the 15th and 16th positions. From the table, the score in the 15th position is 4, and the score in the 16th position is 5.
$\therefore Q_3 = \frac{4 + 5}{2} = 4.5$
• The range of a data set is the difference between the highest score and the lowest score.
• The interquartile range is the difference between the upper and lower quartiles. It is written as IQR.
$\therefore \text{IQR} = Q_3 - Q_1$
A 5-point summary of a data set is a list of: the lowest value, Q1, Q2, Q3 and the highest value. Here is an example:
$\sum f = 36$
a Find Q1
There are 36 scores in total, so Q1 is the average of the 9th and 10th scores (the median of the lower half). The cf of x = 10 is 5 and the cf of x = 12 is 13, so both of these scores are 12.
$\therefore Q_1 = \frac{12 + 12}{2} = 12$
b Find Q2
There are 36 scores in total, so the median is the average of the scores in the 18th and 19th positions:
$\therefore Q_2 = \frac{16 + 16}{2} = 16$
c Find Q3
There are 36 scores in total, so Q3 is the average of the 27th and 28th scores. The cf of x = 16 is 27 and the cf of x = 18 is 33, so the 27th score is 16 and the 28th score is 18:
$\therefore Q_3 = \frac{16 + 18}{2} = 17$
c Find Q1 .
$\sum f = 52$
a Complete the cumulative frequency column.
d Find Q1 and Q3 .
Box-and-Whisker Plots
These are used to compare different data sets. They are drawn like this:
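[Figure: a box-and-whisker plot. The whiskers end at the lowest value and the highest value; the box runs from Q1 to Q3 with a line at the median.]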
29 , 26 , 30 , 22 , 30 , 21 , 22 , 22 , 25 , 24 , 21 , 26
21 , 21 , 22 , 22 , 22 , 24 , 25 , 26 , 26 , 29 , 30 , 30
Q1 Q2 Q3
b Find a 5-point summary.
• Q3 will be the average of the scores in the 9th and 10th positions, so:
$\therefore Q_3 = \frac{26 + 29}{2} = 27.5$
• The greatest number in the set is 30.
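A 5-point summary can also be computed with a few lines of code. The sketch below uses the twelve scores from this example. The function names are chosen here for illustration, and one assumption is made that the workbook does not state: when the number of scores is odd, the median itself is left out of both halves before the quartiles are found.

```python
def median(sorted_scores):
    """Median of a list that is already in ascending order."""
    n = len(sorted_scores)
    mid = n // 2
    if n % 2 == 1:
        return sorted_scores[mid]
    return (sorted_scores[mid - 1] + sorted_scores[mid]) / 2

def five_point_summary(scores):
    data = sorted(scores)
    n = len(data)
    lower = data[: n // 2]           # lower half of the data
    upper = data[(n + 1) // 2 :]     # upper half (assumption: median excluded if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [29, 26, 30, 22, 30, 21, 22, 22, 25, 24, 21, 26]
lowest, q1, q2, q3, highest = five_point_summary(scores)
print(lowest, q1, q2, q3, highest)  # 21 22.0 24.5 27.5 30
print("IQR =", q3 - q1)             # IQR = 5.5
```

Other textbooks and software packages use slightly different quartile rules, so results from a calculator or spreadsheet may differ a little from this method.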
[Figure: box-and-whisker plot of the data on a number line from 20 to 30.]
The table below shows average temperatures for a city over two years
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2009 28 22 24 22 16 26 20 25 29 20 23 24
2010 21 24 26 25 24 25 24 25 30 27 28 36
                      2009                                              2010
Ascending order       16, 20, 20, 22, 22, 23, 24, 24, 25, 26, 28, 29    21, 24, 24, 24, 25, 25, 25, 26, 27, 28, 30, 36
Lowest temperature    16                                                21
Q1                    21                                                24
Q2                    23.5                                              25
Q3                    25.5                                              27.5
Highest temperature   29                                                36
b Draw box and whisker plots for the average temperatures in 2009 and 2010.
[Figure: box-and-whisker plots for 2009 and 2010 drawn above a number line from 15 to 40.]
Standard Deviation
Standard deviation measures the average distance each score is away from the mean. Its symbol is $\sigma_n$ (lower-case sigma with a subscript n), pronounced 'sigma-n'. This is the formula for $\sigma_n$:
$\sigma_n = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
where $\bar{x}$ is the mean, $n$ is the number of scores and $\sum$ still means 'sum of'. Here is an example:
Find the standard deviation (correct to 1 decimal place) of this set of data: 11, 8, 13, 3, 9, 15, 17, 17, 6, 11
Score (x)   $x - \bar{x}$     $(x - \bar{x})^2$
11          11 - 11 = 0       0
8           8 - 11 = -3       9
13          13 - 11 = 2       4
3           3 - 11 = -8       64
9           9 - 11 = -2       4
15          15 - 11 = 4       16
17          17 - 11 = 6       36
17          17 - 11 = 6       36
6           6 - 11 = -5       25
11          11 - 11 = 0       0

$\sigma_n = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{194}{10}} = 4.4$ (1 d.p.)
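The same steps (find the mean, square each difference, average the squares, take the square root) can be sketched in code. The function name `population_sd` is chosen here for illustration only.

```python
import math

def population_sd(scores):
    """Standard deviation sigma-n: square root of the mean squared distance from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(sum_of_squares / n)

data = [11, 8, 13, 3, 9, 15, 17, 17, 6, 11]
print(round(population_sd(data), 1))  # 4.4
```

Note that this version divides by n, matching the formula in this unit. Some calculators also offer a version that divides by n - 1, which gives a slightly larger answer.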
1. An athlete runs the same race 16 times. This is how long it takes him (in seconds) to run each time:
14 , 12 , 18 , 14 , 16 , 18 , 19 , 14 , 16 , 17 , 15 , 13 , 20 , 16 , 14 , 19
[Number line from 11 to 21 for the box-and-whisker plot.]
2. During 8 days, a cat and a dog eat an amount of food (in grams) according to the table below.
Dog 65 100 90 80 75 85 50 85
c Draw box-and-whisker plots for the different data sets on the number line below.
d From the box-and-whisker plot, which had the greater interquartile range? Find the interquartile range.
3. 150 men and 150 women were in a survey and these are the resulting box-and-whisker plots from their ages:
[Figure: box-and-whisker plots for Women and Men drawn above a number line of ages from 20 to 58.]
f What is the age that half the men were older than?
4. Ava counted the number of books she read each month for a year. She wrote them in the table below:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
5 6 4 1 6 1 9 9 10 10 5 6
$x$   $x - \bar{x}$   $(x - \bar{x})^2$
5 -1
6 0
4
1 -5
6
1 -5
9
9 3 9
10 4
10
5
6 0 0
$\sum (x - \bar{x}) =$          $\sum (x - \bar{x})^2 =$
c What is the formula to calculate $\sigma_n$?
d Show that the standard deviation of the above set of data, to 2 decimal places, is 2.97.
Skewness of Data
A set of data can be one of three things: a normal distribution, skewed to the right or skewed to the left.
This is only a general rule of thumb which holds most of the time and exceptions to this rule do occur. This can be
used to compare different data sets.
Two data sets are shown on the column graph below, data set 1 (white) and data set 2 (black).
[Figure: column graph of data set 1 (white) and data set 2 (black). Vertical axis from 0 to 60.]
median = $\frac{25 + 30}{2} = 27.5$          median = $\frac{15 + 20}{2} = 17.5$
c Find the mean of both data sets and comment on the skewness of each data set.
Spread of Data
The spread of a data set measures how consistent (close to the mean) a data set is. This depends on:
• Range – the wider the range of the data set, the less likely it is that scores will be close to the mean.
• Interquartile range – the wider the interquartile range, the more spread out the middle half of the scores is.
• Standard deviation – $\sigma_n$ measures how far the scores are from the mean.
A gameplayer tries two strategies for playing a game. He tries each strategy eight times and these are the
points received
Strategy 1: 34 , 35 , 28 , 28 , 30 , 31 , 32 , 27 Strategy 2: 1 , 13 , 5 , 10 , 16 , 14 , 1 , 5
           Strategy 1   Strategy 2
Lowest     27           1
Q1         28           (1 + 5) / 2 = 3
Q2         30.5         7.5
Q3         33           (13 + 14) / 2 = 13.5
Highest    35           16

           Strategy 1     Strategy 2
Range      35 - 27 = 8    16 - 1 = 15
IQR        33 - 28 = 5    13.5 - 3 = 10.5

[Figure: box-and-whisker plots for both strategies drawn above a number line from 0 to 40.]
Strategy 2 has a larger range and interquartile range, so it is expected to have a larger standard deviation.
$\sigma_n = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
$\sigma_n$ for Strategy 1: 2.74 (2 d.p.)
$\sigma_n$ for Strategy 2: 5.53 (2 d.p.)
As expected, the standard deviation for Strategy 2 is greater.
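The comparison of the two strategies can be reproduced with Python's standard `statistics` module, whose `pstdev` function divides by n in the same way as the formula for sigma-n. This is a sketch only; the helper name `spread` and the variable names are chosen here for illustration.

```python
import statistics

strategy_1 = [34, 35, 28, 28, 30, 31, 32, 27]
strategy_2 = [1, 13, 5, 10, 16, 14, 1, 5]

def spread(scores):
    """Return the range and the standard deviation (to 2 d.p.) of a list of scores."""
    return {
        "range": max(scores) - min(scores),
        "sd": round(statistics.pstdev(scores), 2),  # pstdev divides by n
    }

print(spread(strategy_1))  # {'range': 8, 'sd': 2.74}
print(spread(strategy_2))  # {'range': 15, 'sd': 5.53}
```

Both measures agree that Strategy 1 gives the more consistent points.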
1. Two movies received reviews from eight critics who gave the movie a score between 1 and 10.
Here are their results:
Movie 1: 3 , 8 , 8 , 6 , 3 , 5 , 2 , 5 Movie 2: 8 , 7 , 3 , 4 , 7 , 10 , 6 , 3
a Use this table to find the standard deviation of the Movies' scores:
Movie 1 Movie 2
Score (x)   $x - \bar{x}$   $(x - \bar{x})^2$          Score (x)   $x - \bar{x}$   $(x - \bar{x})^2$
3 8
8 7
8 3
6 4
3 7
5 10
2 6
5 3
$\sum (x - \bar{x}) =$     $\sum (x - \bar{x})^2 =$          $\sum (x - \bar{x}) =$     $\sum (x - \bar{x})^2 =$
Movie 1 Movie 2
Lowest
Q1
Q2
Q3
Highest
$\sigma_n$
2. At the Olympics, divers receive a score between 1 and 10 each time they dive. These are the scores after
12 dives for the divers who came in first and second place.
Diver A: 7, 8, 5, 7, 8, 6, 6, 5, 8, 5, 8, 5
Diver B: 7, 6, 4, 5, 6, 5, 10, 9, 8, 6, 9, 9
c Draw a box-and-whisker plot for each of the divers' scores. Are the scores skewed?
[Number line from 0 to 12 for the box-and-whisker plots.]
d Draw up a table with the headings score (x), $x - \bar{x}$ and $(x - \bar{x})^2$, and use it to find $\sigma_n$ for both divers.
a If the median is less than the mean which way would the data be skewed (according to the rule of thumb)?
d Standard deviation is a measure of how far each score is from the mean. What does this mean?
f Does data with a higher or lower standard deviation have more consistency?
Basics:
Number of languages (x)   Frequency (f)    fx             Cumulative frequency (cf)
1                         20               20 × 1 = 20    0 + 20 = 20
2                         38 - 20 = 18     18 × 2 = 36    20 + 18 = 38
3                         50 - 38 = 12     12 × 3 = 36    38 + 12 = 50
4                         7                4 × 7 = 28     50 + 7 = 57

mode = 1
median = 2
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{135}{60} = 2.25$
© 3P Learning
Basics:
2. a
Movies (x)   Frequency (f)   Cumulative frequency (cf)   fx
20           8               8                           20 × 8 = 160
21           5               8 + 5 = 13                  21 × 5 = 105
22           7               13 + 7 = 20                 22 × 7 = 154
23           3               20 + 3 = 23                 23 × 3 = 69
24           2               23 + 2 = 25                 24 × 2 = 48
25           4               25 + 4 = 29                 25 × 4 = 100
$\sum f = 29$     $\sum fx = 636$

$\bar{x} = 21.93$ (2 d.p.)
mode = 20
median = 22

3. a
Ages (x)   Frequency (f)   Cumulative frequency (cf)   fx
30         4               4                           30 × 4 = 120
31         3               4 + 3 = 7                   31 × 3 = 93
32         3               7 + 3 = 10                  32 × 3 = 96
33         3               10 + 3 = 13                 33 × 3 = 99
34         2               13 + 2 = 15                 34 × 2 = 68
35         5               15 + 5 = 20                 35 × 5 = 175
$\sum f = 20$     $\sum fx = 651$
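[Figure: the completed frequency polygon on the ages histogram. Horizontal axis: Ages (x), from 30 to 35.]
c $\bar{x} = 32.55$
d mode = 35
e median = 32.5

Score (x)   Frequency (f)   Cumulative frequency (cf)
20          3               5 + 3 = 8
30          1               8 + 1 = 9
40          8               9 + 8 = 17
50          3               17 + 3 = 20
60          9               20 + 9 = 29
70          6               29 + 6 = 35
80          3               35 + 3 = 38
90          8               38 + 8 = 46
100         6               46 + 6 = 52
$\sum f = 52$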
• Q1 = 14 (lower quartile)
• $Q_2 = \frac{16 + 16}{2} = 16$ (the median)
• Q3 = 18 (upper quartile)
• Highest score = 20
d Range = 8
e IQR = 4
[Figure: box-and-whisker plots for Dog and Cat drawn above a number line from 35 to 100.]
d IQR (cat) = 42.5
  IQR (dog) = 17.5
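c $\sigma_n = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

Movie 1: $\sigma_n = 2.12$

          Movie 1   Movie 2
Lowest    2         3
Q1        3         3.5

[Figure: box-and-whisker plots for Movie 1 and Movie 2 drawn above a number line.]
c
                      Movie 1       Movie 2
Range                 8 - 2 = 6     10 - 3 = 7
Interquartile range   7 - 3 = 4     7.5 - 3.5 = 4
$\sigma_n$            2.12          2.35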
2. a
           Diver A   Diver B
Lowest     5         4
Q1         5         5.5
Q2         6.5       6.5
Q3         8         9
Highest    8         10

b
                      Diver A      Diver B
Range                 8 - 5 = 3    10 - 4 = 6
Interquartile range   8 - 5 = 3    9 - 5.5 = 3.5

c [Figure: box-and-whisker plots for Diver A and Diver B.]
The scores for Diver A are not skewed. The scores for Diver B are skewed to the right (the box-and-whisker plot is longer on the right-hand side of the median).

d Diver A
$x$   $x - \bar{x}$   $(x - \bar{x})^2$
5     -1.5            2.25
5     -1.5            2.25
5     -1.5            2.25
5     -1.5            2.25
6     -0.5            0.25
6     -0.5            0.25
7     0.5             0.25
7     0.5             0.25
8     1.5             2.25
8     1.5             2.25
8     1.5             2.25
8     1.5             2.25
$\sum (x - \bar{x}) = 0$     $\sum (x - \bar{x})^2 = 19$
Diver A: $\sigma_n = \sqrt{\frac{19}{12}} = 1.258$ (3 d.p.)

Diver B
$x$   $x - \bar{x}$   $(x - \bar{x})^2$
4     -3              9
5     -2              4
5     -2              4
6     -1              1
6     -1              1
6     -1              1
7     0               0

e If the winner was based on total score, Diver B won.
f Diver A had the more consistent scores. This is shown by the lower standard deviation of Diver A's scores.

e The more consistent the scores, the lower the standard deviation. The less consistent the scores, the higher the standard deviation.
f Data with a lower standard deviation has more consistency.