Further Statistics Chapter 5
Last modified: 22nd July 2018
Register at: www.drfrostmaths.com
Everything is completely free.
Practise questions by chapter, including
past paper questions and extension
questions (e.g. MAT).
Two spins, $X_1$ and $X_2$, of the same spinner, each with distribution:

x1        1     2     3
p(x1)    1/3   1/3   1/3

x2        1     2     3
p(x2)    1/3   1/3   1/3
Then $Y = X_1 + X_2$ would represent the distribution obtained by adding each
possible outcome from $X_1$ to each possible outcome from $X_2$:
y = x1 + x2    x2 = 1   x2 = 2   x2 = 3
x1 = 1            2        3        4
x1 = 2            3        4        5
x1 = 3            4        5        6

Each combined outcome has a probability of 1/3 × 1/3 = 1/9, so:

y        2     3     4     5     6
p(y)    1/9   2/9   3/9   2/9   1/9

[Bar chart: p(y) against y for y = 2, 3, 4, 5, 6]

Already, the shape of the distribution vaguely resembles a well-known distribution…
Adding Random Variables
Let's now get a sample of 3 values, i.e. spin it 3 times: $Y = X_1 + X_2 + X_3$

y        3      4      5      6      7      8      9
p(y)    1/27   3/27   6/27   7/27   6/27   3/27   1/27

[Bar chart: p(y) against y for y = 3, 4, …, 9]

That's looking pretty much like a normal distribution now…
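The probability tables for the totals can be reproduced by brute force: list every equally likely combination of spins and count how often each total appears. This is only a sketch; the function name `sum_distribution` is made up for illustration, and it assumes every outcome on the spinner is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(outcomes, n):
    """Exact distribution of the total of n independent, equally likely spins."""
    # product(...) lists every ordered combination of n spins
    counts = Counter(sum(spins) for spins in product(outcomes, repeat=n))
    total = len(outcomes) ** n
    return {y: Fraction(c, total) for y, c in sorted(counts.items())}

two_spins = sum_distribution([1, 2, 3], 2)     # Y = X1 + X2
three_spins = sum_distribution([1, 2, 3], 3)   # Y = X1 + X2 + X3

for y, p in three_spins.items():
    # Fractions are stored in lowest terms, so rescale to show them over 27
    print(y, f"{p * 27}/27")
```

Changing `n` shows how the bell shape develops as more spins are added.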
If we divide each of these combined outcomes by 3, then we'd have $\bar{X} = \frac{X_1 + X_2 + X_3}{3}$:

x̄        1      1⅓     1⅔     2      2⅓     2⅔     3
p(x̄)    1/27   3/27   6/27   7/27   6/27   3/27   1/27

So it appears that the distribution of possible means $\bar{X}$ of the sample of 3 spins
approximately forms a normal distribution, and it becomes more normal as we increase the
number of spins in our sample. This will always occur regardless of what the distribution
of the original spinner was (whether discrete uniform or otherwise), provided that we're
spinning the same spinner!
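To see the "regardless of the original distribution" claim in action, here is a rough sketch using a deliberately lopsided spinner. The probabilities 0.7, 0.2, 0.1 are invented for illustration and are not from the slides. It builds the exact distribution of the total of n spins, then measures the biggest gap between the exact cumulative distribution of the sample mean and the normal distribution with the same mean and variance (no continuity correction is applied).

```python
from statistics import NormalDist

def convolve(p, q):
    """Distribution of the sum of two independent variables given as {value: prob} dicts."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def max_cdf_gap(spinner, n):
    """Largest gap between the exact CDF of the mean of n spins and the matching normal CDF."""
    mu = sum(x * p for x, p in spinner.items())
    var = sum(x * x * p for x, p in spinner.items()) - mu ** 2
    total = dict(spinner)
    for _ in range(n - 1):
        total = convolve(total, spinner)          # add one more spin
    approx = NormalDist(mu, (var / n) ** 0.5)     # normal with mean mu, variance var/n
    gap, cdf = 0.0, 0.0
    for s in sorted(total):
        cdf += total[s]                           # exact P(total <= s) = P(mean <= s/n)
        gap = max(gap, abs(cdf - approx.cdf(s / n)))
    return gap

lopsided = {1: 0.7, 2: 0.2, 3: 0.1}   # hypothetical non-uniform spinner
for n in (3, 30, 100):
    print(n, round(max_cdf_gap(lopsided, n), 4))
```

The printed gap should shrink as n grows, which is the spinner experiment repeated with a non-uniform starting distribution.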
Central Limit Theorem
The central limit theorem says that if $X_1, X_2, \dots, X_n$ is a random sample of size $n$ from
a population with mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ is approximately
distributed as $N\left(\mu, \frac{\sigma^2}{n}\right)$.
• 𝑋 represents the population distribution, i.e. a single choice of something from the population.
• We are generating a sample of size $n$, so we use a random variable $X_i$ to represent the choice of each thing
from the population for the sample, each $X_i$ having the same distribution as the population.
Since the $X_i$ are independent, we could technically sample the same value twice (as could
happen with the spinner), but given a large population this would be unlikely to occur in practice.
• Don't get confused between the distribution $X$ (i.e. a single choice from the population), and $\bar{X}$ (a
distribution over the different sample means we could get as we take different samples).
• The variance of $\bar{X}$ is $\frac{\sigma^2}{n}$. This means that as we increase the sample size, the variance of the sample means
decreases. This makes sense: with a larger sample size, we expect the sample means to be more
consistent and be closer to the true population mean $\mu$.
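The mean and variance of $\bar{X}$ can be checked exactly for the 1, 2, 3 spinner by listing every possible sample. This is a small sketch rather than a proof; the helper name `sample_mean_moments` is made up for illustration. Note that it only checks the mean $\mu$ and the variance $\frac{\sigma^2}{n}$, which hold exactly for any $n$. The normal shape is the part that is only approximate.

```python
from fractions import Fraction
from itertools import product

outcomes = [1, 2, 3]   # equally likely spinner outcomes
mu = Fraction(sum(outcomes), len(outcomes))
sigma2 = Fraction(sum(x * x for x in outcomes), len(outcomes)) - mu ** 2

def sample_mean_moments(n):
    """Exact mean and variance of the sample mean over all equally likely samples of size n."""
    means = [Fraction(sum(s), n) for s in product(outcomes, repeat=n)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v

for n in (1, 2, 3, 4, 5):
    m, v = sample_mean_moments(n)
    # columns: n, mean of the sample means, their variance, sigma^2 / n
    print(n, m, v, sigma2 / n)
```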
Example
Test Your Understanding
[Textbook] A six-sided dice is relabelled so that there are three faces marked 1, two faces
marked 3 and one face marked 6. The dice is rolled 40 times and the mean of the 40 scores
is recorded.
(a) Find an approximate distribution for the mean of the scores.
(b) Use your approximation to estimate the probability that the mean is greater than 3.
Help: The probability distribution of the dice is the population distribution (as it’s what we use to create samples).
Help: Use your Chapter 1 knowledge to find 𝐸(𝑋) and 𝑉𝑎𝑟(𝑋) of this distribution.
Population distribution:

x            1     3     6
P(X = x)    1/2   1/3   1/6

∴ $\mu = E(X) = \left(1 \times \frac{1}{2}\right) + \left(3 \times \frac{1}{3}\right) + \left(6 \times \frac{1}{6}\right) = 2.5$

$\sigma^2 = \mathrm{Var}(X) = \left(1^2 \times \frac{1}{2}\right) + \left(3^2 \times \frac{1}{3}\right) + \left(6^2 \times \frac{1}{6}\right) - 2.5^2 = \frac{13}{4}$

With $n = 40$, $\frac{\sigma^2}{n} = \frac{13}{4} \div 40 = \frac{13}{160}$.

By the Central Limit Theorem, $\bar{X} \approx N\left(2.5, \frac{13}{160}\right)$
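The arithmetic for part (a) can be checked with exact fractions, and the same normal approximation can then be used for part (b), which the working above does not evaluate. This is a sketch using Python's standard library `statistics.NormalDist`; the variable names are chosen for illustration.

```python
from fractions import Fraction
from statistics import NormalDist

# Population distribution of the relabelled dice
dice = {1: Fraction(1, 2), 3: Fraction(1, 3), 6: Fraction(1, 6)}
n = 40

mu = sum(x * p for x, p in dice.items())                    # E(X)
sigma2 = sum(x * x * p for x, p in dice.items()) - mu ** 2  # Var(X) = E(X^2) - E(X)^2
var_mean = sigma2 / n                                       # variance of the sample mean

# NormalDist takes a standard deviation, not a variance
approx = NormalDist(float(mu), float(var_mean) ** 0.5)
p_b = 1 - approx.cdf(3)                                     # (b): P(sample mean > 3)

print(mu, sigma2, var_mean, round(p_b, 4))
```

Under this approximation the probability in (b) comes out at roughly 0.04.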
CLT with Poisson, Binomial and Geometric Distributions
[Textbook] A supermarket manager is trying to model the number of customers that visit
her store each day. She observes that, on average, 20 new customers enter the store every
minute.
(a) Calculate the probability that fewer than 15 customers arrive in a given minute.
(b) Find the probability that in one hour no more than 1150 customers arrive.
(c) Use the Central Limit Theorem to estimate the probability that in one hour no more
than 1150 customers arrive. Compare your answer to part b.
(a) Let $X$ denote the number of customers that arrive in a minute. Then $X \sim Po(20)$.
$P(X < 15) = P(X \le 14) = 0.1049$ (4dp)

(b) Let $T$ denote the number of customers that arrive in an hour.
$T \sim Po(60 \times 20) \rightarrow T \sim Po(1200)$
$P(T \le 1150) = 0.0758$ (4dp)

(c) We could consider each of the 60 minutes as a separate sample. Thus the observed
average number of customers per minute is $\frac{1150}{60} = 19.1666\ldots$
Since $\mathrm{Var}(X) = E(X) = \lambda$, by the CLT, $\bar{X} \approx N\left(20, \frac{20}{60}\right)$
$P(\bar{X} \le 19.1666\ldots) = 0.0745$ (4dp), which is close, so the approximation using the CLT is a good one.

Note: (c) is an unusual way of solving the problem. Using the Stats Year 2 approach, we could use
$T \sim Po(1200)$ and directly use a normal approximation $Y \sim N(1200, 1200)$, finding $P(Y < 1150.5)$.
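All three answers can be checked numerically. This sketch uses only the standard library; `poisson_cdf` is a helper written for illustration, and it sums the Poisson probabilities in log space because $e^{-1200}$ is too small to store as an ordinary floating-point number.

```python
from math import exp, lgamma, log
from statistics import NormalDist

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), with each pmf term computed via logs to avoid underflow."""
    # log P(X = i) = i*log(lam) - lam - log(i!), and lgamma(i + 1) = log(i!)
    return sum(exp(i * log(lam) - lam - lgamma(i + 1)) for i in range(k + 1))

p_a = poisson_cdf(14, 20)        # (a) P(X < 15) = P(X <= 14), X ~ Po(20)
p_b = poisson_cdf(1150, 1200)    # (b) P(T <= 1150), T ~ Po(1200)

# (c) CLT: the mean of 60 one-minute counts is approximately N(20, 20/60)
clt = NormalDist(20, (20 / 60) ** 0.5)
p_c = clt.cdf(1150 / 60)         # P(sample mean <= 19.1666...)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```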
Test Your Understanding
[Textbook] Billy is the captain of a football team. Each week he gets a team together by
calling his friends one by one and asking if they would like to play. The probability of each
friend agreeing to play is $\frac{2}{3}$. Once he has 10 other players he stops calling.
(a) Calculate the number of friends Billy expects to have to call to find 10 other players.
(b) Find the probability that Billy has to call exactly 12 friends.
In a season, Billy’s team plays 25 matches.
(c) Estimate the probability that the mean number of calls per match Billy had to make was
less than 15.5.
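(a) Let $X$ be the number of friends Billy calls. Then $X \sim \text{Negative B}\left(10, \frac{2}{3}\right)$, so
$E(X) = \frac{10}{2/3} = \frac{30}{2} = 15$

Recap: If $X \sim \text{Negative B}(r, p)$ then $E(X) = \frac{r}{p}$ and $\mathrm{Var}(X) = \frac{r(1-p)}{p^2}$

(b) $P(X = 12) = \binom{11}{9} \times \left(\frac{2}{3}\right)^{10} \times \left(\frac{1}{3}\right)^{2} = 0.1060$

(c) $E(X) = 15$ and $\mathrm{Var}(X) = \frac{10 \times \frac{1}{3}}{\left(\frac{2}{3}\right)^2} = 7.5$
For a sample of size 25, the sample mean $\bar{X}$ is approximately $N\left(15, \frac{7.5}{25}\right)$, or $N(15, 0.3)$, by the CLT.
$P(\bar{X} < 15.5) \approx 0.8193$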
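The same working can be reproduced in a few lines as a check. This is a sketch with variable names chosen for illustration; it uses the recap formulas for the negative binomial mean and variance and `statistics.NormalDist` for the CLT step.

```python
from fractions import Fraction
from math import comb
from statistics import NormalDist

r, p = 10, Fraction(2, 3)      # 10 successes needed, each call succeeds with probability 2/3
matches = 25                   # sample size for part (c)

mean = r / p                   # (a) E(X) = r / p
var = r * (1 - p) / p ** 2     # Var(X) = r(1 - p) / p^2

# (b) 12th call is the 10th success: 9 successes among the first 11 calls
p_12 = comb(11, 9) * p ** 10 * (1 - p) ** 2

# (c) CLT: mean number of calls over 25 matches is approximately N(15, 7.5/25)
approx = NormalDist(float(mean), float(var / matches) ** 0.5)
p_c = approx.cdf(15.5)

print(mean, var, round(float(p_12), 4), round(p_c, 4))
```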
Exercise 5B
Pearson Further Statistics 1
Pages 81-82