
Further Statistics Chapter 5

The document discusses the Central Limit Theorem (CLT), explaining how it applies to random samples and the distribution of sample means. It outlines key concepts such as the relationship between population distribution, sample size, and variance, and provides examples to illustrate the theorem's application in various scenarios. Additionally, it includes exercises and test questions to reinforce understanding of the CLT and its implications in statistics.

Uploaded by

electricskr2574

Further Stats 1 Chapter 5 ::

Central Limit Theorem


www.drfrostmaths.com
@DrFrostMaths

Last modified: 22nd July 2018
Register at: www.drfrostmaths.com
Everything is completely free.
Practise questions by chapter, including past paper questions and extension questions (e.g. MAT).
Teaching videos with topic tests to check understanding.
Teachers: you can create student accounts (or students can register themselves).
Recap of Adding Random Variables
Suppose that we had a fair three-sided spinner and spun it twice, with the two spins represented by random variables $X_1$ and $X_2$, both of which have the same discrete uniform distribution:

$x_1$      1     2     3
$p(x_1)$   1/3   1/3   1/3

$x_2$      1     2     3
$p(x_2)$   1/3   1/3   1/3

Then $Y = X_1 + X_2$ would represent the distribution of adding each possible outcome from $X_1$ with each possible outcome from $X_2$:

$y = x_1 + x_2$   $x_2 = 1$   $x_2 = 2$   $x_2 = 3$
$x_1 = 1$             2           3           4
$x_1 = 2$             3           4           5
$x_1 = 3$             4           5           6

Each combined outcome has a probability of $\frac{1}{3} \times \frac{1}{3} = \frac{1}{9}$, which gives:

$y$      2     3     4     5     6
$p(y)$   1/9   2/9   3/9   2/9   1/9

[Figure: bar chart of $p(y)$ against $y$ for $y = 2, \dots, 6$]
Already, the shape of the distribution is vaguely resembling a well-known distribution…
Adding Random Variables

Let’s now get a sample of 3 values, i.e. spin it 3 times: $Y = X_1 + X_2 + X_3$

$y$      3      4      5      6      7      8      9
$p(y)$   1/27   3/27   6/27   7/27   6/27   3/27   1/27

[Figure: bar chart of $p(y)$ against $y$ for $y = 3, \dots, 9$]
That’s looking pretty damn like a normal distribution now…

If we divide each of these combined outcomes by 3, then we’d have $\frac{X_1 + X_2 + X_3}{3} = \bar{X}$:

$\bar{x}$      1      4/3    5/3    2      7/3    8/3    3
$p(\bar{x})$   1/27   3/27   6/27   7/27   6/27   3/27   1/27

So it appears that the distribution of possible means $\bar{X}$ of the sample of 3 spins approximately forms a normal distribution, and becomes more normal as we increase the number of spins in our sample. This will always occur regardless of what the distribution of the original spinner was (whether discrete uniform or otherwise), provided that we’re spinning the same spinner!
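The two tables above can be reproduced by brute-force enumeration. The following is a minimal sketch rather than anything from the original slides; the function name `mean_distribution` and the dictionary `spinner` are illustrative choices. It lists every combination of spins, multiplies the probabilities, and groups the results by sample mean.

```python
from fractions import Fraction
from itertools import product


def mean_distribution(outcomes, n):
    """Exact distribution of the mean of n independent draws.

    `outcomes` maps each possible value to its probability (a Fraction).
    This helper is hypothetical; it is not part of any library.
    """
    dist = {}
    for combo in product(outcomes.items(), repeat=n):
        total = sum(value for value, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p  # independent draws, so probabilities multiply
        mean = Fraction(total, n)
        dist[mean] = dist.get(mean, Fraction(0)) + prob
    return dict(sorted(dist.items()))


# Fair three-sided spinner: each of 1, 2, 3 has probability 1/3.
spinner = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

for mean, prob in mean_distribution(spinner, 3).items():
    # Fractions print in lowest terms, so 6/27 appears as 2/9 and 3/27 as 1/9.
    print(mean, prob)
```

Changing the second argument to a larger number of spins shows the distribution of the mean getting closer to a bell shape, although the enumeration grows as 3 to the power of the sample size, so this approach only suits small samples.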
Central Limit Theorem
The central limit theorem says that if $X_1, X_2, \dots, X_n$ is a random sample of size $n$ from a population with mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ is approximately $\sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Important key understanding points:

• $X$ represents the population distribution, i.e. a single choice of something from the population.
• We are generating a sample of size $n$, so we use a distribution $X_i$ to represent the choice of each thing from the population for the sample, each $X_i$ having the same distribution as the population. Since each $X_i$ is independent, it means we could technically sample the same value twice (as could happen with the spinner), but given a large population this would be unlikely to occur in practice.
• Don’t get confused between the distribution $X$ (i.e. a single choice from the population) and $\bar{X}$ (a distribution over the different sample means we could get as we take different samples).
• The variance of $\bar{X}$ is $\frac{\sigma^2}{n}$. This means as we increase the sample size, the variance of the sample means decreases. This makes sense: with a larger sample size, we expect the sample means to be more consistent and be closer to the true population mean $\mu$.
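The mean and variance of $\bar{X}$ quoted in the theorem can be justified in a couple of lines. This short derivation is an addition, not part of the original slides. It assumes only that the $X_i$ are independent with common mean $\mu$ and variance $\sigma^2$, and it says nothing about the shape of the distribution, which is what the theorem itself supplies.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\cdot n\mu = \mu, \\[4pt]
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```

The second line uses independence: the variance of a sum of independent random variables is the sum of their variances.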

Population distribution:

$x$      1     2     3
$p(x)$   1/3   1/3   1/3

(Using techniques from Chp1) $\mu = 2$, $\sigma^2 = \frac{2}{3}$

Our $\bar{X}$ distribution for the mean of 3 spins…

$\bar{x}$      1      4/3    5/3    2      7/3    8/3    3
$p(\bar{x})$   1/27   3/27   6/27   7/27   6/27   3/27   1/27

Mean of the distribution above is still 2. Variance of the distribution above is $\frac{2}{9}$; this is indeed $\frac{\sigma^2}{n}$, since $\frac{2/3}{3} = \frac{2}{9}$!
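As a check on the figures above, the mean and variance of both tables can be computed exactly. This is a minimal sketch using the Chapter 1 formulas $E(X) = \sum x\,p(x)$ and $Var(X) = \sum x^2 p(x) - \mu^2$; the helper name `mean_and_variance` is an illustrative choice, not something from the slides.

```python
from fractions import Fraction


def mean_and_variance(dist):
    """Return (E(X), Var(X)) for a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mu ** 2
    return mu, var


# Population: fair three-sided spinner.
population = {Fraction(x): Fraction(1, 3) for x in (1, 2, 3)}

# Sample mean of 3 spins: values 3/3, 4/3, ..., 9/3 with the weights from the table.
weights = (1, 3, 6, 7, 6, 3, 1)
sample_mean = {Fraction(k, 3): Fraction(w, 27) for k, w in zip(range(3, 10), weights)}

mu, sigma2 = mean_and_variance(population)
mu_bar, var_bar = mean_and_variance(sample_mean)

print(mu, sigma2)        # population mean and variance
print(mu_bar, var_bar)   # mean and variance of the sample mean
print(var_bar == sigma2 / 3)
```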
When does CLT apply and when doesn’t it?

If the original population distribution is already normally distributed (e.g. heights of people), then the sample mean $\bar{X}$ will always be normally distributed, even if the sample size is small, i.e. $\bar{X}$ is distributed as $N\left(\mu, \frac{\sigma^2}{n}\right)$ and the CLT need not be used.

The Central Limit Theorem allows us to say that $\bar{X}$ is approximately normally distributed, even if the original population distribution is not normally distributed. However, we require the sample size to be large for the normal distribution to be a good approximation of $\bar{X}$. For example, if $n = 1$, then $\bar{X}$ will have the same distribution as the population!
Example

[Textbook] A sample of size 9 is taken from a population with distribution $N(10, 2^2)$. Find the probability the sample mean $\bar{X}$ is more than 11.

Population is normally distributed, so $\bar{X}$ is normally distributed despite the small sample size.

$X \sim N(10, 2^2) \;\rightarrow\; \bar{X} \sim N\left(10, \frac{2^2}{9}\right) = N\left(10, \left(\frac{2}{3}\right)^2\right)$

Recall that we typically write a normal distribution in the form $N(\dots, (\dots)^2)$ so that the standard deviation is clear.

$P(\bar{X} > 11) = 1 - P(\bar{X} < 11) = 0.0668$ (4dp). Use your calculator.
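In place of a calculator, the same probability can be found with Python’s standard library. This is a small sketch, assuming `statistics.NormalDist` (available from Python 3.8); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 10, 2, 9

# The sample mean has the same mean as the population
# and standard deviation sigma / sqrt(n) = 2/3.
sample_mean = NormalDist(mu, sigma / sqrt(n))

p = 1 - sample_mean.cdf(11)
print(round(p, 4))  # agrees with the 0.0668 quoted above
```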
Test Your Understanding
[Textbook] A six-sided dice is relabelled so that there are three faces marked 1, two faces marked 3 and one face marked 6. The dice is rolled 40 times and the mean of the 40 scores is recorded.
(a) Find an approximate distribution for the mean of the scores.
(b) Use your approximation to estimate the probability that the mean is greater than 3.
Help: The probability distribution of the dice is the population distribution (as it’s what we use to create samples).
Help: Use your Chapter 1 knowledge to find $E(X)$ and $Var(X)$ of this distribution.

Population distribution:

$x$          1     3     6
$P(X = x)$   1/2   1/3   1/6

$\therefore \mu = E(X) = \left(1 \times \frac{1}{2}\right) + \left(3 \times \frac{1}{3}\right) + \left(6 \times \frac{1}{6}\right) = 2.5$

$\sigma^2 = Var(X) = \left(1^2 \times \frac{1}{2}\right) + \left(3^2 \times \frac{1}{3}\right) + \left(6^2 \times \frac{1}{6}\right) - 2.5^2 = \frac{13}{4}$

By the Central Limit Theorem, $\bar{X} \approx N\left(2.5, \left(\sqrt{\frac{13}{160}}\right)^2\right)$, since $\frac{13/4}{40} = \frac{13}{160}$.

$P(\bar{X} > 3) = 0.0397$ (4dp) (typo in textbook)
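The working for the relabelled dice can be checked numerically. The sketch below is an addition and assumes only the population table given in the question; it uses exact fractions for $E(X)$ and $Var(X)$ and `statistics.NormalDist` for the normal approximation. The variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

# Population distribution of the relabelled dice: {score: probability}.
dice = {1: Fraction(1, 2), 3: Fraction(1, 3), 6: Fraction(1, 6)}
n = 40  # number of rolls

mu = sum(x * p for x, p in dice.items())
var = sum(x * x * p for x, p in dice.items()) - mu ** 2

# CLT: the mean of 40 rolls is approximately N(mu, var / n).
approx = NormalDist(float(mu), sqrt(float(var / n)))
p = 1 - approx.cdf(3)

print(mu, var, var / n)  # exact fractions for the mean, variance and variance of the mean
print(round(p, 4))       # agrees with the 0.0397 quoted above
```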
Exercise 5A
Pearson Further Statistics 1
Pages 78-80

CLT with Poisson, Binomial and Geometric Distribs
[Textbook] A supermarket manager is trying to model the number of customers that visit her store each day. She observes that, on average, 20 new customers enter the store every minute.
(a) Calculate the probability that fewer than 15 customers arrive in a given minute.
(b) Find the probability that in one hour no more than 1150 customers arrive.
(c) Use the Central Limit Theorem to estimate the probability that in one hour no more than 1150 customers arrive. Compare your answer to part (b).

a Let $X$ denote the number of customers that arrive in a minute. Then $X \sim Po(20)$.
$P(X < 15) = 0.1049$ (4dp)

b Let $T$ denote the number of customers that arrive in an hour.
$T \sim Po(60 \times 20) \rightarrow T \sim Po(1200)$
$P(T \le 1150) = 0.0758$ (4dp)

c We could consider each of the 60 minutes as a separate sample. Thus the observed average customers per minute is $\frac{1150}{60} = 19.1666\dots$
Since $Var(X) = E(X) = \lambda$, by CLT, $\bar{X} \approx N\left(20, \left(\sqrt{\frac{20}{60}}\right)^2\right)$
$P\left(\bar{X} \le \frac{1150}{60}\right) = 0.0745$ (4dp), which is close, so the approximation using CLT is a good one.

Note: (c) is an unusual way of solving the problem. Using the Stats Year 2 approach, we could use $T \sim Po(1200)$ and directly use a normal approximation $Y \sim N(1200, (\sqrt{1200})^2)$, finding $P(Y < 1150.5)$.
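All three parts of the supermarket question can be reproduced with the standard library. This is a sketch under stated assumptions: `poisson_cdf` is a hypothetical helper written for this example (the standard library has no Poisson distribution), and it sums the probability mass function in log space because $1200^k$ and $k!$ overflow ordinary floats for large $k$.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam); each pmf term is exp(k*ln(lam) - lam - ln(k!))."""
    return sum(exp(i * log(lam) - lam - lgamma(i + 1)) for i in range(k + 1))


# (a) Fewer than 15 customers in a minute means at most 14.
p_a = poisson_cdf(14, 20)

# (b) Exact probability of no more than 1150 customers in an hour.
p_b = poisson_cdf(1150, 1200)

# (c) CLT: the mean of 60 one-minute counts is approximately N(20, 20/60).
mean_per_minute = NormalDist(20, sqrt(20 / 60))
p_c = mean_per_minute.cdf(1150 / 60)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Parts (b) and (c) should differ by roughly 0.001, which matches the comparison made in the solution.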
Test Your Understanding
[Textbook] Billy is the captain of a football team. Each week he gets a team together by calling his friends one by one and asking if they would like to play. The probability of each friend agreeing to play is $\frac{2}{3}$. Once he has 10 other players he stops calling.
(a) Calculate the number of friends Billy expects to have to call to find 10 other players.
(b) Find the probability that Billy has to call exactly 12 friends.
In a season, Billy’s team plays 25 matches.
(c) Estimate the probability that the mean number of calls per match Billy had to make was less than 15.5.

Recap: If $X \sim \text{Negative B}(r, p)$ then $E(X) = \frac{r}{p}$ and $Var(X) = \frac{r(1-p)}{p^2}$.

a Let $X$ be the number of friends Billy calls. Then $X \sim \text{Negative B}\left(10, \frac{2}{3}\right)$, so
$E(X) = \frac{10}{2/3} = \frac{30}{2} = 15$

b $P(X = 12) = \binom{11}{9} \times \left(\frac{2}{3}\right)^{10} \times \left(\frac{1}{3}\right)^{2} = 0.1060$

c $E(X) = 15$ and $Var(X) = \frac{10 \times \frac{1}{3}}{\left(\frac{2}{3}\right)^2} = 7.5$
For a sample of size 25, the sample mean $\bar{X}$ is approximately $\sim N\left(15, \frac{7.5}{25}\right)$ or $N(15, 0.3)$ by the CLT.
$P(\bar{X} < 15.5) \approx 0.8193$
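The three answers for Billy’s team can be verified directly from the negative binomial formulas in the recap. This is a minimal sketch, not the textbook’s method of calculation; the variable names are illustrative, and the probability in (b) is the chance that the 10th success falls on the 12th call.

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist

r = 10               # players needed
p = Fraction(2, 3)   # probability a friend agrees to play
matches = 25         # sample size for part (c)

# (a) and the variance needed for (c), from E(X) = r/p and Var(X) = r(1-p)/p^2.
expected_calls = r / p
variance = r * (1 - p) / p ** 2

# (b) 9 successes somewhere in the first 11 calls, then a success on call 12.
p_12 = comb(11, 9) * p ** 10 * (1 - p) ** 2

# (c) CLT: the mean over 25 matches is approximately N(15, 7.5/25).
approx = NormalDist(float(expected_calls), sqrt(float(variance) / matches))
p_c = approx.cdf(15.5)

print(expected_calls, variance)
print(round(float(p_12), 4), round(p_c, 4))
```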
Exercise 5B
Pearson Further Statistics 1
Pages 81-82
