
STATISTICAL FOUNDATIONS

SOST70151 – LECTURE 5

Prof. Natalie Shlomo and Dr. Kathrin Morosow


Overview

The Normal Distribution and the Central Limit Theorem

1. Normal Distribution
2. Central Limit Theorem
Normal Distribution

General Properties:

• Continuous
• Symmetric
• Uni-modal
• Takes all values either side of 0…

[Figure: plot of dgamma(c(1:10000)/10000, 2, rate = 8), a gamma density, i.e. NOT a normal distribution.]


Normal Distribution

General Properties:

• Continuous
• Symmetric
• Uni-modal
• Takes all values either side of 0…

The normal distribution has two parameters that completely determine the pdf. We say that

X ~ N(μ, σ²)

with expectation E(X) = μ and variance V(X) = σ².

Normal Distribution

General Properties: the two parameters completely determine the pdf; X ~ N(μ, σ²) with E(X) = μ and V(X) = σ².

[Figure: normal pdfs f(x) over x from −2 to 4 for μ=0, σ²=1; μ=0, σ²=4; and μ=2, σ²=1.]
Normal Distribution

Error:
When there is no reason for an error to go one way rather than the other, the distribution of deviations is quite often bell-shaped, with (−) errors and (+) errors equally likely.

Normal Distribution

The curve described by the pdf

f(x) = (1 / √(2πσ²)) · exp{ −(x − μ)² / (2σ²) }

• (x − μ)² is the size of the error
• the size of the error is weighted by the variance σ²
• the further x is from the mean μ, the lower the probability density (function)
Normal Distribution

f(x) = (1 / √(2πσ²)) · exp{ −½ (x − μ)² / σ² }

Evaluating the exponential factor for μ = 0, σ² = 1:

x    x−μ   (x−μ)²   (x−μ)²/σ²   −½(x−μ)²/σ²   exp(−½(x−μ)²/σ²)
−2   −2    4        4           −2            0.1353
−1   −1    1        1           −0.5          0.6065
 0    0    0        0            0            1.0000
 1    1    1        1           −0.5          0.6065
 2    2    4        4           −2            0.1353
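The table above can be reproduced in a few lines. A minimal sketch in Python (the slides themselves use R, so the language choice here is an assumption of convenience):

```python
import math

# Recompute the table's last column: the unnormalized normal kernel
# exp(-1/2 * (x - mu)^2 / sigma^2) with mu = 0 and sigma^2 = 1.
mu, sigma2 = 0.0, 1.0
kernel = {x: math.exp(-0.5 * (x - mu) ** 2 / sigma2) for x in (-2, -1, 0, 1, 2)}
for x, v in sorted(kernel.items()):
    print(x, round(v, 4))
```

Note the symmetry: x = −1 and x = 1 give the same value, because only (x − μ)² enters the kernel.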
Normal Distribution

Probabilities
Finding areas under the curve: P(X < x) = ?

Symmetric
⇒ P(X < μ) = P(X > μ) = 0.5
Normal Distribution

Probabilities
Finding areas under the curve: the total area under the pdf is 1 (≈ 100%).
Normal Distribution

Probabilities
Using known probabilities and symmetry we can use standard probability calculus:

P(X < −1.96) = 0.025

P(X > 1.96) = 1 − P(X < 1.96)
            = 1 − [1 − P(X < −1.96)]
            = 0.025
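The tail probabilities quoted above can be checked numerically. A minimal sketch, using the standard identity Φ(z) = ½(1 + erf(z/√2)) for the standard normal CDF:

```python
import math

# Standard normal CDF via the error function.
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_lower = phi(-1.96)        # P(X < -1.96)
p_upper = 1.0 - phi(1.96)   # P(X > 1.96) = 1 - P(X < 1.96)
print(round(p_lower, 3), round(p_upper, 3))  # both 0.025, by symmetry
```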
Normal Distribution

Standard normal distribution

Z ~ N(0, 1)

P(0 < Z < z) = P(Z < z) − P(Z < 0) = P(Z < z) − ½

Example:
P(0 < Z < 1) = P(Z < 1) − P(Z < 0) = 0.8413 − 0.5 = 0.3413
Normal Distribution

Areas under the normal curve

68.2% of the observations lie within one SD either side of the mean

95.4% lie within two SDs either side of the mean

[Figure: areas by 1-SD band, from left tail to right tail: 2.3%, 13.6%, 34.1%, 34.1%, 13.6%, 2.3%]

Questions we can ask, e.g.: what % of the total area will lie between the mean and 2 SD to the right of the mean?
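The band areas in the figure, and the answer to the question above (34.1% + 13.6% = 47.7%), follow directly from the standard normal CDF. A minimal check in Python:

```python
import math

# Areas under the standard normal curve, from Phi(z) = 0.5*(1 + erf(z/sqrt(2))).
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

within_1sd = phi(1) - phi(-1)    # about 68.2%
within_2sd = phi(2) - phi(-2)    # about 95.4%
mean_to_2sd = phi(2) - phi(0)    # mean to 2 SD on the right: about 47.7%
print(round(within_1sd, 3), round(within_2sd, 3), round(mean_to_2sd, 3))
```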
Normal Distribution

Properties

In general, it is not true that a linear transformation of a variable follows the same type of distribution. For the normal distribution, however,

X ~ N(μ, σ²)

means that

a + bX ~ N(a + bμ, b²σ²)

We already knew that E(Y) = E(a + bX) = a + bE(X) and V(Y) = V(a + bX) = b²V(X); the new result is that a + bX is itself normally distributed.
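A simulation makes the moment results above concrete. This is a sketch only; the particular values μ = 2, σ = 3, a = 1, b = −2 are illustrative assumptions, not from the slides:

```python
import random
import statistics

# If X ~ N(mu, sigma^2), then Y = a + b*X should have mean a + b*mu
# and variance b^2 * sigma^2 (illustrative parameter values assumed).
random.seed(42)
mu, sigma, a, b = 2.0, 3.0, 1.0, -2.0
y = [a + b * random.gauss(mu, sigma) for _ in range(200_000)]
print(round(statistics.mean(y), 2))       # close to a + b*mu = -3
print(round(statistics.variance(y), 1))   # close to b^2 * sigma^2 = 36
```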


Normal Distribution

Properties

A particularly important transformation is to the standard normal distribution: for

X ~ N(μ, σ²)

the standard normal is obtained as

Z = (X − μ) / σ ~ N(0, 1)

When using tables:

P(X < x) = P( (X − μ)/σ < (x − μ)/σ ) = P( Z < (x − μ)/σ )

Some examples…
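One worked example of the standardization recipe above. The numbers (X ~ N(100, 16), so μ = 100, σ = 4) are an assumed illustration, not taken from the slides:

```python
import math

# P(X < 108) for X ~ N(100, 16): standardize, then use the N(0,1) CDF.
# Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, x = 100.0, 4.0, 108.0
z = (x - mu) / sigma                # (108 - 100) / 4 = 2
print(z, round(phi(z), 4))          # P(Z < 2) ≈ 0.9772
```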
Application

• Researchers gather data from the population and calculate a sample mean with those
data; then they use the result to draw conclusions

• Why should we trust the mean? For one thing, if the researchers had interviewed
different people from the same population, they would have obtained a different result

• This points to the fact that the sample mean, viewed as an instrument before the data are available, has to have some desirable properties. So which properties are these?
Application

• Consider X, a random variable describing, for instance, income in the UK; and let FX (x),
fX (x) be the distribution and density of X respectively. Suppose you are going to
interview i = 1, 2, 3, . . . ,n people in this population, chosen at random, so that
responses are independent from each other.

• Before we interview people, their responses are unknown (uncertain). So the response
of person i is a random variable; say xi

• Which values might person i return? We don’t know for sure, but we know that in the
population, income is distributed FX (x).

• Since i is a member of the population, his/her response xi ought to mimic the behaviour
of X in the population, and the latter is summarised by FX (x)
Application

• We have X with distribution FX(x); let's say that E(X) = μ and V(X) = σ². We will have a sample from that population, X1, X2, . . . , Xn, and with that future data we will compute

X̄ = (Σi xi) / n

• Let's start by noticing that the sample mean is a sum of random variables (scaled by a constant, 1/n), so we can use the properties of expectations pertaining to sums of random variables to work out the expected value E(X̄)

• E(X̄) will tell us what kind of information we should expect to receive from X̄ (once we have data)
Application

Expected Value of the Sample Mean

Given the previous setting,

E(X̄) = (1/n) E(x1 + x2 + ⋯ + xn) = (1/n) [E(x1) + E(x2) + ⋯ + E(xn)]

Now xi and X have the same distribution; this means

E(xi) = E(X) = μ

It thus follows,

E(X̄) = (1/n) [E(x1) + E(x2) + ⋯ + E(xn)] = (1/n) nμ = μ

We say that the sample mean, X̄, is unbiased for the population mean if E(X̄) = μ, which we have just shown.
Application

Expected Value of the Sample Mean

• If we could draw a large number of samples b = 1, 2, 3, . . . , B of fixed size n from the population using the same design, calculate the sample mean X̄b for each sample, and then average all these sample averages,

(1/B) Σb X̄b

this average would equal E(X) (approximately, for large B)

• so we expect that the sample mean will be a good proxy for E(X)
• Note that we have not bothered here about the actual shape of FX(x): whether this was normal, gamma, Poisson or something else
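The repeated-sampling argument above can be sketched as a simulation. The skewed exponential population with mean μ = 2 is an assumed illustration, chosen precisely because it is not normal:

```python
import random

# Draw B samples of size n from an exponential population with mean mu,
# compute each sample mean, then average the sample means.
random.seed(1)
mu = 2.0            # population mean of Exponential(rate = 1/mu)
n, B = 30, 10_000
sample_means = [
    sum(random.expovariate(1.0 / mu) for _ in range(n)) / n
    for _ in range(B)
]
grand_mean = sum(sample_means) / B
print(round(grand_mean, 2))  # close to mu = 2, despite the non-normal population
```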
Application

Variance of the Sample Mean

• xi and X have the same distribution; this means V(xi) = V(X) = σ²

• Var(X̄) = (1/n²) Var(x1 + x2 + ⋯ + xn) = (1/n²) nσ² = σ²/n

• So Var(X̄) ≠ Var(X), and indeed it is much smaller

• The first conclusion to draw is that the distributions of X and X̄ cannot be the same (the distribution of X̄ is narrower), and as n increases that distribution becomes more and more informative, i.e. concentrated around the mean
Application

Distribution of the Sample Mean

So we have two results:

E(X̄) = E(X) = μ

Var(X̄) = Var(X)/n = σ²/n

• the larger our samples, the closer X̄ will be to μ
• So we know that FX(x) ≠ FX̄(x̄)
• What is the shape of FX̄(x̄)?
Central Limit Theorem

• Regardless of what the distribution of X is, for large n, if the xi are independent and E(xi) = μ, V(xi) = σ² are finite, then:

√n (X̄ − μ) ~ N(0, σ²) for large n (approximately)

• The factor √n stops the distribution of X̄ from concentrating around μ:

Var(√n (X̄ − μ)) = Var(√n X̄) = n Var(X̄) = n (σ²/n) = σ²

• Standardized version of the CLT:

√n (X̄ − μ) / σ ~ N(0, 1)

since Var(X/σ) = (1/σ²) σ² = 1
Central Limit Theorem

Example:

Research Question: What is the average annual income of full-time students in the UK?
Central Limit Theorem

Example: Sampling from the population

Thought experiment:
Imagine all 2.3M students in the UK (statistical population)
Central Limit Theorem

Example: Sampling from the population

Thought experiment:
But what if we don't know their income?
Central Limit Theorem

Example: Sampling from the population

We draw a sample of size N=30 students from this population (2.3M) and compute their mean income
Central Limit Theorem

Example: Sampling from the population

… and then we draw another sample of size N=30 and compute its mean, and another… and another…
Central Limit Theorem

Example: Sampling from the population

Sample size is N=30 students


Central Limit Theorem

Drawing samples 10 thousand times
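The 10,000-draw experiment can be sketched directly. The exponential "income" population with mean £20,000 is purely an assumed illustration, not the slides' data; the point is that the spread of the 10,000 sample means matches σ/√n:

```python
import random
import statistics

# Draw 10,000 samples of size N = 30 from a skewed income-like population
# and measure the spread of the sample means (the standard error).
random.seed(5)
mu = 20_000.0            # assumed population mean income
sigma = 20_000.0         # exponential: sd equals the mean
n, B = 30, 10_000
means = [
    sum(random.expovariate(1.0 / mu) for _ in range(n)) / n
    for _ in range(B)
]
sd_of_means = statistics.stdev(means)
print(round(sd_of_means))  # close to sigma / sqrt(n) ≈ 3651.5
```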


Central Limit Theorem
Central Limit Theorem - Summary

1. The sample means will cluster around the population mean

2. The sample means will follow a bell curve (normal distribution)

3. This holds even when the distribution of the characteristic in the population is unknown

4. The larger the sample size, the more precise the mean
 → the standard error gets smaller
Reading

The Normal Distribution and Central Limit Theorem

Crawshaw & Chambers (2014): Ch. 7, 8 & 9

Agresti, A. (2018): Ch. 4.3 – 4.7

Gill (2006): Ch. 8.3.9
