Statistical Foundations: SOST70151 - LECTURE 5
1. Normal Distribution
2. Central Limit Theorem
Normal Distribution
General Properties:
• Continuous
• Symmetric
[Figure: plot of dgamma(c(1:10000)/10000, 2, rate = 8), a gamma density: continuous but not symmetric, i.e. NOT normal]
General Properties:
[Figure: two normal densities f(x) over x, one with μ = 0, σ² = 1 and one with μ = 2, σ² = 1]
we say that X ~ N(μ, σ²)
expectation E(X) = μ
variance V(X) = σ²
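The density shown above can be evaluated directly. A minimal Python sketch (the helper name normal_pdf is my own) that checks the symmetry and location properties numerically:

```python
import math

def normal_pdf(x, mu=0.0, var=1.0):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Symmetry about the mean: f(mu - d) == f(mu + d)
print(normal_pdf(-1.0), normal_pdf(1.0))  # equal by symmetry

# Shifting the mean moves the curve without changing its shape:
# the peak height at x = mu is the same for mu = 0 and mu = 2
print(normal_pdf(0.0, mu=0), normal_pdf(2.0, mu=2))
```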
Normal Distribution
Error:
when there is no reason for error going one way or the other, the distribution of deviations is quite often:

f(x) = (1 / √(2πσ²)) exp{ -(1/2)(x - μ)²/σ² }

the size of the error, (x - μ)², is scaled by the variance σ²
[Figure: the density plotted over x from -2 to 2]
(with μ = 0, σ² = 1)

 x   x-μ   (x-μ)²   (x-μ)²/σ²   -.5(x-μ)²/σ²   exp(-.5(x-μ)²/σ²)
-2   -2      4         4           -2              0.1353
-1   -1      1         1           -0.5            0.6065
 0    0      0         0            0              1.0000
 1    1      1         1           -0.5            0.6065
 2    2      4         4           -2              0.1353
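The table's columns can be reproduced step by step; a small Python sketch (assuming μ = 0, σ² = 1, as the table's values imply):

```python
import math

mu, var = 0.0, 1.0
print(f"{'x':>4} {'x-mu':>5} {'(x-mu)^2':>9} {'-.5(x-mu)^2/var':>16} {'exp(...)':>9}")
for x in [-2, -1, 0, 1, 2]:
    d = x - mu                  # deviation from the mean
    z = -0.5 * d ** 2 / var     # exponent of the normal kernel
    print(f"{x:>4} {d:>5.0f} {d**2:>9.0f} {z:>16.2f} {math.exp(z):>9.4f}")
```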
Normal Distribution
Probabilities
finding areas under the curve: P(X < x) = ?
Symmetric: P(X < μ) = P(X > μ) = 0.5
[Figure: shaded area under the curve to the left of μ]
Normal Distribution
Probabilities
finding areas under the curve: P(X < x) = ?
[Figure: shaded area under essentially the whole curve, ≈ 100%]
Normal Distribution
Probabilities
using known probabilities and symmetry we can use standard probability calculus
[Figure: shaded area between μ and x]
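As a sketch of that probability calculus in Python, using the standard relation between the normal CDF and the error function (the helper name normal_cdf is my own):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Symmetry: P(X < mu) = P(X > mu) = 0.5
print(normal_cdf(0.0))  # 0.5

# Standard probability calculus: P(a < X < b) = P(X < b) - P(X < a)
print(normal_cdf(1.0) - normal_cdf(-1.0))  # area within one SD, approx 0.6827
```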
Normal Distribution
[Figure: standard normal density, μ = 0, σ² = 1]
Normal Distribution
[Figure: areas under the normal curve by SD band: 2.3% | 13.6% | 34.1% | 34.1% | 13.6% | 2.3%]
What % of the total area will lie between the mean and 2 SD to the right of the mean?
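A quick numerical check of the question above, using the standard-normal CDF via math.erf (the helper name Phi is my own):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Area between the mean (z = 0) and 2 SD to the right (z = 2)
area = Phi(2.0) - Phi(0.0)
print(round(100 * area, 1), "%")  # 47.7 %
```

The answer matches the figure's labels: 34.1% + 13.6% = 47.7%.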
Normal Distribution
Properties
In general, it is not true that a linear transformation of a variable has the same type of distribution.
For the normal distribution, however,
X ~ N(μ, σ²) means that a + bX ~ N(a + bμ, b²σ²)
(we already knew that V(a + bX) = b²V(X))
Properties
X ~ N(μ, σ²)
Some examples...
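As one such example, a simulation sketch of the linear-transformation property (the constants a, b and the sample size are arbitrary choices of mine):

```python
import random
import statistics

random.seed(1)
mu, sigma = 2.0, 1.0   # X ~ N(2, 1)
a, b = 3.0, -0.5       # linear transformation Y = a + bX

# Draw from X and transform
xs = [random.gauss(mu, sigma) for _ in range(100_000)]
ys = [a + b * x for x in xs]

# Y should behave like N(a + b*mu, b^2 * sigma^2) = N(2, 0.25)
print(statistics.mean(ys))      # close to a + b*mu = 2.0
print(statistics.variance(ys))  # close to b^2 * sigma^2 = 0.25
```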
Application
• Researchers gather data from the population and calculate a sample mean with those
data; then they use the result to draw conclusions
• Why should we trust the mean? For one thing, if the researchers had interviewed
different people from the same population, they would have obtained a different result
• This points to the fact that the sample mean, as an instrument, has to have some desirable properties before data are available. So what are these properties?
Application
• Consider X, a random variable describing, for instance, income in the UK; and let FX (x),
fX (x) be the distribution and density of X respectively. Suppose you are going to
interview i = 1, 2, 3, . . . ,n people in this population, chosen at random, so that
responses are independent from each other.
• Before we interview people, their responses are unknown (uncertain). So the response
of person i is a random variable; say xi
• Which values might person i return? We don’t know for sure, but we know that in the
population, income is distributed FX (x).
• Since i is a member of the population, his/her response xi ought to mimic the behaviour
of X in the population, and the latter is summarised by FX (x)
Application
• We have X with distribution FX(x); let's say that E(X) = μ and V(X) = σ². We will have a sample from that population, X1, X2, . . . , Xn, and with that future data we will compute

  X̄ = (Σi Xi) / n

• Let's start by noticing that the sample mean is a sum of random variables (scaled by a constant, 1/n), so we can use the properties of expectations pertaining to sums of random variables to work out the expected value of X̄: E(X̄)
• E(X̄) will tell us what kind of information we should expect to receive from X̄ (once we have data)
Application
We say that the sample mean, X̄, is unbiased for the population mean if E(X̄) = μ
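The unbiasedness claim follows directly from linearity of expectation; a one-line derivation in the slides' notation:

```latex
\mathbb{E}(\bar{X})
  = \mathbb{E}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}(X_i)
  = \frac{1}{n}\, n\mu
  = \mu
```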
Application
• If we could draw a large number of samples b = 1, 2, 3, . . . , B of fixed size n from the population using the same design, calculate the sample mean for each sample, X̄b, b = 1, 2, 3, . . . , B, and then average all these sample averages:

  (1/B) Σb X̄b, this will equal E(X)

• we expect that the sample mean will be a good proxy for E(X)
• Note that we have not bothered here about the actual shape of FX(x): whether this was normal, gamma, Poisson or something else
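A simulation sketch of this thought experiment in Python; the exponential population is my own choice, precisely because it is not normal:

```python
import random
import statistics

random.seed(2)
n, B = 50, 2000   # sample size n, number of repeated samples B
lam = 1.0         # X ~ Exponential(1), so E(X) = 1 -- deliberately non-normal

# Draw B samples of size n and compute the sample mean of each
xbars = [statistics.mean(random.expovariate(lam) for _ in range(n))
         for _ in range(B)]

# The average of the sample averages sits close to E(X) = 1
print(statistics.mean(xbars))
```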
Application
• Var(X̄) = (1/n²) Var(x1 + x2 + ⋯ + xn) = (1/n²) n σ² = σ²/n
• The first conclusion to draw is then that the distributions of X and X̄ cannot be the same (the distribution of X̄ is narrower), and as n increases that distribution becomes more and more informative, i.e. more concentrated around the mean
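A sketch checking the σ²/n scaling by simulation (the N(0, 4) population and the sample sizes are my own choices):

```python
import random
import statistics

random.seed(3)
sigma2 = 4.0   # population variance: X ~ N(0, 4), i.e. SD = 2
B = 4000       # number of repeated samples per sample size

for n in (10, 40, 160):
    xbars = [statistics.mean(random.gauss(0, 2) for _ in range(n))
             for _ in range(B)]
    # The variance of the sample means should track sigma^2 / n,
    # shrinking as n grows
    print(n, round(statistics.variance(xbars), 3), sigma2 / n)
```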
Application
• Regardless of what the distribution of X is, for large n, if the xi are independent and E(xi) = μ, V(xi) = σ² are finite, then:

  √n (X̄ − μ) ~ N(0, σ²) (approximately, for large n)
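A simulation sketch of the CLT statement: standardised means of a skewed (exponential) population should behave like a standard normal; the sample sizes here are my own choices:

```python
import math
import random
import statistics

random.seed(4)
n, B = 200, 3000
mu, sigma = 1.0, 1.0   # Exponential(1): mean 1, variance 1 -- heavily skewed

# Standardise each sample mean: sqrt(n) * (Xbar - mu) / sigma
zs = [math.sqrt(n) * (statistics.mean(random.expovariate(1.0) for _ in range(n)) - mu) / sigma
      for _ in range(B)]

# If the CLT approximation holds, about 95% of these fall within +/- 1.96
inside = sum(1 for z in zs if abs(z) <= 1.96) / B
print(inside)
```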
Example:
Thought experiment:
Imagine all 2.3M students in the UK (statistical population)
Central Limit Theorem
Thought experiment:
But what if we don’t know their income?
Central Limit Theorem
4. The larger the sample size, the more precise the sample mean: the standard error gets smaller
Reading