Parameter Estimation
Nathaniel E. Helwig
Table of Contents
2. Sampling Distribution
4. Quality of Estimators
5. Estimation Frameworks
The functions F(·) and f(·) are typically assumed to depend on a finite number of parameters, where a parameter θ = t(F) is some function of the probability distribution.
Sampling Distribution
Assume that xi ∼ F (iid) for i = 1, . . . , n, where “iid” denotes that the xi are independent and identically distributed observations from the distribution F.
• x = (x1, . . . , xn)⊤ denotes the sample of data as an n × 1 vector
Given a sample of data x1, . . . , xn where xi ∼ F (iid), an estimate of a parameter θ = t(F) is some function of the sample θ̂ = g(x) that is meant to approximate θ.
Quality of Estimators
Overview
Like statistics, not all estimators are created equal: some estimators produce “better” estimates of the intended population parameters than others.
Bias of an Estimator
The bias of an estimator refers to the difference between the expected
value of the estimate θ̂ = g(x) and the parameter θ = t(F ), i.e.,
Bias(θ̂) = E(θ̂) − θ
Given a sample of data x1, . . . , xn where xi ∼ F (iid) and F has mean µ = E(X), the sample mean x̄ = (1/n) Σᵢ₌₁ⁿ xi is an unbiased estimate of the population mean µ.
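To make this concrete, here is a small Monte Carlo sketch in Python/NumPy (an illustration added here, not part of the original slides); the normal population, sample size, and number of replications are arbitrary choices.

import numpy as np

rng = np.random.default_rng(seed=1)
mu, sigma, n, nrep = 5.0, 2.0, 25, 100_000   # arbitrary simulation settings

# draw nrep independent samples of size n and compute the sample mean of each
xbars = rng.normal(loc=mu, scale=sigma, size=(nrep, n)).mean(axis=1)

# the Monte Carlo average of the sample means should be close to mu,
# so the estimated bias should be close to zero
print("estimated E(xbar):", xbars.mean())
print("estimated bias   :", xbars.mean() - mu)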
Given a sample of data x1, . . . , xn where xi ∼ F (iid) and F has mean µ = E(X) and variance σ² = E[(X − µ)²], the sample variance s² = (1/(n−1)) Σᵢ₌₁ⁿ (xi − x̄)² is an unbiased estimate of σ².
Note that Σᵢ₌₁ⁿ (xi − x̄)² = Σᵢ₌₁ⁿ xi² − n x̄², which implies that
E(s²) = (1/(n−1)) ( Σᵢ₌₁ⁿ E(xi²) − n E(x̄²) )
Since E(xi²) = µ² + σ² and E(x̄²) = µ² + σ²/n, this reduces to E(s²) = σ².
This result can be used to show that s̃² = (1/n) Σᵢ₌₁ⁿ (xi − x̄)² is biased: because s̃² = ((n−1)/n) s², we have E(s̃²) = ((n−1)/n) σ² ≠ σ².
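A similar simulation (again an illustrative sketch with arbitrary settings, not taken from the slides) compares the unbiased estimator s² with the biased estimator s̃².

import numpy as np

rng = np.random.default_rng(seed=2)
mu, sigma, n, nrep = 0.0, 3.0, 10, 100_000   # arbitrary simulation settings

x = rng.normal(loc=mu, scale=sigma, size=(nrep, n))

s2 = x.var(axis=1, ddof=1)        # unbiased estimator: divides by n - 1
s2_tilde = x.var(axis=1, ddof=0)  # biased estimator: divides by n

print("true variance     :", sigma**2)
print("average of s^2    :", s2.mean())        # close to sigma^2
print("average of s~^2   :", s2_tilde.mean())  # close to (n-1)/n * sigma^2
print("(n-1)/n * sigma^2 :", (n - 1) / n * sigma**2)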
Variance of an Estimator
Given a sample of data x1, . . . , xn where xi ∼ F (iid) and F has mean µ = E(X) and variance σ² = E[(X − µ)²], the sample mean x̄ = (1/n) Σᵢ₌₁ⁿ xi has a variance of Var(x̄) = σ²/n.
To prove that this is the variance of x̄, we can use the variance rules
from the Introduction to Random Variables chapter, i.e.,
Var(x̄) = Var( (1/n) Σᵢ₌₁ⁿ xi ) = (1/n²) Σᵢ₌₁ⁿ Var(xi) = σ²/n
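The σ²/n result can also be checked numerically; the sketch below is illustrative only, with arbitrary population values and replication count.

import numpy as np

rng = np.random.default_rng(seed=3)
mu, sigma, n, nrep = 1.0, 2.0, 50, 100_000   # arbitrary simulation settings

# sample means from nrep independent samples of size n
xbars = rng.normal(loc=mu, scale=sigma, size=(nrep, n)).mean(axis=1)

print("empirical Var(xbar)  :", xbars.var())
print("theoretical sigma^2/n:", sigma**2 / n)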
Mean Squared Error of an Estimator
The mean squared error (MSE) of an estimator is
MSE(θ̂) = E[(θ̂ − θ)²] = E(θ̂²) − 2θE(θ̂) + θ²
and it can be decomposed as MSE(θ̂) = Bias(θ̂)² + Var(θ̂), where the first term is squared bias and the second term is variance.
Next, note that we can write the squared bias and variance as
Bias(θ̂)² = (E(θ̂) − θ)² = E(θ̂)² − 2θE(θ̂) + θ²
Var(θ̂) = E(θ̂²) − E(θ̂)²
and adding these two terms together gives
Bias(θ̂)² + Var(θ̂) = E(θ̂)² − 2θE(θ̂) + θ² + E(θ̂²) − E(θ̂)²
= E(θ̂²) − 2θE(θ̂) + θ²
which is the expanded form of the MSE given above.
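As an illustrative check of the decomposition (not from the slides), the sketch below estimates the MSE of the biased variance estimator s̃² directly and via squared bias plus variance; the population and simulation settings are arbitrary.

import numpy as np

rng = np.random.default_rng(seed=4)
mu, sigma, n, nrep = 0.0, 2.0, 15, 200_000   # arbitrary simulation settings

x = rng.normal(loc=mu, scale=sigma, size=(nrep, n))
s2_tilde = x.var(axis=1, ddof=0)             # biased variance estimator

theta = sigma**2
mse_direct = np.mean((s2_tilde - theta) ** 2)   # E[(theta_hat - theta)^2]
bias_sq = (s2_tilde.mean() - theta) ** 2        # Bias(theta_hat)^2
variance = s2_tilde.var()                       # Var(theta_hat)

print("MSE (direct)     :", mse_direct)
print("bias^2 + variance:", bias_sq + variance)   # the two quantities agree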
Consistency of an Estimator
Given a sample of data x1, . . . , xn with xi ∼ F (iid), an estimator θ̂ = g(x) of a parameter θ = t(F) is said to be consistent if θ̂ →ᵖ θ as n → ∞.
The notation →ᵖ should be read as “converges in probability to”, which means that the probability that θ̂ differs from θ by more than any fixed amount goes to zero as n gets large.
All of the estimators that we’ve discussed (i.e., x̄, s², and s̃²) are consistent estimators.
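A minimal sketch of consistency (illustrative only, with an arbitrarily chosen normal population): the running sample mean settles near µ as n grows.

import numpy as np

rng = np.random.default_rng(seed=5)
mu, sigma = 5.0, 2.0                        # arbitrary population values

# one growing sample: the running mean should approach mu
x = rng.normal(loc=mu, scale=sigma, size=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6d}   xbar = {x[:n].mean():.4f}")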
Efficiency of an Estimator
Given a sample of data x1, . . . , xn with xi ∼ F (iid), an estimator θ̂ = g(x) of a parameter θ = t(F) is said to be efficient if it is the best possible estimator for θ using some loss function.
The chosen loss function is often MSE, so the most efficient estimator
is the one with the smallest MSE compared to all other estimators of θ.
If you have two estimators θ̂1 = g1 (x) and θ̂2 = g2 (x), we would say
that θ̂1 is more efficient than θ̂2 if MSE(θ̂1 ) < MSE(θ̂2 ).
• If θ̂1 and θ̂2 are both unbiased, the most efficient estimator is the
one with the smallest variance
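As one illustrative comparison (chosen here for illustration, not taken from the slides), the sketch below contrasts the sample mean and the sample median as estimators of the center of a normal population; both are essentially unbiased, and for normal data the mean has the smaller MSE, i.e., it is the more efficient of the two.

import numpy as np

rng = np.random.default_rng(seed=6)
mu, sigma, n, nrep = 0.0, 1.0, 30, 100_000   # arbitrary simulation settings

x = rng.normal(loc=mu, scale=sigma, size=(nrep, n))
means = x.mean(axis=1)
medians = np.median(x, axis=1)

# smaller MSE = more efficient (here the loss function is squared error)
print("MSE of sample mean  :", np.mean((means - mu) ** 2))
print("MSE of sample median:", np.mean((medians - mu) ** 2))

For a heavier-tailed population the ranking can change, which is why efficiency is always judged relative to a model and a chosen loss function.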
Estimation Frameworks
Least Squares Estimation
Least squares estimation methods can work well for mean parameters and regression coefficients, but will not work well for all parameters.
• Variance parameters are best estimated using other approaches
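A minimal sketch of least squares estimation of a mean parameter (not from the slides): minimizing the sum of squared deviations numerically recovers the sample mean. The simulated data and the use of scipy.optimize.minimize_scalar are arbitrary choices for illustration.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=7)
x = rng.normal(loc=10.0, scale=3.0, size=100)    # arbitrary sample

# least squares estimate of the mean: minimize the sum of squared deviations
res = minimize_scalar(lambda c: np.sum((x - c) ** 2))

print("least squares estimate:", res.x)
print("sample mean           :", x.mean())       # the two agree

Because the objective is a simple quadratic in c, the minimizer has the closed form x̄; the numerical optimizer is used only to mirror the general least squares recipe.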
Method of Moments
Suppose that the j-th population moment can be written as a function of the parameters, i.e.,
µj = E(Xʲ) = mj(θ1, . . . , θp)
Given data xi ∼ F (iid) for i = 1, . . . , n, the method of moments estimates of the parameters are the values θ̂1, . . . , θ̂p that solve the equations µ̂j = mj(θ̂1, . . . , θ̂p) for j = 1, . . . , p, where µ̂j = (1/n) Σᵢ₌₁ⁿ xiʲ denotes the j-th sample moment.
Example: suppose that xi ∼ N(µ, σ²) (iid) for i = 1, . . . , n. The first two moments of the normal distribution are µ1 = µ and µ2 = µ² + σ². Solving the moment equations gives µ̂ = µ̂1 = x̄ and σ̂² = µ̂2 − µ̂1² = (1/n) Σᵢ₌₁ⁿ (xi − x̄)² = s̃².
Now suppose that xi ∼ U[a, b] (iid), so that µ1 = (a + b)/2 and µ2 = (a² + ab + b²)/3. Solving the first equation gives b = 2µ1 − a, and plugging this into the second equation gives µ2 = (1/3)(a² − 2aµ1 + 4µ1²), which is a simple quadratic function of a.
Applying the quadratic formula gives a = µ1 − √3 √(µ2 − µ1²), and plugging this into b = 2µ1 − a produces b = µ1 + √3 √(µ2 − µ1²). Using the sample moments µ̂1 and µ̂2 in these equations gives the method of moments estimates of a and b.
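The closed-form solution for the uniform example can be tried on simulated data; the sketch below assumes arbitrary true values a = 2 and b = 7 and is not part of the original slides.

import numpy as np

rng = np.random.default_rng(seed=8)
a_true, b_true, n = 2.0, 7.0, 5_000              # arbitrary true parameters

x = rng.uniform(low=a_true, high=b_true, size=n)

mu1 = np.mean(x)       # first sample moment
mu2 = np.mean(x**2)    # second sample moment

# plug the sample moments into the closed-form solutions derived above
a_hat = mu1 - np.sqrt(3.0) * np.sqrt(mu2 - mu1**2)
b_hat = mu1 + np.sqrt(3.0) * np.sqrt(mu2 - mu1**2)

print("method of moments estimates:", a_hat, b_hat)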
Maximum Likelihood Estimation
Suppose that xi ∼ N(µ, σ²) (iid) for i = 1, . . . , n. Assuming that X ∼ N(µ, σ²), the probability density function can be written as
f(x|µ, σ²) = (1/√(2πσ²)) exp( −(x − µ)²/(2σ²) )
Maximizing the log-likelihood of the sample with respect to µ shows that the MLE of µ is the sample mean, i.e., µ̂MLE = x̄ = (1/n) Σᵢ₌₁ⁿ xi.
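As an illustrative check (not from the slides), the sketch below maximizes the normal log-likelihood over µ numerically, treating σ² as known, and compares the result to x̄; the simulated data and the use of scipy.optimize.minimize_scalar are arbitrary choices.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=9)
mu_true, sigma_true, n = 4.0, 2.0, 500           # arbitrary true parameters

x = rng.normal(loc=mu_true, scale=sigma_true, size=n)

# negative log-likelihood as a function of mu (sigma^2 treated as known here)
def negloglik(mu, sigma2=sigma_true**2):
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

res = minimize_scalar(negloglik)

print("numerical MLE of mu:", res.x)
print("sample mean        :", x.mean())          # the two agree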
Suppose that xi ∼ B[N, p] (iid) for i = 1, . . . , n. Assuming that X ∼ B[N, p], the probability mass function can be written as
f(x|N, p) = (N choose x) p^x (1 − p)^(N−x) = [N!/(x!(N − x)!)] p^x (1 − p)^(N−x)
Setting the derivative of the log-likelihood with respect to p to zero gives
(1 − p) n x̄ − p n (N − x̄) = 0  →  x̄ − pN = 0
so the MLE of p is p̂MLE = x̄/N.
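A final illustrative sketch (arbitrary settings, not part of the original slides) checks the closed-form MLE p̂ = x̄/N on simulated binomial data.

import numpy as np

rng = np.random.default_rng(seed=10)
N, p_true, nsamp = 20, 0.3, 10_000               # arbitrary true parameters

x = rng.binomial(n=N, p=p_true, size=nsamp)

# closed-form MLE derived above: p_hat = xbar / N
p_hat = x.mean() / N

print("MLE of p:", p_hat)
print("true p  :", p_true)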