
Lecture 19: Construction of unbiased or approximately unbiased estimators and method of moments


Survey samples from a finite population
Let P = {1, ..., N} be a finite population of interest
For each i ∈ P, let yi be a value of interest associated with unit i
Let s = {i1, ..., in} be a subset of distinct elements of P, which is a
sample selected with selection probability p(s), where p is known.
The value yi is observed if and only if i ∈ s.
Y = \sum_{j=1}^{N} y_j is the unknown population total of interest.
Define
πi = probability that i ∈ s, i = 1, ..., N.
Horvitz-Thompson estimators
Horvitz-Thompson estimators provide a general method to obtain unbiased
estimators. All we need is the inclusion probability πi, which is known
in sample surveys since p(s) is known.
Theorem 3.15.
(i) (Horvitz-Thompson). If πi > 0 for i = 1, ..., N and πi is known when
i ∈ s, then Ŷht = ∑_{i∈s} yi/πi is an unbiased estimator of the
population total Y.
(ii) Define
πij = probability that i ∈ s and j ∈ s, i = 1, ..., N, j = 1, ..., N.
Then
\mathrm{Var}(\hat{Y}_{ht}) = \sum_{i=1}^{N} \frac{1-\pi_i}{\pi_i}\, y_i^2 + 2 \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\, y_i y_j \qquad (1)
 = \sum_{i=1}^{N} \sum_{j=i+1}^{N} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2. \qquad (2)

Horvitz-Thompson’s idea: inverse probability weighting


The unbiasedness of the sample mean under simple random
sampling without replacement is a special case of Theorem 3.15.
Extension: P is a sample of size N and yi is missing if i ∉ s.

If πi is unknown, we need to replace it by an estimator.


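As a concrete illustration (not part of the original slides), here is a
minimal Python sketch of the Horvitz-Thompson estimator; the toy
population and all names are hypothetical:

```python
# Sketch of the Horvitz-Thompson estimator: Y_ht = sum_{i in s} y_i / pi_i.
# pi[i] is the known first-order inclusion probability of unit i.

def ht_estimate(sample, y, pi):
    """Inverse-probability-weighted estimate of the population total Y."""
    return sum(y[i] / pi[i] for i in sample)

# Toy population of N = 5 units; sample s = {0, 2} with pi_i = 2/5 for all i.
y = [3.0, 1.0, 4.0, 1.0, 5.0]
pi = [0.4] * 5
print(ht_estimate([0, 2], y, pi))  # (3.0 + 4.0) / 0.4 = 17.5
```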
Proof.
(i) Let ai = 1 if i ∈ s and ai = 0 if i ∉ s, i = 1, ..., N.
Then E(ai) = πi and
E(\hat{Y}_{ht}) = E\left( \sum_{i=1}^{N} \frac{a_i y_i}{\pi_i} \right) = \sum_{i=1}^{N} y_i = Y.
(ii) Since ai² = ai,
\mathrm{Var}(a_i) = E(a_i) - [E(a_i)]^2 = \pi_i(1 - \pi_i),
\mathrm{Cov}(a_i, a_j) = E(a_i a_j) - E(a_i)E(a_j) = \pi_{ij} - \pi_i \pi_j, \quad i \neq j.
Then
\mathrm{Var}(\hat{Y}_{ht}) = \mathrm{Var}\left( \sum_{i=1}^{N} \frac{a_i y_i}{\pi_i} \right)
 = \sum_{i=1}^{N} \frac{y_i^2}{\pi_i^2}\,\mathrm{Var}(a_i) + 2 \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{y_i y_j}{\pi_i \pi_j}\,\mathrm{Cov}(a_i, a_j)
 = \sum_{i=1}^{N} \frac{1-\pi_i}{\pi_i}\, y_i^2 + 2 \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}\, y_i y_j.
Proof (continued)
Hence (1) follows.
To show (2), note that
\sum_{i=1}^{N} \pi_i = n \quad\text{and}\quad \sum_{j=1,...,N,\, j\neq i} \pi_{ij} = (n-1)\pi_i,
which implies
\sum_{j=1,...,N,\, j\neq i} (\pi_{ij} - \pi_i \pi_j) = (n-1)\pi_i - \pi_i(n - \pi_i) = -\pi_i(1 - \pi_i).
Hence
\sum_{i=1}^{N} \frac{1-\pi_i}{\pi_i}\, y_i^2 = \sum_{i=1}^{N} \sum_{j=1,...,N,\, j\neq i} (\pi_i \pi_j - \pi_{ij})\, \frac{y_i^2}{\pi_i^2}
 = \sum_{i=1}^{N} \sum_{j=i+1}^{N} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i^2}{\pi_i^2} + \frac{y_j^2}{\pi_j^2} \right)
and (2) follows from (1).
How do we get an unbiased estimator of Var(Ŷht)?
Using Horvitz-Thompson's idea, the following estimators are unbiased:
v_1 = \sum_{i\in s} \frac{1-\pi_i}{\pi_i^2}\, y_i^2 + 2 \sum_{i\in s} \sum_{j\in s,\, j>i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j \pi_{ij}}\, y_i y_j
v_2 = \sum_{i\in s} \sum_{j\in s,\, j>i} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2.
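A hedged Python sketch of both variance estimators, assuming the first-
and second-order inclusion probabilities are supplied (pi as a list, pij
as a matrix; all names are illustrative):

```python
# Sketches of v1 and v2 computed from a drawn sample s.
# pi[i] = pi_i; pij[i][j] = pi_ij, assumed positive for all sampled pairs.

def v1(sample, y, pi, pij):
    s = list(sample)
    total = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in s)
    for a in range(len(s)):
        for b in range(a + 1, len(s)):
            i, j = s[a], s[b]
            total += (2 * (pij[i][j] - pi[i] * pi[j])
                      / (pi[i] * pi[j] * pij[i][j]) * y[i] * y[j])
    return total

def v2(sample, y, pi, pij):
    s, total = list(sample), 0.0
    for a in range(len(s)):
        for b in range(a + 1, len(s)):
            i, j = s[a], s[b]
            total += ((pi[i] * pi[j] - pij[i][j]) / pij[i][j]
                      * (y[i] / pi[i] - y[j] / pi[j]) ** 2)
    return total
```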

Simple random sampling without replacement

For simple random sampling without replacement,
\pi_i = E(a_i) = P(a_i = 1) = \binom{N-1}{n-1} \Big/ \binom{N}{n} = \frac{n}{N}
\pi_{ij} = E(a_i a_j) = P(a_i = 1, a_j = 1) = \binom{N-2}{n-2} \Big/ \binom{N}{n} = \frac{n(n-1)}{N(N-1)}
\hat{Y}_{ht} = \frac{N}{n} \sum_{i\in s} y_i = \frac{N}{n} \sum_{i=1}^{N} a_i y_i = N \times (\text{the sample mean})
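A small simulation sketch (illustrative only) confirming that under
simple random sampling without replacement Ŷht is N times the sample
mean and averages to the true total Y:

```python
import random

N, n = 50, 10
y = [random.gauss(0.0, 1.0) for _ in range(N)]
Y = sum(y)

estimates = []
for _ in range(100_000):
    s = random.sample(range(N), n)                   # SRS without replacement
    estimates.append(N / n * sum(y[i] for i in s))   # Y_ht = N * (sample mean)

print(Y, sum(estimates) / len(estimates))            # the two should be close
```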
\mathrm{Var}(\hat{Y}_{ht}) = \sum_{i=1}^{N} \frac{1-\frac{n}{N}}{\frac{n}{N}}\, y_i^2 + 2 \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{\frac{n(n-1)}{N(N-1)} - \frac{n^2}{N^2}}{\frac{n^2}{N^2}}\, y_i y_j
 = \frac{N-n}{n} \sum_{i=1}^{N} y_i^2 - \frac{2(N-n)}{n(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} y_i y_j
 = \frac{N-n}{n} \left[ \sum_{i=1}^{N} y_i^2 - \frac{1}{N-1} \sum_{i\neq j} y_i y_j \right]
 = \frac{N-n}{n} \cdot \frac{N}{N-1} \sum_{i=1}^{N} \left( y_i - \frac{Y}{N} \right)^2
 = \left( 1 - \frac{n}{N} \right) \frac{N^2}{n} \cdot \frac{1}{N-1} \sum_{i=1}^{N} \left( y_i - \frac{Y}{N} \right)^2
 = N^2 \left( 1 - \frac{n}{N} \right) \frac{S^2}{n}
n/N is called the finite sample fraction and 1 − n/N is called the
finite sample correction.
S² = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - Y/N)^2 is called the population variance.
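Continuing the simulation sketch above (same hypothetical y, N, n, and
estimates), the empirical variance of Ŷht should agree with
N²(1 − n/N)S²/n:

```python
# Population variance S^2 = (1/(N-1)) * sum (y_i - Y/N)^2.
ybar = sum(y) / N
S2 = sum((v - ybar) ** 2 for v in y) / (N - 1)
theory = N ** 2 * (1 - n / N) * S2 / n

m = sum(estimates) / len(estimates)
empirical = sum((e - m) ** 2 for e in estimates) / len(estimates)
print(theory, empirical)                             # should be close
```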
For simple random sampling without replacement, variance estimators
v1 and v2 are the same.
Note that
v_1 = \sum_{i\in s} \frac{1-\frac{n}{N}}{\frac{n^2}{N^2}}\, y_i^2 + 2 \sum_{i\in s} \sum_{j\in s,\, j>i} \frac{\frac{n(n-1)}{N(N-1)} - \frac{n^2}{N^2}}{\frac{n(n-1)}{N(N-1)} \cdot \frac{n^2}{N^2}}\, y_i y_j
 = \frac{N(N-n)}{n^2} \sum_{i\in s} y_i^2 - \frac{N(N-n)}{n^2(n-1)} \sum_{i,j\in s,\, i\neq j} y_i y_j
 = \frac{N(N-n)}{n(n-1)} \sum_{i\in s} y_i^2 - \frac{N(N-n)}{n^2(n-1)} \sum_{i,j\in s} y_i y_j
 = \frac{N(N-n)}{n(n-1)} \left[ \sum_{i\in s} y_i^2 - \frac{1}{n} \left( \sum_{i\in s} y_i \right)^2 \right]
 = N^2 \left( 1 - \frac{n}{N} \right) \frac{s^2}{n}


where
s^2 = \frac{1}{n-1} \sum_{i\in s} (y_i - \bar{y})^2
is called the sample variance.
Since E(v1) = Var(Ŷht), we have shown that E(s²) = S².
Since s² is symmetric in its arguments, the earlier result implies that
s² is a UMVUE of S² and v1 is a UMVUE of Var(Ŷht), under simple
random sampling without replacement.
To finish, we note that
v_2 = v_1 + \sum_{i\in s} \sum_{j\in s,\, j\neq i} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}}\, \frac{y_i^2}{\pi_i^2} - \sum_{i\in s} \frac{1 - \pi_i}{\pi_i^2}\, y_i^2
(the cross-product terms of v1 and v2 are identical).
Substituting πi = n/N and πij = n(n−1)/(N(N−1)) gives
v_2 = v_1 + \frac{N(N-1)}{n(n-1)} \sum_{i,j\in s,\, j\neq i} y_i^2 + \frac{N}{n} \sum_{i\in s} y_i^2 - \frac{N^2}{n} \sum_{i\in s} y_i^2 = v_1,
since \sum_{i,j\in s,\, j\neq i} y_i^2 = (n-1) \sum_{i\in s} y_i^2.
Deriving asymptotically unbiased estimators
An exactly unbiased estimator may not exist or may be hard to obtain.
We often derive asymptotically unbiased estimators.
Functions of sample means are popular estimators.
Functions of unbiased estimators
If the parameter to be estimated is ϑ = g(θ) with a vector-valued
parameter θ and Un is a vector of unbiased estimators of components
of θ, then Tn = g(Un) is often asymptotically unbiased for ϑ.
Note that E(Tn) = Eg(Un) may not exist.
Assume that g is differentiable and
c_n (U_n - \theta) \to_d Y.
Then, by Theorem 2.6,
\mathrm{amse}_{T_n}(P) = E\{[\nabla g(\theta)]^{\tau} Y\}^2 / c_n^2.
Hence, Tn performs well in terms of amse if Un is optimal in terms of
mse (such as the UMVUE or BLUE).

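For a concrete (hypothetical) instance, take ϑ = g(θ) = θ² estimated by
Tn = X̄² with cn = √n, so the formula gives amse = [g′(θ)]²σ²/n; a quick
simulation can corroborate it:

```python
import random

theta, sigma, n = 2.0, 1.0, 200

sq_errors = []
for _ in range(50_000):
    xbar = sum(random.gauss(theta, sigma) for _ in range(n)) / n
    sq_errors.append((xbar ** 2 - theta ** 2) ** 2)   # T_n = g(X-bar), g(u) = u^2

mse = sum(sq_errors) / len(sq_errors)
amse = (2 * theta) ** 2 * sigma ** 2 / n              # [grad g(theta)]^2 sigma^2 / n
print(mse, amse)                                      # should be close
```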


Method of moments
The method of moments is the oldest method of deriving asymptotically
unbiased estimators; such estimators may not be the best, but they are
simple and can be used as initial estimators.
Consider a parametric problem where X1, ..., Xn are i.i.d. random
variables from Pθ, θ ∈ Θ ⊂ R^k, and E|X1|^k < ∞.
Let µj = EX1^j be the jth moment of P and let
\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j
be the jth sample moment, which is an unbiased estimator of µj,
j = 1, ..., k.
Typically,
\mu_j = h_j(\theta), \quad j = 1, ..., k, \qquad (3)
for some functions hj on R^k.


By substituting the µj's on the left-hand side of (3) by the sample
moments µ̂j, we obtain a moment estimator θ̂, i.e., θ̂ satisfies
\hat{\mu}_j = h_j(\hat{\theta}), \quad j = 1, ..., k,
which is a sample analogue of (3).

This method of deriving estimators is called the method of moments.
An important statistical principle, the substitution principle, is applied in
this method.
Let µ̂ = (µ̂1, ..., µ̂k) and h = (h1, ..., hk).
Then µ̂ = h(θ̂).
If the inverse function h^{−1} exists, then the unique moment estimator of
θ is θ̂ = h^{−1}(µ̂).
When h^{−1} does not exist (i.e., h is not one-to-one), any solution θ̂ of
µ̂ = h(θ̂) is a moment estimator of θ.
If possible, we always choose a solution θ̂ in the parameter space Θ.

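When h^{−1} is not available in closed form, the moment equations
µ̂ = h(θ̂) can be solved numerically. A hedged sketch using scipy for a
gamma population (the parametrization and all names are assumptions for
illustration, not from the lecture):

```python
import numpy as np
from scipy.optimize import fsolve

# Gamma(shape a, scale b): h(theta) = (E X, E X^2) = (a*b, a*(a+1)*b^2).
rng = np.random.default_rng(0)
x = rng.gamma(shape=3.0, scale=2.0, size=1000)
mu1_hat, mu2_hat = np.mean(x), np.mean(x ** 2)          # sample moments

def moment_equations(theta):
    a, b = theta
    return [a * b - mu1_hat, a * (a + 1) * b ** 2 - mu2_hat]

a_hat, b_hat = fsolve(moment_equations, x0=[1.0, 1.0])  # solve mu_hat = h(theta_hat)
print(a_hat, b_hat)                                     # should be near (3, 2)
```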


In some cases, however, a moment estimator does not exist (see
Exercise 111).
Moment estimators may not be unique.
We usually use moments with the lowest possible order.
Assume that θ̂ = g(µ̂) for a function g.
If h^{−1} exists, then g = h^{−1}.
If g is continuous at µ = (µ1, ..., µk), then θ̂ is strongly consistent for θ,
since µ̂j →a.s. µj by the SLLN.
If g is differentiable at µ and E|X1|^{2k} < ∞, then θ̂ is asymptotically
normal, by the CLT and Theorem 1.12, and
\mathrm{amse}_{\hat{\theta}}(\theta) = n^{-1} [\nabla g(\mu)]^{\tau} V_{\mu} \nabla g(\mu),
where Vµ is a k × k matrix whose (i,j)th element is µ_{i+j} − µi µj.
Furthermore, the n^{−1} order asymptotic bias of θ̂ is
(2n)^{-1} \mathrm{tr}\left( \nabla^2 g(\mu) V_{\mu} \right).


Example 3.24
Let X1, ..., Xn be i.i.d. from a population Pθ indexed by the parameter
θ = (µ, σ²), where µ = EX1 ∈ R and σ² = Var(X1) ∈ (0, ∞).
This includes cases such as the family of normal distributions, double
exponential distributions, or logistic distributions (Table 1.2, page 20).
Since EX1 = µ and EX1² = Var(X1) + (EX1)² = σ² + µ², setting µ̂1 = µ
and µ̂2 = σ² + µ², we obtain the moment estimator
\hat{\theta} = \left( \bar{X},\ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right) = \left( \bar{X},\ \frac{n-1}{n} S^2 \right).
Note that X̄ is unbiased, but (n−1)S²/n is not.
If Xi is normal, then θ̂ is sufficient and is nearly the same as an optimal
estimator such as the UMVUE.
On the other hand, if Xi is from a double exponential or logistic
distribution, then θ̂ is not sufficient and can often be improved.
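A direct sketch of the moment estimator in Example 3.24 on simulated
(hypothetical) data:

```python
import random

x = [random.gauss(1.0, 2.0) for _ in range(500)]       # mu = 1, sigma^2 = 4
n = len(x)

mu_hat = sum(x) / n                                    # X-bar
sigma2_hat = sum((v - mu_hat) ** 2 for v in x) / n     # (n-1)/n * S^2
print(mu_hat, sigma2_hat)                              # near (1, 4)
```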


Example 3.25
Let X1, ..., Xn be i.i.d. from the uniform distribution on (θ1, θ2),
−∞ < θ1 < θ2 < ∞.
Note that
EX1 = (θ1 + θ2)/2 and EX1² = (θ1² + θ2² + θ1θ2)/3.
Setting µ̂1 = EX1 and µ̂2 = EX1² and substituting θ1 in the second
equation by 2µ̂1 − θ2 (from the first equation), we obtain
(2\hat{\mu}_1 - \theta_2)^2 + \theta_2^2 + (2\hat{\mu}_1 - \theta_2)\theta_2 = 3\hat{\mu}_2,
which is the same as
(\theta_2 - \hat{\mu}_1)^2 = 3(\hat{\mu}_2 - \hat{\mu}_1^2).
Since θ2 > EX1, we obtain
\hat{\theta}_2 = \hat{\mu}_1 + \sqrt{3(\hat{\mu}_2 - \hat{\mu}_1^2)} = \bar{X} + \sqrt{\tfrac{3(n-1)}{n} S^2}
\hat{\theta}_1 = \hat{\mu}_1 - \sqrt{3(\hat{\mu}_2 - \hat{\mu}_1^2)} = \bar{X} - \sqrt{\tfrac{3(n-1)}{n} S^2}.
These estimators are not functions of the sufficient and complete
statistic (X(1), X(n)).
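A sketch of the uniform-endpoint moment estimators from Example 3.25
(toy data, illustrative only):

```python
import math
import random

x = [random.uniform(2.0, 5.0) for _ in range(1000)]   # true (theta1, theta2) = (2, 5)
n = len(x)
xbar = sum(x) / n
S2 = sum((v - xbar) ** 2 for v in x) / (n - 1)

half_width = math.sqrt(3 * (n - 1) / n * S2)          # sqrt(3 * (mu2_hat - mu1_hat^2))
theta2_hat = xbar + half_width
theta1_hat = xbar - half_width
print(theta1_hat, theta2_hat)                         # near (2, 5)
```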


Example 3.26
Let X1, ..., Xn be i.i.d. from the binomial distribution Bi(p, k) with
unknown parameters k ∈ {1, 2, ...} and p ∈ (0, 1).
Since
EX1 = kp
and
EX1² = kp(1 − p) + k²p²,
we obtain the moment estimators
\hat{p} = (\hat{\mu}_1 + \hat{\mu}_1^2 - \hat{\mu}_2)/\hat{\mu}_1 = 1 - \tfrac{n-1}{n} S^2 / \bar{X}
and
\hat{k} = \hat{\mu}_1^2 / (\hat{\mu}_1 + \hat{\mu}_1^2 - \hat{\mu}_2) = \bar{X} \Big/ \left( 1 - \tfrac{n-1}{n} S^2 / \bar{X} \right).
The estimator p̂ is in the range of (0, 1).
But k̂ may not be an integer.
It can be improved by the estimator obtained by rounding k̂ to the
nearest positive integer.
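A sketch of the binomial moment estimators from Example 3.26,
including the rounding improvement for k̂ (simulated data; all names
illustrative):

```python
import random

k_true, p_true = 10, 0.4
x = [sum(random.random() < p_true for _ in range(k_true)) for _ in range(2000)]
n = len(x)
xbar = sum(x) / n
S2 = sum((v - xbar) ** 2 for v in x) / (n - 1)

p_hat = 1 - (n - 1) / n * S2 / xbar                   # moment estimator of p
k_hat = xbar / p_hat                                  # moment estimator of k
k_rounded = max(1, round(k_hat))                      # nearest positive integer
print(p_hat, k_hat, k_rounded)                        # near (0.4, 10, 10)
```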
Nonparametric problems
Consider the estimation of the central moments
c_j = E(X_1 - \mu_1)^j = \sum_{t=0}^{j} \binom{j}{t} (-\mu_1)^t \mu_{j-t}, \quad j = 2, ..., k.
The moment estimator of cj is
\hat{c}_j = \sum_{t=0}^{j} \binom{j}{t} (-\bar{X})^t \hat{\mu}_{j-t} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^j, \quad j = 2, ..., k,
which are the sample central moments (µ̂0 = 1).
From the SLLN, the ĉj's are strongly consistent.
If E|X1|^{2k} < ∞, then
\sqrt{n}\, (\hat{c}_2 - c_2, ..., \hat{c}_k - c_k) \to_d N_{k-1}(0, D),
where the (i,j)th element of the (k−1) × (k−1) matrix D is
c_{i+j+2} - c_{i+1} c_{j+1} - (i+1) c_i c_{j+2} - (j+1) c_{i+2} c_j + (i+1)(j+1) c_i c_j c_2.
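A short sketch computing the sample central moments ĉj (the exponential
example is hypothetical):

```python
import random

x = [random.expovariate(1.0) for _ in range(5000)]    # Exp(1): c2 = 1, c3 = 2
n = len(x)
xbar = sum(x) / n

def c_hat(j):
    """Sample jth central moment: (1/n) * sum (X_i - X-bar)^j."""
    return sum((v - xbar) ** j for v in x) / n

print(c_hat(2), c_hat(3))                             # near (1, 2)
```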
