
STATISTICS IN TRANSITION-new series, December 2009
Vol. 10, No. 3, pp. 397–414

ESTIMATION OF MEAN UNDER IMPUTATION OF MISSING DATA USING FACTOR-TYPE ESTIMATOR IN TWO-PHASE SAMPLING

Diwakar Shukla, Narendra Singh Thakur, Sharad Pathak¹ and Dilip Singh Rajput²

¹ Diwakar Shukla, Narendra Singh Thakur, Sharad Pathak: Deptt. of Mathematics and Statistics, H.S. Gour University of Sagar, Sagar (M.P.), INDIA, Pin-470003. E-mails: [email protected], [email protected], [email protected].
² Dilip Singh Rajput: Govt. College, Rehli, Sagar (M.P.), INDIA.

ABSTRACT

In sample surveys non-response is one of the most frequent and widespread problems, and its treatment requires suitable statistical techniques. Imputation is one such methodology: it uses the available data as a source of replacements for the missing observations. Two-phase sampling is useful when the population parameter of the auxiliary variable is unknown. This paper presents the use of imputation for dealing with non-responding units in the setup of two-phase sampling. Two different two-phase sampling strategies (sub-sample and independent sample) are compared under the imputed-data setup. Factor-Type (F-T) estimators are used as the tools of imputation, and a simulation study over repeated samples shows the comparative strength of one over the other. The first imputation strategy is found better than the second, whereas the second sampling design is better than the first.

Key words: Estimation, Missing data, Imputation, Bias, Mean squared error (MSE), Factor-Type (F-T) estimator, Two-phase sampling, Simple Random Sampling Without Replacement (SRSWOR), Compromised Imputation (C-I).

1. Introduction

Let $\Omega = \{1, 2, \ldots, N\}$ be a finite population with $Y_i$ as the variable of main interest and $X_i\ (i = 1, 2, \ldots, N)$ an auxiliary variable. As usual, $\bar{Y} = N^{-1}\sum_{i=1}^{N} Y_i$ and $\bar{X} = N^{-1}\sum_{i=1}^{N} X_i$ are the population means; $\bar{X}$ is assumed known and $\bar{Y}$ is under investigation.


Singh and Shukla (1987) proposed the Factor-Type (F-T) estimator for estimating the population mean under the SRSWOR setup. Some other contributions to the Factor-Type estimator, in a similar setup, are due to Singh and Shukla (1991) and Singh et al. (1993).
With $\bar{X}$ unknown, two-phase sampling is used to estimate the population mean, and Shukla (2002) suggested an F-T estimator for this case. But when some observations are missing from the sample, the F-T estimator fails to provide an estimate. This paper takes up the problem of Shukla (2002) and suggests imputation procedures for the missing observations.
Rubin (1976) addressed three missing-observation concepts: missing at random (MAR), observed at random (OAR) and parameter distribution (PD). Heitjan and Basu (1996) explained the concept of missing at random (MAR) and introduced missing completely at random (MCAR). The present discussion assumes MCAR wherever non-response is mentioned. Rao and Sitter (1995) discussed a new linearization variance estimator that makes more complete use of the sample data than the standard one; they applied it to 'mass' imputation under two-phase sampling and to deterministic imputation for missing data. Singh and Horn (2000) suggested a Compromised Imputation (C-I) procedure in which the estimator of the mean obtained through C-I remains better than those obtained from the ratio and mean methods of imputation. Ahmed et al. (2006) designed several generalized structures of imputation procedures and their corresponding estimators of the population mean. Motivation is derived from these and from Shukla (2002) to extend that work to the imputation setup.
Consider a preliminary large sample $S'$ of size $n'$ drawn from population $\Omega$ by SRSWOR and a secondary sample $S$ of size $n\ (n < n')$ drawn in either of the following manners:
Case I: as a sub-sample of $S'$ (denoted by design F1) as in Fig. 1(a);
Case II: independently of $S'$ (denoted by design F2), without replacing $S'$, as in Fig. 1(b).

[Fig. 1(a) (Case I, F1): the sample S of size n, containing the responding part R and the non-responding part R^C, is drawn as a sub-sample of S' (size n'). Fig. 1(b) (Case II, F2): S is drawn from the population independently of S'.]

Let sample $S$ of $n$ units contain $r$ responding units $(r < n)$ forming a sub-space $R$, and $(n - r)$ non-responding units forming the sub-space $R^C$, so that $S = R \cup R^C$. For every $i \in R$, $y_i$ is observed; for $i \in R^C$, the $y_i$ values are missing and imputed values are computed. The $i$-th value $x_i$ of the auxiliary variate is used as a source of imputation for the missing data when $i \in R^C$. Assume that for $S$ the data $x_S = \{x_i : i \in S\}$ and $\{x_{i'} : i' \in S'\}$ are known, with means $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$ and $\bar{x}' = (n')^{-1}\sum_{i'=1}^{n'} x_{i'}$ respectively.
i ' =1

2. F-T Imputation Strategies

Two proposed strategies $d_1$ and $d_2$ for the missing data, under both cases, are:

$d_1:\quad (y_{d_1})_i = \begin{cases} y_i & \text{if } i \in R \\ \dfrac{1}{n-r}\left[\, n\,\bar{y}_r\, \dfrac{(A+C)\bar{x}' + fB\,\bar{x}}{(A+fB)\bar{x}' + C\,\bar{x}} - r\,\bar{y}_r \right] & \text{if } i \in R^C \end{cases}$  (2.1)

$d_2:\quad (y_{d_2})_i = \begin{cases} y_i & \text{if } i \in R \\ \dfrac{1}{n-r}\left[\, n\,\bar{y}_r\, \dfrac{(A+C)\bar{x}' + fB\,\bar{x}_r}{(A+fB)\bar{x}' + C\,\bar{x}_r} - r\,\bar{y}_r \right] & \text{if } i \in R^C \end{cases}$  (2.2)

Under (2.1) and (2.2) the point estimators of $\bar{Y}$ are:

$(\bar{y}_{d_1}) = \bar{y}_r \left[\dfrac{(A+C)\bar{x}' + fB\,\bar{x}}{(A+fB)\bar{x}' + C\,\bar{x}}\right]$  (2.3)

$(\bar{y}_{d_2}) = \bar{y}_r \left[\dfrac{(A+C)\bar{x}' + fB\,\bar{x}_r}{(A+fB)\bar{x}' + C\,\bar{x}_r}\right]$  (2.4)

where $A = (k-1)(k-2)$; $B = (k-1)(k-4)$; $C = (k-2)(k-3)(k-4)$; $f$ is the sampling fraction and $k\ (0 < k < \infty)$ is a constant.
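Below is a minimal Python sketch (our illustration, not part of the original paper; all function and variable names are ours) of the point estimators (2.3) and (2.4). Note that averaging a sample completed by (2.1) or (2.2) reproduces the corresponding point estimator exactly, since the imputed values are constructed so that the completed-sample mean telescopes to (2.3) or (2.4).

import numpy as np

def ft_constants(k):
    """Factor-Type constants of Section 2: A, B, C as functions of k."""
    return (k - 1) * (k - 2), (k - 1) * (k - 4), (k - 2) * (k - 3) * (k - 4)

def ft_point_estimates(y_resp, x_resp, x_sample, x_prime, k, f):
    """Point estimators (2.3) and (2.4) of the population mean.

    y_resp   -- y-values of the r responding units (the set R)
    x_resp   -- x-values of the same r units (gives x-bar_r)
    x_sample -- x-values of all n units of S (gives x-bar)
    x_prime  -- x-values of the n' units of S' (gives x-bar')
    f        -- sampling fraction entering the F-T weights
    """
    A, B, C = ft_constants(k)
    y_r = np.mean(y_resp)
    xb_r, xb, xb_p = np.mean(x_resp), np.mean(x_sample), np.mean(x_prime)
    y_d1 = y_r * ((A + C) * xb_p + f * B * xb) / ((A + f * B) * xb_p + C * xb)
    y_d2 = y_r * ((A + C) * xb_p + f * B * xb_r) / ((A + f * B) * xb_p + C * xb_r)
    return y_d1, y_d2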

2.1. Some Special Cases

(i) At $k = 1$: $A = 0$; $B = 0$; $C = -6$:

$(\bar{y}_{d_1}) = \bar{y}_r \left[\dfrac{\bar{x}'}{\bar{x}}\right]$  (2.5)

$(\bar{y}_{d_2}) = \bar{y}_r \left[\dfrac{\bar{x}'}{\bar{x}_r}\right]$  (2.6)

(ii) At $k = 2$: $A = 0$; $B = -2$; $C = 0$:

$(\bar{y}_{d_1}) = \bar{y}_r \left[\dfrac{\bar{x}}{\bar{x}'}\right]$  (2.7)

$(\bar{y}_{d_2}) = \bar{y}_r \left[\dfrac{\bar{x}_r}{\bar{x}'}\right]$  (2.8)

(iii) At $k = 3$: $A = 2$; $B = -2$; $C = 0$:

$(\bar{y}_{d_1}) = \bar{y}_r \left[\dfrac{\bar{x}' - f\bar{x}}{(1-f)\bar{x}'}\right]$  (2.9)

$(\bar{y}_{d_2}) = \bar{y}_r \left[\dfrac{\bar{x}' - f\bar{x}_r}{(1-f)\bar{x}'}\right]$  (2.10)

(iv) At $k = 4$: $A = 6$; $B = 0$; $C = 0$:

$(\bar{y}_{d_1}) = \bar{y}_r$  (2.11)

$(\bar{y}_{d_2}) = \bar{y}_r$  (2.12)

The family thus contains the ratio, product, dual-to-ratio and mean methods of imputation as special cases.

3. Properties of Imputation Strategies

Let $B(\cdot)_t$ and $M(\cdot)_t$ denote the bias and mean squared error (M.S.E.) of an estimator under sampling design $t = I, II$ (or F1, F2). The large sample approximations are:

$\bar{y}_r = \bar{Y}(1 + e_1)$; $\bar{x}_r = \bar{X}(1 + e_2)$; $\bar{x} = \bar{X}(1 + e_3)$; $\bar{x}' = \bar{X}(1 + e_3')$.


Using two-phase sampling, following Rao and Sitter (1995) and the MCAR mechanism, for given $r$, $n$ and $n'$ we write:

(i) Under design F1 [Case I]:
$E(e_1) = E(e_2) = E(e_3) = E(e_3') = 0$; $E(e_1^2) = \delta_1 C_Y^2$; $E(e_2^2) = \delta_1 C_X^2$;
$E(e_3^2) = \delta_2 C_X^2$; $E(e_3'^2) = \delta_3 C_X^2$; $E(e_1 e_2) = \delta_1 \rho C_Y C_X$;
$E(e_1 e_3) = \delta_2 \rho C_Y C_X$; $E(e_1 e_3') = \delta_3 \rho C_Y C_X$; $E(e_2 e_3) = \delta_2 C_X^2$;
$E(e_2 e_3') = \delta_3 C_X^2$; $E(e_3 e_3') = \delta_3 C_X^2$;

(ii) Under design F2 [Case II]:
$E(e_1) = E(e_2) = E(e_3) = E(e_3') = 0$; $E(e_1^2) = \delta_4 C_Y^2$; $E(e_2^2) = \delta_4 C_X^2$;
$E(e_3^2) = \delta_5 C_X^2$; $E(e_3'^2) = \delta_3 C_X^2$; $E(e_1 e_2) = \delta_4 \rho C_Y C_X$;
$E(e_1 e_3) = \delta_5 \rho C_Y C_X$; $E(e_1 e_3') = 0$; $E(e_2 e_3) = \delta_5 C_X^2$; $E(e_2 e_3') = 0$;
$E(e_3 e_3') = 0$;

where
$\delta_1 = \left(\dfrac{1}{r} - \dfrac{1}{n'}\right)$; $\delta_2 = \left(\dfrac{1}{n} - \dfrac{1}{n'}\right)$; $\delta_3 = \left(\dfrac{1}{n'} - \dfrac{1}{N}\right)$;
$\delta_4 = \left(\dfrac{1}{r} - \dfrac{1}{N - n'}\right)$; $\delta_5 = \left(\dfrac{1}{n} - \dfrac{1}{N - n'}\right)$.
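In code, the five δ-quantities translate directly (a convenience helper of ours, reused in later sketches):

def deltas(r, n, n_prime, N):
    """delta_1 .. delta_5 as defined above; argument names are ours."""
    return (1 / r - 1 / n_prime,        # delta_1
            1 / n - 1 / n_prime,        # delta_2
            1 / n_prime - 1 / N,        # delta_3
            1 / r - 1 / (N - n_prime),  # delta_4
            1 / n - 1 / (N - n_prime))  # delta_5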
Remark 3.1: Let
$\theta_1 = \dfrac{A+C}{A+fB+C}$; $\theta_2 = \dfrac{fB}{A+fB+C}$; $\theta_3 = \dfrac{A+fB}{A+fB+C}$; $\theta_4 = \dfrac{C}{A+fB+C}$;
$P = -(\theta_1 - \theta_3) = (\theta_2 - \theta_4)$; $(\theta_1\theta_4 + \theta_2\theta_3 - 2\theta_3\theta_4) = P(\theta_3 - \theta_4)$; $V = \rho\,\dfrac{C_Y}{C_X}$.

Theorem 3.1: The estimators $(\bar{y}_{d_1})$ and $(\bar{y}_{d_2})$, in terms of $e_i\ (i = 1, 2, 3)$ and $e_3'$, can be expressed as:

(i) $\bar{y}_{d_1} = \bar{Y}\left[1 + e_1 + P\{e_3 - e_3' - \theta_4 e_3^2 + \theta_3 e_3'^2 + e_1 e_3 - e_1 e_3' - (\theta_3 - \theta_4)e_3 e_3'\}\right]$  (3.1)

(ii) $\bar{y}_{d_2} = \bar{Y}\left[1 + e_1 + P\{e_2 - e_3' - \theta_4 e_2^2 + \theta_3 e_3'^2 + e_1 e_2 - e_1 e_3' - (\theta_3 - \theta_4)e_2 e_3'\}\right]$  (3.2)

while ignoring terms $E(e_i^r e_j^s)$ and $E(e_i^r e_j'^s)$ for $r + s > 2$; $r, s = 0, 1, 2, \ldots$; $i, j = 1, 2, 3$, which is the first order of approximation [see Cochran (2005)].

Proof:
(i) $\bar{y}_{d_1} = \bar{y}_r\left[\dfrac{(A+C)\bar{x}' + fB\,\bar{x}}{(A+fB)\bar{x}' + C\,\bar{x}}\right] = \bar{Y}(1+e_1)\left(1 + \theta_1 e_3' + \theta_2 e_3\right)\left(1 + \theta_3 e_3' + \theta_4 e_3\right)^{-1}$
$\quad = \bar{Y}\left[1 + e_1 + P\{e_3 - e_3' - \theta_4 e_3^2 + \theta_3 e_3'^2 + e_1 e_3 - e_1 e_3' - (\theta_3 - \theta_4)e_3 e_3'\}\right]$

(ii) $\bar{y}_{d_2} = \bar{y}_r\left[\dfrac{(A+C)\bar{x}' + fB\,\bar{x}_r}{(A+fB)\bar{x}' + C\,\bar{x}_r}\right] = \bar{Y}(1+e_1)\left(1 + \theta_1 e_3' + \theta_2 e_2\right)\left(1 + \theta_3 e_3' + \theta_4 e_2\right)^{-1}$
$\quad = \bar{Y}\left[1 + e_1 + P\{e_2 - e_3' - \theta_4 e_2^2 + \theta_3 e_3'^2 + e_1 e_2 - e_1 e_3' - (\theta_3 - \theta_4)e_2 e_3'\}\right]$
Theorem 3.2: The biases of $(\bar{y}_{d_1})_t$ and $(\bar{y}_{d_2})_t$ under $t = I, II$ (designs F1 and F2), up to the first order of approximation, are:

(i) $B[\bar{y}_{d_1}]_I = -\bar{Y}P(\delta_2 - \delta_3)(\theta_4 C_X^2 - \rho C_Y C_X)$  (3.3)
(ii) $B[\bar{y}_{d_1}]_{II} = \bar{Y}P\left[(\theta_3\delta_3 - \theta_4\delta_5)C_X^2 + \delta_5\rho C_Y C_X\right]$  (3.4)
(iii) $B[\bar{y}_{d_2}]_I = -\bar{Y}P(\delta_1 - \delta_3)(\theta_4 C_X^2 - \rho C_Y C_X)$  (3.5)
(iv) $B[\bar{y}_{d_2}]_{II} = \bar{Y}P\left[(\theta_3\delta_3 - \theta_4\delta_4)C_X^2 + \delta_4\rho C_Y C_X\right]$  (3.6)

Proof:
(i) $B[\bar{y}_{d_1}]_I = E[\bar{y}_{d_1} - \bar{Y}]_I = \bar{Y}E\left[1 + e_1 + P\{e_3 - e_3' - \theta_4 e_3^2 + \theta_3 e_3'^2 + e_1 e_3 - e_1 e_3' - (\theta_3 - \theta_4)e_3 e_3'\} - 1\right] = -\bar{Y}P(\delta_2 - \delta_3)(\theta_4 C_X^2 - \rho C_Y C_X)$
(ii) $B[\bar{y}_{d_1}]_{II} = E[\bar{y}_{d_1} - \bar{Y}]_{II} = \bar{Y}P\left[(\theta_3\delta_3 - \theta_4\delta_5)C_X^2 + \delta_5\rho C_Y C_X\right]$
(iii) $B[\bar{y}_{d_2}]_I = E[\bar{y}_{d_2} - \bar{Y}]_I = -\bar{Y}P(\delta_1 - \delta_3)(\theta_4 C_X^2 - \rho C_Y C_X)$
(iv) $B[\bar{y}_{d_2}]_{II} = E[\bar{y}_{d_2} - \bar{Y}]_{II} = \bar{Y}P\left[(\theta_3\delta_3 - \theta_4\delta_4)C_X^2 + \delta_4\rho C_Y C_X\right]$
Theorem 3.3: The mean squared errors of $(\bar{y}_{d_1})_t$ and $(\bar{y}_{d_2})_t$ under designs F1 and F2, up to the first order of approximation, are:

(i) $M[\bar{y}_{d_1}]_I = \bar{Y}^2\left[\delta_1 C_Y^2 + (\delta_2 - \delta_3)(P^2 C_X^2 + 2P\rho C_Y C_X)\right]$  (3.7)
(ii) $M[\bar{y}_{d_1}]_{II} = \bar{Y}^2\left[\delta_4 C_Y^2 + (\delta_3 + \delta_5)P^2 C_X^2 + 2P\delta_5\rho C_Y C_X\right]$  (3.8)
(iii) $M[\bar{y}_{d_2}]_I = \bar{Y}^2\left[\delta_1 C_Y^2 + (\delta_1 - \delta_3)(P^2 C_X^2 + 2P\rho C_Y C_X)\right]$  (3.9)
(iv) $M[\bar{y}_{d_2}]_{II} = \bar{Y}^2\left[\delta_4 C_Y^2 + (\delta_3 + \delta_4)P^2 C_X^2 + 2P\delta_4\rho C_Y C_X\right]$  (3.10)

Proof:
(i) $M[\bar{y}_{d_1}]_I = E[\bar{y}_{d_1} - \bar{Y}]_I^2 = \bar{Y}^2 E\left[e_1 + P(e_3 - e_3')\right]^2 = \bar{Y}^2\left[\delta_1 C_Y^2 + (\delta_2 - \delta_3)(P^2 C_X^2 + 2P\rho C_Y C_X)\right]$
(ii) $M[\bar{y}_{d_1}]_{II} = E[\bar{y}_{d_1} - \bar{Y}]_{II}^2 = \bar{Y}^2 E\left[e_1 + P(e_3 - e_3')\right]^2 = \bar{Y}^2\left[\delta_4 C_Y^2 + (\delta_3 + \delta_5)P^2 C_X^2 + 2P\delta_5\rho C_Y C_X\right]$
(iii) $M[\bar{y}_{d_2}]_I = E[\bar{y}_{d_2} - \bar{Y}]_I^2 = \bar{Y}^2 E\left[e_1 + P(e_2 - e_3')\right]^2 = \bar{Y}^2\left[\delta_1 C_Y^2 + (\delta_1 - \delta_3)(P^2 C_X^2 + 2P\rho C_Y C_X)\right]$
(iv) $M[\bar{y}_{d_2}]_{II} = E[\bar{y}_{d_2} - \bar{Y}]_{II}^2 = \bar{Y}^2 E\left[e_1 + P(e_2 - e_3')\right]^2 = \bar{Y}^2\left[\delta_4 C_Y^2 + (\delta_3 + \delta_4)P^2 C_X^2 + 2P\delta_4\rho C_Y C_X\right]$
Theorem 3.4: The minimum mean squared errors of $(\bar{y}_{d_1})_t$ and $(\bar{y}_{d_2})_t$ under designs F1 and F2 are:

(i) $\min\left[M(\bar{y}_{d_1})_I\right] = \left[\delta_1 - (\delta_2 - \delta_3)\rho^2\right]S_Y^2$ when $P = -V$  (3.11)
(ii) $\min\left[M(\bar{y}_{d_1})_{II}\right] = \left[\delta_4 - (\delta_3 + \delta_5)^{-1}\delta_5^2\rho^2\right]S_Y^2$ when $P = -\delta_5 V/(\delta_3 + \delta_5)$  (3.12)
(iii) $\min\left[M(\bar{y}_{d_2})_I\right] = \left[\delta_1 - (\delta_1 - \delta_3)\rho^2\right]S_Y^2$ when $P = -V$  (3.13)
(iv) $\min\left[M(\bar{y}_{d_2})_{II}\right] = \left[\delta_4 - (\delta_3 + \delta_4)^{-1}\delta_4^2\rho^2\right]S_Y^2$ when $P = -\delta_4 V/(\delta_3 + \delta_4)$  (3.14)

Proof:
(i) $\dfrac{d}{dP}\left[M(\bar{y}_{d_1})_I\right] = 0 \Rightarrow P = -\rho\dfrac{C_Y}{C_X} = -V$, and using this in (3.7), $\min\left[M(\bar{y}_{d_1})_I\right] = \left[\delta_1 - (\delta_2 - \delta_3)\rho^2\right]S_Y^2$.
(ii) $\dfrac{d}{dP}\left[M(\bar{y}_{d_1})_{II}\right] = 0 \Rightarrow P = -\delta_5 V(\delta_3 + \delta_5)^{-1}$, and using this in (3.8), $\min\left[M(\bar{y}_{d_1})_{II}\right] = \left[\delta_4 - (\delta_3 + \delta_5)^{-1}\delta_5^2\rho^2\right]S_Y^2$.
(iii) $\dfrac{d}{dP}\left[M(\bar{y}_{d_2})_I\right] = 0 \Rightarrow P = -\rho\dfrac{C_Y}{C_X} = -V$, and using this in (3.9), $\min\left[M(\bar{y}_{d_2})_I\right] = \left[\delta_1 - (\delta_1 - \delta_3)\rho^2\right]S_Y^2$.
(iv) $\dfrac{d}{dP}\left[M(\bar{y}_{d_2})_{II}\right] = 0 \Rightarrow P = -\delta_4 V(\delta_3 + \delta_4)^{-1}$, and using this in (3.10), $\min\left[M(\bar{y}_{d_2})_{II}\right] = \left[\delta_4 - (\delta_3 + \delta_4)^{-1}\delta_4^2\rho^2\right]S_Y^2$.
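For illustration (our sketch, not from the paper), the four minima (3.11)-(3.14) can be evaluated with the Table 6.1 parameters, taking N = 200, n' = 110, n = 50 and r = 45 (Section 7 drops 5 of the 50 sampled units):

S_Y2, rho = 199.0598, 0.8652
N, n_prime, n, r = 200, 110, 50, 45
d1, d2, d3, d4, d5 = deltas(r, n, n_prime, N)   # helper from Section 3
for label, value in [
    ("min M[y_d1]_I,  (3.11)", (d1 - (d2 - d3) * rho**2) * S_Y2),
    ("min M[y_d1]_II, (3.12)", (d4 - d5**2 * rho**2 / (d3 + d5)) * S_Y2),
    ("min M[y_d2]_I,  (3.13)", (d1 - (d1 - d3) * rho**2) * S_Y2),
    ("min M[y_d2]_II, (3.14)", (d4 - d4**2 * rho**2 / (d3 + d4)) * S_Y2),
]:
    print(f"{label}: {value:.4f}")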

Lemma 3.0 [Shukla (2002)]: The F-T estimator in two-phase sampling (without imputation) is

$(\bar{y}_d)_w = \bar{y}\left[\dfrac{(A+C)\bar{x}' + fB\,\bar{x}}{(A+fB)\bar{x}' + C\,\bar{x}}\right]$  (3.15)

with optimum MSE conditions
under design F1 [Case I]: $P = -V$  (3.16)
under design F2 [Case II]: $P = -V(1+\delta)^{-1}$  (3.17)
and optimum MSE expressions

$\mathrm{opt}\left[M\left((\bar{y}_d)_w\right)\right]_I = \bar{Y}^2 V_{20}\left[1 - \rho^2(1-\delta)\right]$  (3.18)

$\mathrm{opt}\left[M\left((\bar{y}_d)_w\right)\right]_{II} = \bar{Y}^2 V_{20}\left[1 - \rho^2(1+\delta)^{-1}\right]$  (3.19)

where $\delta = \left(\dfrac{1}{n'} - \dfrac{1}{N}\right)\left(\dfrac{1}{n} - \dfrac{1}{N}\right)^{-1}$ and $V_{ij} = E\left[\dfrac{(\bar{y} - \bar{Y})^i(\bar{x} - \bar{X})^j}{\bar{Y}^i \bar{X}^j}\right]$; $i = 0, 1, 2$; $j = 0, 1, 2$.

4. Appropriate Choice of k for Bias Reduction

(i) $B[\bar{y}_{d_1}]_I = 0 \Rightarrow P(\theta_4 C_X^2 - \rho C_Y C_X) = 0$.
If $P = 0 \Rightarrow (k-4)\left[k^2 - (5+f)k + (6+f)\right] = 0$  (4.1)
giving
$k = k_1 = 4$; $k = k_2 = \frac{1}{2}\left[(5+f) + (f^2 + 6f + 1)^{1/2}\right]$; $k = k_3 = \frac{1}{2}\left[(5+f) - (f^2 + 6f + 1)^{1/2}\right]$  (4.2)

If $\theta_4 C_X^2 - \rho C_Y C_X = 0$
$\Rightarrow AV + fBV + (V-1)C = 0$  (4.3)
$\Rightarrow (V-1)k^3 - \left[(8-f)V - 9\right]k^2 - \left[(23+5f)V + 26\right]k - 2\left[(11-2f)V - 12\right] = 0$  (4.4)

(ii) $B[\bar{y}_{d_1}]_{II} = 0 \Rightarrow P\left[(\theta_3\delta_3 - \theta_4\delta_5)C_X^2 + \delta_5\rho C_Y C_X\right] = 0$.
If $P = 0$ we have the solutions (4.2), and if $(\theta_3\delta_3 - \theta_4\delta_5)C_X + \delta_5\rho C_Y = 0$
$\Rightarrow (V-1)k^3 + \left[\dfrac{\delta_3}{\delta_5} + V(1+f) - 9(V-1)\right]k^2 - \left[\dfrac{\delta_3}{\delta_5} + V(3+5f) - 26(V-1)\right]k + 2\left[\dfrac{\delta_3}{\delta_5} + V(1+2f) - 12(V-1)\right] = 0$  (4.5)

(iii) $B[\bar{y}_{d_2}]_I = 0$ provides a similar solution as in (i).

(iv) $B[\bar{y}_{d_2}]_{II} = 0 \Rightarrow P\left[(\theta_3\delta_3 - \theta_4\delta_4)C_X^2 + \delta_4\rho C_Y C_X\right] = 0$.
If $P = 0$ we have the solutions (4.2), and if $(\theta_3\delta_3 - \theta_4\delta_4)C_X^2 + \delta_4\rho C_Y C_X = 0$
$\Rightarrow \left[(A+fB)\delta_3 - C\delta_4\right] = -\delta_4 V(A+fB+C)$
$\Rightarrow (V-1)k^3 + \left[\dfrac{\delta_3}{\delta_4} + V(1+f) - 9(V-1)\right]k^2 - \left[\dfrac{\delta_3}{\delta_4} + V(3+5f) - 26(V-1)\right]k + 2\left[\dfrac{\delta_3}{\delta_4} + V(1+2f) - 12(V-1)\right] = 0$  (4.6)
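A root-finding sketch (ours) for the bias-removing k-values of case (i): assuming f = n/N = 50/200 = 0.25 and taking V = 0.7635 from Table 6.1, the quadratic factor of (4.1) and the cubic (4.4) reproduce, up to rounding, the values k' = 4, 3.4253, 1.8246 and 0.1812 reported in Section 8.

import numpy as np

f, V = 50 / 200, 0.7635      # f = n/N is our assumption; V from Table 6.1
# P = 0 branch: k = 4 plus the roots of k^2 - (5+f)k + (6+f) = 0, eq. (4.1)
print("k from (4.2):", 4.0, np.roots([1.0, -(5 + f), 6 + f]).round(4))
# second branch: the cubic (4.4) in k, coefficients highest degree first
cubic = np.roots([V - 1,
                  -((8 - f) * V - 9),
                  -((23 + 5 * f) * V + 26),
                  -2 * ((11 - 2 * f) * V - 12)])
print("real k from (4.4):", cubic[np.isreal(cubic)].real.round(4))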

5. Comparison of the Estimators

(i) $\Delta_1 = \min\left[M(\bar{y}_{d_1})_I\right] - \min\left[M(\bar{y}_{d_2})_I\right] = \left(\dfrac{1}{r} - \dfrac{1}{n}\right)\rho^2 S_Y^2$

$(\bar{y}_{d_2})_I$ is better than $(\bar{y}_{d_1})_I$ if $\Delta_1 > 0 \Rightarrow n > r$, which is always true.

(ii) $\Delta_2 = \min\left[M(\bar{y}_{d_1})_{II}\right] - \min\left[M(\bar{y}_{d_2})_{II}\right] = \left[\dfrac{\delta_4^2}{\delta_3 + \delta_4} - \dfrac{\delta_5^2}{\delta_3 + \delta_5}\right]\rho^2 S_Y^2$

$(\bar{y}_{d_2})_{II}$ is better than $(\bar{y}_{d_1})_{II}$ if $\Delta_2 > 0$
$\Rightarrow (n-r)\left[N^3 - (n'n + n'r + nr)N + 2n'nr\right] > 0$,
which generates two conditions:
(A) $(n - r) > 0 \Rightarrow n > r$, and
(B) $N^3 - (n'n + n'r + nr)N + 2n'nr > 0$.
If $n' \approx N$ [i.e. $n' \to N$], then (B) reduces to $N\left[N^2 - (n+r)N + nr\right] > 0$ (since $N > 0$ always)
$\Rightarrow (N-n)(N-r) > 0 \Rightarrow N > n$ and $N > r$.
The ultimate condition is $N > n > r$, which is always true.

(iii) $\Delta_3 = \min\left[M(\bar{y}_{d_2})_I\right] - \min\left[M(\bar{y}_{d_2})_{II}\right] = \dfrac{(\delta_1 - \delta_4)(\delta_3 + \delta_4) + \left(\delta_4^2 + \delta_3^2 - \delta_1\delta_3 - \delta_1\delta_4 + \delta_3\delta_4\right)\rho^2}{\delta_3 + \delta_4}\, S_Y^2$

$(\bar{y}_{d_2})_{II}$ is better than $(\bar{y}_{d_2})_I$ if $\Delta_3 > 0$
$\Rightarrow \rho^2 > \dfrac{1+m}{1+2m}$, where $m = \dfrac{r(N - n')}{n'(N - r)}$,
$\Rightarrow -1 < \rho < -\sqrt{\dfrac{1+m}{1+2m}}$ or $\sqrt{\dfrac{1+m}{1+2m}} < \rho < 1$.
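As a numerical illustration (ours), the Δ3 condition can be checked for the empirical setup of Section 6, taking r = 45 as in the simulation:

# Delta_3 condition for the Section 6 population (N = 200, n' = 110, r = 45).
N, n_prime, r, rho = 200, 110, 45, 0.8652
m = r * (N - n_prime) / (n_prime * (N - r))
bound = ((1 + m) / (1 + 2 * m)) ** 0.5
print(f"m = {m:.4f}; Delta_3 > 0 requires |rho| > {bound:.4f}, here rho = {rho}")
# 0.8652 < 0.9160, so Delta_3 <= 0 and (y_d2)_I beats (y_d2)_II --
# consistent with the smaller design-F1 MSEs seen in Table 7.1.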

6. Empirical Study

Appendix A contains an artificial population of size N = 200 with values of the main variable Y and the auxiliary variable X. Its parameters are given in Table 6.1.

Table 6.1. Population Parameters

  Ȳ = 42.485    X̄ = 18.515     S_Y² = 199.0598    S_X² = 48.5375
  ρ = 0.8652    C_X = 0.3763   C_Y = 0.3321       V = ρ C_Y/C_X = 0.7635

Under design I, we draw a preliminary random sample $S'$ of size $n' = 110$ to compute $\bar{x}'$, and further draw a random sample $S$ of size $n = 50$ such that $S \subset S'$ by SRSWOR. The quantity V is stable over time and assumed to be known [see Reddy (1978)].

Table 6.2. Optimum Conditions and k-values

  Strategy   Design   Optimum condition for MSE   Three optimum k-values on the condition
  d1         F1       P = −V                      k1 = 1.5206,  k2 = 2.4505,  k3 = 8.9456
  d1         F2       P = −δ5 V/(δ3 + δ5)         k4 = 1.5880,  k5 = 2.8768,  k6 = 6.4279
  d2         F1       P = −V                      k7 = k1,      k8 = k2,      k9 = k3
  d2         F2       P = −δ4 V/(δ3 + δ4)         k10 = 1.5645, k11 = 2.8572, k12 = 6.7221

7. Simulation

The bias and optimum m.s.e. of the proposed estimators under both designs are computed over 50,000 repeated samples $(n, n')$ drawn as per each design. The computations are in Table 7.1, where the loss of efficiency due to imputation is measured as

$LI_t(\bar{y}_s) = \dfrac{\mathrm{Opt}\left[M(\bar{y}_s)\right]_t}{\mathrm{Opt}\left[M\left((\bar{y}_d)_w\right)\right]_t}$

with $\mathrm{Opt}[M(\bar{y}_s)]_t$ the optimum mean squared error of the estimator $\bar{y}_s$; $s = d, d_1, d_2$; $t = I, II$; and $t = w$ denoting the case without imputation.
For designs I and II the simulation procedure has the following steps (a condensed implementation is sketched after Step 8):

Step 1: Draw a random sample $S'$ of size $n' = 110$ from the population of $N = 200$ by SRSWOR.
Step 2: Draw a random sub-sample of size $n = 50$ from $S'$ for design I, and an independent random sample of size $n = 50$ from the remaining $N - n'$ units for design II.
Step 3: Drop 5 units at random from each second sample, corresponding to Y, in both I and II.
Step 4: Impute the dropped Y-units by the proposed and available methods and compute the relevant statistic.
Step 5: Repeat the above steps 50,000 times, which provides the multiple-sample estimates $(\hat{y}_{1s})_t, (\hat{y}_{2s})_t, (\hat{y}_{3s})_t, \ldots, (\hat{y}_{50000\,s})_t$ for the estimators $(\bar{y}_{d_1})_t$, $(\bar{y}_{d_2})_t$ and $(\bar{y}_d)_w$.
Step 6: The bias of $(\hat{y}_s)_t$ is $B(\hat{y}_s)_t = \dfrac{1}{50000}\sum_{i=1}^{50000}\left[(\hat{y}_{is})_t - \bar{Y}\right]$.
Step 7: The M.S.E. of $(\hat{y}_s)_t$ is $M(\hat{y}_s)_t = \dfrac{1}{50000}\sum_{i=1}^{50000}\left[(\hat{y}_{is})_t - \bar{Y}\right]^2$.
Step 8: The efficiency comparisons are:

Design efficiency $E_1 = \dfrac{M(\bar{y}_{d_1})_I}{M(\bar{y}_{d_1})_{II}} \times 100$;

Design efficiency $E_2 = \dfrac{M(\bar{y}_{d_2})_I}{M(\bar{y}_{d_2})_{II}} \times 100$;

Estimator efficiency $E_3 = \dfrac{M(\bar{y}_{d_1})_I}{M(\bar{y}_{d_2})_I} \times 100$;

Estimator efficiency $E_4 = \dfrac{M(\bar{y}_{d_1})_{II}}{M(\bar{y}_{d_2})_{II}} \times 100$.

Table 7.1. Bias and Mean Squared Error

                          Design F1                            Design F2
                  ȳ_d1              ȳ_d2              ȳ_d1              ȳ_d2
  Opt (k)         Bias     MSE      Bias     MSE      Bias     MSE      Bias     MSE
  k1  = 1.3813    -0.1314  2.5947   -0.2855  2.1727   0.3665   2.7997   0.3680   3.0111
  k2  = 2.7576    -0.1780  2.6282   -0.3339  2.2192   0.1485   2.7801   0.1287   2.9542
  k3  = 9.9538    -0.1350  2.5968   -0.2892  2.1758   0.3498   2.7911   0.3497   2.9982
  k4  = 1.5880    -0.0895  2.2549   -0.2892  2.1758   0.4131   3.7278   0.3680   3.0110
  k5  = 2.8768    -0.0855  2.2706   -0.2268  1.9645   0.2664   3.5517   0.2499   3.4296
  k6  = 6.4279    -0.0951  2.2580   -0.2114  1.9333   0.3832   3.7016   0.3778   3.5901
  k7  = 1.3813    -0.1314  2.5947   -0.2855  2.1727   0.3665   2.7997   0.3680   3.0111
  k8  = 2.7576    -0.1780  2.6282   -0.3339  2.2192   0.1485   2.7801   0.1287   2.9542
  k9  = 9.9538    -0.1350  2.5968   -0.2892  2.1758   0.3498   2.7911   0.3497   2.9982
  k10 = 1.5645    -0.0953  2.2743   -0.2082  1.9476   0.4044   3.4592   0.4024   3.3839
  k11 = 2.8572    -0.1287  2.2961   -0.2429  1.9705   0.2473   3.2895   0.2301   3.1970
  k12 = 6.7221    -0.1015  2.7780   -0.2146  1.9515   0.3756   3.4273   0.3707   3.3472

Table 7.2. Estimator without Imputation [Lemma 3.0]

             Optimum k-values                  Optimum MSE
             k1        k2        k3
  Case I     1.3813    2.7576    9.9538        1.9021
  Case II    1.5311    2.8337    7.1604        2.7702

Table 7.3. Loss due to Imputation [LI_t]

                  Design F1                     Design F2
  Opt (k)         LI_I(ȳ_d1)   LI_I(ȳ_d2)      LI_II(ȳ_d1)   LI_II(ȳ_d2)
  k1  = 1.3813    1.3641       1.1423          1.0106        1.0870
  k2  = 2.7576    1.3817       1.1667          1.0036        1.0664
  k3  = 9.9538    1.3652       1.1439          1.0075        1.0823
  k4  = 1.5880    1.1855       1.1439          1.3457        1.0869
  k5  = 2.8768    1.1937       1.0328          1.2821        1.2380
  k6  = 6.4279    1.1871       1.0164          1.3362        1.2960
  k7  = 1.3813    1.3641       1.1423          1.0106        1.0870
  k8  = 2.7576    1.3817       1.1667          1.0036        1.0664
  k9  = 9.9538    1.3652       1.1439          1.0075        1.0823
  k10 = 1.5645    1.1957       1.0239          1.2487        1.2215
  k11 = 2.8572    1.2071       1.0360          1.1875        1.1541
  k12 = 6.7221    1.4605       1.0260          1.2372        1.2083

Table 7.4. Efficiency Comparisons E1, E2, E3, E4

  Opt (k)         E1        E2        E3         E4
  k1  = 1.3813    92.67%    72.15%    119.42%    92.97%
  k2  = 2.7576    94.53%    75.12%    118.43%    94.10%
  k3  = 9.9538    93.03%    72.57%    119.34%    93.09%
  k4  = 1.5880    60.48%    72.26%    103.63%    123.80%
  k5  = 2.8768    63.92%    57.28%    115.58%    103.56%
  k6  = 6.4279    61.00%    53.85%    116.79%    103.10%
  k7  = 1.3813    92.67%    72.15%    119.42%    92.97%
  k8  = 2.7576    94.53%    75.12%    118.43%    94.10%
  k9  = 9.9538    93.03%    72.57%    119.34%    93.09%
  k10 = 1.5645    65.74%    54.24%    116.77%    102.22%
  k11 = 2.8572    69.80%    61.63%    116.52%    102.89%
  k12 = 6.7221    81.05%    58.30%    142.35%    102.39%

8. Almost Unbiased Imputation Methods

Using the equations of Section 4 we get:
From (i): $k_1' = 4$; $k_2' = 3.4253$; $k_3' = 1.8246$; $k_4' = 0.1812$.
From (ii): $k_1' = 4$; $k_2' = 3.4253$; $k_3' = 1.8246$; $k_4' = 1.4236$; $k_5' = 2.4469$; $k_6' = 10.7864$.
From (iii): similar to (i).
From (iv): $k_1' = 4$; $k_2' = 3.4253$; $k_3' = 1.8246$; $k_4' = 1.4339$; $k_5' = 2.4488$; $k_6' = 10.5426$.

Using these k-values the proposed F-T imputation strategies can be made almost unbiased. The best among them is the one with the lowest m.s.e.; this gives the option of choosing an almost unbiased estimator with control over the mean squared error.
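To pick among the almost unbiased k'-values, one can rank them by first-order MSE. Below is our sketch for strategy d1 under design F1 using (3.7), with the Table 6.1 parameters and the same assumed f = n/N = 0.25 and r = 45 used in the earlier sketches.

def mse_d1_I(k, f=0.25, r=45, n=50, n_prime=110, N=200,
             rho=0.8652, Cx=0.3763, Cy=0.3321, Ybar=42.485):
    """First-order MSE (3.7) of y_d1 under design F1 at a given k."""
    A, B, C = (k - 1) * (k - 2), (k - 1) * (k - 4), (k - 2) * (k - 3) * (k - 4)
    P = (f * B - C) / (A + f * B + C)      # P = theta_2 - theta_4 (Remark 3.1)
    d1, d2, d3 = 1/r - 1/n_prime, 1/n - 1/n_prime, 1/n_prime - 1/N
    return Ybar**2 * (d1 * Cy**2 + (d2 - d3) * (P**2 * Cx**2 + 2 * P * rho * Cy * Cx))

for k in (4.0, 3.4253, 1.8246, 0.1812):    # the case (i) values above
    print(f"k' = {k}: first-order MSE = {mse_d1_I(k):.4f}")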

9. Discussion and Conclusion

The proposed estimators are found useful for situations where some observations are missing from the sample. As per Table 7.3, for $\bar{y}_{d_1}$ and $\bar{y}_{d_2}$ under design F1 the efficient performance of both is found at k = 1.5880, 1.5645, 6.4279 and 6.7221; at these specific choices the loss of efficiency with respect to the without-imputation estimator is very low. Similarly, for $\bar{y}_{d_1}$ and $\bar{y}_{d_2}$ under design F2 the efficient performance is observed at k = 1.3813, 2.7576 and 9.9538. It seems that even by adopting imputation, the suggested estimators lose only a little in relative m.s.e. against the usual without-imputation F-T estimator.

The mutual comparisons are in Table 7.4: the design F2 is uniformly efficient in comparison with F1 at all the optimum k-values, over both suggested F-T strategies. Within F1 the estimator $\bar{y}_{d_2}$ is more efficient than $\bar{y}_{d_1}$, whereas within F2 this does not hold uniformly for all optimum k's; $\bar{y}_{d_2}$ under F2 is found better at the optimum k-values near 1.6, 2.9, 6.4 and 6.7. One can also obtain almost unbiased estimators with the choices k = 0.1812, 1.4236, 1.4339, 1.8246, 2.4469, 2.4488, 3.4253, 4, 10.5426 and 10.7864; the most suitable is the one having the lowest m.s.e. These suggested strategies are thus almost unbiased, with control over the m.s.e. as well.

Appendix A

Population (N = 200)
Yi 45 50 39 60 42 38 28 42 38 35
Xi 15 20 23 35 18 12 8 15 17 13
Yi 40 55 45 36 40 58 56 62 58 46
Xi 29 35 20 14 18 25 28 21 19 18
Yi 36 43 68 70 50 56 45 32 30 38
Xi 15 20 38 42 23 25 18 11 09 17
Yi 35 41 45 65 30 28 32 38 61 58
Xi 13 15 18 25 09 08 11 13 23 21
Yi 65 62 68 85 40 32 60 57 47 55
Xi 27 25 30 45 15 12 22 19 17 21
Yi 67 70 60 40 35 30 25 38 23 55
Xi 25 30 27 21 15 17 09 15 11 21
Yi 50 69 53 55 71 74 55 39 43 45
Xi 15 23 29 30 33 31 17 14 17 19
Yi 61 72 65 39 43 57 37 71 71 70
Xi 25 31 30 19 21 23 15 30 32 29
Yi 73 63 67 47 53 51 54 57 59 39
Xi 28 23 23 17 19 17 18 21 23 20
Yi 23 25 35 30 38 60 60 40 47 30
Xi 07 09 15 11 13 25 27 15 17 11
Yi 57 54 60 51 26 32 30 45 55 54
Xi 31 23 25 17 09 11 13 19 25 27
Yi 33 33 20 25 28 40 33 38 41 33
Xi 13 11 07 09 13 15 13 17 15 13
Yi 30 35 20 18 20 27 23 42 37 45
Xi 11 15 08 07 09 13 12 25 21 22
Yi 37 37 37 34 41 35 39 45 24 27
Xi 15 16 17 13 20 15 21 25 11 13
Yi 23 20 26 26 40 56 41 47 43 33
Xi 09 08 11 12 15 25 15 25 21 15
Yi 37 27 21 23 24 21 39 33 25 35
Xi 17 13 11 11 09 08 15 17 11 19
Yi 45 40 31 20 40 50 45 35 30 35
Xi 21 23 15 11 20 25 23 17 16 18
Yi 32 27 30 33 31 47 43 35 30 40
Xi 15 13 14 17 15 25 23 17 16 19
Yi 35 35 46 39 35 30 31 53 63 41
Xi 19 19 23 15 17 13 19 25 35 21
Yi 52 43 39 37 20 23 35 39 45 37
Xi 25 19 18 17 11 09 15 17 19 19

REFERENCES

AHMED, M. S., AL-TITI, O., AL-RAWI, Z. and ABU-DAYYEH, W. (2006): Estimation of a population mean using different imputation methods, Statistics in Transition, 7, 6, 1247–1264.

COCHRAN, W. G. (2005): Sampling Techniques, Fifth Edition, John Wiley and Sons, New York.

HEITJAN, D. F. and BASU, S. (1996): Distinguishing 'missing at random' and 'missing completely at random', The American Statistician, 50, 207–213.

RAO, J. N. K. and SITTER, R. R. (1995): Variance estimation under two-phase sampling with application to imputation for missing data, Biometrika, 82, 453–460.

REDDY, V. N. (1978): A study on the use of prior knowledge on certain population parameters in estimation, Sankhya, C, 40, 29–37.

RUBIN, D. B. (1976): Inference and missing data, Biometrika, 63, 581–593.

SHUKLA, D. (2002): F-T estimator under two-phase sampling, Metron, 59, 1–2, 253–263.

SHUKLA, D., SINGH, V. K. and SINGH, G. N. (1991): On the use of transformation in factor type estimator, Metron, 49, 1–4, 359–361.

SINGH, S. and HORN, S. (2000): Compromised imputation in survey sampling, Metrika, 51, 266–276.

SINGH, V. K. and SHUKLA, D. (1987): One parameter family of factor type ratio estimator, Metron, 45, 1–2, 273–283.

SINGH, V. K. and SHUKLA, D. (1993): An efficient one parameter family of factor-type estimator in sample survey, Metron, 51, 1–2, 139–159.

SINGH, V. K. and SINGH, G. N. (1991): Chain type estimator with two auxiliary variables under double sampling scheme, Metron, 49, 279–289.
