Chapter 5 - 2010

sampling

Uploaded by Admasu
Copyright © All Rights Reserved

CHAPTER 5: RATIO ESTIMATORS

5.1 Estimation of a Ratio Under Simple Random Sampling

Suppose the units of a population possess two characteristics that are correlated with each other. Let the variates $X_i$ and $Y_i$, $i = 1, 2, \ldots, N$, represent the two characteristics, where $Y$ is the study variable and $X$ is an auxiliary variable. Estimation based on the ratio of these two variables is the simplest and most commonly used of the more complex estimation techniques for improving precision. Examples of related variables whose ratio is of interest include:
- food expenditure as a percentage of total income;
- fertilizer use per hectare;
- the gross enrolment ratio.

The aim is to estimate the ratio $R$, and the mean and total of the study variable. Each unit in the population has a value of $X$ and a value of $Y$: $Y = \{Y_1, Y_2, \ldots, Y_N\}$ and $X = \{X_1, X_2, \ldots, X_N\}$. The population ratio is then defined as

$$R = \frac{\sum_{i=1}^{N} Y_i}{\sum_{i=1}^{N} X_i} = \frac{\bar{Y}}{\bar{X}}.$$

For a sample of $n$ units selected by SRS, the sample values are $y = \{y_1, y_2, \ldots, y_n\}$ and $x = \{x_1, x_2, \ldots, x_n\}$. The sample estimate of the ratio is

$$\hat{R} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}.$$

The estimated ratio $\hat{R}$ is not, in general, an unbiased estimate of the population ratio $R$; that is, $E(\hat{R}) \neq R$. In most situations the bias is small, and estimated ratios are widely used. For large samples (as a rule of thumb, $n \geq 30$), the ratio estimate is consistent, its distribution tends to normality, and its bias is negligible.

For the ratio estimate $\hat{R} = \bar{y}/\bar{x}$, both the numerator ($\bar{y}$) and the denominator ($\bar{x}$) are subject to sampling variability. An exact expression for its standard error is therefore complicated, but in large samples the following approximation is valid.

Theorem 5.1: If the variates $y_i$ and $x_i$ are measured on each unit of a simple random sample of size $n$, assumed large, the MSE and the variance of $\hat{R} = \bar{y}/\bar{x}$ are each approximately

$$MSE(\hat{R}) \approx V(\hat{R}) \approx \frac{1-f}{n\bar{X}^2}\,\frac{\sum_{i=1}^{N}(Y_i - R X_i)^2}{N-1},$$

where $R = \bar{Y}/\bar{X}$ is the ratio of the population means and $f = n/N$. Prove this theorem.
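Theorem 5.1 can be checked empirically. The sketch below builds a synthetic population (the uniform and normal choices, the sizes, the seed, and the names `approx_mse` and `empirical_mse` are all assumptions made for illustration, not part of these notes), draws repeated simple random samples without replacement, and compares the Monte Carlo MSE of $\hat{R}$ with the large-sample approximation.

```python
import random
import statistics

# Synthetic population: an assumption for illustration only.
rng = random.Random(2010)
N, n = 500, 50
X = [rng.uniform(10, 30) for _ in range(N)]
Y = [2.0 * x + rng.gauss(0, 4) for x in X]

X_bar = statistics.fmean(X)
R = statistics.fmean(Y) / X_bar          # population ratio of means
f = n / N

# Theorem 5.1: (1 - f) / (n X_bar^2) * sum (Y_i - R X_i)^2 / (N - 1)
S_d2 = sum((y - R * x) ** 2 for x, y in zip(X, Y)) / (N - 1)
approx_mse = (1 - f) / (n * X_bar ** 2) * S_d2

# Monte Carlo MSE of R_hat = y_bar / x_bar over repeated SRS without replacement
reps = 10_000
total_sq_err = 0.0
for _ in range(reps):
    idx = rng.sample(range(N), n)
    r_hat = sum(Y[i] for i in idx) / sum(X[i] for i in idx)
    total_sq_err += (r_hat - R) ** 2
empirical_mse = total_sq_err / reps

print(f"approximate MSE {approx_mse:.3e}, empirical MSE {empirical_mse:.3e}")
```

For these settings the two values should be close. With a small $n$ or a weak relationship between $x$ and $y$ they can drift apart, which is the large-sample caveat in the theorem.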
If we have a sample estimate $\hat{R}$ from $n$ sample observations, the estimated variance of the ratio is

$$v(\hat{R}) = \frac{1-f}{n\bar{X}^2}\,\frac{\sum_{i=1}^{n}(y_i - \hat{R} x_i)^2}{n-1} = \frac{1-f}{n\bar{X}^2}\,s_d^2, \qquad s_d^2 = \frac{\sum_{i=1}^{n}(y_i - \hat{R} x_i)^2}{n-1}.$$

If $\bar{X}$ is unknown, substitute its sample estimate $\bar{x}$. The standard error of $\hat{R}$ is

$$s.e.(\hat{R}) = \sqrt{\frac{1-f}{n\bar{X}^2}\,\frac{\sum_{i=1}^{n}(y_i - \hat{R} x_i)^2}{n-1}} = \sqrt{\frac{1-f}{n\bar{X}^2}\,s_d^2} = \frac{\sqrt{1-f}}{\sqrt{n}}\,\frac{s_d}{\bar{X}}.$$
Example (Cochran, page 33): A simple random sample of $n = 33$ low-income families was taken. Let $x_{1i}$ denote the size of the family, $y_i$ the weekly expenditure on food per family, and $x_{2i}$ the weekly family income. The following sums (over $i = 1, \ldots, n$) were available:

$$\sum x_{1i} = 123, \quad \sum x_{2i} = 2394, \quad \sum y_i = 907.2, \quad \sum x_{1i} y_i = 3595.5,$$
$$\sum x_{1i}^2 = 533, \quad \sum x_{2i}^2 = 177254, \quad \sum y_i^2 = 28224, \quad \sum x_{2i} y_i = 66678.$$

Estimate:
(a) the mean weekly expenditure on food per family;
(b) the mean weekly expenditure on food per person;
(c) the percentage of income that is spent on food.
In each case, compute the standard error of the estimate, ignoring the fpc.
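Solution: (a) The mean weekly expenditure on food per family is

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{907.2}{33} = \$27.49.$$

Ignoring the fpc,

$$v(\bar{y}) = \frac{1-f}{n}\,s_y^2 \approx \frac{s_y^2}{n} = \frac{\sum y_i^2 - (\sum y_i)^2/n}{n(n-1)} = \frac{28224 - (907.2)^2/33}{33(32)} = 3.11,$$

so $s.e.(\bar{y}) = 1.76$.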
(b) Let $\hat{R}_1$ be the mean weekly expenditure on food per person:

$$\hat{R}_1 = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_{1i}} = \frac{907.2}{123} = \$7.38.$$

The average number of persons per family is $\bar{x}_1 = \sum x_{1i}/n = 123/33 = 3.7273$. Ignoring the fpc, the variance of $\hat{R}_1$ is

$$v(\hat{R}_1) = \frac{1-f}{n\bar{x}_1^2}\,s_d^2 \approx \frac{\sum (y_i - \hat{R}_1 x_{1i})^2}{\bar{x}_1^2\, n(n-1)} = \frac{\sum y_i^2 + \hat{R}_1^2 \sum x_{1i}^2 - 2\hat{R}_1 \sum y_i x_{1i}}{\bar{x}_1^2\, n(n-1)},$$

$$v(\hat{R}_1) = \frac{28224 + (7.38)^2(533) - 2(7.38)(3595.5)}{(3.7273)^2 \times 33 \times 32} = 0.2852,$$

so $s.e.(\hat{R}_1) = 0.534$.
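The hand calculations in (a) and (b) can be reproduced from the quoted sums alone. This sketch expands $\sum (y_i - \hat{R}_1 x_{1i})^2$ as in the formula above and, like the exercise, ignores the fpc; the variable names are illustrative.

```python
import math

n = 33
sum_y, sum_y2 = 907.2, 28224.0
sum_x1, sum_x1_sq, sum_x1y = 123.0, 533.0, 3595.5

# (a) mean food expenditure per family and v(y_bar) = s_y^2 / n (fpc ignored)
y_bar = sum_y / n
s_y2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)
v_y_bar = s_y2 / n
se_y_bar = math.sqrt(v_y_bar)

# (b) ratio R1 = sum y / sum x1, i.e. food expenditure per person
r1 = sum_y / sum_x1
x1_bar = sum_x1 / n
resid_ss = sum_y2 + r1 ** 2 * sum_x1_sq - 2 * r1 * sum_x1y   # sum (y_i - R1 x_1i)^2
v_r1 = resid_ss / (x1_bar ** 2 * n * (n - 1))
se_r1 = math.sqrt(v_r1)

print(f"(a) y_bar = {y_bar:.2f}, v = {v_y_bar:.2f}, s.e. = {se_y_bar:.2f}")
print(f"(b) R1 = {r1:.2f}, v = {v_r1:.4f}, s.e. = {se_r1:.3f}")
```

Because the code keeps $\hat{R}_1$ unrounded, it gives $v(\hat{R}_1) \approx 0.2850$ rather than 0.2852; the standard error still rounds to 0.534.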
(c) Let $\hat{R}_2$ be the proportion of income spent on food. With $\bar{x}_2 = \sum x_{2i}/n = 2394/33 = 72.5455$,

$$\hat{R}_2 = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_{2i}} = \frac{907.2}{2394} = 0.379, \text{ i.e. } 37.9\%,$$

$$v(\hat{R}_2) \approx \frac{\sum y_i^2 + \hat{R}_2^2 \sum x_{2i}^2 - 2\hat{R}_2 \sum y_i x_{2i}}{\bar{x}_2^2\, n(n-1)} = \frac{28224 + (0.379)^2(177254) - 2(0.379)(66678)}{(72.5455)^2 \times 33 \times 32} = 0.00056554,$$

so $s.e.(\hat{R}_2) = 0.02378$.

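The same check for (c), again a sketch that ignores the fpc and uses illustrative variable names:

```python
import math

n = 33
sum_y, sum_y2 = 907.2, 28224.0
sum_x2, sum_x2_sq, sum_x2y = 2394.0, 177254.0, 66678.0

r2 = sum_y / sum_x2                      # share of income spent on food
x2_bar = sum_x2 / n                      # mean weekly income
resid_ss = sum_y2 + r2 ** 2 * sum_x2_sq - 2 * r2 * sum_x2y   # sum (y_i - R2 x_2i)^2
v_r2 = resid_ss / (x2_bar ** 2 * n * (n - 1))
se_r2 = math.sqrt(v_r2)

print(f"R2 = {100 * r2:.1f}%, v = {v_r2:.3e}, s.e. = {se_r2:.4f}")
```

With the unrounded ratio the variance agrees with the hand calculation to four significant figures.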
5.2 The Ratio Estimator and Its Variance

Under simple random sampling, and taking advantage of the correlation between $X_i$ and $Y_i$, we can estimate the population total $Y$ and the population mean $\bar{Y}$ (also written $\mu_y$) of the $Y_i$ by the ratio method, as stated in the following theorem.

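Theorem 5.2: The ratio estimates of the population total $Y$ and the population mean $\bar{Y}$ are, respectively,

$$\hat{Y}_R = \frac{\bar{y}}{\bar{x}}\,X, \qquad \hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\,\bar{X},$$

assuming $X$ (and hence $\bar{X} = X/N$) is known. In a simple random sample of size $n$, with $n$ large, the variances of the ratio estimates of the population total and mean are approximately

$$V(\hat{Y}_R) \approx N^2\,\frac{1-f}{n}\,\frac{\sum_{i=1}^{N}(Y_i - R X_i)^2}{N-1} = N^2\,\frac{1-f}{n}\,S_d^2,$$

$$V(\hat{\bar{Y}}_R) \approx \frac{1-f}{n}\,\frac{\sum_{i=1}^{N}(Y_i - R X_i)^2}{N-1} = \frac{1-f}{n}\,S_d^2.$$

(Prove both variances.)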
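Corollary:
i) $V(\hat{Y}_R) \approx N^2\,\dfrac{1-f}{n}\left(S_y^2 + R^2 S_x^2 - 2R\rho S_y S_x\right)$
ii) $V(\hat{Y}_R) \approx Y^2\,\dfrac{1-f}{n}\left(\dfrac{S_y^2}{\bar{Y}^2} + \dfrac{S_x^2}{\bar{X}^2} - \dfrac{2S_{xy}}{\bar{Y}\bar{X}}\right)$
iii) $V(\hat{Y}_R) \approx Y^2\,\dfrac{1-f}{n}\left(CV_y^2 + CV_x^2 - 2CV_{xy}\right)$
where $CV_y$ and $CV_x$ are the coefficients of variation of $Y_i$ and $X_i$ respectively, $\rho$ is the correlation between $Y_i$ and $X_i$ (so that $S_{xy} = \rho S_y S_x$), and $CV_{xy} = S_{xy}/(\bar{Y}\bar{X})$ is the relative covariance.
iv) The coefficients of variation of $\hat{Y}_R$, $\hat{\bar{Y}}_R$ and $\hat{R}$ are the same for the three estimates, i.e.,

$$CV(\hat{Y}_R) = CV(\hat{\bar{Y}}_R) = CV(\hat{R}) = \sqrt{\frac{1-f}{n}}\,\frac{S_d}{R\bar{X}}.$$

(Verify i-iv.)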
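A quick numerical check of Corollary (i), (iii) and (iv) on a small hypothetical population (the eight $(X_i, Y_i)$ pairs and the sample size are invented for illustration). It computes the large-sample variance of $\hat{Y}_R$ three ways and the coefficients of variation of the three estimates; the values printed on each line should agree up to rounding error.

```python
import math

# Hypothetical population (illustration only).
X = [4.0, 7.0, 5.0, 9.0, 6.0, 8.0, 3.0, 10.0]
Y = [9.0, 15.0, 11.0, 17.0, 14.0, 18.0, 6.0, 21.0]
N, n = len(X), 3
k = (1 - n / N) / n                       # (1 - f) / n

def mean(v):
    return sum(v) / len(v)

def s_cov(a, b):
    """Covariance with divisor N - 1, as for S_xy in the notes."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

X_bar, Y_bar = mean(X), mean(Y)
Y_tot = N * Y_bar
R = Y_bar / X_bar
S_x2, S_y2, S_xy = s_cov(X, X), s_cov(Y, Y), s_cov(X, Y)
S_d2 = sum((y - R * x) ** 2 for x, y in zip(X, Y)) / (N - 1)

# Theorem 5.2 and Corollary (i), (iii); S_xy plays the role of rho * S_y * S_x.
V_d = N ** 2 * k * S_d2
V_i = N ** 2 * k * (S_y2 + R ** 2 * S_x2 - 2 * R * S_xy)
cv_x, cv_y = math.sqrt(S_x2) / X_bar, math.sqrt(S_y2) / Y_bar
V_iii = Y_tot ** 2 * k * (cv_y ** 2 + cv_x ** 2 - 2 * S_xy / (X_bar * Y_bar))

# Corollary (iv): the three coefficients of variation coincide.
cv_total = math.sqrt(V_d) / Y_tot
cv_mean = math.sqrt(k * S_d2) / Y_bar
cv_ratio = math.sqrt(k * S_d2 / X_bar ** 2) / R
cv_formula = math.sqrt(k) * math.sqrt(S_d2) / (R * X_bar)

print(V_d, V_i, V_iii)
print(cv_total, cv_mean, cv_ratio, cv_formula)
```

The agreement of `V_d` and `V_i` rests on the identity $\sum (Y_i - RX_i)^2 = \sum [(Y_i - \bar{Y}) - R(X_i - \bar{X})]^2$, which holds because $\bar{Y} - R\bar{X} = 0$.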

5.3 Estimation of the Variance from a Sample

In the above expressions the population parameters $S_y^2$, $S_x^2$, $S_{xy}$ and $R$ are usually unknown. We therefore replace them by their sample counterparts $s_y^2$, $s_x^2$, $s_{xy}$ and $\hat{R}$, respectively, to obtain the following estimated variances.
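i) For the total:
$$v(\hat{Y}_R) = N^2\,\frac{1-f}{n}\,\frac{\sum_{i=1}^{n}(y_i - \hat{R}x_i)^2}{n-1} = N^2\,\frac{1-f}{n}\,s_d^2 = N^2\,\frac{1-f}{n}\left(s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}\right)$$
ii) For the mean:
$$v(\hat{\bar{Y}}_R) = \frac{1-f}{n}\,\frac{\sum_{i=1}^{n}(y_i - \hat{R}x_i)^2}{n-1} = \frac{1-f}{n}\left(s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}\right)$$
iii) For the ratio:
$$v(\hat{R}) = \frac{1-f}{n\bar{X}^2}\,\frac{\sum_{i=1}^{n}(y_i - \hat{R}x_i)^2}{n-1} = \frac{1-f}{n\bar{X}^2}\left(s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}\right)$$
If $\bar{X}$ is unknown, we substitute the sample mean $\bar{x}$.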
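A minimal sketch of these estimators from raw sample data. The function name `ratio_estimates`, its dictionary keys, and the requirement that the population total `X_total` be supplied are assumptions of this sketch. It forms $s_d^2$ as $s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}$, which equals $\sum (y_i - \hat{R}x_i)^2/(n-1)$ exactly because $\bar{y} - \hat{R}\bar{x} = 0$.

```python
import math

def ratio_estimates(y, x, N, X_total):
    """Ratio estimates of R, the mean and the total, with estimated variances."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    r_hat = y_bar / x_bar
    s_y2 = sum((v - y_bar) ** 2 for v in y) / (n - 1)
    s_x2 = sum((v - x_bar) ** 2 for v in x) / (n - 1)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_d2 = s_y2 + r_hat ** 2 * s_x2 - 2 * r_hat * s_xy
    f = n / N
    X_bar = X_total / N
    v_mean = (1 - f) / n * s_d2                     # ii) v(mean estimate)
    return {
        "R_hat": r_hat,
        "v_R": v_mean / X_bar ** 2,                 # iii) v(R_hat)
        "mean_R": r_hat * X_bar,
        "v_mean": v_mean,
        "total_R": r_hat * X_total,
        "v_total": N ** 2 * v_mean,                 # i) v(total estimate)
    }

# Usage with made-up numbers (not data from the notes):
est = ratio_estimates([3.0, 5.0, 4.0, 8.0, 6.0], [1.0, 2.0, 2.0, 4.0, 3.0], N=50, X_total=120.0)
se_total = math.sqrt(est["v_total"])
```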

Confidence Limits

A $100(1-\alpha)\%$ confidence interval may be constructed for large samples using the normal distribution:
- for the total $Y$: $\hat{Y}_R \pm z_{\alpha/2}\, s.e.(\hat{Y}_R)$;
- for the mean $\bar{Y}$: $\hat{\bar{Y}}_R \pm z_{\alpha/2}\, s.e.(\hat{\bar{Y}}_R)$;
- for the ratio $R$: $\hat{R} \pm z_{\alpha/2}\, s.e.(\hat{R})$.
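The large-sample limits can be sketched with the standard library's `statistics.NormalDist` supplying $z_{\alpha/2}$; the helper name `normal_ci` is hypothetical.

```python
from statistics import NormalDist

def normal_ci(estimate, se, conf=0.95):
    """Return (lower, upper) = estimate -/+ z_{alpha/2} * se, with alpha = 1 - conf."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return estimate - z * se, estimate + z * se

# e.g. a 95% interval for a ratio-estimated mean of 18.23 with s.e. 0.21925
lower, upper = normal_ci(18.23, 0.21925)
```

Using the exact quantile (about 1.95996 for 95%) instead of the rounded 1.96 changes the limits only in the fourth significant figure here.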
Example: A company wants to estimate the average amount of money paid to employees for medical expenses during the first three months of the current calendar year. A random sample of 100 employees' records is taken from the population of 1000 employees, and the medical expenses in the first three months of last year ($x_i$) and of this year ($y_i$) are recorded. From last year's balance sheet, the total medical expenses in the first three months of last year were estimated at 12,500 dollars ($X = 12{,}500$).
The sample gave the following results:

$$\sum_{i=1}^{100} x_i = 1200, \quad \sum_{i=1}^{100} x_i^2 = 15{,}620, \quad \sum_{i=1}^{100} y_i = 1750, \quad \sum_{i=1}^{100} y_i^2 = 31{,}650, \quad \sum_{i=1}^{100} x_i y_i = 22{,}059.35$$

a) Find the point estimate of the ratio.
b) Find the estimated mean medical expenditure per employee.
c) Find the standard errors of both estimates.
d) Find a 95% confidence interval for the population mean.
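Solution: a)
$$\hat{R} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{1750}{1200} = 1.4583$$
b)
$$\hat{\bar{Y}}_R = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \hat{R}\bar{X} = 1.4583\left(\frac{12500}{1000}\right) = 18.23$$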

c) To find the standard errors of the estimates, first calculate the following sample statistics:

$$s_x^2 = \frac{\sum x_i^2 - (\sum x_i)^2/n}{n-1} = \frac{15620 - (1200)^2/100}{100-1} = 12.32,$$
$$s_y^2 = \frac{\sum y_i^2 - (\sum y_i)^2/n}{n-1} = \frac{31650 - (1750)^2/100}{100-1} = 10.35,$$
$$s_{xy} = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{n-1} = \frac{22059.35 - (1200)(1750)/100}{100-1} = 10.7005.$$

With $f = 100/1000 = 0.1$ and $\bar{X} = 12500/1000 = 12.5$,

$$v(\hat{R}) = \frac{1-f}{n\bar{X}^2}\left(s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}\right) = \frac{1-0.1}{100(12.5)^2}\left(10.35 + (1.4583)^2(12.32) - 2(1.4583)(10.7005)\right) = 0.00030765,$$

so $s.e.(\hat{R}) = 0.01754$, and

$$v(\hat{\bar{Y}}_R) = \frac{1-f}{n}\left(s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}s_{xy}\right) = \frac{1-0.1}{100}\left(10.35 + (1.4583)^2(12.32) - 2(1.4583)(10.7005)\right) = 0.04807,$$

so $s.e.(\hat{\bar{Y}}_R) = 0.21925$. Consistent with Corollary (iv), $s.e.(\hat{R}) = s.e.(\hat{\bar{Y}}_R)/\bar{X} = 0.21925/12.5 = 0.01754$.

d) The 95% confidence limits for the population mean are

$$\bar{Y} = \hat{\bar{Y}}_R \pm z_{\alpha/2}\, s.e.(\hat{\bar{Y}}_R) = 18.23 \pm 1.96 \times 0.21925 = (17.8,\ 18.7).$$
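The calculations in (c) and (d) can be reproduced from the quoted sums. This sketch keeps every intermediate value unrounded, so it gives $v(\hat{\bar{Y}}_R) \approx 0.0482$ (the hand calculation with rounded inputs gives 0.04807) and $s.e.(\hat{R}) \approx 0.0176$, which equals $s.e.(\hat{\bar{Y}}_R)/\bar{X}$ as Corollary (iv) implies. The variable names and the use of $z = 1.96$ are choices of this sketch.

```python
import math

n, N = 100, 1000
sum_x, sum_x2, sum_y, sum_y2, sum_xy = 1200.0, 15620.0, 1750.0, 31650.0, 22059.35
X_bar = 12500.0 / N                  # from last year's known total
f = n / N

r_hat = sum_y / sum_x
mean_R = r_hat * X_bar               # ratio estimate of mean expenditure
s_x2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s_y2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
s_d2 = s_y2 + r_hat ** 2 * s_x2 - 2 * r_hat * s_xy

v_mean = (1 - f) / n * s_d2
v_r = (1 - f) / (n * X_bar ** 2) * s_d2
se_mean, se_r = math.sqrt(v_mean), math.sqrt(v_r)
ci = (mean_R - 1.96 * se_mean, mean_R + 1.96 * se_mean)

print(f"R_hat = {r_hat:.4f}, mean = {mean_R:.2f}")
print(f"s.e.(R_hat) = {se_r:.5f}, s.e.(mean) = {se_mean:.5f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```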
5.4 Bias of the Ratio Estimate

There are different methods of examining the bias; here we treat only one, which gives an exact result for the bias and an upper bound on the ratio of the bias to the standard error. Consider the covariance and correlation between $\hat{R}$ and $\bar{x}$ in a simple random sample of size $n$, and assume that $E(\bar{x}) = \mu_x$ and $E(\bar{y}) = \mu_y$, so that $R = \mu_y/\mu_x$. Writing $B_{\hat{R}} = E(\hat{R}) - R$ for the bias, by definition

$$\mathrm{Cov}(\hat{R}, \bar{x}) = E\{[\hat{R} - E(\hat{R})][\bar{x} - E(\bar{x})]\} = -\mu_x E(\hat{R} - R) = -\mu_x B_{\hat{R}} \quad \text{(verify)}.$$

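Then the correlation between $\hat{R}$ and $\bar{x}$ is

$$\rho_{\hat{R}\bar{x}} = \frac{\mathrm{Cov}(\hat{R}, \bar{x})}{S_{\hat{R}} S_{\bar{x}}} = \frac{-\mu_x B_{\hat{R}}}{S_{\hat{R}} S_{\bar{x}}},$$

where $S_{\hat{R}}$ and $S_{\bar{x}}$ are the standard errors of $\hat{R}$ and $\bar{x}$. With $CV(\bar{x}) = S_{\bar{x}}/\mu_x$, this gives

$$B_{\hat{R}} = -\rho_{\hat{R}\bar{x}}\, S_{\hat{R}}\, CV(\bar{x}) \quad\Longrightarrow\quad \frac{|B_{\hat{R}}|}{S_{\hat{R}}} \le CV(\bar{x}), \quad \text{since } |\rho_{\hat{R}\bar{x}}| \le 1.$$

Thus the magnitude of the bias of $\hat{R}$, relative to its standard error, cannot exceed the coefficient of variation of $\bar{x}$. If $\hat{R}$ and $\bar{x}$ are uncorrelated, the bias vanishes. If $CV(\bar{x}) < 10\%$, the bias can be ignored.
The bias in $\hat{Y}_R$ and $\hat{\bar{Y}}_R$ can be obtained in a similar way.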
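The identity $\mathrm{Cov}(\hat{R}, \bar{x}) = -\mu_x B_{\hat{R}}$ and the bound $|B_{\hat{R}}|/S_{\hat{R}} \le CV(\bar{x})$ can be checked exactly on a tiny hypothetical population by enumerating every simple random sample with `itertools.combinations`. The population values are invented; expectations are plain averages over all equally likely samples.

```python
import itertools
import math

# Hypothetical population (illustration only); every SRS of size n is enumerated.
X = [4.0, 7.0, 5.0, 9.0, 6.0, 8.0, 3.0, 10.0]
Y = [9.0, 15.0, 11.0, 17.0, 14.0, 18.0, 6.0, 21.0]
N, n = len(X), 3
mu_x, mu_y = sum(X) / N, sum(Y) / N
R = mu_y / mu_x

samples = list(itertools.combinations(range(N), n))   # all C(N, n) samples
K = len(samples)
r_hats = [sum(Y[i] for i in s) / sum(X[i] for i in s) for s in samples]
x_bars = [sum(X[i] for i in s) / n for s in samples]

E_r = sum(r_hats) / K
bias = E_r - R                                          # exact B_R_hat
cov_r_x = sum((r - E_r) * (xb - mu_x) for r, xb in zip(r_hats, x_bars)) / K
sd_r = math.sqrt(sum((r - E_r) ** 2 for r in r_hats) / K)
cv_xbar = math.sqrt(sum((xb - mu_x) ** 2 for xb in x_bars) / K) / mu_x

print(f"bias = {bias:.6f}, -Cov/mu_x = {-cov_r_x / mu_x:.6f}")
print(f"|bias| / S_R = {abs(bias) / sd_r:.4f} <= CV(x_bar) = {cv_xbar:.4f}")
```

The identity is exact because $E(\hat{R}\bar{x}) = E(\bar{y}) = \mu_y$ over all samples; the bound is just the Cauchy-Schwarz inequality applied to that covariance.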

Comparison of Ratio Estimate With the Mean Per Unit

Theorem 5.3: In large samples, under simple random sampling, the ratio estimate $\hat{Y}_R$ has a smaller variance than the estimate $\hat{Y} = N\bar{y}$ obtained by simple expansion if

$$\rho > \frac{1}{2}\,\frac{S_x/\bar{X}}{S_y/\bar{Y}} = \frac{1}{2}\,\frac{CV(x)}{CV(y)},$$

where $\rho$ is the correlation between $Y_i$ and $X_i$.
Proof: For the two estimates,

$$V(\hat{Y}) = \frac{N^2(1-f)}{n}S_y^2, \qquad V(\hat{Y}_R) = \frac{N^2(1-f)}{n}\left(S_y^2 + R^2S_x^2 - 2R\rho S_yS_x\right).$$

$V(\hat{Y}_R) < V(\hat{Y})$ if and only if $V(\hat{Y}) - V(\hat{Y}_R) > 0$, that is,

$$\frac{N^2(1-f)}{n}S_y^2 - \frac{N^2(1-f)}{n}\left(S_y^2 + R^2S_x^2 - 2R\rho S_yS_x\right) > 0$$
$$\iff S_y^2 - S_y^2 - R^2S_x^2 + 2R\rho S_yS_x > 0 \iff \rho > \frac{R^2S_x^2}{2RS_xS_y} = \frac{1}{2}\,\frac{RS_x}{S_y},$$

assuming $R = \bar{Y}/\bar{X}$ is positive. Substituting $R = \bar{Y}/\bar{X}$,

$$\rho > \frac{1}{2}\,\frac{S_x/\bar{X}}{S_y/\bar{Y}} = \frac{1}{2}\,\frac{CV(x)}{CV(y)}.$$

Therefore, if $V(\hat{Y}) - V(\hat{Y}_R) > 0$, the ratio estimate is more efficient; if the difference is zero, the two estimates are equally efficient; and if the difference is negative, the ratio estimate is less efficient than the estimate from simple expansion.
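A sketch of the Theorem 5.3 criterion for a fully known population. The helper `ratio_beats_expansion` is hypothetical; it returns the criterion together with the two variance factors (both without the common multiplier $N^2(1-f)/n$), so the two forms of the comparison can be seen to agree. Like the proof, it assumes $R > 0$.

```python
import math

def ratio_beats_expansion(X, Y):
    """Return (rho > CV(x) / (2 CV(y)), rho, threshold, V_expansion, V_ratio)."""
    N = len(X)
    xb, yb = sum(X) / N, sum(Y) / N
    s_x = math.sqrt(sum((x - xb) ** 2 for x in X) / (N - 1))
    s_y = math.sqrt(sum((y - yb) ** 2 for y in Y) / (N - 1))
    rho = sum((x - xb) * (y - yb) for x, y in zip(X, Y)) / ((N - 1) * s_x * s_y)
    threshold = 0.5 * (s_x / xb) / (s_y / yb)
    R = yb / xb
    v_exp = s_y ** 2                                                  # V(N y_bar) factor
    v_ratio = s_y ** 2 + R ** 2 * s_x ** 2 - 2 * R * rho * s_y * s_x  # V(Y_R) factor
    return rho > threshold, rho, threshold, v_exp, v_ratio

# Strongly correlated (hypothetical) population: the ratio estimate should win.
better, rho, thr, v_exp, v_ratio = ratio_beats_expansion(
    [4.0, 7.0, 5.0, 9.0, 6.0, 8.0, 3.0, 10.0],
    [9.0, 15.0, 11.0, 17.0, 14.0, 18.0, 6.0, 21.0],
)
```

A negatively correlated population never satisfies the criterion when $R > 0$, since the threshold is positive.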

Ratio Estimate in Stratified Random Sampling

In a stratified random sampling design there are two methods of ratio estimation: the separate ratio estimate and the combined ratio estimate.
The separate ratio estimate: for stratum $h$,

$$\hat{Y}_{Rh} = \frac{y_h}{x_h}X_h = \frac{\bar{y}_h}{\bar{x}_h}X_h = \hat{R}_h X_h,$$

with variance

$$V(\hat{Y}_{Rh}) \approx \frac{N_h^2(1-f_h)}{n_h}\left(S_{yh}^2 + R_h^2S_{xh}^2 - 2R_h\rho_hS_{yh}S_{xh}\right),$$

where $y_h$ and $x_h$ are the sample totals in the $h$-th stratum, and $X_h$ is the population total of stratum $h$, which must be known.

For the overall total, the separate ratio estimate, denoted $\hat{Y}_{RS}$, is

$$\hat{Y}_{RS} = \sum_{h=1}^{L} \frac{y_h}{x_h}X_h = \sum_{h=1}^{L} \frac{\bar{y}_h}{\bar{x}_h}X_h,$$

with the variance given in the following theorem.

Theorem 5.4: If an independent simple random sample is drawn in each stratum and the sample sizes are large in all strata, then the variance of $\hat{Y}_{RS}$ is approximately

$$V(\hat{Y}_{RS}) \approx \sum_{h=1}^{L} \frac{N_h^2(1-f_h)}{n_h}\left(S_{yh}^2 + R_h^2S_{xh}^2 - 2R_h\rho_hS_{yh}S_{xh}\right),$$

where $R_h$ and $\rho_h$ are the true ratio and correlation in stratum $h$, respectively. Prove this theorem.
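A minimal sketch of the separate ratio estimate of the total, with its variance estimated by replacing the population quantities in the formula above with stratum sample values. Each stratum is described by a dict; the keys `y`, `x`, `N`, `X` and the function name are hypothetical.

```python
import math

def separate_ratio_total(strata):
    """Separate ratio estimate of the total and its estimated variance.

    strata: list of dicts with hypothetical keys
            'y', 'x': sample values in the stratum (n_h >= 2),
            'N': stratum population size, 'X': known stratum total of x.
    """
    total, var = 0.0, 0.0
    for s in strata:
        y, x, N_h, X_h = s["y"], s["x"], s["N"], s["X"]
        n_h = len(y)
        r_h = sum(y) / sum(x)                              # R_hat_h
        # s_yh^2 + R_h^2 s_xh^2 - 2 R_h s_xyh, written as a residual variance
        s_dh2 = sum((a - r_h * b) ** 2 for a, b in zip(y, x)) / (n_h - 1)
        total += r_h * X_h
        var += N_h ** 2 * (1 - n_h / N_h) / n_h * s_dh2
    return total, var, math.sqrt(var)

# Usage with made-up strata (not data from the notes):
strata = [
    {"y": [4.0, 6.0, 5.0], "x": [2.0, 3.0, 2.5], "N": 30, "X": 75.0},
    {"y": [9.0, 12.0, 10.0, 14.0], "x": [3.0, 4.0, 3.5, 4.5], "N": 40, "X": 150.0},
]
total_RS, var_RS, se_RS = separate_ratio_total(strata)
```

Because each stratum uses its own $\hat{R}_h$, the first (exactly proportional) stratum contributes nothing to the variance here.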

The combined ratio estimate: in a stratified sample, the estimate of the population total $Y$ is $\hat{Y}_{st} = N\bar{y}_{st} = N\sum W_h\bar{y}_h = \sum N_h\bar{y}_h$. Similarly, the estimate of the population total $X$ is $\hat{X}_{st} = N\bar{x}_{st} = N\sum W_h\bar{x}_h = \sum N_h\bar{x}_h$.
The combined ratio estimate $\hat{Y}_{Rc}$ is

$$\hat{Y}_{Rc} = \frac{\hat{Y}_{st}}{\hat{X}_{st}}X = \frac{\bar{y}_{st}}{\bar{x}_{st}}X,$$

where $\bar{y}_{st}$ and $\bar{x}_{st}$ are the estimated population means from a stratified sample, and $X$ is known.
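Theorem 5.5: If the total sample size $n$ is large, the variance of the combined ratio estimate of the mean, $\hat{\bar{Y}}_{Rc} = \hat{Y}_{Rc}/N$, is

$$V(\hat{\bar{Y}}_{Rc}) \approx \sum_{h=1}^{L} \frac{W_h^2(1-f_h)}{n_h}\left(S_{yh}^2 + R^2S_{xh}^2 - 2R\rho_hS_{yh}S_{xh}\right),$$

where $R = Y/X$ is the population ratio. Prove this theorem.

Corollary:
$$V(\hat{Y}_{Rc}) \approx \sum_{h=1}^{L} \frac{N_h^2(1-f_h)}{n_h}\left(S_{yh}^2 + R^2S_{xh}^2 - 2R\rho_hS_{yh}S_{xh}\right).$$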

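Corollary: The separate ratio estimate of $R$ is $\hat{R}_s = \hat{Y}_{RS}/X = \sum_h \hat{Y}_{Rh}/X$, and its variance is

$$V(\hat{R}_s) = \frac{V(\hat{Y}_{RS})}{X^2} = \frac{\sum_h V(\hat{Y}_{Rh})}{X^2} \approx \frac{1}{X^2}\sum_{h=1}^{L} N_h^2\,\frac{1-f_h}{n_h}\left(S_{yh}^2 + R_h^2S_{xh}^2 - 2R_h\rho_hS_{xh}S_{yh}\right),$$

$$V(\hat{R}_s) \approx \frac{1}{\bar{X}^2}\sum_{h=1}^{L} W_h^2\,\frac{1-f_h}{n_h}\left(S_{yh}^2 + R_h^2S_{xh}^2 - 2R_h\rho_hS_{xh}S_{yh}\right).$$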
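Corollary: The combined ratio estimate of $R$ is $\hat{R}_c = \dfrac{\bar{y}_{st}}{\bar{x}_{st}} = \dfrac{\bar{y}_{st}}{\bar{x}_{st}}\left(\dfrac{X}{X}\right) = \dfrac{\hat{Y}_{Rc}}{X}$, and its variance is

$$V(\hat{R}_c) = \frac{V(\hat{Y}_{Rc})}{X^2} \approx \frac{1}{\bar{X}^2}\sum_{h=1}^{L} W_h^2\,\frac{1-f_h}{n_h}\left(S_{yh}^2 + R^2S_{xh}^2 - 2R\rho_hS_{xh}S_{yh}\right).$$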
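A corresponding sketch of the combined ratio estimate, using stratum weights $W_h = N_h/N$ and the variance formulas above with sample estimates in place of $S_{yh}^2$, $S_{xh}^2$ and $\rho_h S_{yh}S_{xh}$ (estimated by $s_{xyh}$), and $\hat{R}_c$ in place of $R$. The function name and dictionary keys are hypothetical.

```python
def combined_ratio(strata, X_total):
    """Combined ratio estimates R_c, Y_Rc and their estimated variances.

    strata: list of dicts with hypothetical keys 'y', 'x' (samples, n_h >= 2)
            and 'N' (stratum population size).
    X_total: known population total of x.
    """
    N = sum(s["N"] for s in strata)
    y_st = sum(s["N"] / N * sum(s["y"]) / len(s["y"]) for s in strata)
    x_st = sum(s["N"] / N * sum(s["x"]) / len(s["x"]) for s in strata)
    r_c = y_st / x_st
    v_mean = 0.0                                 # sum of W_h^2 (1 - f_h) / n_h * (...)
    for s in strata:
        y, x, N_h = s["y"], s["x"], s["N"]
        n_h = len(y)
        yb, xb = sum(y) / n_h, sum(x) / n_h
        s_y2 = sum((v - yb) ** 2 for v in y) / (n_h - 1)
        s_x2 = sum((v - xb) ** 2 for v in x) / (n_h - 1)
        s_xy = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / (n_h - 1)
        W_h = N_h / N
        v_mean += W_h ** 2 * (1 - n_h / N_h) / n_h * (s_y2 + r_c ** 2 * s_x2 - 2 * r_c * s_xy)
    X_bar = X_total / N
    return {
        "R_c": r_c,
        "Y_Rc": r_c * X_total,
        "v_Y_Rc": N ** 2 * v_mean,               # N_h^2 form of the corollary
        "v_R_c": v_mean / X_bar ** 2,            # V(R_c) = V(Y_Rc) / X^2
    }

# Usage with made-up strata (not data from the notes); X_total is assumed known.
strata = [
    {"y": [4.0, 6.0, 5.0], "x": [2.0, 3.0, 2.5], "N": 30},
    {"y": [9.0, 12.0, 10.0, 14.0], "x": [3.0, 4.0, 3.5, 4.5], "N": 40},
]
out = combined_ratio(strata, X_total=225.0)
```

Unlike the separate estimate, every stratum here is measured against the single pooled ratio $\hat{R}_c$, so a stratum whose own ratio differs from it still adds to the variance.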
Note: The separate ratio estimator is more efficient when the stratum ratios $R_h$ vary considerably between strata and the sample size is large enough in each stratum. Using the same auxiliary variable first for stratification and then for ratio estimation is unlikely to improve the efficiency of the estimators. If the population parameters in the above expressions are unknown, substitute the corresponding sample estimates, i.e., use $\hat{R}_h$, $\hat{R}$, $s_{yh}^2$, $s_{xh}^2$, $s_{xyh}$ and $\hat{\rho}_h$.

Example: Cochran (3rd ed.), page 167.
