Chapter 5 - 2010
Suppose the units of a population possess two characteristics that are correlated with each other. Let the variates $X_i$ and $Y_i$, $i = 1, 2, \ldots, N$, represent these two characteristics, where Y is the study variable and X is an auxiliary variable. Ratio estimation, which is based on the ratio of these two variables, is the simplest and most commonly used of the complex estimation techniques for improving precision or reliability. Consider the following examples, in which the variables are related:
Percentage of food expenditure to that of total income;
The use of fertilizer per hectare;
Gross enrolment ratio;
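The interest is in estimating the ratio $R$, the mean, and the total of the study variable. For each unit in the population there is a value of X and a value of Y: $\{Y_1, Y_2, \ldots, Y_N\}$ and $\{X_1, X_2, \ldots, X_N\}$. Then the ratio can be defined as:
$$R = \frac{\sum_{i=1}^{N} Y_i}{\sum_{i=1}^{N} X_i} = \frac{Y}{X} = \frac{\bar{Y}}{\bar{X}},$$
where $Y$ and $X$ are the population totals and $\bar{Y}$ and $\bar{X}$ are the population means.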
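For each unit in a sample of $n$ elements selected by simple random sampling (SRS) there are sample values $\{y_1, y_2, \ldots, y_n\}$ and $\{x_1, x_2, \ldots, x_n\}$. The sample estimate of the ratio is
$$\hat{R} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{y}{x} = \frac{\bar{y}}{\bar{x}},$$
where $y$ and $x$ are the sample totals and $\bar{y}$ and $\bar{x}$ are the sample means.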
The estimated ratio $\hat{R}$ is not necessarily an unbiased estimate of the population ratio $R$; that is, in general $E(\hat{R}) \ne R$. In most situations the bias is small, and estimated ratios are widely used. The ratio estimate is consistent, and for large samples (as a rule of thumb, $n \ge 30$) it is approximately normally distributed with negligible bias.
For the ratio estimate $\hat{R} = \bar{y}/\bar{x}$, both the numerator $\bar{y}$ and the denominator $\bar{x}$ are subject to sampling variability. Therefore, an exact expression for its standard error is complicated, but a large-sample approximation is available.
Theorem 5.1: If variates $y_i$ and $x_i$ are measured on each unit of a simple random sample of size $n$, assumed large, the MSE and variance of $\hat{R} = \bar{y}/\bar{x}$ are each approximately
$$\mathrm{MSE}(\hat{R}) \approx V(\hat{R}) \approx \frac{1-f}{n\bar{X}^2} \cdot \frac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N-1},$$
where $R = \bar{Y}/\bar{X}$ is the ratio of the population means and $f = n/N$. Prove this theorem.
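A sketch of the usual large-sample argument may help with the proof. It assumes $n$ is large enough that $\bar{x}$ in the denominator can be replaced by $\bar{X}$, and it introduces the auxiliary variable $d_i = y_i - R x_i$ (a symbol used here for convenience, with population mean zero).

```latex
\begin{aligned}
\hat{R} - R &= \frac{\bar{y} - R\bar{x}}{\bar{x}}
  \approx \frac{\bar{y} - R\bar{x}}{\bar{X}} = \frac{\bar{d}}{\bar{X}},
  \qquad d_i = y_i - R x_i, \\
E(\bar{d}) &= \bar{Y} - R\bar{X} = 0
  \;\Longrightarrow\; E(\hat{R}) \approx R, \\
\mathrm{MSE}(\hat{R}) \approx V(\hat{R})
  &\approx \frac{V(\bar{d})}{\bar{X}^2}
  = \frac{1-f}{n\bar{X}^2} \cdot \frac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N-1}.
\end{aligned}
```

The last step applies the simple random sampling variance of a sample mean to $\bar{d}$, whose population mean is zero.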
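If we have a sample estimate $\hat{R}$ from $n$ sample observations, then the variance of the estimated ratio is given by:
$$v(\hat{R}) = \frac{1-f}{n\bar{X}^2} \cdot \frac{\sum_{i=1}^{n} (y_i - \hat{R} x_i)^2}{n-1} = \frac{1-f}{n\bar{X}^2}\, s_d^2, \qquad \text{where } s_d^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{R} x_i)^2}{n-1}.$$
If $\bar{X}$ is unknown, substitute its sample estimate $\bar{x}$.
The standard error of $\hat{R}$ is
$$\text{s.e.}(\hat{R}) = \sqrt{\frac{1-f}{n\bar{X}^2} \cdot \frac{\sum_{i=1}^{n} (y_i - \hat{R} x_i)^2}{n-1}} = \sqrt{\frac{1-f}{n\bar{X}^2}\, s_d^2} = \sqrt{\frac{1-f}{n}}\; \frac{s_d}{\bar{X}}.$$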
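The estimator and its standard error can be sketched in a few lines of Python. This is only an illustration: the function name `ratio_estimate`, its arguments, and the toy data are hypothetical, and the finite population correction is dropped when `N` is not supplied.

```python
import math

def ratio_estimate(y, x, N=None, X_bar=None):
    """Return (R_hat, v(R_hat), s.e.(R_hat)) for an SRS of paired values."""
    n = len(y)
    r_hat = sum(y) / sum(x)
    # s_d^2 = sum (y_i - R_hat * x_i)^2 / (n - 1)
    s_d2 = sum((yi - r_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    # Assumption: ignore the fpc (f = 0) when the population size is not given
    f = n / N if N is not None else 0.0
    # Use the sample mean of x when the population mean is unknown
    x_bar = X_bar if X_bar is not None else sum(x) / n
    var = (1 - f) * s_d2 / (n * x_bar ** 2)
    return r_hat, var, math.sqrt(var)

# Hypothetical toy data, not taken from the text
y = [10.0, 14.0, 9.0, 20.0, 16.0]
x = [4.0, 6.0, 3.0, 8.0, 7.0]
r_hat, var, se = ratio_estimate(y, x, N=50)
print(r_hat, var, se)
```

When $y_i$ is exactly proportional to $x_i$, every residual $y_i - \hat{R} x_i$ is zero and the estimated variance vanishes, which is the intuition behind the gain from ratio estimation.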
Example: (Cochran page 33)
A simple random sample of $n = 33$ low-income families was taken. Let $x_{1i}$ denote the size of the family, $y_i$ the weekly expenditure on food per family, and $x_{2i}$ the weekly family income. The following data were available on these variables.
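$$\sum_{i=1}^{n} x_{1i} = 123, \qquad \sum_{i=1}^{n} x_{2i} = 2394, \qquad \sum_{i=1}^{n} y_i = 907.2, \qquad \sum_{i=1}^{n} x_{1i} y_i = 3595.5$$
$$\sum_{i=1}^{n} x_{1i}^2 = 533, \qquad \sum_{i=1}^{n} x_{2i}^2 = 177254, \qquad \sum_{i=1}^{n} y_i^2 = 28224, \qquad \sum_{i=1}^{n} x_{2i} y_i = 66678$$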
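Solution: a) Mean weekly expenditure on food per family: $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n} = \dfrac{907.2}{33} = \$27.49$
$$v(\bar{y}) = \frac{1-f}{n}\, s_y^2 \approx \frac{s_y^2}{n} = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n(n-1)} = \frac{28224 - (907.2)^2 / 33}{33(32)} = 3.11 \quad \text{(ignoring the fpc)}$$
s.e.$(\bar{y}) = 1.76$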
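b) Let $\hat{R}_1$ = mean weekly expenditure on food per person.
$$\hat{R}_1 = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_{1i}} = \frac{907.2}{123} = \$7.38$$
The average number of persons per family is $\bar{x}_1 = \dfrac{\sum_{i=1}^{n} x_{1i}}{n} = \dfrac{123}{33} = 3.7273$. The variance of $\hat{R}_1$ is then obtained from the formula for $v(\hat{R})$, with $\bar{x}_1$ in place of $\bar{X}$.
c) Let $\hat{R}_2$ = weekly expenditure on food as a percentage of weekly family income.
$$\hat{R}_2 = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_{2i}} \times 100\% = \frac{907.2}{2394} \times 100\% = 37.9\%, \qquad \bar{x}_2 = \frac{\sum_{i=1}^{n} x_{2i}}{n} = \frac{2394}{33} = 72.5455$$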
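$$v(\hat{R}_2) = \frac{1-f}{n\bar{x}_2^2}\, s_d^2 \approx \frac{s_d^2}{n\bar{x}_2^2} = \frac{\sum_i (y_i - \hat{R}_2 x_{2i})^2}{\bar{x}_2^2\, n(n-1)} = \frac{\sum_i y_i^2 + \hat{R}_2^2 \sum_i x_{2i}^2 - 2\hat{R}_2 \sum_i y_i x_{2i}}{\bar{x}_2^2\, n(n-1)}$$
$$v(\hat{R}_2) = \frac{28224 + (0.379)^2 (177254) - 2(0.379)(66678)}{(72.5455)^2 \times 33 \times 32} = 0.00056554$$
s.e.$(\hat{R}_2) = 0.02378$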
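The arithmetic of this example can be checked directly from the sums given above. The sketch below is only illustrative: the helper `ratio_from_sums` and the variable names are hypothetical, the fpc is ignored as in the worked solution, and the unrounded ratios are used, so the last digits may differ slightly from the hand calculation.

```python
import math

# Sums reported in the example (n = 33 families)
n = 33
sum_y, sum_x1, sum_x2 = 907.2, 123.0, 2394.0
sum_y2, sum_x1_2, sum_x2_2 = 28224.0, 533.0, 177254.0
sum_x1y, sum_x2y = 3595.5, 66678.0

def ratio_from_sums(n, sy, sx, syy, sxx, sxy):
    """Ratio estimate, variance and s.e. from sums only (fpc ignored)."""
    r = sy / sx
    x_bar = sx / n
    # sum (y_i - r x_i)^2 expanded in terms of the raw sums
    ss_d = syy - 2 * r * sxy + r * r * sxx
    var = ss_d / (n * (n - 1) * x_bar ** 2)
    return r, var, math.sqrt(var)

# a) mean expenditure per family and its variance
y_bar = sum_y / n
v_ybar = (sum_y2 - sum_y ** 2 / n) / (n * (n - 1))

# b) expenditure per person, c) expenditure as a fraction of income
r1, v1, se1 = ratio_from_sums(n, sum_y, sum_x1, sum_y2, sum_x1_2, sum_x1y)
r2, v2, se2 = ratio_from_sums(n, sum_y, sum_x2, sum_y2, sum_x2_2, sum_x2y)

print(round(y_bar, 2), round(math.sqrt(v_ybar), 2))
print(round(r1, 2), round(se1, 3))
print(round(100 * r2, 1), round(se2, 5))
```

Working from the sums avoids needing the 33 individual records, which are not reproduced in these notes.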
Assuming simple random sampling and taking advantage of the correlation between $X_i$ and $Y_i$, we can estimate the population total $Y$ and the population mean $\bar{Y}$ of the $Y_i$ using ratio estimates, as stated in the following theorem.
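Theorem 5.2: The ratio estimates of the population total $Y$ and the population mean $\bar{Y}$ are, respectively,
$$\hat{Y}_R = \frac{y}{x}\, X = \frac{\bar{y}}{\bar{x}}\, X, \qquad \hat{\bar{Y}}_R = \frac{y}{x}\, \bar{X} = \frac{\bar{y}}{\bar{x}}\, \bar{X},$$
assuming $X$ is known. In a simple random sample of size $n$, with large $n$, the variances of the ratio estimates of the population total and mean are given by:
$$V(\hat{Y}_R) = N^2\, \frac{1-f}{n} \cdot \frac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N-1} = N^2\, \frac{1-f}{n}\, S_d^2,$$
$$V(\hat{\bar{Y}}_R) = \frac{1-f}{n} \cdot \frac{\sum_{i=1}^{N} (y_i - R x_i)^2}{N-1} = \frac{1-f}{n}\, S_d^2. \qquad \text{(Prove both variances)}$$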
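Corollary:
i) $V(\hat{Y}_R) = N^2\, \dfrac{1-f}{n} \left( S_y^2 + R^2 S_x^2 - 2R\rho S_y S_x \right)$, where $\rho$ is the population correlation between $y_i$ and $x_i$
ii) $V(\hat{Y}_R) = \dfrac{1-f}{n}\, Y^2 \left( \dfrac{S_y^2}{\bar{Y}^2} + \dfrac{S_x^2}{\bar{X}^2} - \dfrac{2 S_{xy}}{\bar{Y}\bar{X}} \right)$
iii) $V(\hat{Y}_R) = \dfrac{1-f}{n}\, Y^2 \left( CV_y^2 + CV_x^2 - 2\, CV_{xy} \right)$
where $CV_y$ and $CV_x$ are the coefficients of variation of $y_i$ and $x_i$ respectively, and $CV_{xy}$ is the relative covariance.
iv) The coefficient of variation is the same for the three estimates $\hat{Y}_R$, $\hat{\bar{Y}}_R$, and $\hat{R}$, i.e.,
$$CV(\hat{Y}_R) = CV(\hat{\bar{Y}}_R) = CV(\hat{R}) = \sqrt{\frac{1-f}{n}}\; \frac{S_d}{R\bar{X}}, \qquad \text{where } R\bar{X} = \bar{Y}. \qquad \text{(Verify i-iv)}$$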
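As a starting point for the verification, the expansion behind parts i) to iii) can be written out as follows. The only facts used are $\bar{Y} = R\bar{X}$, $S_{xy} = \rho S_y S_x$, and $Y = N\bar{Y}$; the definitions $CV_y^2 = S_y^2/\bar{Y}^2$, $CV_x^2 = S_x^2/\bar{X}^2$, and $CV_{xy} = S_{xy}/(\bar{Y}\bar{X})$ are the standard ones and are assumed here.

```latex
\begin{aligned}
S_d^2 &= \frac{1}{N-1}\sum_{i=1}^{N} (y_i - R x_i)^2
       = \frac{1}{N-1}\sum_{i=1}^{N} \bigl[(y_i - \bar{Y}) - R\,(x_i - \bar{X})\bigr]^2 \\
      &= S_y^2 + R^2 S_x^2 - 2R\,S_{xy}
       = S_y^2 + R^2 S_x^2 - 2R\rho\, S_y S_x, \\[4pt]
N^2 S_d^2 &= \frac{Y^2}{\bar{Y}^2}\Bigl(S_y^2 + \frac{\bar{Y}^2}{\bar{X}^2} S_x^2
             - 2\,\frac{\bar{Y}}{\bar{X}}\, S_{xy}\Bigr)
           = Y^2\Bigl(\frac{S_y^2}{\bar{Y}^2} + \frac{S_x^2}{\bar{X}^2}
             - \frac{2 S_{xy}}{\bar{Y}\bar{X}}\Bigr) \\
          &= Y^2\bigl(CV_y^2 + CV_x^2 - 2\,CV_{xy}\bigr).
\end{aligned}
```

Multiplying each line by $(1-f)/n$ gives i), ii), and iii) in turn.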
In the above expressions the population parameters $S_y^2$, $S_x^2$, $S_{xy}$, and $R$ are usually unknown. Therefore, we substitute for these parameters the sample values $s_y^2$, $s_x^2$, $s_{xy}$, and $\hat{R}$, respectively, to get the following estimated variances.
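i) $v(\hat{Y}_R) = N^2\, \dfrac{1-f}{n} \cdot \dfrac{\sum_{i=1}^{n} (y_i - \hat{R} x_i)^2}{n-1} = N^2\, \dfrac{1-f}{n}\, s_d^2 = N^2\, \dfrac{1-f}{n} \left( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{xy} \right)$
ii) $v(\hat{\bar{Y}}_R) = \dfrac{1-f}{n} \cdot \dfrac{\sum_{i=1}^{n} (y_i - \hat{R} x_i)^2}{n-1} = \dfrac{1-f}{n} \left( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{xy} \right)$
iii) $v(\hat{R}) = \dfrac{1-f}{n\bar{X}^2} \cdot \dfrac{\sum_{i=1}^{n} (y_i - \hat{R} x_i)^2}{n-1} = \dfrac{1-f}{n\bar{X}^2} \left( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{xy} \right)$
If $\bar{X}$ is unknown, we substitute the sample mean $\bar{x}$.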
Confidence Limits
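For large samples, a $100(1-\alpha)\%$ confidence interval may be constructed using the normal distribution.
For the total $Y$: $\hat{Y}_R \pm Z_{\alpha/2}\; \text{s.e.}(\hat{Y}_R)$. For the mean $\bar{Y}$: $\hat{\bar{Y}}_R \pm Z_{\alpha/2}\; \text{s.e.}(\hat{\bar{Y}}_R)$.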
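A minimal sketch of these limits in Python, using the standard library's `statistics.NormalDist` for the normal quantile. The function name `ratio_confidence_interval` and the numbers passed to it are hypothetical.

```python
from statistics import NormalDist

def ratio_confidence_interval(estimate, se, alpha=0.05):
    """Large-sample 100(1 - alpha)% limits: estimate +/- Z_{alpha/2} * s.e."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return estimate - z * se, estimate + z * se

# Hypothetical estimate and standard error, for illustration only
lower, upper = ratio_confidence_interval(50.0, 2.0, alpha=0.05)
print(lower, upper)
```

The same function applies to $\hat{Y}_R$, $\hat{\bar{Y}}_R$, or $\hat{R}$, provided the matching standard error is supplied.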
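Sample data ($n = 100$):
$$\sum_{i=1}^{100} x_i = 1200, \qquad \sum_{i=1}^{100} x_i^2 = 15{,}620, \qquad \sum_{i=1}^{100} y_i = 1750, \qquad \sum_{i=1}^{100} y_i^2 = 31{,}650, \qquad \sum_{i=1}^{100} x_i y_i = 22059.35$$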
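Solution: a) $\hat{R} = \dfrac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \dfrac{1750}{1200} = 1.4583$
b) $\hat{\bar{Y}}_R = \dfrac{\bar{y}}{\bar{x}}\, \bar{X} = \hat{R}\, \bar{X} = 1.4583 \left( \dfrac{12500}{1000} \right) = 18.23$
c) To find the standard errors of the estimates, the following sample statistics must be calculated.
$$s_{yx} = \frac{\sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right) / n}{n-1} = \frac{22059.35 - (1200)(1750)/100}{100 - 1} = 10.7005$$
Similarly, $s_y^2 = \dfrac{31650 - (1750)^2/100}{99} = 10.35$ and $s_x^2 = \dfrac{15620 - (1200)^2/100}{99} = 12.32$.
$$v(\hat{R}) = \frac{1-f}{n\bar{X}^2} \left( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{xy} \right) = \frac{1 - 0.1}{100\, (12.5)^2} \left( 10.35 + (1.4583)^2 (12.32) - 2(1.4583)(10.7005) \right) = 0.00030765$$
s.e.$(\hat{R}) = 0.01754$. (If the sample mean $\bar{x} = 12$ is used in place of $\bar{X} = 12.5$, the result is $v(\hat{R}) = 0.00033382$ and s.e.$(\hat{R}) = 0.01827$.)
$$v(\hat{\bar{Y}}_R) = \frac{1-f}{n} \left( s_y^2 + \hat{R}^2 s_x^2 - 2\hat{R}\, s_{xy} \right) = \frac{1 - 0.1}{100} \left( 10.35 + (1.4583)^2 (12.32) - 2(1.4583)(10.7005) \right) = 0.04807$$
s.e.$(\hat{\bar{Y}}_R) = 0.21925$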
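The calculations in this solution can be reproduced from the sums with a short script. The variable names are hypothetical, and $N = 1000$ and $X = 12500$ are read off the figures used in the solution. Because the script keeps unrounded intermediate values (for example $s_x^2 = 12.3232\ldots$ rather than 12.32), its results agree with the hand calculation only to about three significant figures.

```python
import math

# Figures used in the solution (N and X_total are inferred from it)
n, N = 100, 1000
X_total = 12500.0
sum_x, sum_x2 = 1200.0, 15620.0
sum_y, sum_y2 = 1750.0, 31650.0
sum_xy = 22059.35

f = n / N
X_bar = X_total / N
r_hat = sum_y / sum_x

# Sample variances and covariance from the raw sums
s_x2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s_y2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

# s_d^2 = s_y^2 + R_hat^2 s_x^2 - 2 R_hat s_xy
s_d2 = s_y2 + r_hat ** 2 * s_x2 - 2 * r_hat * s_xy

mean_hat = r_hat * X_bar                  # ratio estimate of the mean
v_mean = (1 - f) / n * s_d2               # v of the ratio estimate of the mean
v_ratio = v_mean / X_bar ** 2             # v(R_hat), using the known X_bar
se_mean, se_ratio = math.sqrt(v_mean), math.sqrt(v_ratio)

print(round(r_hat, 4), round(mean_hat, 2))
print(round(se_ratio, 4), round(se_mean, 3))
```

Dividing `v_mean` by the square of the population mean of x makes explicit that $v(\hat{R})$ and $v(\hat{\bar{Y}}_R)$ differ only by the factor $\bar{X}^2$.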
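There are different methods of examining the bias, but here we treat only one, which gives an exact result for the bias and an upper bound on the ratio of the bias to the standard error.
Consider the relationship between $\hat{R}$ and $\bar{x}$, that is, their covariance and correlation in a simple random sample of size $n$. Recall that $E(\bar{x}) = \bar{X}$ and $E(\bar{y}) = \bar{Y}$.
By definition, since $\hat{R}\bar{x} = \bar{y}$,
$$\mathrm{Cov}(\hat{R}, \bar{x}) = E\{[\hat{R} - E(\hat{R})][\bar{x} - E(\bar{x})]\} = E(\bar{y}) - \bar{X} E(\hat{R}) = -\bar{X}\, E(\hat{R} - R) = -\bar{X}\, B_{\hat{R}} \quad \text{(verify)},$$
where $B_{\hat{R}} = E(\hat{R}) - R$ is the bias of $\hat{R}$.
Then the correlation between $\hat{R}$ and $\bar{x}$ is given by:
$$\rho_{\hat{R}\bar{x}} = \frac{\mathrm{Cov}(\hat{R}, \bar{x})}{S_{\hat{R}}\, S_{\bar{x}}} = \frac{-\bar{X}\, B_{\hat{R}}}{S_{\hat{R}}\, S_{\bar{x}}}$$
$$|B_{\hat{R}}| = |\rho_{\hat{R}\bar{x}}|\; S_{\hat{R}}\; CV(\bar{x}) \quad \Longrightarrow \quad \frac{|B_{\hat{R}}|}{S_{\hat{R}}} \le CV(\bar{x}), \qquad \text{since } |\rho_{\hat{R}\bar{x}}| \le 1,$$
where $S_{\hat{R}}$ and $S_{\bar{x}}$ are the standard errors of $\hat{R}$ and $\bar{x}$, and $CV(\bar{x}) = S_{\bar{x}}/\bar{X}$.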
This shows that the magnitude of the bias in $\hat{R}$, as a ratio of its standard error, cannot exceed the coefficient of variation of $\bar{x}$. If $\hat{R}$ and $\bar{x}$ are uncorrelated, the bias vanishes. If $CV(\bar{x}) < 10\%$, the bias can be ignored.
The bias in $\hat{Y}_R$ and $\hat{\bar{Y}}_R$ can be obtained in a similar way.
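Because the bound is exact, it can be checked by enumerating every possible simple random sample from a small population. The sketch below does this for a hypothetical population of $N = 6$ units with $n = 3$; the data values are invented for illustration, and the standard deviations are taken over the full sampling distribution (all 20 samples).

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical small population (values invented for illustration)
Y = [3.0, 5.0, 8.0, 9.0, 12.0, 15.0]
X = [2.0, 3.0, 4.0, 6.0, 7.0, 9.0]
n = 3
R = sum(Y) / sum(X)
X_bar = mean(X)

# Enumerate every SRS of size n and record R_hat and x_bar
r_hats, x_bars = [], []
for idx in combinations(range(len(Y)), n):
    ys = [Y[i] for i in idx]
    xs = [X[i] for i in idx]
    r_hats.append(sum(ys) / sum(xs))
    x_bars.append(sum(xs) / n)

bias = mean(r_hats) - R                      # exact bias of R_hat
se_r = pstdev(r_hats)                        # exact standard error of R_hat
cv_xbar = pstdev(x_bars) / X_bar             # exact CV of x_bar
cov = mean(r * xb for r, xb in zip(r_hats, x_bars)) - mean(r_hats) * mean(x_bars)

print(abs(bias) / se_r, cv_xbar)             # first value should not exceed second
print(cov, -X_bar * bias)                    # the two values should agree
```

Enumeration is feasible only for very small populations, but it confirms both the covariance identity and the bound without any large-sample approximation.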
Theorem 5.3: In large samples, with simple random sampling, the ratio estimate $\hat{Y}_R$ has a smaller variance than the estimate $\hat{Y} = N\bar{y}$ obtained by simple expansion if
$$\rho > \frac{1}{2} \cdot \frac{S_x / \bar{X}}{S_y / \bar{Y}} = \frac{1}{2} \cdot \frac{CV(x)}{CV(y)},$$
where $\rho$ is the correlation between $Y_i$ and $X_i$.
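Proof: $V(\hat{Y}) = \dfrac{N^2 (1-f)}{n}\, S_y^2$ and $V(\hat{Y}_R) = \dfrac{N^2 (1-f)}{n} \left( S_y^2 + R^2 S_x^2 - 2R\rho S_y S_x \right)$.
If $V(\hat{Y}_R) < V(\hat{Y})$, then $V(\hat{Y}) - V(\hat{Y}_R) > 0$:
$$\frac{N^2 (1-f)}{n}\, S_y^2 - \frac{N^2 (1-f)}{n} \left( S_y^2 + R^2 S_x^2 - 2R\rho S_y S_x \right) > 0$$
$$S_y^2 - S_y^2 - R^2 S_x^2 + 2R\rho S_y S_x > 0 \quad \Longrightarrow \quad \rho > \frac{R^2 S_x^2}{2R\, S_x S_y}, \quad \text{assuming } R = \frac{\bar{Y}}{\bar{X}} \text{ is positive.}$$
$$\rho > \frac{1}{2} \cdot \frac{R S_x}{S_y} = \frac{1}{2} \cdot \frac{S_x / \bar{X}}{S_y / \bar{Y}} = \frac{1}{2} \cdot \frac{CV(x)}{CV(y)}$$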
Therefore, if the difference between the two variances is greater than zero, i.e., $V(\hat{Y}) - V(\hat{Y}_R) > 0$, then the ratio estimate is more efficient. If the difference is zero, both estimates are equally efficient. If the difference is less than zero, the ratio estimate is not as efficient as the estimate from simple expansion.
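The condition of Theorem 5.3 is easy to check numerically. In the sketch below the function names and the population values are hypothetical; the common factor $N^2(1-f)/n$ is omitted because it cancels in the comparison.

```python
def ratio_is_more_efficient(rho, cv_x, cv_y):
    """Theorem 5.3 condition: rho > (1/2) * CV(x) / CV(y)."""
    return rho > 0.5 * cv_x / cv_y

def variance_brackets(s_y, s_x, rho, r):
    """Bracketed parts of V(Y_hat) and V(Y_hat_R), common factor omitted."""
    expansion = s_y ** 2
    ratio = s_y ** 2 + r ** 2 * s_x ** 2 - 2 * r * rho * s_y * s_x
    return expansion, ratio

# Hypothetical population values, for illustration only
y_bar, x_bar = 50.0, 20.0
s_y, s_x = 10.0, 5.0
r = y_bar / x_bar
cv_y, cv_x = s_y / y_bar, s_x / x_bar

for rho in (0.8, 0.5):
    expansion, ratio = variance_brackets(s_y, s_x, rho, r)
    print(rho, ratio_is_more_efficient(rho, cv_x, cv_y), ratio < expansion)
```

With these values the threshold is $0.5 \times 0.25 / 0.20 = 0.625$, so the ratio estimate wins for $\rho = 0.8$ and loses for $\rho = 0.5$, and the direct variance comparison gives the same verdict in both cases.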
In stratified random sampling design, there are two methods for estimating ratios: the separate
ratio estimate and the combined ratio estimate.
The separate ratio estimate: For stratum $h$, $\hat{Y}_{Rh} = \dfrac{y_h}{x_h}\, X_h = \dfrac{\bar{y}_h}{\bar{x}_h}\, X_h = \hat{R}_h X_h$, and its variance is
$$V(\hat{Y}_{Rh}) = \frac{N_h^2 (1 - f_h)}{n_h} \left( S_{yh}^2 + R_h^2 S_{xh}^2 - 2 R_h \rho_h S_{yh} S_{xh} \right),$$
where $y_h$ and $x_h$ are the sample totals in the $h$th stratum, and $X_h$ is the population stratum total, which must be known.
For the overall total, the separate ratio estimate is denoted by $\hat{Y}_{RS}$ and is given by
$$\hat{Y}_{RS} = \sum_{h=1}^{L} \frac{y_h}{x_h}\, X_h = \sum_{h=1}^{L} \frac{\bar{y}_h}{\bar{x}_h}\, X_h,$$
with the variance given in the following theorem.
Theorem 5.4: If an independent simple random sample is drawn in each stratum and the sample sizes are large in all strata, then the variance of $\hat{Y}_{RS}$ is
$$V(\hat{Y}_{RS}) = \sum_{h=1}^{L} \frac{N_h^2 (1 - f_h)}{n_h} \left( S_{yh}^2 + R_h^2 S_{xh}^2 - 2 R_h \rho_h S_{yh} S_{xh} \right),$$
where $R_h$ and $\rho_h$ are the true ratio and correlation in stratum $h$, respectively. Prove this theorem.
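A minimal sketch of the separate ratio estimate and its estimated variance follows. The function name, the dictionary layout of each stratum, and the data are all hypothetical. The population quantities in the theorem are replaced by their sample counterparts, using the identity $s_{yh}^2 + \hat{R}_h^2 s_{xh}^2 - 2\hat{R}_h s_{xyh} = \sum (y_{hi} - \hat{R}_h x_{hi})^2 / (n_h - 1)$.

```python
def separate_ratio_estimate(strata):
    """Return (Y_hat_RS, v(Y_hat_RS)) from per-stratum samples.

    Each stratum is a dict (hypothetical layout) with keys:
    "y", "x": sample values; "N": stratum size; "X": known stratum total of x.
    """
    total, var = 0.0, 0.0
    for s in strata:
        y, x = s["y"], s["x"]
        n_h = len(y)
        r_h = sum(y) / sum(x)            # stratum ratio estimate R_hat_h
        f_h = n_h / s["N"]               # stratum sampling fraction
        s_d2 = sum((yi - r_h * xi) ** 2 for yi, xi in zip(y, x)) / (n_h - 1)
        total += r_h * s["X"]
        var += s["N"] ** 2 * (1 - f_h) / n_h * s_d2
    return total, var

# Hypothetical two-stratum sample
strata = [
    {"y": [12.0, 15.0, 9.0, 14.0], "x": [5.0, 7.0, 4.0, 6.0], "N": 40, "X": 230.0},
    {"y": [30.0, 26.0, 35.0], "x": [9.0, 8.0, 10.0], "N": 25, "X": 220.0},
]
y_rs, v_rs = separate_ratio_estimate(strata)
print(y_rs, v_rs)
```

Each stratum contributes its own ratio, which is why the large-sample approximation needs a reasonably large $n_h$ in every stratum.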
The combined ratio estimate: In a stratified sample, the estimate of the population total $Y$ is $\hat{Y}_{st} = N\bar{y}_{st} = N \sum_h W_h \bar{y}_h = \sum_h N_h \bar{y}_h$. For the population total $X$, the estimate is $\hat{X}_{st} = N\bar{x}_{st} = N \sum_h W_h \bar{x}_h = \sum_h N_h \bar{x}_h$.
The combined ratio estimate, $\hat{Y}_{Rc}$, is given by $\hat{Y}_{Rc} = (\hat{Y}_{st} / \hat{X}_{st})\, X = (\bar{y}_{st} / \bar{x}_{st})\, X$, where $\bar{y}_{st}$ and $\bar{x}_{st}$ are the estimated population means from a stratified sample, and $X$ is known.
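Theorem 5.5: If the total sample size $n$ is large, the variance of the combined ratio estimate of the population mean, $\hat{\bar{Y}}_{Rc} = (\bar{y}_{st} / \bar{x}_{st})\, \bar{X}$, is given by:
$$V(\hat{\bar{Y}}_{Rc}) = \sum_{h=1}^{L} \frac{W_h^2 (1 - f_h)}{n_h} \left( S_{yh}^2 + R^2 S_{xh}^2 - 2R \rho_h S_{yh} S_{xh} \right).$$
Prove this theorem.
Corollary: For the population total,
$$V(\hat{Y}_{Rc}) = \sum_{h=1}^{L} \frac{N_h^2 (1 - f_h)}{n_h} \left( S_{yh}^2 + R^2 S_{xh}^2 - 2R \rho_h S_{yh} S_{xh} \right).$$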
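The combined estimate can be sketched in the same style. Again the function name, the stratum dictionary layout, and the data are hypothetical. Because a single ratio $\hat{R}_c = \bar{y}_{st}/\bar{x}_{st}$ is used in every stratum, the within-stratum term is computed from deviations about the stratum means, which equals $s_{yh}^2 + \hat{R}_c^2 s_{xh}^2 - 2\hat{R}_c s_{xyh}$.

```python
from statistics import mean

def combined_ratio_estimate(strata, X_total):
    """Return (Y_hat_Rc, v(Y_hat_Rc)) for the population total.

    Each stratum is a dict (hypothetical layout) with keys
    "y", "x": sample values; "N": stratum size.
    """
    N = sum(s["N"] for s in strata)
    y_st = sum(s["N"] * mean(s["y"]) for s in strata) / N
    x_st = sum(s["N"] * mean(s["x"]) for s in strata) / N
    r_c = y_st / x_st                    # one ratio for the whole population
    var = 0.0
    for s in strata:
        y, x = s["y"], s["x"]
        n_h = len(y)
        f_h = n_h / s["N"]
        y_bar, x_bar = mean(y), mean(x)
        # s_yh^2 + r_c^2 s_xh^2 - 2 r_c s_xyh, via deviations from stratum means
        s_d2 = sum(((yi - y_bar) - r_c * (xi - x_bar)) ** 2
                   for yi, xi in zip(y, x)) / (n_h - 1)
        var += s["N"] ** 2 * (1 - f_h) / n_h * s_d2
    return r_c * X_total, var

# Hypothetical two-stratum sample and known population total of x
strata = [
    {"y": [12.0, 15.0, 9.0, 14.0], "x": [5.0, 7.0, 4.0, 6.0], "N": 40},
    {"y": [30.0, 26.0, 35.0], "x": [9.0, 8.0, 10.0], "N": 25},
]
y_rc, v_rc = combined_ratio_estimate(strata, X_total=450.0)
print(y_rc, v_rc)
```

Unlike the separate estimate, only the overall total $X$ needs to be known here, and only the total sample size needs to be large.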
Example: Cochran, 3rd ed., page 167