Chapter 9
9. Analysis of Variance Models
In an analysis of variance (ANOVA), several treatments or treatment combinations are applied to randomly selected experimental units, and the treatment means for some response $y$ are compared. In ANOVA, we use linear models to facilitate the comparison of these means. The model is often expressed with more parameters than can be estimated, which results in an $X$ matrix that is not of full rank.
9.1 Non-Full-Rank Models
(a) One-Way Model
Suppose a researcher has developed two chemical additives for increasing the mileage of gasoline. To formulate the model, we might start with the notion that without additives, a gallon yields an average of $\mu$ miles. If chemical 1 is added, the mileage is expected to increase by $\alpha_1$ miles per gallon, and if chemical 2 is added, the mileage would increase by $\alpha_2$ miles per gallon. The model could be expressed as
\[ y_1 = \mu + \alpha_1 + \varepsilon_1, \qquad y_2 = \mu + \alpha_2 + \varepsilon_2, \]
where $y_1$ is the miles per gallon from a tank of gasoline containing chemical 1 and $\varepsilon_1$ is a random error term. The variables $y_2$ and $\varepsilon_2$ are defined similarly. The researcher would like to estimate the parameters $\mu$, $\alpha_1$, and $\alpha_2$ and test hypotheses such as $H_0: \alpha_1 = \alpha_2$.
Suppose the experiment consists of filling the tanks of six identical cars with gas, then adding chemical 1 to three tanks and chemical 2 to the other three tanks. Thus, a model for each of the six observations is
\[ y_{11} = \mu + \alpha_1 + \varepsilon_{11}, \quad y_{12} = \mu + \alpha_1 + \varepsilon_{12}, \quad y_{13} = \mu + \alpha_1 + \varepsilon_{13}, \tag{9.1} \]
\[ y_{21} = \mu + \alpha_2 + \varepsilon_{21}, \quad y_{22} = \mu + \alpha_2 + \varepsilon_{22}, \quad y_{23} = \mu + \alpha_2 + \varepsilon_{23}, \]
or
\[ y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1, 2, \; j = 1, 2, 3, \tag{9.2} \]
where $y_{ij}$ is the observed miles per gallon of the $j$th car that contains the $i$th chemical in its tank and $\varepsilon_{ij}$ is the associated random error. The six equations in (9.1) can be written in matrix form as
\[ \begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{pmatrix} = \begin{pmatrix} 1&1&0\\ 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1\\ 1&0&1 \end{pmatrix}\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23} \end{pmatrix} \tag{9.3} \]
or
\[ y = X\beta + \varepsilon. \]
In (9.3), $X$ is a $6 \times 3$ matrix whose rank is 2, since the first column is the sum of the second and third columns, which are linearly independent. Since $X$ is not of full rank, the parameters $\mu$, $\alpha_1$, and $\alpha_2$ cannot be estimated by $\hat\beta = (X'X)^{-1}X'y$, because $(X'X)^{-1}$ does not exist.
With three parameters and $\mathrm{rank}(X) = 2$, the model is said to be overparameterized. Increasing the number of observations (replication) for each of the two additives will not change the rank of $X$.
There are three approaches to remedy this problem: (1) redefine the model using two new parameters that are unique; (2) use the overparameterized model but place constraints on the parameters so that they become unique; and (3) in the overparameterized model, work with linear combinations of the parameters that are unique and can be estimated. To illustrate these three techniques:
1. To reduce the number of parameters, suppose for example that $\mu = 15$, $\alpha_1 = 1$, and $\alpha_2 = 3$; the model becomes
\[ y_{1j} = 15 + 1 + \varepsilon_{1j} = 16 + \varepsilon_{1j}, \quad j = 1, 2, 3, \tag{9.4} \]
\[ y_{2j} = 15 + 3 + \varepsilon_{2j} = 18 + \varepsilon_{2j}, \quad j = 1, 2, 3. \]
The values 16 and 18 are the means after the two treatments have been applied. Generally, the means could be labeled $\mu_1$ and $\mu_2$ and the model could be written as
\[ y_{1j} = \mu_1 + \varepsilon_{1j} \quad \text{and} \quad y_{2j} = \mu_2 + \varepsilon_{2j}. \]
The means $\mu_1$ and $\mu_2$ are unique and can be estimated. The redefined model for all six observations in (9.1) or (9.2) takes the form
\[ \begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{pmatrix} = \begin{pmatrix} 1&0\\ 1&0\\ 1&0\\ 0&1\\ 0&1\\ 0&1 \end{pmatrix}\begin{pmatrix} \mu_1\\ \mu_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23} \end{pmatrix}, \]
which we write as $y = W\mu + \varepsilon$.
The matrix $W$ is of full rank, and we can estimate $\mu$ as
\[ \hat\mu = \begin{pmatrix} \hat\mu_1\\ \hat\mu_2 \end{pmatrix} = (W'W)^{-1}W'y. \]
This solution is called reparameterization.
2. Alternatively, we can reduce the number of free parameters by introducing constraints on the parameters $\mu$, $\alpha_1$, and $\alpha_2$, denoted in constrained form by $\mu^*$, $\alpha_1^*$, and $\alpha_2^*$. In (9.1) and (9.2), the constraint $\alpha_1^* + \alpha_2^* = 0$ has the effect of defining $\mu^*$ to be the new mean after the treatments are applied and $\alpha_1^*$ and $\alpha_2^*$ to be deviations from this mean. With this constraint, (9.4) can be written as
\[ y_{1j} = 17 + (-1) + \varepsilon_{1j} = 16 + \varepsilon_{1j}, \quad j = 1, 2, 3, \]
\[ y_{2j} = 17 + 1 + \varepsilon_{2j} = 18 + \varepsilon_{2j}, \quad j = 1, 2, 3. \]
This representation is now unique because there is no other way to express the model so that $\alpha_1^* + \alpha_2^* = 0$. Such constraints are often called side conditions.
Thus, the model $y_{ij} = \mu^* + \alpha_i^* + \varepsilon_{ij}$ subject to $\alpha_1^* + \alpha_2^* = 0$ can be expressed in a full-rank form by substituting $\alpha_2^* = -\alpha_1^*$ to obtain $y_{1j} = \mu^* + \alpha_1^* + \varepsilon_{1j}$ and $y_{2j} = \mu^* - \alpha_1^* + \varepsilon_{2j}$, so that the matrix form for the six observations is
\[ \begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{pmatrix} = \begin{pmatrix} 1&1\\ 1&1\\ 1&1\\ 1&-1\\ 1&-1\\ 1&-1 \end{pmatrix}\begin{pmatrix} \mu^*\\ \alpha_1^* \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23} \end{pmatrix} \]
or
\[ y = X^*\beta^* + \varepsilon. \]
The matrix $X^*$ is of full rank, and the parameters $\mu^*$ and $\alpha_1^*$ can be estimated.
3. In the overparameterized model (9.4), there exist some linear combinations of the parameters that are unique. For example, $\alpha_1 - \alpha_2 = -2$, $\mu + \alpha_1 = 16$, and $\mu + \alpha_2 = 18$ remain the same for all possible values of $\mu$, $\alpha_1$, and $\alpha_2$ consistent with the model. Such unique linear combinations can be estimated.
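To see why these combinations are unique while the individual parameters are not, note that replacing $(\mu, \alpha_1, \alpha_2)$ by $(\mu + c, \alpha_1 - c, \alpha_2 - c)$ for any constant $c$ leaves the model (9.2) unchanged, yet
\[ (\mu + c) + (\alpha_1 - c) = \mu + \alpha_1, \qquad (\alpha_1 - c) - (\alpha_2 - c) = \alpha_1 - \alpha_2, \]
so $\mu + \alpha_1$, $\mu + \alpha_2$, and $\alpha_1 - \alpha_2$ are unaffected, whereas $\mu$, $\alpha_1$, and $\alpha_2$ individually are not.
The reparameterized estimator $\hat\mu = (W'W)^{-1}W'y$ of technique 1 is also easy to verify numerically. A minimal PROC IML sketch (the six mileage values are hypothetical, since no data are given above):

proc iml;
/* hypothetical mileage data: 3 tanks with chemical 1, 3 with chemical 2 */
y = {16.2, 15.8, 16.3, 18.1, 17.7, 18.4};
W = {1 0, 1 0, 1 0, 0 1, 0 1, 0 1};  /* full-rank design matrix for y = W*mu + eps */
muhat = inv(W`*W) * W` * y;          /* (W'W)^{-1} W'y = the two treatment-group means */
print muhat;
quit;

The result is simply the pair of treatment-group means, as expected.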
(b) Two‐Way Model
Suppose we want to measure the effect of two different vitamins and two different methods of administering the vitamins on the weight gain of chicks. This leads to a two-way model. Let $\alpha_1$ and $\alpha_2$ be the effects of the two vitamins, and let $\beta_1$ and $\beta_2$ be the effects of the two methods of administration. If we assume that these effects are additive (no interaction), the model becomes
\[ y_{11} = \mu + \alpha_1 + \beta_1 + \varepsilon_{11}, \quad y_{12} = \mu + \alpha_1 + \beta_2 + \varepsilon_{12}, \]
\[ y_{21} = \mu + \alpha_2 + \beta_1 + \varepsilon_{21}, \quad y_{22} = \mu + \alpha_2 + \beta_2 + \varepsilon_{22}, \]
or
\[ y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}; \quad i = 1, 2, \; j = 1, 2, \tag{9.5} \]
where $y_{ij}$ is the weight gain of the $ij$th chick and $\varepsilon_{ij}$ is the associated random error. In matrix form, (9.5) becomes
\[ \begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{pmatrix} = \begin{pmatrix} 1&1&0&1&0\\ 1&1&0&0&1\\ 1&0&1&1&0\\ 1&0&1&0&1 \end{pmatrix}\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{22} \end{pmatrix} \tag{9.6} \]
or $y = X\beta + \varepsilon$.
Since $\mathrm{rank}(X) = 3$, only three unique parameters are possible, unless side conditions are imposed on the five parameters. There are many ways to reparameterize in order to reduce the model to three parameters. For example, consider the parameters $\gamma_1$, $\gamma_2$, and $\gamma_3$ defined as
\[ \gamma_1 = \mu + \alpha_1 + \beta_1, \quad \gamma_2 = \alpha_2 - \alpha_1, \quad \gamma_3 = \beta_2 - \beta_1. \]
The model can be written in terms of the $\gamma$'s as
\[ y_{11} = \gamma_1 + \varepsilon_{11}, \quad y_{12} = \gamma_1 + \gamma_3 + \varepsilon_{12}, \quad y_{21} = \gamma_1 + \gamma_2 + \varepsilon_{21}, \quad y_{22} = \gamma_1 + \gamma_2 + \gamma_3 + \varepsilon_{22}. \]
In matrix form, this becomes
\[ \begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{pmatrix} = \begin{pmatrix} 1&0&0\\ 1&0&1\\ 1&1&0\\ 1&1&1 \end{pmatrix}\begin{pmatrix} \gamma_1\\ \gamma_2\\ \gamma_3 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{22} \end{pmatrix} \]
or
\[ y = Z\gamma + \varepsilon. \tag{9.7} \]
The rank of $Z$ is clearly 3, and we have a full-rank model for which $\gamma$ can be estimated by $\hat\gamma = (Z'Z)^{-1}Z'y$. This provides estimates of $\gamma_2 = \alpha_2 - \alpha_1$ and $\gamma_3 = \beta_2 - \beta_1$, which are typically of interest to the researcher.
Now, consider side conditions on the parameters. Since $\mathrm{rank}(X) = 3$ and there are five parameters, we need two (linearly independent) side conditions. If these two constraints are appropriately chosen, the five parameters become unique and thereby estimable. Denote the constrained parameters by $\mu^*$, $\alpha_i^*$, and $\beta_j^*$, and consider the side conditions $\alpha_1^* + \alpha_2^* = 0$ and $\beta_1^* + \beta_2^* = 0$. These lead to a unique definition of the $\alpha_i^*$ and $\beta_j^*$ as deviations from means. To show this, start by writing the model as
\[ y_{11} = \mu_{11} + \varepsilon_{11}, \quad y_{12} = \mu_{12} + \varepsilon_{12}, \tag{9.8} \]
\[ y_{21} = \mu_{21} + \varepsilon_{21}, \quad y_{22} = \mu_{22} + \varepsilon_{22}, \]
where $\mu_{ij} = E(y_{ij})$ is the mean weight gain with vitamin $i$ and method $j$. The means are displayed in Table 9.1, and the parameters $\alpha_1^*$, $\alpha_2^*$, $\beta_1^*$, and $\beta_2^*$ are defined as row and column effects.
The first row effect, $\alpha_1^* = \bar\mu_{1.} - \bar\mu_{..}$, is the deviation of the mean for vitamin 1 from the overall mean (after treatments) and is unique. The parameters $\alpha_2^*$, $\beta_1^*$, and $\beta_2^*$ are likewise uniquely defined. From the definitions in Table 9.1, we obtain
\[ \alpha_1^* + \alpha_2^* = (\bar\mu_{1.} - \bar\mu_{..}) + (\bar\mu_{2.} - \bar\mu_{..}) = \bar\mu_{1.} + \bar\mu_{2.} - 2\,\frac{\bar\mu_{1.} + \bar\mu_{2.}}{2} = 0, \tag{9.9} \]
and similarly $\beta_1^* + \beta_2^* = 0$. Thus, with the side conditions $\alpha_1^* + \alpha_2^* = 0$ and $\beta_1^* + \beta_2^* = 0$, the redefined parameters are both unique and meaningful. We can write (9.5) in terms of $\mu^* = \bar\mu_{..}$, $\alpha_i^* = \bar\mu_{i.} - \bar\mu_{..}$, and $\beta_j^* = \bar\mu_{.j} - \bar\mu_{..}$:
\[ \mu_{ij} = \bar\mu_{..} + (\bar\mu_{i.} - \bar\mu_{..}) + (\bar\mu_{.j} - \bar\mu_{..}) + (\mu_{ij} - \bar\mu_{i.} - \bar\mu_{.j} + \bar\mu_{..}) \]
\[ = \mu^* + \alpha_i^* + \beta_j^* + (\mu_{ij} - \bar\mu_{i.} - \bar\mu_{.j} + \bar\mu_{..}). \]
The term $\mu_{ij} - \bar\mu_{i.} - \bar\mu_{.j} + \bar\mu_{..}$, which is required to balance the equation, is associated with the interaction of vitamins and methods. In order for $\alpha_i^*$ and $\beta_j^*$ to be additive effects, the interaction term $\mu_{ij} - \bar\mu_{i.} - \bar\mu_{.j} + \bar\mu_{..}$ must be zero.
Table 9.1 Means and Effects for the Model in (9.8)

                  Column 1                                    Column 2                                    Row means         Row effects
Row 1             $\mu_{11}$                                  $\mu_{12}$                                  $\bar\mu_{1.}$    $\alpha_1^* = \bar\mu_{1.} - \bar\mu_{..}$
Row 2             $\mu_{21}$                                  $\mu_{22}$                                  $\bar\mu_{2.}$    $\alpha_2^* = \bar\mu_{2.} - \bar\mu_{..}$
Column means      $\bar\mu_{.1}$                              $\bar\mu_{.2}$                              $\bar\mu_{..}$
Column effects    $\beta_1^* = \bar\mu_{.1} - \bar\mu_{..}$   $\beta_2^* = \bar\mu_{.2} - \bar\mu_{..}$
9.2 Estimation
Consider estimation of $\beta$ and of linear functions of $\beta$ in the non-full-rank model $y = X\beta + \varepsilon$. We do not reparameterize or impose side conditions, and we make no normality assumption on $y$.
Estimability of β.
Consider the model $y = X\beta + \varepsilon$ with $E(y) = X\beta$ and $\mathrm{cov}(y) = \sigma^2 I$, where $X$ is $n \times p$ of rank $k < p \le n$; that is, $X$ is not of full rank. Using the least-squares approach, we get the normal equations $X'X\hat\beta = X'y$. Here, $X'X$ has no inverse, and therefore the normal equations do not have a unique solution. However, there are an infinite number of solutions.
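For example, for the gasoline model (9.3) the normal equations are
\[ \begin{pmatrix} 6&3&3\\ 3&3&0\\ 3&0&3 \end{pmatrix}\hat\beta = \begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.} \end{pmatrix}, \]
and it is easy to verify that both $\hat\beta = (0, \bar y_{1.}, \bar y_{2.})'$ and $\hat\beta = (\bar y_{..}, \bar y_{1.} - \bar y_{..}, \bar y_{2.} - \bar y_{..})'$ (indeed, any $\hat\beta = (c, \bar y_{1.} - c, \bar y_{2.} - c)'$) satisfy them.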
Theorem 9.2A. If $X$ is $n \times p$ of rank $k < p \le n$, the system of equations $X'X\hat\beta = X'y$ is consistent.
Example 9.2 is worked in a separate handout and is reproduced later in this chapter (following Table 9.4).
Estimable Functions of β.
Since $\beta$ itself cannot be estimated, can we estimate some linear combination of the $\beta$'s, say $\lambda'\beta$? A linear function of parameters $\lambda'\beta$ is said to be estimable if there exists a linear combination of the observations with an expected value equal to $\lambda'\beta$; that is, $\lambda'\beta$ is estimable if there exists a vector $a$ such that $E(a'y) = \lambda'\beta$.
Theorem 9.2B. In a model $y = X\beta + \varepsilon$, where $E(y) = X\beta$ and $X$ is $n \times p$ of rank $k < p \le n$, the linear function $\lambda'\beta$ is estimable if and only if any one of the following conditions holds:
(i) $\lambda'$ is a linear combination of the rows of $X$; that is, there exists a vector $a$ such that
\[ a'X = \lambda'. \tag{9.10} \]
Proof: If there exists a vector $a$ such that $a'X = \lambda'$, then using this vector $a$, we have
\[ E(a'y) = a'E(y) = a'X\beta = \lambda'\beta. \]
(ii) $\lambda'$ is a linear combination of the rows of $X'X$, or $\lambda$ is a linear combination of the columns of $X'X$; that is, there exists a vector $r$ such that
\[ r'X'X = \lambda' \quad \text{or} \quad X'Xr = \lambda. \tag{9.11} \]
Proof: If there exists a solution $r$ for $X'Xr = \lambda$, then by defining $a = Xr$, we obtain
\[ E(a'y) = E(r'X'y) = r'X'E(y) = r'X'X\beta = \lambda'\beta. \]
(iii) $\lambda$ (or $\lambda'$) is such that
\[ X'X(X'X)^-\lambda = \lambda \quad \text{or} \quad \lambda'(X'X)^-X'X = \lambda', \tag{9.12} \]
where $(X'X)^-$ is any (symmetric) generalized inverse of $X'X$.
Proof: If $X'X(X'X)^-\lambda = \lambda$, then $(X'X)^-\lambda$ is a solution to $X'Xr = \lambda$ in part (ii). Conversely, if $X'Xr = \lambda$ for some $r$, then $X'X(X'X)^-\lambda = X'X(X'X)^-X'Xr = X'Xr = \lambda$, since $X'X(X'X)^-X'X = X'X$.
To illustrate conditions (ii) and (iii), consider the one-way model (9.3), for which
\[ X'X = \begin{pmatrix} 6&3&3\\ 3&3&0\\ 3&0&3 \end{pmatrix}. \]
To find a vector $r$ such that $X'Xr = \lambda = (0, 1, -1)'$, consider $r = (0, \tfrac{1}{3}, -\tfrac{1}{3})'$, which gives
\[ X'Xr = \begin{pmatrix} 6&3&3\\ 3&3&0\\ 3&0&3 \end{pmatrix}\begin{pmatrix} 0\\ 1/3\\ -1/3 \end{pmatrix} = \begin{pmatrix} 0\\ 1\\ -1 \end{pmatrix} = \lambda. \]
Using the generalized inverse $(X'X)^- = \mathrm{diag}(0, \tfrac{1}{3}, \tfrac{1}{3})$ given in Example 9.2, the product $X'X(X'X)^-$ becomes
\[ X'X(X'X)^- = \begin{pmatrix} 0&1&1\\ 0&1&0\\ 0&0&1 \end{pmatrix}. \]
Then, for $\lambda = (0, 1, -1)'$, the condition $X'X(X'X)^-\lambda = \lambda$ in (9.12) holds:
\[ \begin{pmatrix} 0&1&1\\ 0&1&0\\ 0&0&1 \end{pmatrix}\begin{pmatrix} 0\\ 1\\ -1 \end{pmatrix} = \begin{pmatrix} 0\\ 1\\ -1 \end{pmatrix}. \]
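Condition (9.12) is also easy to verify numerically. A minimal PROC IML sketch (the Moore-Penrose inverse returned by GINV is one valid choice of $(X'X)^-$):

proc iml;
XtX = {6 3 3, 3 3 0, 3 0 3};
lambda = {0, 1, -1};
G = ginv(XtX);            /* Moore-Penrose generalized inverse of X'X */
check = XtX * G * lambda; /* equals lambda, confirming alpha1 - alpha2 is estimable */
print check;
quit;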
Note: A set of functions $\lambda_1'\beta, \lambda_2'\beta, \ldots, \lambda_m'\beta$ is said to be linearly independent if the coefficient vectors $\lambda_1, \lambda_2, \ldots, \lambda_m$ are linearly independent.
Theorem 9.2C. In the non-full-rank model $y = X\beta + \varepsilon$, the number of linearly independent estimable functions of $\beta$ is the rank of $X$.
Note: All estimable functions can be obtained from $X\beta$ or $X'X\beta$.
Theorem 9.2D. In the model $y = X\beta + \varepsilon$, where $E(y) = X\beta$ and $X$ is $n \times p$ of rank $k < p \le n$, any estimable function $\lambda'\beta$ can be obtained by taking a linear combination of the rows (elements) of $X\beta$ or of the rows of $X'X\beta$.
Note: We can examine linear combinations of the rows of $X$ or $X'X$ to obtain a set of estimable functions of the parameters.
Example 9.2(b). Consider the model (9.6) with
\[ X = \begin{pmatrix} 1&1&0&1&0\\ 1&1&0&0&1\\ 1&0&1&1&0\\ 1&0&1&0&1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2 \end{pmatrix}. \]
To examine what is estimable, take linear combinations $a'X$ of the rows of $X$ to obtain three linearly independent rows. For example, subtract the first row of $X$ from the third row and multiply by $\beta$ to obtain $(0, -1, 1, 0, 0)\beta = \alpha_2 - \alpha_1$, which involves only the $\alpha$'s.
Subtracting the first row of $X$ from the third row can be expressed as
\[ a'X = (-1, 0, 1, 0)X = x_3' - x_1', \]
where $x_1'$ and $x_3'$ are the first and third rows of $X$. Subtracting the first row from each succeeding row in $X$ gives
\[ \begin{pmatrix} 1&1&0&1&0\\ 0&0&0&-1&1\\ 0&-1&1&0&0\\ 0&-1&1&-1&1 \end{pmatrix}. \]
Subtracting the second and third rows from the fourth row of this matrix yields
\[ \begin{pmatrix} 1&1&0&1&0\\ 0&0&0&-1&1\\ 0&-1&1&0&0\\ 0&0&0&0&0 \end{pmatrix}. \]
Multiplying the first three rows by $\beta$, we obtain the three linearly independent estimable functions
\[ \lambda_1'\beta = \mu + \alpha_1 + \beta_1, \quad \lambda_2'\beta = \beta_2 - \beta_1, \quad \lambda_3'\beta = \alpha_2 - \alpha_1. \]
These functions are identical (up to order) to the functions $\gamma_1$, $\gamma_2$, $\gamma_3$ used before for (9.6) to reparameterize to a full-rank model.
In Example 9.2(b), the two estimable functions $\alpha_2 - \alpha_1$ and $\beta_2 - \beta_1$ are such that the coefficients of the $\alpha$'s or of the $\beta$'s sum to zero. A linear combination of this type is called a contrast.
9.3 Estimators
9.3.1 Estimators of λ'β.
From Theorem 9.2B(i) and (ii) we have the estimators $a'y$ and $r'X'y$ for $\lambda'\beta$, where $a$ and $r$ satisfy $\lambda' = a'X$ and $\lambda' = r'X'X$, respectively. A third estimator of $\lambda'\beta$ is $\lambda'\hat\beta$, where $\hat\beta$ is any solution of $X'X\hat\beta = X'y$.
The properties of $r'X'y$ and $\lambda'\hat\beta$ are given in Theorem 9.3A.
Theorem 9.3A. Let $\lambda'\beta$ be an estimable function of $\beta$ in the model $y = X\beta + \varepsilon$, where $E(y) = X\beta$ and $X$ is $n \times p$ of rank $k < p \le n$. Let $\hat\beta$ be any solution to the normal equations $X'X\hat\beta = X'y$, and let $r$ be any solution to $X'Xr = \lambda$. Then the two estimators $\lambda'\hat\beta$ and $r'X'y$ have the following properties:
(i) $E(\lambda'\hat\beta) = E(r'X'y) = \lambda'\beta$;
(ii) $\lambda'\hat\beta$ and $r'X'y$ are invariant to the choice of $\hat\beta$ and $r$.
To illustrate the invariance in (ii) for the one-way model (9.3), a general solution to the normal equations can be written as
\[ \hat\beta = \begin{pmatrix} \hat\mu\\ \bar y_{1.} - \hat\mu\\ \bar y_{2.} - \hat\mu \end{pmatrix}, \]
with $\hat\mu$ arbitrary. To estimate $\alpha_1 - \alpha_2 = (0, 1, -1)\beta = \lambda'\beta$, we can set $\hat\mu = 0$ to obtain $\hat\beta = (0, \bar y_{1.}, \bar y_{2.})'$ and $\lambda'\hat\beta = \bar y_{1.} - \bar y_{2.}$. If we leave $\hat\mu$ arbitrary, we likewise obtain
\[ \lambda'\hat\beta = (0, 1, -1)\begin{pmatrix} \hat\mu\\ \bar y_{1.} - \hat\mu\\ \bar y_{2.} - \hat\mu \end{pmatrix} = (\bar y_{1.} - \hat\mu) - (\bar y_{2.} - \hat\mu) = \bar y_{1.} - \bar y_{2.}. \]
Since $\hat\beta = (X'X)^-X'y$ is not unique for the non-full-rank model $y = X\beta + \varepsilon$ with $\mathrm{cov}(y) = \sigma^2 I$, it does not have a unique covariance matrix. However, for a particular (symmetric) generalized inverse $(X'X)^-$, we can obtain the following covariance matrix:
\[ \mathrm{cov}(\hat\beta) = \mathrm{cov}[(X'X)^-X'y] = (X'X)^-X'(\sigma^2 I)X(X'X)^- = \sigma^2(X'X)^-X'X(X'X)^-. \tag{9.13} \]
The following theorem gives the variance of $r'X'y$ and $\lambda'\hat\beta$.
Theorem 9.3B. Let $\lambda'\beta$ be an estimable function in the model $y = X\beta + \varepsilon$, where $X$ is $n \times p$ of rank $k < p \le n$ and $\mathrm{cov}(y) = \sigma^2 I$. Let $r$ be any solution to $X'Xr = \lambda$, and let $\hat\beta$ be any solution to $X'X\hat\beta = X'y$. Then the variance of $\lambda'\hat\beta$ or of $r'X'y$ has the following properties:
(i) $\mathrm{var}(r'X'y) = \sigma^2 r'X'Xr = \sigma^2 r'\lambda$;
(ii) $\mathrm{var}(\lambda'\hat\beta) = \sigma^2 \lambda'(X'X)^-\lambda$.
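For example, with $(X'X)^- = \mathrm{diag}(0, \tfrac{1}{3}, \tfrac{1}{3})$ for the one-way model above and $\lambda' = (0, 1, -1)$, property (ii) gives
\[ \mathrm{var}(\hat\alpha_1 - \hat\alpha_2) = \sigma^2\lambda'(X'X)^-\lambda = \sigma^2\left(\frac{1}{3} + \frac{1}{3}\right) = \frac{2\sigma^2}{3}, \]
which agrees with $\mathrm{var}(\bar y_{1.} - \bar y_{2.}) = \sigma^2/3 + \sigma^2/3$ for two independent means of three observations each.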
We define
\[ SSE = (y - X\hat\beta)'(y - X\hat\beta), \tag{9.14} \]
where $\hat\beta$ is any solution to the normal equations $X'X\hat\beta = X'y$. Two alternative expressions for SSE are
\[ SSE = y'y - \hat\beta'X'y = y'[I - X(X'X)^-X']y. \tag{9.15} \]
For an estimator of $\sigma^2$, we define
\[ s^2 = \frac{SSE}{n - k}, \tag{9.16} \]
where $n$ is the number of rows of $X$ and $k = \mathrm{rank}(X)$.
Theorem 9.3E. For $s^2$ defined in (9.16) for the non-full-rank model $y = X\beta + \varepsilon$ with $E(y) = X\beta$ and $\mathrm{cov}(y) = \sigma^2 I$, we have the following properties:
(i) $E(s^2) = \sigma^2$;
(ii) $s^2$ is invariant to the choice of $\hat\beta$ or to the choice of generalized inverse $(X'X)^-$.
9.3.3 Normal Model
For the non-full-rank model $y = X\beta + \varepsilon$, now assume that $y$ is $N_n(X\beta, \sigma^2 I)$, or equivalently that $\varepsilon$ is $N_n(0, \sigma^2 I)$. With the normality assumption we can obtain maximum likelihood estimators.
Theorem 9.3F. If $y$ is $N_n(X\beta, \sigma^2 I)$, where $y = X\beta + \varepsilon$ and $X$ is $n \times p$ of rank $k < p \le n$, then maximum likelihood estimators for $\beta$ and $\sigma^2$ are given by
\[ \hat\beta = (X'X)^-X'y, \tag{9.17} \]
\[ \hat\sigma^2 = \frac{1}{n}(y - X\hat\beta)'(y - X\hat\beta). \tag{9.18} \]
Note: The form of the maximum likelihood estimator $\hat\beta$ in (9.17) is the same as that of the least-squares estimator. The estimator $\hat\sigma^2$ is biased; we often use the unbiased estimator $s^2$ given in (9.16).
The mean vector and covariance matrix for $\hat\beta$ are given as $E(\hat\beta) = (X'X)^-X'X\beta$ and $\mathrm{cov}(\hat\beta) = \sigma^2(X'X)^-X'X(X'X)^-$. The next theorem gives some additional properties of $\hat\beta$ and $s^2$.
Theorem 9.3G. If $y$ is $N_n(X\beta, \sigma^2 I)$, where $X$ is $n \times p$ of rank $k < p \le n$, then the maximum likelihood estimators $\hat\beta$ and $s^2$ (corrected for bias) have the following properties:
(i) $\hat\beta$ is $N_p[(X'X)^-X'X\beta, \sigma^2(X'X)^-X'X(X'X)^-]$;
(ii) $(n - k)s^2/\sigma^2$ is $\chi^2(n - k)$;
(iii) $\hat\beta$ and $s^2$ are independent.
Theorem 9.3H. If $y$ is $N_n(X\beta, \sigma^2 I)$, where $X$ is $n \times p$ of rank $k < p \le n$, and if $\lambda'\beta$ is an estimable function, then $\lambda'\hat\beta$ has minimum variance among all unbiased estimators.
Note: The estimator $\lambda'\hat\beta$ was previously shown to have minimum variance among all linear unbiased estimators. With the normality assumption added in Theorem 9.3H, $\lambda'\hat\beta$ has minimum variance among all unbiased estimators.
9.4 Reparameterization
Now, we formalize and extend this approach to obtaining a model based on estimable parameters.
In reparameterization, we transform the non-full-rank model $y = X\beta + \varepsilon$, where $X$ is $n \times p$ of rank $k < p \le n$, to the full-rank model $y = Z\gamma + \varepsilon$, where $Z$ is $n \times k$ of rank $k$ and $\gamma = U\beta$ is a set of $k$ linearly independent estimable functions of $\beta$. Thus $Z\gamma = X\beta$, and we can write
\[ Z\gamma = ZU\beta = X\beta, \tag{9.19} \]
so that $X = ZU$. Since $U$ is $k \times p$ of rank $k < p$, the $k \times k$ matrix $UU'$ is nonsingular, and we can multiply $X = ZU$ on the right by $U'$ to solve for $Z$ in terms of $X$ and $U$:
\[ ZUU' = XU', \qquad Z = XU'(UU')^{-1}. \tag{9.20} \]
Thus, the model $y = Z\gamma + \varepsilon$ is a full-rank model, and its normal equations $Z'Z\hat\gamma = Z'y$ have the unique solution $\hat\gamma = (Z'Z)^{-1}Z'y$.
In the reparameterized full-rank model $y = Z\gamma + \varepsilon$, the unbiased estimator of $\sigma^2$ is given by
\[ s^2 = \frac{(y - Z\hat\gamma)'(y - Z\hat\gamma)}{n - k} = \frac{SSE}{n - k}. \tag{9.21} \]
Since $Z\gamma = X\beta$, the estimators $Z\hat\gamma$ and $X\hat\beta$ are also equal, $Z\hat\gamma = X\hat\beta$, and therefore the SSE in (9.14) and the SSE in (9.21) are the same:
\[ (y - X\hat\beta)'(y - X\hat\beta) = (y - Z\hat\gamma)'(y - Z\hat\gamma). \tag{9.22} \]
The set $U\beta = \gamma$ is only one possible set of linearly independent estimable functions. Let $V\beta = \delta$ be another set of $k$ linearly independent estimable functions. Then there exists a matrix $W$ such that $y = W\delta + \varepsilon$. Now an estimable function $\lambda'\beta$ can be expressed as a function of $\gamma$ or of $\delta$:
\[ \lambda'\beta = b'\gamma = c'\delta. \tag{9.23} \]
Hence the corresponding estimators satisfy
\[ b'\hat\gamma = c'\hat\delta, \]
and either reparameterization gives the same estimator of $\lambda'\beta$.
Example 9.4. A reparameterization for the model $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$; $i = 1, 2$, $j = 1, 2$. The model can be written in matrix form as
\[ y = X\beta + \varepsilon: \qquad \begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{pmatrix} = \begin{pmatrix} 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1 \end{pmatrix}\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{22} \end{pmatrix}. \]
Since $X$ has rank 2, there exist two linearly independent estimable functions. We can choose these in many ways, one of which is $\mu + \alpha_1$ and $\mu + \alpha_2$. Thus
\[ \gamma = \begin{pmatrix} \gamma_1\\ \gamma_2 \end{pmatrix} = \begin{pmatrix} \mu + \alpha_1\\ \mu + \alpha_2 \end{pmatrix} = \begin{pmatrix} 1&1&0\\ 1&0&1 \end{pmatrix}\beta = U\beta. \]
To reparameterize in terms of $\gamma$, we can use
\[ Z = \begin{pmatrix} 1&0\\ 1&0\\ 0&1\\ 0&1 \end{pmatrix} \]
so that $Z\gamma = X\beta$:
\[ Z\gamma = \begin{pmatrix} \gamma_1\\ \gamma_1\\ \gamma_2\\ \gamma_2 \end{pmatrix} = \begin{pmatrix} \mu + \alpha_1\\ \mu + \alpha_1\\ \mu + \alpha_2\\ \mu + \alpha_2 \end{pmatrix} = X\beta. \]
Alternatively, the matrix $Z$ can be obtained directly using (9.20). It is easy to verify that $ZU = X$:
\[ ZU = \begin{pmatrix} 1&0\\ 1&0\\ 0&1\\ 0&1 \end{pmatrix}\begin{pmatrix} 1&1&0\\ 1&0&1 \end{pmatrix} = \begin{pmatrix} 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1 \end{pmatrix} = X. \]
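Equation (9.20) can also be checked numerically for this example; a minimal PROC IML sketch:

proc iml;
X = {1 1 0, 1 1 0, 1 0 1, 1 0 1};
U = {1 1 0, 1 0 1};          /* gamma = U*beta = (mu+alpha1, mu+alpha2)' */
Z = X * U` * inv(U * U`);    /* Z = X U'(UU')^{-1}, as in (9.20) */
print Z;                     /* rows (1 0), (1 0), (0 1), (0 1), as claimed */
quit;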
9.5 Testing Hypotheses
Now consider hypotheses about the $\beta$'s in the model $y = X\beta + \varepsilon$, where $X$ is $n \times p$ of rank $k < p \le n$. Assume that $y$ is $N_n(X\beta, \sigma^2 I)$.
Testable Hypotheses
A hypothesis such as $H_0: \beta_1 = \beta_2 = \cdots = \beta_q$ is said to be testable if there exists a set of linearly independent estimable functions $\lambda_1'\beta, \lambda_2'\beta, \ldots, \lambda_t'\beta$ such that $H_0$ is true if and only if $\lambda_1'\beta = \lambda_2'\beta = \cdots = \lambda_t'\beta = 0$.
Sometimes the subset of $\beta$'s whose equality we wish to test is such that every contrast $\sum_i c_i\beta_i$ is estimable ($\sum_i c_i\beta_i$ is a contrast if $\sum_i c_i = 0$). In this case, it is easy to find a set of $q - 1$ linearly independent estimable functions that can be set equal to zero to express $\beta_1 = \beta_2 = \cdots = \beta_q$. One such set is the following:
\[ \lambda_1'\beta = (q - 1)\beta_1 - \beta_2 - \beta_3 - \cdots - \beta_q, \]
\[ \lambda_2'\beta = (q - 2)\beta_2 - \beta_3 - \beta_4 - \cdots - \beta_q, \]
\[ \vdots \]
\[ \lambda_{q-1}'\beta = \beta_{q-1} - \beta_q. \]
These $q - 1$ contrasts $\lambda_1'\beta, \lambda_2'\beta, \ldots, \lambda_{q-1}'\beta$ constitute a set of linearly independent estimable functions such that
\[ \lambda_1'\beta = 0, \; \ldots, \; \lambda_{q-1}'\beta = 0 \]
if and only if $\beta_1 = \beta_2 = \cdots = \beta_q$.
Consider the model $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$; $i = 1, 2, 3$, $j = 1, 2, 3$, and a hypothesis of interest $H_0: \alpha_1 = \alpha_2 = \alpha_3$. By taking linear combinations of the rows of $X\beta$, we can obtain the two linearly independent estimable functions $\alpha_1 - \alpha_2$ and $\alpha_1 + \alpha_2 - 2\alpha_3$. The hypothesis $H_0: \alpha_1 = \alpha_2 = \alpha_3$ is true if and only if $\alpha_1 - \alpha_2$ and $\alpha_1 + \alpha_2 - 2\alpha_3$ are simultaneously equal to zero. Therefore, $H_0$ is a testable hypothesis and is equivalent to
\[ H_0: \; \alpha_1 - \alpha_2 = 0 \;\text{ and }\; \alpha_1 + \alpha_2 - 2\alpha_3 = 0. \tag{9.24} \]
Note: To test a testable hypothesis, we use a full-and-reduced-model approach or, alternatively, a general linear hypothesis test.
Full and Reduced Model
Consider a non-full-rank model $y = X\beta + \varepsilon$, where $\beta$ is $p \times 1$ and $X$ is $n \times p$ of rank $k < p \le n$. Suppose we want to test $H_0: \beta_1 = \beta_2 = \cdots = \beta_q$. If $H_0$ is testable, we can find a set of linearly independent estimable functions $\lambda_1'\beta, \lambda_2'\beta, \ldots, \lambda_t'\beta$ such that $H_0: \beta_1 = \beta_2 = \cdots = \beta_q$ is equivalent to
\[ H_0: \; \gamma_1 = \begin{pmatrix} \lambda_1'\beta\\ \lambda_2'\beta\\ \vdots\\ \lambda_t'\beta \end{pmatrix} = 0. \]
It is also possible to find
\[ \gamma_2 = \begin{pmatrix} \lambda_{t+1}'\beta\\ \vdots\\ \lambda_k'\beta \end{pmatrix} \]
such that the $k$ functions $\lambda_1'\beta, \ldots, \lambda_t'\beta, \lambda_{t+1}'\beta, \ldots, \lambda_k'\beta$ are linearly independent and estimable, where $k = \mathrm{rank}(X)$. Let
\[ \gamma = \begin{pmatrix} \gamma_1\\ \gamma_2 \end{pmatrix}. \]
We can now reparameterize (Section 9.4) from the non-full-rank model $y = X\beta + \varepsilon$ to the full-rank model
\[ y = Z\gamma + \varepsilon = Z_1\gamma_1 + Z_2\gamma_2 + \varepsilon, \]
where $Z = (Z_1, Z_2)$ is partitioned to conform with the number of elements in $\gamma_1$ and $\gamma_2$.
For the hypothesis $H_0: \gamma_1 = 0$, the reduced model is $y = Z_2\gamma_2^* + \varepsilon^*$. The estimate of $\gamma_2^*$ in the reduced model is the same as the estimate of $\gamma_2$ in the full model if the columns of $Z_2$ are orthogonal to those of $Z_1$, that is, if $Z_2'Z_1 = O$. For balanced models this orthogonality will typically hold. Accordingly, we refer to $\gamma_2$ and $\hat\gamma_2$ rather than to $\gamma_2^*$ and $\hat\gamma_2^*$.
Since $y = Z\gamma + \varepsilon$ is a full-rank model, the hypothesis $H_0: \gamma_1 = 0$ can be tested as in Section 8.2. The test is outlined in Table 9.2, which is analogous to Table 8.3. Note that the degrees of freedom, $t$, for $SS(\gamma_1 \mid \gamma_2)$ is the number of linearly independent estimable functions required to express $H_0$.
Table 9.2 Analysis of Variance for Testing $H_0: \gamma_1 = 0$ in Reparameterized Balanced Models

Source of Variation                  d.f.      Sum of Squares                                                  F Statistic
$\gamma_1$ adjusted for $\gamma_2$   $t$       $SS(\gamma_1 \mid \gamma_2) = \hat\gamma'Z'y - \hat\gamma_2'Z_2'y$   $\dfrac{SS(\gamma_1 \mid \gamma_2)/t}{SSE/(n - k)}$
Error                                $n - k$   $SSE = y'y - \hat\gamma'Z'y$
Total                                $n - 1$   $SST = y'y - n\bar y^2$
In Table 9.2, the sum of squares $\hat\gamma'Z'y$ is obtained from the full model $y = Z\gamma + \varepsilon$. The sum of squares $\hat\gamma_2'Z_2'y$ is obtained from the reduced model $y = Z_2\gamma_2 + \varepsilon$, which assumes the hypothesis is true. The reparameterization procedure presented above seems straightforward. However, finding the matrix $Z$ in practice can be time-consuming. Fortunately, this step is actually not necessary.
From (9.15) and (9.22), we have
\[ y'y - \hat\beta'X'y = y'y - \hat\gamma'Z'y, \]
so that
\[ \hat\beta'X'y = \hat\gamma'Z'y, \tag{9.25} \]
where $\hat\beta$ represents any solution to the normal equations $X'X\hat\beta = X'y$. Similarly, corresponding to $y = Z_2\gamma_2^* + \varepsilon^*$, we have a reduced model $y = X_2\beta_2^* + \varepsilon^*$ obtained by setting $\beta_1 = \beta_2 = \cdots = \beta_q$. Then
\[ \hat\beta_2^{*\prime}X_2'y = \hat\gamma_2^{*\prime}Z_2'y, \tag{9.26} \]
where $\hat\beta_2^*$ is any solution to the reduced normal equations $X_2'X_2\hat\beta_2^* = X_2'y$.
Theorem 9.5A. Consider the partitioned model $y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$, where $X$ is $n \times p$ of rank $k < p \le n$. If $X_2'X_1 = O$, the estimate of $\beta_2^*$ in the reduced model $y = X_2\beta_2^* + \varepsilon^*$ is the same as the estimate of $\beta_2$ in the full model.
In the balanced non-full-rank models we are considering in this chapter, the orthogonality of $X_1$ and $X_2$ will typically hold. Accordingly, we refer to $\beta_2$ and $\hat\beta_2$, rather than to $\beta_2^*$ and $\hat\beta_2^*$. The test can be expressed as in Table 9.3, in which $\hat\beta'X'y$ is obtained from the full model $y = X\beta + \varepsilon$ and $\hat\beta_2'X_2'y$ is obtained from the model $y = X_2\beta_2 + \varepsilon$, which has been reduced by the hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_q$. Note that the degrees of freedom $t$ for $SS(\beta_1 \mid \beta_2)$ is the same as for $SS(\gamma_1 \mid \gamma_2)$ in Table 9.2, namely, the number of linearly independent estimable functions required to express $H_0$. Typically, this is given by $t = q - 1$. A set of $q - 1$ linearly independent estimable functions was illustrated at the beginning of Section 9.5.
Table 9.3 Analysis of Variance for Testing $H_0: \beta_1 = \beta_2 = \cdots = \beta_q$ in Balanced Non-Full-Rank Models

Source of Variation                d.f.      Sum of Squares                                              F Statistic
$\beta_1$ adjusted for $\beta_2$   $t$       $SS(\beta_1 \mid \beta_2) = \hat\beta'X'y - \hat\beta_2'X_2'y$   $\dfrac{SS(\beta_1 \mid \beta_2)/t}{SSE/(n - k)}$
Error                              $n - k$   $SSE = y'y - \hat\beta'X'y$
Total                              $n - 1$   $SST = y'y - n\bar y^2$
General Linear Hypothesis
As illustrated in (9.24), a hypothesis such as $H_0: \alpha_1 = \alpha_2 = \alpha_3$ can be expressed in the form $H_0: C\beta = 0$. We can test this hypothesis in a manner analogous to that used for the general linear hypothesis test for the full-rank model in Section 8.4. The following theorem is an extension of Theorem 8.4a to the non-full-rank case.
Theorem 9.5B. If $y$ is distributed as $N_n(X\beta, \sigma^2 I)$ and $X$ is $n \times p$ of rank $k < p \le n$, if $C$ is $m \times p$ of rank $m \le k$ such that $C\beta$ is a set of $m$ linearly independent estimable functions, and if $\hat\beta = (X'X)^-X'y$, then:
(i) $C(X'X)^-C'$ is nonsingular and invariant to the choice of $(X'X)^-$;
(ii) $C\hat\beta$ is $N_m[C\beta, \sigma^2 C(X'X)^-C']$;
(iii) $SSH/\sigma^2 = (C\hat\beta)'[C(X'X)^-C']^{-1}C\hat\beta/\sigma^2$ is $\chi^2(m, \lambda)$, where
\[ \lambda = (C\beta)'[C(X'X)^-C']^{-1}C\beta/2\sigma^2; \]
(iv) $SSE/\sigma^2 = y'[I - X(X'X)^-X']y/\sigma^2$ is $\chi^2(n - k)$;
(v) SSH and SSE are independent.
Theorem 9.5C. Let $y$ be $N_n(X\beta, \sigma^2 I)$, where $X$ is $n \times p$ of rank $k < p \le n$, and let $C$, $C\beta$, and $\hat\beta$ be defined as in Theorem 9.5B. Then, if $H_0: C\beta = 0$ is true, the statistic
\[ F = \frac{SSH/m}{SSE/(n - k)} = \frac{(C\hat\beta)'[C(X'X)^-C']^{-1}C\hat\beta/m}{SSE/(n - k)} \tag{9.27} \]
is distributed as $F(m, n - k)$.
9.6 An Illustration of Estimation and Testing
Model: $y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$; $i = 1, 2, 3$; $j = 1, 2$.
We wish to test $H_0: \alpha_1 = \alpha_2 = \alpha_3$ and $H_0: \beta_1 = \beta_2$.
The observations in the form $y = X\beta + \varepsilon$:
\[ \begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22}\\ y_{31}\\ y_{32} \end{pmatrix} = \begin{pmatrix} 1&1&0&0&1&0\\ 1&1&0&0&0&1\\ 1&0&1&0&1&0\\ 1&0&1&0&0&1\\ 1&0&0&1&1&0\\ 1&0&0&1&0&1 \end{pmatrix}\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \alpha_3\\ \beta_1\\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{31}\\ \varepsilon_{32} \end{pmatrix} \tag{9.28} \]
The matrix $X'X$ is
\[ X'X = \begin{pmatrix} 6&2&2&2&3&3\\ 2&2&0&0&1&1\\ 2&0&2&0&1&1\\ 2&0&0&2&1&1\\ 3&1&1&1&3&0\\ 3&1&1&1&0&3 \end{pmatrix}. \]
The rank of both $X$ and $X'X$ is 4.
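The rank is easy to confirm numerically; a minimal PROC IML sketch (using the fact that the projection $X^+X$ has trace equal to $\mathrm{rank}(X)$):

proc iml;
X = {1 1 0 0 1 0,
     1 1 0 0 0 1,
     1 0 1 0 1 0,
     1 0 1 0 0 1,
     1 0 0 1 1 0,
     1 0 0 1 0 1};
r = round(trace(ginv(X)*X));  /* trace of the projection X^+ X = rank(X) */
print r;                      /* prints 4 */
quit;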
Estimable Functions
The hypothesis $H_0: \alpha_1 = \alpha_2 = \alpha_3$ can be expressed as $H_0: \alpha_1 - \alpha_2 = 0$ and $\alpha_1 - \alpha_3 = 0$. Thus $H_0$ is testable if $\alpha_1 - \alpha_2$ and $\alpha_1 - \alpha_3$ are estimable. To check $\alpha_1 - \alpha_2$ for estimability, we write it as
\[ \alpha_1 - \alpha_2 = (0, 1, -1, 0, 0, 0)\beta = \lambda_1'\beta \]
and then note that $\lambda_1'$ can be obtained from $X$ as
\[ (1, 0, -1, 0, 0, 0)X = (0, 1, -1, 0, 0, 0) \]
and from $X'X$ as
\[ (0, \tfrac{1}{2}, -\tfrac{1}{2}, 0, 0, 0)X'X = (0, 1, -1, 0, 0, 0) \]
(see Theorems 9.2B and 9.2D). Alternatively, we can obtain $\alpha_1 - \alpha_2$ as a linear combination of the rows (elements) of $E(y) = X\beta$:
\[ E(y_{11} - y_{21}) = E(y_{11}) - E(y_{21}) = (\mu + \alpha_1 + \beta_1) - (\mu + \alpha_2 + \beta_1) = \alpha_1 - \alpha_2. \]
Similarly, $\alpha_1 - \alpha_3$ can be expressed as
\[ \alpha_1 - \alpha_3 = (0, 1, 0, -1, 0, 0)\beta = \lambda_2'\beta, \]
and $\lambda_2'$ can be obtained from $X$ or $X'X$:
\[ (1, 0, 0, 0, -1, 0)X = (0, 1, 0, -1, 0, 0), \]
\[ (0, \tfrac{1}{2}, 0, -\tfrac{1}{2}, 0, 0)X'X = (0, 1, 0, -1, 0, 0). \]
It is also of interest to examine a complete set of linearly independent estimable functions obtained as linear combinations of the rows of $X$ [see Theorem 9.2D]. If we subtract the first row from each succeeding row of $X$, we obtain
\[ \begin{pmatrix} 1&1&0&0&1&0\\ 0&0&0&0&-1&1\\ 0&-1&1&0&0&0\\ 0&-1&1&0&-1&1\\ 0&-1&0&1&0&0\\ 0&-1&0&1&-1&1 \end{pmatrix}. \]
We multiply the second and third rows by $-1$ and then add them to the fourth row, with similar operations involving the second, fifth, and sixth rows. The result is
\[ \begin{pmatrix} 1&1&0&0&1&0\\ 0&0&0&0&-1&1\\ 0&-1&1&0&0&0\\ 0&0&0&0&0&0\\ 0&-1&0&1&0&0\\ 0&0&0&0&0&0 \end{pmatrix}. \]
Multiplying the nonzero rows of this matrix by $\beta$, we obtain a complete set of linearly independent estimable functions: $\mu + \alpha_1 + \beta_1$, $\beta_2 - \beta_1$, $\alpha_2 - \alpha_1$, $\alpha_3 - \alpha_1$. Note that the estimable functions not involving $\mu$ are contrasts in the $\alpha$'s or $\beta$'s.
Testing a Hypothesis
Since two linearly independent estimable functions of the $\alpha$'s are needed to express $H_0: \alpha_1 = \alpha_2 = \alpha_3$, the sum of squares for testing $H_0: \alpha_1 = \alpha_2 = \alpha_3$ has 2 degrees of freedom. Similarly, $H_0: \beta_1 = \beta_2$ is testable with 1 degree of freedom.
The normal equations $X'X\hat\beta = X'y$ are given by
\[ \begin{pmatrix} 6&2&2&2&3&3\\ 2&2&0&0&1&1\\ 2&0&2&0&1&1\\ 2&0&0&2&1&1\\ 3&1&1&1&3&0\\ 3&1&1&1&0&3 \end{pmatrix}\begin{pmatrix} \hat\mu\\ \hat\alpha_1\\ \hat\alpha_2\\ \hat\alpha_3\\ \hat\beta_1\\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.}\\ y_{3.}\\ y_{.1}\\ y_{.2} \end{pmatrix}. \tag{9.29} \]
If we impose the side conditions $\hat\alpha_1 + \hat\alpha_2 + \hat\alpha_3 = 0$ and $\hat\beta_1 + \hat\beta_2 = 0$, we obtain the following solution to the normal equations:
\[ \hat\mu = \bar y_{..}, \]
\[ \hat\alpha_1 = \bar y_{1.} - \bar y_{..}, \quad \hat\alpha_2 = \bar y_{2.} - \bar y_{..}, \quad \hat\alpha_3 = \bar y_{3.} - \bar y_{..}, \tag{9.30} \]
\[ \hat\beta_1 = \bar y_{.1} - \bar y_{..}, \quad \hat\beta_2 = \bar y_{.2} - \bar y_{..}, \]
where $\bar y_{..} = \sum_{ij} y_{ij}/6$, $\bar y_{1.} = \sum_j y_{1j}/2$, $\bar y_{.1} = \sum_i y_{i1}/3$, and so on.
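As a check that (9.30) satisfies (9.29), substituting into the first normal equation gives
\[ 6\hat\mu + 2(\hat\alpha_1 + \hat\alpha_2 + \hat\alpha_3) + 3(\hat\beta_1 + \hat\beta_2) = 6\bar y_{..} + 0 + 0 = y_{..}, \]
and, for example, the second gives
\[ 2\hat\mu + 2\hat\alpha_1 + \hat\beta_1 + \hat\beta_2 = 2\bar y_{..} + 2(\bar y_{1.} - \bar y_{..}) + 0 = 2\bar y_{1.} = y_{1.}; \]
the remaining equations are verified in the same way.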
If we impose the side conditions on both the parameters and the estimates, the expressions in (9.30) are unique estimates of unique, meaningful parameters. Thus, for example, $\alpha_1$ becomes $\alpha_1^* = \bar\mu_{1.} - \bar\mu_{..}$, the expected deviation from the mean due to treatment 1, and $\bar y_{1.} - \bar y_{..}$ is a reasonable estimate of it. On the other hand, if the side conditions are used only to obtain estimates and are not imposed on the parameters, then $\alpha_1$ is not unique, and $\bar y_{1.} - \bar y_{..}$ does not estimate a parameter. In this case, $\hat\alpha_1 = \bar y_{1.} - \bar y_{..}$ can be used only together with other elements in $\hat\beta$ [as given by (9.30)] to obtain estimates $\lambda'\hat\beta$ of estimable functions $\lambda'\beta$.
Now, to test $H_0: \alpha_1 = \alpha_2 = \alpha_3$, we follow the outline in Table 9.3:
For the full model, we need $\hat\beta'X'y = SS(\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2)$, which we denote by $SS(\mu, \alpha, \beta)$. By (9.29) and (9.30), we obtain
\[ SS(\mu, \alpha, \beta) = \hat\beta'X'y = (\hat\mu, \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\beta_1, \hat\beta_2)\begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.}\\ y_{3.}\\ y_{.1}\\ y_{.2} \end{pmatrix} \]
\[ = \hat\mu y_{..} + \hat\alpha_1 y_{1.} + \hat\alpha_2 y_{2.} + \hat\alpha_3 y_{3.} + \hat\beta_1 y_{.1} + \hat\beta_2 y_{.2} \]
\[ = \frac{y_{..}^2}{6} + \sum_{i=1}^{3}(\bar y_{i.} - \bar y_{..})y_{i.} + \sum_{j=1}^{2}(\bar y_{.j} - \bar y_{..})y_{.j} \tag{9.31} \]
\[ = \frac{y_{..}^2}{6} + \left(\sum_{i=1}^{3}\frac{y_{i.}^2}{2} - \frac{y_{..}^2}{6}\right) + \left(\sum_{j=1}^{2}\frac{y_{.j}^2}{3} - \frac{y_{..}^2}{6}\right), \]
since $\hat\alpha_i = \bar y_{i.} - \bar y_{..}$ and $\hat\beta_j = \bar y_{.j} - \bar y_{..}$. The error sum of squares SSE is given by
\[ SSE = y'y - \hat\beta'X'y = \sum_{ij} y_{ij}^2 - \frac{y_{..}^2}{6} - \left(\sum_{i=1}^{3}\frac{y_{i.}^2}{2} - \frac{y_{..}^2}{6}\right) - \left(\sum_{j=1}^{2}\frac{y_{.j}^2}{3} - \frac{y_{..}^2}{6}\right). \]
To obtain $\hat\beta_2'X_2'y$ in Table 9.3, we use the reduced model $y_{ij} = \mu + \alpha + \beta_j + \varepsilon_{ij} = \mu^* + \beta_j + \varepsilon_{ij}$, where $\alpha_1 = \alpha_2 = \alpha_3 = \alpha$ and $\mu + \alpha$ is replaced by $\mu^*$. The normal equations $X_2'X_2\hat\beta_2 = X_2'y$ for the reduced model are
\[ 6\hat\mu^* + 3\hat\beta_1 + 3\hat\beta_2 = y_{..}, \]
\[ 3\hat\mu^* + 3\hat\beta_1 = y_{.1}, \tag{9.32} \]
\[ 3\hat\mu^* + 3\hat\beta_2 = y_{.2}. \]
Using the side condition $\hat\beta_1 + \hat\beta_2 = 0$, the solution to the reduced normal equations in (9.32) is easily obtained as
\[ \hat\mu^* = \bar y_{..}, \quad \hat\beta_1 = \bar y_{.1} - \bar y_{..}, \quad \hat\beta_2 = \bar y_{.2} - \bar y_{..}. \tag{9.33} \]
By (9.32) and (9.33), we have
\[ SS(\mu, \beta) = \hat\beta_2'X_2'y = \hat\mu^* y_{..} + \hat\beta_1 y_{.1} + \hat\beta_2 y_{.2} = \frac{y_{..}^2}{6} + \sum_{j=1}^{2}\frac{y_{.j}^2}{3} - \frac{y_{..}^2}{6}. \tag{9.34} \]
Denoting $SS(\alpha_1, \alpha_2, \alpha_3 \mid \mu, \beta_1, \beta_2)$ by $SS(\alpha \mid \mu, \beta)$, we have
\[ SS(\alpha \mid \mu, \beta) = \hat\beta'X'y - \hat\beta_2'X_2'y = \sum_{i=1}^{3}\frac{y_{i.}^2}{2} - \frac{y_{..}^2}{6}. \tag{9.35} \]
The test is summarized in Table 9.4.
Table 9.4 Analysis of Variance for Testing $H_0: \alpha_1 = \alpha_2 = \alpha_3$

Source of Variation                    d.f.   Sum of Squares                                                            F Statistic
$\alpha$ adjusted for $\mu$, $\beta$   2      $SS(\alpha \mid \mu, \beta) = \sum_{i=1}^{3} y_{i.}^2/2 - y_{..}^2/6$     $\dfrac{SS(\alpha \mid \mu, \beta)/2}{SSE/2}$
Error                                  2      $SSE = \sum_{ij} y_{ij}^2 - \hat\beta'X'y$
Total                                  5      $SST = \sum_{ij} y_{ij}^2 - y_{..}^2/6$
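The test in Table 9.4 can be reproduced in SAS with PROC GLM. The following sketch uses hypothetical responses, since none are given above; the Type III sum of squares for a is $SS(\alpha \mid \mu, \beta)$ with 2 degrees of freedom:

data twoway;
input a b y;
cards;
1 1 10
1 2 12
2 1 11
2 2 14
3 1 9
3 2 13
;
proc glm;
class a b;
model y=a b; /* additive two-way model; Type III SS for a tests H0: alpha1=alpha2=alpha3 */
run;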
Example 9.2. Consider the model $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$; $i = 1, 2$, $j = 1, 2, 3$, with the matrix $X$ and the vector $\beta$ given as
\[ X = \begin{pmatrix} 1&1&0\\ 1&1&0\\ 1&1&0\\ 1&0&1\\ 1&0&1\\ 1&0&1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2 \end{pmatrix}. \]
Direct multiplication gives
\[ X'X = \begin{pmatrix} 6&3&3\\ 3&3&0\\ 3&0&3 \end{pmatrix}, \]
and by the corollary in the Note below (with $A_{22} = \mathrm{diag}(3, 3)$ nonsingular), a generalized inverse of $X'X$ is given by
\[ (X'X)^- = \begin{pmatrix} 0&0&0\\ 0&1/3&0\\ 0&0&1/3 \end{pmatrix}. \]
The vector $X'y$ is given by
\[ X'y = \begin{pmatrix} 1&1&1&1&1&1\\ 1&1&1&0&0&0\\ 0&0&0&1&1&1 \end{pmatrix}\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{pmatrix} = \begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.} \end{pmatrix}, \]
where $y_{..} = \sum_{i=1}^{2}\sum_{j=1}^{3} y_{ij}$ and $y_{i.} = \sum_{j=1}^{3} y_{ij}$. Then
\[ \hat\beta = (X'X)^-X'y = \begin{pmatrix} 0&0&0\\ 0&1/3&0\\ 0&0&1/3 \end{pmatrix}\begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.} \end{pmatrix} = \begin{pmatrix} 0\\ \bar y_{1.}\\ \bar y_{2.} \end{pmatrix}, \]
where $\bar y_{i.} = \frac{1}{3}\sum_{j=1}^{3} y_{ij} = \frac{1}{3}y_{i.}$.
To find $E(\hat\beta)$, we need $E(\bar y_{i.})$. Since $E(\varepsilon) = 0$, we have $E(\varepsilon_{ij}) = 0$. Then,
\[ E(\bar y_{i.}) = E\left(\sum_{j=1}^{3}\frac{y_{ij}}{3}\right) = \frac{1}{3}\sum_{j=1}^{3}E(y_{ij}) = \frac{1}{3}\sum_{j=1}^{3}\big(\mu + \alpha_i + E(\varepsilon_{ij})\big) = \frac{1}{3}(3\mu + 3\alpha_i + 0) = \mu + \alpha_i. \]
Thus,
\[ E(\hat\beta) = \begin{pmatrix} 0\\ \mu + \alpha_1\\ \mu + \alpha_2 \end{pmatrix}. \]
The same result is obtained using $E(\hat\beta) = (X'X)^-X'X\beta$:
\[ E(\hat\beta) = \begin{pmatrix} 0&0&0\\ 0&1/3&0\\ 0&0&1/3 \end{pmatrix}\begin{pmatrix} 6&3&3\\ 3&3&0\\ 3&0&3 \end{pmatrix}\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2 \end{pmatrix} = \begin{pmatrix} 0&0&0\\ 1&1&0\\ 1&0&1 \end{pmatrix}\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2 \end{pmatrix} = \begin{pmatrix} 0\\ \mu + \alpha_1\\ \mu + \alpha_2 \end{pmatrix}. \]
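The computations in Example 9.2 are easy to check numerically; a minimal PROC IML sketch with hypothetical responses:

proc iml;
y = {15, 26, 20, 19, 15, 10};  /* any six values will do */
X = {1 1 0, 1 1 0, 1 1 0, 1 0 1, 1 0 1, 1 0 1};
XtX = X` * X;                  /* = {6 3 3, 3 3 0, 3 0 3} */
G = diag(0 // 1/3 // 1/3);     /* generalized inverse from the corollary in the Note below */
bhat = G * X` * y;             /* = (0, ybar1., ybar2.)' */
print XtX bhat;
quit;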
Note:
Theorem: Suppose $A$ is $n \times p$ of rank $r$ and that $A$ is partitioned as
\[ A = \begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}, \]
where $A_{11}$ is $r \times r$ of rank $r$. Then a generalized inverse of $A$ is given by
\[ A^- = \begin{pmatrix} A_{11}^{-1} & O\\ O & O \end{pmatrix}, \]
where the three $O$ matrices are of appropriate sizes so that $A^-$ is $p \times n$.
Corollary: Suppose $A$ is $n \times p$ of rank $r$ and that $A$ is partitioned as above, where $A_{22}$ is $r \times r$ of rank $r$. Then a generalized inverse of $A$ is given by
\[ A^- = \begin{pmatrix} O & O\\ O & A_{22}^{-1} \end{pmatrix}, \]
where the three $O$ matrices are of appropriate sizes so that $A^-$ is $p \times n$.
ESTIMATION IN THE LESS THAN FULL RANK MODEL
Example 9.3(a): It is known that a toxic material was dumped in a river that flows into a large saltwater commercial fishing area. Civil engineers are interested in the amount of toxic material, in parts per million, found in oysters harvested at three different locations, ranging from the estuary to the bay itself. The data are given in the SAS code below.
SAS CODE:
data toxic;
input y x1 x2 x3;
cards;
15 1 0 0
26 1 0 0
20 1 0 0
20 1 0 0
29 1 0 0
28 1 0 0
21 1 0 0
26 1 0 0
19 0 1 0
15 0 1 0
10 0 1 0
26 0 1 0
11 0 1 0
20 0 1 0
13 0 1 0
15 0 1 0
18 0 1 0
22 0 0 1
26 0 0 1
24 0 0 1
26 0 0 1
15 0 0 1
17 0 0 1
24 0 0 1
;
proc glm;
model y=x1 x2 x3/xpx i;
title1 'Finding a Conditional';
title2 'Inverse and Estimating';
title3 'The Variance in the Less';
title4 'Than Full Rank Model';
/* 't1-t2' is a label for the contrast alpha1 - alpha2 */
/* x1 1 x2 -1 forms the vector t' = (0 1 -1 0) that defines the contrast alpha1 - alpha2 */
estimate 't1-t2' x1 1 x2 -1;
/* 't1' is a label that says we are trying to estimate alpha1 */
/* x1 1 forms the vector t' = (0 1 0 0) used to express alpha1 in the form t'beta */
estimate 't1' x1 1;
run;
OUTPUT:
Finding a Conditional
Inverse and Estimating
The Variance in the Less
Than Full Rank Model
The GLM Procedure
Number of Observations Read 24
Number of Observations Used 24
The GLM Procedure
The X'X Matrix
Intercept x1 x2 x3 y
Intercept 24 8 9 7 486
x1 8 8 0 0 185
x2 9 0 9 0 147
x3 7 0 0 7 154
y 486 185 147 154 10546
The GLM Procedure
X'X Generalized Inverse (g2)
Intercept x1 x2 x3 y
Intercept 0.1428571429 ‐0.142857143 ‐0.142857143 0 22
x1 ‐0.142857143 0.2678571429 0.1428571429 0 1.125
x2 ‐0.142857143 0.1428571429 0.253968254 0 ‐5.666666667
x3 0 0 0 0 0
y 22 1.125 ‐5.666666667 0 478.875
The GLM Procedure
Dependent Variable: y
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 225.6250000 112.8125000 4.95 0.0174
Error 21 478.8750000 22.8035714 (1)
Corrected Total 23 704.5000000
R‐Square Coeff Var Root MSE y Mean
0.320263 23.58177 4.775309 20.25000
Source DF Type I SS Mean Square F Value Pr > F
x1 1 99.1875000 99.1875000 4.35 0.0494
x2 1 126.4375000 126.4375000 5.54 0.0283
x3 0 0.0000000 . . .
Source DF Type III SS Mean Square F Value Pr > F
x1 0 0 . . .
x2 0 0 . . .
x3 0 0 . . .
Standard
Parameter Estimate Error t Value Pr > |t|
t1‐t2 6.79166667(2) 2.32038285(3) 2.93 0.0081
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 22.00000000 B 1.80489697 12.19 <.0001
x1 1.12500000 B 2.47145696 0.46 0.6536
x2 ‐5.66666667 B 2.40652929 ‐2.35 0.0283
x3 0.00000000 B . . .
(4)
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
the normal equations. Terms whose estimates are followed by the letter 'B' are not
uniquely estimable.
(1) is the estimated variance $s^2$. (2) is the estimated difference between $\alpha_1$ and $\alpha_2$; its standard error $s\sqrt{t'(X'X)_c^- t}$ is given at (3). Estimates for $\mu$, $\alpha_1$, $\alpha_2$, and $\alpha_3$ are given at (4). These estimates are not unique; they are based on the conditional (generalized) inverse found above.
HYPOTHESIS TESTING IN THE LESS THAN FULL RANK MODEL
Example 9.3(b): Three different treatment methods for removing organic carbon from tar-sand wastewater are to be compared. The methods are air flotation (AF), foam separation (FS), and ferric-chloride coagulation (FCC). From the data given in the SAS code below, we wish to test
\[ H_0: \alpha_1 = \alpha_2 = \alpha_3. \]
In matrix form, we are testing
\[ H_0: C\beta = 0, \]
where
\[ C = \begin{pmatrix} 0&1&-1&0\\ 0&1&0&-1 \end{pmatrix} \quad \text{and} \quad \beta = \begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \alpha_3 \end{pmatrix}. \]
The F statistic used to test $H_0$, from (8.21), is
\[ F = \frac{SSH/q}{SSE/(n - k)} = \frac{(C\hat\beta)'[C(X'X)_c^- C']^{-1}C\hat\beta/q}{SSE/(n - k)}, \]
where $q = 2$ is the number of rows of $C$ and $k = \mathrm{rank}(X) = 3$.
For this model,
\[ X = \begin{pmatrix} 1&1&0&0\\ \vdots&\vdots&\vdots&\vdots\\ 1&1&0&0\\ 1&0&1&0\\ \vdots&\vdots&\vdots&\vdots\\ 1&0&1&0\\ 1&0&0&1\\ \vdots&\vdots&\vdots&\vdots\\ 1&0&0&1 \end{pmatrix} \text{(10 rows per treatment)}, \qquad X'X = \begin{pmatrix} 30&10&10&10\\ 10&10&0&0\\ 10&0&10&0\\ 10&0&0&10 \end{pmatrix}, \]
and a conditional (generalized) inverse of $X'X$ is
\[ (X'X)_c^- = \begin{pmatrix} 0&0&0&0\\ 0&1/10&0&0\\ 0&0&1/10&0\\ 0&0&0&1/10 \end{pmatrix}. \]
Using this conditional inverse,
\[ \hat\beta = (X'X)_c^- X'y = \begin{pmatrix} 0\\ 36.25\\ 44.00\\ 28.18 \end{pmatrix}, \qquad C(X'X)_c^- C' = \begin{pmatrix} 0.2&0.1\\ 0.1&0.2 \end{pmatrix}, \]
\[ [C(X'X)_c^- C']^{-1} = \frac{1}{0.03}\begin{pmatrix} 0.2&-0.1\\ -0.1&0.2 \end{pmatrix}, \qquad C\hat\beta = \begin{pmatrix} -7.75\\ 8.07 \end{pmatrix}. \]
The numerator of the F ratio used to test $H_0$ is
\[ \frac{SSH}{2} = \frac{(C\hat\beta)'[C(X'X)_c^- C']^{-1}C\hat\beta}{2} = \frac{1251.533}{2} = 625.766. \]
The residual sum of squares for these data can be shown to be 278.661. Thus, the F ratio for testing $H_0: \alpha_1 = \alpha_2 = \alpha_3$ is
\[ F_{2,27} = \frac{625.766}{278.661/27} = 60.63. \]
This $F_{2,27}$ value exceeds $F_{0.05,2,27} = 3.354$, so we have enough evidence to reject the null hypothesis. We conclude that the three different treatment methods give different results in removing organic carbon from tar-sand wastewater.
PROC GLM is used to test hypotheses in the less than full rank model.
SAS CODE:
data tar;
input y x1 x2 x3;
cards;
34.6 1 0 0
35.1 1 0 0
35.3 1 0 0
35.8 1 0 0
36.1 1 0 0
36.5 1 0 0
36.8 1 0 0
37.2 1 0 0
37.4 1 0 0
37.7 1 0 0
38.8 0 1 0
39.0 0 1 0
40.1 0 1 0
40.9 0 1 0
41.0 0 1 0
43.2 0 1 0
44.9 0 1 0
46.9 0 1 0
51.6 0 1 0
53.6 0 1 0
26.7 0 0 1
26.7 0 0 1
27.0 0 0 1
27.1 0 0 1
27.5 0 0 1
28.1 0 0 1
28.1 0 0 1
28.7 0 0 1
30.7 0 0 1
31.2 0 0 1
;
proc glm; /* requests the general linear models procedure */
model y=x1 x2 x3; /* identifies the independent variables as x1, x2, x3 and y as the response variable */
contrast 'equal means' x1 1 x2 -1 x3 0, /* asks GLM to test H0: C*beta = 0; the coefficients */
x1 1 x2 0 x3 -1; /* listed after the variable names form the two rows of the matrix C */
run;
OUTPUT:
The GLM Procedure
Number of Observations Read 30
Number of Observations Used 30
Dependent Variable: y
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 1251.532667 625.766333 60.63 <.0001
Error 27 278.661000(B) 10.320778(C)
Corrected Total 29 1530.193667
R‐Square Coeff Var Root MSE y Mean
0.817892 8.888490 3.212597 36.14333
Source DF Type I SS Mean Square F Value Pr > F
x1 1 0.170667 0.170667 0.02 0.8986
x2 1 1251.362000 1251.362000 121.25 <.0001
x3 0 0.000000 . . .
Source DF Type III SS Mean Square F Value Pr > F
x1 0 0 . . .
x2 0 0 . . .
x3 0 0 . . .
Contrast DF Contrast SS Mean Square F Value Pr > F
equal means 2 1251.532667(A) 625.766333 60.63(D) <.0001
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 28.18000000 B 1.01591229 27.74 <.0001
x1 8.07000000 B 1.43671694 5.62 <.0001
x2 15.82000000 B 1.43671694 11.01 <.0001
x3 0.00000000 B . . .
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
the normal equations. Terms whose estimates are followed by the letter 'B' are not
uniquely estimable
From the output, the SAS NOTE indicates that parameter estimates are not unique in the less than full rank model. The estimates found are based on a conditional inverse. The sum of squares associated with the hypothesis $H_0: C\beta = 0$ is
\[ SSH = (C\hat\beta)'[C(X'X)_c^- C']^{-1}C\hat\beta = 1251.533. \]
This sum of squares is shown at (A). The residual sum of squares and $s^2$ are given at (B) and (C), respectively. The F ratio used to test $H_0: C\beta = 0$ is given at (D). Note that any testable hypothesis can be tested via an appropriately chosen CONTRAST statement. The estimates for $\mu$, $\alpha_1$, $\alpha_2$, and $\alpha_3$ shown in the output are different from those calculated earlier because these estimates are not unique.
Example 9.3(c): A one-way ANOVA with fixed effects based on the reparameterized model can be run easily in SAS using PROC GLM or PROC ANOVA.
SAS CODE:
data tar;
input method $ remove;
cards;
AF 34.6
AF 35.1
AF 35.3
AF 35.8
AF 36.1
AF 36.5
AF 36.8
AF 37.2
AF 37.4
AF 37.7
FS 38.8
FS 39.0
FS 40.1
FS 40.9
FS 41.0
FS 43.2
FS 44.9
FS 46.9
FS 51.6
FS 53.6
FCC 26.7
FCC 26.7
FCC 27.0
FCC 27.1
FCC 27.5
FCC 28.1
FCC 28.1
FCC 28.7
FCC 30.7
FCC 31.2
;
proc glm;
class method; /* indicates that data are grouped according to the values
of the variable METHOD*/
model remove=method; /*identifies the variable REMOVE as the response variable*/
title 'coal-tar data'; /* titles the output */
run;
OUTPUT:
coal‐tar data
The GLM Procedure
Class Level Information
Class Levels Values
method 3 AF FCC FS
Number of Observations Read 30
Number of Observations Used 30
The GLM Procedure
Dependent Variable: remove
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 1251.532667(B) 625.766333 60.63(A) <.0001
Error 27 278.661000(C) 10.320778
Corrected Total 29 1530.193667
R‐Square Coeff Var Root MSE remove Mean
0.817892 8.888490 3.212597 36.14333
Source DF Type I SS Mean Square F Value Pr > F
method 2 1251.532667 625.766333 60.63 <.0001
Source DF Type III SS Mean Square F Value Pr > F
method 2 1251.532667 625.766333 60.63 <.0001
From the output, the F ratio used to test $H_0: \mu_1 = \mu_2 = \mu_3$ (equivalently, $\alpha_1 = \alpha_2 = \alpha_3$) is shown at (A). The ANOVA table is based on the corrected total sum of squares. The model (regression) sum of squares is shown at (B), and the error sum of squares at (C).
Example 9.3(d): Using the CONTRAST and ESTIMATE Statements with Unbalanced Data
Consider the toxic data in Example 9.3(a). We now want to test the significance of the differences between site means with the CONTRAST statement.
data toxic;
input site toxic;
cards;
1 15
1 26
1 20
1 20
1 29
1 28
1 21
1 26
2 19
2 15
2 20
2 10
2 26
2 11
2 13
2 15
2 18
3 22
3 26
3 24
3 26
3 15
3 17
3 24
;
proc glm;
class site;
model toxic=site;
contrast 'site_1-site_2' site 1 -1 0;
estimate 'site_1-site_2' site 1 -1 0;
contrast 'site_1-site_3' site 1 0 -1;
estimate 'site_1-site_3' site 1 0 -1;
contrast 'site_2-site_3' site 0 1 -1;
estimate 'site_2-site_3' site 0 1 -1;
run;
OUTPUT:
The GLM Procedure
Class Level Information
Class Levels Values
site 3 1 2 3
Number of Observations Read 24
Number of Observations Used 24
The GLM Procedure
Dependent Variable: toxic
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 225.6250000 112.8125000 4.95 0.0174
Error 21 478.8750000 22.8035714
Corrected Total 23 704.5000000
R‐Square Coeff Var Root MSE toxic Mean
0.320263 23.58177 4.775309 20.25000
Source DF Type I SS Mean Square F Value Pr > F
site 2 225.6250000 112.8125000 4.95 0.0174
Source DF Type III SS Mean Square F Value Pr > F
site 2 225.6250000 112.8125000 4.95 0.0174
Contrast DF Contrast SS Mean Square F Value Pr > F
site_1‐site_2 1 195.3602941 195.3602941 8.57 0.0081
site_1‐site_3 1 4.7250000 4.7250000 0.21 0.6536
site_2‐site_3 1 126.4375000 126.4375000 5.54 0.0283
Standard
Parameter Estimate Error t Value Pr > |t|
site_1‐site_2 6.79166667 2.32038285 2.93 0.0081
site_1‐site_3 1.12500000 2.47145696 0.46 0.6536
site_2 ‐site_3 ‐5.66666667 2.40652929 ‐2.35 0.0283
NOTE: For each pairwise contrast, the CONTRAST statement produces the same sum of squares (SS), mean square, F test, and p-value as the corresponding Type III ANOVA F test of the difference between means (a test of $H_0: \mu_A = \mu_B$).
From the output:
1. The estimated difference between $\mu_1$ and $\mu_2$ is 6.7917, and the standard error of the estimate is 2.32.
2. The estimated difference between $\mu_1$ and $\mu_3$ is 1.125, and the standard error of the estimate is 2.47.
3. The estimated difference between $\mu_2$ and $\mu_3$ is -5.67, and the standard error of the estimate is 2.41.
4. The t statistic for testing $H_0: \mu_1 = \mu_2$ is $t = 6.7917/2.32 = 2.93$; its p-value is 0.0081.
5. The t statistic for testing $H_0: \mu_1 = \mu_3$ is $t = 1.125/2.471 = 0.46$; its p-value is 0.6536.
6. The t statistic for testing $H_0: \mu_2 = \mu_3$ is $t = -5.666/2.4065 = -2.35$; its p-value is 0.0283.
7. We decide whether to reject each null hypothesis by referring to the corresponding p-value in the output.
8. From this example, we have enough evidence to reject the null hypothesis of equal means between sites 1 and 2 and between sites 2 and 3 (since p-value < 0.05), whereas there is not enough evidence to reject the null hypothesis of equal means between sites 1 and 3. We conclude that the amount of toxic material is equal between sites 1 and 3 but differs between sites 1 and 2 and between sites 2 and 3.
9. To obtain a 95% confidence interval for the difference in means between two groups, use
\[ (\bar y_A - \bar y_B) \pm t_{0.025,21}\sqrt{MSE\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}. \]
From the output, MSE = 22.8036, and from a statistical table, $t_{0.025,21} = 2.080$; therefore:
\[ \mu_1 - \mu_2: \; 6.792 \pm 2.080\sqrt{22.8036\left(\tfrac{1}{8} + \tfrac{1}{9}\right)}; \]
\[ \mu_1 - \mu_3: \; 1.125 \pm 2.080\sqrt{22.8036\left(\tfrac{1}{8} + \tfrac{1}{7}\right)}; \]
\[ \mu_2 - \mu_3: \; -5.667 \pm 2.080\sqrt{22.8036\left(\tfrac{1}{9} + \tfrac{1}{7}\right)}. \]
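These pairwise intervals can also be requested directly in PROC GLM with the LSMEANS statement; a minimal sketch using the toxic data set above:

proc glm data=toxic;
class site;
model toxic=site;
lsmeans site / pdiff cl; /* pairwise differences of site means with 95% confidence limits */
run;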