Stat 3515 Lecture Notes No 1 s2022 Single Factor Completely Randomized Experiments
A completely randomized design is particularly useful when the experimental units are quite
homogeneous. This design is very flexible; it accommodates any number of treatments and permits
different sample sizes for each of the treatments. Its chief disadvantage is that, when the experimental
units are heterogeneous, this design is not as efficient as other statistical designs.
Terminology:
A level of a factor is a particular form of that factor. In the synthetic fiber study, the product
development engineer has selected fibers with 15%, 20%, 25%, 30% and 35% cotton. These are the
five levels of the factor in that study. In the cereal study, there are four levels for the factor of package
design. In the first example the factor is a quantitative one, while in the second example it is a
qualitative one.
In a single factor experiment, a treatment corresponds to a factor level. In multi-factor studies,
a treatment corresponds to a combination of factor levels. In a single factor experiment, if the
levels of the factor are chosen at random we say the model is a random (effects) model; otherwise
it is called a fixed (effects) model.
Type of Data:
Observational Data - data that is obtained without controlling the independent variable(s) of interest.
Experimental Data - data that is obtained by the experimenter by controlling the independent
variable(s).
Tensile strength data from the synthetic fiber study (five observations at each cotton percentage):

Cotton %   Observations              Total    Mean
   15       7   7  15  11   9          49      9.8
   20      12  17  12  18  18          77     15.4
   25      14  18  18  19  19          88     17.6
   30      19  25  22  19  23         108     21.6
   35       7  10  11  15  11          54     10.8
                                      ____    _____
                                       376    15.04
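The notes use SAS for the computations; purely as an illustration, the following Python sketch
(the dictionary name data is hypothetical scaffolding, not part of the original analysis) recomputes
the group totals and means from the raw observations:

    # Recompute the group totals and means from the raw observations.
    data = {
        15: [7, 7, 15, 11, 9],
        20: [12, 17, 12, 18, 18],
        25: [14, 18, 18, 19, 19],
        30: [19, 25, 22, 19, 23],
        35: [7, 10, 11, 15, 11],
    }

    for pct, obs in data.items():
        print(f"{pct}% cotton: total = {sum(obs)}, mean = {sum(obs) / len(obs):.1f}")

    all_obs = [y for obs in data.values() for y in obs]
    print(f"Grand total = {sum(all_obs)}, grand mean = {sum(all_obs) / len(all_obs):.2f}")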
It is always a good idea to examine experimental data graphically. For example, one can
present a boxplot and/or a scatter plot of tensile strength versus cotton percentage. In the SAS output
the letters are the individual observations and the rectangles in the boxplot are the sample means. Both
graphs indicate that tensile strength increases as cotton content increases, up to 30% cotton. Beyond
30% cotton there is a sizable decrease in tensile strength. The scatter diagram also suggests that the
variability does not depend on cotton content. From the graphical display one would suspect that
cotton content affects tensile strength and that around 30% cotton one would obtain maximum strength.
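The plots referred to above come from SAS; as a rough stand-in, a matplotlib sketch (assuming the
data dictionary from the earlier snippet) that produces a comparable boxplot and scatter plot is:

    # Boxplot and scatter plot of tensile strength vs. cotton percentage (illustrative only).
    import matplotlib.pyplot as plt

    levels = sorted(data)                      # cotton percentages
    groups = [data[p] for p in levels]         # observations for each level

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Boxplot of tensile strength by cotton percentage
    ax1.boxplot(groups)
    ax1.set_xticks(range(1, len(levels) + 1))
    ax1.set_xticklabels([str(p) for p in levels])
    ax1.set_xlabel("Cotton percentage")
    ax1.set_ylabel("Tensile strength")

    # Scatter plot of the individual observations
    for p in levels:
        ax2.scatter([p] * len(data[p]), data[p])
    ax2.set_xlabel("Cotton percentage")
    ax2.set_ylabel("Tensile strength")

    plt.tight_layout()
    plt.show()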
Model:

    Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad j = 1, \ldots, n_i, \quad i = 1, \ldots, a,

where Yij is the jth observation for the ith treatment, μ is the overall mean representing the common effect
for the entire experiment, τi is the effect of the ith treatment, and εij is the random error present in the
jth observation for the ith treatment. The model assumptions are:

1. the errors εij are independent normal random variables with mean 0 and common variance σ²;

2. \sum_{i=1}^{a} \tau_i = 0.
From the expression for the model it follows that for 1 ≤ j ≤ ni and 1 ≤ i ≤ a,

    E(Yij) = μ + τi = μi
is the mean of the observations in the ith group. The analysis of this experiment consists of testing

    H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0 \qquad \text{versus} \qquad H_a: \tau_i \ne 0 \text{ for at least one } i.

To test the above hypothesis the F test in a one-way analysis of variance is used. The anova
approach has two purposes. First, it provides a subdivision of the total variability between the
experimental units into separate components, each component representing a different source of
variability, so that the relative importance of the different sources can be assessed. Second, and more
important, it gives an estimate of the underlying variability between units which provides a basis for
inferences about the effects of the applied treatments. We now proceed to
develop this for our model.
Notation:

    Y_{i.} = \sum_{j=1}^{n_i} Y_{ij}, \qquad Y_{..} = \sum_{i=1}^{a} \sum_{j=1}^{n_i} Y_{ij}, \qquad N = \sum_{i=1}^{a} n_i,

    \bar{Y}_{i.} = Y_{i.} / n_i, \qquad \bar{Y}_{..} = Y_{..} / N = \sum_{i=1}^{a} Y_{i.} / N.
    Y_{ij} - \bar{Y}_{..} = (Y_{ij} - \bar{Y}_{i.}) + (\bar{Y}_{i.} - \bar{Y}_{..}).
The above equation states that the deviation of each observation from the overall mean can be
decomposed into two parts: the deviation of the observation from its treatment mean plus the
deviation of the treatment mean from the overall mean. If we square both sides of the above equation
and sum over i and j we get the following fundamental equation of the analysis of variance:
    \sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2 + \sum_{i=1}^{a} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2,
or

    SS total = SS error + SS treatment.
The term on the left hand side represents the total variability in the data. The first term on the right
hand side of the identity represents the total variability within each of the a treatments.
Since we have assumed that the variances within the a treatments are equal, if we divide that term by

    \sum_{i=1}^{a} (n_i - 1) = N - a

we get an unbiased estimator of the variance σ², which is valid regardless of the null hypothesis being
true or not.
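As a numerical check of this decomposition on the cotton data, a short Python sketch (again assuming
the data dictionary defined in the earlier snippet) is:

    # Compute SS total, SS treatment, SS error, and the pooled variance estimate MSE.
    a = len(data)
    N = sum(len(obs) for obs in data.values())
    grand_mean = sum(y for obs in data.values() for y in obs) / N

    ss_error = sum(sum((y - sum(obs) / len(obs)) ** 2 for y in obs) for obs in data.values())
    ss_treatment = sum(len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2 for obs in data.values())
    ss_total = sum((y - grand_mean) ** 2 for obs in data.values() for y in obs)

    mse = ss_error / (N - a)   # unbiased estimate of sigma^2, valid whether or not H0 holds
    print(ss_total, ss_treatment + ss_error)   # these two numbers should agree
    print(mse)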
Now, if H0 is true, then the second term divided by a - 1 is also an unbiased estimator of σ².
Moreover, the two estimators are independent of each other and their quotient, denoted by

    F = \frac{SS_{treatment}/(a-1)}{SS_{error}/(N-a)} = \frac{MS_{treatment}}{MS_{error}},

has an F distribution with a-1 and N-a degrees of freedom. Since the numerator gets large when H0
is not true, while the denominator remains stable, we reject H0 for large values of F. Table IV on
pages A-6 to A-10 gives the critical values corresponding to upper tail area α for the F distribution, for
selected values of α.
For the cotton data:

SS total = 636.96
SS treatment = 475.76
SS error = 161.20
F = (475.76 / 4) / (161.20 / 20) = 14.76
From Table IV, page A-10 in the Appendix, we get that the critical value for our data set for α = .01
is 4.43 (ν1 = 4, ν2 = 20). Therefore, we can reject the null hypothesis at the .01 level. A more accurate
result can be obtained from the SAS output.
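As an alternative to Table IV or the SAS output, the critical value and an exact p-value can be
computed with scipy; a sketch, reusing the quantities from the earlier snippets:

    # F test for the one-way ANOVA: critical value and p-value.
    from scipy import stats

    df1, df2 = a - 1, N - a                       # 4 and 20 for the cotton data
    f_stat = (ss_treatment / df1) / (ss_error / df2)

    critical_value = stats.f.ppf(0.99, df1, df2)  # upper 1% point, about 4.43
    p_value = stats.f.sf(f_stat, df1, df2)        # P(F >= f_stat) under H0

    print(f"F = {f_stat:.2f}, critical value = {critical_value:.2f}, p-value = {p_value:.2e}")
    # Since f_stat exceeds the critical value (equivalently, p_value < .01), reject H0.

    # The same F statistic and p-value are returned by scipy.stats.f_oneway(*data.values()).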