Balanced ANOVA
Analysis of Variance (ANOVA)
Review: Two-Sample T-Test
• Comparison between two groups
(two groups in a categorical variable like
Female/Male or Hispanic/non-Hispanic)
• Observed samples from each population
• Assume underlying normality in each population
• Use rank-based methods when not normal
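As a minimal sketch of the two-sample comparisons reviewed above, the following runs the unequal-variance (Welch) t-test, the equal-variance (pooled) t-test, and a rank-based alternative. It assumes R's built-in `ToothGrowth` data (columns `len` and `supp`) as a stand-in data set; the object names are illustrative.

```r
# Two-sample comparison on R's built-in ToothGrowth data (an assumed stand-in).
# len = tooth length, supp = supplement type (OJ or VC).
data(ToothGrowth)

# Welch t-test: does not assume equal variances (the default in t.test).
welch <- t.test(len ~ supp, data = ToothGrowth)

# Pooled t-test: assumes equal variances in the two groups.
pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Rank-based alternative when normality is doubtful (Wilcoxon rank-sum test).
# exact = FALSE requests the normal approximation, since the data contain ties.
rank_based <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)

print(c(welch = welch$p.value,
        pooled = pooled$p.value,
        rank = rank_based$p.value))
```

The Welch version is usually the safer default when it is unclear whether the two group variances are equal.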
Limitations of T-Test
• Goal: Analyze the differences among groups and study how the
response variable behaves depending on the grouping variable
Questions we want to answer:
• Does type of treatment (or diet or exercise) affect blood sugar?
• If so, which treatment is the most efficient?
• Does diet help to decrease blood sugar?
ANOVA model
An extension of the two-sample t-test to more than two groups
• The t-test compares the means of two groups only
• The t-test can be applied only when both groups follow a normal
distribution (parametric test)
• Two versions of the t-test: equal-variance or unequal-variance assumption
• Alternative hypothesis (main effect):
o H1: At least one of the group means differs
from the others in the population
• Alternative hypothesis (interaction):
o H1: There is an interaction between the independent
variables in the population.
Assumptions of ANOVA
1) The response (dependent) variable is continuous
2) Populations from which samples were drawn follow
a normal distribution
o i.e., each group should be normally distributed
Note: ANOVA is relatively robust to violations of
normality
3) Populations from which samples were drawn must
have equal variances (homogeneity of variance)
Perform an equal-variance test (e.g., Levene's test) before applying ANOVA
4) Observations must be independent of one another
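The normality and equal-variance assumptions listed above can be checked along these lines. This is a sketch only: it assumes R's built-in `ToothGrowth` data as a stand-in, and it swaps in Bartlett's test (base R) for the equal-variance check so that no extra package is needed; Levene's test from the `car` package is an alternative.

```r
# Assumption checks for a one-way ANOVA, using only base R.
# ToothGrowth and the object names below are illustrative assumptions.
data(ToothGrowth)
tg <- transform(ToothGrowth, dose = factor(dose))  # dose as a grouping factor

fit <- aov(len ~ dose, data = tg)

# Assumption 2 (normality): Shapiro-Wilk test on the model residuals.
norm_check <- shapiro.test(residuals(fit))

# Assumption 3 (equal variances): Bartlett's test across the dose groups.
var_check <- bartlett.test(len ~ dose, data = tg)

print(c(normality_p = norm_check$p.value,
        equal_variance_p = var_check$p.value))
```

A large p-value in either test means the data give no evidence against that assumption; it does not prove the assumption holds.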
The F test
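As a sketch of what the F test computes in a one-way ANOVA: the F statistic is the ratio of the between-group mean square to the within-group mean square. The example below works this out by hand and compares it with `aov()`. It assumes R's built-in `ToothGrowth` data (columns `len`, `dose`) as a stand-in data set; all variable names are illustrative.

```r
# One-way ANOVA F statistic computed by hand (illustrative sketch).
data(ToothGrowth)
y <- ToothGrowth$len
g <- factor(ToothGrowth$dose)

k <- nlevels(g)   # number of groups
n <- length(y)    # total number of observations

group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Between-group and within-group sums of squares.
ss_between <- sum(group_sizes * (group_means - mean(y))^2)
ss_within  <- sum((y - ave(y, g))^2)   # ave() returns each observation's group mean

# F = mean square between / mean square within.
f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_val  <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Same statistic taken from aov() for comparison.
f_aov <- summary(aov(y ~ g))[[1]][["F value"]][1]

print(c(by_hand = f_stat, from_aov = f_aov, p_value = p_val))
```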
• Balanced or unbalanced?
table(tooth$Dose); table(tooth$Supplement)
##
## 0.5 1 2
## 20 20 20
##
## OJ VC
## 30 30
One-way ANOVA example:
boxplot(Toothlength ~ Dose, data=tooth, main="distribution of tooth length by dose")
One-way ANOVA example:
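library(car) # provides leveneTest()
# aov.res is not defined in the source; assumed here to be the one-way fit
aov.res = aov(Toothlength ~ Dose, data=tooth)
leveneTest(aov.res)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.6457 0.5281
##       57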
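Two-way ANOVA example:
# aov.res2 is not defined in the source; assumed here to be the two-way fit with interaction
aov.res2 = aov(Toothlength ~ Dose * Supplement, data=tooth)
summary(aov.res2)
##                 Df Sum Sq Mean Sq F value   Pr(>F)
## Dose             2 2426.4  1213.2  92.000  < 2e-16 ***
## Supplement       1  205.4   205.4  15.572 0.000231 ***
## Dose:Supplement  2  108.3    54.2   4.107 0.021860 *
## Residuals       54  712.1    13.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
H0: no interaction between dose and supplement
Ha: there is an interaction between dose and supplement
The interaction p-value (0.0219) is below 0.05, so H0 is rejected at the 5% level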
Two-way ANOVA example:
leveneTest(aov.res2)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  5  1.7086 0.1484
##       54
par(mfrow=c(2,2))
plot(aov.res2)
Model validity check
- Homogeneity of variance?
- Normality?
Two-way ANOVA example:
lm.res2= lm(Toothlength ~ Dose * Supplement, data=tooth)
summary(lm.res2)$r.squared # R-square
## [1] 0.7937246
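The R-square above can also be read off the ANOVA table: it is one minus the residual sum of squares divided by the total sum of squares. The sketch below checks this, assuming R's built-in `ToothGrowth` data (columns `len`, `dose`, `supp`) as a stand-in for `tooth`, with `dose` converted to a factor; the object names are illustrative.

```r
# R-squared recovered from the ANOVA sums of squares (illustrative sketch).
data(ToothGrowth)
tg <- transform(ToothGrowth, dose = factor(dose))

fit <- lm(len ~ dose * supp, data = tg)

# Sums of squares in table order: dose, supp, dose:supp, residuals.
ss <- anova(fit)[["Sum Sq"]]

# R^2 = 1 - SS_residual / SS_total
r2_by_hand <- 1 - ss[length(ss)] / sum(ss)
r2_lm <- summary(fit)$r.squared

print(c(by_hand = r2_by_hand, from_lm = r2_lm))
```

Read this way, R-square is the share of the total variation in the response that the two factors and their interaction account for.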
Forward selection
source: https://quantifyinghealth.com/stepwise-selection/
Backward elimination
1. Begins with a model that contains all variables
under consideration (called the Full Model)
2. Then removes the least significant
variables one after the other
3. Stops when a pre-specified stopping rule is reached, e.g., no
remaining variable has a p-value greater than the
significance level (commonly 0.05, but not necessarily)
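The three steps above can be sketched as a short loop. This is an illustration under stated assumptions, not a definitive implementation: the helper `backward_eliminate` is hypothetical, the data set is R's built-in `mtcars`, the candidate predictors are arbitrary, and the stopping rule uses a significance level of 0.05.

```r
# Backward elimination by p-value (illustrative sketch).
# backward_eliminate() is a hypothetical helper written for this example.
backward_eliminate <- function(model, alpha = 0.05) {
  repeat {
    tests <- drop1(model, test = "F")       # F-test for dropping each term
    pvals <- tests[["Pr(>F)"]][-1]          # first row is "<none>"
    terms_left <- rownames(tests)[-1]
    # Stopping rule: no remaining term has a p-value above alpha.
    if (length(pvals) == 0 || max(pvals) <= alpha) break
    worst <- terms_left[which.max(pvals)]   # least significant term
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
  model
}

# Step 1: start from the full model (predictors chosen only for illustration).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Steps 2 and 3: remove terms one at a time until the stopping rule is met.
final <- backward_eliminate(full)
print(formula(final))
```

Base R's `step()` performs a similar search but uses AIC rather than p-values as its criterion, so it may select a different model.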