Wic 5 MLR & Anova
Sean Hekmat
2023-11-28
head(D1)
## Brand Distance
## 1 A 250.8
## 2 A 250.0
## 3 A 235.5
## 4 A 255.4
## 5 A 248.7
## 6 A 241.8
head(D2)
head(D3)
## x1 x2 y
## 1 3.85 3.80 7.38
## 2 3.75 4.00 7.51
## 3 3.70 4.30 9.52
## 4 3.70 3.70 7.50
## 5 3.60 3.85 9.33
## 6 3.60 3.80 8.28
##
## Call:
## lm(formula = Distance ~ Brand, data = D1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.475 -11.896 -2.518 8.708 36.782
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 252.817 5.160 48.997 <2e-16 ***
## BrandB -2.608 7.297 -0.357 0.723
## BrandC -7.742 7.297 -1.061 0.295
## BrandD -3.598 7.461 -0.482 0.632
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 17.87 on 43 degrees of freedom
## Multiple R-squared: 0.0264, Adjusted R-squared: -0.04152
## F-statistic: 0.3887 on 3 and 43 DF, p-value: 0.7617
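The regression summary above gives no evidence of a Brand effect on Distance: the overall F-test
(F = 0.3887 on 3 and 43 DF, p-value = 0.7617) does not reject the hypothesis that the mean distances of the
4 brands are equal, and none of the contrasts of Brands B, C and D against Brand A is significant (all
P-values > 0.05).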
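The overall F-test printed by summary() for a one-factor lm fit is the same test as the one-way ANOVA F-test from aov(). A minimal sketch of that equivalence follows; the data frame sim, its values, and the seed are made up for illustration and are not the D1 data.

```r
# Simulated stand-in for a one-factor design; NOT the D1 data.
set.seed(1)
sim <- data.frame(
  Brand    = factor(rep(c("A", "B", "C", "D"), each = 12)),
  Distance = rnorm(48, mean = 250, sd = 18)
)

fit_lm  <- lm(Distance ~ Brand, data = sim)   # regression on Brand dummies
fit_aov <- aov(Distance ~ Brand, data = sim)  # one-way ANOVA on the same data

# Overall F statistic from the regression summary
f_lm <- unname(summary(fit_lm)$fstatistic["value"])

# F statistic for Brand from the ANOVA table (first row)
f_aov <- summary(fit_aov)[[1]][["F value"]][1]

# The two statistics agree up to floating-point error
all.equal(f_lm, f_aov)
```

Because the two fits share one F-test, the conclusion about a Brand effect is best read from that single p-value rather than from the individual dummy-variable t-tests.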
library(ggfortify)
autoplot(lm1, label.size = 3)
[Figure: diagnostic plots for lm1. Panels: Residuals vs Fitted; Normal Q-Q; standardized residuals vs fitted values; standardized residuals vs leverage. Labeled observations: 15, 26, 42.]
autoplot(a1, label.size = 3)
[Figure: diagnostic plots for a1. Panels: Residuals vs Fitted; Normal Q-Q; standardized residuals vs fitted values; standardized residuals vs leverage. Labeled observations: 15, 26, 42.]
##
## Call:
## lm(formula = Dist ~ Brand + Golfer, data = D2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.153 -11.194 -0.059 11.022 40.171
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 241.5561 7.4523 32.414 <2e-16 ***
## BrandB -11.6933 8.6052 -1.359 0.1843
## BrandC -0.2344 8.6052 -0.027 0.9784
## BrandD -2.0833 8.6052 -0.242 0.8103
## GolferG2 13.7875 7.4523 1.850 0.0742 .
## GolferG3 5.6542 7.4523 0.759 0.4539
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 18.25 on 30 degrees of freedom
## Multiple R-squared: 0.1654, Adjusted R-squared: 0.02632
## F-statistic: 1.189 on 5 and 30 DF, p-value: 0.3379
a2 <- aov(Dist ~ Golfer + Brand, data=D2)
summary(a2)
The output above shows that: (i) there is no evidence of a Brand effect on distance, i.e. we cannot reject
the hypothesis that the mean distances obtained from the 4 brands are equal (all Brand P-values > 0.05), and
(ii) there is no evidence of a Golfer effect, since the P-values for Golfers are also all > 0.05 (the
smallest, for G2, is 0.0742).
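Brand and Golfer are multi-level factors, so a single F-test per factor is a more direct check than reading one t-test per dummy coefficient. The sketch below shows one way to obtain those tests with anova(); the data frame sim2, the column shot, and the seed are hypothetical, and the layout (4 brands, 3 golfers, 3 shots each) is only assumed to resemble D2.

```r
# Simulated balanced two-factor layout; NOT the D2 data.
set.seed(2)
sim2 <- expand.grid(
  Brand  = factor(c("A", "B", "C", "D")),
  Golfer = factor(c("G1", "G2", "G3")),
  shot   = 1:3
)
sim2$Dist <- rnorm(nrow(sim2), mean = 245, sd = 18)

fit <- lm(Dist ~ Brand + Golfer, data = sim2)

# Sequential ANOVA table: one F-test for Brand, one for Golfer
tab <- anova(fit)
tab

# In a balanced design the order of the terms does not change the sums of squares
tab_rev <- anova(lm(Dist ~ Golfer + Brand, data = sim2))
all.equal(tab["Brand", "Sum Sq"], tab_rev["Brand", "Sum Sq"])
```

With unbalanced data the sequential sums of squares do depend on the order of the terms, so the order used in the formula would then need to be stated.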
autoplot(lm2)
[Figure: diagnostic plots for lm2. Panels: residuals vs fitted values; Normal Q-Q (standardized residuals vs theoretical quantiles); standardized residuals vs fitted values; standardized residuals vs factor level combination (Brand:Golfer, A:G1 to D:G3). Labeled observations: 11, 22, 29.]
autoplot(a2)
[Figure: diagnostic plots for a2. Panels: Residuals vs Fitted; Normal Q-Q; standardized residuals vs fitted values; standardized residuals vs factor level combination (Golfer:Brand, G1:A to G3:D). Labeled observations: 11, 22, 29.]
##
## Call:
## lm(formula = y ~ x1 + x2, data = D3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.91839 -0.10915 0.02283 0.16094 0.58057
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.7402 4.0586 3.878 0.002195 **
## x1 -4.9976 1.0379 -4.815 0.000423 ***
## x2 2.8573 0.5495 5.200 0.000222 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.3872 on 12 degrees of freedom
## Multiple R-squared: 0.7772, Adjusted R-squared: 0.74
## F-statistic: 20.93 on 2 and 12 DF, p-value: 0.0001224
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 2.221 2.221 14.82 0.002313 **
## x2 1 4.053 4.053 27.04 0.000222 ***
## Residuals 12 1.799 0.150
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
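The ANOVA table for y ~ x1 + x2 can be tied back to the regression summary with a little arithmetic. The sketch below uses only the rounded figures printed above (the variable names are illustrative), so the results match the reported values only up to rounding.

```r
# Sums of squares copied from the ANOVA table for y ~ x1 + x2
ss_x1  <- 2.221
ss_x2  <- 4.053
ss_res <- 1.799

# R-squared = explained sum of squares / total sum of squares
r2 <- (ss_x1 + ss_x2) / (ss_x1 + ss_x2 + ss_res)
round(r2, 3)   # 0.777, in line with the reported Multiple R-squared of 0.7772

# The sequential F for the last term (x2) equals its squared t statistic
t_x2 <- 5.200
t_x2^2         # 27.04, the F value shown for x2
```

The same identity does not hold for x1: its sequential F (14.82) tests x1 before x2 enters the model, while its t statistic (-4.815) tests x1 after adjusting for x2.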
autoplot(lm3)
[Figure: diagnostic plots for lm3. Panels: residuals vs fitted values; Normal Q-Q (standardized residuals vs theoretical quantiles); standardized residuals vs fitted values; standardized residuals vs leverage. Labeled observations: 2, 5, 12.]
autoplot(a3)
[Figure: diagnostic plots for a3. Panels: Residuals vs Fitted; Normal Q-Q; standardized residuals vs fitted values; standardized residuals vs leverage. Labeled observations: 2, 5, 12.]
##
## Call:
## lm(formula = y ~ x1 - x2, data = D3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9803 -0.5438 0.1154 0.5255 1.0483
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.953 6.610 3.473 0.00413 **
## x1 -3.914 1.762 -2.221 0.04472 *
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.6709 on 13 degrees of freedom
## Multiple R-squared: 0.2751, Adjusted R-squared: 0.2194
## F-statistic: 4.934 on 1 and 13 DF, p-value: 0.04472
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 2.221 2.2211 4.934 0.0447 *
## Residuals 13 5.852 0.4502
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
autoplot(lm4)
[Figure: diagnostic plots for lm4. Panels: residuals vs fitted values; Normal Q-Q (standardized residuals vs theoretical quantiles); standardized residuals vs fitted values; standardized residuals vs leverage. Labeled observations: 3, 4, 9 (and 6 in the leverage panel).]
autoplot(a4)
[Figure: diagnostic plots for a4. Panels: Residuals vs Fitted; Normal Q-Q; standardized residuals vs fitted values; standardized residuals vs leverage. Labeled observations: 3, 4, 9 (and 6 in the leverage panel).]
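Since R-squared for model lm3 (0.78) is much higher than that for lm4 (0.28), and the adjusted R-squared
shows the same gap (0.74 vs. 0.22), we recommend model lm3. The normality of the residuals for each linear
model can be assessed from the Normal Q-Q panel of its diagnostic plots, as shown in class.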
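Because the smaller model is nested in the larger one, the comparison can also be made with a partial F-test, and normality can be checked with a formal test alongside the Q-Q plot. The following is a sketch under stated assumptions: sim3, its coefficients, and the seed are invented for illustration and are not the D3 data, and the Shapiro-Wilk test is one possible check, not necessarily the one used in class.

```r
# Simulated two-predictor data; NOT the D3 data.
set.seed(3)
n <- 15
sim3 <- data.frame(x1 = runif(n, 3.5, 4), x2 = runif(n, 3.5, 4.5))
sim3$y <- 16 - 5 * sim3$x1 + 3 * sim3$x2 + rnorm(n, sd = 0.4)

full    <- lm(y ~ x1 + x2, data = sim3)
reduced <- lm(y ~ x1, data = sim3)   # y ~ x1 - x2 specifies the same model

# Partial F-test: does adding x2 significantly reduce the residual sum of squares?
cmp <- anova(reduced, full)
cmp

# Adjusted R-squared penalizes the extra predictor, unlike plain R-squared
c(reduced = summary(reduced)$adj.r.squared,
  full    = summary(full)$adj.r.squared)

# Shapiro-Wilk test of normality on the residuals of the chosen model
shapiro.test(residuals(full))
```

With a single added predictor, the partial F statistic equals the squared t statistic of that predictor in the full model, so the two tests give the same p-value.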