
WIC 5: MLR & ANOVA

Sean Hekmat

2023-11-28

D1 <- read.csv("~/Desktop/ANOVA Golf Example 1.csv")
D2 <- read.csv("~/Desktop/ANOVA Golf Example 2.csv")
D3 <- read.csv("~/Desktop/Fresh-Demand.csv")

head(D1)

## Brand Distance
## 1 A 250.8
## 2 A 250.0
## 3 A 235.5
## 4 A 255.4
## 5 A 248.7
## 6 A 241.8

head(D2)

## Golfer Brand Dist
## 1 G1 A 248.73
## 2 G1 A 258.74
## 3 G1 A 223.65
## 4 G2 A 240.30
## 5 G2 A 258.00
## 6 G2 A 272.71

head(D3)

## x1 x2 y
## 1 3.85 3.80 7.38
## 2 3.75 4.00 7.51
## 3 3.70 4.30 9.52
## 4 3.70 3.70 7.50
## 5 3.60 3.85 9.33
## 6 3.60 3.80 8.28

# One-way model: Distance as a function of Brand (Brand A is the baseline level)
lm1 <- lm(Distance ~ Brand, data = D1)
summary(lm1)

##
## Call:
## lm(formula = Distance ~ Brand, data = D1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.475 -11.896 -2.518 8.708 36.782
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 252.817 5.160 48.997 <2e-16 ***
## BrandB -2.608 7.297 -0.357 0.723
## BrandC -7.742 7.297 -1.061 0.295
## BrandD -3.598 7.461 -0.482 0.632
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 17.87 on 43 degrees of freedom
## Multiple R-squared: 0.0264, Adjusted R-squared: -0.04152
## F-statistic: 0.3887 on 3 and 43 DF, p-value: 0.7617
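
With R's default treatment contrasts, the intercept is the mean distance for the baseline level (Brand A) and each BrandB, BrandC, BrandD coefficient is the difference between that brand's mean and Brand A's. The sketch below recovers the four brand means from the estimates printed above. The variable names are illustrative, and the numbers are copied from the summary rather than recomputed from the data.

```r
# Coefficient estimates copied from summary(lm1) above
coefs <- c(Intercept = 252.817, BrandB = -2.608, BrandC = -7.742, BrandD = -3.598)

# Under treatment contrasts, each brand mean = intercept + that brand's offset
brand_means <- c(
  A = coefs[["Intercept"]],
  B = coefs[["Intercept"]] + coefs[["BrandB"]],
  C = coefs[["Intercept"]] + coefs[["BrandC"]],
  D = coefs[["Intercept"]] + coefs[["BrandD"]]
)
print(round(brand_means, 3))
```

Because a one-factor model predicts each observation by its group mean, these four values are also the only fitted values lm1 can produce.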

a1 <- aov(Distance ~ Brand, data = D1)
summary(a1)

## Df Sum Sq Mean Sq F value Pr(>F)
## Brand 3 373 124.2 0.389 0.762
## Residuals 43 13738 319.5

The ANOVA table above shows no statistically significant Brand effect on Distance: since the p-value
(0.762) is greater than 0.05, we fail to reject the null hypothesis that the mean distances for the 4 brands are equal.
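
The F value and p-value in the table can be checked by hand from the mean squares. This is a minimal sketch using the rounded numbers printed by summary(a1), so it should agree with the table only up to rounding. The variable names are illustrative.

```r
# Mean squares and degrees of freedom copied from summary(a1) above
ms_brand <- 124.2
ms_resid <- 319.5
df_brand <- 3
df_resid <- 43

# The F statistic is the ratio of the between-brand to the residual mean square
f_stat <- ms_brand / ms_resid

# The p-value is the upper-tail probability of the F(3, 43) distribution
p_value <- pf(f_stat, df_brand, df_resid, lower.tail = FALSE)

cat("F =", round(f_stat, 3), " p =", round(p_value, 3), "\n")
```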

library(ggfortify)

## Loading required package: ggplot2

autoplot(lm1, label.size = 3)

[Figure: diagnostic plots for lm1. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage. Labeled observations: 15, 26, 42.]

autoplot(a1, label.size = 3)

[Figure: diagnostic plots for a1. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage. Labeled observations: 15, 26, 42.]
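
The diagnostic plots for lm1 and a1 show the same labeled observations and axis ranges, which is expected because aov() fits the same linear model as lm() and only summarizes it differently. The sketch below checks this on PlantGrowth, a data set shipped with base R that stands in for D1 here (an assumption, since the golf data file is not part of this document).

```r
# PlantGrowth ships with base R; it is a stand-in for D1, not the golf data
fit_lm  <- lm(weight ~ group, data = PlantGrowth)
fit_aov <- aov(weight ~ group, data = PlantGrowth)

# Both objects hold the same fitted values and residuals
same_fitted <- isTRUE(all.equal(fitted(fit_lm), fitted(fit_aov)))
same_resid  <- isTRUE(all.equal(residuals(fit_lm), residuals(fit_aov)))
print(c(same_fitted = same_fitted, same_resid = same_resid))

# anova() on the lm fit and summary() on the aov fit report the same table
print(anova(fit_lm))
print(summary(fit_aov))
```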

# Additive two-factor model: Dist as a function of Brand and Golfer (no interaction term)
lm2 <- lm(Dist ~ Brand + Golfer, data = D2)
summary(lm2)

##
## Call:
## lm(formula = Dist ~ Brand + Golfer, data = D2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.153 -11.194 -0.059 11.022 40.171
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 241.5561 7.4523 32.414 <2e-16 ***
## BrandB -11.6933 8.6052 -1.359 0.1843
## BrandC -0.2344 8.6052 -0.027 0.9784
## BrandD -2.0833 8.6052 -0.242 0.8103
## GolferG2 13.7875 7.4523 1.850 0.0742 .
## GolferG3 5.6542 7.4523 0.759 0.4539
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 18.25 on 30 degrees of freedom
## Multiple R-squared: 0.1654, Adjusted R-squared: 0.02632
## F-statistic: 1.189 on 5 and 30 DF, p-value: 0.3379
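
Each row of the coefficient table follows the same recipe: the t value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with the residual degrees of freedom. A minimal sketch for the GolferG2 row, using the rounded numbers printed above (variable names are illustrative):

```r
# Estimate, standard error and residual df copied from summary(lm2) above
est      <- 13.7875
se       <- 7.4523
df_resid <- 30

# t statistic for H0: coefficient = 0
t_value <- est / se

# Two-sided p-value from the t distribution with 30 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

cat("t =", round(t_value, 3), " p =", round(p_value, 4), "\n")
```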

a2 <- aov(Dist ~ Golfer + Brand, data=D2)
summary(a2)

## Df Sum Sq Mean Sq F value Pr(>F)
## Golfer 2 1153 576.4 1.730 0.195
## Brand 3 828 276.2 0.829 0.489
## Residuals 30 9997 333.2

The ANOVA table above shows that (i) there is no statistically significant Brand effect on Distance, i.e. we
fail to reject the hypothesis that the mean distances for the 4 brands are equal (p = 0.489 > 0.05), and
(ii) there is no statistically significant Golfer effect (p = 0.195 > 0.05).

autoplot(lm2)

[Figure: diagnostic plots for lm2. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Constant Leverage: Residuals vs Factor Levels (Brand:Golfer combinations A:G1 to D:G3). Labeled observations: 11, 22, 29.]

autoplot(a2)

[Figure: diagnostic plots for a2. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Constant Leverage: Residuals vs Factor Levels (Golfer:Brand combinations G1:A to G3:D). Labeled observations: 11, 22, 29.]

lm3 <- lm(y ~ x1 + x2, data = D3)
summary(lm3)

##
## Call:
## lm(formula = y ~ x1 + x2, data = D3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.91839 -0.10915 0.02283 0.16094 0.58057
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.7402 4.0586 3.878 0.002195 **
## x1 -4.9976 1.0379 -4.815 0.000423 ***
## x2 2.8573 0.5495 5.200 0.000222 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.3872 on 12 degrees of freedom
## Multiple R-squared: 0.7772, Adjusted R-squared: 0.74
## F-statistic: 20.93 on 2 and 12 DF, p-value: 0.0001224

a3 <- aov(y ~ x1 + x2, data = D3)
summary(a3)

## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 2.221 2.221 14.82 0.002313 **
## x2 1 4.053 4.053 27.04 0.000222 ***
## Residuals 12 1.799 0.150
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

autoplot(lm3)

[Figure: diagnostic plots for lm3. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage. Labeled observations: 2, 5, 12.]

autoplot(a3)

[Figure: diagnostic plots for a3. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage. Labeled observations: 2, 5, 12.]

# In formula syntax, "- x2" removes the x2 term, so this fits y ~ x1 only
lm4 <- lm(y ~ x1 - x2, data = D3)
summary(lm4)

##
## Call:
## lm(formula = y ~ x1 - x2, data = D3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9803 -0.5438 0.1154 0.5255 1.0483
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.953 6.610 3.473 0.00413 **
## x1 -3.914 1.762 -2.221 0.04472 *
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.6709 on 13 degrees of freedom
## Multiple R-squared: 0.2751, Adjusted R-squared: 0.2194
## F-statistic: 4.934 on 1 and 13 DF, p-value: 0.04472
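
The coefficient table for lm4 has no x2 row because, inside an R formula, the minus sign drops a term instead of subtracting variables. The sketch below inspects the terms of the formula to confirm this, and shows the I() wrapper that would be needed if a regression on the arithmetic difference x1 - x2 were actually intended. The formulas are built without data, so nothing here refits the models above.

```r
# The formula used for lm4 and the plain one-predictor formula
f_minus <- y ~ x1 - x2
f_plain <- y ~ x1

# term.labels lists the predictors that would enter the model
labels_minus <- attr(terms(f_minus), "term.labels")
labels_plain <- attr(terms(f_plain), "term.labels")
print(labels_minus)
print(identical(labels_minus, labels_plain))

# To regress on the arithmetic difference instead, protect it with I()
f_diff <- y ~ I(x1 - x2)
labels_diff <- attr(terms(f_diff), "term.labels")
print(labels_diff)
```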

a4 <- aov(y ~ x1 - x2, data = D3)
summary(a4)

## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 2.221 2.2211 4.934 0.0447 *
## Residuals 13 5.852 0.4502
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

autoplot(lm4)

[Figure: diagnostic plots for lm4. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage. Labeled observations: 3, 4, 9.]

autoplot(a4)

[Figure: diagnostic plots for a4. Panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage. Labeled observations: 3, 4, 9.]

Since R-squared for model lm3 (0.78) is much higher than that for lm4 (0.28), and the adjusted R-squared
values (0.74 vs. 0.22) agree, we recommend model lm3. The normality of residuals for all the linear models
can be assessed from the Normal Q-Q plots, as shown in class.
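
Because lm4 is lm3 with x2 removed, the two models are nested, and the comparison can also be framed as a partial F test on the drop in residual sum of squares. This is a minimal sketch using the rounded sums of squares printed by summary(a3) and summary(a4), so the results should match the output above only up to rounding. The variable names are illustrative.

```r
# Residual sums of squares and residual df copied from the ANOVA tables above
rss_lm3 <- 1.799; df_lm3 <- 12   # y ~ x1 + x2
rss_lm4 <- 5.852; df_lm4 <- 13   # y ~ x1

# Total sum of squares = x1 SS + x2 SS + residual SS from summary(a3)
tss <- 2.221 + 4.053 + 1.799

# R-squared is the share of total variation explained by each model
r2_lm3 <- 1 - rss_lm3 / tss
r2_lm4 <- 1 - rss_lm4 / tss

# Partial F test: does adding x2 reduce the residual SS by more than chance?
f_stat  <- ((rss_lm4 - rss_lm3) / (df_lm4 - df_lm3)) / (rss_lm3 / df_lm3)
p_value <- pf(f_stat, df_lm4 - df_lm3, df_lm3, lower.tail = FALSE)

cat("R2 lm3 =", round(r2_lm3, 4), " R2 lm4 =", round(r2_lm4, 4), "\n")
cat("F =", round(f_stat, 2), " p =", signif(p_value, 3), "\n")
```

With the fitted objects available, anova(lm4, lm3) performs the same comparison directly.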
