300b-l1 2017 CV Notes
Linear Regression
• Pearson’s correlation (r) measures the degree to which
a set of data forms a linear relationship
• Regression is a statistical procedure for determining
the equation for the straight line that best fits a set of
data
• The equation for the best-fitting straight line is called
the regression equation
• The regression equation makes it possible to find the
predicted value, Ŷ (called “Y hat”), for any value of X
Ŷ = bX + a
The equation provides the best prediction of Y for a
given value of X, in the sense that it yields the smallest
total squared error between the data points and the
regression line
X    Y
2    3
6    11
0    6
4    6
5    7
7    12
5    10
3    9
[Figure: plot of Y (axis 0 to 12) against X (axis 0 to 7)]
b = SP/SS_X = 36/36 = 1
or
b = r(s_Y/s_X) = 0.75(3.02/2.27) = 1
or
b = cov_XY/s²_X = 5.14/5.14 = 1
a = M_Y - b(M_X) = 8 - (1)(4) = 4
Ŷ = bX + a = (1)X + 4
Ŷ = X + 4
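The three slope formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names (`ss_x`, `sp`, `b_from_sp`, and so on) are illustrative choices, not notation from the lecture.

```python
import math

# Data from the worked example in these notes.
X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
n = len(X)

mean_x = sum(X) / n  # M_X
mean_y = sum(Y) / n  # M_Y

# Sums of squares and sum of products of deviations.
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Route 1: b = SP / SS_X
b_from_sp = sp / ss_x

# Route 2: b = r * (s_Y / s_X), using sample standard deviations (n - 1).
r = sp / math.sqrt(ss_x * ss_y)
s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))
b_from_r = r * (s_y / s_x)

# Route 3: b = cov_XY / s_X^2
cov_xy = sp / (n - 1)
b_from_cov = cov_xy / (s_x ** 2)

# Intercept: a = M_Y - b * M_X
a = mean_y - b_from_sp * mean_x

print(f"SP = {sp:.0f}, SS_X = {ss_x:.0f}, r = {r:.2f}")
print(f"b = {b_from_sp:.2f}, {b_from_r:.2f}, {b_from_cov:.2f}; a = {a:.2f}")
```

All three routes give the same slope because each one is SP/SS_X rewritten with different summary statistics.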
The Standard Error of the Estimate
• The variability in the outcome variable that is not
predicted by the regression equation (SS_RESIDUAL) can be
used to indicate the average accuracy of the prediction
• The standard error of the estimate (s_{Y-Ŷ}) is the average
distance between the regression line and the actual data
(i.e., average error when using the regression line to
make predictions)
X    Y    Ŷ = X + 4    (Y - Ŷ)    (Y - Ŷ)²
2    3    6            -3         9
6    11   10           1          1
0    6    4            2          4
4    6    8            -2         4
5    7    9            -2         4
7    12   11           1          1
5    10   9            1          1
3    9    7            2          4
SS_RESIDUAL = 28
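The residual table can be reproduced with a few lines of Python. This is a sketch that takes the slope and intercept from the worked example as given; the helper name `predict` is hypothetical.

```python
# Data and fitted line (Ŷ = X + 4) from the worked example in these notes.
X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
b, a = 1, 4


def predict(x):
    """Return the predicted value Ŷ = bX + a for a single X."""
    return b * x + a


# Residual for each case: observed Y minus predicted Ŷ.
residuals = [y - predict(x) for x, y in zip(X, Y)]

# SS_RESIDUAL is the sum of the squared residuals.
ss_residual = sum(e ** 2 for e in residuals)

for x, y, e in zip(X, Y, residuals):
    print(x, y, predict(x), e, e ** 2)
print("SS_RESIDUAL =", ss_residual)
```

The residuals sum to zero, which is a property of any least-squares line with an intercept, so it is their squares that are summed.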
SST = SSM + SSR
SST: total variance in the data
SSM: improvement due to the model
SSR: error in the model
R²
• R² measures the proportion of variability in Y scores that can
be accounted for by the predictor variable (i.e., the variability in Y
that the regression equation predicts, or can account for)
• R² = SS_MODEL / SS_TOTAL
• 1 - R² measures the proportion of variability in Y scores that
cannot be accounted for by the predictor variable (i.e., the
variability in Y that the regression equation does not predict, or
cannot account for)
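As a check on these definitions, the sketch below splits the total sum of squares into model and residual parts for the example data and computes R² from them. It assumes the fitted line Ŷ = X + 4 from the worked example; the variable names are illustrative.

```python
# Data and fitted line (Ŷ = X + 4) from the worked example in these notes.
X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
b, a = 1, 4

mean_y = sum(Y) / len(Y)
Y_hat = [b * x + a for x in X]

# Total variability: observed scores around the mean of Y.
ss_total = sum((y - mean_y) ** 2 for y in Y)
# Variability the model accounts for: predictions around the mean of Y.
ss_model = sum((yh - mean_y) ** 2 for yh in Y_hat)
# Variability left over: observed scores around the predictions.
ss_residual = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))

r_squared = ss_model / ss_total

print(f"SS_TOTAL = {ss_total:.0f}, SS_MODEL = {ss_model:.0f}, "
      f"SS_RESIDUAL = {ss_residual:.0f}")
print(f"R^2 = {r_squared:.4f}, 1 - R^2 = {1 - r_squared:.4f}")
```

For these data the model and residual sums of squares add up to the total (36 + 28 = 64), so R² and 1 - R² together cover all of the variability in Y.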
Significance Testing and Regression
X Y
2 3
6 11
0 6
4 6
5 7
7 12
5 10
3 9
• r = 0.75, so R² = 0.563
• H0: the regression model does not account for a
significant portion of the variance in Y scores (i.e., R² = 0)
• H1: the regression model accounts for a significant
portion of the variance in Y scores (i.e., R² ≠ 0)
• SS_MODEL = R²(SS_Y) = 0.563(64) = 36
• SS_RESIDUAL = (1 - R²)(SS_Y) = 0.438(64) = 28
• N = 8, so df_MODEL = 1 and df_RESIDUAL = N - 2 = 6; at α = .05,
F-critical (1, 6) = 5.99 and t-critical (6) = 2.45
• MS_MODEL = SS_MODEL/df_MODEL = 36/1 = 36
• MS_RESIDUAL = SS_RESIDUAL/df_RESIDUAL = 28/6 = 4.67
• F = MS_MODEL/MS_RESIDUAL = 36/4.67 = 7.71
• Because F = 7.71 exceeds the critical value of 5.99, reject
H0 and conclude that the regression model accounts for a
significant portion of the variance in Y scores, F(1, 6) = 7.71,
p < .05, R² = 0.56
• The slope of the regression line is also significant, so
variable X is a significant predictor of variable Y,
t(6) = 2.78, p < .05
• s_{Y-Ŷ} = √(SS_RESIDUAL/df_RESIDUAL) = √(28/6) = 2.16
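The arithmetic of the significance test can be sketched in Python as follows. The critical value is copied from the notes rather than computed, the slope t statistic uses the usual simple-regression formula t = b / (s_{Y-Ŷ} / √SS_X), which the notes report but do not derive, and all names are illustrative.

```python
import math

# Data and fitted line (Ŷ = X + 4) from the worked example in these notes.
X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
b, a = 1, 4
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
Y_hat = [b * x + a for x in X]

ss_x = sum((x - mean_x) ** 2 for x in X)
ss_model = sum((yh - mean_y) ** 2 for yh in Y_hat)
ss_residual = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))

# One predictor, so df_MODEL = 1 and df_RESIDUAL = N - 2.
df_model = 1
df_residual = n - 2

ms_model = ss_model / df_model
ms_residual = ss_residual / df_residual
f_stat = ms_model / ms_residual

# Standard error of the estimate and t test of the slope.
se_estimate = math.sqrt(ss_residual / df_residual)
se_slope = se_estimate / math.sqrt(ss_x)
t_stat = b / se_slope

F_CRITICAL = 5.99  # F(1, 6) at alpha = .05, as given in the notes

print(f"F({df_model}, {df_residual}) = {f_stat:.2f}")
print(f"t({df_residual}) = {t_stat:.2f}")
print(f"standard error of the estimate = {se_estimate:.2f}")
print("reject H0" if f_stat > F_CRITICAL else "fail to reject H0")
```

With a single predictor, t² equals F (2.78² ≈ 7.71), so the test of the slope and the test of the overall model lead to the same decision.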
End of Lecture