
Regression

PSYC 300B - Lecture 1


Dr. J. Nicol

Linear Regression
• Pearson's correlation (r) measures the degree to which a set of data forms a linear relationship
• Regression is a statistical procedure for determining the equation for the straight line that best fits a set of data
• The equation for the best-fitting straight line is called the regression equation
• The regression equation makes it possible to find the predicted value of Ŷ (called "Y hat") for any value of X

Ŷ = bX + a

The equation provides the best prediction of Ŷ for a value of X, and it results in the least squared error between the data points and the regression line

The Linear Equation

• Slope (b) - determines the direction and degree to which the best-fitting straight line is tilted
• Y-intercept (a) - determines the point where the best-fitting straight line crosses the Y-axis
• The regression equation provides the best prediction for a value of Ŷ for a given value of X
• e.g., for Ŷ = 2X - 7, if X = 3 then Ŷ = 2(3) - 7 = -1

The regression equation for the straight line that best fits the data produces the least sum of squared errors between the line and the actual data

X    Y
2    3
6    11
0    6
4    6
5    7
7    12
5    10
3    9

[Scatterplot of the eight (X, Y) data points, X on the horizontal axis (0-7), Y on the vertical axis (0-12)]

X    (X-MX)    (X-MX)²    Y    (Y-MY)    (Y-MY)²
2    -2        4          3    -5        25
6    2         4          11   3         9
0    -4        16         6    -2        4
4    0         0          6    -2        4
5    1         1          7    -1        1
7    3         9          12   4         16
5    1         1          10   2         4
3    -1        1          9    1         1
∑X = 32        SSX = 36   ∑Y = 64        SSY = 64
MX = 4                    MY = 8

sX² = SSX/(N-1) = 36/7 = 5.14    sY² = SSY/(N-1) = 64/7 = 9.14
sX = √5.14 = 2.27                sY = √9.14 = 3.02
(X-MX)    (Y-MY)    (X-MX)(Y-MY)
-2        -5        10
2         3         6
-4        -2        8
0         -2        0
1         -1        -1
3         4         12
1         2         2
-1        1         -1
                    SP = 36

Cov = SP/(N-1) = 36/7 = 5.14

r = Cov/(sXsY) = 5.14/((2.27)(3.02)) = 0.75

b = SP/SSX = 36/36 = 1
or
b = r(sY/sX) = 0.75(3.02/2.27) = 1
or
b = Cov/sX² = 5.14/5.14 = 1

a = MY-(b)MX = 8-(1)4 = 4

Ŷ = bX + a = (1)X + 4

Ŷ=X+4
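As a check on the hand calculation above, the following is a minimal Python sketch (an illustration, not part of the original lecture); it recomputes SSX, SSY, SP, r, b, and a for the eight data points using the same definitional formulas.

```python
# Minimal sketch (Python assumed; not from the lecture): reproduce the hand
# calculation of the regression equation for the eight (X, Y) pairs.
from math import sqrt

X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
n = len(X)

mx = sum(X) / n                                       # MX = 4
my = sum(Y) / n                                       # MY = 8
ss_x = sum((x - mx) ** 2 for x in X)                  # SSX = 36
ss_y = sum((y - my) ** 2 for y in Y)                  # SSY = 64
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # SP = 36

r = sp / sqrt(ss_x * ss_y)   # r = 36/48 = 0.75
b = sp / ss_x                # slope: b = SP/SSX = 1
a = my - b * mx              # intercept: a = MY - b*MX = 4

print(f"r = {r:.2f}, b = {b:.2f}, a = {a:.2f}, so Y-hat = {b:.0f}X + {a:.0f}")
```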
The Standard Error of the Estimate
• The variability in the outcome variable that is not predicted by the regression equation (SSRESIDUAL) can be used to indicate the average accuracy of the prediction
• The standard error of the estimate (sY-Ŷ) is the average distance between the regression line and the actual data (i.e., the average error when using the regression line to make predictions)

X    Y    Ŷ = X + 4    (Y - Ŷ)    (Y - Ŷ)²
2    3    6            -3         9
6    11   10           1          1
0    6    4            2          4
4    6    8            -2         4
5    7    9            -2         4
7    12   11           1          1
5    10   9            1          1
3    9    7            2          4
                       SSRESIDUAL = 28

sY-Ŷ = √(∑(Y-Ŷ)²/df) = √(SSRESIDUAL/(N-2)) = √(28/6) = 2.16
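The residual table and the standard error of the estimate can be reproduced the same way. A minimal Python sketch (an illustration, not part of the notes):

```python
# Minimal sketch (Python assumed): residuals and the standard error of the
# estimate for the fitted line Y-hat = X + 4.
from math import sqrt

X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
b, a = 1, 4

y_hat = [b * x + a for x in X]
ss_residual = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))  # 28
se_estimate = sqrt(ss_residual / (len(X) - 2))               # sqrt(28/6) ≈ 2.16

print(f"SS_residual = {ss_residual}, standard error of estimate = {se_estimate:.2f}")
```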

Assessing the Goodness of Fit

• The mean of the outcome variable is a model of "no relationship" between the predictor and outcome variables
• SSTOTAL represents how good the mean is as a model of the observed data
• SSRESIDUAL represents the degree of inaccuracy when the best model is fitted to the observed data
• We can use these two values to calculate how much better the regression model is than the baseline model (i.e., the mean)
• The improvement in prediction resulting from using the regression model rather than the mean is determined by calculating the difference between SSTOTAL and SSRESIDUAL
Assessing the Goodness of Fit
• SSMODEL (i.e., SSTOTAL - SSRESIDUAL) shows the reduction in the inaccuracy of the model that comes from fitting the regression model to the data
• If the value of SSMODEL is large, it means the regression model represents a big improvement in how well the outcome variable can be predicted
• If the value of SSMODEL is small, it means the regression model is not much better than the baseline model for making predictions about the outcome variable

When the model results in better prediction than using the baseline model, SSMODEL is much greater than SSRESIDUAL

SSTOTAL (total variance in the data) = SSMODEL (improvement due to the model) + SSRESIDUAL (error in the model)

R²
• R² measures the proportion of variability in Y scores that can be accounted for by the predictor variable (i.e., the variability in Y that the regression equation predicts, or can account for)
• R² = SSMODEL / SSTOTAL
• 1 - R² measures the proportion of variability in Y scores that cannot be accounted for by the predictor variable (i.e., the variability in Y that the regression equation does not predict, or cannot account for)
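A minimal Python sketch (an illustration, not part of the lecture) of the SSTOTAL = SSMODEL + SSRESIDUAL decomposition and R² for the worked dataset:

```python
# Minimal sketch (Python assumed): SS decomposition and R^2 for the worked
# example, using the fitted line Y-hat = X + 4.
X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
y_hat = [x + 4 for x in X]
my = sum(Y) / len(Y)

ss_total = sum((y - my) ** 2 for y in Y)                     # 64
ss_residual = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))  # 28
ss_model = ss_total - ss_residual                            # 36

r_squared = ss_model / ss_total  # 36/64 = 0.5625
print(f"R^2 = {r_squared:.3f}")
```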

Significance Testing and Regression

• The overall significance of the regression model is evaluated by computing the F-ratio
• A significant F-ratio indicates that the model predicts a significant portion of the variability in the Y scores (i.e., more than would be expected by chance alone)
• To compute the F-ratio, we first calculate a variance, called a mean square (MS), for the predicted variability and for the unpredicted variability

Analysis of Variance (ANOVA)

MSMODEL = SSMODEL/dfMODEL (dfMODEL = number of predictors = 1)
MSRESIDUAL = SSRESIDUAL/dfRESIDUAL (dfRESIDUAL = N - 2)

F = MSMODEL/MSRESIDUAL
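A minimal sketch of the F-ratio computation, assuming Python and the scipy library (neither appears in the lecture), using the sums of squares obtained on the earlier slides (SSMODEL = 64 - 28 = 36, SSRESIDUAL = 28, N = 8):

```python
# Minimal sketch (Python and scipy assumed): ANOVA F-ratio for a
# one-predictor regression, using the SS values from the earlier slides.
from scipy import stats

ss_model, ss_residual, n = 36, 28, 8
df_model, df_residual = 1, n - 2

ms_model = ss_model / df_model           # 36
ms_residual = ss_residual / df_residual  # ≈ 4.67
F = ms_model / ms_residual               # ≈ 7.71

p = stats.f.sf(F, df_model, df_residual)  # upper-tail p-value of F(1, 6)
print(f"F(1, {df_residual}) = {F:.2f}, p = {p:.3f}")  # p ≈ .032
```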
X    Y
2    3
6    11
0    6
4    6
5    7
7    12
5    10
3    9

• Compute R² (i.e., the proportion of the variance in Y scores that can be predicted from variable X) for the relationship
• Determine if the regression model accounts for a significant (α = 0.05, two-tailed) portion of the variance in Y scores, and if variable X is a significant predictor of variable Y
• Compute the standard error of the estimate (sY-Ŷ)
• r = 0.75, so R² = 0.563
• H0: the regression model does not account for a significant portion of the variance in Y scores (i.e., R² = 0)
• H1: the regression model accounts for a significant proportion of the variance in Y scores (i.e., R² ≠ 0)
• SSMODEL = R²SSY = 0.563(64) = 36
• SSRESIDUAL = (1 - R²)SSY = 0.438(64) = 28
• N = 8, so F-critical (1, 6) = 5.99 and t-critical (6) = 2.45
• MSMODEL = SSMODEL/dfMODEL = 36/1 = 36
• MSRESIDUAL = SSRESIDUAL/dfRESIDUAL = 28/6 = 4.67
• F = MSMODEL/MSRESIDUAL = 36/4.67 = 7.71
• Reject H0, and conclude the regression model accounts for a significant portion of the variance in Y scores; F(1, 6) = 7.71, p < .05; R² = 0.56, and the slope of the regression line is significant, so variable X is a significant predictor of variable Y; t(6) = 2.78, p < .05
• sY-Ŷ = √(SSRESIDUAL/df) = √(28/6) = 2.16

Double the one-tailed p-value for a two-tailed test (i.e., p = .032)
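The same result can be checked with a standard library routine. A minimal sketch (Python and scipy assumed; not part of the notes) using scipy.stats.linregress, which reports the slope, intercept, r, and the two-tailed p-value directly:

```python
# Minimal sketch (Python and scipy assumed): verify the worked example with
# scipy.stats.linregress.
from scipy import stats

X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]

result = stats.linregress(X, Y)
print(f"b = {result.slope:.2f}, a = {result.intercept:.2f}")      # b = 1.00, a = 4.00
print(f"r = {result.rvalue:.2f}, R^2 = {result.rvalue ** 2:.3f}")  # r = 0.75, R^2 ≈ 0.563
print(f"two-tailed p = {result.pvalue:.3f}")                       # ≈ .032
```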


• A professor obtains SAT scores and first-year GPAs for a sample of N = 15 students. The SAT scores have MSAT = 580 with SSSAT = 22,400, the GPAs have MGPA = 3.10 with SSGPA = 1.26, and SP = 84
• Find the regression equation for predicting GPA from SAT scores
• Compute R² for the relationship
• Determine if the regression model accounts for a significant (α = 0.05, two-tailed) portion of the variance in GPA and if SAT scores are a significant predictor of GPA
• Compute the standard error of the estimate (sY-Ŷ)

• b = SP/SSSAT = 84/22,400 = 0.00375
• a = MGPA - (b)MSAT = 3.10 - (0.00375)(580) = 0.925
• Ŷ = bX + a = 0.00375X + 0.925
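A minimal Python sketch (an illustration, not part of the notes) of the same calculation from the summary statistics given in the problem:

```python
# Minimal sketch (Python assumed): GPA-on-SAT regression equation from the
# summary statistics (SP, SS_SAT, and the two means).
sp, ss_sat = 84, 22_400
m_sat, m_gpa = 580, 3.10

b = sp / ss_sat        # 0.00375
a = m_gpa - b * m_sat  # 3.10 - 2.175 = 0.925

print(f"GPA-hat = {b:.5f} * SAT + {a:.3f}")
```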

• sSAT² = 22,400/14 = 1600
• sSAT = √1600 = 40
• sGPA² = 1.26/14 = 0.09
• sGPA = √0.09 = 0.3
• Cov = SP/(N-1) = 84/14 = 6
• r = Cov/(sSAT)(sGPA) = 6/((40)(0.3)) = 0.5
• R² = 0.5² = 0.25

• H0: the regression model, with SAT scores as a predictor, does not account for a significant portion of the variance in GPA (i.e., R² = 0)
• H1: the regression model, with SAT scores as a predictor, accounts for a significant portion of the variance in GPA (i.e., R² ≠ 0)
• SSMODEL = R²SSGPA = (0.25)(1.26) = 0.315
• SSRESIDUAL = (1 - R²)SSGPA = (0.75)(1.26) = 0.945
• N = 15, so F-critical (1, 13) = 4.67 and t-critical (13) = 2.16
• MSMODEL = SSMODEL/dfMODEL = 0.315/1 = 0.315
• MSRESIDUAL = SSRESIDUAL/dfRESIDUAL = 0.945/13 = 0.073
• F = MSMODEL/MSRESIDUAL = 0.315/0.073 = 4.32
• t = √F = √4.32 = 2.08
• Fail to reject H0; the regression model does not account for a significant portion of the variance in GPA scores; F(1, 13) = 4.32, ns; R² = 0.25, and the slope of the regression line is not significant, so SAT scores do not predict GPA; t(13) = 2.08, ns
• sY-Ŷ = √(SSRESIDUAL/df) = √(0.945/13) = 0.27
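A minimal sketch of the F-test for this example, assuming Python and scipy (not used in the lecture), computed from R² and SSGPA:

```python
# Minimal sketch (Python and scipy assumed): F-test for the SAT/GPA example
# from R^2 and SS_GPA.
from scipy import stats

r_squared, ss_gpa, n = 0.25, 1.26, 15
df_model, df_residual = 1, n - 2

ss_model = r_squared * ss_gpa           # 0.315
ss_residual = (1 - r_squared) * ss_gpa  # 0.945

F = (ss_model / df_model) / (ss_residual / df_residual)  # ≈ 4.33
F_crit = stats.f.ppf(0.95, df_model, df_residual)         # ≈ 4.67

print(f"F(1, {df_residual}) = {F:.2f}, F-critical = {F_crit:.2f}")
# F < F-critical, so H0 is not rejected, matching the conclusion above.
```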

End of Lecture
