Unit 6 Simple Regression and Corr
REGRESSION AND
CORRELATION ANALYSIS
GOALS
OUTCOME:
• Calculate and interpret the simple correlation
between two variables
• Determine whether the correlation is significant
• Calculate and interpret the simple linear regression
equation for a set of data
• Understand the assumptions behind regression
analysis
• Determine whether a regression model is
SIMPLE LINEAR REGRESSION
Managerial decisions often are based on the
relationship between two or more variables.
Regression analysis can be used to develop an
equation showing how the variables are related.
The variable being predicted is called the dependent
variable and is denoted by y.
SCATTER PLOT EXAMPLES
[Figure: scatter plots of y against x illustrating strong relationships, weak relationships, and no relationship]
INTRODUCTION TO REGRESSION ANALYSIS
$y = \beta_0 + \beta_1 x + \varepsilon$
where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\varepsilon$ is a random variable called the error term.
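To make the role of the error term concrete, here is a minimal sketch that generates observations from the model above. The function name `simulate_linear_model`, the parameter values, and the choice of a Normal distribution for the error term are assumptions made for illustration only; the slides say only that the error term is a random variable.

```python
import random

def simulate_linear_model(xs, beta0, beta1, sigma, seed=0):
    """Return y = beta0 + beta1*x + error for each x.

    Assumption: the error term is drawn from Normal(0, sigma).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [1, 2, 3, 4, 5]

# Hypothetical parameter values, chosen only for illustration.
ys = simulate_linear_model(xs, beta0=10.0, beta1=5.0, sigma=2.0)

# With sigma = 0 the error term vanishes and every point lies on the line.
noiseless = simulate_linear_model(xs, beta0=10.0, beta1=5.0, sigma=0.0)
print(noiseless)  # [15.0, 20.0, 25.0, 30.0, 35.0]
```

With a nonzero `sigma`, the points scatter around the line instead of falling exactly on it, which is the pattern the scatter plots show.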
SIMPLE LINEAR REGRESSION
MODEL
• Only one independent variable, x
• Relationship between x and y is described by a
linear function
SIMPLE LINEAR REGRESSION EQUATION
[Figure: regression line E(y) against x with intercept $\beta_0$, shown for three cases: slope $\beta_1$ is positive, slope $\beta_1$ is negative, and no relationship]
The simple linear regression equation is:
$E(y) = \beta_0 + \beta_1 x$
LEAST SQUARES METHOD
Least squares criterion: $\min \sum (y_i - \hat{y}_i)^2$
where:
$x_i$ = value of independent variable for ith observation
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i = b_0 + b_1 x_i$ = estimated value of the dependent variable for the ith observation
THE LEAST SQUARES EQUATION
• The formulas for b1 and b0 are:
$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
algebraic equivalent:
$b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
and
$b_0 = \bar{y} - b_1 \bar{x}$
b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals
THE ESTIMATED COEFFICIENTS
$\hat{y} = b_0 + b_1 x$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$b_0 = \bar{y} - b_1 \bar{x}$
SIMPLE LINEAR REGRESSION
Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
Σx = 10                 Σy = 100
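As a worked check of the least squares formulas on the TV ads data, the sketch below computes b1 and b0 directly from the deviations about the means. The function name `least_squares` and the variable names are illustrative choices, not part of the original slides.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

b0, b1 = least_squares(tv_ads, cars_sold)
print(f"y-hat = {b0:g} + {b1:g}x")  # y-hat = 10 + 5x
```

Here the mean of x is 2 and the mean of y is 20, so the slope is 20/4 = 5 and the intercept is 20 - 5(2) = 10.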
ESTIMATED REGRESSION EQUATION
• $b_1 = \frac{\mathrm{cov}(x, y)}{s_x^2}$
• $b_0 = \bar{y} - b_1 \bar{x}$
COV (the covariance of x and y) is a measure of the common ‘movement’ of x and y (do both generally increase together, or move in opposite directions?)
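THE SIMPLE LINEAR REGRESSION LINE
• Solution
– Solving by hand: Calculate statistics. See data in Car Price
$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 43.509$
$\mathrm{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = -2.909$
$b_1 = \frac{\mathrm{cov}(x, y)}{s_x^2} = \frac{-2.909}{43.509} = -.0669$
$b_0 = \bar{y} - b_1 \bar{x} = 17.248$
$\hat{y} = b_0 + b_1 x = 17.248 - .0669x$
Excel provides a population covariance of -2.879.
The sample covariance is:
-2.879n/(n-1) = -2.879(100/99) = -2.909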
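The arithmetic on this slide can be sketched in a few lines. The helper names `slope_from_cov` and `sample_cov_from_population` are hypothetical, and the inputs are the rounded figures quoted on the slide, so the last digit can differ slightly from values computed on the full data set.

```python
def slope_from_cov(cov_xy, var_x):
    """Slope b1 = cov(x, y) / s_x^2."""
    return cov_xy / var_x

def sample_cov_from_population(pop_cov, n):
    """Rescale a population covariance (divisor n) to a sample covariance (divisor n - 1)."""
    return pop_cov * n / (n - 1)

# Rounded values taken from the slide.
b1 = slope_from_cov(-2.909, 43.509)
sample_cov = sample_cov_from_population(-2.879, 100)

print(round(b1, 4))          # -0.0669
print(round(sample_cov, 3))  # -2.908
```

Starting from the rounded -2.879, the conversion gives about -2.908; the slide's -2.909 is consistent with a conversion done before the population covariance was rounded.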
CORRELATION COEFFICIENT (continued)
[Figure: five scatter plots of y against x illustrating r = -1, r = -.6, r = 0, r = +.3, and r = +1]
CALCULATING THE CORRELATION COEFFICIENT
Sample correlation coefficient:
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}}$
Tree Height (y)   Trunk Diameter (x)   xy          y²            x²
35                8                    280         1225          64
49                9                    441         2401          81
27                7                    189         729           49
33                6                    198         1089          36
60                13                   780         3600          169
21                7                    147         441           49
45                11                   495         2025          121
51                12                   612         2601          144
Σy = 321          Σx = 73              Σxy = 3142  Σy² = 14111   Σx² = 713
CALCULATION EXAMPLE (continued)
$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
$= \frac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}} = 0.886$
r = 0.886 → relatively strong positive linear association between x and y
[Figure: scatter plot of Tree Height, y (axis 0 to 70) against Trunk Diameter, x (axis 0 to 14)]
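The same calculation can be reproduced from the raw tree data with the computational formula for r. This is a minimal sketch; the function name `correlation` and the list names are illustrative.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient from sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

trunk_diameter = [8, 9, 7, 6, 13, 7, 11, 12]     # x
tree_height = [35, 49, 27, 33, 60, 21, 45, 51]   # y

r = correlation(trunk_diameter, tree_height)
print(round(r, 3))  # 0.886
```

The intermediate values are 8(3142) - (73)(321) = 1703 in the numerator and (375)(9847) under the square root, which gives r of about 0.886.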
REGRESSION WITH EXCEL
• The first step in making a simple linear regression graph in Excel is to create a scatter plot.
• Let’s consider an example: an e-commerce site’s pageviews and sales for the year 2018.
Pageviews vs Sales
Month       Pageviews   Sales
January     421         33.68
February    452         40.68
March       496         39.68
April       562         44.96
May         635         50.80
June        681         61.29
July        785         70.65
August      861         68.88
September   998         79.84
October     1187        94.96
November    1357        122.13
December    1521        152.10
EXCEL SOLUTION
• Here’s what you need to do to insert a scatter plot in Excel:
• Format your data so that the independent variable is in the left column and the dependent variable is in the right column.
• Highlight your data.
• Click the ‘Scatter’ icon in the ‘Charts’ group on the ‘Insert’ tab of the ribbon.
• Refer to the example below
EXCEL SOLUTION
• Aim for this look
Once you have the scatter plot, move on to the main part: drawing the regression line by adding a trendline to the chart.
Right-click any of the data points.
Select ‘Add Trendline’.
EXCEL SOLUTION
After that, a pane will open on the right-hand side.
‘Linear’ is the default under ‘Trendline Options’. If it’s not selected, click on it.
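Excel's linear trendline is a least squares line, so it can be cross-checked outside Excel. The sketch below fits the same line to the pageviews and sales table using the formulas for b1 and b0; the function name `fit_line` and the variable names are illustrative, and no particular output is claimed here since the slides do not report the fitted coefficients.

```python
pageviews = [421, 452, 496, 562, 635, 681, 785, 861, 998, 1187, 1357, 1521]
sales = [33.68, 40.68, 39.68, 44.96, 50.80, 61.29,
         70.65, 68.88, 79.84, 94.96, 122.13, 152.10]

def fit_line(xs, ys):
    """Least squares fit; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(pageviews, sales)

# Fitted values and residuals (observed minus fitted) for each month.
fitted = [intercept + slope * x for x in pageviews]
residuals = [y - f for y, f in zip(sales, fitted)]

print(f"Sales = {intercept:.4f} + {slope:.4f} * Pageviews")
```

The printed coefficients should agree with the equation Excel reports for the trendline, up to rounding.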