
Unit 6 Simple Regression and Corr

This document provides an introduction to linear regression and correlation analysis. It outlines the goals of calculating and interpreting the simple correlation between two variables, determining a correlation's significance, and calculating and interpreting a simple linear regression equation. It describes the assumptions behind regression analysis and how to determine if a regression model is a good fit. Key concepts explained include the dependent and independent variables, scatter plots, correlation analysis, the simple linear regression model and equation, and the least squares method. Examples are provided to illustrate positive, negative and no relationships in regression models.

Uploaded by

Nkateko

INTRODUCTION TO LINEAR REGRESSION AND CORRELATION ANALYSIS
GOALS
OUTCOME:
• Calculate and interpret the simple correlation
between two variables
• Determine whether the correlation is significant
• Calculate and interpret the simple linear regression
equation for a set of data
• Understand the assumptions behind regression
analysis
• Determine whether a regression model is
SIMPLE LINEAR REGRESSION
Managerial decisions often are based on the
relationship between two or more variables.
Regression analysis can be used to develop an
equation showing how the variables are related.
The variable being predicted is called the dependent
variable and is denoted by y.

The variables being used to predict the value of the
dependent variable are called the independent
variables and are denoted by x.
SCATTER PLOTS AND
CORRELATION
• A scatter plot (or scatter diagram) is used to show the
relationship between two variables
• Correlation analysis is used to measure strength of the
association (linear relationship) between two variables

• Only concerned with strength of the relationship
• No causal effect is implied
SCATTER PLOT EXAMPLES
[Figure: scatter plots of y against x showing linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]
INTRODUCTION TO REGRESSION ANALYSIS

Simple linear regression involves one independent
variable and one dependent variable.

The relationship between the two variables is
approximated by a straight line.

Regression analysis involving two or more
independent variables is called multiple regression.
INTRODUCTION TO
REGRESSION ANALYSIS
• Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at
least one independent variable
• Explain the impact of changes in an independent variable on the
dependent variable

Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the
dependent variable
SIMPLE LINEAR REGRESSION
MODEL
The equation that describes how y is related to x and
an error term is called the regression model.
The simple linear regression model is:

y = 0 + 1x +
where:
0 and 1 are called parameters of the model,
e is a random variable called the error term.
SIMPLE LINEAR REGRESSION
MODEL
• Only one independent variable, x
• Relationship between x and y is described by a
linear function

• Changes in y are assumed to be caused by changes in x
TYPES OF REGRESSION MODELS
[Figure: four scatter plots illustrating a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship]


SIMPLE LINEAR REGRESSION EQUATION
[Figures: regression line of E(y) against x, with intercept $\beta_0$, in three cases]
• Positive Linear Relationship: slope $\beta_1$ is positive
• Negative Linear Relationship: slope $\beta_1$ is negative
• No Relationship: slope $\beta_1$ is 0
SIMPLE LINEAR REGRESSION EQUATION
The simple linear regression equation is:

$E(y) = \beta_0 + \beta_1 x$

Graph of the regression equation is a straight line.
$\beta_0$ is the y intercept of the regression line.
$\beta_1$ is the slope of the regression line.
E(y) is the expected value of y for a given x value.


ESTIMATED SIMPLE LINEAR REGRESSION
EQUATION
The estimated simple linear regression equation is:

$\hat{y} = b_0 + b_1 x$

The sample regression line provides an estimate of
the population regression line
• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
LEAST SQUARES METHOD

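• Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

where:
yi = observed value of the dependent variable
for the ith observation
ŷi = estimated value of the dependent variable
for the ith observation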
LEAST SQUARES METHOD

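• Slope for the Estimated Regression Equation

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

where:
xi = value of independent variable for ith
observation
yi = value of dependent variable for ith
observation
x̄ = mean value of the independent variable
ȳ = mean value of the dependent variable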
THE LEAST SQUARES
EQUATION
• The formulas for b1 and b0 are:

$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$

algebraic equivalent:

$b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$ and $b_0 = \frac{\sum y - b_1 \sum x}{n}$

b0 and b1 are obtained by finding the values of b0
and b1 that minimize the sum of the squared
residuals
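
The slides state the algebraic form of b1 without showing why it matches the deviation form. The steps below are a short sketch of that equivalence, not part of the original deck. They use only the definitions $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$.

```latex
\begin{align*}
\sum (x - \bar{x})(y - \bar{y})
  &= \sum xy - \bar{x} \sum y - \bar{y} \sum x + n \bar{x} \bar{y}
   = \sum xy - \frac{\sum x \sum y}{n} \\
\sum (x - \bar{x})^2
  &= \sum x^2 - 2 \bar{x} \sum x + n \bar{x}^2
   = \sum x^2 - \frac{(\sum x)^2}{n} \\
b_1
  &= \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}
   = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} \\
b_0
  &= \bar{y} - b_1 \bar{x}
   = \frac{\sum y - b_1 \sum x}{n}
\end{align*}
```

The last step of the b1 line multiplies numerator and denominator by n, which leaves the ratio unchanged.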
THE ESTIMATED COEFFICIENTS

To calculate the estimates of $\beta_0$ and $\beta_1$ that minimize the differences between the
data points and the line, use the formulas shown below (alternative formulae are
suggested later):

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

$b_0 = \bar{y} - b_1 \bar{x}$

The regression equation that estimates the equation of the first order linear model
is:

$\hat{y} = b_0 + b_1 x$
SIMPLE LINEAR REGRESSION

Example: Reed Auto Sales


Reed Auto periodically has a special week-long sale.
As part of the advertising campaign Reed runs one or
more television commercials during the weekend
preceding the sale. Data from a sample of 5 previous
sales are shown on the next slide.
SIMPLE LINEAR REGRESSION
Example: Reed Auto Sales

Number of TV Ads (x)    Number of Cars Sold (y)
1                       14
3                       24
2                       18
1                       17
3                       27
Σx = 10                 Σy = 100
ESTIMATED REGRESSION EQUATION

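Slope for the Estimated Regression Equation

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{20}{4} = 5$

y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$

Estimated Regression Equation

$\hat{y} = 10 + 5x$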
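
The least squares estimates for the Reed Auto data can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides. The function names `least_squares` and `least_squares_algebraic` are illustrative choices; the data are the five (x, y) pairs from the Reed Auto Sales table.

```python
# Reed Auto Sales sample from the slides: TV ads (x) and cars sold (y).
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for y-hat = b0 + b1*x using the deviation form."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def least_squares_algebraic(x, y):
    """Return (b0, b1) from the algebraically equivalent sums form."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


b0, b1 = least_squares(ads, cars)
print(f"y-hat = {b0:g} + {b1:g}x")  # y-hat = 10 + 5x
```

Both functions return the same pair here, so either form of the formula can be used for hand calculation.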


The Simple Linear Regression Line
• EXAMPLE 2
• A car dealer wants to find the relationship between the odometer reading and
the selling price of 3-year-old Tauruses.
• A random sample of 100 cars is selected, and the data recorded.
• Find the regression line.

Dependent variable y: Price. Independent variable x: Odometer.

Car    Price (y)    Odometer (x)
1      14.6         37.4
2      14.1         44.8
3      14.0         45.8
4      15.6         30.9
5      15.6         31.7
6      14.7         34.0
.      .            .
.      .            .
.      .            .
THE SIMPLE LINEAR REGRESSION
LINE
• Another version of the formula for b1:

$b_1 = \frac{\mathrm{cov}(x, y)}{s_x^2}$

cov(x, y) is also denoted by $s_{xy}$

• $b_0 = \bar{y} - b_1 \bar{x}$

COV (the covariance of x and y) is a measure of
the common ‘movement’ of x and y (do both
generally increase together, or move in opposite
directions)
THE SIMPLE LINEAR
REGRESSION LINE
• Solution
– Solving by hand: Calculate statistics. See data in Car Price

$\bar{x} = 36.011$; $\bar{y} = 14.841$, where n = 100.

$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 43.509$

$\mathrm{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = -2.909$

$b_1 = \frac{\mathrm{cov}(x, y)}{s_x^2} = \frac{-2.909}{43.509} = -.0669$

$b_0 = \bar{y} - b_1 \bar{x} = 14.841 - (-.0669)(36.011) \approx 17.248$

$\hat{y} = b_0 + b_1 x = 17.248 - .0669x$

Excel provides a population covariance of -2.879.
The sample covariance is:
-2.879n/(n-1) = -2.879(100/99) ≈ -2.909
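
The link between the population and sample covariance, and the slope formula b1 = cov(x, y) / s_x², can be sketched in Python. This is an illustration only. It uses just the six cars listed in the slides, not the full sample of 100, so its results will not match the figures above. The helper name `covariance` and its `sample` flag are assumptions made for this sketch.

```python
# First six rows of the car data shown in the slides. The full sample has
# 100 cars, so these results will NOT reproduce the slide's figures.
odometer = [37.4, 44.8, 45.8, 30.9, 31.7, 34.0]  # x
price = [14.6, 14.1, 14.0, 15.6, 15.6, 14.7]     # y


def covariance(x, y, sample=True):
    """Covariance of x and y; divides by n - 1 (sample) or n (population)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (n - 1 if sample else n)


n = len(odometer)
pop_cov = covariance(odometer, price, sample=False)
sample_cov = covariance(odometer, price)

# The variance of x is the covariance of x with itself.
sample_var_x = covariance(odometer, odometer)

# Slope and intercept from the covariance form of the formulas.
b1 = sample_cov / sample_var_x
b0 = sum(price) / n - b1 * (sum(odometer) / n)

print(f"population cov = {pop_cov:.4f}, sample cov = {sample_cov:.4f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```

Multiplying the population covariance by n/(n-1) gives the sample covariance, which is the same conversion the slide applies to the Excel output.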
CORRELATION COEFFICIENT (continued)

• The population correlation coefficient ρ (rho) measures
the strength of the association between the variables
• The sample correlation coefficient r is an estimate of ρ
and is used to measure the strength of the linear
relationship in the sample observations
FEATURES OF ρ AND r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
EXAMPLES OF APPROXIMATE r VALUES
[Figure: five scatter plots of y against x with r = -1, r = -.6, r = 0, r = +.3 and r = +1]
CALCULATING THE
CORRELATION COEFFICIENT
Sample correlation coefficient:

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}}$

or the algebraic equivalent:

$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
CALCULATION EXAMPLE

Tree Height (y)    Trunk Diameter (x)    xy          y²           x²
35                 8                     280         1225         64
49                 9                     441         2401         81
27                 7                     189         729          49
33                 6                     198         1089         36
60                 13                    780         3600         169
21                 7                     147         441          49
45                 11                    495         2025         121
51                 12                    612         2601         144
Σ = 321            Σ = 73                Σ = 3142    Σ = 14111    Σ = 713
CALCULATION EXAMPLE (continued)

$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

$= \frac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}} = 0.886$

[Figure: scatter plot of Tree Height, y (axis 0 to 70) against Trunk Diameter, x (axis 0 to 14)]

r = 0.886 → relatively strong positive
linear association between x and y
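
The tree calculation can be reproduced with a short Python sketch, added here as a check and not part of the original slides. The function name `correlation` is an illustrative choice; the data are the eight trees from the calculation example, and the function uses the algebraic form of the formula for r.

```python
from math import sqrt

# Tree data from the calculation example.
height = [35, 49, 27, 33, 60, 21, 45, 51]   # y, tree height
diameter = [8, 9, 7, 6, 13, 7, 11, 12]      # x, trunk diameter


def correlation(x, y):
    """Sample correlation coefficient r, algebraic (sums) form."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation(diameter, height)
print(round(r, 3))  # 0.886
```

Swapping the two arguments gives the same value, since r is symmetric in x and y.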
REGRESSION WITH EXCEL
• The first part of making a simple linear regression graph
in Excel is making a scatter plot.
• Let’s consider the example of an e-commerce site’s
pageviews and sales for the year 2018.
Pageviews vs Sales

Month        Pageviews    Sales
January      421          33.68
February     452          40.68
March        496          39.68
April        562          44.96
May          635          50.80
June         681          61.29
July         785          70.65
August       861          68.88
September    998          79.84
October      1187         94.96
November     1357         122.13
December     1521         152.10
EXCEL SOLUTION
• Here’s what you need to do to insert a scatter plot in
Excel:
• Format your data in such a way that the independent variable is
on the left column and the dependent variable on the right.
• Highlight your data.
• Find and click the ‘Scatter’ icon under the ‘Scatter’ group on
the ‘Charts’ category on the ribbon.
• Refer to the example below
EXCEL SOLUTION
• Aim for this look
• Once you get the scatterplot, head over
to the main part.
• To draw the regression line, add a
trendline on the chart.
• Click on any of the data
points and right-click.
• Select ‘Add Trendline’
EXCEL SOLUTION
After that, a window will open at the right-hand side.
‘Linear’ is the default ‘Trendline Options’. If it’s not selected, click
on it.

Also show the equation on the chart by
ticking the ‘Display Equation on chart’ box.

The equation will look like the picture below


EXCEL SOLUTION
After adding a trendline, you can adjust it: just right-click on it and select ‘Format
Trendline’.
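
Excel’s linear trendline is a least squares line, so the same equation can be computed outside Excel. The Python sketch below is an illustration and not part of the original slides. The function name `fit_line` and the 1,000-pageview prediction are assumptions made for the example; the data are the twelve months from the Pageviews vs Sales table.

```python
from math import sqrt

# Monthly data from the Pageviews vs Sales table (January to December).
pageviews = [421, 452, 496, 562, 635, 681, 785, 861, 998, 1187, 1357, 1521]
sales = [33.68, 40.68, 39.68, 44.96, 50.80, 61.29,
         70.65, 68.88, 79.84, 94.96, 122.13, 152.10]


def fit_line(x, y):
    """Return (b0, b1, r): least squares intercept, slope and correlation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r = sxy / sqrt(sxx * syy)
    return b0, b1, r


b0, b1, r = fit_line(pageviews, sales)
print(f"Sales = {b0:.4f} + {b1:.4f} * Pageviews (r = {r:.3f})")

# Hypothetical use: estimated sales for a month with 1,000 pageviews.
predicted = b0 + b1 * 1000
print(f"Predicted sales at 1,000 pageviews: {predicted:.2f}")
```

A positive slope here corresponds to an upward-sloping trendline on the Excel chart.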
HOW TO INTERPRET THE
RESULTS
• Primarily, what you’re looking for in a simple linear regression is the correlation
between the variables. Fortunately, in Excel, the trendline does it all for you.
• The trendline will tell you if the relationship of your variables is positive or
negative.
• Positive: If the line shows an upward trend. This indicates that as the
independent variable increases, the dependent variable also increases. In
our example, as the pageviews increase, we can expect to see a
rise in sales as well.
• Negative: If the line shows a downward trend. This suggests that as the
independent variable increases, the dependent variable decreases.
• None at all: This is easy to spot. There is no correlation between the variables
(therefore, no way to predict the next values) when the points in the scatter plot
don’t resemble a line as they are scattered. You can still see a line if you add a
trendline no matter how random the points are, but the line is usually close to a
horizontal line.
