STA200 - Lab Session - Chapter 14

Chapter 14 covers Simple Linear Regression, highlighting its appropriateness for predicting one variable based on another with a linear relationship using one independent variable. It outlines key assumptions for reliable regression results, including linearity, independence, homoscedasticity, and normality of errors. The chapter also includes case studies demonstrating regression analysis with practical examples and Excel steps for conducting regression and checking assumptions.

Chapter 14: Simple Linear Regression
Lab Session
When is Simple Linear Regression Appropriate?
Simple linear regression is useful when:
● You want to predict one variable based on another (e.g., profit from investment).
● You believe there is a linear relationship between the two variables.
● The predictor variable is continuous (e.g., hours of employee training, advertising spend).
● Only one independent variable is used.

Assumptions of the Regression Model
For the regression results to be reliable, four key assumptions must be satisfied:
● Linearity: The relationship between the dependent and independent variable must be linear.
● Independence: The residuals (errors) are independent of each other.
● Homoscedasticity: The variance of the residuals is constant across all levels of the independent variable.
● Normality of Errors: The residuals should be approximately normally distributed.
Notes:
● Residuals (errors) = Yᵢ − Ŷᵢ (observed value − predicted value).
● Violation of these assumptions can lead to biased or misleading results.
Possible Regression Lines in Simple Linear Regression

Key Points
Regression Equation: Y = β1 X + β0
● R² is the Coefficient of Determination, which represents the % of the variation in Y that is explained by X.
● R is the correlation coefficient, which measures the strength and direction of the linear relationship between the independent variable X and the dependent variable Y.
● β1: the coefficient of X (the slope).
● β0: the intercept (constant term).
● Yᵢ is the observed value of Y.
● Ŷᵢ is the predicted value of Y.
● Xᵢ is the observed value of X.
● p-value < 0.05 → statistically significant → there is evidence of a linear relationship.
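The same quantities can be reproduced outside Excel. The following is a minimal Python/NumPy sketch (illustration only, not part of the lab; the x and y arrays are made-up sample data) showing how β1, β0, R, and R² are obtained for a simple linear regression:

# Illustration only: key regression quantities with NumPy (made-up data).
import numpy as np

x = np.array([5, 10, 15, 20, 25, 30], dtype=float)   # hypothetical X values
y = np.array([1800, 1620, 1390, 1180, 990, 760.0])   # hypothetical Y values

beta1, beta0 = np.polyfit(x, y, deg=1)   # slope (β1) and intercept (β0) of the least-squares line
y_hat = beta1 * x + beta0                # predicted values Ŷi
residuals = y - y_hat                    # residuals Yi − Ŷi

r = np.corrcoef(x, y)[0, 1]              # correlation coefficient R (sign matches the slope)
r_squared = r ** 2                       # coefficient of determination R²

print(f"β1 = {beta1:.2f}, β0 = {beta0:.2f}, R = {r:.4f}, R² = {r_squared:.4f}")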
Excel Steps for Regression Assumptions and Output

1) For Regression Output:

Run the Regression Using the Data Analysis ToolPak
1. Go to Data → Data Analysis (if not available, enable it via Excel Options → Add-ins → Analysis ToolPak).
2. Select Regression and click OK.
3. Fill in the fields:
   ■ Input Y Range: Select your Y data
   ■ Input X Range: Select your X data
   ■ Check Labels (if you included headers)
   ■ Select Output Range (or New Worksheet)
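For readers who also work outside Excel, a comparable regression table (coefficients, R², p-values) can be produced with the statsmodels library. This is a sketch for illustration, not the lab's method; the x and y arrays are made-up sample data:

# Illustration only: an OLS fit with statsmodels, analogous to the ToolPak output.
import numpy as np
import statsmodels.api as sm

x = np.array([5, 10, 15, 20, 25, 30], dtype=float)   # hypothetical predictor (X)
y = np.array([1800, 1620, 1390, 1180, 990, 760.0])   # hypothetical response (Y)

X = sm.add_constant(x)        # adds the intercept term (β0)
model = sm.OLS(y, X).fit()    # ordinary least squares

print(model.summary())        # coefficients, R², p-values, etc.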
2) For Regression Assumptions:
a) Linearity:
1. Create a scatterplot of the observed Y values against the X values.
2. Add a trendline: Right-click a data point → Add Trendline → Choose "Linear".
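Equivalently, outside Excel, a quick linearity check can be sketched in Python with matplotlib (illustration only; x and y are made-up sample data):

# Illustration only: scatterplot of Y vs. X with a linear trendline.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([5, 10, 15, 20, 25, 30], dtype=float)
y = np.array([1800, 1620, 1390, 1180, 990, 760.0])

beta1, beta0 = np.polyfit(x, y, deg=1)        # fitted trendline
plt.scatter(x, y, label="observed")
plt.plot(x, beta1 * x + beta0, color="red", label="linear trendline")
plt.xlabel("X"); plt.ylabel("Y"); plt.legend(); plt.title("Linearity check")
plt.show()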

b) Independence:
1. After running regression, calculate residuals:
○ Add a column for Predicted Y using the regression equation.
○ Subtract Predicted Y from Actual Y.
○ Residual = Actual Y – Predicted Y
2. Create a scatterplot of residuals vs. observation order:
   ○ X-axis = Observation number (add a new column called "Observation Order", numbered 1 to n).
   ○ Y-axis = Residual
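A corresponding Python sketch of the independence check (illustration only; made-up sample data) plots the residuals against observation order:

# Illustration only: residuals vs. observation order.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([5, 10, 15, 20, 25, 30], dtype=float)
y = np.array([1800, 1620, 1390, 1180, 990, 760.0])

beta1, beta0 = np.polyfit(x, y, deg=1)
residuals = y - (beta1 * x + beta0)       # Residual = Actual Y − Predicted Y
order = np.arange(1, len(y) + 1)          # observation order 1..n

plt.scatter(order, residuals)
plt.axhline(0, color="gray")
plt.xlabel("Observation order"); plt.ylabel("Residual"); plt.title("Independence check")
plt.show()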
2) For Regression Assumptions:
c) Homoscedasticity:
1. After running regression, calculate residuals:
○ Add a column for Predicted Y using the regression equation.
○ Subtract Predicted Y from Actual Y.
○ Residual = Actual Y – Predicted Y
2. Create a scatterplot of residuals vs. predicted Y values:
○ X-axis = Predicted Y
○ Y-axis = Residual

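The homoscedasticity plot can be sketched the same way (illustration only; made-up sample data), with predicted Y on the horizontal axis instead of observation order:

# Illustration only: residuals vs. predicted Y.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([5, 10, 15, 20, 25, 30], dtype=float)
y = np.array([1800, 1620, 1390, 1180, 990, 760.0])

beta1, beta0 = np.polyfit(x, y, deg=1)
y_hat = beta1 * x + beta0
residuals = y - y_hat

plt.scatter(y_hat, residuals)
plt.axhline(0, color="gray")
plt.xlabel("Predicted Y"); plt.ylabel("Residual"); plt.title("Homoscedasticity check")
plt.show()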
2) For Regression Assumptions:
d) Normality of Residuals:
Create a histogram of residuals:
1. After running regression, calculate residuals:
○ Add a column for Predicted Y using the regression equation.
○ Subtract Predicted Y from Actual Y.
○ Residual = Actual Y – Predicted Y
2. Click anywhere in the residual column (or select the full range, e.g., C2:C36).
3. Go to the top menu and click: Insert tab → Charts group → Histogram icon (it's under the bar chart dropdown).
4. Select "Histogram" from the options.
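A Python sketch of the normality check (illustration only; made-up sample data) is simply a histogram of the residuals:

# Illustration only: histogram of residuals.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([5, 10, 15, 20, 25, 30], dtype=float)
y = np.array([1800, 1620, 1390, 1180, 990, 760.0])

beta1, beta0 = np.polyfit(x, y, deg=1)
residuals = y - (beta1 * x + beta0)

plt.hist(residuals, bins=5)
plt.xlabel("Residual"); plt.ylabel("Frequency"); plt.title("Normality check: residual histogram")
plt.show()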
Case Study 1:
Discount % vs. Weekly Profit
A retail business is analyzing how increasing discount percentages affect its weekly profit.
Data for 35 observations:
Use the following data set to:
1) Comment on the regression output
2) Write the regression equation
3) Check the simple linear regression assumptions
4) Predict the weekly profit if the discount applied is 25%.
1) Regression output
Excel Output:

1) Regression Output: Comments
● R² = 0.9270 = 92.7%, which means that 92.7% of the variation in the weekly profit is explained by the discount %.
● β1 = −42: a negative slope, which reflects the negative linear relationship. In addition, a 1-point increase in the discount % leads to a $42 decrease in the weekly profit.
● β0 = 2025
● p-value < 0.05, which means that the discount % has a significant effect on the weekly profit.
● R = −√(R²) (since β1 < 0) = −√0.9270 = −0.9628: a strong negative linear relationship.
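The last bullet is just arithmetic: R is the square root of R², with the sign taken from the slope. A quick check (illustration only):

# Illustration only: recover R from R², taking the sign of the slope β1.
import math

r_squared = 0.9270
beta1 = -42.0

r = math.copysign(math.sqrt(r_squared), beta1)
print(round(r, 4))   # -0.9628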
2) Regression Equation
Using the regression output:

Regression Equation:
Predicted Weekly Profit = −42 × (Discount %) + 2025
3) Checking the Regression Assumptions
Excel Output:

4) Profit Prediction:
Predict the weekly profit if the discount applied is 25%.

Regression Equation:
Predicted Weekly Profit = −42 × (Discount %) + 2025

If the discount % is 25, the predicted weekly profit will be:
Predicted Weekly Profit = −42 × (25) + 2025 = $975
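The prediction step is simply the fitted equation evaluated at the given discount. As a small illustration (not required for the lab):

# Illustration only: prediction from the Case Study 1 equation.
def predicted_weekly_profit(discount_pct: float) -> float:
    # Predicted Weekly Profit = -42 * (Discount %) + 2025
    return -42.0 * discount_pct + 2025.0

print(predicted_weekly_profit(25))   # 975.0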
Case Study 2:
Ads per Week vs. Units Sold
A marketing team wants to examine whether running more online ads leads to more product sales.
Data for 35 observations:
Use the following data set to:
1) Comment on the regression output
2) Write the regression equation
3) Check the simple linear regression assumptions
4) Predict the Units Sold if the number of Ads per week is 60.
1) Regression output
Excel Output:

1) Regression Output: Comments
● R² = 0.9270 = 92.7%, which means that 92.7% of the variation in the weekly units sold is explained by the number of Ads per week.
● β1 = 22: a positive slope, which reflects the positive linear relationship. In addition, an increase of 1 Ad per week leads to a 22-unit increase in the weekly units sold.
● β0 = 300
● p-value < 0.05, which means that the number of Ads per week has a significant effect on the weekly number of units sold.
● R = +√(R²) (since β1 > 0) = +√0.9270 = +0.9628: a strong positive linear relationship.
2) Regression Equation
Using the regression output:

Regression Equation:
Predicted Units Sold = 22 × (Ads per week) + 300
3) Checking the Regression Assumptions
Excel Output:

4) Units Sold Prediction:
Predict the Units Sold if the number of Ads per week is 60.

Regression Equation:
Predicted Units Sold = 22 × (Ads per week) + 300

If the number of Ads per week is 60, the predicted Units Sold will be:
Predicted Units Sold = 22 × (60) + 300 = 1620 units
Case Study 3:
Training Hours vs. Productivity Score
An HR department is assessing the impact of employee training hours on productivity scores.
Data for 35 observations:
Use the following data set to:
1) Comment on the regression output
2) Write the regression equation
3) Check the simple linear regression assumptions
4) Predict the productivity score for 40 training hours.
1) Regression Output
Excel Output:
1) Regression Output: Comments
● R² = 0.8029 = 80.29%, which means that 80.29% of the variation in the productivity score is explained by the training hours.
● β1 = 5: a positive slope, which reflects the positive linear relationship. In addition, an increase of 1 training hour leads the productivity score to increase by 5 points.
● β0 = 66
● p-value < 0.05, which means that the training hours have a significant effect on the productivity score.
● R = +√(R²) (since β1 > 0) = +√0.8029 = +0.8960: a strong positive linear relationship.
2) Regression Equation
Using the regression output:

Regression Equation:
Predicted Productivity Score = 5 × (Training Hours) + 66
3) Checking the Regression Assumptions
Excel Output:

4) Productivity Score Prediction:
Predict the productivity score for 40 training hours.

Regression Equation:
Predicted Productivity Score = 5 × (Training Hours) + 66

If the training hours are 40, the predicted productivity score will be:
Predicted Productivity Score = 5 × (40) + 66 = 266
End of Session
