STA200 - Lab Session - Chapter 14
www.aum.edu.kw
Assumptions of the Regression Model
For the regression results to be reliable, four key assumptions must be satisfied:
● Linearity: The relationship between the dependent and independent variables
must be linear.
● Independence: The residuals (errors) are independent of each other.
● Homoscedasticity: The variance of residuals is constant across all levels of the
independent variable.
● Normality of Errors: The residuals should be approximately normally distributed.
Notes:
● Residual (error) for observation i: $e_i = Y_i - \hat{Y}_i$ (observed value − predicted value)
● Violation of these assumptions can lead to biased or misleading results.
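The residual definition above can be sketched in Python as a minimal illustration. The function name `residuals` and the sample values are hypothetical and are not taken from the lab data.

```python
def residuals(observed, predicted):
    """Return e_i = Y_i - Yhat_i (observed minus predicted) for each observation."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


# Hypothetical illustrative values (not from the lab data set)
observed = [10.0, 12.0, 15.0]
predicted = [11.0, 12.5, 14.0]
print(residuals(observed, predicted))  # [-1.0, -0.5, 1.0]
```

A negative residual means the model over-predicted that observation; a positive one means it under-predicted.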
Possible Regression Lines in Simple Linear Regression
Key Points
Regression Equation: $\hat{Y} = \beta_1 X + \beta_0$
● $R^2$ is the Coefficient of Determination, which represents the percentage of the variation in Y
that is explained by X.
● $R$ is the correlation coefficient, which measures the strength and direction of the
linear relationship between the independent variable X and the dependent variable
Y. It ranges from −1 to +1 and takes the sign of the slope: $R = +\sqrt{R^2}$ if $\beta_1 > 0$ and $R = -\sqrt{R^2}$ if $\beta_1 < 0$.
● $\beta_1$: the slope (the coefficient of X)
● $\beta_0$: the intercept (the predicted value of Y when X = 0)
● $Y_i$ is the observed value of Y
● $\hat{Y}_i$ is the predicted value of Y
● $X_i$ is the observed value of X
● p-value < 0.05 → statistically significant → there is evidence of a linear
relationship between X and Y.
Excel Steps for Regression Assumptions and Output
2) For Regression Assumptions:
a) Linearity:
1. Create a scatterplot of the observed Y values against the X values.
2. Add a trendline: Right-click a data point → Add Trendline → Choose "Linear".
3. The points should lie roughly along the straight trendline, with no curved pattern.
b) Independence:
1. After running regression, calculate residuals:
○ Add a column for Predicted Y using the regression equation.
○ Subtract Predicted Y from Actual Y.
○ Residual = Actual Y – Predicted Y
2. Create a scatterplot of residuals vs. observation order:
○ X-axis = Observation number (add a new column called "Observation Order", numbered 1 to n)
○ Y-axis = Residual
3. The residuals should show no trend, cycle, or other pattern across the observation order.
c) Homoscedasticity:
1. After running regression, calculate residuals:
○ Add a column for Predicted Y using the regression equation.
○ Subtract Predicted Y from Actual Y.
○ Residual = Actual Y – Predicted Y
2. Create a scatterplot of residuals vs. predicted Y values:
○ X-axis = Predicted Y
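○ Y-axis = Residual
3. The residuals should form a band of roughly constant width around zero; a funnel shape (spread widening or narrowing as Predicted Y increases) signals non-constant variance.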
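The residuals-vs-predicted check can be sketched in Python as follows. `residual_plot_points` returns the scatterplot points ordered by Predicted Y; `spread_by_half` is a rough heuristic added here (not part of the Excel steps) that compares the residual standard deviation in the lower and upper halves of the predicted values. Function names and values are hypothetical.

```python
import statistics


def residual_plot_points(predicted, residuals):
    """(Predicted Y, Residual) pairs sorted by Predicted Y, i.e. the points of the
    residuals-vs-predicted scatterplot read from left to right."""
    return sorted(zip(predicted, residuals))


def spread_by_half(predicted, residuals):
    """Rough heuristic (an addition, not part of the Excel steps): split the points
    at the middle of the sorted Predicted Y values and return the standard deviation
    of the residuals in the lower and upper halves.
    Needs at least 4 observations so each half has 2 or more residuals."""
    points = residual_plot_points(predicted, residuals)
    mid = len(points) // 2
    lower = [res for _, res in points[:mid]]
    upper = [res for _, res in points[mid:]]
    return statistics.stdev(lower), statistics.stdev(upper)


# Hypothetical residuals whose spread grows with Predicted Y (a funnel shape)
predicted = [10, 20, 30, 40, 50, 60]
residuals = [0.1, -0.1, 0.1, -1.0, 1.0, -1.0]
low_sd, high_sd = spread_by_half(predicted, residuals)
print(round(low_sd, 3), round(high_sd, 3))  # 0.115 1.155
```

Similar spreads in the two halves are consistent with homoscedasticity; here the upper half is about ten times wider, matching the funnel shape.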
d) Normality of Residuals:
Create a histogram of residuals:
1. After running regression, calculate residuals:
○ Add a column for Predicted Y using the regression equation.
○ Subtract Predicted Y from Actual Y.
○ Residual = Actual Y – Predicted Y
2. Click anywhere in the residual column (or select the full range, e.g., C2:C36).
3. Go to the top menu and click:
○ Insert tab → in the Charts group → click the Insert Statistic Chart icon.
4. Select "Histogram" from the options.
5. The histogram should be roughly bell-shaped and centered near zero.
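A text-only stand-in for the Excel histogram can be sketched in Python. The helper names and the residual values are hypothetical, and the simple equal-width binning is an assumption: it will not necessarily match the bins Excel chooses automatically.

```python
def histogram_counts(values, bins=5):
    """Count values in `bins` equal-width intervals from min to max.
    The maximum value is placed in the last interval."""
    lo, hi = min(values), max(values)
    if lo == hi:  # all residuals identical: put everything in the first bin
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[index] += 1
    return counts


def print_text_histogram(values, bins=5):
    """Print one row of '*' per interval, labelled with the interval's lower edge."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    for i, count in enumerate(histogram_counts(values, bins)):
        print(f"{lo + i * width:8.2f} | {'*' * count}")


# Hypothetical residuals (not from the lab data set)
residuals = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
print(histogram_counts(residuals))  # [1, 2, 3, 2, 1]
print_text_histogram(residuals)
```

A roughly symmetric, single-peaked pattern of counts, as in this example, is what the normality check looks for.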
Case Study 1:
Discount % vs. Weekly Profit
A retail business is analyzing how increasing discount percentages
affect its weekly profit.
Data for 35 observations:
Use the following data set to:
1) Comment on the regression output
2) Write the regression equation
3) Check the simple linear regression assumptions
4) Predict the weekly profit if the discount applied is 25%.
1) Regression output
Excel Output:
1) Regression output: Output Comments
• $R^2 = 0.9270 = 92.7\%$, which means that 92.7% of the variation in the weekly profit is explained by the discount %.
• $\beta_1 = -42$: the negative slope reflects a negative linear relationship. Each 1-percentage-point increase in the discount leads to a 42-dollar decrease in the predicted weekly profit.
• $\beta_0 = 2025$
• p-value < 0.05, which means that the discount % has a significant effect on the weekly profit.
• $R = -\sqrt{R^2}$ (since $\beta_1 < 0$) $= -\sqrt{0.9270} = -0.9628$: strong negative linear relationship.
2) Regression Equation
Using the Regression output
Regression Equation:
Weekly Profit = −42 × (Discount %) + 2025
3) Checking the Regression Assumptions
Excel Output:
4) Profit Prediction:
Predict the weekly profit if the discount applied is 25%.
Regression Equation:
Weekly Profit = −42 × (Discount %) + 2025
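Substituting a discount of 25 (percent) into the regression equation gives the predicted weekly profit in dollars. This assumes a 25% discount lies within the range of discounts in the data set; predictions outside that range are extrapolations.

$$\text{Weekly Profit} = -42 \times 25 + 2025 = -1050 + 2025 = 975$$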
Case Study 2:
Ads per Week vs. Units Sold
A marketing team wants to examine whether running more online
ads leads to more product sales.
Data for 35 observations:
Use the following data set to:
1) Comment on the regression output
2) Write the regression equation
3) Check the simple linear regression assumptions
4) Predict the units sold if the number of ads per week is 60.
1) Regression output
Excel Output:
1) Regression output: Output Comments
• $R^2 = 0.9270 = 92.7\%$, which means that 92.7% of the variation in the weekly units sold is explained by the number of ads per week.
• $\beta_1 = 22$: the positive slope reflects a positive linear relationship. Each additional ad per week leads to an increase of 22 units in the predicted weekly units sold.
• $\beta_0 = 300$
• p-value < 0.05, which means that the number of ads per week has a significant effect on the weekly number of units sold.
• $R = +\sqrt{R^2}$ (since $\beta_1 > 0$) $= +\sqrt{0.9270} = +0.9628$: strong positive linear relationship.
2) Regression Equation
Using the Regression output
Regression Equation:
Units Sold = 22 × (𝐴𝑑𝑠 𝑝𝑒𝑟 𝑤𝑒𝑒𝑘) + 300
3) Checking the Regression Assumptions
Excel Output:
4) Units Sold Prediction:
Predict the units sold if the number of ads per week is 60.
Regression Equation:
Units Sold = 22 × (𝐴𝑑𝑠 𝑝𝑒𝑟 𝑤𝑒𝑒𝑘) + 300
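Substituting 60 ads per week into the regression equation gives the predicted number of units sold. This assumes 60 ads per week lies within the range of the observed data; predictions outside that range are extrapolations.

$$\text{Units Sold} = 22 \times 60 + 300 = 1320 + 300 = 1620$$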
Case Study 3:
Training Hours vs. Productivity Score
An HR department is assessing the impact of employee training
hours on productivity scores.
Data for 35 observations:
Use the following data set to:
1) Comment on the regression output
2) Write the regression equation
3) Check the simple linear regression assumptions
4) Predict the productivity score for 40 training hours.
1) Regression output
Excel Output:
1) Regression output: Output Comments
• $R^2 = 0.8029 = 80.29\%$, which means that 80.29% of the variation in the productivity score is explained by the training hours.
• $\beta_1 = 5$: the positive slope reflects a positive linear relationship. Each additional training hour increases the predicted productivity score by 5 points.
• $\beta_0 = 66$
• p-value < 0.05, which means that the number of training hours has a significant effect on the productivity score.
• $R = +\sqrt{R^2}$ (since $\beta_1 > 0$) $= +\sqrt{0.8029} = +0.8960$: strong positive linear relationship.
2) Regression Equation
Using the Regression output
Regression Equation:
Productivity Score = 5 × (Training Hours) + 66
3) Checking the Regression Assumptions
Excel Output:
4) Productivity Score Prediction:
Predict the productivity score for 40 training hours.
Regression Equation:
Productivity Score = 5 × (Training Hours) + 66
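Substituting 40 training hours into the regression equation gives the predicted productivity score. This assumes 40 hours lies within the range of training hours in the data set; predictions outside that range are extrapolations.

$$\text{Productivity Score} = 5 \times 40 + 66 = 200 + 66 = 266$$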
End of Session