Linear Regression Model
Ŷ = b0 + b1 X1 + b2 X2 + b3 X3 + … + bn Xn
Slopes: b1, b2, …, bn
BA360: Data Mining, Montassar Ben Messaoud, PhD
R-squared = Explained variation / Total variation
R-squared is always between 0% and 100%.
R-squared is a statistical measure that represents the proportion of the variance of the dependent variable that is explained by the independent variable(s).
0% indicates that the model explains none of the variability of the response data around its mean.
Total variation = Sum (Y − Ȳ)²
R-Squared
SS Total = Sum (Y − Ȳ)²
SS Residuals = Sum (Y − Ŷ)²
R² = 1 − SSres / SStot

Adjusted R-Squared
Adjusted R² = 1 − [(1 − R²)(N − 1)] / (N − P − 1)
where
R² = sample R-squared
P = number of predictors
N = total sample size
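The two formulas above can be checked by hand on a small case. The sketch below is one possible way to compute them in plain Python; the function names and the four data points are made up for illustration and are not from the slides.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)               # Sum (Y - Y_bar)^2
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # Sum (Y - Y_hat)^2
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - P - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up observations and predictions, for illustration only.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(y, y_hat)                          # SS_tot = 20, SS_res = 1
adj = adjusted_r_squared(r2, n=len(y), p=1)       # one predictor assumed
print(r2, adj)
```

Here R² is about 0.95 and adjusted R² about 0.925: the adjustment lowers the score to account for the number of predictors relative to the sample size.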
Your sales volume = Y
Ŷ = b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4
Y = b0 + b1 X + residual
Ŷ = b0 + b1 X
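A minimal sketch of how b0 and b1 might be estimated, assuming ordinary least squares (the slides do not name the estimation method). The data and the function name are hypothetical.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for Y_hat = b0 + b1 X."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept
    return b0, b1


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_simple_regression(x, y)
y_hat = [b0 + b1 * xi for xi in x]                   # Y_hat = b0 + b1 X
residuals = [yi - fi for yi, fi in zip(y, y_hat)]    # residual = Y - Y_hat
print(b0, b1)
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which is a quick sanity check on the fit.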
Multiple Linear Regression
Ŷ = b0 + b1 X1 + b2 X2 + … + bn Xn
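As an illustration of the multiple regression equation, the sketch below fits b0, b1 and b2 with NumPy's least-squares solver. The data are made up, and Y is built from known coefficients so the result is easy to verify; least squares is an assumption here, not something the slides state.

```python
import numpy as np

# Made-up data: 6 observations, 2 predictors (X1, X2).
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])
# Y is constructed exactly as 1 + 2*X1 + 0.5*X2 (no noise).
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1]

# Prepend a column of ones so that b0 is estimated as the intercept.
X_design = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)

b0, b1, b2 = coeffs
y_hat = X_design @ coeffs   # Y_hat = b0 + b1 X1 + b2 X2
print(b0, b1, b2)
```

Because the data contain no noise, the fit should recover the coefficients 1, 2 and 0.5 up to floating-point error.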
Dummy variables
1. R&D: Numeric
5. Profit: Numeric
Step 1 Step 2
No longer use this variable in the regression model.
Step 3 (Very Important)
Multiple Regression Model
Ŷ = b0 + b1 X1 + b2 X2 + b3 X3
Ŷ = b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4 + b5 X5 + b6 X6
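A rough sketch of dummy-variable encoding in plain Python. The column name "region" and its labels are hypothetical. Dropping one dummy column (so that category becomes the baseline) is a common convention assumed here, not something spelled out in the slides.

```python
def make_dummies(values, drop_first=True):
    """Turn a list of category labels into 0/1 dummy columns."""
    categories = sorted(set(values))
    if drop_first:
        # Assumed convention: the dropped category becomes the baseline,
        # so the dummies are not redundant with the intercept b0.
        categories = categories[1:]
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical categorical variable; labels are made up for illustration.
region = ["South", "North", "North", "South", "West"]

dummies = make_dummies(region)
print(dummies)
# The original text column is no longer used in the regression model;
# the 0/1 columns in `dummies` take its place as extra X variables.
```

With three categories and one dropped, two dummy columns enter the model, which is how a short equation can grow extra b X terms once a categorical variable is encoded.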
Stepwise Regression
The stepwise regression approach is divided into two main methods:
1- Forward selection
2- Backward elimination

Forward Selection
1- Start with no variables in the model.
2- Test the addition of each variable: find out if it is significant by checking its p-value, and if it improves the model by checking R-Squared.
3- If yes, keep the variable in the model and try the next variable; if not, remove it and try the next variable.
4- Keep repeating this process until no remaining variable improves the model.
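The forward selection steps can be sketched as follows. This is a simplified variant, not the exact procedure from the slides: it uses only the "does it improve the model" check, measured by adjusted R-squared, and skips the p-value test. All names and the simulated data are hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Fit Y_hat = b0 + sum(b_j X_j) by least squares; return adjusted R^2."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    ss_res = np.sum((y - design @ coeffs) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def forward_selection(X, y):
    """Return column indices in the order they were added."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = 0.0  # adjusted R^2 of the model with no variables
    while remaining:
        # Try adding each remaining variable to the current model.
        scores = {j: adjusted_r2(X[:, selected + [j]], y) for j in remaining}
        best_j = max(scores, key=scores.get)
        if scores[best_j] <= best_score:
            break  # no remaining variable improves the model
        selected.append(best_j)
        remaining.remove(best_j)
        best_score = scores[best_j]
    return selected


# Simulated data: only columns 0 and 2 actually drive Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected = forward_selection(X, y)
print(selected)
```

The two real predictors should be picked first. A pure-noise column can still slip in by chance, which is one reason the slides also check the p-value.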
Example
Find out if a variable is significant or not by checking the p-value of the variable (p <= 0.05).
R-Squared (0 <= R² <= 1)
Adjusted R-Squared
Ŷ = b0 + b1 X1 (X1: significant)
Ŷ = b0 + b1 X1 + b2 X2 (X2: insignificant)
Example
Find out if a variable is significant or not by checking the p-value of the variable (p <= 0.05).
R-Squared (0 <= R² <= 1)
Adjusted R-Squared
Ŷ = b0 + b1 X1 + b3 X3 (X3: significant)

Backward Elimination
1- Start with ALL candidate variables in the model.
2- Remove the insignificant variable that has the highest p-value from the model and run the regression again.
3- Keep repeating this process until all insignificant variables have been removed from the model, keeping only the ones that improve the model.
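The backward elimination steps can be sketched as follows. One assumption to flag: the p-values here come from a normal approximation to the t distribution, which is reasonable only for large samples; a statistics package would normally give exact p-values. Function names and the simulated data are hypothetical.

```python
import math

import numpy as np


def approx_p_values(X, y):
    """Approximate two-sided p-value for each slope (intercept excluded)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    sigma2 = residuals @ residuals / (n - p - 1)      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # coefficient covariance
    t_stats = coeffs / np.sqrt(np.diag(cov))
    # Normal approximation: p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    return [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats[1:]]


def backward_elimination(X, y, alpha=0.05):
    """Return the column indices that survive elimination."""
    kept = list(range(X.shape[1]))   # start with ALL candidate variables
    while kept:
        p_values = approx_p_values(X[:, kept], y)
        worst = max(range(len(kept)), key=lambda i: p_values[i])
        if p_values[worst] <= alpha:
            break                    # every remaining variable is significant
        del kept[worst]              # drop the highest p-value, then refit
    return kept


# Simulated data: only columns 0 and 2 actually drive Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

kept = backward_elimination(X, y)
print(kept)
```

Only one variable is removed per pass, and the regression is rerun each time, because the p-values of the remaining variables change once a variable leaves the model.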
Example
• Find out if a variable is significant or not by checking the p-value of the variable (p <= 0.05).
• Eliminate the insignificant variable that has the greatest p-value.
• R-Squared (0 <= R-Sq <= 1)
• Adjusted R-Squared
Ŷ = b0 + b1 X1 + b2 X2 + b3 X3 (two insignificant variables; the one with the greatest p-value, X2, is eliminated)
Ŷ = b0 + b1 X1 + b3 X3 (remaining variables significant)
1/ Linearity
Example of non-linearity
2/ Multivariate Normality
3/ No or little Multicollinearity
Multicollinearity occurs when the independent variables are too highly correlated with each other.
Multicollinearity may be tested using the Correlation Matrix. When considering the correlations among all independent variables, the coefficients should be well below 1 in absolute value; a coefficient close to 1 or −1 signals multicollinearity.
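A small sketch of the correlation-matrix check using NumPy. The predictors are made up, and the 0.8 cut-off is an assumed rule of thumb, not a value given in the slides.

```python
import numpy as np

# Made-up predictors: X2 is almost a multiple of X1, X3 is largely unrelated.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = 2.0 * X1 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
X3 = np.array([4.0, 1.0, 5.0, 2.0, 6.0, 3.0])

# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(np.column_stack([X1, X2, X3]), rowvar=False)

threshold = 0.8  # assumed cut-off for "too highly correlated"
flagged = [
    (i, j)
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[1])
    if abs(corr[i, j]) > threshold
]
print(corr.round(3))
print(flagged)
```

Only the (X1, X2) pair is flagged here. In practice one of the two would typically be dropped or the pair combined, though the slides do not prescribe a remedy.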
Autocorrelation occurs when the residuals are not independent of each other.
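One informal way to look for autocorrelation is to correlate each residual with the one before it (lag 1). The sketch below is an illustration under that assumption; the slides do not name a specific test, and the residual series are made up.

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the previous one."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum(
        (residuals[t] - mean) * (residuals[t - 1] - mean) for t in range(1, n)
    )
    den = sum((r - mean) ** 2 for r in residuals)
    return num / den


# Made-up residual series, listed in observation order.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # each residual flips sign
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]      # residuals drift upward

rho_alt = lag1_autocorrelation(alternating)
rho_trend = lag1_autocorrelation(trending)
print(rho_alt, rho_trend)
```

A value near 0 is consistent with independent residuals; values far from 0 in either direction (negative for the alternating series, positive for the trending one) suggest the residuals depend on each other.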
Homoscedasticity: the residuals have equal variance across the regression line.
The following scatter plots show examples of data that are not
homoscedastic (i.e., heteroscedastic):
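Besides reading a scatter plot, a rough numeric check is to compare the spread of the residuals for low and high fitted values. The sketch below is an informal heuristic with made-up numbers, not a formal test and not a method taken from the slides.

```python
def variance_ratio(fitted, residuals):
    """Residual spread in the upper half of fitted values vs the lower half."""

    def mean_square(rs):
        # Residuals are assumed to be centred on zero.
        return sum(r * r for r in rs) / len(rs)

    pairs = sorted(zip(fitted, residuals))   # order by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return mean_square(high) / mean_square(low)


# Made-up fitted values and two residual patterns.
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
even_resid = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]   # constant spread
fan_resid = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]    # spread grows

ratio_even = variance_ratio(fitted, even_resid)
ratio_fan = variance_ratio(fitted, fan_resid)
print(ratio_even, ratio_fan)
```

A ratio close to 1 is consistent with homoscedastic residuals; a ratio far from 1, as in the fan-shaped case, points to heteroscedasticity.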
Polynomial Regression