Supply Chain Analytics
BUSINESS ANALYTICS
SESSION 3
PREDICTIVE ANALYTICS & ITS APPLICATIONS
18/01/2023
[Figure: First Year Sales & Advertising Data. Two scatter plots of First Year Sales ($ million) against First Year Advertising Expenditures ($ million); the second marks Average First Year Sales = $101.5M.]
Y = β0 + β1 x + ε
Y: Dependent Variable; x: Predictor Variable (IDV)
We must estimate the unknown parameters β0 and β1.
We will call these estimates b0 and b1, respectively.
[Figure: plot against First Year Advertising Expenditures ($ million), vertical axis in $ million, with b0 marked.]
How can we find the best line?
Use Excel to solve it!
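The slides solve this in Excel. As a rough equivalent, the least-squares estimates b0 and b1 can be sketched in Python. The function name fit_line and the four data points are illustrative assumptions (the points are made up to lie exactly on a line and are not the course data):

```python
def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustrative data, constructed so that y = 40 + 60 * x exactly.
ad_spend = [0.5, 1.0, 1.5, 2.0]
sales = [70.0, 100.0, 130.0, 160.0]

b0, b1 = fit_line(ad_spend, sales)
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}")
```

On real data the points will not fall exactly on a line, and the same formulas then give the line that minimizes the sum of squared errors.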
Sales = 42.2 + 59.7 * advertising expenditures
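To make the fitted equation concrete, here is a minimal sketch that evaluates it for a given advertising budget. The function name predict_sales is an assumption; the coefficients 42.2 and 59.7 come from the equation above, with both quantities in $ million:

```python
B0 = 42.2  # estimated intercept b0 ($ million of sales with zero advertising)
B1 = 59.7  # estimated slope b1 ($ million of sales per $ million of advertising)


def predict_sales(ad_spend_musd):
    """Predicted first year sales ($ million) for a given advertising spend ($ million)."""
    return B0 + B1 * ad_spend_musd


print(round(predict_sales(1.0), 1))  # 101.9
```

Read this way, each additional $1 million of advertising is associated with about $59.7 million more in predicted first year sales.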
[Figure: two plots against x Variable. Left: Y and Predicted Y, with the trend line Linear (Predicted Y). Right: the errors ei, annotated "= 10185.6".]
Error ??
Sales (Y) = 42.2 + 59.7 * advertising expenditures (x)
[Figure: First Year Sales ($ million) against First Year Advertising Expenditures ($ million), showing the errors ei around the average first year sales = $101.5M. SSE naive = 20405.]
R² is a measure of the overall quality of the regression. It is the proportion of the variance in the dependent variable that is predicted from the independent variable.
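As a worked illustration of this definition, R² can be computed as one minus the ratio of the regression's sum of squared errors to the naive (predict-the-average) sum of squared errors. This sketch assumes that the value 10185.6 shown next to the errors ei is the sum of squared errors of the fitted line; the slides do not label it explicitly, so treat the result as indicative only:

```python
# Assumption: 10185.6 is the sum of squared errors of the fitted regression line.
sse_regression = 10185.6
# From the slides: sum of squared errors when every prediction is the average ($101.5M).
sse_naive = 20405.0

# Share of the naive error that the regression removes.
r_squared = 1 - sse_regression / sse_naive
print(f"{r_squared:.3f}")  # 0.501
```

Under that assumption, the one-variable model accounts for roughly half of the variation in first year sales.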
R² has increased from 0.83 to 0.86
Model misspecification
  May be due to left-out (omitted) variables
  May be due to irrelevant variables
  Functional misspecification: what if the underlying relationship between x and Y is not linear? [The Ramsey Regression Specification Error Test (RESET)]
Extrapolation
  Extending the model beyond the domain of the available data
Variable selection
  Exclude irrelevant variables to avoid overfitting (their coefficients will have confidence intervals containing 0)
  Exclude highly correlated variables (these may also result in confidence intervals containing 0)
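One simple way to screen for highly correlated predictors before fitting is to compute their pairwise correlation. The sketch below does this for two hypothetical predictors. The variable names, the made-up values, and the 0.9 cutoff are all assumptions for illustration, not part of the course material:

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical predictors: promotions is roughly twice advertising.
advertising = [0.5, 1.0, 1.5, 2.0, 1.2]
promotions = [1.1, 2.0, 3.1, 4.0, 2.3]

THRESHOLD = 0.9  # hypothetical cutoff; choose one appropriate to the problem

r = pearson_r(advertising, promotions)
if abs(r) > THRESHOLD:
    print(f"r = {r:.2f}: predictors are highly correlated; consider dropping one")
else:
    print(f"r = {r:.2f}: no strong linear overlap between the predictors")
```

When two predictors move together this closely, the regression cannot separate their effects well, which is why their confidence intervals may widen to include 0.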