Regression Output in Excel, Interpretation and Relations Between Different Statistics
(This note is for better understanding of Regression output in Excel. This is meant to read
along with the text and class notes)
Data are given for operating income X and monthly stock close Y for Clorox, Inc., in order to study the dependence of Y on X.
Scatterplot of X and Y
Correlation of income (X) and stock price (Y)
r = 0.9434 = 94.34%
R-square is the proportion of the total variability in the data that is explained by the regression. Here 89% of the variability is explained by the regression of Y on X.
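For simple regression, R-square is exactly the square of the correlation coefficient (0.9434² ≈ 0.89). This can be checked numerically; a minimal Python sketch using hypothetical (X, Y) data, since the Clorox figures are not reproduced here:

```python
import numpy as np

# Hypothetical (X, Y) data -- the actual Clorox figures are not reproduced here.
X = np.array([200.0, 250.0, 300.0, 350.0, 400.0])
Y = np.array([15.0, 35.0, 50.0, 75.0, 95.0])

r = np.corrcoef(X, Y)[0, 1]        # sample correlation of X and Y
b1, b0 = np.polyfit(X, Y, 1)       # least-squares slope and intercept
Y_hat = b0 + b1 * X                # fitted values

SST = np.sum((Y - Y.mean()) ** 2)  # total sum of squares
SSE = np.sum((Y - Y_hat) ** 2)     # error sum of squares
R2 = 1 - SSE / SST                 # R-square from the regression

print(abs(R2 - r ** 2) < 1e-10)    # True: R-square equals r squared
```

The identity R² = r² holds only for simple (one-predictor) regression; with several predictors, R² instead equals the squared correlation between Y and Ŷ.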
Y^ = b0 + b1 X = -67.62 + 0.41 X
If income (X) changes by 1 unit, expected change in monthly stock close (Y) is 0.41 units.
If X = 0, the predicted value of Y is -67.62 (the intercept).
Standard error of estimate = Standard error = 9.6329 = √(MSE) = √(SSE / (n-2))
SSE = Error sum of squares. Total SS = SSR + SSE. Total no. of observations = 18 = n
SSE df = n – 2 = 18 – 2 = 16.
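The chain Se = √MSE = √(SSE/(n−2)) can be verified directly. A quick sketch; note the SSE value below is back-calculated from the reported Se = 9.6329 and n = 18, not read off the output:

```python
import math

n = 18               # number of observations (from the note)
SSE = 1484.68        # back-calculated from the reported Se; not read off the output
MSE = SSE / (n - 2)  # mean squared error, with df = n - 2 = 16
Se = math.sqrt(MSE)  # standard error of estimate
print(round(Se, 4))  # ~9.6329
```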
If Sb1 is not given but Se and Sxx are given, then Sb1 can easily be calculated as Sb1 = Se / √Sxx, where Sxx = Σ(Xi − X̄)².
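The standard formula for the slope's standard error is Sb1 = Se / √Sxx. A sketch; the Sxx value here is hypothetical (it is not in the output), chosen only so that the result is consistent with the slope statistics reported below:

```python
import math

Se = 9.6329                # standard error of estimate (from the output)
Sxx = 71500.0              # hypothetical sum of squared deviations of X; not in the output
Sb1 = Se / math.sqrt(Sxx)  # standard error of the slope
print(round(Sb1, 4))       # ~0.036
```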
Testing H0: β1 = 0 versus Ha: β1 ≠ 0 is equivalent to testing whether any regression of Y on X exists. The test statistic is
t = b1 / Sb1,
which follows a t-distribution with n − 2 = 16 degrees of freedom under H0; here t = 11.38.
At the 5% level of significance the two-sided critical value is t16(0.025) = 2.120, so reject H0 if |t| > 2.120. Since 11.38 > 2.120, H0 is rejected. This is an example of how the regression output may be used to test any hypothesis regarding β1. Hypotheses regarding β0 may be tested similarly.
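The slope test can be sketched in Python. Here Sb1 is taken as roughly 0.036 (an approximation consistent with the reported output), and 2.120 is the two-sided 5% critical value of the t-distribution with 16 degrees of freedom:

```python
b1 = 0.41        # estimated slope (from the output)
Sb1 = 0.036      # standard error of the slope (approximate, consistent with the output)
t = b1 / Sb1     # test statistic for H0: beta1 = 0
t_crit = 2.120   # two-sided 5% critical value of t with 16 df
print(round(t, 2), abs(t) > t_crit)  # large |t| => reject H0
```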
A confidence interval can also be used to test a null hypothesis: if the 95% confidence interval includes the null value, then the null hypothesis is not rejected at the 5% level of significance.
Example 1: The 95% confidence interval for β1 is (0.33, 0.48). It does not include the value 0. Hence the null hypothesis H0: β1 = 0 is rejected at the 5% level of significance.
Example 2: 95% confidence interval for β1 includes the value 0.35. Hence the null
hypothesis H0: β1 = 0.35 will not be rejected at 5% level of significance.
Example 3: The 95% confidence interval for β1 does not include the value 0.5, but the 99% confidence interval does. Hence H0: β1 = 0.5 is rejected at the 5% level of significance but not rejected at the 1% level of significance.
Another way to test the overall model is the F-test. For simple regression the F-test and the t-test for the regression slope are equivalent and always give the same conclusion, even though the F-test uses the F-distribution and the t-test uses the t-distribution.
The null hypothesis is H0: no regression of Y on X exists, versus Ha: a regression exists. For simple regression this translates to H0: β1 = 0 versus Ha: β1 ≠ 0. The F-statistic is
F = (SSR / dfR) / (SSE / dfE) = MSR / MSE = 129.49
In this case the rejection region is always in the right tail. Significance F gives the P-value of the F-statistic. Here the P-value is extremely small (a displayed 0.00 means the first significant digit appears at or beyond the third decimal place). A very small P-value indicates that H0 is rejected, i.e. the regression is significant.
Note that the t-statistic corresponding to H0: β1 = 0 may be squared to obtain the F-statistic described above (11.38² ≈ 129.49), and the two P-values are identical.
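The t² = F identity can be checked against the reported figures; the small gap is only due to rounding of the displayed t-statistic:

```python
t = 11.38                      # slope t-statistic (from the output)
F = 129.49                     # F-statistic (from the output)
print(round(t ** 2, 4))        # ~129.5044
print(abs(t ** 2 - F) < 0.05)  # True: agreement up to rounding of t
```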
Residuals are the difference between the observed Y and predicted Y, for a given value of X.
Example: Observation 6, X = 300, Y = 50.
Y − Y^ = 50 − 54.56 = -4.56. (If you do the calculation by hand you may find a small discrepancy; that is due to rounding off of the coefficients.)
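Recomputing the observation-6 residual with the rounded coefficients illustrates exactly the rounding discrepancy the note warns about: Excel's unrounded fit gives Y^ = 54.56, while the rounded coefficients give a slightly different value:

```python
b0, b1 = -67.62, 0.41   # rounded coefficients from the output
X, Y = 300.0, 50.0      # observation 6 (from the note)
Y_hat = b0 + b1 * X     # with rounded coefficients (Excel's unrounded fit gives 54.56)
residual = Y - Y_hat
print(round(Y_hat, 2), round(residual, 2))  # differs slightly from -4.56 due to rounding
```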
The main purpose of the residual plots is to check whether the assumed model describes the data adequately. In this case the residuals show a parabolic pattern, suggesting a quadratic relationship between Y and X.
The residuals are expected to lie within ±3 Se. Here that is ±3(9.63) = ±28.89, i.e. the interval (-28.89, 28.89). All residuals are well within these limits; hence no outliers are detected.
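The ±3·Se outlier screen can be sketched as follows; the residual values below are hypothetical stand-ins, since the actual residual column is not reproduced here:

```python
Se = 9.63                             # standard error of estimate (rounded, from the output)
limit = 3 * Se                        # residuals expected within +/- 3*Se = +/- 28.89
residuals = [-4.56, 12.1, -8.3, 3.7]  # hypothetical stand-ins for the residual column
outliers = [e for e in residuals if abs(e) > limit]
print(round(limit, 2), outliers)      # 28.89 []  -> no outliers detected
```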
In this plot the bars represent the actual Y as well as the predicted Y side by side for ease of
comparison.