Regression Output in Excel, Interpretation and Relations Between Different Statistics

This document discusses interpreting regression output in Excel by analyzing stock price (Y) data in relation to operating income (X) for Clorox, Inc. It finds a strong positive correlation (r=0.9434) between X and Y, with the regression equation being Y^ = -67.62 + 0.41X. This model explains 89% of the variability in Y. Hypothesis tests and confidence intervals are used to evaluate the significance of the regression relationship between X and Y. Residual plots are examined to check that the linear model is appropriate.


REGRESSION OUTPUT IN EXCEL, INTERPRETATION AND

RELATIONS BETWEEN DIFFERENT STATISTICS

(This note is for better understanding of Regression output in Excel. This is meant to read
along with the text and class notes)

Data are given for operating income (X) and monthly stock close (Y) for Clorox, Inc., in order to study the dependence of Y on X.

Income(X)   StockPrice(Y)   Income(X)   StockPrice(Y)
240         45              340          59
250         42              350          67
260         44              360          75
270         46              370          74
280         47              400          85
300         50              410          95
310         48              420         110
320         60              430         125
330         61              450         130

Scatterplot of X and Y
Correlation of income (X) and stock price (Y)

r = 0.9434 = 94.34%

Multiple R = correlation coefficient (r) for simple regression

R-square = square of correlation coefficient = r² = (0.9434)² = 0.8900

R-square = Regression SS / Total SS = 12015.8 / 13500.5 = 0.8900

R-square is the proportion of the total variability in the data that is explained by the regression. Here 89% of the variability in Y is explained by the regression of Y on X.
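As a sketch (assuming only the data table above, not the Excel Analysis ToolPak output), Multiple R and R-square can be reproduced with the textbook sums-of-squares formulas:

```python
import math

x = [240, 250, 260, 270, 280, 300, 310, 320, 330,
     340, 350, 360, 370, 400, 410, 420, 430, 450]   # operating income (X)
y = [45, 42, 44, 46, 47, 50, 48, 60, 61,
     59, 67, 75, 74, 85, 95, 110, 125, 130]         # monthly stock close (Y)
n = len(x)

sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n                   # Sxx
syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n                   # Syy = Total SS
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # Sxy

r = sxy / math.sqrt(sxx * syy)   # Multiple R, the correlation coefficient
r_square = r ** 2                # R-square = Regression SS / Total SS

print(round(r, 4), round(r_square, 2))   # 0.9434 0.89
```

Note that Syy comes out as 13500.5, exactly the Total SS quoted later in the note.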

The linear regression equation is

Y^ = b0 + b1 X = -67.62 + 0.41 X

If income (X) changes by 1 unit, the expected change in monthly stock close (Y) is 0.41 units.

If X = 0, the predicted value of Y is -67.62.
Standard error of estimate = Standard error = 9.6329 = √(MSE) = √(SSE / (n-2))

SSE = Error sum of squares. Total SS = SSR + SSE. Total no. of observations = 18 = n

Total SS degrees of freedom (df) = n – 1 = 17. For simple regression, SSR df = 1

SSE df = n – 2 = 18 – 2 = 16.

Standard error (Sb0) = 12.32. Standard error (Sb1) = 0.04 = Se / √Sxx

If Sb1 is not given but Se and Sxx are, then Sb1 can easily be calculated.
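As a sketch of that calculation (not the original Excel output), the standard errors can be reproduced from the sums of squares quoted above; Sxx = 72450 is computed from the data table:

```python
import math

# Quantities quoted in the note's ANOVA table
total_ss = 13500.5        # Total SS
ssr = 12015.8             # Regression SS
n = 18                    # number of observations
sxx = 72450.0             # corrected sum of squares of X, computed from the data

sse = total_ss - ssr      # Error sum of squares: Total SS = SSR + SSE
mse = sse / (n - 2)       # SSE / df, with df = n - 2 = 16
se = math.sqrt(mse)       # standard error of estimate
sb1 = se / math.sqrt(sxx) # standard error of the slope, Se / sqrt(Sxx)

print(round(se, 3), round(sb1, 4))   # 9.633 0.0358
```

This shows why the output's Sb1 = 0.04 is a rounded display of roughly 0.0358.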

One of the main hypotheses to test is whether there is a regression of Y on X. Remember that the regression equation we determine comes from a sample of size n, but we need to make an inference about the population.

Testing H0: β1 = 0 versus Ha: β1 ≠ 0 is equivalent to testing whether regression exists or not.
Test statistic is

t = b1 / Sb1 = 0.41 / 0.04 = 11.38


The P-value of the (two-sided) test is 2 P(t16 > 11.38) ≈ 0.00. Suppose we test H0: β1 = 0.5 versus Ha: β1 < 0.5. The test statistic is

t = (b1 – 0.5)/Sb1 = (0.41 – 0.5) / 0.04 = -2.25

At the 5% level of significance, t16(0.05) = 1.746; reject H0 if the observed t < -1.746. In this case H0 is rejected. This is an example of how the regression output may be used to test any hypothesis regarding β1. Similarly, hypotheses regarding β0 may also be tested.

The confidence intervals can also be used to test a null hypothesis. If the 95% confidence
intervals include the null value, then the null hypothesis may not be rejected.

Example 1: The 95% confidence interval for β1 is (0.33, 0.48). It does not include the value 0. Hence the null hypothesis H0: β1 = 0 is rejected at the 5% level of significance.

Example 2: The 95% confidence interval for β1 includes the value 0.35. Hence the null hypothesis H0: β1 = 0.35 is not rejected at the 5% level of significance.

Example 3: The 95% confidence interval for β1 does not include the value 0.5 but the 99% confidence interval does. Hence H0: β1 = 0.5 is rejected at the 5% level of significance but is not rejected at the 1% level of significance.

Another way to test the overall model is the F-test. For simple regression the F-test and the t-test for the regression slope are equivalent, even though the F-test uses the F distribution and the t-test uses the t distribution.

The null hypothesis is H0: no regression of Y on X exists, versus Ha: a regression exists. For simple regression this translates to H0: β1 = 0 versus Ha: β1 ≠ 0. The F-statistic is

F = (SSR / dfR) / (SSE / dfE) = MSR / MSE = 129.49

In this case the rejection region is always on the right side. Significance F gives the P-value of the F statistic. Here the P-value is very small (a reported 0.00 means the first nonzero digit appears at the third decimal place or later). A very small P-value indicates that H0 is rejected, i.e. the regression is significant.

Note that the t-statistic corresponding to H0: β1 = 0 may be squared to get the F-statistic described above (11.38² ≈ 129.49), and the two P-values are identical.
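The F = t² relation can be checked numerically from the sums of squares quoted above (a sketch, not the original Excel output):

```python
import math

ssr, sse, n = 12015.8, 1484.7, 18   # Regression SS, Error SS, sample size
msr = ssr / 1                       # regression df = 1 for simple regression
mse = sse / (n - 2)                 # error df = n - 2 = 16

f = msr / mse                       # F statistic, MSR / MSE
t = math.sqrt(f)                    # equals the t statistic for H0: beta1 = 0

print(round(f, 2), round(t, 2))     # 129.49 11.38
```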

Residuals are the difference between the observed Y and predicted Y, for a given value of X.
Example: Observation 6, X = 300, Y = 50.

Predicted Y = Y^ = -67.62 + 0.41 X = 54.56.

Y – Y^ = 50 – 54.56 = -4.56. (If you do the calculation by hand with the rounded coefficients, you may find a small discrepancy. That is due to rounding off.)
The main purpose of residual plots is to check whether the assumed model is adequate for the data. In this case the residuals show a parabolic pattern, suggesting a quadratic relationship.

The residuals are expected to lie within ±3 Se, here ±3(9.63) = (-28.89, 28.89). All the residuals are well within these limits; hence no outliers are detected.
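The outlier screen can be sketched as flagging any observation whose residual falls outside ±3 Se (using the full data table and the unrounded fitted line):

```python
x = [240, 250, 260, 270, 280, 300, 310, 320, 330,
     340, 350, 360, 370, 400, 410, 420, 430, 450]
y = [45, 42, 44, 46, 47, 50, 48, 60, 61,
     59, 67, 75, 74, 85, 95, 110, 125, 130]

b1 = 29505 / 72450                  # Sxy / Sxx, the unrounded slope
b0 = 1263 / 18 - b1 * (6090 / 18)   # y-bar - b1 * x-bar
se = 9.633                          # standard error of estimate

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
outliers = [(xi, round(ri, 2)) for xi, ri in zip(x, residuals)
            if abs(ri) > 3 * se]    # outside (-28.9, 28.9)

print(outliers)                     # [] -> no outliers detected
```

The largest residual in magnitude is roughly 17.5, comfortably inside the ±28.9 band.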

[Bar chart: the actual Y and the predicted Y are shown side by side for ease of comparison.]
