Inference in the Regression Model
Introduction
Having used regression as a descriptive tool and, separately, learned about statistical
inference, we now put them together to consider the reliability of the regression model. The
parameter estimates ($a$ and $b$) we obtained are point estimates of the true parameter values ($\alpha$ and $\beta$). It is not
difficult to show that these are in fact unbiased estimates. We now want to obtain interval
estimates, which involves calculating their standard errors. We will also consider hypothesis
testing within regression.
The model is
$$Y_i = \alpha + \beta X_i + u_i.$$
To make any progress we need to assume a distribution for the error term. Assume therefore
$$u_i \sim \text{NID}(0, \sigma^2).$$
This means the errors have independent, identical Normal pdfs with mean 0 and constant variance $\sigma^2$.
This means
$$E(u_i) = 0, \qquad V(u_i) = \sigma^2, \qquad \operatorname{Cov}(u_i, u_j) = 0 \ \text{for} \ i \neq j.$$
Note that a and b are random variables (because they are calculated from a random sample
and their values depend upon the particular realisation of the random error), so for inference
we need their distributions. Since
$$Y_i = \alpha + \beta X_i + u_i \qquad \text{and} \qquad u_i \sim N(0, \sigma^2),$$
then
$$Y_i \sim N(\alpha + \beta X_i, \sigma^2).$$
Each $Y_i$ is Normally distributed and independent (since the $u_i$ are independent). But $b$ is a linear function of the $Y$ values, so $b$ must be Normally distributed also. Now we just need to find the parameters of this Normal distribution; it turns out that
$$b \sim N\!\left(\beta, \ \frac{\sigma^2}{\sum (X_i - \bar X)^2}\right).$$
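This can be illustrated by simulation. The following Python sketch is not from the notes: the "true" values $\alpha = 40$, $\beta = -2.7$, $\sigma^2 = 17$ and the $X$ values are illustrative assumptions (loosely based on the estimates quoted below). It repeatedly redraws the errors and re-estimates the slope:

```python
import random
import math

# Illustrative "true" parameters and X values (assumptions, not from the notes)
alpha, beta, sigma2 = 40.0, -2.7, 17.0
X = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]   # hypothetical X values, n = 12
n = len(X)
xbar = sum(X) / n
Sxx = sum((x - xbar) ** 2 for x in X)      # sum of squared deviations of X

slopes = []
for _ in range(10_000):
    # Draw fresh errors u_i ~ N(0, sigma^2) and construct the Y values
    Y = [alpha + beta * x + random.gauss(0, math.sqrt(sigma2)) for x in X]
    ybar = sum(Y) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / Sxx
    slopes.append(b)

mean_b = sum(slopes) / len(slopes)
var_b = sum((s - mean_b) ** 2 for s in slopes) / (len(slopes) - 1)
print(f"simulated mean of b: {mean_b:.3f} (theory: {beta})")
print(f"simulated V(b): {var_b:.3f} (theory: {sigma2 / Sxx:.3f})")
```

The simulated mean and variance of $b$ match the theoretical values closely, as the result above asserts.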
$\sigma^2$ is unknown, so we estimate it from the residuals:
$$s^2 = \frac{RSS}{n-2} = \frac{\sum e_i^2}{n-2}$$
(this provides an unbiased estimate of $\sigma^2$). Hence
$$s^2 = \frac{170.754}{10} = 17.0754$$
(values taken from the previous regression exercise). Hence $V(b) = s^2 / \sum (X_i - \bar X)^2 = 17.0754 / 49.37 = 0.345$. The standard error of $b$ is therefore $\sqrt{0.345} = 0.587$.
For $a$,
$$V(a) = \sigma^2 \left( \frac{1}{n} + \frac{\bar X^2}{\sum (X_i - \bar X)^2} \right) = \dots = 5.304,$$
so the standard error of $a$ is $\sqrt{5.304} = 2.303$.
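These calculations can be reproduced from summary statistics alone. A minimal Python sketch, assuming (as the covariance calculation below implies) that $\sum X = 40.2$ and $\sum X^2 = 184.04$, so that $\sum (X_i - \bar X)^2 = 49.37$:

```python
import math

# Summary statistics quoted in the notes
RSS = 170.754            # residual sum of squares
n = 12                   # sample size
sum_x = 40.2             # sum of the X values
sum_x2 = 184.04          # sum of the squared X values

xbar = sum_x / n
Sxx = sum_x2 - sum_x ** 2 / n             # sum of squared deviations of X, = 49.37

s2 = RSS / (n - 2)                        # unbiased estimate of sigma^2, = 17.0754
var_b = s2 / Sxx                          # variance of the slope estimate
var_a = s2 * (1 / n + xbar ** 2 / Sxx)    # variance of the intercept estimate

print(f"V(b) = {var_b:.3f}, s.e.(b) = {math.sqrt(var_b):.3f}")  # 0.346, 0.588 (0.345, 0.587 in the text, rounding)
print(f"V(a) = {var_a:.3f}, s.e.(a) = {math.sqrt(var_a):.3f}")  # 5.304, 2.303
```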
Summarising the results so far:
$$Y_i = 40.711 - 2.70 X_i + e_i$$
$$\text{s.e.} \qquad (2.30) \qquad (0.59)$$
$$R^2 = 0.678, \qquad n = 12.$$
Interval estimates
We now have the information needed to calculate interval estimates. We actually use the $t$
distribution rather than the $z$, because we have had to estimate $\sigma^2$ by $s^2$. This applies to all
sample sizes, unlike estimation of a mean.
For $b$ the 95% c.i. is given by
$$b \pm t^* \times \text{s.e.}(b) = -2.7 \pm 2.228 \times 0.59 = [-4.01, -1.39],$$
where $t^* = 2.228$ is the two-tailed 5% critical value of the $t$ distribution with $n - 2 = 10$ degrees of freedom.
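The same interval in Python (a sketch using only the quoted estimate and standard error; scipy supplies the critical value):

```python
from scipy import stats

b, se_b = -2.7, 0.59
df = 12 - 2                          # n - 2 degrees of freedom
t_star = stats.t.ppf(0.975, df)      # two-tailed 5% critical value, about 2.228

lower, upper = b - t_star * se_b, b + t_star * se_b
print(f"95% CI for the slope: [{lower:.2f}, {upper:.2f}]")   # [-4.01, -1.39]
```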
Hypothesis testing
We can now test hypotheses about $\alpha$ and $\beta$. An obvious test is to see if the slope coefficient
is zero (in which case $X$ would have no influence upon $Y$):
$$H_0: \beta = 0$$
$$H_1: \beta \neq 0.$$
The test statistic is
$$t = \frac{b - \beta}{\sqrt{V(b)}},$$
which has a $t$ distribution with $n - 2$ degrees of freedom. Hence we obtain
$$t = \frac{-2.7 - 0}{0.59} = -4.57, \qquad |t| = 4.57 > t^* = 2.228,$$
and so we reject the null.
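The same test in Python (a sketch; the two-tailed p-value leads to the same conclusion as the critical-value comparison):

```python
from scipy import stats

b, se_b, df = -2.7, 0.59, 10
t_stat = (b - 0) / se_b                        # test statistic for H0: beta = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)      # two-tailed p-value
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # t = -4.58 (4.57 in the text), p well below 0.05
```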
Note the equivalence between estimation and hypothesis testing: the test rejected the value of
0, and the confidence interval did not contain that value. (Note that this equivalence only
holds if you perform a two-tailed test.)
More complex tests
Any hypothesis that can be expressed as a function of the parameters can be tested in the same way, using
$$t = \frac{f(a,b) - f(\alpha, \beta)}{\text{est. s.e.}[f(a,b)]} \sim t_{n-2}.$$
Example:
$$H_0: \alpha + \beta = 40$$
$$H_1: \alpha + \beta \neq 40.$$
$$t = \frac{(a+b) - (\alpha+\beta)}{\text{est. s.e.}(a+b)} \sim t_{10}$$
To obtain the standard error we need $V(a+b) = V(a) + V(b) + 2\,\mathrm{Cov}(a,b)$, where
$$\mathrm{Cov}(a,b) = \frac{-\sigma^2 \sum X}{n \sum X^2 - \left(\sum X\right)^2} = \frac{-17.0754 \times 40.2}{12 \times 184.04 - 40.2^2} = -1.159.$$
Hence $V(a+b) = 5.304 + 0.345 - 2 \times 1.159 = 3.331$, and the standard error is $\sqrt{3.331} = 1.825$. The test statistic is
$$t = \frac{(40.711 - 2.7) - 40}{1.825} = -1.09, \qquad |t| = 1.09 < t^* = 2.228,$$
hence we cannot reject the null.
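A Python sketch of this test, using only the summary statistics quoted above:

```python
import math
from scipy import stats

# Estimates and summary statistics quoted in the notes
a, b = 40.711, -2.7
s2, n = 17.0754, 12
sum_x, sum_x2 = 40.2, 184.04
var_a, var_b = 5.304, 0.345

cov_ab = -s2 * sum_x / (n * sum_x2 - sum_x ** 2)   # Cov(a, b) = -1.159
var_ab = var_a + var_b + 2 * cov_ab                # V(a + b)
t_stat = ((a + b) - 40) / math.sqrt(var_ab)        # test H0: alpha + beta = 40

t_star = stats.t.ppf(0.975, n - 2)
print(f"t = {t_stat:.2f}, |t| > t*? {abs(t_stat) > t_star}")  # t = -1.09, do not reject
```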
Prediction
For $X = 3$, the predicted value is $\hat Y = 40.711 - 2.7 \times 3 = 32.6$. The prediction from an OLS regression is unbiased (proof in QM) so we use this as our point estimate. For an interval estimate we need the variance of $\hat Y$ (note that $\hat Y$ is a random variable because it is a function of $a$ and $b$):
$$V(\hat Y) = s^2 \left( \frac{1}{n} + \frac{(X - \bar X)^2}{\sum (X_i - \bar X)^2} \right).$$
Evaluating this gives the s.e. (the square root of the variance) as 1.206. Hence the 95% C.I. is given by
$$\hat Y \pm t^* \times \text{s.e.} = 32.6 \pm 2.228 \times 1.206 = [29.91, 35.29].$$
This gives the CI for the regression line at $X = 3$, but not for an individual $Y$ observation. For the latter there is additional uncertainty, because the data points do not lie exactly on the regression line. For $V(Y_3)$ we need to add the error variance $s^2$ to the variance above:
$$V(Y_3) = s^2 \left( 1 + \frac{1}{n} + \frac{(X_3 - \bar X)^2}{\sum (X_i - \bar X)^2} \right),$$
which gives a wider interval for the individual observation.
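A sketch of both intervals in Python. The raw $X$ values are not given in the notes, so everything is computed from the quoted summary statistics; small rounding differences from the figures in the text are expected:

```python
import math
from scipy import stats

# Estimates and summary statistics quoted in the notes
a, b = 40.711, -2.7
s2, n = 17.0754, 12
xbar, Sxx = 40.2 / 12, 49.37
x0 = 3                                            # point at which to predict

y_hat = a + b * x0                                # point prediction, 32.6
var_line = s2 * (1 / n + (x0 - xbar) ** 2 / Sxx)  # variance of the fitted line at x0
var_indiv = s2 + var_line                         # add s^2 for an individual observation

t_star = stats.t.ppf(0.975, n - 2)                # two-tailed 5% critical value
for label, v in [("regression line", var_line), ("individual Y", var_indiv)]:
    half = t_star * math.sqrt(v)
    print(f"95% CI ({label}): [{y_hat - half:.2f}, {y_hat + half:.2f}]")
# line:       roughly [29.9, 35.3], as in the text
# individual: roughly [23.0, 42.2], much wider
```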