
Inference in the regression model

Introduction

Having used regression as a descriptive tool and, separately, learned about statistical
inference, we now put them together to consider the reliability of the regression model. The
parameter estimates (a and b) we obtained are point estimates of the true values α and β. It is not
difficult to show that these are in fact unbiased estimates. We now want to obtain interval
estimates, which involves calculating their standard errors. We will also consider hypothesis
testing within regression.

Calculating standard errors

The model is

Yi = α + βXi + ui    (i = 1..n in cross-section, t = 1..T in time series)

or y = Xβ + u in matrix notation.

u is the stochastic error or disturbance process. X is (assumed) non-stochastic. Y is stochastic, because of u.

The error process

To make any progress we need to assume a distribution for the error term. Assume therefore
ui ~ NID(0, σ²). This means the errors have identical and independent Normal pdfs with
mean 0 and constant variance σ².

This means

E(ui) = 0 for all i


V(ui) = ² for all i
E(uiuj) = 0 for all i ≠ j (zero covariance)
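As a quick check on what these assumptions say, here is a minimal simulation sketch (numpy assumed; σ² = 17 is an arbitrary illustrative value): the sample moments of a large NID draw should match the stated mean, variance, and zero covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 17.0                          # arbitrary illustrative variance
u = rng.normal(0.0, np.sqrt(sigma2), size=100_000)

print(u.mean())                        # close to 0, matching E(ui) = 0
print(u.var())                         # close to 17, matching V(ui) = sigma^2
print(np.mean(u[:-1] * u[1:]))         # close to 0, matching E(ui uj) = 0 for i != j
```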

Note that a and b are random variables (because they are calculated from a random sample
and their values depend upon the particular realisation of the random error), so for inference
we need their distributions. Since

Yi = α + βXi + ui and

ui ~ N(0, σ²), then

Yi ~ N(α + βXi, σ²)

Each Yi is normally distributed and independent (since the ui are independent). But b is a
linear function of the Y values, so b must be Normally distributed also. Now we just need to
find the parameters of this Normal distribution.

The variance of b is given by the formula V(b) = σ² / Σ(Xi − X̄)² (proof in QM).
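Since this result is central to everything that follows, a small simulation may help. The sketch below (numpy assumed; the X values and "true" parameters are invented for illustration) draws many samples from the model and compares the sampling variance of b with σ²/Σ(Xi − X̄)².

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma2 = 40.0, -2.7, 17.0      # invented "true" values
X = np.linspace(1, 6, 12)                   # fixed, non-stochastic regressors
Sxx = ((X - X.mean()) ** 2).sum()

b_draws = []
for _ in range(20_000):
    Y = alpha + beta * X + rng.normal(0, np.sqrt(sigma2), size=X.size)
    # OLS slope: b = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
    b_draws.append(((X - X.mean()) * (Y - Y.mean())).sum() / Sxx)

print(np.var(b_draws))                      # simulated V(b)
print(sigma2 / Sxx)                         # theoretical sigma^2 / sum((Xi - Xbar)^2)
```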
Unfortunately we don't know σ², so we have to estimate it using s², given by

s² = Σei² / (n − 2) = RSS / (n − 2)

(this provides an unbiased estimate of σ².)

Hence s² = 170.754 / 10 = 17.0754 (values taken from the previous regression exercise).

Hence V(b) = 17.0754/49.37 = 0.345, using Σ(Xi − X̄)² = 49.37 from the previous exercise. The standard error of b is therefore √0.345 = 0.587.

For a, V(a) = σ² [1/n + X̄² / Σ(Xi − X̄)²] = … = 5.304.

Hence the regression equation becomes:

Yi = 40.711 − 2.70 Xi + ei
s.e.   (2.30)    (0.59)

R² = 0.678    n = 12
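These numbers can be reproduced from the summary statistics quoted in the text (RSS = 170.754, n = 12, and, from the previous exercise, ΣX = 40.2 and ΣX² = 184.04). A minimal sketch using only the Python standard library:

```python
from math import sqrt

# Summary statistics from the regression exercise
n, RSS = 12, 170.754
sum_x, sum_x2 = 40.2, 184.04

s2 = RSS / (n - 2)                      # s^2 = RSS/(n-2) = 17.0754
Sxx = sum_x2 - sum_x**2 / n             # sum((Xi - Xbar)^2) = 49.37
x_bar = sum_x / n

var_b = s2 / Sxx                        # V(b) = 0.345
var_a = s2 * (1 / n + x_bar**2 / Sxx)   # V(a) = 5.304
print(sqrt(var_b), sqrt(var_a))         # s.e.(b) ≈ 0.59, s.e.(a) ≈ 2.30
```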

Interval estimates

We now have the information needed to calculate interval estimates. We actually use the t
distribution rather than the z, because we have had to estimate σ² by s². This applies to all
sample sizes, unlike estimation of a mean.

For b the 95% c.i. is given by b ± 2.228 × s.e.(b) = −2.7 ± 2.228 × 0.59 = [−4.01, −1.39]. (2.228 is the two-tail 5% critical value of the t distribution with n − 2 = 10 degrees of freedom.)

The calculation for a is similar.
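The same interval with the critical value computed rather than looked up (a sketch, scipy assumed):

```python
from scipy import stats

b, se_b, df = -2.7, 0.59, 10
t_crit = stats.t.ppf(0.975, df)         # 2.228 for 10 degrees of freedom
ci = (b - t_crit * se_b, b + t_crit * se_b)
print(ci)                               # approximately (-4.01, -1.39)
```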

Hypothesis testing

We can now test hypotheses about α and β. An obvious test is to see if the slope coefficient
is zero (when X would have no influence upon Y):

H0: β = 0
H1: β ≠ 0.

The test statistic is t = b / √V(b), which has a t distribution (with n − 2 degrees of freedom).

 2.7  0
Hence we obtain t   4.57  t*  2.228 and so we reject the null.
0.59

Note the equivalence between estimation and hypothesis testing: the test rejected the hypothesised
value of 0, and the confidence interval did not contain that value. (Note that this equivalence only
holds if you perform a two-tailed test.)
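A sketch of this test in code (scipy assumed; the two-tailed p-value replaces the table lookup):

```python
from scipy import stats

b, se_b, df = -2.7, 0.59, 10
t_stat = (b - 0) / se_b                     # -4.57
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-tailed p-value
print(t_stat, p_value)                      # p < 0.05, so reject H0: beta = 0
```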

More complex tests

In general, to test H0: f(α, β) = 0 the test statistic is

t = [f(a, b) − f(α, β)] / est. s.e.(f(a, b)) ~ tn−2

where f is a linear function of a and b.

Example:

H0: α + β = 40
H1: α + β < 40.

t = [(a + b) − (α + β)] / est. s.e.(a + b) ~ t10

is the test statistic. We need

V(a + b) = V(a) + V(b) + 2Cov(a, b)

= 5.304 + 0.345 + 2 × (−1.159) = 3.331

where the covariance is given by the formula

Cov(a, b) = −σ² ΣX / (nΣX² − (ΣX)²) = −17.0754 × 40.2 / (12 × 184.04 − 40.2²) = −1.159.

Hence est. s.e.(a + b) = √3.331 = 1.825.

Hence t = (40.711 − 2.7 − 40) / 1.825 = −1.09 > −t* = −1.812, hence we cannot reject the null. (−1.812 is the one-tail 5% critical value with 10 degrees of freedom, matching the one-sided alternative.)
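The full calculation, covariance included, as a short sketch (values from the text; scipy supplies the one-tailed critical value):

```python
from scipy import stats

a, b = 40.711, -2.7
var_a, var_b = 5.304, 0.345
s2, sum_x, sum_x2, n = 17.0754, 40.2, 184.04, 12

cov_ab = -s2 * sum_x / (n * sum_x2 - sum_x**2)   # Cov(a, b) = -1.159
var_apb = var_a + var_b + 2 * cov_ab             # V(a + b) = 3.331
se_apb = var_apb ** 0.5                          # 1.825

t_stat = ((a + b) - 40) / se_apb                 # -1.09
t_crit = stats.t.ppf(0.05, 10)                   # -1.812, one-tailed
print(t_stat, t_crit, t_stat > t_crit)           # True: cannot reject H0
```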

Prediction

For X = 3, Ŷ = 40.711 − 2.7 × 3 = 32.6 is the predicted value. The prediction from an OLS regression is
unbiased (proof in QM) so we use this as our point estimate. For an interval estimate we
need the variance of Ŷ (note that Ŷ is a random variable because it is a function of a and b).

V(Y ) = V(a + bX)


= V(a + b3)
= V(a) + 32 V(b) + 2  3  Cov(a, b)
= 5.304 + 9  0.345 + 6  -1.159 = 1.455

Hence the s.e. is 1.206, its square root. Hence the 95% C.I. is given by

Y  t  s.e.(Y ) = 32.6  2.228  1.206 = [29.9, 35.3].

This interval is the CI for the regression line at X = 3, but not for an individual Y observation. For
the latter there is additional uncertainty, arising from the data points not lying on the regression
line. For V(Y3) we need

V(Y3) = V(Ŷ3) + V(e)    (N.B. Ŷ3 and e are independent. Why?)

= 1.455 + 17.0754 = 18.53

and the standard error is √18.53 = 4.304.

The interval is therefore

32.6 ± 2.228 × 4.304 = [23.0, 42.2]

This is the 95% c.i. for an individual observation Y at X = 3.
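And the corresponding sketch for the individual observation, adding the error variance s² to the variance of the fitted value:

```python
from scipy import stats

y_hat, var_yhat, s2 = 32.6, 1.455, 17.0754

var_y3 = var_yhat + s2                 # 18.53: fitted-value variance plus error variance
se_y3 = var_y3 ** 0.5                  # 4.304

t_crit = stats.t.ppf(0.975, 10)        # 2.228
print(y_hat - t_crit * se_y3, y_hat + t_crit * se_y3)   # approximately [23.0, 42.2]
```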
