Tutorial Session 12 - Model Selection Solution

Uploaded by

lucastone325
Copyright
© All Rights Reserved

Econ 314: Quantitative Economics

Tutorial 12 - Model Selection


Please print and attempt prior to the tutorial session.

1. Explain, in your own words, what is meant by each of the following types of specification error:

1.1 Incorrect error structure:


This occurs in the case of autocorrelation (i.e. a systematic pattern in the error terms)
or heteroscedasticity (i.e. when the error terms have a non-constant variance).
1.2 Incorrect functional form:
This problem arises when a linear model is estimated although the correct model is a
log-log model or some other non-linear form, e.g. a quadratic specification.
1.3 Errors of measurement:
This occurs when inaccurate measures of the dependent and/or independent
variables are used to estimate a regression.

2. What is omitted variable bias? Why is it a problem? How do you try to prevent it? Explain.

Omitted variable bias occurs when a relevant independent variable, i.e. one that affects the
dependent variable and is correlated with at least one of the included independent variables,
is omitted from the multiple linear regression analysis.

Omitted variable bias is a serious problem because it results in biased and inconsistent
coefficient estimates, so that the usual hypothesis tests become unreliable.

You can attempt to control for potential omitted variable bias by using economic theory to guide
your choice of independent variables included in your multiple linear regression analysis, by
adding control (or proxy) variables for factors that would otherwise be left out, or by using
more advanced econometric techniques.
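
The mechanics of omitted variable bias can be illustrated with a small simulation. The sketch below is not part of the tutorial data: the true coefficients (2 and 3), the strength of the link between the two regressors (0.5) and the sample size are all hypothetical values chosen for illustration. When x2 is left out, the slope on x1 should settle near beta1 + beta2 * delta rather than near beta1.

```python
import random

random.seed(0)

# Hypothetical true model: y = 1 + beta1*x1 + beta2*x2 + e, with x2 = delta*x1 + v
n = 20000
beta1, beta2, delta = 2.0, 3.0, 0.5

x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [delta * a + random.gauss(0, 1) for a in x1]  # x2 is correlated with x1
y = [1.0 + beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]


def simple_slope(x, y):
    """OLS slope from a regression of y on a constant and a single regressor x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


# "Short" regression that wrongly omits x2
short_slope = simple_slope(x1, y)
# Value the short-regression slope converges to: beta1 + beta2 * delta
biased_target = beta1 + beta2 * delta

print(f"true beta1             : {beta1:.3f}")
print(f"slope with x2 omitted  : {short_slope:.3f}")
print(f"beta1 + beta2 * delta  : {biased_target:.3f}")
```

If delta were set to zero (x2 uncorrelated with x1), the slope would be close to beta1 again, which is why the correlation between the omitted and the included variable matters for the bias.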

3. What is the inclusion of an irrelevant variable? Why is it a problem? How do you try to prevent
it? Explain.

The inclusion of an irrelevant variable occurs when an independent variable that has no effect
on the dependent variable (i.e. its true coefficient is zero) is included in the multiple linear
regression analysis.

The inclusion of an irrelevant variable is problematic because, although the coefficient
estimators remain unbiased, they become inefficient: standard errors tend to be inflated, so
that the population parameters are less precisely estimated.

You can attempt to prevent the inclusion of an irrelevant variable by using economic theory to
guide your choice of independent variables included in your multiple linear regression analysis.

4. Why is missing data a potential problem?


Missing data is a potential concern because it reduces the number of observations available
for estimation, and if data are missing for a systematic reason then the missing data can bias
the coefficient estimates.

5. What are some of the criteria you would use to evaluate the appropriateness of your chosen
model?

• Examination of R² (and adjusted R²) and the F-statistic.

• Whether the signs, magnitudes and significance of the coefficients are as expected.

• Whether the residuals fulfil the CLRM assumptions.

• Use of the MWD test and Ramsey’s RESET test.

6. What can Ramsey’s RESET test be used for?

It can be used to check for:

• omitted variables.

• incorrect functional form.

• correlation between the explanatory variables and the error term.
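
The logic of the test can be sketched on simulated data. The example below is a simplified variant that adds only the squared fitted values to the original regression and uses an F-test on that extra term; the data-generating process (a quadratic relationship fitted with a linear model), the sample size and the hand-rolled `ols` helper are all assumptions made for illustration, not part of the tutorial.

```python
import random


def ols(columns, y):
    """Regress y on the given columns; return (coefficients, fitted values, RSS)."""
    n, k = len(y), len(columns)
    # Augmented normal equations [X'X | X'y]
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in columns]
         + [sum(ci[t] * y[t] for t in range(n))] for ci in columns]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    coefs = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(coefs[j] * columns[j][t] for j in range(k)) for t in range(n)]
    rss = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    return coefs, fitted, rss


random.seed(1)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
# Hypothetical true relationship is quadratic in x
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 + random.gauss(0, 1) for xi in x]
const = [1.0] * n

# Step 1: estimate the (misspecified) linear model and keep its fitted values
_, fitted, rss_restricted = ols([const, x], y)

# Step 2: re-estimate with the squared fitted values as an extra regressor
fitted_sq = [f ** 2 for f in fitted]
_, _, rss_unrestricted = ols([const, x, fitted_sq], y)

# Step 3: F-test of the null that the added term has a zero coefficient
q = 1                 # number of added regressors
k_unrestricted = 3    # parameters in the augmented model
reset_f = ((rss_restricted - rss_unrestricted) / q) / (
    rss_unrestricted / (n - k_unrestricted))

print(f"RESET F-statistic: {reset_f:.2f}")
```

A statistic well above the 5% critical value of F(1, 197), which is roughly 3.9, rejects the null of correct specification. The test signals that something is wrong with the specification but does not say which variable or functional form is missing.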

7. What is the basic difference between the traditional approach to model selection and Hendry’s
approach?

The traditional approach moves from the simple to the general (the bottom-up approach),
whereas Hendry’s approach goes from the general to the specific (the top-down approach).

8. A major coffee importer is interested in knowing what the sensitivity of the demand for coffee
is to its own price, and whether other drinks such as tea and cola are substitutes. It can supply
you with a set of annual data giving the demand for coffee, the prices of coffee, tea and cola,
for the period 1960-1999. You estimate the following model:

. reg ccoffee ptea pcola pcoffee

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  3,    36) =   27.07
       Model |  6624.26469     3  2208.08823           Prob > F      =  0.0000
    Residual |  2936.66104    36  81.5739177           R-squared     =  0.6928
-------------+------------------------------           Adj R-squared =  0.6673
       Total |  9560.92573    39  245.151942           Root MSE      =  9.0318

------------------------------------------------------------------------------
     ccoffee |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ptea |   11.69218   15.90781     0.73   0.467    -20.57035     43.9547
       pcola |   8.514633   8.677161     0.98   0.333    -9.083465    26.11273
     pcoffee |  -10.19486   10.07888    -1.01   0.319    -30.63578    10.24606
       _cons |   160.0731   35.40204     4.52   0.000     88.27439    231.8717
------------------------------------------------------------------------------

Comment on each of the following:

8.1 The goodness of fit


Based on the R² value, about 69% of the variation in the dependent variable is
explained by the explanatory variables (adjusted R² = 0.67). This is
reasonable. The F-statistic (27.07, with a p-value of 0.0000) likewise confirms the
joint explanatory power of the estimated model.
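
The goodness-of-fit figures in the output can be reproduced from the sums of squares in the ANOVA panel. The short sketch below only copies numbers from the table above; the variable names are illustrative.

```python
# Sums of squares copied from the first regression output
ess = 6624.26469   # Model (explained) sum of squares
rss = 2936.66104   # Residual sum of squares
tss = 9560.92573   # Total sum of squares

n = 40   # number of observations
k = 3    # number of slope coefficients (ptea, pcola, pcoffee)

r2 = ess / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat = (ess / k) / (rss / (n - k - 1))
root_mse = (rss / (n - k - 1)) ** 0.5

print(f"R-squared     = {r2:.4f}")
print(f"Adj R-squared = {adj_r2:.4f}")
print(f"F({k}, {n - k - 1})      = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.4f}")
```

The adjusted R² is lower than the R² because it penalises the three degrees of freedom used by the slope coefficients.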

8.2 The statistical significance of the variables


In contrast to the overall fit, none of the three explanatory variables is individually
statistically significant at any conventional level (p-values of 0.467, 0.333 and
0.319). Only the constant is significant.

8.3 The signs of the estimated coefficients. Are they as expected?


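On the basis of economic theory, all of the signs on the estimated slope coefficients
are as expected. The own-price coefficient (pcoffee) is negative, consistent with the
law of demand, while the coefficients on the prices of tea and cola are positive, as
expected if these drinks are substitutes for coffee.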

8.4 Do you think that you have omitted any important variables from your model? If so,
which ones?
Possibly income, the prices of complements, size of population, etc.

You then add an additional variable, per capita income (pcy), to your model, obtaining the
following results:

. reg ccoffee ptea pcola pcoffee pcy

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  4,    35) =   97.16
       Model |  8771.04186     4  2192.76047           Prob > F      =  0.0000
    Residual |   789.88387    35  22.5681106           R-squared     =  0.9174
-------------+------------------------------           Adj R-squared =  0.9079
       Total |  9560.92573    39  245.151942           Root MSE      =  4.7506

------------------------------------------------------------------------------
     ccoffee |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ptea |    18.9075   8.399881     2.25   0.031     1.854837    35.96017
       pcola |   5.942891   4.571652     1.30   0.202    -3.338056    15.22384
     pcoffee |  -17.78683   5.358168    -3.32   0.002    -28.66448   -6.909167
         pcy |   .4572321   .0468804     9.75   0.000       .36206    .5524043
       _cons |   -312.566   51.91447    -6.02   0.000    -417.9579    -207.174
------------------------------------------------------------------------------

Comment on how the addition of the new variable has impacted on each of the following:

8.5 The goodness of fit


The fit has improved substantially: the R² has risen from 0.69 to 0.92 (adjusted R² from
0.67 to 0.91) and the root MSE has fallen from 9.03 to 4.75. Clearly, income is
an important explanatory variable that had been left out in the first regression.

8.6 The statistical significance of the variables
In contrast to the earlier results, three of the four explanatory variables (ptea, pcoffee
and pcy) are now statistically significant at the 5% level. The remaining variable, pcola,
is still insignificant, although its p-value has fallen from 0.333 to 0.202.
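
The change in significance can be checked directly from the two outputs. The sketch below recomputes the t-statistics as coefficient divided by standard error, and runs an F-test for the addition of pcy using the residual sums of squares of the two regressions. All inputs are copied from the tables; the cut-off of 2.03 is an approximation to the two-sided 5% critical value of the t distribution with 35 to 36 degrees of freedom, and the names used are illustrative.

```python
# (coefficient, standard error) pairs copied from the two regression outputs
model_1 = {
    "ptea": (11.69218, 15.90781),
    "pcola": (8.514633, 8.677161),
    "pcoffee": (-10.19486, 10.07888),
}
model_2 = {
    "ptea": (18.9075, 8.399881),
    "pcola": (5.942891, 4.571652),
    "pcoffee": (-17.78683, 5.358168),
    "pcy": (0.4572321, 0.0468804),
}

T_CRIT_5PCT = 2.03  # approximate two-sided 5% critical value (assumption)


def t_stats(model):
    """t-statistic for H0: coefficient = 0, i.e. coefficient / standard error."""
    return {name: coef / se for name, (coef, se) in model.items()}


def significant(model):
    """Names of variables whose |t| exceeds the approximate 5% critical value."""
    return sorted(name for name, t in t_stats(model).items()
                  if abs(t) > T_CRIT_5PCT)


t1, t2 = t_stats(model_1), t_stats(model_2)
print("significant without pcy:", significant(model_1))
print("significant with pcy   :", significant(model_2))

# F-test for adding pcy: restricted model excludes pcy, unrestricted includes it
rss_restricted, rss_unrestricted = 2936.66104, 789.88387
q, df_unrestricted = 1, 35
f_added = ((rss_restricted - rss_unrestricted) / q) / (
    rss_unrestricted / df_unrestricted)
print(f"F for adding pcy = {f_added:.2f}, t(pcy)^2 = {t2['pcy'] ** 2:.2f}")
```

With a single added variable the F-statistic equals the square of that variable's t-statistic, so both tests lead to the same conclusion about pcy.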

8.7 The signs of the estimated coefficients. Are they as expected?


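Although the constant now has a negative sign, all of the explanatory variables have their
expected signs: negative for the price of coffee, positive for the prices of tea and cola
(substitutes), and positive for per capita income (coffee as a normal good).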
