
Problem Set 6

1) The document contains a problem set with multiple choice and analytical questions regarding regression analysis and hypothesis testing. 2) It includes regression output from 3 models examining the relationship between costs and output. The output is used to answer questions about sample sizes, R-squared values, and effects of variables in each model. 3) Questions also address hypothesis testing, model selection, issues like multicollinearity, and pooling time series data across periods.

Uploaded by

Sila Kapsata
Problem Set 6

Multiple Choice Questions

1. The critical value in the F-distribution depends on the degrees of freedom in the
numerator and denominator. How do you find the degrees of freedom in the
numerator?
(a) It is the number of observations minus the number of coefficients estimated
(N − K)
(b) It is the number of hypotheses being tested simultaneously (J)
(c) It is the number of coefficients being estimated (K)
(d) It is the number of observations minus the number of hypotheses tested (N − J)
2. The critical value in the F-distribution depends on the degrees of freedom in the
numerator and denominator. How do you find the degrees of freedom in the
denominator?
(a) It is the number of observations minus the number of coefficients estimated
(N − K)
(b) It is the number of hypotheses being tested simultaneously (J)
(c) It is the number of coefficients being estimated (K)
(d) It is the number of observations minus the number of hypotheses tested (N −
J)
3. When performing an F-test with the null hypothesis H0: β1 = β2 = 0, what is
the alternative hypothesis?
(a) β1 ≠ 0 and β2 ≠ 0
(b) β1 ≠ 0 or β2 ≠ 0
(c) (β1 ≠ 0 and β2 = 0) or (β1 = 0 and β2 ≠ 0)
(d) β1 = β2 ≠ 0
4. How does omitting a relevant variable from a regression model affect the estimated
coefficients of the other variables in the model?
(a) they are biased downward and have smaller standard errors
(b) they are biased upward and have larger standard errors
(c) they are biased and the bias can be negative or positive
(d) they are unbiased but have larger standard errors
5. How does including an irrelevant variable in a regression model affect the estimated
coefficients of the other variables in the model?
(a) they are biased downward and have smaller standard errors
(b) they are biased upward and have larger standard errors
(c) they are biased and the bias can be negative or positive

(d) they are unbiased but have larger standard errors
6. Which of the following measures is NOT used to evaluate model specification?
(a) The adjusted R²
(b) Akaike Information Criterion
(c) Bayesian Information Criterion
(d) Jarque-Bera test
7. When are the R² and adjusted R² equal?
(a) When the model is correctly specified
(b) When K = 1
(c) When the error terms are normally distributed
(d) When an unrestricted model is estimated
8. When highly collinear variables are included in an econometric model, the coefficient
estimates are
(a) biased downward and have smaller standard errors
(b) biased upward and have larger standard errors
(c) biased and the bias can be negative or positive
(d) unbiased but have larger standard errors
9. When a set of variables with perfect collinearity is included in an econometric
model, the coefficient estimates are
(a) undefined
(b) unbiased
(c) biased upward
(d) biased, but the direction is unclear
10. If your regression results show a high R², a high adjusted R², and a significant F-test, but low
t-values for the coefficients, what is the most likely cause?
(a) omitted relevant variables
(b) irrelevant variables have been included
(c) multicollinearity
(d) heteroskedasticity

Analytical Questions

11. Past EXAM Question


The following output is taken from OLS regressions of three different models which
try to establish the effect of output (measured in kilograms) on total costs
(measured in £). The first model regresses the level of costs (costs) on the level of
output (output). The second model regresses the natural log of costs (log_cost)
on the natural log of output (log_output), and the third model regresses the level
of costs on the level of output, the square of output (output_sq) and the cube of
output (output_cub). Some of the regression output has been hidden.
Model 1
reg costs output
      Source |          SS    df          MS      Number of obs =
-------------+------------------------------      F( 1, 58)     =  662.73
       Model |  733.336303     1  733.336303      Prob > F      =  0.0000
    Residual |  97.3749935    58  1.10653402      R-squared     =  0.8828
-------------+------------------------------      Adj R-squared =  0.8814
       Total |  830.711297    59  9.33383479      Root MSE      =  1.0519
------------------------------------------------------------------------------
       costs |      Coef.   Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .5000000    .0250000            0.000
       _cons |   .6501553    .1677777     3.88   0.000     .3167323   .9835782
------------------------------------------------------------------------------

Model 2
reg log_cost log_output

      Source |          SS    df          MS      Number of obs =
-------------+------------------------------      F( 1, 58)     =  185.50
       Model |                 1                  Prob > F      =  0.0000
    Residual |  10.0000000    58  .113636360      R-squared     =
-------------+------------------------------      Adj R-squared =  0.9155
       Total |  100.000000    59  1.69491530      Root MSE      =  1.3019
------------------------------------------------------------------------------
    log_cost |      Coef.   Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
  log_output |   .6000000    .0272426    22.39   0.000     .5556884   .6639662
       _cons |  -2.447097    .1569509   -15.59   0.000    -2.759004   -2.13519
------------------------------------------------------------------------------

Model 3
reg costs output output_sq output_cub

      Source |          SS    df          MS      Number of obs =
-------------+------------------------------      F( , )        =
       Model |  855.000000     3   285.00000      Prob > F      =  0.0000
    Residual |  95.0000000    56  1.69642860      R-squared     =  0.9000
-------------+------------------------------      Adj R-squared =  0.8806
       Total |  950.000000    59  16.1016950      Root MSE      =  4.0126
------------------------------------------------------------------------------
       costs |      Coef.   Std. Err.        t   P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      output |   .8000000    .2000000     4.00   0.000
   output_sq |  -0.003000    0.003000    -1.00   0.290    -9.07e-06   2.74e-06
  output_cub |   0.000001    0.000001     0.95   0.347    -1.36e-09   3.83e-09
       _cons |   .4343922    .2503542     1.74   0.086    -.0632955   .9320799
------------------------------------------------------------------------------

(a) Find the sample size in model 1


(b) Calculate the R² value in model 2
(c) Interpret the estimated effect of output on costs in each model
(d) Test the hypothesis that the variable output has some explanatory power in
model 1 (use the 5% significance level for your test and the nearest critical
value in the Table for the relevant degrees of freedom)
(e) Calculate the F test of goodness of fit of the model as a whole in model 3
(f) Explain, briefly, how the adjusted R² helps with model selection. Use this to
help you choose whether you prefer models 1, 2 or 3
(g) The regression output in Model 3 suggests the presence of what issue that
arises in many multiple regression models? Give reasons for your answer.
(h) Why might we worry if the OLS residuals are not normally distributed?
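
For parts (d) and (e), the critical values can also be read from Stata instead of a printed table. The lines below are a minimal sketch, assuming the degrees of freedom shown in the output above (58 residual degrees of freedom in Model 1; 3 and 56 in Model 3). invttail() and invFtail() are Stata's built-in inverse upper-tail functions.

```stata
* Two-sided 5% critical value of the t distribution with 58 degrees of freedom
display invttail(58, 0.025)

* 5% critical value of the F distribution with (3, 56) degrees of freedom
display invFtail(3, 56, 0.05)
```

Values taken from a printed table at the nearest tabulated degrees of freedom will differ slightly from these.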

12. You have time series data for the period 1935-2000. You are given an estimate
of the effects of income (measured in £billion) and interest rates, (measured in
percentage points) on aggregate consumption expenditure (measured in £billion).
$$\widehat{Cons} = \underset{(1.00)}{10.00} + \underset{(0.45)}{0.90}\,Income - \underset{(2.00)}{6.00}\,IntRate, \qquad TSS = 70, \quad ESS = 10$$

You then split the data into two periods, and run 2 separate regressions

For the period 1935-1970:


$$\widehat{Cons} = \underset{(1.00)}{6.00} + \underset{(0.40)}{0.95}\,Income - \underset{(1.00)}{2.00}\,IntRate, \qquad TSS = 30, \quad ESS = 10$$

For the period 1971-2000:


$$\widehat{Cons} = \underset{(1.00)}{14.00} + \underset{(0.50)}{0.85}\,Income - \underset{(4.00)}{10.00}\,IntRate, \qquad TSS = 20, \quad ESS = 10$$

Test the hypothesis that the data could be pooled across both time periods and
estimated as a single equation.
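
One common way to test poolability is a Chow test. The statistic below is a sketch under stated assumptions: $RSS_P$ is the residual sum of squares from the pooled regression, $RSS_1$ and $RSS_2$ are those from the two sub-period regressions, $K$ is the number of coefficients in each equation, and $N$ is the total number of observations. The question does not define ESS; if it denotes the explained sum of squares, then each residual sum of squares is $RSS = TSS - ESS$.

$$F = \frac{\left[RSS_P - (RSS_1 + RSS_2)\right]/K}{(RSS_1 + RSS_2)/(N - 2K)} \sim F_{K,\,N-2K} \quad \text{under } H_0$$

The null hypothesis that the coefficients are the same in both periods is rejected when $F$ exceeds the critical value for $(K, N - 2K)$ degrees of freedom.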
13. Consider the model Y = β0 + β1 X + u
(a) What is the formula for the Ordinary Least Squares estimate of β1 ?
(b) Under what conditions will Ordinary Least Squares produce an unbiased and
efficient estimate of β1 ?
(c) Prove that the Ordinary Least Squares estimate of β1 is unbiased.
14. A researcher is interested in how the proportion of household budget spent on
transportation (WTRANS) depends on total household expenditure (measured in logs,
LOGEXP), the age of the household head (AGE) and the number of children
in the household (NUMKIDS). The researcher produces the following table of
estimates:
                       WTRANS
Log expenditure        0.0414
                      (0.0071)
Age of HH head        -0.0001
                      (0.0004)
No. of children       -0.0130
                      (0.0055)
Constant              -0.0315
                      (0.0322)
R²                     0.0247
N                      1,519
Standard errors reported in parentheses

(a) What was the theoretical model the researcher took to the data?
(b) Write down the estimated model
(c) Interpret the estimates
(d) Are there any variables you would exclude from the model? Why, or why
not?

(e) Predict the proportion of a budget that will be spent on transportation for a
one-child household when total expenditure and age are set at their sample
means (98.7 and 36 respectively)

Practical Questions

15. When estimating wage equations we expect that young, inexperienced workers will
have relatively low wages and that with additional experience their wages will rise,
but then begin to decline after middle age, as the worker nears retirement. This
lifecycle pattern of wages can be captured by introducing experience and the square
of experience to explain the level of wages.
Consider the theoretical model

$$Wage = \beta_0 + \beta_1 Exper + \beta_2 Exper^2 + \beta_3 Educ + u \qquad (1)$$

(a) What is the marginal effect of experience on wages?


(b) What signs do you expect for each of the coefficients β1 and β2 and why?
(c) After how many years of experience do wages start to decline?
(d) Open the dataset cpseduc.dta (we used this dataset previously in Problem
Set 4)
i. Estimate a simple regression model of wages on years of experience
ii. Estimate a second model where you also include years of education
iii. Estimate the full theoretical model in (1) and interpret the estimates.
Are the estimates consistent with your expectations?
iv. Export the estimates from all three models in one single table
• To export the table requires the outreg2 command. (Recall that, if
necessary, you can download the command using ssc install outreg2.)
• After estimating each model you need to save the estimates in STATA’s
internal memory. Then tell STATA to put the estimates together in
one table with the outreg2 command. The syntax will look something
like:
reg . . . . . .
estimates store model1
reg . . . . . .
est sto model2
reg . . . . . .
est sto model3
outreg2 [model1 model2 model3] using datapath\Table1, replace word
v. Compare the coefficient on experience between the simple regression
model and the second model. What happens? Why? What does this
tell you about the correlation between experience and education?
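
The steps in part (d) can be sketched as a single do-file. This is an illustration only: the variable names wage, exper and educ are assumptions (check the actual names in cpseduc.dta with describe), and the table is written to the current working directory; the path is not taken from the question.

```stata
* Sketch only: wage, exper and educ are assumed variable names
use cpseduc.dta, clear

* Squared experience term for the full model in (1)
generate exper2 = exper^2

* Model 1: wages on experience
regress wage exper
estimates store model1

* Model 2: add years of education
regress wage exper educ
estimates store model2

* Model 3: full quadratic specification
regress wage exper exper2 educ
estimates store model3

* One table with all three models, written as a Word-readable file
outreg2 [model1 model2 model3] using Table1, replace word
```

Storing each set of estimates immediately after its regression matters, because estimates store saves whatever model was fitted most recently.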

16. The file cocaine.dta available on Moodle contains 56 observations on variables
related to sales of cocaine powder in northeastern California over the period 1984-
1991. The data are a subset of those used in the study

Caulkins, J.P. and R. Padman (1993) “Quantity Discounts and Quality Premia
for Illicit Drugs” Journal of the American Statistical Association, 88, 748-757

The variables are:


• PRICE = price per gram in dollars for a cocaine sale
• QUANT= number of grams of cocaine in a given sale
• QUAL = quality of the cocaine expressed as a percentage of purity
• TREND = a time variable with 1984=1 up to 1991=8
Consider the regression model

$$PRICE = \beta_0 + \beta_1 QUANT + \beta_2 QUAL + \beta_3 TREND + u$$

(a) What signs would you expect for the coefficients β1, β2 and β3? Explain.
(b) Estimate the model in STATA and interpret the coefficient estimates. Do the
signs of the coefficients conform to your expectations?
(c) What proportion of the variation in cocaine prices is explained jointly by
variation in quantity, quality and time?
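
A minimal sketch of the estimation step in (b) and the fit measure asked for in (c). The lowercase variable names are an assumption; Stata is case-sensitive, so use the names as they appear in cocaine.dta.

```stata
* Sketch only: price, quant, qual and trend are assumed variable names
use cocaine.dta, clear

* Price per gram on quantity, quality and the time trend
regress price quant qual trend

* R-squared stored by the most recent regression
display e(r2)
```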
17. Use the data cpseduc.dta to estimate the following wage equation:

$$\ln(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exper + \beta_3 Hrswk + u$$

(a) Interpret the regression output.


(b) Test the hypothesis that an extra year of education increases the wage rate
by 10%
(c) Re-estimate the model with the additional variables EDUC × EXPER,
EDUC² and EXPER². Interpret the regression output
(d) Estimate the marginal effects ∂ln(Wage)/∂EDUC for a woman with 16 years of
education and 2 years of experience and for a woman with 12 years of education
and 2 years of experience. What can you say about the marginal effect of
education for women as education increases?
(e) Estimate the marginal effects ∂ln(Wage)/∂EDUC for a man with 16 years of education
and 2 years of experience and for a man with 12 years of education and 2
years of experience. What can you say about the marginal effect of education
for men as education increases?
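
One possible way to set up parts (b) to (e) in Stata is sketched below. The variable names wage, educ, exper and hrswk are assumptions, and the question does not say how gender enters the model (a dummy, interactions, or separate samples), so the sketch leaves gender out. Factor-variable notation is used so that margins can differentiate through the squared and interaction terms.

```stata
* Sketch only: wage, educ, exper and hrswk are assumed variable names
use cpseduc.dta, clear
generate lnwage = ln(wage)

* Baseline model, then a test of H0: coefficient on educ = 0.10
regress lnwage educ exper hrswk
test educ = 0.10

* Extended model: squared terms and the educ-exper interaction
regress lnwage c.educ##c.educ c.exper##c.exper c.educ#c.exper hrswk

* Marginal effect of education at 12 and 16 years of education, 2 years of experience
margins, dydx(educ) at(educ=(12 16) exper=2)
```

Typing the squared and interaction terms as separately generated variables would also estimate the model, but margins would then treat them as unrelated regressors and report the wrong derivative.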
