
Homework 2 answers

Nathaniel Higgins
[email protected], [email protected]

Assignment
The assignment was to do the following book problems:
17.2, 17.6, 17.7
C17.1, C17.2, C17.3

17.2
Let grad be a dummy variable for whether a student-athlete at a large university graduates
in five years. Let hsGPA and SAT be high school grade point average and SAT
score, respectively. Let study be the number of hours spent per week in an organized
study hall. Suppose that, using data on 420 student-athletes, the following logit model
is obtained:
P(grad = 1 | hsGPA, SAT, study) = Λ(-1.17 + 0.24 hsGPA + 0.00058 SAT + 0.073 study),
where Λ(z) = exp(z)/[1 + exp(z)] is the logit function. Holding hsGPA fixed at 3.0 and
SAT fixed at 1,200, compute the estimated difference in the graduation probability for
someone who spent 10 hours per week in study hall and someone who spent 5 hours per
week.
This one is easy if you can translate the lingo properly. The question is asking us to use
the estimated coefficients of a model (-1.17, 0.24, 0.00058, and 0.073) to compute two
predictions (p-hats or y-hats or whatever you want to call them). The predicted
probability of the student-athlete graduating (P(grad = 1)) given a high school GPA of
3.0, an SAT of 1200, and 10 hours of study per week is given by
Λ(-1.17 + 0.24(3.0) + 0.00058(1200) + 0.073(10)) = Λ(0.976).
To translate this function into a single number (a numerical prediction), we simply use
the formula given to us by Wooldridge:
Λ(0.976) = exp(0.976)/[1 + exp(0.976)] = 0.7263.

So our prediction for the probability of a student-athlete who studies 10 hours per week
graduating is 0.7263. If this same student-athlete (same GPA and SAT scores) instead
studied only 5 hours per week, we would instead predict their probability of graduating
to be:
Λ(-1.17 + 0.24(3.0) + 0.00058(1200) + 0.073(5)) = Λ(0.611) = 0.6482.
Thus the difference (the gain in probability of graduation from an extra 5 hours of study
per week) is 0.7263 - 0.6482 = 0.0781. That's a pretty healthy increase!
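If you want Stata to do the arithmetic, a quick sketch using the built-in invlogit() function
reproduces these numbers (I am simply typing the estimated coefficients in by hand):
* predicted probability with 10 hours of study
display invlogit(-1.17 + 0.24*3.0 + 0.00058*1200 + 0.073*10)
* predicted probability with 5 hours of study
display invlogit(-1.17 + 0.24*3.0 + 0.00058*1200 + 0.073*5)
* difference between the two predictions
display invlogit(0.976) - invlogit(0.611)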

17.6
Consider a family saving function for the population of all families in the United States
sav = β0 + β1 inc + β2 hhsize + β3 educ + β4 age + u,
where hhsize is household size, educ is years of education of the household head, and
age is the age of the household head. Assume that E(u|inc, hhsize, educ, age) = 0.
(i) Suppose that the sample includes only families whose head is over 25 years old. If
we use OLS on such a sample, do we get unbiased estimators of the βj? Explain.
Sure! We would be concerned about bias if the selection of our sample was based on
an unobservable (a variable that was omitted from our savings function). But since the
selection is based on what we are assuming is an exogenous variable (uncorrelated with
u), the selection of the sample causes no problem.
(ii) Now, suppose our sample includes only married couples without children. Can we
estimate all of the parameters in the saving equation? Which ones can we estimate?
If we only include married couples without children, we are excluding families with
single parents and families with children. This is a pretty serious restriction: we are
restricting the sample to households made up of two people. If we do this, we have no
variation whatsoever in hhsize. When we have no variation in a variable, we cannot
estimate the coefficient associated with that variable, so β2 is not estimable. We can still
estimate β1, β3, and β4; the intercept we recover is really β0 + 2β2, since hhsize is fixed at 2.
(iii) Suppose we exclude from our sample families that save more than $25,000 per year.
Does OLS produce consistent estimators of the βj?
Yikes. This is a whole different ball game. Now, instead of selecting our sample based on
the exogenous variables (the x-variables), we are selecting our sample based on outcomes.
This is a bad idea. How much can we learn about a problem if we only look at certain
outcomes? Your intuition should be "not much." OLS would not produce consistent
estimators, since we are biasing the coefficients by selecting on y (sav). When we select
on y, we are causing truncation in the dataset. When we have a truncated dependent
variable, we don't typically use OLS. See notes on truncation to remember why.
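To see why selecting on the outcome is a problem, here is a minimal simulation sketch (the
data-generating process, parameter values, and cutoff are made up purely for illustration):
clear
set obs 1000
set seed 12345
gen inc = rnormal(50, 10)
gen sav = 2 + 0.3*inc + rnormal(0, 5)
* full sample: the slope estimate should be close to the true value of 0.3
reg sav inc
* sample truncated on the outcome: the slope estimate is biased toward zero
reg sav inc if sav < 20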

17.7
Suppose you are hired by a university to study the factors that determine whether
students admitted to the university actually come to the university. You are given
a large random sample of students who were admitted the previous year. You have
information on whether each student chose to attend, high school performance, family
income, financial aid offered, race, and geographic variables. Someone says to you, "Any
analysis of that data will lead to biased results because it is not a random sample of all
college applicants, but only those who apply to this university." What do you think of
this criticism?
I think that the person who offered this criticism should retake econometrics. The task
we have been given is to study the factors that determine whether students admitted
to the university come to the university. We were not asked to study the factors that
determine whether a random sample of college applicants actually come to the university.
If we had been (and that would be a silly question), then we would behave differently. This
is an example of a selection issue that does not cause any problems. Your immediate
reaction might be to suspect that all cases of selection cause bias, but as this example
shows, sometimes selection (some restriction of the sample) is appropriate.

C17.1
Use the data in PNTSPRD for this exercise.
(i) The variable favwin is a binary variable equal to one if the team favored by the Las
Vegas point spread wins. A linear probability model to estimate the probability that the
favored team wins is
P(favwin = 1 | spread) = β0 + β1 spread.
Explain why, if the spread incorporates all relevant information, we expect β0 = 0.5.
In every game, there is a favorite and an underdog. The spread is an expression of the
gap between the favorite and the underdog. If the favorite is judged to be 5 points
better than the underdog, then the spread is 5. The model is attempting to predict the
probability that the favored team wins a game. If the spread incorporates all information
relevant to whether or not a team wins the game, then the probability of the favorite
winning by more or less than the spread should be 50/50. That is, β0 should be equal to
0.5. Said another way, if the spread was 0 (if there was no favorite), then the probability
of either team winning is 0.5.
(ii) Estimate the model from part (i) by OLS. Test H0: β0 = 0.5 against a two-sided
alternative. Use both the usual and heteroskedasticity-robust standard errors.
I run this model by using the command reg favwin spread in Stata. If I consult the
t-statistic printed out by Stata, I can directly evaluate the hypothesis H0: β0 = 0. But
I want to test a different hypothesis. I have to make my own t-stat for that hypothesis.
Luckily, all the necessary information is printed out for me. The relevant t-stat is given
by (0.5769 - 0.5)/0.0282 = 2.73. Based on this t-statistic, I would reject the null at the
α = 0.05 level. Performing the exact same procedure with the heteroskedasticity-robust
standard errors does not change things much (you could use the command
reg favwin spread, robust to do this in Stata).
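Rather than building the t-statistic by hand, you can also have Stata test H0: β0 = 0.5
directly after the regression; a sketch (for the robust version, just add the robust option
to the regression first):
reg favwin spread
* Wald test of the intercept against 0.5 (the F statistic is the square of the t-stat above)
test _cons = 0.5
* or report the estimated intercept minus 0.5 along with its standard error
lincom _cons - 0.5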
(iii) Is spread statistically significant? What is the estimated probability that the favored
team wins when spread = 10?

spread is statistically significant (t-stat = 8.28 or 10.08, depending on whether or not
robust standard errors are used). The estimated probability that the favored team wins
when spread = 10 is obtained by plugging the estimated coefficients back into the model
and assuming a spread value of 10. Alternatively, we can have Stata do the work for us.
I issued the following commands to have Stata do the work for me (I did this with the
regular, i.e. non-robust, standard errors):
reg favwin spread
predict yhat
sum yhat if spread == 10
I used these commands to determine that the predicted probability of the favorite winning when the spread is 10 points is equal to 0.7706.
(iv) Now, estimate a probit model for P(favwin = 1 | spread). Interpret and test the
null hypothesis that the intercept is zero. [Hint: remember that Φ(0) = 0.5.]
To run a probit model instead of an LPM (as we did in the previous step), I simply
modify the Stata code a tiny bit.
probit favwin spread
This gives me a model that I can use to test the null hypothesis that the intercept is
zero. I can do this simply by consulting the Stata regression output. The t-stat is -0.10,
which leads us not to reject the null hypothesis that the intercept is equal to zero. Note
that this is exactly the analog of testing that β0 = 0.5 in the linear model (this is what
the hint was about).
(v) Use the probit model to estimate the probability that the favored team wins when
spread = 10. Compare this with the LPM estimate from part (iii).
Just as above, I use the following commands to evaluate:
probit favwin spread
predict phat
sum phat if spread == 10
When I issue these commands I can see that the predicted
probability of the favorite winning when the spread is equal to 10 is 0.8196, which is a
bit higher than the predictions I obtained using the LPM above.
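An alternative to creating phat and summarizing is to ask for the prediction at spread = 10
directly with margins; a sketch that should reproduce the same number:
probit favwin spread
* predicted probability of the favorite winning at a 10-point spread
margins, at(spread = 10)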
(vi) Add the variables favhome, fav25, and und25 to the probit model and test joint
significance of these variables using the likelihood ratio test. (How many df are in the
chi-square distribution?) Interpret this result, focusing on the question of whether the
spread incorporates all observable information prior to a game.
This is stuff from last semester. I wouldn't emphasize it as much since we haven't
spent any time on it this semester, but it's probably good to review anyway. To
run the model and test using the LR test, I use the following commands in Stata:
probit favwin spread favhome fav25 und25
estimates store Unrestricted
probit favwin spread
estimates store Restricted
lrtest Unrestricted Restricted
The result of this test (df = number of restrictions = 3) suggests that we cannot reject
the hypothesis that the coefficients associated with the 3 new variables are jointly 0. This
suggests that the spread really does incorporate whatever information might otherwise be
contained in these 3 variables.
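If you prefer to see the likelihood-ratio statistic built by hand, a sketch using the stored
log-likelihoods gives the same answer as lrtest:
probit favwin spread favhome fav25 und25
scalar ll_ur = e(ll)
probit favwin spread
scalar ll_r = e(ll)
* LR statistic and its p-value from a chi-square distribution with 3 df
display "LR = " 2*(ll_ur - ll_r) "   p-value = " chi2tail(3, 2*(ll_ur - ll_r))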

C17.2
Use the data in LOANAPP for this exercise; see also Computer Exercise C7.8.
(i) Estimate a probit model of approve on white. Find the estimated probability of
loan approval for both whites and nonwhites. How do these compare with the linear
probability estimates?
Running the probit model is straightforward:
probit approve white
predict phat
sum phat if white==0
sum phat if white==1
By doing this, we find that the estimated probability of loan
approval of whites is 0.9084, while for nonwhites it is 0.7078. This is a very similar
picture to what we see if we use an LPM instead (use regress instead of probit in the
previous set of commands).
(ii) Now, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign,
chist, pubrec, mortlat1, mortlat2, and vr to the probit model. Is there statistically
significant evidence of discrimination against nonwhites?
When I run the suggested model I learn that there is still substantial evidence that being
white influences the probability of loan approval. So yes, there is statistically significant
evidence of discrimination against nonwhites.
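For concreteness, here is the command I have in mind (a sketch; the variable list comes
straight from the question):
probit approve white hrat obrat loanprc unem male married dep sch cosign chist pubrec mortlat1 mortlat2 vr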
(iii) Estimate the model from part (ii) by logit. Compare the coefficient on white to the
probit estimate.
When I do this, I find that the coefficient on white is different (larger, as logit coefficients
typically are relative to probit coefficients), but the statistical significance is nearly identical.
(iv) Use equation (17.17) to estimate the sizes of the discrimination effects for probit
and logit.
There are two ways to proceed here. You could consult equation (17.17) and find that
multiplying the logit coefficient by 0.625 makes the two coefficient estimates appear
similar. A more informative thing to do is probably to compare the predictions from the
two models (for instance, we could look at the average predicted probabilities across the
sample using both models). This would be more interesting. When I do this, I find that
the average predicted probability of an approved loan is identical using both models (to
the first 3 digits after the decimal).
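Here is a sketch of that comparison in Stata (same variable list as in part (ii); the predicted
probabilities are saved from each model and then averaged):
probit approve white hrat obrat loanprc unem male married dep sch cosign chist pubrec mortlat1 mortlat2 vr
predict p_probit, pr
logit approve white hrat obrat loanprc unem male married dep sch cosign chist pubrec mortlat1 mortlat2 vr
predict p_logit, pr
* compare the average predicted probabilities from the two models
sum p_probit p_logit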

C17.3
Use the data in FRINGE for this exercise.
(i) For what percentage of the workers in the sample is pension equal to zero? What is
the range of pension for workers with nonzero pension benefits? Why is a Tobit model
appropriate for modeling pension?

pension is equal to zero for 27.92 percent of the sample. (I determined that by using the
command tabulate pension). For workers with nonzero pension benefits, the minimum
pension value is 7.28 and the maximum is 2880.27. A Tobit model is appropriate because
the pension variable cannot be less than zero, takes on the value 0 for a significant portion
of the population, and takes on a continuum of values greater than 0. The variable is
continuous over strictly positive values, but it piles up (has a mass point) at 0.
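A slightly more direct way to get these two numbers (a sketch; tabulate pension works,
but it prints a very long table for a continuous variable):
count if pension == 0
* share of the sample with zero pension
display r(N)/_N
* range of pension among workers with nonzero benefits
sum pension if pension > 0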
(ii) Estimate a Tobit model explaining pension in terms of exper, age, tenure, educ,
depends, married, white, and male. Do whites and males have statistically significant
higher expected pension benefits?
I use the following command to estimate this model:
tobit pension exper age tenure educ depends married white male, ll(0)
When I estimate the Tobit model I see that whites and males are both predicted to have
higher expected pension benefits (positive coefficients) but only being male has an effect
that is statistically significant at conventional levels.
(iii) Use the results from part (ii) to estimate the difference in expected pension benefits
for a white male and a nonwhite female, both of whom are 35 years old, are single with
no dependents, have 16 years of education, and have 10 years of experience.
There are several ways to do this. You could do it manually using the estimated coefficients,
or you could do it a number of ways using Stata. Here's how I did it using Stata
(the lazy way):
tobit pension exper age tenure educ depends married white male, ll(0)
set obs 618
replace white = 1 in 617
replace male = 1 in 617
replace age = 35 in 617
replace married = 0 in 617
replace depends = 0 in 617
replace educ = 16 in 617
replace exper = 10 in 617
replace tenure = 10 in 617
predict yhat
sum yhat if white==1 & male==1 & age==35 & married==0 & depends==0 ///
& educ==16 & exper==10
Doing this lets me see that a white male with the appropriate description is predicted to
have a pension of 941.1191. Using an identical procedure for a nonwhite female (placing
observations of this individual in observation 618 instead of 617) tells me that a nonwhite
female with the same description is predicted to have a pension of 488.8831, which is significantly less. This is enough to understand that there is a substantial difference in the
predictions. To obtain the predictions from the Tobit model, we would take these linear
predictions and plug them into equation (17.25; page 590 in the 4th ed.). Knowing the
first part is sufficient for our purposes.
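Since the only regressors that differ between the two hypothetical people are white and
male, the gap in the linear predictions is just the sum of those two coefficients, which you
can get directly with lincom after the tobit (a sketch):
tobit pension exper age tenure educ depends married white male, ll(0)
* difference in the linear prediction between a white male and a nonwhite female
* who are otherwise identical
lincom white + male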
(iv) Add union to the Tobit model and comment on its significance.
union appears quite significant (t-stat = 7.03).

(v) Apply the Tobit model from part (iv) but with peratio, the pension-earnings ratio, as
the dependent variable. (Notice that this is a fraction between zero and one, but, though
it often takes on the value zero, it never gets close to being unity. Thus, a Tobit model
is fine as an approximation.) Does gender or race have an effect on the pension-earnings
ratio?
In this case, both gender and race appear not to have an impact. Whites and males do
not seem to have higher pensions as a fraction of earnings. This suggests that differences
in pension levels discovered in previous regressions were due to differences in earnings.
