5103A1
Maesha Armeen
A0210255W
29.09.2019
Answer 1
(a) Under what assumptions will the OLS estimate of α_1 provide an unbiased estimate of β_1?
Are these assumptions realistic?
The OLS estimate of α_1 is an unbiased and consistent estimate of β_1 under MLR1-4, which are as
follows:
MLR1: Linear in parameters. The model must be linear in its parameters: the price per square foot
must be a linear function of the coefficients on unit size and the other regressors, plus an error
term. The researcher may transform the variables themselves (logs, exponentials, etc.) when
specifying the regression, as long as the model stays linear in the parameters.
MLR2: Random sampling. The housing data must be a random sample from the population. Otherwise the
estimates describe only a specific portion of the population, and conclusions drawn about the whole
population would be flawed.
MLR3: No perfect collinearity. No independent variable may be constant in the sample, and no
independent variable may be an exact linear function of the others.
MLR4: Zero conditional mean. The explanatory variables must contain no information about the
mean of the error term, i.e. they must be exogenous.
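There are two further assumptions: MLR5 (homoskedasticity: the variance of the error term must be
constant across values of the explanatory variables) and MLR6 (normality: e_i is independent of
ln s_i and normally distributed). Under MLR1-5 OLS is BLUE (best linear unbiased estimator), and
adding MLR6 allows exact inference. However, the coefficient is unbiased if just MLR1-4 are satisfied.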
These assumptions are not always realistic, but they must be made in order to infer a causal
effect. In practice they are unlikely to hold exactly: the relationship between unit size and price
may not take the assumed functional form, researchers often have no details about how the sample was
drawn, and the explanatory variables are likely to be correlated with unobserved factors in the error
term.
(b) If these assumptions are violated, will the OLS estimate of α_1 over- or under-estimate the
impact of size on price? Explain your answer.
Violations of these assumptions can make the results worthless, although in some cases the effect is
trivial. In this case, dropping a relevant variable from the regression violates the zero conditional
mean assumption and causes omitted variable bias. The number of bedrooms is positively related to
unit size, so if more bedrooms also raise the price, α_1 is likely to overestimate the impact of
size on price.
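The direction of the bias can be made explicit with the standard omitted variable bias formula. The sketch below is illustrative only: y_i stands for the price measure, x_i for the size measure, and b_i for the number of bedrooms, and the symbols β_2, δ_0, δ_1, and v_i are assumptions introduced here rather than notation from the question.

```latex
\begin{aligned}
\text{long regression:}\quad  & y_i = \beta_0 + \beta_1 x_i + \beta_2 b_i + u_i \\
\text{short regression:}\quad & y_i = \alpha_0 + \alpha_1 x_i + e_i \\
\text{auxiliary regression:}\quad & b_i = \delta_0 + \delta_1 x_i + v_i \\
\Rightarrow\quad & E[\hat{\alpha}_1] = \beta_1 + \beta_2 \delta_1
\end{aligned}
```

The bias term β_2 δ_1 is positive when bedrooms raise the price (β_2 > 0) and bedrooms increase with size (δ_1 > 0), which is the overestimation case. If the two had opposite signs, α_1 would instead be underestimated.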
Answer 2
(a) Rename variables R0000300 R0000500 R0618300 to birth_month, birth_day, and afqt.
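A minimal sketch of this step, assuming the data extract is already loaded in memory and that the three source variables carry exactly the names given in the question:

```stata
* Give the three raw variables descriptive names
rename R0000300 birth_month
rename R0000500 birth_day
rename R0618300 afqt
* Confirm the new names exist
describe birth_month birth_day afqt
```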
(b) Use recode 1) to convert invalid values into missing and 2) to recode sex, which is currently
defined as = 1 for male, = 2 for female, into = 0 for male and = 1 for female, and rename
i) After checking all the variables with tab for negative values, we found that birth_month and
birth_day have no negative values, so recoding them is not required.
. recode ind04 (-5/-3=.)
. recode wage04 (-5/-4=.)
. recode edu (-5=.)
. recode age04 (-5=.)
ii)
. recode sex (1=0) (2=1)
Table 1:
(d) Create a new variable birthq that contains information on a person's birth quarter. For
example, if a person was born in May, then his birthq will have the value of 2.
. gen birthq=birth_month
. recode birthq(1/3=1)
. recode birthq(4/6=2)
. recode birthq(7/9=3)
. recode birthq(10/12=4)
. tab birthq
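The same mapping can be written in one line, since the quarter is the month divided by three and rounded up. This is a sketch for cross-checking only; birthq_alt is a hypothetical variable name, and the code assumes birth_month and birthq exist as created above.

```stata
* Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
gen birthq_alt = ceil(birth_month/3)
* Stop with an error if the two versions ever disagree
assert birthq_alt == birthq if !missing(birth_month)
tab birthq_alt
```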
quarters.
From the above tables, if we compare the mean values for the fourth quarter and the first three
quarters, we can see that the mean for the fourth quarter (13.55) is higher than the pooled mean for
the first three quarters (13.23). As a result, the people born in the last quarter are on average
slightly more educated.
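One way to produce and test this comparison is sketched below. It assumes edu and birthq are defined as above; q4 is a hypothetical indicator introduced here for illustration.

```stata
* q4 = 1 if born in the fourth quarter, 0 if born in quarters 1-3
gen q4 = (birthq == 4) if !missing(birthq)
* Mean and number of observations of edu in each group
tabstat edu, by(q4) statistics(mean n)
* Two-sample t test of equal mean education across the two groups
ttest edu, by(q4)
```

The t test indicates whether a gap of this size could plausibly arise from sampling variation alone.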
(f) Plot the histogram of wage04 and log(wage04). (You can use the command histogram and
you need to create a new variable lw04 = log(wage04)).
. gen lw04=log(wage04)
. hist wage04
. hist lw04
(g) Explain why researchers prefer to use log wage rather than wage in the regression.
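Researchers prefer log wage to wage because they care about percentage changes in wages rather than
absolute changes: with log wage as the dependent variable, each coefficient is approximately the
proportional change in wage for a one-unit change in the regressor. Moreover, wages are strongly
right-skewed, and taking logs makes the distribution closer to normal and reduces the influence of
very high wages.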
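The percentage interpretation can be written out as follows. This is a simplified sketch with education as the only regressor, and the symbols β_0, β_1, and u_i are introduced here for illustration.

```latex
\begin{aligned}
\ln(\mathit{wage}_i) &= \beta_0 + \beta_1\,\mathit{edu}_i + u_i \\
\%\Delta \mathit{wage} &\approx 100\,\beta_1\,\Delta \mathit{edu} \\
\text{exact change for } \Delta \mathit{edu} = 1:\quad & 100\,\bigl(e^{\beta_1} - 1\bigr)
\end{aligned}
```

For example, with the edu coefficient of 0.116 reported in Table 2, the approximation gives 11.6% per additional year of education, while the exact formula gives about 12.3%.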
. gen exp=age04-edu-5
(i) Regress log wage on education, female, a quadratic function of potential years of experience.
Report the regression results in column 1 of Table 2.
. gen exp2=(exp)^2
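The regression for column 1 might look like the sketch below. It assumes the recoded sex variable has been renamed female, as the row label in Table 2 suggests, and that lw04, edu, exp, and exp2 exist as created above.

```stata
* Column 1: log wage on education, female, and a quadratic in experience
regress lw04 edu female exp exp2
* Experience level at which the fitted quadratic reaches its turning point
display -_b[exp] / (2*_b[exp2])
```

Including both exp and exp2 lets the return to experience change with the level of experience instead of forcing it to be constant.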
(j) Regress log wage on education, female, a quadratic function of potential years of experience, and
AFQT scores. Report the regression results in column 2 of Table 2.
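A sketch of how the two specifications could be run and placed side by side follows. The stored names col1 and col2 are hypothetical, and the variable names are assumed to match those in Table 2.

```stata
* Column 1: without AFQT
quietly regress lw04 edu female exp exp2
estimates store col1
* Column 2: adding AFQT
quietly regress lw04 edu female exp exp2 afqt
estimates store col2
* Coefficients, standard errors, observations, and R-squared for both models
estimates table col1 col2, b(%9.4f) se stats(N r2)
```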
Table 2
lw04            (1)    (2)           (3)
edu                    0.1160641
female                 -0.8276587
exp2                   0.0004232
afqt                   -0.0030298
_cons                  12.45203
Observations           2,541
R-squared              0.0332
(k) Comment on the difference in the coefficients on education between these two columns.
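The coefficient on edu increases from 0.0849 to 0.116 after we add the variable afqt to the
regression. This means that the first regression underestimated the coefficient because of omitted
variable bias: the estimated coefficient on afqt is negative (-0.0030) and afqt is positively related
to edu, so leaving afqt out pushes the edu coefficient down. Adding afqt therefore reduces the bias,
although other omitted variables may remain.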
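The size of the change can be checked against the omitted variable bias identity. The sketch below assumes both regressions use the same estimation sample, in which case the identity holds exactly; the symbol δ (the coefficient on edu when afqt is regressed on edu and the other controls) is introduced here for illustration.

```latex
\begin{aligned}
\hat{\beta}_{edu}^{(1)} &= \hat{\beta}_{edu}^{(2)} + \hat{\beta}_{afqt}^{(2)}\,\hat{\delta} \\
0.0849 &= 0.1161 + (-0.0030)\,\hat{\delta} \\
\hat{\delta} &= \frac{0.0849 - 0.1161}{-0.0030} \approx 10.3
\end{aligned}
```

Under this assumption, the reported numbers imply that an additional year of education is associated with an AFQT score roughly 10 points higher, holding the other regressors fixed, which is the positive relationship the downward bias requires.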