HW3
Problem 3.1
In a study relating college grade point average to time spent in various activities, you
distribute a survey to several students. The students are asked how many hours they spend
each week in four activities: studying (study), sleeping (sleep), working (work), and
leisure (leisure). Any activity is put into one of the four categories, so that for each student, the
sum of hours in the four activities must be 168.
(i) In the model
$GPA = \beta_0 + \beta_1\,study + \beta_2\,sleep + \beta_3\,work + \beta_4\,leisure + u$,
does it make sense to hold sleep, work, and leisure fixed, while changing study?
No, it does not make sense to hold sleep, work, and leisure fixed while changing
study. The four variables are perfectly linearly related because they must sum to 168
hours a week: study = 168 - sleep - work - leisure. If sleep, work, and leisure are held
fixed, study cannot change.
(iii) How could you reformulate the model so that its parameters have a useful
interpretation and it satisfies Assumption MLR.3?
We could drop one of the four variables. For example, we could drop leisure from the
model and rewrite it as
$GPA = \beta_0 + \beta_1\,study + \beta_2\,sleep + \beta_3\,work + u$.
If study increases by one hour while sleep and work are held fixed, leisure must fall by
one hour to keep the total at 168 hours a week. In this way, $\beta_1$ tells us what effect
substituting one more hour of study for one less hour of leisure has on GPA. This model
satisfies MLR.3 because none of the remaining regressors is an exact linear function of
the others.
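The interpretation of the reformulated model can be made explicit by substituting the time constraint leisure = 168 - study - sleep - work into the original four-variable equation. The following is a sketch of that substitution; the symbols $\alpha_j$ for the coefficients of the reformulated model are notation introduced here for illustration and are not part of the problem.

$$
\begin{aligned}
GPA &= \beta_0 + \beta_1\,study + \beta_2\,sleep + \beta_3\,work + \beta_4\,(168 - study - sleep - work) + u \\
    &= (\beta_0 + 168\beta_4) + (\beta_1 - \beta_4)\,study + (\beta_2 - \beta_4)\,sleep + (\beta_3 - \beta_4)\,work + u \\
    &= \alpha_0 + \alpha_1\,study + \alpha_2\,sleep + \alpha_3\,work + u
\end{aligned}
$$

The coefficient on study in the three-variable model is therefore $\alpha_1 = \beta_1 - \beta_4$: the effect of one more hour of study net of the hour of leisure it replaces.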
Problem 3.2
Suppose that average worker productivity at manufacturing firms (avgprod) depends on
two factors, average hours of training (avgtrain) and average worker ability (avgabil):
$avgprod = \beta_0 + \beta_1\,avgtrain + \beta_2\,avgabil + u$.
Assume that this equation satisfies the Gauss-Markov assumptions. If grants have been
given to firms whose workers have less than average ability, so that avgtrain and avgabil
are negatively correlated, what is the likely bias in $\tilde\beta_1$ obtained from the simple regression
of avgprod on avgtrain?
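If we omit avgabil, we estimate the misspecified model
$avgprod = \beta_0 + \beta_1\,avgtrain + v$, where $v = \beta_2\,avgabil + u$.
If avgabil affects avgprod and if avgabil and avgtrain are
correlated, then omission of avgabil will lead to omitted variable bias.
Since the correlation between the omitted and included variables is negative, and since
the effect of the omitted variable on avgprod is positive ($\beta_2 > 0$), we would expect the
bias in $\tilde\beta_1$ to be negative: the simple regression tends to understate the effect of training.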
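The sign argument can be checked with a small simulation. This is a minimal sketch on synthetic data, not the firm data from the problem: the parameter values (beta1 = 0.5, beta2 = 1.0, the -0.5 link between ability and training), the sample size, and the helper name simple_slope are all assumptions chosen only for illustration.

```python
import random

def simple_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
n = 5000
beta1, beta2 = 0.5, 1.0  # hypothetical true effects of training and ability

avgabil = [random.gauss(0, 1) for _ in range(n)]
# Training is targeted at low-ability firms, so it is negatively related to ability.
avgtrain = [-0.5 * a + random.gauss(0, 1) for a in avgabil]
avgprod = [1.0 + beta1 * t + beta2 * a + random.gauss(0, 1)
           for t, a in zip(avgtrain, avgabil)]

delta1 = simple_slope(avgtrain, avgabil)       # slope of avgabil on avgtrain
tilde_beta1 = simple_slope(avgtrain, avgprod)  # slope from the short regression

print(f"slope of avgabil on avgtrain: {delta1:.3f}")
print(f"simple-regression slope:      {tilde_beta1:.3f} (true beta1 = {beta1})")
print(f"beta1 + beta2 * delta1:       {beta1 + beta2 * delta1:.3f}")
```

With a positive beta2 and a negative slope of avgabil on avgtrain, the simple-regression slope falls below the true beta1, which is the downward bias described above.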
Problem 3.3
The following equation describes the relationship between fourth-grade pass rates on a
math test, measured as a percent, spending per student (exppp, in dollars), and the percentage
of students eligible for free and reduced-price lunches (lunch):
$math4 = \beta_0 + \beta_1 \log(exppp) + \beta_2\,lunch + u$.
(i) How much is the percentage point change in math4 when exppp increases by 10
percent?
(ii) If expenditure per student is higher at poor schools, are log(exppp) and lunch
positively or negatively correlated?
(iii) The following equations were estimated:
From these simple and multiple regression results, determine whether, in this sample,
log(exppp) and lunch are positively or negatively correlated.
Problem 3.4
The median starting salary for new law school graduates is determined by
$\log(salary) = \beta_0 + \beta_1\,LSAT + \beta_2\,GPA + \beta_3 \log(libvol) + \beta_4 \log(cost) + \beta_5\,rank + u$,
where LSAT is the median LSAT score for the graduating class, GPA is the median college
GPA for the class, libvol is the number of volumes in the law school library, cost is the
annual cost of attending law school, and rank is a law school ranking (with rank = 1
being the best).
(i) What signs do you expect for all slope parameters? Justify your answers.
We expect a negative coefficient on rank: a larger value of rank means a lower-ranked
school, which should lower starting salaries. All of the other slope parameters should
be positive. For LSAT and GPA, a higher score means higher ability and a higher chance of
getting a good job. The number of volumes in the law school library and the annual
cost are both measures of school quality. However, there is no guarantee that the books
are sufficient or efficiently used, and the annual cost could be for purposes other than
education.
$\widehat{\log(salary)} = 8.34 + 0.0047\,LSAT + 0.248\,GPA + 0.095 \log(libvol) + 0.038 \log(cost) - 0.0033\,rank$;
$n = 136$, $R^2 = 0.842$.
What is the predicted ceteris paribus difference in salary for schools with a median
GPA different by one point? (Report your answer as a percentage.)
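Holding the other factors fixed, a one-point difference in median GPA is associated with a
predicted salary difference of about 100(0.248) = 24.8%.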
(iv) Would you say it is better to attend a higher ranked law school? How much is a
difference in ranking of 20 worth in terms of predicted starting salary?
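Yes. The coefficient on rank is negative and a lower rank number is better, so a school
ranked 20 places better has a predicted starting salary about 20(0.0033)(100) = 6.6% higher,
holding other factors fixed.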
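The percentages above use the usual log-level approximation, 100 times the coefficient times the change in the regressor. As a hedged illustration, the sketch below compares that approximation with the exact percentage change implied by the log model; the function names approx_pct and exact_pct are made up for this example.

```python
import math

def approx_pct(coef, delta=1.0):
    """Approximate % change in y from the log-level rule: 100 * coef * delta."""
    return 100.0 * coef * delta

def exact_pct(coef, delta=1.0):
    """Exact % change in predicted y: 100 * (exp(coef * delta) - 1)."""
    return 100.0 * (math.exp(coef * delta) - 1.0)

# Estimates from the fitted law school equation; the rank coefficient is negative.
b_gpa, b_rank = 0.248, -0.0033

print(f"GPA +1 point: approx {approx_pct(b_gpa):.1f}%, exact {exact_pct(b_gpa):.1f}%")
# Moving 20 places up the ranking means rank falls by 20.
print(f"rank -20:     approx {approx_pct(b_rank, -20):.1f}%, "
      f"exact {exact_pct(b_rank, -20):.1f}%")
```

The approximation is close for the small rank effect but noticeably understates the GPA effect, where the exact figure is roughly 28% rather than 24.8%.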
Computer Exercises (All data are contained in the file Data_HW3.xls)
Because I am not able to use INFILE in iCloud, I use CARDS instead of INFILE, and I
show only part of the CARDS data in the SAS code because the full data set is too long.
Problem 3.5
Use the data in the WAGE1 sheet to confirm the partialling out interpretation of the OLS
estimates by considering the following model:
$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,exper + \beta_3\,tenure + u$
This first requires regressing educ on exper and tenure and saving the residuals, $\hat r_1$.
/* Only the first three observations are shown. */
DATA adMaru;
INPUT educ exper tenure @@;
CARDS;
11 2 0
12 22 2
11 2 0
;
TITLE 'Multiple Linear Regression';
PROC REG DATA=adMaru;
MODEL educ = exper tenure;
RUN;
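Then, regress log(wage) on $\hat r_1$. Compare the coefficient on $\hat r_1$ with the coefficient on educ
in the regression of log(wage) on educ, exper, and tenure.
Answer:
To get the residuals, I subtract the fitted values of educ from the actual values:
$\hat r_1 = educ - \widehat{educ}$
$\widehat{\log(wage)} = 1.62976 + 0.09112\,\hat r_1$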
/* Only the first three observations are shown. */
DATA adMaru;
INPUT logwage r1 @@;
CARDS;
1.131402 -2.43
1.175573 -0.13
1.098612 -2.43
;
TITLE 'Simple Linear Regression';
PROC REG DATA=adMaru;
MODEL logwage = r1;
RUN;
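The partialling out result can also be illustrated outside SAS. The sketch below is a minimal, stdlib-only demonstration on synthetic data (not the WAGE1 sheet): the data-generating numbers and the helper name ols are assumptions made for this example. It regresses educ on exper and tenure, keeps the residuals, and checks that the slope on those residuals matches the coefficient on educ in the full regression.

```python
import random

def ols(y, xs):
    """OLS with an intercept. xs is a list of regressor columns; returns [b0, b1, ...]."""
    n = len(y)
    cols = [[1.0] * n] + xs
    k = len(cols)
    # Augmented normal equations [X'X | X'y].
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols]
         + [sum(u * v for u, v in zip(ci, y))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [vr - f * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Synthetic data with made-up coefficients (illustration only).
random.seed(0)
n = 200
exper = [random.uniform(0, 30) for _ in range(n)]
tenure = [random.uniform(0, 10) for _ in range(n)]
educ = [14 - 0.1 * e + 0.05 * t + random.gauss(0, 2) for e, t in zip(exper, tenure)]
logwage = [0.3 + 0.09 * s + 0.004 * e + 0.02 * t + random.gauss(0, 0.4)
           for s, e, t in zip(educ, exper, tenure)]

# Step 1: regress educ on exper and tenure and keep the residuals r1.
g0, g1, g2 = ols(educ, [exper, tenure])
r1 = [s - (g0 + g1 * e + g2 * t) for s, e, t in zip(educ, exper, tenure)]

# Step 2: simple regression of logwage on r1.
slope_partial = ols(logwage, [r1])[1]

# Benchmark: coefficient on educ in the full multiple regression.
slope_full = ols(logwage, [educ, exper, tenure])[1]

print(f"coefficient on r1:   {slope_partial:.6f}")
print(f"coefficient on educ: {slope_full:.6f}")
```

The two slopes agree up to floating-point error. The equality is exact only when the residuals come from unrounded first-stage estimates; building r1 from rounded fitted values introduces small differences.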
Problem 3.6
Use the data in the WAGE2 sheet for this problem. As usual, be sure all of the following
regressions contain an intercept.
(i) Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde\delta_1$.
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  53.68715            2.62293         20.47    <.0001
educ        1  3.53383             0.19221         18.39    <.0001
$\widehat{IQ} = 53.68715 + 3.53383\,educ$, so $\tilde\delta_1 = 3.53383$.
/* Only the first three observations are shown. */
DATA adMaru;
INPUT IQ educ @@;
CARDS;
93 12
119 18
108 14
;
TITLE 'Simple Linear Regression';
PROC REG DATA=adMaru;
MODEL IQ = educ;
RUN;
(ii) Run the simple regression of log(wage) on educ, and obtain the slope coefficient, $\tilde\beta_1$.
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  5.97306             0.08137         73.40    <.0001
educ        1  0.05984             0.00596         10.03    <.0001
$\widehat{\log(wage)} = 5.97306 + 0.05984\,educ$, so $\tilde\beta_1 = 0.05984$.
/* Only the first three observations are shown. */
DATA adMaru;
INPUT logwage educ @@;
CARDS;
6.645091 12
6.694562 18
6.715384 14
;
TITLE 'Simple Linear Regression';
PROC REG DATA=adMaru;
MODEL logwage = educ;
RUN;
(iii) Run the multiple regression of log(wage) on educ and IQ, and obtain the slope
coefficients, $\hat\beta_1$ and $\hat\beta_2$, respectively.
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  5.65829             0.09624         58.79    <.0001
educ        1  0.03912             0.00684         5.72     <.0001
IQ          1  0.00586             0.00099791      5.88     <.0001
$\widehat{\log(wage)} = 5.65829 + 0.03912\,educ + 0.00586\,IQ$, so $\hat\beta_1 = 0.03912$ and $\hat\beta_2 = 0.00586$.
/* Only the first three observations are shown. */
DATA adMaru;
INPUT logwage educ IQ @@;
CARDS;
6.645091 12 93
6.694562 18 119
6.715384 14 108
;
TITLE 'Multiple Linear Regression';
PROC REG DATA=adMaru;
MODEL logwage = educ IQ;
RUN;
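The three regressions in Problem 3.6 are tied together by the standard algebraic relationship between simple and multiple regression slopes: the simple-regression slope on educ equals the multiple-regression slope on educ plus the IQ coefficient times the slope from regressing IQ on educ. The short check below plugs in the rounded estimates reported in parts (i) to (iii); the variable names are chosen here for illustration.

```python
delta1_tilde = 3.53383  # slope from IQ on educ, part (i)
beta1_tilde = 0.05984   # slope from log(wage) on educ, part (ii)
beta1_hat = 0.03912     # coefficient on educ, part (iii)
beta2_hat = 0.00586     # coefficient on IQ, part (iii)

# Identity: beta1_tilde = beta1_hat + beta2_hat * delta1_tilde
implied = beta1_hat + beta2_hat * delta1_tilde

print(f"beta1_hat + beta2_hat * delta1_tilde = {implied:.5f}")
print(f"beta1_tilde                          = {beta1_tilde:.5f}")
print(f"difference (from rounding)           = {abs(implied - beta1_tilde):.5f}")
```

The two numbers agree to about four decimal places; the small gap comes from the rounding of the reported estimates.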
Problem 3.7
(i) For the constant elasticity model of annual salary in terms of firm sales and
market value, the estimated equation can be expressed as:
$\widehat{\log(salary)} = 4.62092 + 0.16213 \log(sales) + 0.10671 \log(mktval)$
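/* Only the first three observations are shown. */
DATA datamaru37;
INPUT logsalary logsales logmktval @@;
CARDS;
7.057037 8.732305 10.05191
6.39693 5.645447 7.003066
5.937536 5.129899 7.003066
;
PROC REG DATA=datamaru37;
MODEL logsalary = logsales logmktval;
RUN;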
(ii) Add profits to the model from part (i). Why can this variable not be included in
logarithmic form?
We are asked to add profits as an explanatory variable in the regression
model. However, profits can be negative (the firm loses money), and the logarithm is not
defined for zero or negative values, so profits must enter the model in level form.
When I run this model in SAS I get:
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  4.68692             0.37973         12.34    <.0001
logsales    1  0.16137             0.03991         4.04     <.0001
logmktval   1  0.09753             0.06369         1.53     0.1275
profits     1  0.00003566          0.00015196      0.23     0.8147
Based on this regression model, which is constant elasticity in sales and market value and
includes profits in level form, the equation can be expressed as:
$\widehat{\log(salary)} = 4.68692 + 0.16137 \log(sales) + 0.09753 \log(mktval) + 0.00003566\,profits$
$R^2 = 0.2993$
(ii) Would you say that these firm performance variables explain most of the
variation in CEO salaries?
The R-squared of this model is 0.2993, so sales, market value, and profits explain only
about 30% of the sample variation in log(salary); about 70% is left unexplained. Therefore,
I would not say that these firm performance variables explain most of the variation in
CEO salaries.
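/* Only the first three observations are shown. */
DATA datamaru37;
INPUT logsalary logsales logmktval profits @@;
CARDS;
7.057037 8.732305 10.05191 966
6.39693 5.645447 7.003066 48
5.937536 5.129899 7.003066 40
;
PROC REG DATA=datamaru37;
MODEL logsalary = logsales logmktval profits;
RUN;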
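The R-squared used in part (ii) can be recovered from the Analysis of Variance tables in the SAS output as the model sum of squares divided by the corrected total sum of squares. The sketch below applies that ratio to the three Problem 3.7 models; the dictionary labels are shorthand introduced here.

```python
# Corrected total sum of squares for log(salary); it is the same in all three models.
sst = 64.64622

# Model sums of squares taken from the Problem 3.7 Analysis of Variance tables.
model_ss = {
    "(i) logsales, logmktval": 19.33656,
    "(ii) + profits": 19.35098,
    "(iii) + ceoten": 20.57681,
}

r2 = {name: ss / sst for name, ss in model_ss.items()}
for name, value in r2.items():
    print(f"{name:<26} R-squared = {value:.4f}")
```

Adding profits barely changes the R-squared, and even with CEO tenure included the model explains well under half of the variation in log(salary).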
(iii) Add the variable ceoten to the model in part (ii). What is the estimated percentage
return for another year of CEO tenure, holding other factors fixed?
When I run this model in SAS I get:
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  4.55778             0.38025         11.99    <.0001
logsales    1  0.16223             0.03948         4.11     <.0001
logmktval   1  0.10176             0.06303         1.61     0.1083
profits     1  0.00002905          0.00015035      0.19     0.8470
ceoten      1  0.01168             0.00534         2.19     0.0301
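Based on this regression model, which adds CEO tenure in level form to the model in
part (ii), the equation can be expressed as:
$\widehat{\log(salary)} = 4.55778 + 0.16223 \log(sales) + 0.10176 \log(mktval) + 0.00002905\,profits + 0.01168\,ceoten$
Holding the other factors fixed, another year of CEO tenure is associated with a predicted
salary that is about 100(0.01168), or roughly 1.2%, higher.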
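/* Only the first three observations are shown. */
DATA datamaru37;
INPUT logsalary logsales logmktval profits ceoten @@;
CARDS;
7.057037 8.732305 10.05191 966 2
6.39693 5.645447 7.003066 48 10
5.937536 5.129899 7.003066 40 3
;
TITLE 'Multiple Linear Regression';
PROC REG DATA=datamaru37;
MODEL logsalary = logsales logmktval profits ceoten;
RUN;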
Problem 3.5 (Regression of educ on exper and tenure)
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             2  407.94631       203.97316    29.49    <.0001
Error           523  3617.48335      6.91679
Corrected Total 525  4025.42966
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  13.57496            0.18432         73.65    <.0001
exper       1  -0.07379            0.00976         -7.56    <.0001
tenure      1  0.04768             0.01834         2.60     0.0096
Problem 3.5 (Simple Regression)
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             1  30.04971        30.04971     133.13   <.0001
Error           524  118.28005       0.22573
Corrected Total 525  148.32976
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  1.62976             0.02072         78.64    <.0001
r1          1  0.09112             0.00790         11.54    <.0001
Problem 3.6 (i)
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             1  56281           56281        338.02   <.0001
Error           933  155347          166.50218
Corrected Total 934  211627
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  53.68715            2.62293         20.47    <.0001
educ        1  3.53383             0.19221         18.39    <.0001
Problem 3.6 (ii)
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             1  16.13771        16.13771     100.70   <.0001
Error           933  149.51859       0.16026
Corrected Total 934  165.65629
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  5.97306             0.08137         73.40    <.0001
educ        1  0.05984             0.00596         10.03    <.0001
Problem 3.6 (iii)
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             2  21.47795        10.73897     69.42    <.0001
Error           932  144.17834       0.15470
Corrected Total 934  165.65629
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  5.65829             0.09624         58.79    <.0001
educ        1  0.03912             0.00684         5.72     <.0001
IQ          1  0.00586             0.00099791      5.88     <.0001
Problem 3.7 (i)
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             2  19.33656        9.66828      37.13    <.0001
Error           174  45.30966        0.26040
Corrected Total 176  64.64622
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  4.62092             0.25441         18.16    <.0001
logsales    1  0.16213             0.03967         4.09     <.0001
logmktval   1  0.10671             0.05012         2.13     0.0347
Problem 3.7 (ii)
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             3  19.35098        6.45033      24.64    <.0001
Error           173  45.29524        0.26182
Corrected Total 176  64.64622
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  4.68692             0.37973         12.34    <.0001
logsales    1  0.16137             0.03991         4.04     <.0001
logmktval   1  0.09753             0.06369         1.53     0.1275
profits     1  0.00003566          0.00015196      0.23     0.8147
Problem 3.7 (iii)
Multiple Linear Regression
Analysis of Variance
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             4  20.57681        5.14420      20.08    <.0001
Error           172  44.06941        0.25622
Corrected Total 176  64.64622
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  4.55778             0.38025         11.99    <.0001
logsales    1  0.16223             0.03948         4.11     <.0001
logmktval   1  0.10176             0.06303         1.61     0.1083
profits     1  0.00002905          0.00015035      0.19     0.8470
ceoten      1  0.01168             0.00534         2.19     0.0301