Econometrics - Week 5 Tutorials 2024

Uploaded by

Joy Chauruka
Copyright
© All Rights Reserved

Week 5 Tutorials: Multiple Linear Regression

BEE1023 Introduction to Econometrics

Dr Eva Poen and Dr Amy Binner

Attempt to solve all exercises in this problem set before you attend your tutorial group.
Active participation in the tutorials is essential to your learning success in this module.

Note: Before working on the problems in this tutorial sheet, complete the mandatory reading
for week 4 as indicated on ELE.

Note: The first exercise in this problem set requires you to run regressions and calculate
descriptive statistics using Stata. If you didn’t attend our online Stata seminar in week 4, make
sure you watch the recording. Instructions on how to install Stata on your computer can be
found on ELE in the Tutorials section.

Problem 1: Computer Exercise

Download the file hprice2.dta to a suitable directory on your computer and double-click the
file to open it. The data set contains 506 observations on neighbourhoods in the Boston area
(1 observation = 1 neighbourhood). We aim to relate median house price in the neighbourhood
(price, in USD) to various community characteristics. The variable crime measures the number
of crimes per capita committed in the neighbourhood, and lowstat is the percentage of residents
of "lower socio-economic status" in the community. nox is the amount of nitrogen oxides in the
air (parts per million).

a) Estimate the following model by OLS:

price = β0 + β1 crime + β2 lowstat + β3 nox + u

Report your results in equation form. Also report R² from this regression.

Expectations:
β1 - effect of an increase in crime on price (increase crime, decrease price)
β2 - effect of the population living in poverty on price
β3 - effect of air pollution on price
Stata: sum y x1 x2 xk

b) Interpret the coefficients from your regression. Are the signs what you expected them to
be?

R-squared: the proportion of the variance in the dependent variable (median house price) explained by
the independent variables (crime, lowstat, nox).
Adjusted R-squared: penalizes adding lots of variables to the model, especially when the number of
regressors gets close to the number of observations.

β1 - coefficient on crime: -88.7. A 1-unit increase in crimes per capita is predicted to decrease the median
house price by 88.7 USD, holding fixed nox (pollution) and lowstat (population in poverty). [This was our expected sign.]
β2 - coefficient on lowstat (%): -891.3. A 1 percentage point increase in the share of residents living in poverty is predicted to
decrease the median house price by 891.3 USD, holding fixed crime and nox.
β3 - coefficient on nox (amount of nitrogen oxides): +151.7. This is counter-intuitive, because more nox should mean a lower price.
The estimate is the effect of nox after its correlation with lowstat and crime has been partialled out; the unexpected sign
suggests omitted variable bias.

February 2024

c) Next, run the following regression where we are adding the variable dist to the model.
dist is the weighted distance of the neighbourhood from five employment centers, in miles.
(I.e., the larger dist is, the further residents have to travel to get to work.)

price = β0 + β1 crime + β2 lowstat + β3 nox + β4 dist + u

Report your results.

We expect that an increase in dist will decrease house prices.

d) Compare your results to those obtained earlier in question a). Can you guess if nox and
dist are positively or negatively correlated in our sample? In your opinion, does the model
in a) suffer from omitted variable bias?

e) Obtain the sample correlations between price, nox and dist. Do the results match your
expectations in d)?

We expect dist and nox to be negatively correlated: houses further from industry should have less air pollution. After running the new
regression including dist, the coefficients become more negative (crime, lowstat, dist, nox). This indicates that the previous model was
affected by omitted variable bias.

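The omitted variable bias described in these notes can be illustrated with a small simulation. This is only a sketch on made-up data, not on hprice2.dta: x1 stands in for nox, x2 for dist, and all coefficient values are assumptions chosen for illustration. It checks the in-sample identity "short slope = long slope + (coefficient on the omitted variable) × (slope from regressing the omitted variable on the included one)".

```python
import random

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

random.seed(0)
n = 500
# Simulated stand-ins (NOT the hprice2 data): x1 ~ nox, x2 ~ dist.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [-0.8 * v + random.gauss(0, 0.6) for v in x1]   # negatively correlated with x1
# Assumed true model: both regressors lower y.
y = [10 - 1.0 * a - 2.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)

# Long regression: y on x1 and x2 (closed form for two regressors).
det = s11 * s22 - s12 ** 2
b1_long = (s22 * s1y - s12 * s2y) / det
b2_long = (s11 * s2y - s12 * s1y) / det

# Short regression: y on x1 only, omitting x2.
b1_short = s1y / s11
delta = s12 / s11          # slope from regressing x2 on x1

print("short slope on x1:      ", round(b1_short, 3))
print("long slope on x1:       ", round(b1_long, 3))
print("long + b2 * delta:      ", round(b1_long + b2_long * delta, 3))
```

In this setup the omitted variable has a negative effect and is negatively correlated with x1, so the bias is positive and the short-regression slope can even take the "wrong" sign, which is the pattern the notes describe for nox.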
Problem 2

Suppose that average worker productivity at manufacturing firms (avgprod) depends on two
factors, average hours of training (avgtrain) and average worker ability (avgabil):

avgprod = β0 + β1 avgtrain + β2 avgabil + u.

Assume that this equation satisfies the Gauss-Markov assumptions. If grants have been given to
firms whose workers have less than average ability, so that avgtrain and avgabil are negatively
correlated, what is the likely bias in β̂1 obtained from the simple regression of avgprod on
avgtrain?
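
One way to reason about this is the standard omitted variable result for a simple regression that leaves out avgabil. The symbol $\tilde{\delta}_1$ is introduced here for illustration (it is not in the problem sheet) and denotes the slope from regressing avgabil on avgtrain. The sign of $\beta_2$ is an assumption: higher ability is taken to raise productivity.

```latex
E(\hat{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1,
\qquad
\text{Bias}(\hat{\beta}_1) = \beta_2 \tilde{\delta}_1 .
% Assumed: \beta_2 > 0 (ability raises productivity).
% Given:   \tilde{\delta}_1 < 0 (avgtrain and avgabil are negatively correlated).
% Hence:   \beta_2 \tilde{\delta}_1 < 0, a downward bias.
```

Under these assumptions the simple regression would tend to understate the effect of training on productivity.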

Problem 3

The following equations were estimated using data on US law schools, where lsalary is the log
of the mean salary of graduates, rank is law school ranking, GPA is median college GPA of
graduates, and age is the age of the law school in years.

$\widehat{lsalary} = 9.9 - 0.0041\,rank + 0.294\,GPA$
$n = 142$, $R^2 = 0.8238$

$\widehat{lsalary} = 9.86 - 0.0038\,rank + 0.295\,GPA + 0.00017\,age$
$n = 99$, $R^2 = 0.8036$

How can it be that the R-squared is smaller when the variable age is added to the equation?

Adding a new independent variable to a model cannot decrease R-squared when the sample stays the same. The issue is that the sample size
has decreased (from n = 142 to n = 99). We have lost some observations, perhaps because some law schools had no information on the
variable age. ALWAYS CONTROL THE SAMPLE WHEN ADDING VARIABLES TO MODELS.
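
The point about the sample can be checked with a small simulation. This is a sketch on made-up data, not the law school data: the variable names, coefficients, and the rule for which observations have x2 recorded are all assumptions chosen so that the smaller sample has less variation in x1.

```python
import random

def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def r2_one(y, x1):
    """R-squared from regressing y on an intercept and x1."""
    return cross(x1, y) ** 2 / (cross(x1, x1) * cross(y, y))

def r2_two(y, x1, x2):
    """R-squared from regressing y on an intercept, x1 and x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / cross(y, y)   # explained SS / total SS

random.seed(1)
n = 400
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2.0 * a + 0.1 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Assumption for illustration: x2 is only recorded when |x1| < 0.7.
keep = [i for i in range(n) if abs(x1[i]) < 0.7]
ys, x1s, x2s = [y[i] for i in keep], [x1[i] for i in keep], [x2[i] for i in keep]

r2_short_full = r2_one(y, x1)           # without x2, all observations
r2_long_full = r2_two(y, x1, x2)        # with x2, same observations
r2_long_sub = r2_two(ys, x1s, x2s)      # with x2, smaller sample

print(len(keep), "of", n, "observations have x2 recorded")
print(round(r2_short_full, 4), round(r2_long_full, 4), round(r2_long_sub, 4))
```

On the same observations the R-squared with x2 is never below the one without it; only the regression on the reduced sample comes out lower, so the two R-squared values are not comparable.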
Problem 4

In a study relating college grade point average to time spent in various activities, you distribute
a survey to several students. The students are asked how many hours they spend each week in
four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four
categories, so that for each student, the sum of hours in the four activities must be 168.

a) In the model

GPA = β0 + β1 study + β2 sleep + β3 work + β4 leisure + u,

does it make sense to hold sleep, work, and leisure fixed, while changing study?

Leisure includes everything else, like showering, grocery shopping, etc.
It is not possible, because all hours need to add up to 168. If I increase study, I have to decrease another variable (time is finite).

b) Explain why this model violates Assumption MLR.3.

This model has perfect collinearity: because the sum of hours is fixed at 168, each variable is an exact linear function of the other three.
c) How could you reformulate the model so that its parameters have a useful interpretation
and it satisfies Assumption MLR.3?

We can drop one of the variables (e.g. sleep). If I know the hours for the remaining three, I know the hours of sleep, and I can
increase hours of study while holding the other included variables fixed (the dropped category adjusts instead).
Which variable you drop doesn't affect our results.
Estimates will differ but predictions won't change. [Base category: the variable we drop.]
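
The collinearity argument can be checked numerically. The sketch below uses made-up hours for 20 hypothetical students (not survey data) and a small hand-written rank routine; the ranges for each activity are assumptions. With an intercept and all four activities the regressor matrix loses a rank, and dropping sleep restores full column rank.

```python
import random

def rank(rows, tol=1e-9):
    """Matrix rank via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        if rk == len(m):
            break
        # Row at or below position rk with the largest entry in column c.
        pivot = max(range(rk, len(m)), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < tol:
            continue                      # no usable pivot: dependent column
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

random.seed(2)
X_full, X_drop = [], []
for _ in range(20):
    study = random.randint(10, 40)
    sleep = random.randint(40, 60)
    work = random.randint(0, 30)
    leisure = 168 - study - sleep - work   # the four categories exhaust the week
    X_full.append([1, study, sleep, work, leisure])   # intercept + all four
    X_drop.append([1, study, work, leisure])          # sleep is the base category

print("columns:", 5, "rank:", rank(X_full))   # rank is below the column count
print("columns:", 4, "rank:", rank(X_drop))   # full column rank
```

Because study + sleep + work + leisure equals 168 times the intercept column for every student, OLS cannot separate the five coefficients in the full model.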
