FINM3123 Introduction to Econometrics

Chapter 6
Multiple Regression Analysis: Further Issues

1
Multiple Regression Analysis: Further Issues
§ More on Functional Form
§ More on using logarithmic functional forms
• Convenient percentage/elasticity interpretation
• Slope coefficients of logged variables are invariant to rescalings
• Taking logs often eliminates/mitigates problems with outliers
• Taking logs often helps to secure normality and homoscedasticity
• Variables measured in units such as years should not be logged
• Variables measured in percentage points should also not be logged
• Logs must not be used if variables take on zero or negative values
• It is hard to reverse the log-operation when constructing predictions

2
Multiple Regression Analysis: Further Issues
Using quadratic functional forms
§ Example: Wage equation with a concave experience profile
• Estimated slope coefficients: .298 on exper and −.0061 on exper²
§ Marginal effect of experience: Δwage ≈ [.298 − 2(.0061) exper] Δexper
• The first year of experience increases the wage by about $.30, the second year by .298 − 2(.0061)(1) ≈ $.29, and so on.

3
Multiple Regression Analysis: Further Issues
Wage maximum with respect to work experience
• The fitted wage is maximized at exper* = .298/[2(.0061)] ≈ 24.4 years (a numerical check follows below).
• Does this mean the return to experience becomes negative after 24.4 years?
• Not necessarily. It depends on how many observations in the sample lie to the right of the turnaround point.
• In the given example, about 28% of the observations do, so there may be a specification problem (e.g. omitted variables).
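A minimal Python sketch of these calculations. Only the slope estimates quoted on the slide (.298 on exper, −.0061 on exper²) are used; the function name and the printed rounding are illustrative choices, not part of the original example.

b1, b2 = 0.298, -0.0061   # slope estimates on exper and exper^2 from the slide

def marginal_effect(exper):
    # approximate wage change from one more year of experience
    return b1 + 2 * b2 * exper

turnaround = -b1 / (2 * b2)            # exper* = .298 / (2 * .0061)
print(round(marginal_effect(0), 3))    # 0.298 -> the first year adds about $.30
print(round(marginal_effect(1), 3))    # 0.286 -> the second year adds about $.29
print(round(turnaround, 1))            # 24.4 years: estimated wage maximum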

4
Multiple Regression Analysis: Further Issues
§ Example: Effects of pollution on housing prices
• Explanatory variables: nitrogen oxide concentration in the air, distance from employment centers, number of rooms (entered as a quadratic), and the student/teacher ratio
• Does this mean that, at a low number of rooms, more rooms are associated with lower prices?

5
Multiple Regression Analysis: Further Issues
Calculation of the turnaround point
• With a fitted quadratic in rooms, β̂₁ rooms + β̂₂ rooms² (β̂₁ < 0, β̂₂ > 0), the turnaround point is rooms* = |β̂₁|/(2β̂₂)
• The region below the turnaround point can be ignored, as it concerns only about 1% of the observations
• Above the turnaround point the estimated effect of an additional room is positive and growing: increasing rooms from 5 to 6 raises the predicted price by less than increasing rooms from 6 to 7

6
Multiple Regression Analysis: Further Issues
§ Other possibilities

§ Higher-order polynomials

7
Multiple Regression Analysis: Further Issues
§ Models with interaction terms
• Example: price = β₀ + β₁ sqrft + β₂ bdrms + β₃ sqrft·bdrms + u, where sqrft·bdrms is the interaction term
• The effect of the number of bedrooms depends on the level of square footage: Δprice = (β₂ + β₃ sqrft) Δbdrms
§ Interaction effects complicate the interpretation of parameters
• β₂ = effect of the number of bedrooms, but for a square footage of zero

8
Multiple Regression Analysis: Further Issues
§ Reparametrization of interaction effects
• y = α₀ + δ₁x₁ + δ₂x₂ + β₃(x₁ − μ₁)(x₂ − μ₂) + u, where μ₁, μ₂ are the population means of x₁, x₂ (they may be replaced by sample means)
• δ₂ = effect of x₂ if all other variables take on their mean values (illustrated in the sketch after this list)

§ Advantages of reparametrization
• Easy interpretation of all parameters
• Standard errors for partial effects at the mean values available
• If necessary, interaction may be centered at other interesting values
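A minimal Python sketch of the reparametrization on simulated data (the variable names, coefficients, and sample size below are illustrative assumptions, not taken from the slides). It verifies that the coefficient on x2 in the centered model equals β̂₂ + β̂₃·x̄₁, i.e. the partial effect of x₂ evaluated at the mean of x₁.

import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + 0.8 * x1 * x2 + rng.normal(size=n)

def ols(y, *cols):
    # OLS with an intercept; returns the coefficient vector
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b = ols(y, x1, x2, x1 * x2)                               # original parametrization
d = ols(y, x1, x2, (x1 - x1.mean()) * (x2 - x2.mean()))   # centered interaction

# The x2 coefficient in the centered model is the partial effect of x2
# at the sample mean of x1: d2 = b2 + b3 * mean(x1).
print(d[2], b[2] + b[3] * x1.mean())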

9
Multiple Regression Analysis: Further Issues
More on goodness-of-fit and selection of regressors
§ General remarks on R-squared
• A high R-squared may result in misleading conclusions.
• A low R-squared does not preclude precise estimation of partial effects
§ Adjusted R-squared
• What is the ordinary R-squared supposed to measure?
• R² = 1 − SSR/SST = 1 − (SSR/n)/(SST/n) is an estimate of the population R-squared, 1 − σᵤ²/σᵧ², i.e. the fraction of the variance of y explained by the explanatory variables in the population

10
Multiple Regression Analysis: Further Issues
§ Adjusted R-squared (cont.)
• A better estimate, using the correct degrees of freedom of numerator and denominator, is
  R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]
• The adjusted R-squared imposes a penalty for adding new regressors
• The adjusted R-squared increases if, and only if, the t-statistic of a newly added regressor is greater than one in absolute value
§ Relationship between R-squared and adjusted R-squared:
  R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1)
• The adjusted R-squared may even become negative (a numerical check of both formulas follows below)
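A minimal Python sketch on simulated data (all numbers below are illustrative) computing the R-squared, the adjusted R-squared from its degrees-of-freedom definition, and the equivalent expression 1 − (1 − R²)(n − 1)/(n − k − 1).

import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -0.3, 0.0]) + rng.normal(size=n)

Xc = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
resid = y - Xc @ beta

ssr = np.sum(resid ** 2)
sst = np.sum((y - y.mean()) ** 2)

r2 = 1 - ssr / sst
r2_adj = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

# Both expressions for the adjusted R-squared agree.
print(r2, r2_adj, 1 - (1 - r2) * (n - 1) / (n - k - 1))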
11
Multiple Regression Analysis: Further Issues
Using adjusted R-squared to choose between nonnested models
§ Models are nonnested if neither model is a special case of the other

§ A comparison of the R-squareds of the two models would be unfair to the first model because it contains fewer parameters
§ In the given example, even after adjusting for the difference in degrees of freedom, the quadratic model is preferred (see the sketch below)
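A minimal Python sketch of such a comparison on simulated data; the particular pair of models (y on log(x) versus y on x and x²) and all numbers are illustrative assumptions, chosen only because they mirror the nonnested situation described here.

import numpy as np

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 10, size=n)
y = 2.0 + 1.5 * np.log(x) + rng.normal(scale=0.5, size=n)

def adj_r2(y, *cols):
    # fit OLS with an intercept and return the adjusted R-squared
    X = np.column_stack([np.ones(len(y)), *cols])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    n_, k = X.shape[0], X.shape[1] - 1
    return 1 - (ssr / (n_ - k - 1)) / (sst / (n_ - 1))

# Same dependent variable, so the adjusted R-squareds are directly comparable
# even though neither model nests the other.
print(adj_r2(y, np.log(x)), adj_r2(y, x, x**2))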

12
Multiple Regression Analysis: Further Issues
Comparing models with different dependent variables
§ R-squared or adjusted R-squared should not be used to compare models which differ in
their definition of the dependent variable
§ Example: CEO compensation and firm performance
• There is much less variation in log(salary) that needs to be explained than in salary, so the R-squareds of the two regressions are not comparable

13
Multiple Regression Analysis: Further Issues
Controlling for too many factors in regression analysis (overcontrolling)
§ In some cases, certain variables should not be held fixed
• In a regression of traffic fatalities on state beer taxes (and other factors) one should
not directly control for beer consumption
• In a regression of family health expenditures on pesticide usage among farmers one
should not control for doctor visits
§ Different regressions may serve different purposes
• In a regression of house prices on house characteristics, one would only include
price assessments if the purpose of the regression is to study their validity;
otherwise one would not include them

14
Multiple Regression Analysis: Further Issues
§ Adding regressors to reduce the error variance
• Adding regressors may exacerbate multicollinearity problems
• On the other hand, adding regressors reduces the error variance
• Variables that are uncorrelated with other regressors should be added because they
reduce error variance without increasing multicollinearity
• However, such uncorrelated variables may be hard to find

§ Example: Individual beer consumption and beer prices


• Including individual characteristics (such as age and education) in a regression of
beer consumption on beer prices leads to more precise estimates of the price
elasticity

15
Multiple Regression Analysis: Further Issues
Predicting y when log(y) is the dependent variable

Population model:  log(y) = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₖxₖ + u

OLS prediction for the log:  β̂₀ + β̂₁x₁ + β̂₂x₂ + ⋯ + β̂ₖxₖ

A first idea for predicting y is to exponentiate this fitted value, ŷ = exp(β̂₀ + β̂₁x₁ + ⋯ + β̂ₖxₖ), but this does not work: it systematically underestimates the expected value of y.

Take the exponential of both sides of the population model:

y = exp(u) · exp(β₀ + β₁x₁ + β₂x₂ + ⋯ + βₖxₖ)

Under Assumptions MLR.1–MLR.6 we know that u ∼ N(0, σ²) and is independent of the explanatory variables, therefore

𝔼(y|x) = exp(σ²/2) · exp(β₀ + β₁x₁ + β₂x₂ + ⋯ + βₖxₖ)

(A quick simulation check of the factor exp(σ²/2) follows below.)
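A quick simulation check of this factor (the value of σ below is an arbitrary illustration): for normally distributed u, the sample mean of exp(u) is close to exp(σ²/2) > 1, which is why exponentiating the fitted log falls short on average.

import numpy as np

rng = np.random.default_rng(2)
sigma = 0.5
u = rng.normal(scale=sigma, size=1_000_000)

print(np.exp(u).mean())        # ~1.133
print(np.exp(sigma**2 / 2))    # 1.1331..., the exact correction factor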
16
Multiple Regression Analysis: Further Issues
Define m̂ᵢ = exp(β̂₀ + β̂₁xᵢ₁ + β̂₂xᵢ₂ + ⋯ + β̂ₖxᵢₖ).

Under MLR.1–MLR.6, a predictor for y is  ŷᵢ = exp(σ̂²/2) · m̂ᵢ.

If we only assume MLR.1–MLR.5, we do not know 𝔼[exp(u)], but we can approximate it by
α̂₀ = (1/n) Σᵢ₌₁ⁿ exp(ûᵢ),  where ûᵢ = log(yᵢ) − β̂₀ − β̂₁xᵢ₁ − β̂₂xᵢ₂ − ⋯ − β̂ₖxᵢₖ.
Then we can predict y by  ŷᵢ = α̂₀ · m̂ᵢ.

Another approximation of 𝔼[exp(u)] comes from a simple regression through the origin of yᵢ on m̂ᵢ, motivated by y = exp(u) · mᵢ. The OLS slope estimate from this no-intercept regression is
α̌₀ = Σᵢ₌₁ⁿ m̂ᵢ yᵢ / Σᵢ₌₁ⁿ m̂ᵢ².
Then we can predict y by  ŷᵢ = α̌₀ · m̂ᵢ.

These three predictors of y from the log(y) model are consistent but not unbiased. (A sketch computing all three follows below.)
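A minimal Python sketch computing all three predictors on simulated data (the data-generating process and all numbers below are illustrative assumptions).

import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
y = np.exp(1.0 + 0.8 * x + u)                 # true model: log(y) = 1 + .8 x + u

# OLS of log(y) on x (with intercept)
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
uhat = np.log(y) - X @ beta
m_hat = np.exp(X @ beta)

sigma2_hat = np.sum(uhat**2) / (n - 2)        # k = 1 regressor here

factor_normal = np.exp(sigma2_hat / 2)                # assumes normal errors (MLR.6)
alpha0_hat = np.mean(np.exp(uhat))                    # sample average of exp(u-hat)
alpha0_check = np.sum(m_hat * y) / np.sum(m_hat**2)   # regression through the origin

# Three consistent (but not unbiased) predictors of y
yhat_normal = factor_normal * m_hat
yhat_smear = alpha0_hat * m_hat
yhat_origin = alpha0_check * m_hat
print(factor_normal, alpha0_hat, alpha0_check)        # all close to exp(0.125) ~ 1.13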
17
Multiple Regression Analysis: Further Issues
Comparing the R-squared of a logged and an unlogged specification

R² is the square of the correlation between yᵢ and ŷᵢ.

These are the R-squareds for the predictions of the unlogged salary variable (although the second regression is originally for logged salaries). Both R-squareds can now be directly compared.

For the log model, R² can be defined as the square of the correlation between yᵢ and ŷᵢ = α̂₀m̂ᵢ (which equals the square of the correlation between yᵢ and m̂ᵢ). Another possible definition uses the residuals r̂ᵢ = yᵢ − α̂₀m̂ᵢ in the usual R-squared formula:

R² = 1 − Σᵢ₌₁ⁿ r̂ᵢ² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²

(A sketch of the squared-correlation comparison follows below.)
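A minimal Python sketch of the squared-correlation comparison on simulated data (the data-generating process below is an illustrative assumption); note that rescaling m̂ᵢ by α̂₀ does not change the squared correlation with yᵢ.

import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
y = np.exp(1.0 + 0.8 * x + rng.normal(scale=0.5, size=n))

def fit(yvar, x):
    # OLS with an intercept; returns the fitted values
    X = np.column_stack([np.ones(len(x)), x])
    b = np.linalg.lstsq(X, yvar, rcond=None)[0]
    return X @ b

def comparable_r2(y, y_pred):
    # squared correlation between actual y and predicted y
    return np.corrcoef(y, y_pred)[0, 1] ** 2

yhat_level = fit(y, x)                 # regression of y in levels on x
m_hat = np.exp(fit(np.log(y), x))      # exp of the log-model prediction
print(comparable_r2(y, yhat_level), comparable_r2(y, m_hat))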
18
Summary
§ Functional form
• Logarithmic functional form
• Interpretation: elasticity, semi-elasticity
• When to use logarithmic form
• Quadratic functional form
• Turnaround point calculation
• Interaction terms
• Interpretation of parameters
• Reparametrization

19
Summary
§ Adjusted R-squared

• Adjusted R-squared increases if and only if the 𝑡-statistic of a newly added regressor is
greater than one in absolute value.
• Relationship between R-squared and adjusted R-squared

• Can be used to choose between nonnested models when dependent variable is the same
§ Predicting y when log(y) is the dependent variable
• Two approaches to estimating 𝔼[exp(u)]: the sample average α̂₀ and the regression-through-the-origin slope α̌₀.
• Comparing R-squared of a logged and an unlogged specification.
20
