Lecture set 5
The expected change in Y from a change ΔX1 in X1, holding X2, …, Xk constant, is
ΔŶ = f̂(X1 + ΔX1, X2, …, Xk) – f̂(X1, X2, …, Xk).   (8.5)
Nonlinear Functions of a Single Independent
Variable (SW Section 8.2)
We’ll look at two complementary approaches:
1. Polynomials in X
The population regression function is approximated by a
quadratic, cubic, or higher-degree polynomial
2. Logarithmic transformations
Y and/or X is transformed by taking its logarithm, which
provides a “percentages” interpretation of the coefficients
that makes sense in many applications
1. Polynomials in X
Approximate the population regression function by a polynomial:
Yi = β0 + β1Xi + β2(Xi)2 + … + βr(Xi)r + ui
• This is just the linear multiple regression model – except that the
regressors are powers of X!
• Estimation, hypothesis testing, etc. proceed as in the multiple regression model, using OLS
• The coefficients are difficult to interpret, but the regression
function itself is interpretable
Example: the TestScore – Income relation
Incomei = average district income in the ith district (thousands of
dollars per capita)
Quadratic specification:
TestScorei = β0 + β1Incomei + β2(Incomei)2 + ui
Cubic specification:
TestScorei = β0 + β1Incomei + β2(Incomei)2 + β3(Incomei)3 + ui
Estimation of the quadratic specification in
STATA
generate avginc2 = avginc*avginc;    // create a new regressor
reg testscr avginc avginc2, r;
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avginc | 3.850995 .2680941 14.36 0.000 3.32401 4.377979
avginc2 | -.0423085 .0047803 -8.85 0.000 -.051705 -.0329119
_cons | 607.3017 2.901754 209.29 0.000 601.5978 613.0056
------------------------------------------------------------------------------
Test the null hypothesis of linearity against the alternative that the
regression function is a quadratic.…
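A minimal Stata sketch of this test (assuming the quadratic regression above has just been run): under the null of linearity the population coefficient on avginc2 is zero, so the test statistic is simply the t-statistic on avginc2 in the output above.

display _b[avginc2]/_se[avginc2]     // t-statistic on avginc2 (–8.85 in the output above)
test avginc2                         // equivalent F-test of the single restriction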
Interpreting the estimated regression
function: (1 of 3)
(a) Plot the predicted values
TestScore = 607.3 + 3.85Incomei – 0.0423(Incomei)2
            (2.9)   (0.27)        (0.0048)
(standard errors in parentheses)
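A minimal Stata sketch of step (a) (assuming the quadratic regression above is in memory; the fitted-value name testscr_hat is illustrative):

predict testscr_hat                                               // predicted values from the quadratic fit
twoway (scatter testscr avginc) (line testscr_hat avginc, sort)   // data and predicted values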
Interpreting the estimated regression
function: (2 of 3)
(b) Compute the slope, evaluated at various values of X
TestScore = 607.3 + 3.85Incomei – 0.0423(Incomei)2
            (2.9)   (0.27)        (0.0048)
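A sketch of step (b), using the estimates above: the slope of the estimated quadratic at a given income is approximately 3.85 – 2×0.0423×Income. In Stata (assuming the quadratic regression is still in memory; the evaluation points 10 and 40 are illustrative):

display _b[avginc] + 2*_b[avginc2]*10    // slope at Income = $10,000 per capita: about 3.0
display _b[avginc] + 2*_b[avginc2]*40    // slope at Income = $40,000 per capita: about 0.47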
Here's why the logarithm gives a "percentages" interpretation:

ln(x + Δx) – ln(x) = ln(1 + Δx/x) ≈ Δx/x

(calculus: d ln(x)/dx = 1/x)

Numerically:
ln(1.01) = .00995 ≈ .01
ln(1.10) = .0953 ≈ .10 (sort of)
The three log regression specifications:

I. Linear-log population regression function

Yi = β0 + β1ln(Xi) + ui

for small ΔX,

β1 ≈ ΔY/(ΔX/X)

• Now 100×(ΔX/X) = percentage change in X, so a 1% increase in X (multiplying X by 1.01) is associated with a .01β1 change in Y.

II. Log-linear population regression function (1 of 2)

ln(Yi) = β0 + β1Xi + ui

so ΔY/Y ≈ β1ΔX

or β1 ≈ (ΔY/Y)/ΔX (small ΔX)
II. Log-linear population regression function
(2 of 2)
ln(Yi) = β0 + β1Xi + ui

for small ΔX, β1 ≈ (ΔY/Y)/ΔX

• Now 100×(ΔY/Y) = percentage change in Y, so a change in X by one unit (ΔX = 1) is associated with a 100β1% change in Y.

III. Log-log population regression function (1 of 2)

ln(Yi) = β0 + β1ln(Xi) + ui

so ΔY/Y ≈ β1(ΔX/X)

or β1 ≈ (ΔY/Y)/(ΔX/X) (small ΔX)
III. Log-log population regression function
(2 of 2)
ln(Yi) = β0 + β1ln(Xi) + ui
for small ΔX,

β1 ≈ (ΔY/Y)/(ΔX/X)

• Now 100×(ΔY/Y) = percentage change in Y, and 100×(ΔX/X) = percentage change in X, so a 1% change in X is associated with a β1% change in Y.
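A minimal Stata sketch of estimating the three specifications with the same test-score data as above (the generated variable names lavginc and ltestscr are illustrative):

generate lavginc = ln(avginc)     // ln(Income)
generate ltestscr = ln(testscr)   // ln(TestScore)
reg testscr lavginc, r            // I.   linear-log
reg ltestscr avginc, r            // II.  log-linear
reg ltestscr lavginc, r           // III. log-log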
PctEL     ΔTestScore/ΔSTR
0         –1.12
20%       –1.12 + .0012×20 = –1.10
Example: TestScore, STR, PctEL (2 of 2)
TestScore = 686.3 – 1.12STR – 0.67PctEL + .0012(STR×PctEL)
           (11.8)   (0.59)    (0.37)      (.019)
• Does population coefficient on STR×PctEL = 0?
t = .0012/.019 = .06 → can’t reject null at 5% level
• Does population coefficient on STR = 0?
t = –1.12/0.59 = –1.90 → can’t reject null at 5% level
• Do the coefficients on both STR and STR×PctEL = 0?
F = 3.89 (p-value = .021) → reject null at 5% level(!!) (high but
imperfect multicollinearity)
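A minimal Stata sketch of this specification and the joint test (assuming the regressors are named str and el_pct; the interaction name str_elpct is illustrative):

generate str_elpct = str*el_pct        // interaction regressor STR×PctEL
reg testscr str el_pct str_elpct, r
test str str_elpct                     // joint null: coefficients on STR and STR×PctEL are both 0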
Summary: Nonlinear Regression Functions
• Using functions of the independent variables, such as ln(X) or X1×X2, allows recasting a large family of nonlinear regression functions as multiple regression.
• Estimation and inference proceed in the same way as in the linear
multiple regression model.
• Interpretation of the coefficients is model-specific, but the general rule is to compute effects by comparing different cases (different reference values of the original X's).
• Many nonlinear specifications are possible, so you must use judgment:
– What nonlinear effect do you want to analyze?
– What makes sense in your application?
APPENDIX
Estimation of a cubic specification in STATA
(1 of 2)
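A sketch of the commands that would produce output like the table below (avginc2 was generated earlier; avginc3 is the cubic regressor shown in the output):

generate avginc3 = avginc^3           // cubic regressor
reg testscr avginc avginc2 avginc3, r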
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avginc | 5.018677 .7073505 7.10 0.000 3.628251 6.409104
avginc2 | -.0958052 .0289537 -3.31 0.001 -.1527191 -.0388913
avginc3 | .0006855 .0003471 1.98 0.049 3.27e-06 .0013677
_cons | 600.079 5.102062 117.61 0.000 590.0499 610.108
------------------------------------------------------------------------------
Estimation of a cubic specification in STATA
(2 of 2)
Testing the null hypothesis of linearity against the alternative that the population regression is quadratic and/or cubic, that is, a polynomial of degree up to 3:
H0: population coefficients on Income2 and Income3 = 0
H1: at least one of these coefficients is nonzero.
test avginc2 avginc3
( 1) avginc2 = 0.0
( 2) avginc3 = 0.0
F( 2, 416) = 37.69
Prob > F = 0.0000
Y = β0 – αe^(–β1X)
β0, β1, and α are unknown parameters. This is called a negative
exponential growth curve. The asymptote as X → ∞ is β0.
Negative exponential growth
We want to estimate the parameters of,
Yi = β0 – αe^(–β1Xi) + ui

or Yi = β0[1 – e^(–β1(Xi – β2))] + ui   (*)

where α = β0e^(β1β2) (why would you do this???)
Compare model (*) to linear-log or cubic models:
Yi = β0 + β1ln(Xi) + ui
Yi = β0 + β1Xi + β2(Xi)2 + β3(Xi)3 + ui
The parameters of (*) are estimated by nonlinear least squares:

min over β0, β1, β2 of   Σi=1,…,n [Yi – β0(1 – e^(–β1(Xi – β2)))]2
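A hedged sketch of a Stata nl command that would produce an iteration log like the one below (the parameter names b0, b1, b2 and the starting values are illustrative, not the author's):

nl (testscr = {b0=700}*(1 - exp(-{b1=0.05}*(avginc - {b2=5}))))   // illustrative starting values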
(obs = 420)
Iteration 0:  residual SS = 1.80e+08
Iteration 1:  residual SS = 3.84e+07
Iteration 2:  residual SS = 4637400
Iteration 3:  residual SS = 300290.9     STATA is “climbing the hill”
Iteration 4:  residual SS = 70672.13     (actually, minimizing the SSR)
Iteration 5:  residual SS = 66990.31
Iteration 6:  residual SS = 66988.4
Iteration 7:  residual SS = 66988.4
Iteration 8:  residual SS = 66988.4