Nonlinear Regression Part 1
Nonlinear Regression Functions
(SW Chapter 8)
Outline
1. Nonlinear regression functions – general comments
2. Nonlinear functions of one variable
3. Nonlinear functions of two variables: interactions
The TestScore – STR relation looks linear (maybe)… [scatterplot omitted]
But the TestScore – Income relation looks nonlinear… [scatterplot omitted]
Nonlinear Population Regression Functions – General Ideas (SW Section 8.1)
If the relation between Y and X is nonlinear, the population regression function can be written as
Yi = f(X1i, X2i, …, Xki) + ui
where f can be a nonlinear function.
Assumptions
1. E(ui| X1i,X2i,…,Xki) = 0 (same); implies that f is the
conditional expectation of Y given the X’s.
2. (X1i,…,Xki,Yi) are i.i.d. (same).
3. Big outliers are rare (same idea; the precise mathematical
condition depends on the specific f).
4. No perfect multicollinearity (same idea; the precise statement
depends on the specific f).
Nonlinear Functions of a Single
Independent Variable (SW Section 8.2)
We’ll look at two complementary approaches:
1. Polynomials in X
The population regression function is approximated by a
quadratic, cubic, or higher-degree polynomial
2. Logarithmic transformations
Y and/or X is transformed by taking its logarithm
this gives a “percentages” interpretation that makes sense
in many applications
1. Polynomials in X
Approximate the population regression function by a polynomial:
Yi = β0 + β1Xi + β2Xi² + … + βrXiʳ + ui
Quadratic specification: Yi = β0 + β1Xi + β2Xi² + ui
Cubic specification: Yi = β0 + β1Xi + β2Xi² + β3Xi³ + ui
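A sketch of the Stata commands that would produce the quadratic-specification output shown below, assuming the data set already contains testscr and avginc:
gen avginc2 = avginc*avginc    // create the quadratic regressor (Income squared)
reg testscr avginc avginc2, r  // regress test score on income and income squared, robust SEs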
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avginc | 3.850995 .2680941 14.36 0.000 3.32401 4.377979
avginc2 | -.0423085 .0047803 -8.85 0.000 -.051705 -.0329119
_cons | 607.3017 2.901754 209.29 0.000 601.5978 613.0056
------------------------------------------------------------------------------
Interpreting the estimated
regression function, ctd:
(b) Compute “effects” for different values of X
TestScore = 607.3 + 3.85Incomei – 0.0423(Incomei)²
            (2.9)    (0.27)       (0.0048)
Predicted change in TestScore for a change in district income from $5,000 to $6,000 per capita (Income from 5 to 6):
ΔTestScore = [607.3 + 3.85·6 – 0.0423·6²] – [607.3 + 3.85·5 – 0.0423·5²] = 3.4
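As a quick arithmetic check, the same predicted change can be computed in Stata (pure arithmetic, no data required):
display (607.3 + 3.85*6 - 0.0423*6^2) - (607.3 + 3.85*5 - 0.0423*5^2)   // ≈ 3.4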
Estimation of a cubic specification
in STATA
gen avginc3 = avginc*avginc2   // create the cubic regressor
reg testscr avginc avginc2 avginc3, r
[Stata output for the cubic regression omitted]
Testing the null hypothesis of linearity, against the alternative
that the population regression is quadratic and/or cubic, that is, it
is a polynomial of degree up to 3:
test avginc2 avginc3   // execute the test command after running the regression
 ( 1)  avginc2 = 0.0
 ( 2)  avginc3 = 0.0
       F(  2,   416) =   37.69
            Prob > F =    0.0000
The hypothesis that the population regression is linear is rejected at the 1% significance level, against the alternative that it is a polynomial of degree up to 3.
I. Linear-log population regression function
Yi = β0 + β1ln(Xi) + ui
Now change X: ln(X + ΔX) – ln(X) ≅ ΔX/X,
so ΔY ≅ β1(ΔX/X),
or β1 ≅ ΔY/(ΔX/X)   (small ΔX)
Linear-log case, continued
Yi = β0 + β1ln(Xi) + ui
Now 100·(ΔX/X) = percentage change in X, so a 1% increase in X
(multiplying X by 1.01) is associated with a 0.01β1 change in Y.
(1% increase in X ⇒ 0.01 increase in ln(X) ⇒ 0.01β1 increase in Y)
Example: TestScore vs. ln(Income)
First define the new regressor, ln(Income).
The model is now linear in ln(Income), so the linear-log model can be estimated by OLS:
TestScore = 557.8 + 36.42 ln(Incomei)
            (3.8)   (1.40)
so a 1% increase in Income is associated with an increase in TestScore of 0.01×36.42 ≈ 0.36 points.
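A sketch of the corresponding Stata commands, assuming the income variable is avginc as in the earlier output (the name lnincome is an illustrative choice):
gen lnincome = ln(avginc)   // define the new regressor ln(Income)
reg testscr lnincome, r     // estimate the linear-log model with robust SEs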
II. Log-linear population regression function
ln(Y) = β0 + β1X
so ΔY/Y ≅ β1ΔX,
or β1 ≅ (ΔY/Y)/ΔX   (small ΔX)
Log-linear case, continued
ln(Yi) = β0 + β1Xi + ui
for small ΔX, β1 ≅ (ΔY/Y)/ΔX
Now 100·(ΔY/Y) = percentage change in Y, so a change in X by one unit (ΔX = 1) is associated with a 100β1% change in Y.
(1 unit increase in X ⇒ β1 increase in ln(Y) ⇒ 100β1% increase in Y)
Note: What are the units of ui and the SER? Fractional (proportional) deviations.
III. Log-log population regression function
ln(Yi) = β0 + β1ln(Xi) + ui
so ΔY/Y ≅ β1(ΔX/X),
or β1 ≅ (ΔY/Y)/(ΔX/X)   (small ΔX)
Log-log case, continued
ln(Yi) = β0 + β1ln(Xi) + ui
In the log-log specification, β1 has the interpretation of an elasticity: a 1% change in X is associated with a β1% change in Y.
ln(TestScore) = 6.336 + 0.0554 ln(Incomei)
               (0.006)  (0.0021)
so a 1% increase in Income is associated with an increase of 0.0554% in TestScore.
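A sketch of the Stata commands that could produce such a log-log estimate, assuming testscr and avginc as before (the names lntestscr and lnincome are illustrative):
gen lntestscr = ln(testscr)   // log of the dependent variable
gen lnincome = ln(avginc)     // log of district income (skip if already created above)
reg lntestscr lnincome, r     // log-log specification with robust SEs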
Summary: Logarithmic
transformations
Three cases, differing in whether Y and/or X is transformed
by taking logarithms.
The regression is linear in the new variable(s) ln(Y) and/or
ln(X), and the coefficients can be estimated by OLS.
Hypothesis tests and confidence intervals are now
implemented and interpreted “as usual.”
The interpretation of β1 differs from case to case.
Choice of specification should be guided by judgment (which
interpretation makes the most sense in your application?),
tests, and plotting predicted values
Interactions Between Independent
Variables (SW Section 8.3)
Perhaps a class size reduction is more effective in some
circumstances than in others…
Perhaps smaller classes help more if there are many English
learners, who need individual attention
That is, ΔTestScore/ΔSTR might depend on PctEL
More generally, ΔY/ΔX1 might depend on X2
How to model such “interactions” between X1 and X2?
We first consider binary X’s, then continuous X’s
(a) Interactions between two binary
variables
Yi = β0 + β1D1i + β2D2i + ui
Interpreting the coefficients
Yi = β0 + β1D1i + β2D2i + β3(D1i×D2i) + ui
TestScore = 664.1 – 18.2HiEL – 1.9HiSTR – 3.5(HiSTR×HiEL)
            (1.4)    (2.3)      (1.9)       (3.1)
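A sketch of how this binary-binary interaction regression could be set up in Stata. The raw variable names el_pct and str, and the cutoffs used to define HiEL and HiSTR, are assumptions (the slides' definitions are not reproduced in this excerpt):
gen hiel = (el_pct >= 10) if !missing(el_pct)   // HiEL = 1 if percent English learners is high
gen histr = (str >= 20) if !missing(str)        // HiSTR = 1 if student-teacher ratio is high
gen histr_hiel = histr*hiel                     // interaction term
reg testscr hiel histr histr_hiel, r            // robust SEs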
(b) Interactions between a binary and a continuous variable
Yi = β0 + β1Di + β2Xi + ui
Di is binary, X is continuous
As specified above, the effect on Y of X (holding constant D) = β2, which does not depend on D
To allow the effect of X to depend on D, include the “interaction term” Di×Xi as a regressor:
Binary-continuous interactions: the
two regression lines
Yi = β0 + β1Di + β2Xi + β3(Di×Xi) + ui
When D = 0: Yi = β0 + β2Xi + ui   (the D = 0 regression line)
When D = 1: Yi = β0 + β1 + β2Xi + β3Xi + ui
                = (β0+β1) + (β2+β3)Xi + ui   (the D = 1 regression line)
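A sketch of the corresponding Stata setup for the binary-continuous case, reusing the hypothetical hiel dummy from the earlier sketch and assuming str is the continuous class-size variable:
gen str_hiel = str*hiel            // interaction of continuous STR with binary HiEL
reg testscr str hiel str_hiel, r   // lets both the intercept and the slope differ by HiEL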
Binary-continuous interactions, ctd. [figure omitted]
Interpreting the coefficients
Yi = β0 + β1Di + β2Xi + β3(Di×Xi) + ui
Estimated regression (standard errors are reported with the hypothesis tests below):
TestScore = 682.2 – 0.97STR + 5.6HiEL – 1.28(STR×HiEL)
When HiEL = 0:
TestScore = 682.2 – 0.97STR
When HiEL = 1:
TestScore = 682.2 – 0.97STR + 5.6 – 1.28STR
          = 687.8 – 2.25STR
Two regression lines: one for each HiEL group.
Class size reduction is estimated to have a larger effect when the percent of English learners is large.
Example, ctd: Testing hypotheses
TestScore = 682.2 – 0.97STR + 5.6HiEL – 1.28(STR×HiEL)
            (11.9)  (0.59)     (19.5)     (0.97)
The two regression lines have the same slope ⇔ the coefficient on STR×HiEL is zero:
t = –1.28/0.97 = –1.32
The two regression lines have the same intercept ⇔ the coefficient on HiEL is zero:
t = 5.6/19.5 = 0.29
The two regression lines are the same ⇔ the population coefficient on HiEL = 0 and the population coefficient on STR×HiEL = 0:
F = 89.94 (p-value < .001)!!
We reject the joint hypothesis but neither individual hypothesis (how can this be?)
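A sketch of how these tests could be run in Stata after the interaction regression above (using the hypothetical names hiel and str_hiel from the earlier sketches):
reg testscr str hiel str_hiel, r
test str_hiel        // same slope: coefficient on STR×HiEL = 0
test hiel            // same intercept: coefficient on HiEL = 0
test hiel str_hiel   // same line: joint test of both coefficients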
(c) Interactions between two continuous variables
Yi = β0 + β1X1i + β2X2i + ui
X1 and X2 are continuous. As specified, the effect on Y of X1 (holding X2 constant) does not depend on X2.
To allow the effect of X1 to depend on X2, include the interaction term X1i×X2i as a regressor:
Yi = β0 + β1X1i + β2X2i + β3(X1i×X2i) + ui
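A sketch of the corresponding Stata commands, again assuming str and el_pct are the underlying continuous variables:
gen str_el = str*el_pct            // interaction of the two continuous regressors
reg testscr str el_pct str_el, r   // the effect of STR now depends on PctEL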
Strategy for Question #1 (different
effects for different STR?)
Estimate linear and nonlinear functions of STR, holding constant
relevant demographic variables
PctEL
Income (remember the nonlinear TestScore-Income relation!)
LunchPCT (fraction on free/subsidized lunch)
See whether adding the nonlinear terms makes an “economically important” quantitative difference (“economic” or “real-world” importance is different from statistical significance)
Test whether the nonlinear terms are significant (a Stata sketch follows this list)
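A hedged sketch of one such specification in Stata; the demographic variable names el_pct, avginc, and meal_pct are assumptions about the underlying data set and may need adjusting:
gen str2 = str*str                                    // STR squared
gen str3 = str*str2                                   // STR cubed
reg testscr str str2 str3 el_pct avginc meal_pct, r   // cubic in STR with demographic controls
test str2 str3                                        // are the nonlinear STR terms significant?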
Strategy for Question #2
(interactions between PctEL and
STR?)
Estimate linear and nonlinear functions of STR, interacted with
PctEL.
If the specification is nonlinear (with STR, STR2, STR3), then you
need to add interactions with all the terms so that the entire
functional form can be different, depending on the level of
PctEL.
We will use a binary-continuous interaction specification by adding HiEL×STR, HiEL×STR², and HiEL×STR³ (see the sketch below).
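A sketch of that specification in Stata, building on the hypothetical variables (hiel, str2, str3) defined in the earlier sketches:
gen hiel_str = hiel*str                               // HiEL×STR
gen hiel_str2 = hiel*str2                             // HiEL×STR²
gen hiel_str3 = hiel*str3                             // HiEL×STR³
reg testscr str str2 str3 hiel hiel_str hiel_str2 hiel_str3 el_pct avginc meal_pct, r
test hiel_str hiel_str2 hiel_str3                     // does the entire STR relation differ by HiEL?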
Tests of joint hypotheses: [regression results and test statistics omitted from this excerpt]