15 Splines
Introduction
• We are discussing ways to estimate the regression function f,
where
E(y|x) = f(x)
• One approach is of course to assume that f has a certain
shape, such as linear or quadratic, that can be estimated
parametrically
• We have also discussed locally weighted linear/polynomial
models as a way of allowing f to be more flexible
• An alternative approach is to introduce local basis functions
Basis functions
• A common approach for extending the linear model is to augment
the linear component of x with additional, known functions of x:
f(x) = \sum_{m=1}^{M} \beta_m h_m(x)
• where the h_m are known functions called basis functions
• Because the basis functions {h_m} are prespecified and the model
is linear in these new variables, ordinary least squares approaches
for model fitting and inference can be employed (see the sketch
after this list)
• This idea is not new to you, as you have encountered
transformations and the inclusion of polynomial terms in models
in earlier courses
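To make this concrete, here is a minimal sketch in Python of fitting a basis-expansion model by ordinary least squares; the quadratic basis h_1(x) = 1, h_2(x) = x, h_3(x) = x^2 and the simulated data are illustrative assumptions, not part of the lecture example:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)                      # simulated predictor (assumption)
y = np.sin(x) + rng.normal(scale=0.3, size=100)       # simulated response (assumption)

# Design matrix whose columns are the basis functions h_m(x):
# here a simple quadratic basis h_1(x) = 1, h_2(x) = x, h_3(x) = x^2
H = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares estimate of the coefficients beta_m
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
fhat = H @ beta                                       # fitted values of f(x)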
Problems with polynomial regression
• However, polynomial terms introduce undesirable side
effects: each observation affects the entire curve, even for x
values far from the observation
• Not only does this introduce bias, but it also results in
extremely high variance near the edges of the range of x
• As Hastie et al. (2009) put it, “tweaking the coefficients to
achieve a functional form in one region can cause the
function to flap about madly in remote regions”
Problems with polynomial regression (cont'd)
To illustrate this, consider the following simulated example (gray lines
are models fit to 100 observations generated from the true f, which is
shown in red):
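The figure itself is not reproduced here, but the phenomenon can be reproduced with a short simulation; the true f, noise level, and polynomial degree below are assumptions for illustration, not the settings behind the original plot:

import numpy as np

rng = np.random.default_rng(1)
f = lambda u: np.sin(2 * np.pi * u)                   # hypothetical true regression function
grid = np.linspace(0, 1, 200)

fits = []
for _ in range(50):                                   # 50 simulated data sets
    x = rng.uniform(0, 1, size=100)
    y = f(x) + rng.normal(scale=0.5, size=100)
    coef = np.polyfit(x, y, deg=9)                    # degree-9 global polynomial fit
    fits.append(np.polyval(coef, grid))

# Pointwise variance of the fitted curves is largest near the edges of the range of x
var = np.array(fits).var(axis=0)
print(var[0], var[100], var[-1])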
Global versus local bases
• We can do better than this
• Let us consider instead local basis functions, thereby
ensuring that a given observation affects only the
nearby fit, not the fit of the entire line
• In this lecture, we will discuss splines: piecewise
polynomials joined together to make a single smooth
curve
The piecewise constant model
• To understand splines, we will gradually build up a
piecewise model, starting at the simplest one: the
piecewise constant model
• First, we partition the range of x into K + 1 intervals by
choosing K points ξ_k, k = 1, ..., K, called knots
• For our example involving bone mineral density, we
will choose the tertiles of the observed ages
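As a sketch of this construction in Python (the age variable and response below are simulated stand-ins; the bone mineral density data are not reproduced here):

import numpy as np

rng = np.random.default_rng(2)
age = rng.uniform(9, 26, size=200)                                    # stand-in for the observed ages
y = np.exp(-((age - 13) / 4) ** 2) + rng.normal(scale=0.1, size=200)  # hypothetical response

# K = 2 knots at the tertiles of the observed ages give K + 1 = 3 intervals
xi = np.quantile(age, [1/3, 2/3])

# Piecewise constant basis: one indicator function per interval
edges = np.concatenate([[-np.inf], xi, [np.inf]])
H = np.column_stack([(age >= edges[k]) & (age < edges[k + 1]) for k in range(3)]).astype(float)

# Least squares: each coefficient is simply the mean of y within its interval
beta, *_ = np.linalg.lstsq(H, y, rcond=None)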
The piecewise constant model (cont'd)
The piecewise linear model
The continuous piecewise linear model
Basis functions for piecewise continuous models
• It can be easily checked that these basis functions lead
to a composite function f(x) that:
• Is linear everywhere except the knots
• Has a different intercept and slope in each region
• Is everywhere continuous
• Also, note that the degrees of freedom add up: 3
regions × 2 degrees of freedom in each region − 2
continuity constraints (one per knot) = 4 basis functions
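A standard basis with these properties uses truncated lines: h_1(x) = 1, h_2(x) = x, h_3(x) = (x − ξ_1)_+, h_4(x) = (x − ξ_2)_+, where (u)_+ = max(u, 0). A minimal Python sketch with hypothetical data:

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(9, 26, size=200)                                      # hypothetical predictor (e.g., age)
y = np.exp(-((x - 13) / 4) ** 2) + rng.normal(scale=0.1, size=200)    # hypothetical response
xi = np.quantile(x, [1/3, 2/3])                                       # K = 2 knots at the tertiles

pos = lambda u: np.maximum(u, 0.0)                                    # the truncated line (u)_+

# Four basis functions: constant, identity, and one truncated line per knot
H = np.column_stack([np.ones_like(x), x, pos(x - xi[0]), pos(x - xi[1])])

beta, *_ = np.linalg.lstsq(H, y, rcond=None)
fhat = H @ beta   # linear within each region, continuous at the knots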
Splines
• The preceding is an example of a spline: a piecewise polynomial of
degree m − 1 that is continuous up to its first m − 2 derivatives
• By requiring continuous derivatives, we ensure that the resulting
function is as smooth as possible
• We can obtain more flexible curves by increasing the degree of the
spline and/or by adding knots
• However, there is a tradeoff:
• Few knots/low degree: Resulting class of functions may be too
restrictive (bias)
• Many knots/high degree: We run the risk of overfitting (variance)
The truncated power basis
Quadratic splines
Cubic splines
Additional notes
• These types of fixed-knot models are referred to as
regression splines
• Recall that cubic splines contain 4 + K degrees of freedom: (K
+ 1) regions × 4 parameters per region − K knots × 3
constraints per knot (see the sketch after this list)
• It is claimed that cubic splines are the lowest order spline for
which the discontinuity at the knots cannot be noticed by
the human eye
• There is rarely any need to go beyond cubic splines, which
are by far the most common type of splines in practice
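For concreteness, the truncated power basis for a cubic spline (the standard construction) consists of 1, x, x^2, x^3 together with (x − ξ_k)_+^3 for each knot, giving the 4 + K degrees of freedom counted above. A minimal Python sketch with hypothetical data:

import numpy as np

def truncated_power_basis_cubic(x, knots):
    """Design matrix for a cubic regression spline in the truncated power basis."""
    pos = lambda u: np.maximum(u, 0.0)
    cols = [np.ones_like(x), x, x**2, x**3]              # global cubic polynomial: 4 columns
    cols += [pos(x - xi)**3 for xi in knots]             # one truncated cubic per knot: K columns
    return np.column_stack(cols)                         # 4 + K columns in total

rng = np.random.default_rng(4)
x = rng.uniform(9, 26, size=200)                                      # hypothetical predictor
y = np.exp(-((x - 13) / 4) ** 2) + rng.normal(scale=0.1, size=200)    # hypothetical response

H = truncated_power_basis_cubic(x, np.quantile(x, [1/3, 2/3]))
beta, *_ = np.linalg.lstsq(H, y, rcond=None)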
Implementing regression splines
• The truncated power basis has two principal virtues:
• Conceptual simplicity
• The linear model is nested inside it, leading to simple tests
of the null hypothesis of linearity
• Unfortunately, it has several computational/numerical flaws: it
is inefficient and can lead to overflow and nearly singular
design matrices
• The more complicated but numerically much more stable
and efficient B-spline basis is often employed instead
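As a sketch of what the B-spline basis looks like, the Cox-de Boor recursion below builds a cubic B-spline design matrix by hand; the data and knot placement are hypothetical, and in practice one would use a library routine such as scipy.interpolate.BSpline rather than this recursion:

import numpy as np

def bspline_basis(x, i, k, t):
    """Cox-de Boor recursion: value of the i-th B-spline of degree k on knot vector t."""
    if k == 0:
        in_last = t[i + 1] == t[-1]   # close the final interval on the right
        return ((x >= t[i]) & ((x < t[i + 1]) | (in_last & (x == t[i + 1])))).astype(float)
    left_den, right_den = t[i + k] - t[i], t[i + k + 1] - t[i + 1]
    left = 0.0 if left_den == 0 else (x - t[i]) / left_den * bspline_basis(x, i, k - 1, t)
    right = 0.0 if right_den == 0 else (t[i + k + 1] - x) / right_den * bspline_basis(x, i + 1, k - 1, t)
    return left + right

rng = np.random.default_rng(5)
x = rng.uniform(9, 26, size=200)                                      # hypothetical predictor
y = np.exp(-((x - 13) / 4) ** 2) + rng.normal(scale=0.1, size=200)    # hypothetical response

deg, interior = 3, np.quantile(x, [1/3, 2/3])                         # cubic spline, knots at the tertiles
t = np.concatenate([[x.min()] * (deg + 1), interior, [x.max()] * (deg + 1)])
n_basis = len(t) - deg - 1                                            # = 4 + K, as before

# Each column is one B-spline basis function evaluated at the data
H = np.column_stack([bspline_basis(x, i, deg, t) for i in range(n_basis)])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)                          # same fit, better-conditioned basis

Each B-spline basis function is nonzero over at most deg + 1 adjacent knot intervals, which is what makes the resulting design matrix sparse and far better conditioned than the truncated power basis.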
B-splines