15 Splines
Introduction
• We are discussing ways to estimate the regression function f,
where
E(y|x) = f(x)
• One approach is of course to assume that f has a certain
shape, such as linear or quadratic, that can be estimated
parametrically
• We have also discussed locally weighted linear/polynomial
models as a way of allowing f to be more flexible
• An alternative approach is to introduce local basis functions
Basis functions
• A common approach for extending the linear model is to augment
the linear component of x with additional, known functions of x:
f(x) = \sum_{m=1}^{M} \beta_m h_m(x)
• where the h_m are known functions called basis functions
• Because the basis functions {h_m} are prespecified and the model
is linear in these new variables, ordinary least squares approaches
for model fitting and inference can be employed (see the sketch
after this list)
• This idea is not new to you, as you have encountered
transformations and the inclusion of polynomial terms in models
in earlier courses
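To make this concrete, here is a minimal sketch in Python of fitting a basis-expansion model by ordinary least squares; the quadratic basis h_1(x) = 1, h_2(x) = x, h_3(x) = x^2 and the simulated data are illustrative assumptions, not part of the lecture example:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)                      # simulated predictor (assumption)
y = np.sin(x) + rng.normal(scale=0.3, size=100)       # simulated response (assumption)

# Design matrix whose columns are the basis functions h_m(x):
# here a simple quadratic basis h_1(x) = 1, h_2(x) = x, h_3(x) = x^2
H = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares estimate of the coefficients beta_m
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
fhat = H @ beta                                       # fitted values of f(x)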
Problems with polynomial regression
• However, polynomial terms introduce undesirable side
effects: each observation affects the entire curve, even for x
values far from the observation
• Not only does this introduce bias, but it also results in
extremely high variance near the edges of the range of x
• As Hastie et al. (2009) put it, “tweaking the coefficients to
achieve a functional form in one region can cause the
function to flap about madly in remote regions”
Problems with polynomial regression (cont'd)
To illustrate this, consider the following simulated example (gray lines
are models fit to 100 observations generated from the true f, which is
shown in red):
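The figure itself is not reproduced here, but the phenomenon can be reproduced with a short simulation; the true f, noise level, and polynomial degree below are assumptions for illustration, not the settings behind the original plot:

import numpy as np

rng = np.random.default_rng(1)
f = lambda u: np.sin(2 * np.pi * u)                   # hypothetical true regression function
grid = np.linspace(0, 1, 200)

fits = []
for _ in range(50):                                   # 50 simulated data sets
    x = rng.uniform(0, 1, size=100)
    y = f(x) + rng.normal(scale=0.5, size=100)
    coef = np.polyfit(x, y, deg=9)                    # degree-9 global polynomial fit
    fits.append(np.polyval(coef, grid))

# Pointwise variance of the fitted curves is largest near the edges of the range of x
var = np.array(fits).var(axis=0)
print(var[0], var[100], var[-1])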
Global versus local bases
• We can do better than this
• Let us consider instead local basis functions, thereby
ensuring that a given observation affects only the
nearby fit, not the fit of the entire line
• In this lecture, we will discuss splines: piecewise
polynomials joined together to make a single smooth
curve
The piecewise constant model
• To understand splines, we will gradually build up a
piecewise model, starting at the simplest one: the
piecewise constant model
• First, we partition the range of x into K + 1 intervals by
choosing K points ξ_k, k = 1, ..., K, called knots
• For our example involving bone mineral density, we
will choose the tertiles of the observed ages
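As a sketch of this construction in Python (the age variable and response below are simulated stand-ins; the bone mineral density data are not reproduced here):

import numpy as np

rng = np.random.default_rng(2)
age = rng.uniform(9, 26, size=200)                                    # stand-in for the observed ages
y = np.exp(-((age - 13) / 4) ** 2) + rng.normal(scale=0.1, size=200)  # hypothetical response

# K = 2 knots at the tertiles of the observed ages give K + 1 = 3 intervals
xi = np.quantile(age, [1/3, 2/3])

# Piecewise constant basis: one indicator function per interval
edges = np.concatenate([[-np.inf], xi, [np.inf]])
H = np.column_stack([(age >= edges[k]) & (age < edges[k + 1]) for k in range(3)]).astype(float)

# Least squares: each coefficient is simply the mean of y within its interval
beta, *_ = np.linalg.lstsq(H, y, rcond=None)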
The piecewise constant model (cont'd)
The piecewise linear model
The continuous piecewise linear model
Basis functions for piecewise continuous models
• It can be easily checked that these basis functions lead
to a composite function f(x) that:
• Is linear everywhere except the knots
• Has a different intercept and slope in each region
• Is everywhere continuous
• Also, note that the degrees of freedom add up: 3
regions × 2 degrees of freedom in each region − 2
continuity constraints (one per knot) = 4 basis functions
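A standard basis with these properties uses truncated lines: h_1(x) = 1, h_2(x) = x, h_3(x) = (x − ξ_1)_+, h_4(x) = (x − ξ_2)_+, where (u)_+ = max(u, 0). A minimal Python sketch with hypothetical data:

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(9, 26, size=200)                                      # hypothetical predictor (e.g., age)
y = np.exp(-((x - 13) / 4) ** 2) + rng.normal(scale=0.1, size=200)    # hypothetical response
xi = np.quantile(x, [1/3, 2/3])                                       # K = 2 knots at the tertiles

pos = lambda u: np.maximum(u, 0.0)                                    # the truncated line (u)_+

# Four basis functions: constant, identity, and one truncated line per knot
H = np.column_stack([np.ones_like(x), x, pos(x - xi[0]), pos(x - xi[1])])

beta, *_ = np.linalg.lstsq(H, y, rcond=None)
fhat = H @ beta   # linear within each region, continuous at the knots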
Splines
• The preceding is an example of a spline: a piecewise polynomial of
degree m − 1 that is continuous up to its first m − 2 derivatives
• By requiring continuous derivatives, we ensure that the resulting
function is as smooth as possible
• We can obtain more flexible curves by increasing the degree of the
spline and/or by adding knots
• However, there is a tradeoff:
• Few knots/low degree: Resulting class of functions may be too
restrictive (bias)
• Many knots/high degree: We run the risk of overfitting (variance)
The truncated power basis
Quadratic splines
Cubic splines
Additional notes
• These types of fixed-knot models are referred to as
regression splines
• Recall that cubic splines contain 4 + K degrees of freedom: (K
+ 1) regions × 4 parameters per region − K knots × 3
constraints per knot (see the sketch after this list)
• It is claimed that cubic splines are the lowest order spline for
which the discontinuity at the knots cannot be noticed by
the human eye
• There is rarely any need to go beyond cubic splines, which
are by far the most common type of splines in practice
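For concreteness, the truncated power basis for a cubic spline (the standard construction) consists of 1, x, x^2, x^3 together with (x − ξ_k)_+^3 for each knot, giving the 4 + K degrees of freedom counted above. A minimal Python sketch with hypothetical data:

import numpy as np

def truncated_power_basis_cubic(x, knots):
    """Design matrix for a cubic regression spline in the truncated power basis."""
    pos = lambda u: np.maximum(u, 0.0)
    cols = [np.ones_like(x), x, x**2, x**3]              # global cubic polynomial: 4 columns
    cols += [pos(x - xi)**3 for xi in knots]             # one truncated cubic per knot: K columns
    return np.column_stack(cols)                         # 4 + K columns in total

rng = np.random.default_rng(4)
x = rng.uniform(9, 26, size=200)                                      # hypothetical predictor
y = np.exp(-((x - 13) / 4) ** 2) + rng.normal(scale=0.1, size=200)    # hypothetical response

H = truncated_power_basis_cubic(x, np.quantile(x, [1/3, 2/3]))
beta, *_ = np.linalg.lstsq(H, y, rcond=None)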
Implementing regression splines
• The truncated power basis has two principal virtues:
• Conceptual simplicity
• The linear model is nested inside it, leading to simple tests
of the null hypothesis of linearity
• Unfortunately, it has several computational/numerical flaws: it
is inefficient and can lead to overflow and nearly singular
design matrices
• The more complicated but numerically much more stable
and efficient B-spline basis is often employed instead
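As a sketch of what the B-spline basis looks like, the Cox-de Boor recursion below builds a cubic B-spline design matrix by hand; the data and knot placement are hypothetical, and in practice one would use a library routine such as scipy.interpolate.BSpline rather than this recursion:

import numpy as np

def bspline_basis(x, i, k, t):
    """Cox-de Boor recursion: value of the i-th B-spline of degree k on knot vector t."""
    if k == 0:
        in_last = t[i + 1] == t[-1]   # close the final interval on the right
        return ((x >= t[i]) & ((x < t[i + 1]) | (in_last & (x == t[i + 1])))).astype(float)
    left_den, right_den = t[i + k] - t[i], t[i + k + 1] - t[i + 1]
    left = 0.0 if left_den == 0 else (x - t[i]) / left_den * bspline_basis(x, i, k - 1, t)
    right = 0.0 if right_den == 0 else (t[i + k + 1] - x) / right_den * bspline_basis(x, i + 1, k - 1, t)
    return left + right

rng = np.random.default_rng(5)
x = rng.uniform(9, 26, size=200)                                      # hypothetical predictor
y = np.exp(-((x - 13) / 4) ** 2) + rng.normal(scale=0.1, size=200)    # hypothetical response

deg, interior = 3, np.quantile(x, [1/3, 2/3])                         # cubic spline, knots at the tertiles
t = np.concatenate([[x.min()] * (deg + 1), interior, [x.max()] * (deg + 1)])
n_basis = len(t) - deg - 1                                            # = 4 + K, as before

# Each column is one B-spline basis function evaluated at the data
H = np.column_stack([bspline_basis(x, i, deg, t) for i in range(n_basis)])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)                          # same fit, better-conditioned basis

Each B-spline basis function is nonzero over at most deg + 1 adjacent knot intervals, which is what makes the resulting design matrix sparse and far better conditioned than the truncated power basis.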
B-splines