Functional Regression Handout
Fabian Scheipl
Institut für Statistik
Ludwig-Maximilians-Universität München
2 / 398
Who are you?
3 / 398
Credits
Slides and material in the following are based, amongst others, on slides and
figures by:
4 / 398
Global Outline
5 / 398
Part I
6 / 398
Introduction
Summary
Introduction
Overview
From high-dimensional to functional data
Summary
8 / 398
Introduction
Overview
Examples of functional data: Berkeley growth study
[Figure: Berkeley growth study — Height vs. Age (years)]
8 / 398
Introduction
Overview
Examples of functional data: Handwriting
[Figure: handwriting sample — coordinate y(t) over time]
9 / 398
Introduction
Overview
Examples of functional data: Brain scan images
10 / 398
Introduction
Overview
Characteristics of functional data:
[Figures: growth curves (Height) and handwriting coordinates (y(t)), shown again as examples]
I Several measurements for the same statistical unit, often over time
I Sampling grid is not necessarily equally spaced; data may be sparse
I Smooth variation that could (in principle) be assessed as often as desired
I Noisy observations
I Many observations of the same data-generating process
(↔ time series analysis)
J. Ramsay and Silverman 2005
11 / 398
Introduction
Overview
Scalar-on-Function: yi = µ + ∫ xi (s) β(s) ds + εi
12 / 398
Introduction
From high-dimensional to functional data
[Data matrix: n observations (rows) × p variables (columns)]
13 / 398
Introduction
From high-dimensional to functional data
Data with natural ordering: observations on a grid t1 , t2 , t3 , . . . , tp
I Longitudinal data
I Ordering along time domain (one-dimensional)
Functional data: observations on a grid t1 , t2 , t3 , . . . , tp in a continuous domain T
15 / 398
Introduction
Summary
16 / 398
Descriptive Statistics for Functional Data
Pointwise measures
Example: Growth curves of 54 girls
[Figure: growth curves of the 54 girls — height vs. age (years)]
Summary Statistics:
I Based on observed functions x1 (t), . . . , xn (t)
I Characterize location, variability, dependence between time points, ...
16 / 398
Descriptive Statistics for Functional Data
Pointwise measures
Example: Growth curves of 54 girls
[Figure: pointwise summaries of the growth curves — mean height (cm), centered curves, pointwise variance (cm²) and standard deviation (cm), each as a function of age (years)]
18 / 398
Descriptive Statistics for Functional Data
Covariance and Correlation Functions
ĉX (s, t) = v̂X (s, t) / √( σ̂X²(s) σ̂X²(t) )
19 / 398
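A minimal R sketch of these pointwise estimates, assuming the sample of curves is stored in an n × p matrix X (rows = curves, columns = grid points); the matrix and its dimensions here are purely illustrative:

X <- matrix(rnorm(54 * 31), nrow = 54, ncol = 31)   # placeholder for, e.g., growth curves
mu_hat <- colMeans(X)                               # pointwise mean function
Xc     <- sweep(X, 2, mu_hat)                       # centered curves
v_hat  <- crossprod(Xc) / (nrow(X) - 1)             # covariance function v(s, t) on the grid
c_hat  <- cov2cor(v_hat)                            # correlation function c(s, t)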
Descriptive Statistics for Functional Data
Covariance and Correlation Functions
[Figure: estimated covariance surface v̂X (s, t) of the growth curves (height, cm²) over age (years), shown as a perspective plot and a contour plot]
20 / 398
Descriptive Statistics for Functional Data
Covariance and Correlation Functions
[Figure: estimated auto-correlation surface ĉX (s, t) of the growth curves over age (years), with values roughly between 0.55 and 0.95]
21 / 398
Introduction
Summary
22 / 398
Basis Representation of Functional Data
Regularly and irregularly sampled functional data
[Figure: one sampled curve xi (t) on t ∈ [0, 40] with its table of sampling points t1 , . . . , tp and values xi (t1 ), . . . , xi (tp )]
22 / 398
Basis Representation of Functional Data
Regularly and irregularly sampled functional data
Example: bacterial growth curves — a sample of curves x1 (t), . . . , xN (t), each observed on t ∈ [0, 40].
Observed measurements in ’wide format’ (regular functional data, common grid t1 , . . . , tp ):
t1 : x1 (t1 ) x2 (t1 ) . . . xN (t1 )
...
tp : x1 (tp ) x2 (tp ) . . . xN (tp )
⇒ Irregular functional data: each curve i observed on its own grid ti,1 , . . . , ti,pi , e.g.
t1,1 : x1 (t1,1 ), . . . , t1,p1 : x1 (t1,p1 ), . . . , tN,1 : xN (tN,1 ), . . . , tN,pN : xN (tN,pN )
I functions observed on different time points
I sometimes only sparsely sampled
I more difficult, but often given in practice
24 / 398
Basis Representation of Functional Data
Basis functions
[Figure: a set of basis functions bk (t), k = 1, . . . , K , on t ∈ [0, 40]]
25 / 398
Basis Representation of Functional Data
Basis functions
[Figure: basis functions b1 (t), . . . , bK (t) on t ∈ [0, 40] with coefficients θ1 , . . . , θK ]
Function given by
f (t) = Σ_{k=1}^{K} θk bk (t)
26 / 398
Basis Representation of Functional Data
Basis representations for functional data
Basis representation: approximate the data with basis functions, f (t) = Σk θi,k bk (t)
⇒ seek to specify θ̂i,1 , . . . , θ̂i,K such that
xi (t) ≈ Σ_{k=1}^{K} θ̂i,k bk (t).
⇒ Popular criterion:
Specify θ̂i,1 , . . . , θ̂i,K such that the quadratic distance to the data becomes minimal, i.e.
Σ_{j=1}^{q} ( xi (tj ) − Σ_{k=1}^{K} θi,k bk (tj ) )²  −→  min over θi,k
27 / 398
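A minimal R sketch of this least-squares criterion for a single curve, using a B-spline basis; the grid, the simulated curve and the number of basis functions are illustrative choices:

library(splines)
t_grid <- seq(0, 40, length.out = 50)
x_i    <- sin(t_grid / 6) + rnorm(50, sd = 0.1)        # noisy observed curve x_i(t_j)
B      <- bs(t_grid, df = 10, intercept = TRUE)        # basis function evaluations b_k(t_j)
theta_hat <- solve(crossprod(B), crossprod(B, x_i))    # minimizes the quadratic distance above
x_smooth  <- drop(B %*% theta_hat)                     # fitted curve at the grid points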
Basis Representation of Functional Data
Basis representations for functional data
Estimated coefficients for all curves (rows k = 1, . . . , K , columns i = 1, . . . , N):
k = 1 : θ̂1,1 θ̂2,1 . . . θ̂N,1
...
k = K : θ̂1,K θ̂2,K . . . θ̂N,K
Functional observations represented as xi (t) ≈ Σ_{k=1}^{K} θ̂i,k bk (t).
28 / 398
Basis Representation of Functional Data
Most popular choices of basis functions
B-spline basis of degree 3:
I cheap to compute & numerically stable
I local support: sparse matrix of basis function evaluations
[Figure: cubic B-spline basis functions on t ∈ [0, 40]]
29 / 398
Basis Representation of Functional Data
Most popular choices of basis functions
30 / 398
Basis Representation of Functional Data
Smoothness and regularization
31 / 398
Basis Representation of Functional Data
Smoothness and regularization
Penalization:
I minimize quadratic difference from data
+ a roughness penalty term
Specify θ̂i,1 , . . . , θ̂i,K to minimize
Σ_{j=1}^{p} ( xi (tj ) − Σ_{k=1}^{K} θi,k bk (tj ) )² + λ pen(θi )  −→  min over θi,k
I with, e.g., a quadratic penalty on second order differences, i.e.
pen(θi ) = Σ_{k=3}^{K} ((θi,k − θi,k−1 ) − (θi,k−1 − θi,k−2 ))² and λ > 0 a smoothing parameter
32 / 398
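A minimal sketch of this penalized criterion in R, reusing B and x_i from the previous sketch and building the second-order difference penalty explicitly; the value of λ is arbitrary:

D2 <- diff(diag(ncol(B)), differences = 2)         # second-order difference matrix
P  <- crossprod(D2)                                # penalty matrix: pen(theta) = theta' P theta
lambda    <- 1
theta_pen <- solve(crossprod(B) + lambda * P, crossprod(B, x_i))
x_pen     <- drop(B %*% theta_pen)                 # smoothed curve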
Basis Representation of Functional Data
Smoothness and regularization
[Figure: fits of the same curve with λ = 0 and with λ = 1, y vs. t ∈ [0, 40]]
33 / 398
Basis Representation of Functional Data
Other representations of functional data
34 / 398
Introduction
Summary
35 / 398
Summary
Functional Data:
I Arises in many different contexts and in many applications (curves,
images,...)
I Observation unit represents the full curve, typically discretized, i.e.
observed on a grid
I Important analysis techniques:
I Smoothing and basis representation
I Functional principal component analysis
I Functional regression
Summary Statistics:
I Give insights into location, variability and time dependence in a
sample of curves
I Pointwise calculation, mostly analogous to multivariate case
35 / 398
Summary
Basis representation:
I Different types of raw functional data: regularly and irregularly
sampled
I (Approximate) representation via bases of functions
I ’true functional representation’
I smoothing / vector representation
I Represent a functional datum in terms of a global, fixed, known
dictionary of basis functions and an observation-specific coefficient
vector.
I Different types of basis functions for different purposes / applications
I Obtain desired ’smoothness’ via penalization
36 / 398
Part II
Background: Regression
37 / 398
Recap: Linear Models
39 / 398
Data & Model
Data:
I (yi , xi1 , . . . , xip ); i = 1, . . . , n
I metric target variable y
I metric or categorical covariates x1 , . . . , xp (categorical data in binary
coding)
Model:
I yi = β0 + β1 xi1 + · · · + βp xip + εi ; i = 1, . . . , n
⇒ y = Xβ + ε; X = [1, x1 , . . . , xp ]
I i. i. d. residuals/errors εi ∼ N(0, σ 2 ); i = 1, . . . , n
I estimates ŷi = β̂0 + β̂1 xi1 + · · · + β̂p xip
39 / 398
Interpreting the coefficients
Intercept:
β̂0 : estimate for y if all metric x = 0 and all categorical x are in their reference category.
metric covariates:
β̂m : estimated expected change in y if xm increases by 1 (ceteris
paribus).
categorical covariates: (dummy-/one-hot-encoding)
β̂mc : estimated expected difference in y between observations in
category c and the reference category of xm (ceteris paribus).
40 / 398
Linear Model Estimation
41 / 398
Properties of β̂
I unbiased: E(β̂) = β
I Cov(β̂) = σ 2 (X> X)−1
for Gaussian ε:
β̂ ∼ N(β, σ 2 (X> X)−1 )
42 / 398
Tests
Possible settings:
1. Testing for significance of a single coefficient:
H0 : βj = 0 vs HA : βj ≠ 0
2. Testing for significance of a subvector βt = (βt1 , . . . , βtr )⊤ :
H0 : βt = 0 vs HA : βt ≠ 0
3. Testing for equality: H0 : βj − βr = 0 vs HA : βj − βr ≠ 0
General:
Testing linear hypotheses H0 : Cβ = d
43 / 398
Tests
F-Test:
Compare sum of squared errors (SSE) of full model with SSE under
restriction H0 :
F = ((n − p)/r) · (SSE_H0 − SSE)/SSE
  = (Cβ̂ − d)⊤ [ σ̂² C(X⊤X)⁻¹C⊤ ]⁻¹ (Cβ̂ − d) / r   ∼ F (r , n − p) under H0
t-Test:
Test significance of a single coefficient:
t = β̂j / √( V̂ar(β̂j ) )   ∼ t(n − p) under H0
F = t² = β̂j² / V̂ar(β̂j )   ∼ F (1, n − p) under H0
44 / 398
Residuals in the linear model
45 / 398
Types of Residuals
46 / 398
Graphical model checks:
47 / 398
Linear Model in R:
48 / 398
Example: Munich Rents 1999
49 / 398
Model in R
no interaction:
y = β0 + β1 ∗ x1 + β2 ∗ x2.2 + β3 ∗ x2.3
miet1 <- lm(rentsqm ~ size + area)
(beta.miet1 <- coef(miet1))
with interaction:
y = β0 + β1 ∗ x1 + β2 ∗ x2.2 + β3 ∗ x2.3 + β4 x1 x2.2 + β5 x1 x2.3
miet2 <- lm(rentsqm ~ size * area)
(beta.miet2 <- coef(miet2))
[Figure: net rent (DM/sqm) vs. flat size with fitted lines from miet1 and miet2, coloured by area (normal, good, best)]
round(anova(miet2), 3)
anova(miet1, miet2)
[Figure: plot.lm() diagnostics for the rent model — Residuals vs Fitted, Normal Q-Q (standardized residuals vs theoretical quantiles), Scale-Location, and Residuals vs Leverage with Cook’s distance contours — followed by a plot of the residuals ε̂]
I y = Xβ + ε
I Gaussian errors: ε ∼ N(0, σ 2 I)
⇒ y ∼ N(Xβ, σ 2 I)
⇒ E(y) = Xβ; Var(y) = σ 2 I
58 / 398
Recap: Linear Models
59 / 398
Binary Target: Naive Approach
Data:
I binary target y (0 or 1)
I metric and/or categorical x1 , . . . , xp
naive estimates:
ŷi = β̂0 + β̂1 xi1 + · · · + β̂p xip
I ŷi not binary
I could try to interpret ŷi as P̂(yi = 1)
I no variance homogeneity
I ŷi < 0 ? ŷi > 1? ⇒ ŷi must be between 0 and 1
Idea:
P̂(yi = 1) = h(xi⊤β̂) with h : (−∞, +∞) → [0, 1]
59 / 398
Binary Target: GLM Approach
I yi ∼ B(1, πi )
I model for E(yi ) = P(yi = 1) = πi
I use response function h: π̂i = h(xi⊤β̂)
or link function g : g (π̂i ) = xi⊤β̂, where g (·) = h⁻¹(·)
Logit model:
π̂i = h(xi⊤β̂) = exp(xi⊤β̂) / (1 + exp(xi⊤β̂))
60 / 398
Binary Target: Coefficients of the Logit Model
πi = exp(xi⊤β) / (1 + exp(xi⊤β))  ⇔  log( πi / (1 − πi ) ) = xi⊤β
⇔  πi / (1 − πi ) = exp(β0 ) exp(β1 xi1 ) . . . exp(βp xip )
61 / 398
Binary Target: Probit- & cloglog-Models
Probit-Model:
use the standard Gaussian CDF as response function:
π̂i = Φ(xi⊤β̂)
cloglog-Model:
response function:
π̂i = 1 − exp(− exp(xi⊤β̂))
62 / 398
Binary Targets: Expectation and Variance
63 / 398
Example: Patent Injunctions
64 / 398
Binary Target: R-Implementation
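The call that produced the odds-ratio table below is not shown; one plausible minimal way to obtain such a table — the data frame name patent and the exact model specification are assumptions, only the covariate names follow the output:

logit_mod <- glm(einspruch ~ uszw + jahr + azit + aland + ansp + branche + herkunft,
                 family = binomial(link = "logit"), data = patent)
round(exp(cbind(coef(logit_mod), confint(logit_mod))), 3)   # odds ratios with 95% CIs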
## 2.5 % 97.5 %
## (Intercept) 0.462 0.355 0.601
## uszwUSPatent 0.676 0.592 0.772
## jahr 0.931 0.915 0.947
## azit 1.125 1.095 1.157
## aland 1.088 1.066 1.111
## ansp 1.018 1.011 1.025
## brancheBioPharma 1.975 1.676 2.328
## herkunftD/CH/GB 1.381 1.174 1.625
## herkunftUS 0.859 0.741 0.997
## estimated
## einspruch 0 1
## nein 2223 624
## ja 925 1094
Count Data as Targets
Data:
I positive, whole-number target y (counts, frequencies)
I metric and/or categorical x1 , . . . , xp
⇒ naive estimates Ê(yi ) = xi⊤β̂ could become negative
⇒ model log(Ê(yi )) instead, i.e.,
Ê(yi ) = exp(xi⊤β̂) = exp(β̂0 ) exp(β̂1 xi1 ) . . . exp(β̂p xip )
67 / 398
Count Data as Targets: log-linear Model
Distributional assumption:
I yi |xi ∼ Po(λi ); λi = exp(xi⊤β)
⇒ E(yi ) = Var(yi ) = λi
Overdispersion:
I Frequently Var(yi ) ≠ λi :
⇒ more flexible model with dispersion parameter φ:
Var(yi ) = φλi
⇒ alternative distributions: Tweedie, Negative Binomial
68 / 398
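A minimal sketch of a log-linear count model with and without a dispersion parameter; the response and data frame names (ncit, patent) are hypothetical placeholders for the patent-citation example that follows:

pois_mod  <- glm(ncit ~ jahr + azit + aland + ansp, family = poisson,      data = patent)
quasi_mod <- glm(ncit ~ jahr + azit + aland + ansp, family = quasipoisson, data = patent)
summary(quasi_mod)$dispersion   # estimated dispersion parameter phi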
Example: Patent Citations
## df AIC
## pat2 9 21021.23
## pat3 10 16341.48
round(cbind(
summary(pat2)$coefficients[2:5, -c(3, 4)],
summary(pat3)$coefficients[2:5, -c(3, 4)]
), 3)
I logit regression: E(yi |xi ) = P(yi = 1|xi ) = exp(xi⊤β) / (1 + exp(xi⊤β))
I log-linear model: E(yi |xi ) = exp(xi⊤β)
I Distributional assumption: Given independent (xi , yi ) with exponential family density f (yi ):
⇒ f (yi |θi ) = exp( (yi θi − b(θi ))/φ · ωi − c(yi , φ, ωi ) );  θi = θ(µi )
I E(yi |xi ) = µi = b′(θi ) = h(xi⊤β)
I Var(yi |xi ) = φ b″(θi )/ωi ;  ωi = ni
i i i i i i
70 / 398
Simple Exponential Families
Distribution | θ(µ) | b(θ) | φ
Normal N(µ, σ²) | µ | θ²/2 | σ²
Bernoulli B(1, µ) | log(µ/(1 − µ)) | log(1 + exp(θ)) | 1
Poisson Po(µ) | log(µ) | exp(θ) | 1
Gamma G(µ, ν) | −1/µ | − log(−θ) | 1/ν
Inverse Gaussian IG(µ, σ²) | 1/µ² | −√(−2θ) | σ²
71 / 398
Simple Exponential Families
Distribution | µ = h(θ) | b″(θ) | Var(yi |xi )
Normal | µ = θ | 1 | σ²/ω
Bernoulli | µ = exp(θ)/(1 + exp(θ)) | µ(1 − µ) | µ(1 − µ)/ω
Poisson | µ = exp(θ) | µ | µ/ω
Gamma | µ = −1/θ | µ² | µ²/(νω)
Inverse Gaussian | µ = 1/√(−2θ) | µ³ | µ³σ²/ω
72 / 398
R-Implementation: glm()
73 / 398
Advantages of GLM-Formulation
74 / 398
Recent Extensions:
75 / 398
ML Estimation: Idea
I OLS estimate in the linear model: Σ_{i=1}^{n} (yi − xi⊤β)² → min
I density for y: Π_{i=1}^{n} f (yi |β, xi ) = (√(2π) σ)⁻ⁿ exp( − Σ_{i=1}^{n} (yi − xi⊤β)² / (2σ²) )
76 / 398
ML Estimation: Procedure
I log-likelihood: l(β) = Σ_{i=1}^{n} log f (yi |β, xi )
I score function: s(β) = ∂l(β)/∂β
I (iterative) solution of s(β) = 0 via Fisher scoring or IWLS
77 / 398
ML Estimation: Fisher-Scoring
I basically the Newton method:
β⁽ᵏ⁺¹⁾ = β⁽ᵏ⁾ − ( ∂s(β)/∂β⊤ )⁻¹ s(β)
[Figure: Newton’s method — iterates β⁽⁰⁾, β⁽¹⁾, β⁽²⁾ approaching the root of the score function s(β)]
78 / 398
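A minimal sketch of Fisher scoring / IWLS for a logit model on simulated data, to make the iteration concrete; all names, the simulated data and the convergence tolerance are illustrative:

set.seed(1)
X <- cbind(1, rnorm(100))                          # design matrix with intercept
y <- rbinom(100, 1, plogis(X %*% c(-0.5, 1)))      # simulated binary response
beta <- rep(0, ncol(X))
for (it in 1:25) {
  eta <- drop(X %*% beta)
  mu  <- plogis(eta)                               # h(eta)
  W   <- mu * (1 - mu)                             # working weights
  z   <- eta + (y - mu) / W                        # working response
  beta_new <- drop(solve(crossprod(X, W * X), crossprod(X, W * z)))
  conv <- max(abs(beta_new - beta)) < 1e-8
  beta <- beta_new
  if (conv) break
}
beta   # close to coef(glm(y ~ X[, 2], family = binomial))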
ML Estimation: Fisher-Scoring & IWLS
79 / 398
Properties of ML Estimators
80 / 398
Tests
Linear hypotheses H0 : Cβ = d vs HA : Cβ ≠ d
Estimate of β under the restriction H0 : β̃
I LR test: lq = −2(l(β̃) − l(β̂))
I Wald test: w = (Cβ̂ − d)⊤ [ C F⁻¹(β̂) C⊤ ]⁻¹ (Cβ̂ − d)
I Score test: u = s(β̃)⊤ F⁻¹(β̃) s(β̃)
Under H0 : lq, w , u are asymptotically χ²r -distributed, r = rank(C) (no. of restrictions)
⇒ reject H0 if lq, w , u > χ²r (1 − α).
81 / 398
Tests in R
summary.glm uses √w, which is asymptotically N(0, 1), for H0 : βj = 0:
round(summary(pat2)$coefficients[8:9, ], 3)
83 / 398
Model Diagnostics: Residuals
84 / 398
Model validation: plot.glm()
[Figure: plot.glm() diagnostic plots for the patent model, including Cook’s distance; labelled observations: 1871, 2796, 3939, 4290, 4743]
86 / 398
Motivation
[Figure: residual plots (ε̂) showing systematic structure, motivating the covariate transformations discussed next]
Simple Transformation
88 / 398
Polynomial Transformation
I Polynomial Model
I y = f (x) + ε = β0 + β1 x + β2 x² + · · · + βl x^l + ε
I In R: Use poly(x,degree) to avoid collinearity
89 / 398
Polynomial Transformation: Collinearity
x <- seq(0, 1, l = 200)
X <- outer(x, 1:5, "^")
X.c <- poly(x, 5)
round(cor(X), 2)
round(cor(X.c), 2)
## 1 2 3 4 5
## 1 1 0 0 0 0
## 2 0 1 0 0 0
## 3 0 0 1 0 0
## 4 0 0 0 1 0
## 5 0 0 0 0 1
91 / 398
Polynomial Transformation: Synthetic example
[Figure: synthetic example — scatterplot of y against x]
92 / 398
Piecewise Polynomials
93 / 398
Piecewise Polynomials
[Figure: synthetic data with the true function f , a global polynomial of degree 15, and 5 piecewise quadratic polynomials]
95 / 398
Polynomial Splines: Example
[Figure: polynomial spline fits of degree 0, degree 1, and higher degrees to the synthetic data; y vs. x ∈ [0, 1]]
97 / 398
Truncated Polynomials
98 / 398
Truncated Polynomials: Example
[Figure: truncated polynomial basis functions and the resulting fit to the synthetic data]
99 / 398
Truncated Polynomials: Discussion
Numerical disadvantages
I Basis function values can become very large
I strong collinearity of basis functions
⇒ numerically preferable: B-spline-basis functions
100 / 398
B-splines: Idea
101 / 398
B-Splines: Basis Functions
[Figure: B-spline basis functions of degree l = 0, B(x) plotted over the knots κ1 , . . . , κ11 ]
103 / 398
(B-)Splines as Linear Models
Model: y = f (x) + ε
How to estimate f (x)?
I define basis functions bk (x); k = 1, . . . , K
I f (x) ≈ Σ_{k=1}^{K} θk bk (x)
⇒ ŷ = f̂ (x) = Σ_{k=1}^{K} θ̂k bk (x)
⇒ this is a linear model ŷ = Bθ̂ with design matrix
B = [ b1 (x1 ) . . . bK (x1 ) ; . . . ; b1 (xn ) . . . bK (xn ) ]
I analogously applicable to GLMs: g (µ̂) = Bθ̂
104 / 398
B-Splines: R-Implementation
bs in the splines package creates a B-spline design matrix B:
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = 1, lwd = 2)
[Figure: data (grey) with the unscaled B-spline basis functions overlaid]
B-Splines: R-Implementation
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = scales::alpha(1, .7), lwd = .5)
matlines(x, B_scaled, lty = 1, col = 2, lwd = 2)
[Figure: data (grey) with unscaled (black) and coefficient-scaled (red) B-spline basis functions overlaid]
B-Splines: R-Implementation
library("splines")
B <- bs(x, df = 12, intercept = T)
m_bspline <- lm(y ~ B - 1)
B_scaled <- t(t(B) * coef(m_bspline))
plot(x, y, pch = 19, cex = .5, col = "grey")
matlines(x, B, lty = 1, col = scales::alpha(1, .7), lwd = .5)
matlines(x, B_scaled, lty = 1, col = scales::alpha(2, .7), lwd = 1)
lines(x, fitted(m_bspline), lty = 1, col = 3, lwd = 2)
[Figure: data (grey) with scaled basis functions (red) and the fitted B-spline curve (green) overlaid]
Splines: Summary
108 / 398
Recap: Linear Models
109 / 398
Example: Sleep Deprivation Data
109 / 398
Example: Sleep Deprivation Data
110 / 398
Example: Sleep Deprivation Data
[Figure: average reaction time (ms) vs. days of sleep deprivation, one panel per subject]
Example: Sleep Deprivation Data
Model global trend: Reactionij ≈ β0 + β1 Daysij
##
## Call:
## lm(formula = Reaction ~ Days, data = sleepstudy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -110.848 -27.483 1.546 26.142 139.953
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 251.405 6.610 38.033 < 2e-16 ***
## Days 10.467 1.238 8.454 9.89e-15 ***
## ---
## Signif. codes:
## 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 47.71 on 178 degrees of freedom
## Multiple R-squared: 0.2865, Adjusted R-squared: 0.2825
## F-statistic: 71.46 on 1 and 178 DF, p-value: 9.894e-15
Example: Sleep Deprivation Data
With estimated global level and trend added:
[Figure: per-subject reaction time data (average reaction time [ms] vs. days of sleep deprivation) with the single estimated global trend line overlaid in each panel]
⇒ obviously inappropriate model
Example: Sleep Deprivation Data
[Figure: per-subject reaction time data with subject-specific fits overlaid]
⇒ better fit
Motivation: From LM to LMM
116 / 398
Motivation: From LM to LMM
yij = β̄0 + (β0i − β̄0 ) + β̄1 x1ij + (β1i − β̄1 )x1ij + εij
117 / 398
Motivation: From LM to LMM
I idea of a random effect model
I β̄ is the population level effect β.
I express subject-specific deviations βi − β̄ as Gaussian random variables
bi ∼ N(0, σb2 ).
I this yields
yij = β0 + β1 x1ij + b0i + b1i x1ij + εij
with
εij ∼ N(0, σ 2 ), (b0i , b1i )> ∼ N2 (0, Σ).
I or alternatively:
yij = b0i + b1i x1ij + εij
with
εij ∼ N(0, σ²), (b0i , b1i )⊤ ∼ N2 ((β0 , β1 )⊤ , Σ).
118 / 398
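A minimal sketch of this random-intercept and random-slope model in R, using lme4 and its built-in sleepstudy data (the same data as in the example above):

library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fm)   # fixed effects (beta_0, beta_1) and variance components (Sigma, sigma^2)
ranef(fm)     # predicted subject-specific deviations (b_0i, b_1i)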
Partial Pooling
119 / 398
Advantages of the Random Effects Approach
I decomposition of random variability in data into
I subject-specific deviations from population mean
I deviation of observations from subject means
⇒ more precise estimates of population trends
I some degree of protection against bias caused by drop-out
I random effects serve as surrogates for effects of unobserved
subject-level covariates
⇒ control for unobserved heterogeneity
I distributional assumption bi ∼ N stabilizes estimates b̂i (shrinkage
effect) compared to fixed subject-specific estimates β̂i without
distributional assumption
I intuition: estimates are stabilized by including prior knowledge in the
model, i.e., assuming that subjects from the population are mostly
similar to each other
120 / 398
Advantages of the Random Effects Approach
I random effects model the correlation structure between observations:
yij = β0 + bi + εij with bi i.i.d. ∼ N(0, σb²); εij i.i.d. ∼ N(0, σε²)
⇒ Corr(yij , yij′ ) = Cov(bi , bi ) / √( Var(yij ) Var(yij′ ) ) = σb² / (σb² + σε²)
In matrix notation, the general linear mixed model is
y = Xβ + Ub + ε,  b ∼ N(0, G),  ε ∼ N(0, R)
122 / 398
Conditional and Marginal Perspective
Conditional perspective:
Interpretation:
random effects b are subject-specific effects that vary across the
population.
Hierarchical formulation:
expected response is a function of population-level effects (fixed effects)
and subject-level effects (random effects).
123 / 398
Conditional and Marginal Perspective
Marginal perspective:
Interpretation:
random effects b induce a correlation structure in y defined by U and G,
and thereby allow valid analyses of correlated data.
Marginal formulation:
model is concerned with the marginal expectation of y averaged over the
population as a function of population-level effects.
124 / 398
Linear Mixed Model for Longitudinal Data
For subjects i = 1, . . . , m, each with observations j = 1, . . . , ni :
y = Xβ + Ub + ε
with
I y = (y1⊤ , y2⊤ , . . . , ym⊤ )⊤ (n = Σ_{i=1}^{m} ni entries)
I ε = (ε1⊤ , ε2⊤ , . . . , εm⊤ )⊤ (n entries)
I β = (β0 , β1 , . . . , βp )⊤
I X = [1 x1 . . . xp ]
I b = (b1⊤ , b2⊤ , . . . , bm⊤ )⊤ of length mq, with b ∼ Nmq (0, G)
I G = diag(Σ, . . . , Σ)
I U = diag(U1 , . . . , Um ) with dimension n × mq
I Ui = [1 u1i . . . u(q−1)i ] with dimension ni × q. Variables in Ui are typically a subset of those in X.
125 / 398
Other Types of Mixed Models
I hierarchical/multi-level model:
e.g., test score yijk of pupil i in class j in school k:
yijk = β0 + xijk⊤β + b1j + b2k + εijk
with random intercepts for class (b1j ∼ N(0, σ1²)) and school (b2k ∼ N(0, σ2²))
I crossed designs:
e.g., score yij of subject i on item j:
yij = β0 + xij⊤β + b1i + b2j + εij
with random intercepts for subject (b1i ∼ N(0, σ1²), subject ability) and item (b2j ∼ N(0, σ2²), item difficulty)
126 / 398
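Minimal lme4 model formulas for the two designs above; the data frames and variable names (scores, responses, score, x, school, class, subject, item) are hypothetical:

library(lme4)
## hierarchical / multi-level: pupils within classes within schools
m_nested  <- lmer(score ~ x + (1 | school/class), data = scores)
## crossed design: subjects crossed with items
m_crossed <- lmer(score ~ x + (1 | subject) + (1 | item), data = responses)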
Likelihood-Based Estimation of Linear Mixed Models
ML Estimation
I determine ϑ̂_ML so that the profile likelihood in V of the marginal model y ∼ N(Xβ, V(ϑ)) is maximal:
l(β, ϑ) = −(1/2) { log |V(ϑ)| + (y − Xβ)⊤ V(ϑ)⁻¹ (y − Xβ) }
β̂(ϑ) = arg max_β l(β, ϑ) = ( X⊤V(ϑ)⁻¹X )⁻¹ X⊤V(ϑ)⁻¹ y
lP (ϑ) = −(1/2) { log |V(ϑ)| + (y − Xβ̂(ϑ))⊤ V(ϑ)⁻¹ (y − Xβ̂(ϑ)) }  → max over ϑ
128 / 398
Generalized Linear Mixed Models
129 / 398
Generalized Linear Mixed Models
Model:
130 / 398
Caveat: Effect Attenuation in GLMMs
[Figure: h(xi β + b0i ) over x ∈ [−2, 2] for several random intercepts, in an LMM (left) and a logit GLMM (right)]
For random-intercept logit models: βmar ≈ βcond / √(1 + 0.346 σb²)
131 / 398
GLMM Estimation
LMM estimation exploits the analytically accessible marginal likelihood:
L(β, ϑ, φ) = ∫ L(b, β, φ, ϑ) db,
which is the density of y ∼ N(Xβ, V(ϑ)).
For GLMMs:
L(β, ϑ, φ) = ∫ ( Π_{i=1}^{n} f (yi |β, φ, b, ϑ) ) f (b|ϑ) db
has no closed form (...sucks)
132 / 398
GLMM Estimation Algorithms
133 / 398
Mixed Models in a Nutshell
I standard regression models can model only the structure of the
expected values of the response
I mixed models are regression models in which a subset of coefficients
are assumed to be random unknown quantities from a known
distribution instead of fixed unknowns, and this means we can
I model the covariance structure of the data (marginal perspective)
I estimate (a large number of) subject-level coefficients without too
much trouble (conditional perspective)
I random intercepts can be used to model subject-specific differences in
the level of the response
→ grouping variable as a special kind of nominal covariate
I a random slope for a covariate is like an interaction between the
grouping variable and that covariate
→ grouping variable as a special kind of effect modifier for that
covariate
I hard estimation problems: variance components difficult to optimize,
often very high-dim. b
134 / 398
Recap: Linear Models
135 / 398
Splines
I Splines
I piecewise polynomials with smoothness properties at knot locations
I can be embedded into (generalized) linear models (e.g. ML estimates)
I Problem: choice of optimal knot setting.
I Two-fold problem:
I how many knots?
I where to put them?
I two possible solutions, one of them good:
I adaptive knot choice: make no. of knots and their positioning part of
optimization procedure
I penalization: use large number of knots to guarantee sufficient model
capacity, but add a cost (penalty) for wiggliness / complexity to
optimization procedure
135 / 398
Function estimation example: climate reconstruction
[Figure: climate reconstruction data (temperature) over time with smooth function estimates]
136 / 398
Sensitivity to number of basis functions
[Figure: spline fits to the climate series using 5 basis functions, 10 basis functions, and more; x-axis 0–2000]
137 / 398
Penalized ML-Estimation
138 / 398
Penalties
I Frequently used: pen(f ) = ∫ (f ″(z))² dz.
I For a basis representation f = Bθ this reduces to a quadratic form pen(f ) = θ⊤Pθ, with a known penalty matrix P.
140 / 398
Penalized LS-Estimation
θ̂ = (B⊤B + λP)⁻¹ B⊤y.
141 / 398
Penalization as Prior
142 / 398
Penalization as Prior
Another perspective:
I penalty/prior encodes assumption about likely differences between
coefficients of adjacent basis functions
I random walk prior: e.g., (θj+1 − θj ) ∼ N(0, λ2 )
I obvious question: what about the first d coefficients, for d-th
differences?
143 / 398
Penalization as Prior
144 / 398
Penalization as Prior
145 / 398
Mixed Model Decomposition of Penalized Terms
Decompose a regularized effect into its penalized (“random”) and
unpenalized (“fixed”) components for better numerical stability & direct
applicability of mixed model algorithms:
Let h = rank(P) and p = dim(θ). Decompose
θ = X̃β + Z̃b,
where X̃ is p × (p − h), β is (p − h) × 1, Z̃ is p × h, and b is h × 1.
147 / 398
Influence of λ (1st differences)
[Figure: function estimates with a first-order difference penalty for λ = 0.001, λ = 1, and λ = 1000]
148 / 398
Influence of λ (2nd differences)
[Figure: function estimates with a second-order difference penalty for λ = 0.001, λ = 1, and λ = 1000]
149 / 398
Optimizing smoothing parameters
150 / 398
Implementation in R
library("mgcv")
formula <- temp ~ s(year, bs = "ps", m = c(2, 2), k = 20)
151 / 398
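A minimal sketch of the corresponding model fit, using the formula defined above; the data frame name climate is a hypothetical placeholder:

mod <- gam(formula, data = climate, method = "REML")
plot(mod)      # estimated smooth effect of year
summary(mod)   # edf and smoothing parameter information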
Implementation in R
[Figure: estimated smooth effects of year from plot.gam, labelled s(year, 18.78) and s(year, 16.5)]
152 / 398
I gam has similar syntax to glm:
I s(z) requests a smooth effect of z
I bs="ps" specifies a P-spline basis
I m=c(2,2) controls order of spline (spline order+1, m[1]) and
difference order of penalty (m[2]).
I k=20 controls number of basis functions
I method="REML" to control optimization criteria
153 / 398
I equivalent degrees of freedom (edf) measure complexity of a function
estimate
I for the linear model, the edf equal the trace of the hat matrix, i.e. the number of coefficients
I equivalently:
edf(λ) = tr( B(B⊤B + λP)⁻¹B⊤ ).
d ≤ edf(λ) ≤ dim(θ).
154 / 398
Generalized Additive Models
I Generalized Additive Models (GAMs) extend generalized linear models as:
E(y |η) = h(η),  η = x⊤β + f1 (z1 ) + . . . + fq (zq ).
155 / 398
GAM estimation
156 / 398
I additivity is possibly too strong an assumption: it ignores interactions
I More general:
η = x⊤β + f (z1 , . . . , zq )
157 / 398
Surface Estimation
I Two flavors:
I tensor product splines
I radial basis functions: basis functions over (subsets of) R2
158 / 398
Tensor Products
159 / 398
Tensor Products
I represent the surface as
f (z1 , z2 ) = Σ_{j=1}^{K} Σ_{k=1}^{K′} θjk Bjk (z1 , z2 ).
160 / 398
161 / 398
162 / 398
163 / 398
Penalty terms for Tensor Product Splines
θ = (θ11 , . . . , θK 1 , . . . , θ1K , . . . , θKK )⊤
164 / 398
I let P1 be the penalty matrix for the spline in z1 . Then: θr P1 θr⊤
165 / 398
I let P2 be the penalty matrix for the spline in z2 . Then: θc⊤ P2 θc
166 / 398
I in combination we get the penalty term
θ⊤ (λ1 I ⊗ P1 + λ2 P2 ⊗ I) θ = θ⊤ P(λ1 , λ2 ) θ.
167 / 398
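A minimal R sketch of this combined penalty matrix built with Kronecker products; the marginal penalties are second-order difference penalties and K is an arbitrary illustrative basis dimension:

K  <- 8
P1 <- crossprod(diff(diag(K), differences = 2))    # penalty for the spline in z1
P2 <- crossprod(diff(diag(K), differences = 2))    # penalty for the spline in z2
P_lambda <- function(l1, l2) l1 * kronecker(diag(K), P1) + l2 * kronecker(P2, diag(K))
dim(P_lambda(1, 1))   # (K*K) x (K*K), matching the K*K tensor-product coefficients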
Radial Basis Functions
I For a given knot κ = (κ1 , κ2 ), a radial basis function is defined as B(z) = b(‖z − κ‖), i.e. its value depends only on the distance of z to the knot
168 / 398
Tensor Product Splines vs. Radial Basis Functions
I tensor products:
I invariant against linear covariable transformations
I allow for combination of covariables on different domains, scales, units.
I allow for anisotropy: different roughness over different axes.
I radial basis functions:
I invariant against rotations of covariate space
I useful for spatial/isotropic effects
I only have a single smoothing parameter
169 / 398
I surface estimates can represent interactions of metric covariates
I how to represent interactions between categorical and metric
covariates?
170 / 398
I model equation
η = . . . + u1 f1 (z1 ) + u2 f2 (z2 ) + . . .
171 / 398
I more generally, varying coefficient effects are written as
η = . . . + uf (z) + . . .
η = . . . + uf (t) + . . .
172 / 398
I model representation: actually a simpler special case of tensor
product bases:
I “basis” matrix for categorical covariate u is a matrix of dummy
variables
I “basis” matrix for linear effect of a metric covariate u is simply u
I in both cases, the design matrix for the varying coefficient term is
created by the tensor product of the effect modifier’s spline basis and
the covariate’s design matrix.
I for f = (f(z1), . . . , f(zn))⊤ = Bθ, multiplication with u simply means
u · f = diag(u1, . . . , un)Bθ.
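A small sketch of how such terms are specified in mgcv (the names y, z, u and the data frame dat are illustrative assumptions):

library(mgcv)
fit <- gam(y ~ s(z, by = u), data = dat)
# metric u:      s(z, by = u) fits the varying-coefficient term u * f(z)
# categorical u: s(z, by = u) with a factor u fits a separate smooth per level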
173 / 398
Part III
174 / 398
Multivariate Principal Component Analysis (Review)
176 / 398
Multivariate Principal Component Analysis (Review)
Other representations of functional data
Idea:
I Find normalized weight vectors φk ∈ Rp that maximize the sample variance of ξik = φk⊤ xi for x1, . . . , xn ∈ Rp.
I Identify most important modes of variation in the data
Definition
Given observations x1 , . . . , xn ∈ Rp with zero mean, the first principal
component (PC) is defined by
φ1 = argmax_{‖φ‖2 = 1} (1/n) Σ_{i=1}^{n} (φ⊤ xi)².
The k-th PC maximizes the same criterion subject to the orthogonality constraints φj⊤ φk = 0 for all j < k.
176 / 398
Multivariate Principal Component Analysis (Review)
Other representations of functional data
Remarks:
I Principal components φk ∈ Rp have same length/structure as the
data
I Normalizing restriction kφk k2 = 1 makes sure that PCs are well
defined, but no unique specification (for any solution φ the vector
−φ is a solution, too)
I Orthogonality constraint φ>j φk = 0 ensures that φk indicates a new
mode of variation that is not explained by the preceding components
φ1 , . . . , φk−1
I PCA can be seen as a dimension reduction tool: use ξi = (ξi1, . . . , ξiK) with K ≪ p instead of the observed values xi = (xi1, . . . , xip).
177 / 398
Multivariate Principal Component Analysis (Review)
Other representations of functional data
Alternative characterization:
Theorem (Multivariate PCA as eigenanalysis problem)
Let V = (1/n) X⊤X be the sample covariance matrix of the (mean-centered) x1, . . . , xn with eigenvalues ν1 ≥ ν2 ≥ . . . ≥ νm > 0.
Then
φ1 = argmax_{‖φ‖2=1} (1/n) Σ_{i=1}^{n} (φ⊤ xi)² = argmax_{‖φ‖2=1} φ⊤ V φ
is the normalized eigenvector for the largest eigenvalue, Vφ1 = ν1 φ1, ‖φ1‖ = 1,
and more generally Vφk = νk φk, k = 1, . . . , m.
178 / 398
Multivariate Principal Component Analysis (Review)
Other representations of functional data
Remarks:
I Eigenanalysis of V is equivalent to singular value decomposition
(SVD) of X
I computationally vastly more efficient
I usually only need to compute first few leading singular vectors and
values
I very efficient algorithms using e.g. random projections, for sparse matrices
I variants for partially observed data (→ simple recommender systems)
I The eigenvalues ν1 , . . . , νm describe the amount of variability
explained by their principal components.
I Proportion of variability explained by the k-th principal component:
0 < νk / (Σ_{j=1}^{m} νj) ≤ 1.
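A minimal sketch of PCA via the SVD of the centered data matrix (simulated data as a stand-in):

X  <- scale(matrix(rnorm(100 * 5), 100, 5), center = TRUE, scale = FALSE)
sv <- svd(X)
phi    <- sv$v                  # principal components, i.e. eigenvectors of V = X'X / n
nu     <- sv$d^2 / nrow(X)      # eigenvalues nu_1 >= ... >= nu_m
scores <- X %*% phi             # scores xi_ik = phi_k' x_i
nu / sum(nu)                    # proportion of variability explained per PC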
179 / 398
Multivariate Principal Component Analysis (Review)
180 / 398
Functional Principal Component Analysis
Definition
Basic setting:
I Data generating process: smooth random function X (t) with
I unknown mean function µX (t) = E[X (t)]
I unknown covariance function vX (s, t) = Cov(X (s), X (t))
I Observed functions: x1 (t), . . . , xn (t)
I often given on a set of sampling points t1 , . . . , tp
I preprocessing, smoothing
I For simplicity, assume that the functions are centered, i.e.
µ̂X(t) = (1/n) Σ_{i=1}^{n} xi(t) ≡ 0
180 / 398
Functional Principal Component Analysis
Definition
Berkeley growth study:
I 39 boys, 54 girls
I p = 31 measurements (per child)
I measurements not equally spaced
[Figure: growth curves, Height vs. Age, girls and boys]
181 / 398
Functional Principal Component Analysis
Definition
The scalar product ⟨φ, ψ⟩ = ∫ φ(t)ψ(t)dt induces the norm ‖φ‖ = ⟨φ, φ⟩^(1/2) = (∫ φ²(t)dt)^(1/2)
182 / 398
Functional Principal Component Analysis
Definition
183 / 398
Functional Principal Component Analysis
Definition
Remarks:
I The definition of the functional principal components is equivalent to
the multivariate case:
I vectors xi → functions xi(t)
I scalar product ⟨xi, φ⟩ = Σ_{j=1}^{p} xij φj → ⟨xi, φ⟩ = ∫ xi(t)φ(t)dt
I Functional principal components φk (t) are again only defined up to a
sign change
I One can show that φk is the k-th eigenfunction of the sample covariance operator
(V f)(s) = ∫ (1/n) Σ_{i=1}^{n} xi(s)xi(t) f(t) dt = ∫ v̂X(s, t) f(t) dt.
# functional PCA (growth.fd: an fd object, e.g. created with fda::smooth.basis)
library(fda)
growth.pca <- pca.fd(growth.fd, nharm = 2, centerfns = TRUE)
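A hedged usage sketch of the returned object (growth.fd is assumed to have been created beforehand from the Berkeley growth data):

growth.pca$varprop           # proportion of variance explained per FPC
head(growth.pca$scores)      # estimated scores xi_ik, one row per curve
plot(growth.pca$harmonics)   # estimated eigenfunctions phi_k(t)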
186 / 398
Functional Principal Component Analysis
Example: FPCA
Berkeley growth study: Principal components
[Figure: estimated eigenfunctions PC 1 and PC 2 plotted against Age]
187 / 398
Functional Principal Component Analysis
Example: FPCA
Interpretation:
I Karhunen-Loève representation:
xi(t) = µX(t) + Σ_{k=1}^{∞} ξik φk(t)
The score ξik describes the weight of the k-th principal component
for the i-th observation.
I Effect of the k-th principal component:
Assume new scores ξ̃j = c for j = k and ξ̃j = 0 otherwise (left panel: k = 1, right panel: k = 2).
[Figure: effect of PC 1 (left panel) and PC 2 (right panel) on the mean growth curve, Height vs. Age]
I Effect plots show µ̂X ± 2√νk φk.
I PC 1 as individual growth effect
189 / 398
Functional Principal Component Analysis
Example: FPCA
Dimension reduction:
I Use K-dimensional individual score vectors ξi = (ξi1, . . . , ξiK)
190 / 398
Functional Principal Component Analysis
Example: FPCA
Berkeley growth study: Principal component scores
[Figure: scatterplot of Score PC 1 vs. Score PC 2, girls and boys marked separately]
I PC 2 as gender-specific effect
191 / 398
Idea
Aim at smooth functional principal components
for better interpretation.
193 / 398
Influence of α:
[Figure: effect plots of the estimated (smoothed) principal components for α = 0.1, α = 10, and α = 1000, Height vs. Age]
195 / 398
Smoothing the covariance surface
196 / 398
R-Code: FACE & SSVD
library(refund)
data("growth", package = "fda")    # Berkeley growth data: hgtm, hgtf, age
growthmat <- rbind(t(growth$hgtm), t(growth$hgtf))
growth_face <- fpca.face(growthmat, argvals = growth$age, knots = 25, npc = 2)
growth_ssvd <- fpca.ssvd(growthmat, argvals = growth$age, npc = 2)
[Figure: first two estimated eigenfunctions (PC 1, PC 2) over Age for FACE (left) and SSVD (right)]
197 / 398
Multivariate Principal Component Analysis (Review)
198 / 398
Functional PCA for Sparse Functional Data
Sparse Functional Data
198 / 398
Functional PCA for Sparse Functional Data
Sparse Functional Data
I Observations per child: Ti ∈ {2, . . . , 6}, median of 4
I [Figure: sparsified Height vs. Age measurements for randomly selected children (full dataset: 2883 observations)]
199 / 398
Functional PCA for Sparse Functional Data
Borrowing Strength across Observations
Basic setting:
I Use observation points directly without previous smoothing
I Account for additional measurement errors εij ∼ N (0, σ 2 )
I Model:
yij = xi(tij) + εij = µX(tij) + Σ_{k=1}^{∞} ξik φk(tij) + εij
(Karhunen-Loève expansion of xi)
200 / 398
Functional PCA for Sparse Functional Data
Borrowing Strength across Observations
Estimation:
I Estimate mean and covariance functions using the pooled data
(”borrowing strength”)
I µ̂(t): by local linear smoother (or splines)
I v̂ (s, t) and σ̂ 2 :
Cov(yij , yil ) = Cov(xi (tij ), xi (til )) + σ 2 δjl = vX (tij , til ) + σ 2 δjl
[Figure (from P. T. Reiss and Xu 2018): Above: covariance function estimates Ĉsq (left) and Ĉtr (right) for the cortical thickness data, with the locations (tij1, tij2) of the raw values that are smoothed to produce these estimates; only points with tij1 > tij2 are included at right. Below: scaled eigenfunction estimates √λ̂j φ̂j, j = 1, 2, 3, 4, for the resulting estimated covariances.]
I surface estimate under positive-definiteness constraint!
(P. T. Reiss and Xu 2018; Cederbaum, Scheipl, et al. 2018)
202 / 398
Functional PCA for Sparse Functional Data
Borrowing Strength across Observations
Berkeley Growth Study:
Mean estimation based on pooled data
[Figure: pooled Height vs. Age measurements for all children with the estimated mean curve]
203 / 398
Functional PCA for Sparse Functional Data
Borrowing Strength across Observations
Berkeley Growth Study: Sparsified Data
Covariance estimation based on pooled data (raw & smoothed):
[Figure: raw (left) and smoothed (right) covariance estimates over Age × Age]
I Notation:
yi = (yi1, . . . , yiTi)⊤, µi = (µX(ti1), . . . , µX(tiTi))⊤,
φik = (φk(ti1), . . . , φk(tiTi))⊤, T = {tij : i = 1, . . . , n; j = 1, . . . , Ti}
I Intuition: ξ̃ik is the best prediction of the true score ξik given the observed values ỹi and the pooled information from all observation points T.
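Under an additional joint Gaussianity assumption on scores and errors, this best prediction has a closed form (a sketch added here, not part of the original slide):
ξ̃ik = E[ξik | yi] = ν̂k φ̂ik⊤ Σ̂yi⁻¹ (yi − µ̂i),   with Σ̂yi = Σ_k ν̂k φ̂ik φ̂ik⊤ + σ̂² ITi.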
205 / 398
Functional PCA for Sparse Functional Data
Example
206 / 398
Functional PCA for Sparse Functional Data
Example
Berkeley growth study: Estimated principal components
[Figure: effects of the estimated PC 1 (left) and PC 2 (right) on the mean curve, Height vs. Age, based on the sparsified data]
[Figure: estimated scores, Score PC 1 vs. Score PC 2, boys and girls]
209 / 398
Summary
Functional PCA:
I Directly extends multivariate PCA to functional data
I Dimension reduction technique
I ”optimal” low-rank basis representation for given data:
most variance explained with smallest K
I eigenvalue decay gives indication of inherent complexity of the data
I ”low-pass” filter via truncated FPC basis representations
I clustering
I Key tool for exploring functional data and for further analyses:
I clustering, anomaly detection, ...
I supervised learning with functional features
⇒ simply use FPC scores ξi as scalar feature vectors (see the sketch after this list).
I Penalized, smoothed versions often show clearer effects and facilitate
interpretation
I Phase variation can be a problem: slow eigenvalue decay
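A minimal sketch of this use of the scores (assuming the 93 Berkeley growth curves are ordered boys first, as in growthmat above; names and the classifier choice are illustrative):

scores <- growth_face$scores                       # N x npc matrix of FPC scores
sex    <- factor(rep(c("male", "female"), c(39, 54)))
summary(glm(sex ~ scores, family = binomial))      # supervised learning on FPC scores
kmeans(scores, centers = 2)                        # clustering on FPC scores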
210 / 398
Extensions:
I “functional fragment” data: recover the covariance via (low-rank) matrix completion methods (Descary and Panaretos 2018, e.g.)
[Figure: electricity spot prices over Hours: subsample of the original dataset, fragmented subsample (d = 0.5), fragmented and discretized subsample, and the corresponding covariance surfaces]
211 / 398
Part IV
Background: Boosting
212 / 398
Introduction
Implementation
Introduction
Motivation
Why boosting?
Implementation
214 / 398
Aims and scope
f∗(x) = β0 + β1x1 + · · · + βpxp (GLM) or
f∗(x) = β0 + f1(x1) + · · · + fp(xp) (GAM)
⇒ f∗ should be interpretable
214 / 398
Example 1 - Birth weight data
215 / 398
Birth weight data (2)
www.yourultrasound.com, www.fetalultrasoundutah.com
216 / 398
Example 2 - Breast cancer data
217 / 398
Classical modeling approaches
I Classical approach to obtain predictions from birth weight data and breast
cancer data:
Fit additive regression models (Gaussian regression, Cox regression) using
maximum likelihood (ML) estimation
I Example: Additive Gaussian model with smooth effects (represented by
P-splines) for birth weight data
⇒ f ∗ (x) = β0 + f1 (x1 ) + · · · + fp (xp )
[Figure: estimated smooth effects f(volabdo) of abdominal volume and f(bpd) of biparietal diameter]
218 / 398
Problems with ML estimation
219 / 398
Boosting - General properties
220 / 398
Why boosting?
221 / 398
Introduction
Implementation
222 / 398
Gradient boosting - estimation problem
222 / 398
Gradient boosting - estimation problem (2)
I Example: R(f) = (1/n) Σ_{i=1}^{n} (yi − f(xi))² corresponds to minimizing the expected squared error loss
I optimization over a function space =⇒ we’re in trouble...
223 / 398
Naive functional gradient descent (FGD)
224 / 398
Naive functional gradient descent (2)
(Very) simple example: n = 2, y1 = y2 = 0, ρ = squared error loss
=⇒ R(f) = ½ [(f(1) − 0)² + (f(2) − 0)²]
=⇒ ∂R/∂f(i) (f̂[m−1]) = f̂(i)[m−1]
[Figure: contour plot of z = f1² + f2² over (f1, f2)]
225 / 398
Naive functional gradient descent (3)
226 / 398
Componentwise Gradient Boosting
227 / 398
Componentwise Gradient Boosting (2)
Functional gradient descent (FGD) boosting algorithm:
1. Initialize the n-dimensional vector f̂[0] with some offset values (e.g., ȳ).
Set m = 0 and specify the set of base-learners.
Denote the number of base-learners by p̃.
2. Increase m by 1.
Compute the negative gradient −(∂/∂f) ρ(y, f) and evaluate it at f̂[m−1](xi), i = 1, . . . , n.
This yields the negative gradient vector
u[m−1] = ( −(∂/∂f) ρ(y, f) |_{y = yi, f = f̂[m−1](xi)} )_{i = 1, . . . , n}
...
228 / 398
Componentwise Gradient Boosting (3)
...
229 / 398
Componentwise Gradient Boosting (4)
...
4. Update f̂[m] = f̂[m−1] + ν û[m−1], where 0 < ν ≤ 1 is a real-valued step-length factor.
5. Iterate Steps 2 - 4 until m = mstop.
230 / 398
Simple example
I In case of Gaussian regression, gradient boosting is equivalent to iteratively
re-fitting the residuals of the model.
I use a B-spline basis with 20 basis functions and ridge penalty as base-learner:
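A minimal sketch of such a fit with mboost (simulated data stand in for the example shown in the figure below):

library(mboost)
set.seed(1)
dat <- data.frame(x = runif(100, -0.2, 0.2))
dat$y <- sin(10 * dat$x) / 10 + rnorm(100, sd = 0.03)
fit <- gamboost(y ~ bbs(x, knots = 20, df = 4), data = dat,
                control = boost_control(mstop = 100, nu = 0.1))
plot(dat$x, dat$y)
lines(sort(dat$x), fitted(fit)[order(dat$x)], col = 2)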
[Figure: data and residuals in successive boosting iterations with the fitted base-learner, y vs. x]
231 / 398
Properties of gradient boosting
232 / 398
Properties of gradient boosting
I gradient boosting can optimize any loss function via a series of simple LS
steps
=⇒ huge flexibility, scales well
I local linear approximation to the loss surface is good enough for ν ≪ 1
I The step length factor ν could be chosen adaptively. Legend has it that
adaptive strategies do not improve the estimates of f ∗ and lead to an
increase in running time
=⇒ set ν small (ν = 0.1) but fixed.
Fixed ν also required for (unbiased) variable selection, easy tuning.
233 / 398
Introduction
Implementation
234 / 398
Gradient boosting with early stopping
234 / 398
Illustration of variable selection and early stopping
I Very simple example: 3 predictor variables x1 , x2 , x3 ,
[m]
3 linear base-learners with coefficient estimates β̂j , j = 1, 2, 3
I Assume that mstop = 5
I Assume that x1 was selected in iteration 1, 2, 5
I Assume that x3 was selected in iteration 3 & 4
f̂[mstop] = f̂[0] + νû[0] + νû[1] + νû[2] + νû[3] + νû[4]
= β̂0 + ν(β̂0[0] + β̂1[0] x1) + ν(β̂0[1] + β̂1[1] x1) + ν(β̂0[2] + β̂3[2] x3) + ν(β̂0[3] + β̂3[3] x3) + ν(β̂0[4] + β̂1[4] x1)
= β̂0∗ + ν(β̂1[0] + β̂1[1] + β̂1[4]) x1 + ν(β̂3[2] + β̂3[3]) x3
236 / 398
Shrinkage
I Early stopping will not only result in sparse solutions but will also lead to
shrunken effect estimates (→ only a small fraction of û is added to the
estimates in each iteration).
I Shrinkage leads to a downward bias (in absolute value) but to a smaller
variance of the effect estimates (similar to Lasso or Ridge regression).
⇒ Multicollinearity problems are addressed.
237 / 398
Variable selection: complications & improvements
238 / 398
Introduction
Implementation
239 / 398
mboost
Package mboost:
I baselearners: linear, (tensor product) splines, trees, radial basis
functions, random effects, ...
I wide variety of loss functions
I parallelized cross validation, stability selection
I computationally fairly efficient: sparse matrix algebra, index
compression, array arithmetic for tensor product designs (I. D. Currie
et al. 2006)
I ... but creates huge model objects ...
(Hothorn, Buehlmann, et al. 2018)
239 / 398
gamboostLSS
Package gamboostLSS :
I extensions to models with multiple additive predictors
I loss is a negative log-likelihood, additive predictors for different
distribution parameters
I e.g.: model conditional variances and means
I e.g.: bivariate Poisson for modeling soccer scores (Groll et al. 2018):
model rates (attacking strengths) and association (tactic effects).
I mboost as computational engine
(Mayr et al. 2012)
240 / 398
Summary
241 / 398
Part V
242 / 398
Introduction
Model
Covariate Effects
Effect Representation
Applications
Introduction
Motivation
Framework
Model
Covariate Effects
Effect Representation
Applications
244 / 398
Functional Data
244 / 398
Functional Data
[Figures: diffuse reflectance spectra (%) by tissue type (Corticalis, Nerve, S.Gland, Spongiosa); total CD4 cell counts vs. months since seroconversion; estimates with CIs for sampled vs. full data]
246 / 398
Structured Functional Data
247 / 398
Structured Functional Data: Longitudinal
[Figure: fractional anisotropy profiles along the tract for Patient B, visits 1-6]
248 / 398
Structured Functional Data: Spatial
[Figure: log(Precipitation) over Months for weather stations grouped into Arctic, Atlantic, Continental, and Pacific climate zones]
249 / 398
Non-Gaussian Functional Data
250 / 398
Application: The Piggy Panopticon
PIGWISE Project: RFID surveillance of pig behaviour (Maselyne et al. 2014)
I measure proximity to trough (yes-no) every 10 secs over 102 days for
100 pigs
I additionally: humidity, temperature over time
⇒ models of feeding behaviour potentially useful for ethology (porcine
sociology, clustering) & quality control (disease, quality of feed stock)
[Figure: trough-visit records for pig 57 over time of day, days 1-102]
251 / 398
Functional Data Analysis
252 / 398
Aims and Means
253 / 398
Key Idea
254 / 398
Introduction
Model
Generic Framework
Generalized Functional Additive Mixed Models
Covariate Effects
Effect Representation
Applications
255 / 398
Generic Additive Regression Model
Observations (Yi , Xi ), i = 1, . . . , N, with
I Yi a functional (scalar) response over interval T = [a, b], [t, t]
I Xi a set of scalar and/or functional covariates.
I effect of a functional covariate over its domain S: ∫_S x(s)β(s, t) ds
I constrained effect of a functional covariate: ∫_{l(t)}^{u(t)} x(s)β(s, t) ds, e.g. with limits [0, t] or [t − δ, t]
258 / 398
Generalized Functional Additive Mixed Models
Structured additive regression models of the general form
259 / 398
Application: Model
Model feeding rate
I for each day i = 1, . . . , 102 for a single pig
I as smooth function over time t
Response: binary feeding indicators ỹi (t) summed over 10min intervals
yi(t) ∝ ∫_{t−10min}^{t} ỹi(s) ds
Model
Covariate Effects
Motivation
Recap: Penalized Splines
Effect Representation
Applications
261 / 398
Covariate Effects: Examples
Xr                      fr(Xr, t)
∅                       functional intercept β0(t)
humidity hum(t)         linear functional effect ∫_{l(t)}^{u(t)} hum(s)β(s, t)ds;
                        smooth functional effect ∫_{l(t)}^{u(t)} f(hum(s), s, t)ds;
                        concurrent effects f(hum(t), t) or hum(t)βh(t)
yi(t)                   (auto-regressive) functional effects ∫_{t−δ}^{t−1} y(s)β(s, t)ds;
                        lagged effects f(yi(t − δ), t) or yi(t − δ)β(t)
hum(t), temp(t)         concurrent interaction effects,
                        e.g., f(hum(t), temp(t), t) or temp(t)β(hum(t))
scalar covariate i      aging effect iβ(t) or f(i, t);
(day indicator)         functional random intercepts bi(t)
261 / 398
Spline regression
[Figure: example data and spline fits]
Unpenalized Splines
[Figure: unpenalized spline basis functions and the resulting fits]
Penalized Splines
Use very flexible basis and add penalization for excessively wiggly fits:
=⇒ trade-off between goodness-of-fit and simplicity/generalizability
[Figure: penalized spline fits to noisy data for λ = 1e−04, λ = 1, λ = 1000, and the GCV-optimal λGCV]
Tensor Product Splines
Source:http://www.web-spline.de/web-method/b-splines/tpspline.png
Introduction
Model
Covariate Effects
Effect Representation
Tensor Product Representation
Penalization
Spline Bases & Penalties
FPC Bases & Penalties
Applications
266 / 398
Effect Representation
266 / 398
Effect Representation
267 / 398
Tensor Product Representation and Penalization
pen(θr |λtr , λxr ) = θrT (λxr Pxr ⊗ IKt + λtr IKx ⊗ Ptr )θr
= θrT Pr (λtr , λxr )θr .
=⇒ very flexible:
I Combine any basis & penalty for Xr with any basis & penalty for t!
→ Huge variety available in pffr() via interface to mgcv.
I Penalization parameters λtr , λxr separately control the relative
complexity of effects over the functional domain and the covariate
space, respectively.
268 / 398
Tensor Product Basis & Kronecker Sum Penalties
P ⊗ I, I ⊗ P: repeated penalties that apply to each subvector of θ
associated with a specific marginal basis function (Wood 2006a):
[Illustration: the tensor product basis Bxr ⊗ Btr combines every basis function of the marginal basis Bxr with every basis function of Btr; the Kronecker-sum penalty components IKx ⊗ Ptr and Pxr ⊗ IKt repeat the marginal penalties over the corresponding subvectors of θ]
Partial Effects fr (x)(t) as Latent GPs
fr(Xr)(t) = Br θr,   θr ∼ N(0, (Pr(λtr, λxr))⁻)
270 / 398
Choice of Bases and Penalties
Any suitable
I marginal basis Btr (e.g. B-splines)
I penalty Ptr (e.g. pth order difference matrix).
over support T .
271 / 398
Marginal Bases for Scalar & Concurrent Effects
272 / 398
Marginal Basis for Functional Covariates
∫_S xi(s)β(s, t) ds ≈ Σ_{h=1}^{H} wh xi(sh) β(sh, t)
≈ Σ_{h=1}^{H} wh xi(sh) Σ_{ks=1}^{Kx} Σ_{kt=1}^{Kt} B(s)_ks(sh) B(t)_kt(t) θr,ks,kt
where the B(s)_ks(x(sh), sh) are radial basis functions or elements of another tensor product basis.
(McLean et al. 2014, for scalar response)
274 / 398
Functional Random Effects
275 / 398
Functional Random Effects
Issues:
I often fairly important model component: errors typically
autocorrelated and not homoscedastic along T
=⇒ need to capture somehow with ei (t) for valid conditional
inference
I typically require fairly large basis Btr to capture high frequency /
local behavior
=⇒ estimation scales very badly for grouping factors g with many levels
I not (really) locally adaptive
I Pb must be fixed a priori
=⇒ no estimation of inter-level dependency structure, only of
relative variability between levels of g .
276 / 398
Functional Random Effects: FPC representation
277 / 398
Functional Random Effects: FPC representation
278 / 398
Functional Random Effects: FPC representation
279 / 398
Functional Covariates: FPC representation
For xi(s) ≈ Σ_{k=1}^{Kx} ψk(s) ξik,
∫_S xi(s)β(s, t) ds = Σ_k ξik ∫_S ψk(s)β(s, t) ds = Σ_k ξik β̃k(t)
280 / 398
Functional Covariates: FPC representation
281 / 398
Introduction
Model
Covariate Effects
Effect Representation
Applications
282 / 398
Penalized Estimation
E(y|X) = g(Bθ)
−2 log(L(θ|y, B, ν)) + Σ_v λv θ⊤ P̃v θ → min_θ   (1)
with B = [B1 | . . . | BR], θ = [θ1⊤, . . . , θR⊤]⊤ and P̃v the marginal penalties suitably padded with zeros.
282 / 398
Penalized Likelihood & Mixed Effect Representation
283 / 398
This is also just a kind of varying coefficient model...
284 / 398
Inference is mostly a solved problem:
285 / 398
Advantages of Mixed Model Framework
286 / 398
Introduction
Model
Covariate Effects
Effect Representation
Applications
Application 1: Piggy Panopticon
Application 2: Nutrition & ICU survival
287 / 398
Application: The Piggy Panopticon
PIGWISE Project: RFID surveillance of pig behaviour (Maselyne et al. 2014)
I measure proximity to trough (yes-no) every 10 secs over 102 days for
100 pigs
I additionally: humidity, temperature over time
⇒ models of feeding behaviour potentially useful for ethology (porcine
sociology, clustering) & quality control (disease, quality of feed stock)
pig 57
100
60
day
20
287 / 398
Example: Model
288 / 398
Example: Model Fit
µ̂(t) & y (t) for selected days (training data)
[Figure: fitted µ̂(t) vs. observed feeding rates y(t) over time of day t for 20 selected training days]
289 / 398
Example: Model Predictions
µ̂(t) & y (t) for selected days (test data)
[Figure: predicted µ̂(t) vs. true feeding rates y(t) over time of day t for 20 selected test days]
290 / 398
Example: Estimates
[Figure: estimated functional intercept β̂0(t) and functional random intercepts b̂i(t) over time of day t]
291 / 398
Example: Estimates
β̂(t, s); max. lag= 3 h
[Figure: estimated coefficient surface β̂(t, s) with upper and lower CI, plotted over t and s − t]
292 / 398
Example: Alternative Model
293 / 398
Example: Alternative Model
[Figure: estimates for the alternative model: functional intercept β̂0(t), smooth day effect f̂(i, t), and coefficient surface β̂(t, s) with upper and lower CIs]
294 / 398
Exposure-Lag-Response Association (ELRA):
Nutrition & ICU Survival
I multi-center study of critical care patients from 457 ICUs
(≈ 10k patients)
I investigate acute mortality (first 30d)
I confounders z: age, gender, Apache II Score, year of admission, ICU
random effect, ...
I 12-day nutrition record xi (s)
I prescribed calories (determined at baseline)
I daily caloric intake
I daily caloric adequacy (CA)= caloric intake/prescribed calories
[Figure: caloric adequacy (%) over protocol days 1-11 for three example patients (IDs 6709, 38318, 50609)]
Model Assumptions:
delayed, time-limited, cumulative & time-varying effect of time-varying
exposure xi (s)
Idea:
(partial) effect of xi(s) on the log-hazard at time t:
∫_{W(t)} xi(s)β(t, s) ds
297 / 398
Piece-wise Exponential Model
298 / 398
ELRA: Example
Compare hazard of patient with given nutrition record to constant
undernutrition (e.g., ceteris paribus):
[Figure: caloric adequacy (%) over protocol days 1-11 for an example patient (left) and the implied hazard ratio relative to constant undernutrition over follow-up time t (right)]
299 / 398
Part VI
300 / 398
GAMM Algorithm
refund
FDboost
GAMM Algorithm
GAMMS as GLMs as LMs
Estimation Algorithm
refund
FDboost
302 / 398
Introduction
302 / 398
MLE for GAMMs
303 / 398
IWLS
304 / 398
mgcv Algorithm
I inner loop:
run IWLS to convergence for θ̂λ for each evaluation of `(λ)
I outer loop:
optimize `(λ)
305 / 398
mgcv Algorithm: Outer Loop
ℓ(λ) ∝ (‖ỹ − Bθ̂λ‖² + θ̂λ⊤ Pλ θ̂λ) / ν + log |B⊤WB + Pλ| − log |Pλ|
307 / 398
mgcv Algorithm: Discretisation
308 / 398
mgcv Alternative Inference Algorithms
309 / 398
GAMM Algorithm
refund
Scalar responses: pfr
Functional responses: pffr
FDboost
310 / 398
refund
310 / 398
refund::pfr
effect                                       syntax
linear functional effect ∫ x(s)β(s) ds       lf(x)
smooth functional effect ∫ F(x(s), s) ds     af(x)
FPC-based effect ∫ x(s)β(s) ds               fpc(x)
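A hedged usage sketch following the pattern of the package's DTI example (dataset and variable names as shipped with refund are assumed):

library(refund)
data(DTI)
DTI1 <- DTI[DTI$visit == 1 & complete.cases(DTI), ]
fit_lf <- pfr(pasat ~ lf(cca, k = 30, bs = "ps"), data = DTI1)   # linear functional effect
# af(cca) instead of lf(cca) would fit a smooth functional effect F(x(s), s)
summary(fit_lf)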
311 / 398
refund::pffr
I wrapper for GFAMMs around mgcv's model fitting functions
I defines additional term types for functional covariates and formula specials for functional responses
I formula-based model definition adapted from mgcv:
e.g. E(y(t)|x) = β0(t) + ∫ x1(s)β1(s, t)ds + x2β2(t) + f(x3)
becomes y ~ 1 + ff(x1) + x2 + c(s(x3))
I by default, all effects vary over t
→ tensor product representation of effects
=⇒ all effect types available in mgcv for scalar responses and covariates are usable for functional responses (... almost)
I constant effects wrapped in c()
I specification of basis over t in arguments bs.yindex, bs.int
effect                                                         syntax
linear functional effect ∫ x(s)β(s, t) ds                      ff(x)
smooth functional effect ∫ F(x(s), s, t) ds                    sff(x)
FPC-based effect ∫ x(s)β(s, t) ds as Σk ξ̂k β̃k(t)              ffpc(x)
FPC-based functional random effects bgi(t)                     pcre(g, ...)
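A hedged sketch of a corresponding call (data are assumed to be collected in a list or data frame dat with a functional response matrix y (N × T), a functional covariate matrix x1 (N × S), scalars x2, x3, and grid vectors tgrid, sgrid; all names are illustrative):

library(refund)
fit <- pffr(y ~ ff(x1, xind = sgrid) + x2 + c(s(x3)),
            yind = tgrid, data = dat)
summary(fit)
plot(fit, pages = 1)   # estimated coefficient functions and surfaces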
313 / 398
GAMM Algorithm
refund
FDboost
314 / 398
Introduction
314 / 398
Component-wise gradient boosting
I Boosting is an ensemble method that aims at minimizing the
expectation of a loss criterion.
I The predictor is iteratively updated along the steepest gradient with
respect to the components of an additive predictor (functional
gradient descent).
I Model represented as a sum of simple (penalized) regression models,
the base-learners, fitted to the negative gradients by OLS in each
step.
I In each boosting iteration only the best fitting base-learner is updated
(component-wise) with step-length ν.
I For functional response regression, response and predictors are
functions over T .
315 / 398
Some transformation functions ξ and loss functions ρ
Model: ξ(yi | Xi = xi) = f(Xi) = Σ_{r=1}^{R} fr(xri)

                          ξ                  ρ(Y, h(x))
mean regression           E                  L2-loss
median regression         q0.5               L1-loss
quantile regression       qτ                 check function
generalized regression    g ∘ E              neg. log-likelihood
GAMLSS                    vector of Q par.   neg. log-likelihood
316 / 398
Algorithm: component-wise gradient boosting
317 / 398
Algorithm: functional GAMLSS boosting
I [Step 1:] initialize all parameters, set m = 0
I [Step 2:] (within each iteration m)
I for q = 1, ..., Q
I compute the negative partial gradients ui(q), i = 1, . . . , N, of the empirical risk w.r.t. the predictor f(q), using the current estimates of all distribution parameters
I fit each base-learner fr(q) to the ui(q), r = 1, . . . , R
I select the best fitting base-learner fr∗(q)
I select the parameter q∗ with the best fitting base-learner fr∗(q∗) and update its coefficients with a small step-length ν
I [Step 3:] unless m > mstop set m = m + 1, go to Step 2.
(Mayr et al., 2012; Brockhaus et al., 2017; non-cyclical: Thomas et al, 2017)
318 / 398
Tuning parameters of gradient boosting
319 / 398
Summary
Idea:
I iteratively boost the model performance
(=ˆ reduce expected loss)
I by fitting and evaluating the partial effects (base learners) fr (Xr , t)
component-wise
I using one partial effect at a time to update the model
320 / 398
Comparison
321 / 398
mboost
322 / 398
Generalized Linear Array Models
I Tensor product design matrices B = ⊗_{d=1}^{D} Bd become huge very quickly
I ... but they have lots of repeating structure, by construction
I Idea: use this structure for more efficient computation of Bθ and B⊤diag(w)B
I never actually compute B, only the marginal Bd, d = 1, . . . , D
I perform matrix operations by clever re-dimensioning and successive operations on the marginal Bd
I e.g. D = 2: (B1 ⊗ B2)θ = B2((B1Θ)⊤)⊤ with Θ = [θjk]j,k (see the sketch below)
I large gains for high D and parameter counts
I at least 1 order of magnitude fewer operations for Bθ, 2-3 orders for B⊤diag(w)B
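A tiny numerical sketch of the D = 2 identity (the column-stacking convention for θ is an assumption of this illustration):

B1 <- matrix(rnorm(4 * 3), 4, 3)     # n1 x K1 marginal basis
B2 <- matrix(rnorm(5 * 2), 5, 2)     # n2 x K2 marginal basis
Theta <- matrix(rnorm(2 * 3), 2, 3)  # K2 x K1 coefficient array; theta = c(Theta)
full <- kronecker(B1, B2) %*% c(Theta)   # forms the (n1*n2) x (K1*K2) matrix explicitly
glam <- c(B2 %*% Theta %*% t(B1))        # same numbers, using only the marginal matrices
all.equal(as.numeric(full), glam)        # TRUE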
323 / 398
GAMM Algorithm
refund
FDboost
Example: Emotion components data
324 / 398
Implementation in FDboost
I Main fitting function:
FDboost(formula, timeformula, data, ...)
I timeformula
= NULL for scalar-on-function regression,
= ∼ bbs(t) for function-on-function regression
324 / 398
Example: Emotion components data
Data set from Gentsch et al. 2014, also used in Rügamer et al. 2018
I Main goal: Understand how emotions evolve
I Participants played a gambling game with real money outcome
I Emotions “measured” via EMG (muscle activity in the face)
I Influencing factor appraisals measured via EEG (brain activity)
I Different game situation, a lot of trials
325 / 398
Example: Emotion components data
[Figure: EEG and EMG trajectories (value in microvolt) over time for the different game conditions]
Model equation:
327 / 398
Results for more complex model
328 / 398
Example: FDboost call
FDboost(EMG ~ 1 +
brandomc(id, df = 5) +
bhist(EEG, df = 20),
timeformula = ~ bbs(t, df = 4),
control = boost_control(mstop = 5000,
trace = TRUE),
data = data)
329 / 398
Part VII
330 / 398
Functional regression: Edge Cases, Problems, Pitfalls
332 / 398
Identifiability
Functional covariates x(s) often well approximated by truncated
Karhunen-Loève-expansions,
xi(s) = Σ_{l=1}^{∞} ξi,l φl^X(s) ≈ Σ_{l=1}^{M} ξi,l φl^X(s).
For finite grid data, the design matrix is rank-deficient if M < Kj or if the span of the basis for β in s-direction contains functions orthogonal to {φl^X, 1 ≤ l ≤ M} under numerical integration.
332 / 398
Identifiability
333 / 398
Identifiability
I Iff the kernels of penalty and design matrix do not overlap, there is a
unique minimum of the penalty on each hyperplane defined by
coefficient vectors θ representing β(s, t) that yield identically valued
effects fj (x(s))(t).
I If kernels of penalty and x(s) overlap, bad things can happen: e.g.
meaningless linear transformations or constant shifts of estimated
β(s, t)
→ c.f. collinearity in classical models.
334 / 398
Identifiability: Synthetic Example
[Figure: four estimated coefficient surfaces β̂(s, t) over (s, t) ∈ [0, 1]², illustrating (non-)identifiability]
335 / 398
Identifiability: Practical recommendations
336 / 398
Phase & Amplitude Variation
[Figure: curves with phase and amplitude variation over Time; warping functions map Individual Time to Absolute Time]
337 / 398
Phase Variation: Functional responses
338 / 398
Phase Variation: Functional responses
[Figure: FA-CCA profiles over CCA tract location t (left) and centered FA-RCST profiles over RCST location s (right)]
339 / 398
Phase Variation: Functional responses
[Figure: mean FA-CCA and estimated correlation structure over CCA tract location t, MS patients vs. controls]
340 / 398
Phase Variation: Functional covariates
341 / 398
Phase & Amplitude Variation
[Figure: curves with phase and amplitude variation over Time; warping functions map Individual Time to Absolute Time]
342 / 398
Phase & Amplitude Variation: Registration
Typical procedure:
Decompose xi (t) = (wi ◦ γi )(t) = wi (γi (t))
I warping functions γi : T → T
I map clock time of observed curve xi to common system time of
registered curves
(e.g. growth curves of kids: earlier/later puberty etc.)
I no time jumps: γi continuous
I no time reversals: ∂γi(t)/∂t > 0
I γi (min T ) = min T , γi (max T ) = max T
I registered functions wi (t̃): landmarks like maxima/minima typically
aligned, easier to interpret
I decomposition into horizontal/phase variation γi and
vertical/amplitude variation wi
(Marron et al. 2015, e.g.)
343 / 398
Phase & Amplitude Variation: Registration
344 / 398
Registration Approaches
345 / 398
Registration Approaches
I L2-Distance Based:
Criterion L(γi; xi, x0) = ∫ (x0(t) − xi(γi⁻¹(t)))² dt → min over γi
346 / 398
Registration in practice
347 / 398
What’s my n?
348 / 398
Autocorrelation & Variance Heterogeneity
I fairly successful approach: rephrase functional response models as models for scalar function evaluations by shifting all functional structure into the predictor.
I issue: intra-functional dependency needs to be modeled (see slide
before)
I issue: GFAMMs are conditional models with independence
assumption over yi (tj )|X , i = 1, . . . , n, j = 1, . . . , T
=⇒ most models will require smooth residual terms Ei (t) to capture
intra-functional dependencies, variance heterogeneity:
scales rather terribly....
I marginal type models computationally preferable, but:
I not clear how to incorporate into mgcv’s computational framework,
generally
I hard/impossible to generalize to non-Gaussian case
I doable: AR(1) residual structure
I doable: explicit model for time- and covariate-dependent variance for
Gaussian data (family = "gaulss", more tomorrow)
349 / 398
What my thesis committee did not get to hear
350 / 398
Predictive modeling with functional covariates
351 / 398
Functional regression: Edge Cases, Problems, Pitfalls
352 / 398
Multi-Level (Functional) Data
352 / 398
Functional Random Effects
353 / 398
Functional Linear Mixed Model
with auto-covariance functions K^Z(t, t′) := Cov(Z(t), Z(t′)), and indicators
δ^g_{ii′} := 1 if g(i) = g(i′), 0 else
(Cederbaum, Pouplier, et al. 2016)
354 / 398
FLMM: Inference
355 / 398
FLMM: Covariance Surface Estimation
356 / 398
FLMM: Final Model-Reestimation
Alternatives:
a) Simple LMM for
I responses ỹ = (ỹ1(t11), . . . , ỹn(tnTn))⊤
I random effect design matrices Φi^Z = [φ̂k^Z(ti)] of dimension Ti × K^Z
I random effect covariances diag(ν̂1^Z, . . . , ν̂_{K^Z}^Z)
Closed-form solutions for the ξ̂ik^Z.
b) Re-estimate µ(t, Xi) = Σr fr(Xri, t) simultaneously with the functional random effects using the FPC bases φ̂k^Z(t).
I Option a) is (much) faster and worked well for recovering random
effects in synthetic data.
Option b) will yield more honest uncertainty quantification for
covariate effects in µ.
Both performed much better than direct estimation of spline-based
random effects in simulations.
I procedure extends to more grouping factors & random slopes,
irregular/sparse data
(Cederbaum, Pouplier, et al. 2016; Cederbaum, Scheipl, et al. 2018)
357 / 398
Multivariate Functional Data
I functional data are frequently multivariate: yi(t) = (yi^(1)(t), . . . , yi^(D)(t)),
e.g. spatial accelerations for accelerometry, trajectory data over a plane, ...
I challenge: model the dependence structure within (y^(1)(t), . . . , y^(D)(t))
I one possible approach based on GFAMM, FLMM, MFPCA
described in the following
I assumptions: identical grids and commensurable measurement scales
for all dimensions
358 / 398
FAMM for each dimension
y(d) contains the function evaluations, B(d)θ(d) represents the fixed effects, Φ(d) contains evaluations of the random-effect FPCs, and the unstructured errors are ε(d) ∼ N(0, σd² I).
359 / 398
Multivariate FAMM
with
ε̄ ∼ N(0, diag(σ1², ..., σp²) ⊗ I)
360 / 398
Multivariate Functional PCA
We want a multivariate FPC representation y(t) = Σ_{k=1}^{K} ξk φk(t),
where φk(t) = (φk^(1)(t), . . . , φk^(D)(t))⊤ with eigenvalues νk.
Assume a finite univariate Karhunen-Loève representation exists for each y^(d)(t), i.e.
y^(d)(t) = Σ_{k=1}^{Kd} ρk^(d) ψk^(d)(t),
362 / 398
MFPCA: Estimation Algorithm
363 / 398
Multivariate FAMM
364 / 398
Density Functions as Outcomes
Problem:
Density functions are positive and must integrate to 1.
How to guarantee this in an additive model where y(t) ≈ Σr fr(X, t)?
Solution:
Similar problem as with non-Gaussian responses for LMs, similar solution:
=⇒ transformation of the response into a more friendly space without
restrictions.
365 / 398
Density Functions as Outcomes
366 / 398
Vector space structure for densities
367 / 398
Vector space structure for densities
368 / 398
The clr-transformation
Introduced by Aitchison 1986, Boogart et al. 2014
Let L0²(µ) be the space of square-integrable functions with integral zero (a sub-vector space of L²(µ)).
The centered-log-ratio (clr) transformation B²(µ) → L0²(µ), f ↦ f̃, is defined as
f̃ = log(f) − µ(T)⁻¹ ∫ log(f) dµ
with inverse
f = exp(f̃) / ∫ exp(f̃) dµ
Well-defined, linear, injective!
This means: we can take elements of B²(µ), send them to L0²(µ), analyse them there, and map them back to B²(µ).
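A minimal sketch of the clr map and its inverse on a grid (the trapezoidal quadrature and the example density are illustrative assumptions):

t_grid <- seq(0, 1, length.out = 101)
trap   <- function(g, t) sum(diff(t) * (head(g, -1) + tail(g, -1)) / 2)   # trapezoidal rule
f <- dnorm(t_grid, mean = 0.4, sd = 0.15)
f <- f / trap(f, t_grid)                     # renormalize to a density on [0, 1]
clr     <- function(f, t) log(f) - trap(log(f), t) / (max(t) - min(t))
clr_inv <- function(ft, t) exp(ft) / trap(exp(ft), t)
f_tilde <- clr(f, t_grid)                    # lies in L0^2: trap(f_tilde, t_grid) is ~ 0
f_back  <- clr_inv(f_tilde, t_grid)          # recovers f up to quadrature error
all.equal(f, f_back)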
369 / 398
An inner product for densities
370 / 398
Additive functional regression model for PDFs
Formulation in B2 (µ)
371 / 398
Additive functional regression model for PDFs
Formulation in L20 (µ)
372 / 398
Application: relative income within households
Distribution of the income share of the woman
I Economic consequences of gender identity (Bertrand 2014):
wife income
distribution of s = wife income + husband income in the U.S.
I German Socio-Economic Panel (SOEP):
I Potentially relevant factors: region, year, children, ...
I Positive probability mass at 0 and 1
[Figure: estimated densities of the woman's income share s ∈ [0, 1]]
373 / 398
The SOEP-Data
German Socio-Economic Panel (SOEP) (Schupp et al. nd):
I wide-ranging representative longitudinal study of private households in Germany
I mixed discrete + continuous PDFs estimated per
I year: from 1984 to 2016
I geographical region: e.g., south = {Bavaria, Baden-Würt.}
I child status: age 0 − 6 / age 7 − 18 / older or no child in household
Example PDFs:
[Figure: weighted densities of the income share per year (1984-2014), region south, child groups 1 and 3]
374 / 398
Model formulation
I Mixed reference measure µ = δ0 + λ + δ1
I Response PDFs y (s) ∈ B 2 (µ) of income share s ∈ [0, 1]
[Figure: interpreting an effect on the density scale (old β0 vs. new β0 ⊕ βnew, left) and on the clr scale (old β̃0 vs. new β̃0 + β̃new, right)]
376 / 398
Phase & Amplitude Variation
[Figure: curves with phase and amplitude variation over Time; warping functions map Individual Time to Absolute Time]
377 / 398
3 ideas:
I warping functions are (generalized) cumulative distribution functions
I define FPCA for densities via clr-projection
I do MFPCA of (derivatives of) warping functions and registered
functions to get joint representation of phase and amplitude variation
378 / 398
Warping functions & densities
379 / 398
FPCA for densities
380 / 398
MFPCA for warping functions and registered
functions
1. perform registration with your favorite method: xi (t) → (γ̂i (t), ŵi (t̃))
2. do FPCA of clr(∂γ̂(t)/∂t) and of ŵ(t̃).
3. combine univariate FPCs into multivariate FPC via MFPCA
=⇒ MFPC scores represent both phase and amplitude variation
(Happ, Scheipl, et al. 2019)
(conjectured: the not-well-definable distinction between phase and amplitude variation becomes less relevant; just describe the data compactly ...)
381 / 398
Phase-Amplitude-MFPCA: Example
[Figure: PC 1 (26.8% of Fréchet variance explained): full variation, phase variation, and amplitude variation shown as '+'/'−' perturbations of the mean curve over Time]
382 / 398
Functional regression: Edge Cases, Problems, Pitfalls
383 / 398
Alternatives
383 / 398
Model
yi(t) = Σ_{a=1}^{A} xia βa(t) + Σ_{h=1}^{H} zihm bhm(t) + ei(t)
384 / 398
Basis Representation
Use a (near) lossless basis representation yi(t) = Φ(t) yi∗, i.e.,
Y = Y∗ Φ   with Y: N × T, Y∗: N × K, Φ: K × T
Model
yik∗ = Σ_{a=1}^{A} xia βak∗ + Σ_{h=1}^{H} zihm bhmk∗ + eik∗,
with βa(t) = Σ_{k=1}^{K} βak∗ φk(t) etc.
385 / 398
Inference
386 / 398
Distributional assumptions
387 / 398
Pros & Cons
388 / 398
Pros & Cons
389 / 398
Functional regression: Edge Cases, Problems, Pitfalls
390 / 398
Clustering functions, the easy way
390 / 398
Clustering functions, more fancy
391 / 398
Clustering functions, more fancy yet
392 / 398
References I
Bender, A., A. Groll, and F. Scheipl (2018). A generalized additive model approach to time-to-event analysis. In: Statistical
Modelling 18.3-4, pp. 299–321.
Bender, A., F. Scheipl, et al. (2018). Penalized estimation of complex, non-linear exposure-lag-response associations. In:
Biostatistics dkxy003.
Brockhaus, S., A. Fuest, A. Mayr, and S. Greven (2018). Signal regression models for location, scale and shape with an
application to stock returns. In: Journal of the Royal Statistical Society: Series C (Applied Statistics) 67.3, pp. 665–686.
Brockhaus, S., M. Melcher, F. Leisch, and S. Greven (2017). Boosting Flexible Functional Regression Models with a High
Number of Functional Historical Effects. In: Statistics and Computing 27.4, pp. 913–926.
Brockhaus, S. and D. Ruegamer (2018). FDboost: Boosting Functional Regression Models. R package version 0.3-2.
Brockhaus, S., F. Scheipl, T. Hothorn, and S. Greven (2015). The functional linear array model. In: Statistical Modelling 15.3,
pp. 279–300.
Brumback, B. and J. Rice (1998). Smoothing spline models for the analysis of nested and crossed samples of curves (with
discussion). In: Journal of the American Statistical Association 93, pp. 961–994.
Cardot, H., F. Ferraty, and P. Sarda (1999). Functional Linear Model. In: Statistics and Probability Letters 45.1, pp. 11–22.
Cardot, H., F. Ferraty, and P. Sarda (2003). Spline estimators for the functional linear model. In: Statistica Sinica 13.3,
pp. 571–592.
Cederbaum, J., M. Pouplier, P. Hoole, and S. Greven (2016). Functional linear mixed models for irregularly or sparsely sampled
data. In: Statistical Modelling 16.1, pp. 67–88.
Cederbaum, J., F. Scheipl, and S. Greven (2018). Fast symmetric additive covariance smoothing. In: Computational Statistics
& Data Analysis 120, pp. 25–41.
Chen, D., H.-G. Müller, et al. (2012). Nonlinear manifold representations for functional data. In: The Annals of Statistics 40.1,
pp. 1–29.
Chen, T. et al. (2019). xgboost: Extreme Gradient Boosting. R package version 0.81.0.1.
https://CRAN.R-project.org/package=xgboost.
Chiou, J. M. and P. L. Li (2007). Functional clustering and identifying substructures of longitudinal data. In: Journal of the
Royal Statistical Society. Series B: Statistical Methodology 69.4, pp. 679–699.
Chiou, J.-M. and P.-L. Li (2008). Correlation-based functional clustering via subspace projection. In: Journal of the American
Statistical Association 103.484, pp. 1684–1692.
Chiou, J., H. Müller, and J. Wang (2004). Functional response models. In: Statistica Sinica 14.3, pp. 675–694.
Cuevas, A. (2014). A partial overview of the theory of statistics with functional data. In: Journal of Statistical Planning and
Inference 147, pp. 1–23.
References II
Currie, I. D., M. Durban, and P. H. Eilers (2006). Generalized linear array models with applications to multidimensional
smoothing. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.2, pp. 259–280.
Currie, I., M. Durban, and P. Eilers (2006). Generalized linear array models with applications to multidimensional smoothing.
In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.2, pp. 259–280.
Descary, M.-H. and V. M. Panaretos (2018). Recovering covariance from functional fragments. In: Biometrika 106.1,
pp. 145–160.
Di, C.-Z., C. Crainiceanu, B. Caffo, and N. Punjabi (2009). Multilevel functional principal component analysis. In: Annals of
Applied Statistics 3.1, pp. 458–488.
Egozcue, J. J., J. L. Díaz-Barrero, and V. Pawlowsky-Glahn (2006). Hilbert space of probability density functions based on Aitchison geometry. In: Acta Mathematica Sinica 22.4, pp. 1175–1182.
Ferraty, F. and P. Vieu (2006). Nonparametric Functional Data Analysis. Springer Series in Statistics. Springer, New York.
Theory and practice.
Genest, M., J.-C. Masse, and J.-F. Plante. (2017). depth: Nonparametric Depth Functions for Multivariate Analysis.
R package version 2.1-1. https://CRAN.R-project.org/package=depth.
Gentsch, K., D. Grandjean, and K. R. Scherer (2014). Coherence explored between emotion components: Evidence from
event-related potentials and facial electromyography. In: Biological Psychology 98, pp. 70–81.
Goldsmith, J., J. Bobb, et al. (2011). Penalized Functional Regression. In: Journal of Computational and Graphical Statistics
20.4, pp. 830–851.
Goldsmith, J., C. Crainiceanu, B. Caffo, and D. Reich (2012). Longitudinal Penalized Functional Regression for Cognitive
Outcomes on Neuronal Tract Measurements. In: Journal of the Royal Statistical Society: Series C 61.3, pp. 453–469.
Goldsmith, J., M. Wand, and C. Crainiceanu (2011). Functional regression via variational Bayes. In: Electronic Journal of
Statistics 5, p. 572.
Goldsmith, J., F. Scheipl, et al. (2018). refund: Regression with Functional Data. R package version 0.1-17.
https://CRAN.R-project.org/package=refund.
Greenwell, B., B. Boehmke, J. Cunningham, and G. Developers (2019). gbm: Generalized Boosted Regression Models. R
package version 2.1.5. https://CRAN.R-project.org/package=gbm.
Greven, S., C. Crainiceanu, B. Caffo, and D. Reich (2010). Longitudinal Functional Principal Component Analysis. In:
Electronic Journal of Statistics 4, pp. 1022–1054.
Greven, S. and F. Scheipl (2017). A general framework for functional regression modelling. In: Statistical Modelling 17.1-2,
pp. 1–35.
References III
Groll, A., T. Kneib, A. Mayr, and G. Schauberger (2018). On the dependency of soccer scores–a sparse bivariate Poisson model
for the UEFA European football championship 2016. In: Journal of Quantitative Analysis in Sports 14.2, pp. 65–79.
Happ, C. (2018). MFPCA: Multivariate Functional Principal Component Analysis for Data Observed on Different
Dimensional Domains. R package version 1.3-1. https://github.com/ClaraHapp/MFPCA.
Happ, C. and S. Greven (2018). Multivariate functional principal component analysis for data observed on different
(dimensional) domains. In: Journal of the American Statistical Association 113.522, pp. 649–659.
Happ, C., F. Scheipl, A.-A. Gabriel, and S. Greven (2019). A general framework for multivariate functional principal component
analysis of amplitude and phase variation. In: Stat 8.1.
He, G., H. Müller, and J. Wang (2003). “Extending correlation and regression from multivariate to functional data”. In:
Asymptotics in Statistics and Probability. Ed. by M. Puri. VSP International Science Publishers, pp. 301–315.
Hofner, B., L. Boccuto, and M. Göker (2015). Controlling false discoveries in high-dimensional situations: boosting with stability
selection. In: BMC bioinformatics 16.1, p. 144.
Hofner, B., T. Hothorn, T. Kneib, and M. Schmid (2011). A framework for unbiased model selection based on boosting. In:
Journal of Computational and Graphical Statistics 20.4, pp. 956–971.
Hofner, B., A. Mayr, N. Fenske, and M. Schmid (2018). gamboostLSS: Boosting Methods for GAMLSS Models. R package
version 2.0-1. https://CRAN.R-project.org/package=gamboostLSS.
Hofner, B., A. Mayr, N. Robinzonov, and M. Schmid (2014). Model-based boosting in R: a hands-on tutorial using the R
package mboost. In: Computational statistics 29.1-2, pp. 3–35.
Hothorn, T., P. Bühlmann, et al. (2018). mboost: Model-Based Boosting. R package version 2.9-1. https://CRAN.R-project.org/package=mboost.
Hothorn, T., P. Bühlmann, et al. (2010). Model-based boosting 2.0. In: Journal of Machine Learning Research 11,
pp. 2109–2113.
Hyndman, R. J. and H. L. Shang (2010). Rainbow Plots, Bagplots, and Boxplots for Functional Data. In: Journal of
Computational and Graphical Statistics 19.1, pp. 29–45.
Ivanescu, A., A.-M. Staicu, F. Scheipl, and S. Greven (2015). Penalized function-on-function regression. In: Computational
Statistics 30.2, pp. 539–568.
Jacques, J. and C. Preda (2013). Funclust: A curves clustering method using functional random variables density approximation.
In: Neurocomputing 112, pp. 164–171.
James, G. M. and C. A. Sugar (2003). Clustering for Sparsely Sampled Functional Data. In: Journal of the American Statistical
Association 98.462, pp. 397–408.
References IV
James, G. (2002). Generalized linear models with functional predictors. In: Journal of the Royal Statistical Society: Series B
(Statistical Methodology) 64.3, pp. 411–432.
Lang, S. et al. (2014). Multilevel structured additive regression. In: Statistics and Computing 24.2, pp. 223–238.
Li, Z. and S. N. Wood (2019). Faster model matrix crossproducts for large generalized linear models with discretized covariates.
In: Statistics and Computing, pp. 1–7.
Loève, M. (1978). Probability theory II. Springer.
López-Pintado, S. and J. Romo (2009). On the Concept of Depth for Functional Data. In: Journal of the American Statistical
Association 104.486, pp. 718–734.
Maier, E., A. Stoecker, B. Fitzenberger, and S. Greven (2019). “Flexible Regression for Probability Densities in Bayes Spaces”. In preparation.
Malfait, N. and J. Ramsay (2003). The historical functional linear model. In: Canadian Journal of Statistics 31.2, pp. 115–128.
Marra, G. and S. N. Wood (2011). Practical variable selection for generalized additive models. In: Computational Statistics &
Data Analysis 55.7, pp. 2372–2387.
Marron, J. S., J. O. Ramsay, L. M. Sangalli, and A. Srivastava (2015). Functional Data Analysis of Amplitude and Phase
Variation. In: Statistical Science 30.4, pp. 468–484.
Maselyne, J. et al. (2014). Validation of a High Frequency Radio Frequency Identification (HF RFID) system for registering
feeding patterns of growing-finishing pigs. In: Computers and Electronics in Agriculture 102, pp. 10–18.
Mayr, A. et al. (2012). Generalized additive models for location, scale and shape for high-dimensional data – a flexible approach based on boosting. In: Journal of the Royal Statistical Society: Series C (Applied Statistics) 61.3, pp. 403–427.
McLean, M. W. et al. (2014). Functional generalized additive models. In: Journal of Computational and Graphical Statistics
23.1, pp. 249–269.
Meinshausen, N. and P. Bühlmann (2010). Stability selection. In: Journal of the Royal Statistical Society: Series B (Statistical
Methodology) 72.4, pp. 417–473.
Meyer, M. J. et al. (2015). Bayesian function-on-function regression for multilevel functional data. In: Biometrics 71.3,
pp. 563–574.
Morris, J. S. (2015). Functional Regression. In: Annual Review of Statistics and Its Application 2.1, pp. 321–359.
Morris, J. S., P. J. Brown, et al. (2008). Bayesian analysis of mass spectrometry proteomic data using wavelet-based functional
mixed models. In: Biometrics 64.2, pp. 479–489.
Morris, J. S. and R. J. Carroll (2006). Wavelet-based functional mixed models. In: Journal of the Royal Statistical Society:
Series B (Statistical Methodology) 68.2, pp. 179–199.
References V
Müller, H. and F. Yao (2008). Functional additive models. In: Journal of the American Statistical Association 103.484,
pp. 1534–1544.
Wood, S. N. (2019). mgcv: Mixed GAM Computation Vehicle with Automatic Smoothness Estimation. R package version 1.8-27. https://CRAN.R-project.org/package=mgcv.
Nychka, D. (1988). Confidence intervals for smoothing splines. In: Journal of the American Statistical Association 83,
pp. 1134–1143.
Prchal, L. and P. Sarda (2007). “Spline estimator for functional linear regression with functional response”. Unpublished manuscript. url: http://www.math.univ-toulouse.fr/staph/PAPERS/flm_prchal_sarda.pdf.
R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing. Vienna, Austria. http://www.R-project.org/.
Ramsay, J. O., H. Wickham, S. Graves, and G. Hooker (2018). fda: Functional Data Analysis. R package version 2.4.8.
https://CRAN.R-project.org/package=fda.
Ramsay, J. and G. Hooker (2017). Dynamic data analysis. New York: Springer.
Ramsay, J. and B. Silverman (2005). Functional Data Analysis. 2nd ed. New York: Springer.
Reiss, P., L. Huang, and M. Mennes (2010). Fast Function-on-Scalar Regression with Penalized Basis Expansions. In: The
International Journal of Biostatistics 6.1, p. 28.
Reiss, P. and T. Ogden (2009). Smoothing parameter selection for a class of semiparametric linear models. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71.2, pp. 505–523. doi: 10.1111/j.1467-9868.2008.00695.x.
Reiss, P. T. and M. Xu (2018). Tensor product splines and functional principal components.
https://works.bepress.com/phil_reiss/46.
Reiss, P. and R. Ogden (2007). Functional principal component regression and functional partial least squares. In: Journal of
the American Statistical Association 102.479, pp. 984–996.
Rügamer, D. et al. (2018). Boosting factor-specific functional historical models for the detection of synchronization in
bioelectrical signals. In: Journal of the Royal Statistical Society: Series C (Applied Statistics) 67.3, pp. 621–642.
Ruppert, D., R. Carroll, and M. Wand (2003). Semiparametric Regression. Cambridge, UK: Cambridge University Press.
Saefken, B., T. Kneib, C.-S. van Waveren, and S. Greven (2014). A unifying approach to the estimation of the conditional
Akaike information in generalized linear mixed models. In: Electronic Journal of Statistics 8.1, pp. 201–225.
References VI
Sangalli, L. M., P. Secchi, S. Vantini, and V. Vitelli (2010). K-mean alignment for curve clustering. In: Computational Statistics
& Data Analysis 54.5, pp. 1219–1233.
Scheipl, F., J. Gertheiss, and S. Greven (2016). Generalized functional additive mixed models. In: Electronic Journal of
Statistics 10.1, pp. 1455–1492.
Scheipl, F., A.-M. Staicu, and S. Greven (2015). Functional additive mixed models. In: Journal of Computational and Graphical
Statistics 24.2, pp. 477–501.
Scheipl, F. and S. Greven (2016). Identifiability in penalized function-on-function regression models. In: Electronic Journal of
Statistics 10.1, pp. 495–526. url: http://arxiv.org/abs/1506.03627.
Shi, J. Q. and T. Choi (2011). Gaussian process regression analysis for functional data. Chapman and Hall/CRC.
Sørensen, H., J. Goldsmith, and L. M. Sangalli (2013). An introduction with medical applications to functional data analysis.
In: Statistics in Medicine 32.30, pp. 5222–5240.
Srivastava, A. and E. P. Klassen (2016). Functional and shape data analysis. Springer.
Sun, Y. and M. G. Genton (2011). Functional Boxplots. In: Journal of Computational and Graphical Statistics 20.2,
pp. 316–334.
Tucker, J. D. (2017). fdasrvf: Elastic Functional Data Analysis. R package version 1.8.3.
https://CRAN.R-project.org/package=fdasrvf.
Tucker, J. D., W. Wu, and A. Srivastava (2013). Generative models for functional data using phase and amplitude separation.
In: Computational Statistics & Data Analysis 61, pp. 50–66.
Van den Boogaart, K. G., J. J. Egozcue, and V. Pawlowsky-Glahn (2014). Bayes Hilbert spaces. In: Australian & New Zealand Journal of Statistics 56.2, pp. 171–194.
Wang, J.-L., J.-M. Chiou, and H.-G. Müller (2016). Review of Functional Data Analysis. In: Annual Review of Statistics and Its
Application 3.1, pp. 1–41.
Wang, K. and T. Gasser (1997). Alignment of curves by dynamic time warping. In: The Annals of Statistics 25.3, pp. 1251–1276.
Wood, S. N. (2006a). Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC.
Wood, S. N. (2006b). Low Rank Scale Invariant Tensor Product Smooths for Generalized Additive Mixed Models. In:
Biometrics 62.1.
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized
linear models. In: Journal of the Royal Statistical Society (B) 73.1, pp. 3–36.
Wood, S. N. (2012). On p-values for smooth components of an extended generalized additive model. In: Biometrika 100.1,
pp. 221–228.
References VII
Wood, S. N., Y. Goude, and S. Shaw (2015). Generalized additive models for large data sets. In: Journal of the Royal
Statistical Society: Series C (Applied Statistics) 64.1, pp. 139–155.
Wood, S. N., F. Scheipl, and J. Faraway (2012). Straightforward intermediate rank tensor product smoothing in mixed models.
In: Statistics and Computing, pp. 1–20. url: http://dx.doi.org/10.1007/s11222-012-9314-z.
Wood, S. N. and M. Fasiolo (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application
to Tweedie location, scale and shape models. In: Biometrics 73.4, pp. 1071–1081.
Wood, S. N., Z. Li, G. Shaddick, and N. H. Augustin (2017). Generalized additive models for gigadata: modeling the UK black
smoke network daily data. In: Journal of the American Statistical Association 112.519, pp. 1199–1210.
Wood, S. N., N. Pya, and B. Säfken (2016). Smoothing parameter and model selection for general smooth models. In: Journal
of the American Statistical Association 111.516, pp. 1548–1563.
Wood, S. N. and F. Scheipl (2017). gamm4: Generalized Additive Mixed Models using ’mgcv’ and ’lme4’. R package
version 0.2-5. https://CRAN.R-project.org/package=gamm4.
Xiao, L., V. Zipunnikov, D. Ruppert, and C. Crainiceanu (2016). Fast covariance estimation for high-dimensional functional
data. In: Statistics and computing 26.1-2, pp. 409–421.
Yao, F., B. Liu, H.-G. Müller, and J.-L. Wang (2012). PACE: Principal Analysis by Conditional Expectation, Functional Data Analysis and Empirical Dynamics. MATLAB package version 2.15. http://anson.ucdavis.edu/~ntyang/PACE/.
Yao, F., H.-G. Müller, and J.-L. Wang (2005). Functional Data Analysis for Sparse Longitudinal Data. In: Journal of the
American Statistical Association 100.470, pp. 577–590.
Zhu, H., P. Brown, and J. Morris (2011). Robust, Adaptive Functional Regression in Functional Mixed Model Framework. In:
Journal of the American Statistical Association 106.495, pp. 1167–1179.
Zhu, H., F. Versace, et al. (2018). Robust and Gaussian spatial functional regression models for analysis of event-related
potentials. In: NeuroImage 181, pp. 501–512.