Predictive Models With Matlab
Smith H.
Conditional Mean Models
In this section...
Unconditional vs. Conditional Mean
Static vs. Dynamic Conditional Mean Models
Conditional Mean Models for Stationary Processes
Past observations, $y_{t-1}, y_{t-2}, \ldots$

Past innovations, $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$

$$E(y_t \mid H_{t-1}) = \mu + \sum_{i=1}^{\infty} \psi_i \varepsilon_{t-i}. \tag{4-1}$$
References
[1] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell,
1938.
See Also
arima

Related Examples
Specify Conditional Mean Models Using arima
AR Model Specifications
MA Model Specifications
ARMA Model Specifications

Concepts
Autoregressive Model
Moving Average Model
Autoregressive Moving Average Model
ARIMA Model
Multiplicative ARIMA Model
In either equation, the default innovation distribution is Gaussian with mean zero and constant variance.
You can specify a model of this form using the shorthand syntax arima(p,D,q). For the input arguments
p, D, and q, enter the number of nonseasonal AR terms (p), the order of nonseasonal integration (D), and
the number of nonseasonal MA terms (q), respectively.
When you use this shorthand syntax, arima creates an arima model with these default property values.
Property Name    Default Value
AR               Cell vector of NaNs
Beta             Empty vector [] of regression coefficients corresponding to exogenous covariates
Constant         NaN
D                Degree of nonseasonal integration, D
Distribution     'Gaussian'
MA               Cell vector of NaNs
P                Number of AR terms plus degree of integration, p + D
Q                Number of MA terms, q
SAR              Cell vector of NaNs
SMA              Cell vector of NaNs
Variance         NaN
To assign nondefault values to any properties, you can modify the created model object using dot
notation.
Notice that the inputs D and q are the values arima assigns to properties D and Q. However, the input
argument p is not necessarily the value arima assigns to the model property P. P stores the number of
presample observations needed to initialize the AR component of the model. For nonseasonal models,
the required number of presample observations is p + D.
To illustrate, consider specifying the ARIMA(2,1,1) model

$$(1 - \phi_1 L - \phi_2 L^2)(1 - L)\,y_t = c + (1 + \theta_1 L)\,\varepsilon_t.$$
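A minimal sketch of the corresponding shorthand call (p = 2, D = 1, q = 1):

model = arima(2,1,1);
model.P    % displays 3, that is, p + D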
The created model object, model, has NaNs for all parameters. A NaN value signals that a parameter
needs to be estimated or otherwise specified by the user. All parameters must be specified to forecast or
simulate the model.
To estimate parameters, input the model object (along with data) to estimate. This returns a new fitted
arima model object. The fitted model object has parameter estimates for each input NaN value.
Calling arima without any input arguments returns an ARIMA(0,0,0) model specification with default
property values:
model = arima
model =
ARIMA(0,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 0
D: 0
Q: 0
Constant: NaN
AR: {}
SAR: {}
MA: {}
SMA: {}
Variance: NaN
Dependent Gaussian or Student's t with a conditional variance process, $\sigma_t^2$. Specify the conditional variance model using a garch, egarch, or gjr model.
The arima default for the innovations is an iid Gaussian process with constant (scalar) variance.
In order to estimate, forecast, or simulate a model, you must specify the parametric form of the model
(e.g., which lags correspond to nonzero coefficients, the innovation distribution) and any known
parameter values. You can set any unknown parameters equal to NaN, and then input the model to
estimate (along with data) to get estimated parameter values.
arima (and estimate) returns a model corresponding to the model specification. You can modify models
to change or update the specification. Input models (with no NaN values) to forecast or simulate for
forecasting and simulation, respectively. Here are some example specifications using name-value
arguments.
Model: $y_t = c + \phi_1 y_{t-1} + \varepsilon_t$, where $\varepsilon_t = \sigma z_t$ and $z_t$ is Gaussian.
Specification: arima('AR',NaN) or arima(1,0,0)

Model: $y_t = \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \varepsilon_t$, where $\varepsilon_t = \sigma z_t$ and $z_t$ is Student's t with eight degrees of freedom.
Specification: arima('Constant',0,'MA',{NaN,NaN},'Distribution',struct('Name','t','DoF',8))

Model: $(1 + 0.5L)(1 - L)\,y_t = x_t'\beta + \varepsilon_t$, with $\beta = (-5, 2)'$ and $\varepsilon_t \sim N(0,\sigma^2)$.
Specification: arima('AR',-0.5,'D',1,'Beta',[-5 2])
You can specify the following name-value arguments to create nonseasonal arima models.
Name-Value Arguments for Nonseasonal ARIMA Models
Argument: AR
Corresponding Model Term(s) in Equation 4-2: Nonseasonal AR coefficients, $\phi_1,\ldots,\phi_p$
When to Specify: To set equality constraints for the AR coefficients. For example, to specify the AR coefficients in the model

$$y_t = 0.8y_{t-1} - 0.2y_{t-2} + \varepsilon_t,$$

specify 'AR',{0.8,-0.2}. You only need to specify the nonzero elements of AR. If the nonzero coefficients are at nonconsecutive lags, specify the corresponding lags using ARLags. Any coefficients you specify must correspond to a stable AR operator polynomial.

Argument: ARLags
Corresponding Model Term(s) in Equation 4-2: Lags corresponding to nonzero, nonseasonal AR coefficients
When to Specify: ARLags is not a model property. Use this argument as a shortcut for specifying AR when the nonzero AR coefficients correspond to nonconsecutive lags. For example, to specify nonzero AR coefficients at lags 1 and 12, e.g.,

$$y_t = \phi_1 y_{t-1} + \phi_{12} y_{t-12} + \varepsilon_t,$$

specify 'ARLags',[1,12]. Use AR and ARLags together to specify known nonzero AR coefficients at nonconsecutive lags. For example, if in the given AR(12) model $\phi_1 = 0.6$ and $\phi_{12} = -0.3$, specify 'AR',{0.6,-0.3},'ARLags',[1,12].

Argument: Beta
Corresponding Model Term(s) in Equation 4-2: Values of the coefficients of the exogenous covariates
When to Specify: Use this argument to specify the values of the coefficients of the exogenous variables. For example, use 'Beta',[0.5 7 -2] to specify $\beta = (0.5, 7, -2)'$ as the coefficient vector.
Argument: Constant
Corresponding Model Term(s) in Equation 4-2: Constant term, c
When to Specify: To set equality constraints for c. For example, for a model with no constant term, specify 'Constant',0. By default, Constant has value NaN.

Argument: D
Corresponding Model Term(s) in Equation 4-2: Degree of nonseasonal differencing, D
When to Specify: To specify a degree of nonseasonal differencing greater than zero. For example, to specify one degree of differencing, specify 'D',1. By default, D has value 0 (meaning no nonseasonal integration).

Argument: Distribution
Corresponding Model Term(s) in Equation 4-2: Distribution of the innovation process
When to Specify: Use this argument to specify a Student's t innovation distribution. By default, the innovation distribution is Gaussian. For example, to specify a t distribution with unknown degrees of freedom, specify 'Distribution','t'. To specify a t innovation distribution with known degrees of freedom, assign Distribution a data structure with fields Name and DoF. For example, for a t distribution with nine degrees of freedom, specify 'Distribution',struct('Name','t','DoF',9).

Argument: MA
Corresponding Model Term(s) in Equation 4-2: Nonseasonal MA coefficients, $\theta_1,\ldots,\theta_q$
When to Specify: To set equality constraints for the MA coefficients. For example, to specify the MA coefficients in the model

$$y_t = \varepsilon_t + 0.5\varepsilon_{t-1} + 0.2\varepsilon_{t-2},$$

specify 'MA',{0.5,0.2}. You only need to specify the nonzero elements of MA. If the nonzero coefficients are at nonconsecutive lags, specify the corresponding lags using MALags.
Argument: MALags
Corresponding Model Term(s) in Equation 4-2: Lags corresponding to nonzero, nonseasonal MA coefficients
When to Specify: Any coefficients you specify must correspond to an invertible MA polynomial. MALags is not a model property. Use this argument as a shortcut for specifying MA when the nonzero MA coefficients correspond to nonconsecutive lags. For example, to specify nonzero MA coefficients at lags 1 and 4, e.g.,

$$y_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_4\varepsilon_{t-4},$$

specify 'MALags',[1,4]. Use MA and MALags together to specify known nonzero MA coefficients at nonconsecutive lags. For example, if in the given MA(4) model the coefficients at lags 1 and 4 are known, specify them with 'MA' and give the lags with 'MALags',[1,4].
The multiplicative seasonal ARIMA model with seasonal periodicity s has the form

$$\phi(L)(1-L)^{D}\,\Phi(L)(1-L^{s})\,y_t = c + \theta(L)\,\Theta(L)\,\varepsilon_t, \tag{4-5}$$

where $\phi(L)$ and $\theta(L)$ are the nonseasonal AR and MA lag operator polynomials, and $\Phi(L)$ and $\Theta(L)$ are their seasonal counterparts.

The innovation series can be an independent or dependent Gaussian or Student's t process. The arima default for the innovation distribution is an iid Gaussian process with constant (scalar) variance.
In addition to the arguments for specifying nonseasonal models (described in Name-Value Arguments for Nonseasonal ARIMA Models), you can specify these name-value arguments to create a multiplicative arima model. You can extend an ARIMAX model similarly to include seasonal effects.
Name-Value Arguments for Seasonal ARIMA Models
Argument: SAR
Corresponding Model Term(s) in Equation 4-5: Seasonal AR coefficients, $\Phi_1,\ldots,\Phi_{p_s}$
When to Specify: To set equality constraints for the seasonal AR coefficients. When specifying AR coefficients, use the sign opposite to what appears in Equation 4-5 (that is, use the sign of the coefficient as it would appear on the right side of the equation). Use SARLags to specify the lags of the nonzero seasonal AR coefficients. Specify the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). For example, to specify the model

$$(1 - 0.8L)(1 - 0.2L^{12})\,y_t = \varepsilon_t,$$

specify 'AR',0.8,'SAR',0.2,'SARLags',12. Any coefficient values you enter must correspond to a stable seasonal AR polynomial.

Argument: SARLags
Corresponding Model Term(s) in Equation 4-5: Lags corresponding to nonzero seasonal AR coefficients, in the periodicity of the observed series
When to Specify: SARLags is not a model property. Use this argument when specifying SAR to indicate the lags of the nonzero seasonal AR coefficients. For example, to specify the model

$$(1 - \phi_1 L)(1 - \Phi_{12}L^{12})\,y_t = \varepsilon_t,$$

specify 'ARLags',1,'SARLags',12.
Argument: SMA
Corresponding Model Term(s) in Equation 4-5: Seasonal MA coefficients, $\Theta_1,\ldots,\Theta_{q_s}$
When to Specify: To set equality constraints for the seasonal MA coefficients. Use SMALags to specify the lags of the nonzero seasonal MA coefficients. Specify the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). For example, to specify the model

$$y_t = (1 + 0.6L)(1 + 0.2L^{12})\,\varepsilon_t,$$

specify 'MA',0.6,'SMA',0.2,'SMALags',12. Any coefficient values you enter must correspond to an invertible seasonal MA polynomial.

Argument: SMALags
Corresponding Model Term(s) in Equation 4-5: Lags corresponding to the nonzero seasonal MA coefficients, in the periodicity of the observed series
When to Specify: SMALags is not a model property. Use this argument when specifying SMA to indicate the lags of the nonzero seasonal MA coefficients. For example, to specify the model

$$y_t = (1 + \theta_1 L)(1 + \Theta_4 L^{4})\,\varepsilon_t,$$

specify 'MALags',1,'SMALags',4.

Argument: Seasonality
Corresponding Model Term(s) in Equation 4-5: Seasonal periodicity, s
When to Specify: To specify the degree of seasonal integration s in the seasonal differencing polynomial, $1 - L^{s}$. For example, to specify the periodicity for seasonal integration of monthly data, specify 'Seasonality',12. If you specify nonzero Seasonality, then the degree of the whole seasonal differencing polynomial is one. By default, Seasonality has value 0 (meaning no periodicity and no seasonal integration).
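A short sketch combining the seasonal arguments above for quarterly data (the values come from the examples in the table):

% Seasonal MA at lag 4 plus one degree of seasonal integration (s = 4)
model = arima('Constant',0,'MALags',1,'SMALags',4,'Seasonality',4);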
Note: You cannot assign values to the properties P and Q. For multiplicative ARIMA models:
arima sets P equal to p + D + p_s + s
arima sets Q equal to q + q_s
See Also
arima | estimate | forecast | simulate

Related Examples
AR Model Specifications
MA Model Specifications
ARMA Model Specifications
ARIMA Model Specifications
ARIMAX Model Specifications

Concepts
Autoregressive Model
Moving Average Model
Autoregressive Moving Average Model
ARIMA Model
ARIMAX(p,D,q) Model
ARIMA Model Including Exogenous Covariates
Multiplicative ARIMA Model
Autoregressive Model
In this section...
AR(p) Model
Stationarity of the AR Model

AR(p) Model
Many observed time series exhibit serial autocorrelation; that is, linear association between lagged observations. This suggests past observations might predict current observations. The autoregressive (AR) process models the conditional mean of $y_t$ as a function of past observations, $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$. An AR process that depends on p past observations is called an AR model of degree p, denoted by AR(p).
The form of the AR(p) model in Econometrics Toolbox is
yc y
11
y
tt ptpt
,
(4-6)
where Ht is an uncorrelated innovation process with mean zero.
In lag operator polynomial notation, Lyitti . Define the degree p AR lag
operator polynomial () (1LL Lp) . You can write the AR(p) model as 1II Ip
()Ly c . (4-7)IHtt
The signs of the coefficients in the AR lag operator polynomial, I() ,are opposite to the right side of
Equation 4-6. When specifying and interpreting AR coefficients in Econometrics Toolbox, use the form
in Equation 4-6.
Stationarity of the AR Model

Consider the AR(p) model in Equation 4-7. If the process is stationary, you can solve for $y_t$ to get

$$y_t = \mu + \psi(L)\,\varepsilon_t, \tag{4-8}$$

where

$$\mu = \frac{c}{1 - \phi_1 - \cdots - \phi_p}$$

is the unconditional mean of the process, and $\psi(L)$ is an infinite-degree lag operator polynomial, $(1 + \psi_1 L + \psi_2 L^2 + \cdots)$.
Note: The Constant property of an arima model object corresponds to c, and not the unconditional mean $\mu$.
By Wold's decomposition [1], Equation 4-8 corresponds to a stationary stochastic process provided the coefficients $\psi_i$ are absolutely summable. This is the case when the AR polynomial, $\phi(L)$, is stable, meaning all its roots lie outside the unit circle.
Econometrics Toolbox enforces stability of the AR polynomial. When you specify an AR model using
arima, you get an error if you enter coefficients that do not correspond to a stable polynomial. Similarly,
estimate imposes stationarity constraints during estimation.
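A quick illustration of the stability check (a sketch; the coefficient values are illustrative):

% A stable AR(1) coefficient is accepted
model = arima('AR',0.5);

% An AR polynomial with a root inside the unit circle triggers an error
try
    badModel = arima('AR',1.2);
catch err
    disp(err.message)
end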
References
[1] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell,
1938.
See Also
arima | estimate

Related Examples
Specify Conditional Mean Models Using arima
AR Model Specifications
Plot Impulse Response Function

Concepts
Conditional Mean Models
Autoregressive Moving Average Model
AR Model Specifications
In this section...
Default AR Model
AR Model with No Constant Term
AR Model with Nonconsecutive Lags
AR Model with Known Parameter Values
AR Model with a t Innovation Distribution
Default AR Model
This example shows how to use the shorthand arima(p,D,q) syntax to specify the default AR(p) model,

$$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t.$$
By default, all parameters in the created model object have unknown values, and the innovation
distribution is Gaussian with constant variance.
Specify the default AR(2) model:
model = arima(2,0,0)
model =
ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: NaN
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
The output shows that the created model object, model, has NaNs for all model parameters: the constant
term, the AR coefficients, and the variance. You can modify the created model object using dot notation,
or input it (along with data) to estimate.
AR Model with No Constant Term

This example shows how to specify an AR(p) model with constant term equal to zero,

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t.$$

Specify an AR(2) model with no constant term:

model = arima('ARLags',1:2,'Constant',0)

model =
ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: 0
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
The ARLags name-value argument specifies the lags corresponding to nonzero AR coefficients. The
property Constant in the created model object is equal to 0, as specified. The model object has default
values for all other properties, including NaNs as placeholders for the unknown parameters: the AR
coefficients and scalar variance.
You can modify the created model object using dot notation, or input it (along with data) to estimate.
AR Model with a t Innovation Distribution

This example shows how to specify an AR(p) model with no constant term,

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t,$$

where the innovations follow a Student's t distribution with unknown degrees of freedom.
model = arima('Constant',0,'ARLags',1:2,'Distribution','t')
model =
ARIMA(2,0,0) Model:
------------------
Distribution: Name = 't', DoF = NaN
P: 2
D: 0
Q: 0
Constant: 0
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
The value of Distribution is a struct array with field Name equal to 't' and field DoF equal to NaN. The NaN value indicates that the degrees of freedom are unknown, and need to be estimated using estimate or otherwise specified by the user.
Concepts
Autoregressive Model
MA(q) Model

The moving average (MA) model captures serial autocorrelation in a time series $y_t$ by expressing the conditional mean of $y_t$ as a function of past innovations, $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$. An MA process that depends on q past innovations is called an MA model of degree q, denoted by MA(q).

In lag operator polynomial notation, $L^i \varepsilon_t = \varepsilon_{t-i}$. Define the degree q MA lag operator polynomial $\theta(L) = (1 + \theta_1 L + \cdots + \theta_q L^q)$. You can write the MA(q) model as

$$y_t = \mu + \theta(L)\,\varepsilon_t.$$
References
[1] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell,
1938.
[2] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also
arima | estimate

Related Examples
Specify Conditional Mean Models Using arima
MA Model Specifications
Plot Impulse Response Function

Concepts
Conditional Mean Models
Autoregressive Moving Average Model
MA Model Specifications
In this section...
Default MA Model
MA Model with No Constant Term
MA Model with Nonconsecutive Lags
MA Model with Known Parameter Values
MA Model with a t Innovation Distribution
Default MA Model
This example shows how to use the shorthand arima(p,D,q) syntax to specify the default MA(q) model,

$$y_t = c + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}.$$
By default, all parameters in the created model object have unknown values, and the innovation
distribution is Gaussian with constant variance.
Specify the default MA(3) model:
model = arima(0,0,3)
model =
ARIMA(0,0,3) Model:
------------------
Distribution: Name = 'Gaussian'
P: 0
D: 0
Q: 3
Constant: NaN
AR: {}
SAR: {}
MA: {NaN NaN NaN} at Lags [1 2 3]
SMA: {}
Variance: NaN
The output shows that the created model object, model, has NaNs for all model parameters: the constant
term, the MA coefficients, and the variance. You can modify the created model object using dot
notation, or input it (along with data) to estimate.
MA Model with No Constant Term

This example shows how to specify an MA(q) model with constant term equal to zero,

$$y_t = \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \varepsilon_t.$$

Specify an MA(2) model with no constant term:

model = arima('Constant',0,'MALags',1:2)

model =

ARIMA(0,0,2) Model:
------------------
Distribution: Name = 'Gaussian'
P: 0
D: 0
Q: 2
Constant: 0
AR: {}
SAR: {}
MA: {NaN NaN} at Lags [1 2]
SMA: {}
Variance: NaN
The MALags name-value argument specifies the lags corresponding to nonzero MA coefficients. The
property Constant in the created model object is equal to 0, as specified. The model object has default
values for all other properties, including NaNs as placeholders for the unknown parameters: the MA
coefficients and scalar variance.
You can modify the created model object, or input it (along with data) to estimate.
MA Model with Nonconsecutive Lags

The MA cell array returns four elements. The first and last elements (corresponding to lags 1 and 4) have value NaN, indicating these coefficients are nonzero and need to be estimated or otherwise specified by the user. arima sets the coefficients at interim lags equal to zero to maintain consistency with MATLAB cell array indexing.
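A sketch of the specification this description refers to (nonzero MA coefficients at lags 1 and 4):

model = arima('MALags',[1 4]);
model.MA    % {NaN 0 0 NaN}: zeros fill the interim lags 2 and 3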
MA Model with a t Innovation Distribution

This example shows how to specify an MA(q) model with no constant term,

$$y_t = \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \varepsilon_t,$$

where the innovation process follows a Student's t distribution with eight degrees of freedom.
tdist = struct('Name','t','DoF',8);
model = arima('Constant',0,'MALags',1:2,'Distribution',tdist)
model =
ARIMA(0,0,2) Model:
------------------
Distribution: Name = 't', DoF = 8
P: 0
D: 0
Q: 2
Constant: 0
AR: {}
SAR: {}
MA: {NaN NaN} at Lags [1 2]
SMA: {}
Variance: NaN
The value of Distribution is a struct array with field Name equal to 't' and field DoF equal to 8. When you specify the degrees of freedom, they aren't estimated if you input the model to estimate.
Concepts
Moving Average Model
ARMA(p,q) Model

For some observed time series, a very high-order AR or MA model is needed to model the underlying process well. In this case, a combined autoregressive moving average (ARMA) model can sometimes be a more parsimonious choice.

An ARMA model expresses the conditional mean of $y_t$ as a function of both past observations, $y_{t-1},\ldots,y_{t-p}$, and past innovations, $\varepsilon_{t-1},\ldots,\varepsilon_{t-q}$. The number of past observations that $y_t$ depends on, p, is the AR degree. The number of past innovations that $y_t$ depends on, q, is the MA degree. In general, these models are denoted by ARMA(p,q).
The form of the ARMA(p,q) model in Econometrics Toolbox is

$$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \tag{4-10}$$

where $\varepsilon_t$ is an uncorrelated innovation process with mean zero.
In lag operator polynomial notation, $L^i y_t = y_{t-i}$. Define the degree p AR lag operator polynomial $\phi(L) = (1 - \phi_1 L - \cdots - \phi_p L^p)$. Define the degree q MA lag operator polynomial $\theta(L) = (1 + \theta_1 L + \cdots + \theta_q L^q)$. You can write the ARMA(p,q) model as

$$\phi(L)\,y_t = c + \theta(L)\,\varepsilon_t. \tag{4-11}$$

Solving for $y_t$ gives

$$y_t = \mu + \psi(L)\,\varepsilon_t, \tag{4-12}$$

where

$$\mu = \frac{c}{1 - \phi_1 - \cdots - \phi_p}$$

is the unconditional mean of the process, and $\psi(L) = \theta(L)/\phi(L)$ is a rational, infinite-degree lag operator polynomial, $(1 + \psi_1 L + \psi_2 L^2 + \cdots)$.
Note: The Constant property of an arima model object corresponds to c, and not the unconditional mean $\mu$.
By Wold's decomposition [1], Equation 4-12 corresponds to a stationary stochastic process provided the coefficients $\psi_i$ are absolutely summable. This is the case when the AR polynomial, $\phi(L)$, is stable, meaning all its roots lie outside the unit circle.
Additionally, the process is causal provided the MA polynomial is invertible, meaning all its roots lie
outside the unit circle.
Econometrics Toolbox enforces stability and invertibility of ARMA processes. When you specify an
ARMA model using arima, you get an error if you enter coefficients that do not correspond to a stable
AR polynomial or invertible MA polynomial. Similarly, estimate imposes stationarity and invertibility
constraints during estimation.
References
[1] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell,
1938.
See Also
arima | estimate

Related Examples
Specify Conditional Mean Models Using arima
ARMA Model Specifications
Plot Impulse Response Function

Concepts
Conditional Mean Models
Autoregressive Model
Moving Average Model
ARIMA Model
ARMA Model Specifications

Default ARMA Model

This example shows how to use the shorthand arima(p,D,q) syntax to specify the default ARMA(p,q) model. By default, all parameters in the created model object have unknown values, and the innovation distribution is Gaussian with constant variance.
Specify the default ARMA(1,1) model:
model = arima(1,0,1)
model =
ARIMA(1,0,1) Model:
------------------
Distribution: Name = 'Gaussian'
P: 1
D: 0
Q: 1
Constant: NaN
AR: {NaN} at Lags [1]
SAR: {}
MA: {NaN} at Lags [1]
SMA: {}
Variance: NaN
The output shows that the created model object, model, has NaNs for all model parameters: the constant
term, the AR and MA coefficients, and the variance. You can modify the created model object using dot
notation, or input it (along with data) to estimate.
ARMA Model with No Constant Term

This example shows how to specify an ARMA(p,q) model with constant term equal to zero,

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t + \theta_1\varepsilon_{t-1}.$$
The ARLags and MALags name-value arguments specify the lags corresponding to nonzero AR and MA coefficients, respectively. The property Constant in the created model object is equal to 0, as specified. The model object has default values for all other properties, including NaNs as placeholders for the unknown parameters: the AR and MA coefficients, and the scalar variance.
You can modify the created model object using dot notation, or input it (along with data) to estimate.
ARMA Model with Known Parameter Values

This example shows how to specify an ARMA model with known parameter values,

$$y_t = 0.3 + 0.7y_{t-1} + \varepsilon_t + 0.4\varepsilon_{t-1},$$

where the innovation distribution is Student's t with eight degrees of freedom, and constant variance 0.15.
tdist = struct('Name','t','DoF',8);
model = arima('Constant',0.3,'AR',0.7,'MA',0.4,...
    'Distribution',tdist,'Variance',0.15)
model =
ARIMA(1,0,1) Model:
------------------
Distribution: Name = 't', DoF = 8
P: 1
D: 0
Q: 1
Constant: 0.3
AR: {0.7} at Lags [1]
SAR: {}
MA: {0.4} at Lags [1]
SMA: {}
Variance: 0.15
Because all parameter values are specified, the created model object has no NaNs. The functions simulate and forecast don't accept input models with NaN values.
See Also
arima | estimate | forecast | simulate | struct

Related Examples
Specify Conditional Mean Models Using arima
Modify Properties of Conditional Mean Model Objects
Specify Conditional Mean Model Innovation Distribution
The ARIMA(p,D,q) model has the form

$$\Delta^D y_t = c + \phi_1\Delta^D y_{t-1} + \cdots + \phi_p\Delta^D y_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \tag{4-13}$$

where $\Delta^D y_t$ denotes a Dth differenced time series, and $\varepsilon_t$ is an uncorrelated innovation process with mean zero.
In lag operator notation, $L^i y_t = y_{t-i}$. You can write the ARIMA(p,D,q) model as

$$\phi^*(L)\,y_t = \phi(L)(1-L)^D\,y_t = c + \theta(L)\,\varepsilon_t. \tag{4-14}$$

Here, $\phi^*(L)$ is an unstable AR operator polynomial with exactly D unit roots. You can factor this polynomial as $\phi^*(L) = \phi(L)(1-L)^D$, where $\phi(L) = (1 - \phi_1 L - \cdots - \phi_p L^p)$ is a stable degree p AR lag operator polynomial.
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd
ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.
See Also
arima

Related Examples
Nonseasonal Differencing
Specify Conditional Mean Models Using arima
ARIMA Model Specifications

Concepts
Trend-Stationary vs. Difference-Stationary Processes
Autoregressive Moving Average Model
Multiplicative ARIMA Model
ARIMA Model Specifications

Default ARIMA Model

This example shows how to use the shorthand arima(p,D,q) syntax to specify a default ARIMA(1,1,1) model:

model = arima(1,1,1)

model =

ARIMA(1,1,1) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 1
Q: 1
Constant: NaN
AR: {NaN} at Lags [1]
SAR: {}
MA: {NaN} at Lags [1]
SMA: {}
Variance: NaN
The output shows that the created model object, model, has NaNs for all model parameters: the constant
term, the AR and MA coefficients, and the variance. You can modify the created model object using dot
notation, or input it (along with data) to estimate.
The property P has value 2 (p+D). This is the number of presample observations needed to initialize the
AR model.
ARIMA Model with Known Parameter Values

This example shows how to specify an ARIMA model with known parameter values,

$$\Delta y_t = 0.4 + 0.8\Delta y_{t-1} - 0.3\Delta y_{t-2} + \varepsilon_t + 0.5\varepsilon_{t-1},$$

where the innovation distribution is Student's t with 10 degrees of freedom, and constant variance 0.15.
tdist = struct('Name','t','DoF',10);
model = arima('Constant',0.4,'AR',{0.8,-0.3},'MA',0.5,...
    'D',1,'Distribution',tdist,'Variance',0.15)
model =
ARIMA(2,1,1) Model:
------------------
Distribution: Name = 't', DoF = 10
P: 3
D: 1
Q: 1
Constant: 0.4
AR: {0.8 -0.3} at Lags [1 2]
SAR: {}
MA: {0.5} at Lags [1]
SMA: {}
Variance: 0.15
The name-value argument D specifies the degree of nonseasonal integration (D).
Because all parameter values are specified, the created model object has no NaNs. The functions simulate and forecast don't accept input models with NaN values.
See Also
arima | estimate | forecast | simulate

Related Examples
Specify Conditional Mean Models Using arima
Modify Properties of Conditional Mean Model Objects
Specify Conditional Mean Model Innovation Distribution

Concepts
ARIMA Model
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd
ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also
arima

Related Examples
Nonseasonal and Seasonal Differencing
Multiplicative ARIMA Model Specifications
Specify Conditional Mean Models Using arima
Specify Multiplicative ARIMA Model
Model Seasonal Lag Effects Using Indicator Variables

Concepts
Autoregressive Moving Average Model
ARIMA Model
This example shows how to specify the multiplicative seasonal ARIMA model

$$(1 - \phi_1 L)(1 - L)(1 - \Phi_{12}L^{12})(1 - L^{12})\,y_t = (1 + \theta_1 L)(1 + \Theta_{12}L^{12})\,\varepsilon_t,$$

where the innovation distribution is Gaussian with constant variance. Here, $(1 - L)$ is the first degree nonseasonal differencing operator and $(1 - L^{12})$ is the first degree seasonal differencing operator with periodicity 12.
model = arima('Constant',0,'ARLags',1,'SARLags',12,'D',1,...
'Seasonality',12,'MALags',1,'SMALags',12)
model =
ARIMA(1,1,1) Model Seasonally Integrated with Seasonal AR(12) and MA(12):
-------------------------------------------------------------------------
Distribution: Name = 'Gaussian'
P: 26
D: 1
Q: 13
Constant: 0
AR: {NaN} at Lags [1]
SAR: {NaN} at Lags [12]
MA: {NaN} at Lags [1]
SMA: {NaN} at Lags [12]
Seasonality: 12
Variance: NaN
This example shows how to specify the multiplicative seasonal ARIMA model

$$(1 - 0.5L)(1 - L)(1 + 0.7L^{4})(1 - L^{4})\,y_t = (1 + 0.3L)(1 - 0.2L^{4})\,\varepsilon_t,$$

where the innovation distribution is Gaussian with constant variance 0.15. Here, $(1 - L)$ is the nonseasonal differencing operator and $(1 - L^{4})$ is the first degree seasonal differencing operator with periodicity 4.
model = arima('Constant',0,'AR',{0.5},'SAR',-0.7,'SARLags',4,...
    'D',1,'Seasonality',4,'MA',0.3,'SMA',-0.2,...
    'SMALags',4,'Variance',0.15)
model =
ARIMA(1,1,1) Model Seasonally Integrated with Seasonal AR(4) and MA(4):
-------------------------------------------------------------------------
Distribution: Name = 'Gaussian'
P: 10
D: 1
Q: 5
Constant: 0
AR: {0.5} at Lags [1]
SAR: {-0.7} at Lags [4]
MA: {0.3} at Lags [1]
SMA: {-0.2} at Lags [4]
Seasonality: 4
Variance: 0.15
The output specifies the nonseasonal and seasonal AR coefficients with opposite signs compared to the
lag polynomials. This is consistent with the difference equation form of the model. The output specifies
the lags of the seasonal AR and MA coefficients using SARLags and SMALags, respectively. D
specifies the degree of nonseasonal integration. Seasonality = 4 specifies quarterly data with one degree
of seasonal integration.
All of the parameters in the model have a value. Therefore, the model does not contain any NaNs. The functions simulate and forecast do not accept input models with NaN values.
Concepts
Specify Conditional Mean Models Using arima
Multiplicative ARIMA Model
figure(3)
autocorr(dY,50)
The sample ACF of the differenced series shows significant autocorrelation at lags that are multiples of
12. There is also potentially significant autocorrelation at smaller lags.
For this data, Box, Jenkins, and Reinsel [1] suggest the multiplicative seasonal model,
()( ) ( )( ).LLy12 112THtt
Specify this model.
model = arima('Constant',0,'D',1,'Seasonality',12,...
    'MALags',1,'SMALags',12)
model =
ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12):
---------------------------------------------------------------
Distribution: Name = 'Gaussian'
P: 13
D: 1
Q: 13
Constant: 0
AR: {}
SAR: {}
MA: {NaN} at Lags [1]
SMA: {NaN} at Lags [12]
Seasonality: 12
Variance: NaN
The property P is equal to 13, corresponding to the sum of the nonseasonal and seasonal differencing
degrees (1 + 12). The property Q is also equal to 13, corresponding to the sum of the degrees of the
nonseasonal and seasonal MA polynomials (1 + 12). Parameters that need to be estimated have value
NaN.
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd
ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.
See Also
arima | autocorr | filter | LagOp

Related Examples
Estimate Multiplicative ARIMA Model
Simulate Multiplicative ARIMA Models
Forecast Multiplicative ARIMA Model
Check Fit of Multiplicative ARIMA Model
Model Seasonal Lag Effects Using Indicator Variables

Concepts
Multiplicative ARIMA Model
ARIMAX(p,D,q) Model

The autoregressive moving average model including exogenous covariates, ARMAX(p,q), extends the ARMA(p,q) model by including the linear effect that one or more exogenous series has on the stationary response series $y_t$. The general form of the ARMAX(p,q) model is
$$y_t = c + \sum_{i=1}^{p}\phi_i y_{t-i} + \sum_{k=1}^{r}\beta_k x_{tk} + \varepsilon_t + \sum_{j=1}^{q}\theta_j\varepsilon_{t-j}, \tag{4-16}$$

and it has the following condensed form in lag operator notation:

$$\phi(L)\,y_t = c + x_t'\beta + \theta(L)\,\varepsilon_t. \tag{4-17}$$
In Equation 4-17, the vector $x_t$ holds the values of the r exogenous, time-varying predictors at time t, with coefficients denoted $\beta$.
You can use this model to check if a set of exogenous variables has an effect on a linear time series. For example, suppose you want to measure how the previous week's average price of oil, $x_t$, affects this week's United States exchange rate, $y_t$. The exchange rate and the price of oil are time series, so an ARMAX model can be appropriate to study their relationships.
As with ARMA models, Econometrics Toolbox enforces stability: when you specify an ARMAX model using arima, you get an error if you enter coefficients that do not correspond to a stable polynomial. Similarly, estimate imposes stationarity constraints during estimation.
The software differences the response series $y_t$ before including the exogenous covariates if you specify the degree of integration D. In other words, the exogenous covariates enter a model with a stationary response. Therefore, the ARIMAX(p,D,q) model is

$$\phi(L)\,y_t = c^* + x_t'\beta + \theta^*(L)\,\varepsilon_t, \tag{4-18}$$

where $c^* = c/(1-L)^D$ and $\theta^*(L) = \theta(L)/(1-L)^D$. Subsequently, the interpretation of $\beta$ has changed to the expected effect a unit increase in the predictor has on the difference between current and lagged values of the response (conditional on those lagged values).
You should assess whether the predictor series $x_t$ are stationary. Difference all predictor series that are not stationary with diff during the data preprocessing stage. If $x_t$ is nonstationary, then a test for the significance of $\beta$ can produce a false negative. The practical interpretation of $\beta$ changes if you difference the predictor series.
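A minimal preprocessing sketch (X here is a hypothetical T-by-r matrix of predictor series):

dX = diff(X);    % first-difference nonstationary predictor columns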
The software uses maximum likelihood estimation for conditional mean models such as ARIMAX models. You can specify either a Gaussian or Student's t distribution for the innovations.
You can include seasonal components in an ARIMAX model (see Multiplicative ARIMA Model), which creates a SARIMAX(p,D,q)×(p_s,D_s,q_s)_s model. Assuming that the response series $y_t$ is stationary, the model has the form

$$\phi(L)\,\Phi(L)\,y_t = c + x_t'\beta + \theta(L)\,\Theta(L)\,\varepsilon_t,$$

where $\Phi(L)$ and $\Theta(L)$ are the seasonal lag polynomials. If $y_t$ is not stationary, then you can specify degrees of nonseasonal or seasonal integration using arima. If you specify nonzero Seasonality, then the software applies degree one seasonal differencing ($D_s = 1$) to the response. Otherwise, $D_s = 0$. The software includes the exogenous covariates after it differences the response.
The software treats the exogenous covariates as fixed during estimation
and inference.
References
[1] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell,
1938.
See Also
arima

Related Examples
Specify ARMAX Model Using Dot Notation
Specify ARIMAX Model Using Name-Value Pairs

Concepts
Autoregressive Moving Average Model
Specify Conditional Mean Models Using arima
Consider an ARMAX(2,1) model of the form

$$y_t = 6 + 0.2y_{t-1} - 0.3y_{t-2} + 3x_{1,t} - 2x_{2,t} + \varepsilon_t + 0.1\varepsilon_{t-1}, \qquad \varepsilon_t \sim N(0, 0.1).$$
Modify model:

model.Beta = [3 -2];
model.Variance = 0.1
model =
ARIMAX(2,0,1) Model:
-------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 1
Constant: 6
AR: {0.2 -0.3} at Lags [1 2]
SAR: {}
MA: {0.1} at Lags [1]
SMA: {}
Beta: [3 -2]
Variance: 0.1
For more details on how to modify properties of an arima model using dot notation, see Modify Properties of Conditional Mean Model Objects.
See Also
arima | estimate | forecast | simulate | struct
Related Examples
Specify Conditional Mean and Variance Model
Specify Nonseasonal Models Using Name-Value Pairs
Specify Multiplicative ARIMA Model
Dot Notation
A model object created by arima has values assigned to all model object properties. To change any of
these property values, you do not need to reconstruct the whole model. You can modify property values
of an existing model object using dot notation. That is, type the object name, then the property name,
separated by '.' (a period).
For example, specify an AR(2) model:

model = arima(2,0,0)

model =

ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: NaN
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
To modify the property value of AR, assign AR a cell array. Here, assign known AR coefficient values:
model.AR = {0.8,-0.4}
model =
ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: NaN
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
The updated model object now has AR coefficients with the specified equality constraints.
Similarly, the data type of Distribution is a data structure. The default data structure has only one field, Name, with value 'Gaussian'.
model.Distribution
ans =
Name: 'Gaussian'
To modify the innovation distribution, assign Distribution a new name or data structure. The data structure can have up to two fields, Name and DoF. The second field corresponds to the degrees of freedom for a Student's t distribution, and is only required if Name has the value 't'.

To specify a Student's t distribution with unknown degrees of freedom, enter:
model.Distribution = 't'
model =
ARIMA(2,0,0) Model:
------------------
Distribution: Name = 't', DoF = NaN
P: 2
D: 0
Q: 0
Constant: NaN
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
The updated model object has a Student's t distribution with NaN degrees of freedom. To specify a t distribution with eight degrees of freedom, enter:
model.Distribution = struct('Name','t','DoF',8)
model =
ARIMA(2,0,0) Model:
------------------
Distribution: Name = 't', DoF = 8
P: 2
D: 0
Q: 0
Constant: NaN
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
The degrees of freedom in the model object are updated. Note that the DoF field of Distribution is not
directly assignable. For example, model.Distribution.DoF = 8 is not a valid assignment. However, you
can get the individual fields:
model.Distribution.DoF
ans = 8
You can modify model to include, for example, two coefficients $\beta_1 = 0.2$ and $\beta_2 = 4$ corresponding to two exogenous covariate time series. Since Beta has not been specified yet, you have not seen it in the output. To include it, enter:
model.Beta = [0.2 4]
model =
ARIMAX(2,0,0) Model:
-------------------
Distribution: Name = 't', DoF = 8
P: 2
D: 0
Q: 0
Constant: NaN
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Beta: [0.2 4]
Variance: NaN
Nonmodifiable Properties
Not all model properties are modifiable. You cannot change these properties in an existing model:
P. This property updates automatically when any of p (degree of the nonseasonal AR operator), ps
(degree of the seasonal AR operator), D (degree of nonseasonal differencing), or s (degree of seasonal
differencing) changes.
Q. This property updates automatically when either q (degree of the nonseasonal MA operator) or qs (degree of the seasonal MA operator) changes.
Not all name-value arguments you can use for model creation are properties of the created model object.
Specifically, you can specify the arguments ARLags, MALags, SARLags,and SMALags during model
creation. These are not, however, properties of arima model objects. This means you cannot retrieve or
modify them in an existing model object.
The nonseasonal and seasonal AR and MA lags update automatically if you add any elements to (or remove them from) the coefficient cell arrays AR, MA, SAR, or SMA.
For example, specify an AR(2) model:
model = arima(2,0,0)
model =
ARIMA(2,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 0
Constant: NaN
AR: {NaN NaN} at Lags [1 2]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
The model output shows nonzero AR coefficients at lags 1 and 2.
Add a new AR term at lag 12:
model.AR{12} = NaN
model =
ARIMA(12,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 12
D: 0
Q: 0
Constant: NaN
AR: {NaN NaN NaN} at Lags [1 2 12]
SAR: {}
MA: {}
SMA: {}
Variance: NaN
The three nonzero coefficients at lags 1, 2, and 12 now display in the model output. However, the cell
array assigned to AR returns twelve elements:
model.AR
ans =
[NaN] [NaN] [0] [0] [0] [0] [0] [0] [0] [0] [0] [NaN]
AR has zero coefficients at all the interim lags to maintain consistency with traditional MATLAB cell
array indexing.
See Also
arima | struct

Related Examples
Specify Conditional Mean Models Using arima
Specify Conditional Mean Model Innovation Distribution

Concepts
Conditional Mean Models
Consider the conditional mean model

$$y_t = \mu + \varepsilon_t + \sum_{i=1}^{\infty}\psi_i\varepsilon_{t-i}.$$

If $\sigma_t^2 = \sigma_\varepsilon^2$ for all times t, then $\varepsilon_t$ is an independent process with constant variance, $\sigma_\varepsilon^2$.
The default value for Variance is NaN, meaning constant variance with unknown value. You can
alternatively assign Variance any positive scalar value, or estimate it using estimate.
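For example, to assign a known scalar variance (the model here is hypothetical):

model = arima(1,0,0);
model.Variance = 0.5;    % constant innovation variance with a known value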
A time series can exhibit volatility clustering, meaning a tendency for large changes to follow large changes, and small changes to follow small changes. You can model this behavior with a conditional variance model, a dynamic model describing the evolution of the process variance, $\sigma_t^2$, conditional on past innovations and variances. Set Variance equal to one of the three conditional variance model objects available in Econometrics Toolbox (garch, egarch, or gjr). This creates a composite conditional mean and variance model.
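A composite-model sketch (the orders are illustrative):

model = arima('ARLags',1);     % conditional mean: AR(1)
model.Variance = garch(1,1);   % conditional variance: GARCH(1,1)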
$$z_t = T_\nu\sqrt{\frac{\nu-2}{\nu}},$$

where $T_\nu$ follows a Student's t distribution with $\nu > 2$ degrees of freedom.
The t distribution is useful for modeling time series with more extreme values than expected under a
Gaussian distribution. Series with larger values than expected under normality are said to have excess
kurtosis.
Tip It is good practice to assess the distributional properties of model residuals to determine if a
Gaussian innovation distribution (the default distribution) is appropriate for your data.
Specify an MA(2) model with a t innovation distribution with unknown degrees of freedom:

model = arima('MALags',1:2,'Distribution','t')

model =

ARIMA(0,0,2) Model:
------------------
Distribution: Name = 't', DoF = NaN
P: 0
D: 0
Q: 2
Constant: NaN
AR: {}
SAR: {}
MA: {NaN NaN} at Lags [1 2]
SMA: {}
Variance: NaN
The output shows that Distribution is a data structure with two fields. Field Name has the value 't', and field DoF has the value NaN.
If the degrees of freedom are known, and you want to set an equality constraint, assign a struct array to Distribution with fields Name and DoF. In this case, if the model object is input to estimate, the degrees of freedom won't be estimated (the equality constraint is upheld).
Specify an MA(2) model with an iid Student's t innovation process with eight degrees of freedom:
model = arima('MALags',1:2,'Distribution',struct('Name','t','DoF',8))
model =
ARIMA(0,0,2) Model:
------------------
Distribution: Name = 't', DoF = 8
P: 0
D: 0
Q: 2
Constant: NaN
AR: {}
SAR: {}
MA: {NaN NaN} at Lags [1 2]
SMA: {}
Variance: NaN
The output shows the specified innovation distribution.
Start with an MA(2) model:
model = arima(0,0,2);
To change the distribution of the innovation process in an existing model object to a Student's t distribution with unknown degrees of freedom, type:
model.Distribution = 't'
To change the distribution to a t distribution with known degrees of freedom, use a data structure:
model.Distribution = struct('Name','t','DoF',8)
You can get the individual Distribution fields:
model.Distribution.DoF
ans =
8
To change the innovation distribution from a Student's t back to a Gaussian distribution, type:
model.Distribution = 'Gaussian'
model =
ARIMA(0,0,2) Model:
------------------
Distribution: Name = 'Gaussian'
P: 0
D: 0
Q: 2
Constant: NaN
AR: {}
SAR: {}
MA: {NaN NaN} at Lags [1 2]
SMA: {}
Variance: NaN
The Name field is updated to 'Gaussian', and there is no longer a DoF field.
References
[1] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell, 1938.
See Also
arima | egarch | garch | gjr | struct

Related Examples
Specify Conditional Mean Models Using arima
Modify Properties of Conditional Mean Model Objects
Specify Conditional Mean and Variance Model

Concepts
Conditional Mean Models
Load the NASDAQ data included with the toolbox. Convert the daily close composite index series to a
return series.
load Data_EquityIdx
nasdaq = Dataset.NASDAQ;
r = price2ret(nasdaq);
N = length(r);
figure(1)
plot(r)
xlim([0 N])
title('NASDAQ Daily Returns')
The returns appear to exhibit some volatility clustering.
Step 2. Check the series for autocorrelation.
Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) for the
return series.
figure(2)
subplot(2,1,1)
autocorr(r)
subplot(2,1,2)
parcorr(r)
The autocorrelation functions suggest there is significant autocorrelation at lag one.
Conduct a Ljung-Box Q-test at lag 5.
[h,p] = lbqtest(r,'Lags',5)
h =

     1

p =

    0.0120

The null hypothesis that all autocorrelations are 0 up to lag 5 is rejected (h = 1).
Step 3. Check the series for conditional heteroscedasticity.
Plot the sample ACF and PACF of the squared return series.
figure(3)
subplot(2,1,1)
autocorr(r.^2)
subplot(2,1,2)
parcorr(r.^2)

The autocorrelation functions show significant serial dependence.
Conduct Engle's ARCH test. Test the null hypothesis of no conditional heteroscedasticity against the alternative hypothesis of an ARCH model with two lags (which is locally equivalent to a GARCH(1,1) model).
[h,p] = archtest(r-mean(r),'lags',2)
h =

     1

p =

     0

The null hypothesis is rejected in favor of the alternative hypothesis (h = 1).
Step 4. Specify a conditional mean and variance model.
Specify an AR(1) model for the conditional mean of the NASDAQ returns, and a GARCH(1,1) model
for the conditional variance. This is a model of the form
$$r_t = c + \phi_1 r_{t-1} + \varepsilon_t,$$

where $\varepsilon_t = \sigma_t z_t$ and

$$\sigma_t^2 = \kappa + \gamma_1\sigma_{t-1}^2 + \alpha_1\varepsilon_{t-1}^2.$$
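A call consistent with the model output displayed below:

model = arima('ARLags',1,'Variance',garch(1,1))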
ARIMA(1,0,0) Model:
------------------
Distribution: Name = 'Gaussian'
P: 1
D: 0
Q: 0
Constant: NaN
AR: {NaN} at Lags [1]
SAR: {}
MA: {}
SMA: {}
Variance: [GARCH(1,1) Model]
The model output shows that a garch model object is stored in the Variance property of the arima object,
model.
See Also
archtest | arima | autocorr | garch | lbqtest | parcorr
Related Examples
Estimate Conditional Mean and Variance Models
Simulate Conditional Mean and Variance Models
Forecast Conditional Mean and Variance Model

Concepts
Multiplicative ARIMA Model
References
[1] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell,
1938.
Related Examples
Plot Impulse Response Function

Concepts
Conditional Mean Models
impulse(modelMA)
For an MA model, the impulse response function cuts off after q periods. For this example, the last nonzero coefficient is at lag q = 3.
Autoregressive Model
This example shows how to compute and plot the impulse response function for an autoregressive (AR)
model. The AR(p) model is given by

$$y_t = \mu + \phi(L)^{-1}\,\varepsilon_t,$$

where $\phi(L)$ is a p-degree AR operator polynomial, $(1 - \phi_1 L - \cdots - \phi_p L^p)$.
An AR process is stationary provided that the AR operator polynomial is stable, meaning all its roots lie outside the unit circle. In this case, the infinite-degree inverse polynomial, $\psi(L) = \phi(L)^{-1}$, has absolutely summable coefficients, and the impulse response function decays to zero.

The impulse response function decays in a sinusoidal pattern.
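A sketch producing such a decay (the AR(2) coefficients are illustrative and chosen to give complex roots):

modelAR = arima('AR',{0.5,-0.75});
impulse(modelAR)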
ARMA Model
This example shows how to plot the impulse response function for an autoregressive moving average (ARMA) model. The ARMA(p,q) model is given by

$$y_t = \mu + \frac{\theta(L)}{\phi(L)}\,\varepsilon_t,$$

where $\theta(L)$ is a q-degree MA operator polynomial, $(1 + \theta_1 L + \cdots + \theta_q L^q)$, and $\phi(L)$ is a p-degree AR operator polynomial, $(1 - \phi_1 L - \cdots - \phi_p L^p)$.
An ARMA process is stationary provided that the AR operator polynomial is stable, meaning all its roots lie outside the unit circle. In this case, the infinite-degree polynomial $\psi(L) = \theta(L)/\phi(L)$ has absolutely summable coefficients, and the impulse response function decays to zero.
Step 1. Specify an ARMA model.
modelARMA = arima('AR',{0.6,-0.3},'MA',0.4);
Step 2. Plot the impulse response function.
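Consistent with the impulse usage shown earlier, the plotting call for this step is:

impulse(modelARMA)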
The time series is the log quarterly Australian Consumer Price Index (CPI) measured from 1972 to
1991.
Step 1. Load the data.
The series is nonstationary, with a clear upward trend. This suggests differencing the data before using a
stationary model (as suggested by the Box-Jenkins methodology), or fitting a nonstationary ARIMA
model directly.
Step 2. Estimate an ARIMA model.
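A sketch of the estimation step, consistent with the ARIMA(2,1,0) output that follows (Y, the log CPI series from Step 1, is an assumed variable name):

model = arima(2,1,0);      % two AR terms, one degree of integration
fit = estimate(model,Y);   % prints the estimation results shown below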
ARIMA(2,1,0) Model:
------------------
Conditional Probability Distribution: Gaussian

                              Standard         t
    Parameter    Value        Error            Statistic
    ------------------------------------------------------
    Constant     0.0100723    0.00328015       3.07069
    AR{1}        0.212059     0.0954278        2.22219
    AR{2}        0.337282     0.103781         3.24994
    Variance     9.23017e-05  1.11119e-05      8.30659
The fitted model is

$$(1 - 0.212L - 0.337L^2)(1 - L)\,y_t = 0.0101 + \varepsilon_t,$$

with the opposite sign on the AR coefficients (relative to the displayed estimates).
Step 3. Difference the data before estimating.
Take the first difference of the data. Estimate an AR(2) model using the differenced data.
dY = diff(Y);
modAR = arima(2,0,0);
fitAR = estimate(modAR,dY);
ARIMA(2,0,0) Model:
------------------
Conditional Probability Distribution: Gaussian

                              Standard         t
    Parameter    Value        Error            Statistic
    ------------------------------------------------------
    Constant     0.0104289    0.00380427       2.74137
    AR{1}        0.201194     0.101463         1.98293
    AR{2}        0.32299      0.118035         2.7364
    Variance     9.42421e-05  1.16259e-05      8.10622
The parameter point estimates are very similar to those in Step 2. The standard errors, however, are
larger when the data is differenced before estimation.
Forecasts made using the fitted AR model will be on the differenced scale. Forecasts made using the
ARIMA model in Step 2 will be on the same scale as the original data.
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd
ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.
See Also
arima | estimate

Related Examples
Box-Jenkins Model Selection
Infer Residuals for Diagnostic Checking

Concepts
Box-Jenkins Methodology
ARIMA Model
where $z_t$ can be standardized Gaussian or Student's t with $\nu > 2$ degrees of freedom. Specify your distribution choice in the arima model object Distribution property.

The innovation variance, $\sigma_t^2$, can be a constant scalar or a process characterized by a conditional variance model. Specify the form of the conditional variance using the Variance property. If you specify a conditional variance model, the parameters of that model are estimated with the conditional mean model parameters simultaneously.
Given a stationary model,

$$y_t = \mu + \psi(L)\,\varepsilon_t,$$

the software infers the innovation series $\varepsilon_t$ from the observed responses in order to evaluate the likelihood.
Loglikelihood Functions
Given the history of a process, innovations are conditionally independent. Let $H_t$ denote the history of a process available at time t, t = 1,...,N. The likelihood function for the innovation series is given by

$$f(\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_N \mid H_{N-1}) = \prod_{t=1}^{N} f(\varepsilon_t \mid H_{t-1}),$$

where f is a standardized Gaussian or t density function.
t1
N2 2 H21 N
SV
log
t 2.
t V 2 2t 1 t 1 t
If $z_t$ has a standardized Student's t distribution with $\nu > 2$ degrees of freedom, then the loglikelihood function is

$$\mathrm{LLF} = N\log\!\left[\frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi(\nu-2)}\,\Gamma\!\left(\frac{\nu}{2}\right)}\right] - \frac{1}{2}\sum_{t=1}^{N}\log\sigma_t^2 - \frac{\nu+1}{2}\sum_{t=1}^{N}\log\!\left[1 + \frac{\varepsilon_t^2}{\sigma_t^2(\nu-2)}\right].$$
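A numerical sketch of the Gaussian loglikelihood above (e and s2 are hypothetical vectors of inferred innovations and conditional variances):

N = numel(e);
LLF = -N/2*log(2*pi) - 1/2*sum(log(s2)) - 1/2*sum(e.^2./s2);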
estimate performs covariance matrix estimation for maximum likelihood estimates using the outer
product of gradients (OPG) method.
See Also
arima | estimate

Related Examples
Estimate Multiplicative ARIMA Model
Estimate Conditional Mean and Variance Models

Concepts
Conditional Mean Model Estimation with Equality Constraints
Presample Data for Conditional Mean Model Estimation
Initial Values for Conditional Mean Model Estimation
Optimization Settings for Conditional Mean Model Estimation
Maximum Likelihood Estimation for Conditional Variance Models
• Constant
• Nonzero AR coefficients at positive lags (AR)
• Nonzero seasonal AR coefficients at positive lags (SAR)
• Nonzero MA coefficients at positive lags (MA)
• Nonzero seasonal MA coefficients at positive lags (SMA)
• Regression coefficients (when you specify X)
• Variance parameters (scalar for constant-variance models, vector of additional parameters otherwise)
• Degrees of freedom (t innovation distribution only)
If any parameter known to the optimizer has an equality constraint, then the corresponding row and
column of the variance-covariance matrix has all 0s.
In addition to user-specified equality constraints, estimate sets any AR or MA coefficient with an
estimate less than 1e-12 in magnitude equal to 0.
The number of past responses and innovations that a current innovation depends on is determined by the degree of the AR or MA operators, and any differencing. For example, in an AR(2) model, each innovation depends on the two previous responses,

$$\varepsilon_t = y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2}.$$
In ARIMAX models, the current innovation also depends on the current value of the exogenous covariate (unlike distributed lag models). For example, in an ARX(2) model with one exogenous covariate, each innovation depends on the previous two responses and the current value of the covariate,

$$\varepsilon_t = y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \beta x_t.$$
In general, the likelihood contribution of the first few innovations is conditional on historical information that might not be observable. How do you estimate the parameters without all the data? In the ARX(2) example, $\varepsilon_2$ explicitly depends on $y_1$, $y_0$, and $x_2$, and $\varepsilon_1$ explicitly depends on $y_0$, $y_{-1}$, and $x_1$. Implicitly, $\varepsilon_2$ depends on $x_1$ and $x_0$, and $\varepsilon_1$ depends on $x_0$ and $x_{-1}$. However, you cannot observe $y_0$, $y_{-1}$, $x_0$, and $x_{-1}$.
The amount of presample data that you need to initialize a model depends on the degree of the model. The property P of an arima model specifies the number of presample responses and exogenous data points that you need to initialize the AR portion of a conditional mean model. For example, P = 2 in an ARX(2) model. Therefore, you need two responses and two data points from each exogenous covariate series to initialize the model.
One option is to use the first P data points from the response and exogenous covariate series as your presample, and then fit your model to the remaining data. This results in some loss of sample size. If you plan to compare multiple potential models, be aware that you can only use likelihood-based measures of fit (including the likelihood ratio test and information criteria) to compare models fit to the same data (of the same sample size). If you specify your own presample data, then you must use the largest required number of presample responses across all models that you want to compare.
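A sketch of this splitting strategy (model and Y are hypothetical):

model = arima(2,0,0);                             % P = 2 for this model
Y0 = Y(1:model.P);                                % first P responses as presample
fit = estimate(model,Y(model.P+1:end),'Y0',Y0);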
The property Q of an arima model specifies the number of presample innovations needed to initialize the
MA portion of a conditional mean model. You can get presample innovations by dividing your data into
two parts. Fit a model to the first part, and infer the innovations. Then, use the inferred innovations as
presample innovations for estimating the second part of the data.
For a model with both an autoregressive and moving average component, you can specify both
presample responses and innovations, one or the other, or neither.
By default, estimate generates automatic presample response and innovation data. The software:

• Generates presample responses by backward forecasting.
• Sets presample innovations to zero.
• Does not generate presample exogenous data. One option is to backward forecast each exogenous series to generate a presample during data preprocessing.
See Also
arima | estimate

Related Examples
Estimate Multiplicative ARIMA Model

Concepts
Maximum Likelihood Estimation for Conditional Mean Models
Conditional Mean Model Estimation with Equality Constraints
Initial Values for Conditional Mean Model Estimation
Optimization Settings for Conditional Mean Model Estimation
Default initial values depend on the model structure.

For models with only MA terms:

Model Term                 Regression Terms in Model               Regression Terms Not in Model
MA coefficients            OLS                                     OLS
Constant                   OLS constant                            OLS constant
Regression coefficients    OLS                                     N/A
Constant variance          Population variance of OLS residuals    Population variance of OLS residuals

For models with AR terms:

Model Term                 Regression Terms in Model               Regression Terms Not in Model
AR coefficients            OLS                                     Solve Yule-Walker equations, as described in Box, Jenkins, and Reinsel [1]
MA coefficients            OLS                                     OLS
Constant                   OLS constant                            Mean of AR-filtered series (using initial AR coefficients)
Regression coefficients    OLS                                     N/A
Constant variance          Population variance of OLS residuals    Population variance of OLS residuals
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.

See Also
arima | estimate | fmincon

Concepts
Maximum Likelihood Estimation for Conditional Mean Models
Conditional Mean Model Estimation with Equality Constraints
Presample Data for Conditional Mean Model Estimation
Optimization Settings for Conditional Mean Model Estimation
Initial Values for Conditional Variance Model Estimation
If you set a different fmincon Algorithm value, then estimate displays a warning. During estimation, fmincon temporarily sets Algorithm to active-set by default to satisfy the constraints required by estimate.
estimate sets a constraint level of TolCon so constraints are not violated. Be aware that an estimate with
an active constraint has unreliable standard errors since variance-covariance estimation assumes the
likelihood function is locally quadratic around the maximum likelihood estimate.
Use the first 13 observations as presample data, and the remaining 131 observations for estimation.
Y0 = Y(1:13);
[fit,VarCov] = estimate(model,Y(14:end),'Y0',Y0)
ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12):
---------------------------------------------------------------
Conditional Probability Distribution: Gaussian

                              Standard         t
    Parameter    Value        Error            Statistic
    ------------------------------------------------------
    Constant     0            Fixed            Fixed
    MA{1}       -0.377161     0.0734258       -5.13662
    SMA{12}     -0.572379     0.0939327       -6.0935
    Variance     0.00138874   0.000152417      9.1115
fit =

ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12):
---------------------------------------------------------------
Distribution: Name = 'Gaussian'
P: 13
D: 1
Q: 13
Constant: 0
AR: {}
SAR: {}
MA: {-0.377161} at Lags [1]
SMA: {-0.572379} at Lags [12]
Seasonality: 12
Variance: 0.00138874
VarCov =

         0         0         0         0
         0    0.0054   -0.0015   -0.0000
         0   -0.0015    0.0088    0.0000
         0   -0.0000    0.0000    0.0000
The fitted model is

$$\Delta\Delta_{12}\,y_t = (1 - 0.377L)(1 - 0.572L^{12})\,\varepsilon_t.$$
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd
ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.
See Also
arima | estimate | infer

Related Examples
Specify Multiplicative ARIMA Model
Simulate Multiplicative ARIMA Models

Concepts
Conditional Mean Model Estimation with Equality Constraints
Presample Data for Conditional Mean Model Estimation
Navigate to the folder containing sample data, and load the data set Data_Airline.
cd(matlabroot)
cd('help/toolbox/econ/examples')
load Data_Airline
dat = log(Data);   % Transform to logarithmic scale
T = size(dat,1);
y = dat(1:103); % estimation sample
y is the part of dat used for estimation, and the rest of dat is the holdout sample to compare the two models' forecasts. For details about model selection and specification for this data, see Specify Multiplicative ARIMA Model.
Step 2. Define and fit the model specifying seasonal lags.
Create an ARIMA(0,1,1) model with seasonal integration and a seasonal MA term at lag 12,

$$\Delta\Delta_{12}\,y_t = (1 + \theta_1 L)(1 + \Theta_{12}L^{12})\,\varepsilon_t,$$

where $\varepsilon_t$ is an independent and identically distributed normally distributed series with mean 0 and variance $\sigma^2$. Use estimate to fit model1 to y.
model1 = arima('MALags', 1, 'D', 1, 'SMALags', 12,... 'Seasonality',12, 'Constant', 0);
fit1 = estimate(model1,y);
ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12):
---------------------------------------------------------------
Conditional Probability Distribution: Gaussian

                               Standard         t
 Parameter    Value            Error            Statistic
 ----------   ------------     ------------     -----------
 Constant     0                Fixed            Fixed
Create an ARIMAX(0,1,1) model with period 12 seasonal differencing and a regression component,
$$(1-L)(1-L^{12})y_t = c + x_t'\beta + (1+\theta_1 L)\varepsilon_t.$$
Here, {x_t; t = 1,...,T} is a series of T column vectors of length 12 that indicate in which month the tth observation was measured. A 1 in the ith row of x_t indicates that the observation was measured in the ith month; the remaining elements are 0.
The fitted model is
$$(1-L)(1-L^{12})y_t = \hat c + x_t'\hat\beta + (1+\hat\theta_1 L)\varepsilon_t,$$
where $\varepsilon_t$ is an iid normally distributed series with mean 0 and variance 0.0017, and $\hat\beta$ is a column vector with the values Beta1 through Beta12. Note that the MA{1} and Variance estimates for model1 and model2 are not equal.
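The regression model model2 and the indicator matrix X used in the next step are defined elsewhere in the original article. A minimal sketch of how they might be constructed (the dummyvar-based construction is an assumption; the names model2, fit2, and X are carried over from the text):
months = repmat((1:12)',12,1);     % month of year for each of the 144 observations
X = dummyvar(months);              % 144-by-12 matrix of monthly indicators
model2 = arima('MALags',1,'D',1,'Seasonality',12,'Constant',0);
fit2 = estimate(model2,y,'X',X(1:103,:));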
Step 4. Forecast using both models.
Use forecast to forecast both models 41 periods into the future from July 1957. Plot the holdout sample
using these forecasts.
yF1 = forecast(fit1,41,'Y0',y);
yF2 = forecast(fit2,41,'Y0',y,'X0',X(1:103,:),...
    'XF',X(104:end,:));
l1 = plot(100:T,dat(100:end),'k','LineWidth',3);
hold on
l2 = plot(104:144,yF1,'-r','LineWidth',2);
l3 = plot(104:144,yF2,'-b','LineWidth',2);
hold off
title('Passenger Data: Actual vs. Forecasts')
xlabel('Month')
ylabel('Logarithm of Monthly Passenger Data')
legend({'Actual Data','Polynomial Forecast',...
    'Regression Forecast'},'Location','NorthWest')
Though they overpredict the holdout observations, the forecasts of both models are almost equivalent.
One main difference between the models is that model1 is more parsimonious than model2.
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd
ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.
See Also
arima | estimate | forecast | dummyvar
Related Examples
Specify Multiplicative ARIMA Model on page 4-53
Estimate Multiplicative ARIMA Model on page 4-109
Forecast Multiplicative ARIMA Model on page 4-177
Check Fit of Multiplicative ARIMA Model on page 3-80
Forecast IGD Rate Using ARIMAX Model on page 4-118
Concepts
Multiplicative ARIMA Model on page 4-47
ARIMA Model Including Exogenous Covariates on page 4-58
Conditional Mean Model Estimation with Equality Constraints on page 4-98
MMSE Forecasting of Conditional Mean Models on page 4-167
Load the Credit Defaults data set, assign the response IGD to Y and the predictors AGE, CPF, and SPR
to the matrix X, and obtain the sample size T. To avoid distraction from the purpose of this example,
assume that all exogenous series are stationary.
load Data_CreditDefaults
X = Data(:,[1 3:4]);
T = size(X,1);
Y = Data(:,5);
Step 2. Process response and exogenous data.
Divide the response and exogenous data into estimation and holdout series. Assume that each exogenous
covariate series is AR(1), and fit each one to that model. Forecast exogenous data at a 10-year horizon.
Yest = Y(2:(T-10));        % response data for estimation
Xest = X(1:(T-10),:);      % exogenous data for estimation

modelX = arima(1,0,0);     % model for the exogenous covariates
X1fit = estimate(modelX,Xest(:,1),'print',false);
X2fit = estimate(modelX,Xest(:,2),'print',false);
X3fit = estimate(modelX,Xest(:,3),'print',false);

X1fore = forecast(X1fit,10,'Y0',Xest(:,1));
X2fore = forecast(X2fit,10,'Y0',Xest(:,2));
X3fore = forecast(X3fit,10,'Y0',Xest(:,3));
XF = [X1fore X2fore X3fore];
XF holds the forecasts for the three exogenous covariate series.
Step 3. Estimate response model and infer residuals.
Assume that the response series is ARX(1), and fit it to that model including the exogenous covariate
series. Infer the residuals Eest from the fitted response model Yfit.
modelY = arima(1,0,0);
Yfit = estimate(modelY,Yest,'X',Xest,...
    'print',false,'Y0',Y(1));
Eest = infer(Yfit,Yest,'Y0',Y(1),'X',Xest);
Step 5. MMSE forecast responses.
Forecast responses using the MMSE method at a 10-year horizon. Calculate prediction intervals for the
forecasts assuming that they are normally distributed.
[Yfore,YMSE] = forecast(Yfit,10,'Y0',Y(1:(T-10)),...
    'X0',Xest,'XF',XF);
cil = Yfore - 1.96*sqrt(YMSE);
ciu = Yfore + 1.96*sqrt(YMSE);
Step 6. Plot MMSE forecasted responses.
Plot the response series using their MMSE forecasts and prediction intervals.
figure(1)
l1 = plot(dates,Y,'ko-','LineWidth',2);
xlabel('Year')
ylabel('IGD (%)')
hold on
l2 = plot(dates((T-9):T),cil,'r:','LineWidth',2);
plot(dates((T-9):T),ciu,'r:','LineWidth',2)
l3 = plot(dates((T-9):T),Yfore,'k:','LineWidth',2);
plot(dates([T-10 T-9]),[Y(T-10) Yfore(1)],'k:')
plot(dates([T-10 T-9]),[Y(T-10) cil(1)],'r:')
plot(dates([T-10 T-9]),[Y(T-10) ciu(1)],'r:')
legend([l1 l2 l3],'Observed Time Series','95% Interval',...
    'Forecast','Location','NorthWest')
title(['Default Rate of Investment Grade Corporate Bonds: ',...
    'MMSE Forecasts'])
axis tight
hold off
The forecasts seem reasonable, but there are outlying observations in 2000 and 2001.
Step 7. Monte Carlo forecast responses.
Forecast responses using the Monte Carlo method at a 10-year horizon by simulating 100 paths using the model Yfit. Set the estimation responses to Y0 and the inferred residuals to E0 as preforecast data. Set the forecasted exogenous data XF to X. Calculate simulation statistics.
nsim = 100;
rng(1);
Ymcfore = simulate(Yfit,10,'numPaths',nsim,...
    'Y0',Y(1:(T-10)),'E0',Eest,'X',XF);
Ymcforebar = mean(Ymcfore,2);
mc_cil = quantile(Ymcfore',0.025);
mc_ciu = quantile(Ymcfore',0.975);
Step 8. Plot Monte Carlo forecasted responses.
Plot the response series with their Monte Carlo forecasts and prediction intervals.
figure(2)
xlabel('Year')
ylabel('IGD (%)')
hold on
l4 = plot(dates((T-9):T),Ymcfore(:,1),'Color',[0.7 0.7 0.7]);
plot(dates((T-9):T),Ymcfore,'Color',[0.7 0.7 0.7])
References
[1] Helwege, J., and P. Kleiman. "Understanding Aggregate Default Rates of High Yield Bonds." Current Issues in Economics and Finance. Vol. 2, No. 6, 1996, pp. 1-6.
[2] Loeffler, G., and P. N. Posch. Credit Risk Modeling Using Excel and VBA. West Sussex, England: Wiley Finance, 2007.
See Also
arima | estimate | forecast | infer | simulate
Related Examples
Specify Multiplicative ARIMA Model on page 4-53
Estimate Multiplicative ARIMA Model on page 4-109
Simulate Multiplicative ARIMA Models on page 4-158
Model Seasonal Lag Effects Using Indicator Variables on page 4-113
Check Fit of Multiplicative ARIMA Model on page 3-80
Concepts
MMSE Forecasting of Conditional Mean Models on page 4-167
Monte Carlo Forecasting of Conditional Mean Models on page 4-166
Load the NASDAQ data included with the toolbox. Convert the daily close composite index series to a
return series. Specify an AR(1) and GARCH(1,1) composite model. This is a model of the form
$$r_t = c + \phi_1 r_{t-1} + \varepsilon_t,$$
where $\varepsilon_t = \sigma_t z_t$ and
$$\sigma_t^2 = \kappa + \gamma_1\sigma_{t-1}^2 + \alpha_1\varepsilon_{t-1}^2.$$
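The arima object for this composite model is specified earlier in the original article; a sketch consistent with the estimation output below (an assumption):
model = arima('ARLags',1,'Variance',garch(1,1));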
Fit the model object, model, to the return series, r, using estimate. Use the presample observations that estimate automatically generates.
fit = estimate(model,r);
ARIMA(1,0,0) Model:
-------------------
Conditional Probability Distribution: Gaussian

                               Standard         t
 Parameter    Value            Error            Statistic
 ----------   ------------     ------------     -----------
 Constant     0.000716654      0.000179112      4.00116
 AR{1}        0.137371         0.0198323        6.92662

GARCH(1,1) Conditional Variance Model:
---------------------------------------
Conditional Probability Distribution: Gaussian

                               Standard         t
 Parameter    Value            Error            Statistic
 ----------   ------------     ------------     -----------
 Constant     2.32077e-06      5.39077e-07      4.30508
 GARCH{1}     0.86984          0.00918896       94.6614
 ARCH{1}      0.121008         0.00856049       14.1357
The estimation display shows the five estimated parameters and their corresponding standard errors (the
AR(1) conditional mean model has two parameters, and the GARCH(1,1) conditional variance model
has three parameters).
The fitted model is
$$r_t = 7.17\times10^{-4} + 0.137\,r_{t-1} + \varepsilon_t,$$
where $\varepsilon_t = \sigma_t z_t$ and
$$\sigma_t^2 = 2.32\times10^{-6} + 0.870\,\sigma_{t-1}^2 + 0.121\,\varepsilon_{t-1}^2.$$
All t statistics are greater than two, suggesting all parameters are statistically significant.
Step 3. Infer the conditional variances and residuals.
Infer and plot the conditional variances and standardized residuals. Also output the loglikelihood
objective function value.
[res,V,LogL] = infer(fit,r);
figure(2)
subplot(2,1,1)
plot(V)
xlim([0,N])
title('Conditional Variance')
subplot(2,1,2)
plot(res./sqrt(V))
xlim([0,N])
title('Standardized Residuals')
The conditional variances increase after observation 2000. This corresponds to the increased volatility seen in the original return series.
The standardized residuals have more large values (larger than 2 or 3 in absolute value) than expected under a standard normal distribution. This suggests a Student's t distribution might be more appropriate for the innovation distribution.
Step 4. Fit a model with a t innovation distribution.
Modify the model object so that it has a Student's t innovation distribution. Fit the modified model to the NASDAQ return series. Specify an initial value for the variance model constant term.
model.Distribution = 't';
fitT = estimate(model,r,'Variance0',{'Constant0',0.001});
ARIMA(1,0,0) Model:
-------------------
Conditional Probability Distribution: t

                               Standard         t
 Parameter    Value            Error            Statistic
 ----------   ------------     ------------     -----------
 Constant     0.00101662       0.000169538      5.99642
 AR{1}        0.144693         0.0191552        7.5537
 DoF          7.48082          0.909391         8.22619

GARCH(1,1) Conditional Variance Model:
---------------------------------------
Conditional Probability Distribution: t

                               Standard         t
 Parameter    Value            Error            Statistic
 ----------   ------------     ------------     -----------
 Constant     1.58272e-06      6.35438e-07      2.49075
 GARCH{1}     0.894434         0.0116374        76.8586
 ARCH{1}      0.101241         0.0119247        8.49005
 DoF          7.48082          0.909391         8.22619
The coefficient estimates change slightly when the t distribution is used for the innovations. The second
model fit has one additional parameter estimate, the t distribution degrees of freedom. The estimated
degrees of freedom are relatively small (about 8), indicating significant departure from normality.
Step 5. Compare the model fits.
Compare the two model fits (Gaussian and t innovation distribution) using the Akaike information
criterion (AIC) and Bayesian information criterion (BIC). First, obtain the loglikelihood objective
function value for the second fit.
[resT,VT,LogLT] = infer(fitT,r);
[aic,bic] = aicbic([LogL,LogLT],[5,6],N)
aic =
1.0e+04 *
-1.8387 -1.8498
bic =
1.0e+04 *
-1.8357 -1.8462
The second model has six parameters compared to five in the first model (because of the t distribution degrees of freedom). Despite this, both information criteria favor the model with the Student's t distribution. The AIC and BIC values are smaller (more negative) for the t innovation distribution.
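As a quick check on what aicbic computes, the criteria follow the standard definitions (a sketch; k is the number of estimated parameters and N is the sample size):
k = 6;                          % parameters in the t-distribution fit
aicT = -2*LogLT + 2*k;          % matches aic(2) above
bicT = -2*LogLT + k*log(N);     % matches bic(2) above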
See Also
aicbic | arima | estimate | infer
Related Examples
Specify Conditional Mean and Variance Model on page 4-78
Specify Conditional Mean Model Innovation Distribution on page 4-72
Concepts
Initial Values for Conditional Mean Model Estimation on page 4-103
Information Criteria on page 3-62
Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) for the
simulated data.
figure(2)
subplot(2,1,1)
autocorr(Y)
subplot(2,1,2)
parcorr(Y)
Both the sample ACF and PACF decay relatively slowly. This is consistent with an ARMA model. The
ARMA lags cannot be selected solely by looking at the ACF and PACF, but it seems no more than four
AR or MA terms are needed.
Step 3. Fit ARMA(p,q) models.
To identify the best lags, fit several models with different lag choices. Here, fit all combinations of p =
1,...,4 and q = 1,...,4 (a total of 16 models). Store the loglikelihood objective function and number of
coefficients for each fitted model.
LOGL = zeros(4,4);   % Initialize
PQ = zeros(4,4);
for p = 1:4
    for q = 1:4
        mod = arima(p,0,q);
        [fit,~,logL] = estimate(mod,Y,'print',false);
        LOGL(p,q) = logL;
        PQ(p,q) = p + q;
    end
end
Step 4: Calculate the BIC.
Calculate the BIC for each fitted model. The number of parameters in a model is p + q + 1 (for the AR
and MA coefficients, and constant term). The number of observations in the data set is 100.
LOGL = reshape(LOGL,16,1);
PQ = reshape(PQ,16,1);
[~,bic] = aicbic(LOGL,PQ+1,100);
reshape(bic,4,4)
ans =
108.6241 105.9489 109.4164 113.8443
99.1639 101.5886 105.5203 109.4348
102.9094 106.0305 107.6489 99.6794
107.4045 100.7072 102.5746 102.0209
In the output BIC matrix, the rows correspond to the AR degree (p) and the columns correspond to the MA degree (q). The smallest value is best.
The smallest BIC value is 99.1639 in the (2,1) position. This corresponds to an ARMA(2,1) model,
matching the model that generated the data.
See Also
aicbic | arima | estimate | simulate | autocorr | parcorr
Related Examples
Detect Autocorrelation on page 3-19
Estimate Conditional Mean and Variance Models on page 4-124
Concepts
Autoregressive Moving Average Model on page 4-35
Information Criteria on page 3-62
Load the Australian CPI data. Take first differences, then plot the series.
load Data_JAustralian
Y = Dataset.PAU;
N = length(Y);
dY = diff(Y);
figure(1)
plot(2:N,dY)
xlim([0,N])
title('Differenced Australian CPI')
The differenced series looks relatively stationary.
Step 2. Plot the sample ACF and PACF.
Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) to look for
autocorrelation in the differenced series.
figure(2)
subplot(2,1,1)
autocorr(dY)
subplot(2,1,2)
parcorr(dY)
The sample ACF decays more slowly than the sample PACF. The latter cuts off after lag 2. This, along
with the first-degree differencing, suggests an ARIMA(2,1,0) model.
Step 3. Estimate an ARIMA(2,1,0) model.
Specify, and then estimate, an ARIMA(2,1,0) model. Infer the residuals for diagnostic checking.
model = arima(2,1,0);
fit = estimate(model,Y);
[res,~,LogL] = infer(fit,Y);
Notice that the model is fit to the original series, not the differenced series. The model object to be fit, model, has property D equal to 1. This accounts for the one degree of differencing.
This specification assumes a Gaussian innovation distribution. infer returns the value of the loglikelihood objective function (LogL) along with the residuals (res).
Step 4. Perform residual diagnostic checks.
Standardize the inferred residuals, and check for normality and any unexplained autocorrelation.
stdr = res/sqrt(fit.Variance);
figure(3)
subplot(2,2,1)
plot(stdr)
title('Standardized Residuals')
subplot(2,2,2)
hist(stdr)
title('Standardized Residuals')
subplot(2,2,3)
autocorr(stdr)
subplot(2,2,4)
parcorr(stdr)
The residuals appear uncorrelated and approximately normally distributed. There is some indication that
there is an excess of large residuals.
Step 5. Modify the innovation distribution.
To explore possible excess kurtosis in the innovation process, fit an ARIMA(2,1,0) model with a Student's t distribution to the original series. Return the value of the loglikelihood objective function so you can use the Bayesian information criterion (BIC) to compare the fit of the two models.
model.Distribution = 't';
[fitT,~,LogLT] = estimate(model,Y);
[~,bic] = aicbic([LogLT,LogL],[5,4],N)
bic =
-492.5317 -479.4691
The model with the t innovation distribution has one extra parameter (the degrees of freedom of the t
distribution).
According to the BIC, the ARIMA(2,1,0) model with a Student's t innovation distribution is the better choice because it has a smaller (more negative) BIC value.
See Also
arima | estimate | infer | aicbic
Related Examples
Box-Jenkins Differencing vs. ARIMA Estimation on page 4-91
Specify Conditional Mean Model Innovation Distribution on page 4-72
Concepts
Information Criteria on page 3-62
Goodness of Fit on page 3-86
Residual Diagnostics on page 3-88
Note Some extensions of Monte Carlo simulation rely on generating dependent random draws, such as
Markov Chain Monte Carlo (MCMC). The simulate method in Econometrics Toolbox generates
independent realizations.
Some applications of Monte Carlo simulation are:
Demonstrating theoretical results
Forecasting future events
Estimating the probability of future events
Generate responses by recursively applying the specified AR and MA polynomial operators. The AR
polynomial operator can include differencing.
For example, consider an AR(2) process,
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t.$$
Given presample responses $y_0$ and $y_{-1}$, and simulated innovations $\varepsilon_1,\ldots,\varepsilon_N$, realizations of the process are recursively generated:
$$y_1 = c + \phi_1 y_0 + \phi_2 y_{-1} + \varepsilon_1$$
$$y_2 = c + \phi_1 y_1 + \phi_2 y_0 + \varepsilon_2$$
$$y_3 = c + \phi_1 y_2 + \phi_2 y_1 + \varepsilon_3$$
$$\vdots$$
$$y_N = c + \phi_1 y_{N-1} + \phi_2 y_{N-2} + \varepsilon_N$$
You can reduce the Monte Carlo error of the probability estimate by increasing the number of
realizations. If you know the desired precision of your estimate, you can solve for the number of
realizations needed to achieve that level of precision.
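For example, the Monte Carlo standard error of a probability estimate is $\sqrt{\hat p(1-\hat p)/n}$, so you can invert this relationship to size the simulation. A sketch (the pilot estimate and the target precision are illustrative assumptions):
phat   = 0.39;                       % pilot probability estimate (assumption)
target = 0.005;                      % desired standard error
n = ceil(phat*(1-phat)/target^2)     % realizations needed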
See Also
arima | simulate
Related Examples
Simulate Stationary Processes on page 4-144
Simulate Trend-Stationary and Difference-Stationary Processes on page 4-153
Simulate Multiplicative ARIMA Models on page 4-158
Simulate Conditional Mean and Variance Models on page 4-162
Forecast IGD Rate Using ARIMAX Model on page 4-118
Concepts
Presample Data for Conditional Mean Model Simulation on page 4-142
Transient Effects in Conditional Mean Model Simulations on page 4-143
See Also
arima | simulate
Related Examples
Simulate Stationary Processes on page 4-144
Simulate Trend-Stationary and Difference-Stationary Processes on page 4-153
Simulate Multiplicative ARIMA Models on page 4-158
Simulate Conditional Mean and Variance Models on page 4-162
Concepts
Monte Carlo Simulation of Conditional Mean Models on page 4-139
Transient Effects in Conditional Mean Model Simulations on page 4-143
Related Examples
Simulate Stationary Processes on page 4-144
Simulate Trend-Stationary and Difference-Stationary Processes on page 4-153
Concepts
Monte Carlo Simulation of Conditional Mean Models on page 4-139
Presample Data for Conditional Mean Model Simulation on page 4-142
Simulate an AR Process
This example shows how to simulate sample paths from a stationary AR(2) process without specifying
presample observations.
Step 1. Specify a model.
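The model specification itself is cut off in this extraction. A sketch consistent with the unconditional mean (10) and theoretical variance (0.83) used below (an assumption):
model = arima('Constant',0.5,'AR',{0.7,0.25},'Variance',0.1);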
Generate one sample path (with 50 observations) from the specified model, and plot.
rng('default')
Y = simulate(model,50);
figure(1)
plot(Y)
xlim([0,50])
Because presample data was not specified, simulate sets the two required presample observations equal to the unconditional mean of the process,
$$\mu = \frac{c}{1-\phi_1-\phi_2} = \frac{0.5}{1-0.7-0.25} = 10.$$
The simulation mean is constant over time. This is consistent with the definition of a stationary process. The process variance is not constant over time, however. There are transient effects at the beginning of the simulation; the simulation variance approaches the theoretical unconditional variance,
$$\frac{\sigma_\varepsilon^2(1-\phi_2)}{(1+\phi_2)\left[(1-\phi_2)^2-\phi_1^2\right]} \approx 0.83,$$
by around the 50th observation.
Step 4. Oversample the process.
To reduce transient effects, one option is to oversample the process. For example, to sample 50
observations, you can generate paths with more than 50 observations, and discard all but the last 50
observations as burn-in. Here, simulate paths of length 150, and discard the first 100 observations.
rng('default')
Y = simulate(model,150,'numPaths',1000);
Y = Y(101:end,:);
figure(3)
subplot(2,1,1)
plot(Y,'Color',[.85,.85,.85])
title('Simulated AR(2) Process')
hold on
h=plot(mean(Y,2),'k','LineWidth',2);
legend(h,'Simulation Mean','Location','NorthWest')
hold off
subplot(2,1,2)
plot(var(Y,0,2),'r','LineWidth',2)
xlim([0,50])
title('Process Variance')
hold on
plot(1:50,.83*ones(50,1),'k--','LineWidth',1.5)
legend('Simulation','Theoretical',...
    'Location','SouthEast')
hold off
The realizations now look like draws from a stationary stochastic process. The simulation variance
fluctuates (due to Monte Carlo error) around the theoretical variance.
Simulate an MA Process
This example shows how to simulate sample paths from a stationary MA(12) process without specifying
presample observations.
Step 1. Specify a model.
Specify the MA(12) model
$$y_t = 0.5 + \varepsilon_t + 0.8\varepsilon_{t-1} + 0.2\varepsilon_{t-12},$$
where the innovation distribution is Gaussian with variance 0.2.
model = arima('Constant',0.5,'MA',{0.8,0.2},...
    'MALags',[1,12],'Variance',0.2);
Step 2. Generate sample paths.
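The simulate call is cut off here. A sketch consistent with the 60-observation plots that follow (the path count is an assumption):
rng('default')
Y = simulate(model,60,'numPaths',1000);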
figure(1)
plot(Y,'Color',[.85,.85,.85])
hold on
h = plot(mean(Y,2),'k','LineWidth',2);
legend(h,'Simulation Mean','Location','NorthWest')
title('MA(12) Process')
hold off
For an MA process, the constant term is the unconditional mean. The simulation mean is around 0.5, as expected.
Step 3. Plot the simulation variance.
Because the model is stationary, the unconditional variance should be constant across all times:
$$\sigma_y^2 = (1+\theta_1^2+\theta_{12}^2)\,\sigma_\varepsilon^2 = (1+0.8^2+0.2^2)(0.2) = 0.336.$$
Plot the simulation variance, and compare it to the theoretical variance.
figure(2)
plot(var(Y,0,2),'Color',[.75,.75,.75],'LineWidth',1.5)
xlim([0,60])
title('Unconditional Variance')
hold on
plot(1:60,.336*ones(60,1),'k--','LineWidth',2)
legend('Simulation','Theoretical',...
'Location','SouthEast')
hold off
There appears to be a short burn-in period at the beginning of the simulation. During this time, the
simulation variance is lower than expected. Afterwards, the simulation variance fluctuates around the
theoretical variance.
Step 4. Generate more sample paths.
Simulate 10,000 paths from the model, each with length 1000. Look at the simulation variance.
rng('default')
YM = simulate(model,1000,'numPaths',10000);
figure(3)
plot(var(YM,0,2),'Color',[.75,.75,.75],'LineWidth',1.5)
ylim([0.3,0.36])
title('Unconditional Variance')
hold on
plot(1:1000,.336*ones(1000,1),'k--','LineWidth',2)
legend('Simulation','Theoretical',...
'Location','SouthEast')
hold off
The Monte Carlo error is reduced when more realizations are generated. There is much less variability in the simulation variance, which fluctuates tightly around the theoretical variance.
See Also
arima | simulate
Related Examples
Simulate Trend-Stationary and Difference-Stationary Processes on page 4-153
Simulate Multiplicative ARIMA Models on page 4-158
Simulate Conditional Mean and Variance Models on page 4-162
Concepts
Autoregressive Model on page 4-18
Moving Average Model on page 4-27
Monte Carlo Simulation of Conditional Mean Models on page 4-139
Transient Effects in Conditional Mean Model Simulations on page 4-143
Specify the trend-stationary model
$$y_t = 0.5t + \varepsilon_t + 1.4\varepsilon_{t-1} + 0.8\varepsilon_{t-2},$$
where the innovation process is Gaussian with variance 8. After specifying the model, simulate 50 sample paths of length 200. Use 100 burn-in simulations.
t = [1:200]';
trend = 0.5*t;
model = arima('Constant',0,'MA',{1.4,0.8},'Variance',8);
rng('default')
u = simulate(model,300,'numPaths',50);
Yt = repmat(trend,1,50) + u(101:300,:);
figure(1)
plot(Yt,'Color',[.85,.85,.85])
hold on
h1=plot(t,trend,'r','LineWidth',5);
xlim([0,200])
title('Trend-Stationary Process')
h2=plot(mean(Yt,2),'k--','LineWidth',2);
legend([h1,h2],'Trend','Simulation Mean',...
'Location','NorthWest')
hold off
Specify the difference-stationary model
$$(1-L)y_t = 0.5 + \varepsilon_t + 1.4\varepsilon_{t-1} + 0.8\varepsilon_{t-2},$$
where the innovation distribution is Gaussian with variance 8. After specifying the model, simulate 50 sample paths of length 200. No burn-in is needed because all sample paths should begin at zero. This is the simulate default starting point for nonstationary processes with no presample data.
model = arima('Constant',0.5,'D',1,'MA',{1.4,0.8},...
    'Variance',8);
rng('default')
Yd = simulate(model,200,'numPaths',50);
figure(2)
plot(Yd,'Color',[.85,.85,.85])
hold on
h1=plot(t,trend,'r','LineWidth',5);
xlim([0,200])
title('Difference-Stationary Process')
h2=plot(mean(Yd,2),'k--','LineWidth',2);
legend([h1,h2],'Trend','Simulation Mean',...
    'Location','NorthWest')
hold off
The simulation average is close to the trend line with slope 0.5. The variance of the sample paths grows over time.
Step 3. Difference the sample paths.
A difference-stationary process is stationary when differenced appropriately. Take the first differences of
the sample paths from the difference-stationary process, and plot the differenced series. One observation
is lost as a result of the differencing.
diffY = diff(Yd,1,1);
figure(3)
plot(2:200,diffY,'Color',[.85,.85,.85])
xlim([0,200])
title('Differenced Series')
hold on
h = plot(2:200,mean(diffY,2),'k--','LineWidth',2);
legend(h,'Simulation Mean','Location','NorthWest')
hold off
The differenced series looks stationary, with the simulation mean fluctuating around zero.
Concepts
Trend-Stationary vs. Difference-Stationary Processes on page 2-7
Monte Carlo Simulation of Conditional Mean Models on page 4-139
Presample Data for Conditional Mean Model Simulation on page 4-142
load Data_Airline
Y = log(Dataset.PSSG);
N = length(Y);
model = arima('Constant',0,'D',1,'Seasonality',12,...
    'MALags',1,'SMALags',12);
fit = estimate(model,Y);
res = infer(fit,Y);
For details about model selection and specification for this data, see:
Specify Multiplicative ARIMA Model on page 4-53
Estimate Multiplicative ARIMA Model on page 4-109
Step 2. Simulate airline passenger counts.
Use the fitted model to simulate 25 realizations of airline passenger counts over a 60-month (5-year)
horizon. Use the observed series and inferred residuals as presample data.
rng('default')
Ysim = simulate(fit,60,'numPaths',25,'Y0',Y,'E0',res);
mn = mean(Ysim,2);
figure(1)
plot(Y,'k')
hold on
plot(N+1:N+60,Ysim,'Color',[.85,.85,.85]);
h = plot(N+1:N+60,mn,'k--','LineWidth',2);
xlim([0,N+60])
title('Simulated Airline Passenger Counts')
legend(h,'Simulation Mean','Location','NorthWest')
hold off
The simulated forecasts show growth and seasonal periodicity similar to the observed series.
Step 3. Estimate the probability of a future event.
Use simulations to estimate the probability that log airline passenger counts will meet or exceed the
value 7 sometime during the next 5 years. Calculate the Monte Carlo error associated with the estimated
probability.
rng('default')
Ysim = simulate(fit,60,'numPaths',1000,'Y0',Y,'E0',res);
g7 = sum(Ysim >= 7) > 0;
phat = mean(g7)
err = sqrt(phat*(1-phat)/1000)
phat =
0.3910
err =
0.0154
There is approximately a 39% chance that the (log) number of airline passengers will meet or exceed 7 in the next 5 years. The Monte Carlo standard error of the estimate is about 0.015.
Step 4. Plot the distribution of passengers at a future time.
Use the simulations to plot the distribution of (log) airline passenger counts 60 months into the future.
figure(2)
hist(Ysim(60,:))
title('Distribution of Passenger Counts in 60 months')
See Also
arima | estimate | infer | simulate
Related Examples
Specify Multiplicative ARIMA Model on page 4-53
Estimate Multiplicative ARIMA Model on page 4-109
Forecast Multiplicative ARIMA Model on page 4-177
Check Fit of Multiplicative ARIMA Model on page 3-80
Concepts
Monte Carlo Simulation of Conditional Mean Models on page 4-139
Presample Data for Conditional Mean Model Simulation on page 4-142
Monte Carlo Forecasting of Conditional Mean Models on page 4-166
Load the NASDAQ data included with the toolbox. Fit a conditional mean and variance model to the
data.
load Data_EquityIdx
nasdaq = Dataset.NASDAQ;
r = price2ret(nasdaq);
N = length(r);
model = arima('ARLags',1,'Variance',garch(1,1),...
    'Distribution','t');
fit = estimate(model,r,'Variance0',{'Constant0',0.001});
[E0,V0] = infer(fit,r);
For details about model selection and specification for this data, see:
Specify Conditional Mean and Variance Model on page 4-78
Estimate Conditional Mean and Variance Models on page 4-124
Step 2. Simulate returns, innovations, and conditional variances.
Use simulate to generate 100 sample paths for the returns, innovations, and conditional variances for a
1000-period future horizon. Use the observed returns and inferred residuals and conditional variances as
presample data.
rng('default')
[Y,E,V] = simulate(fit,1000,'numPaths',100,...
    'Y0',r,'E0',E0,'V0',V0);
figure(1)
plot(r)
hold on
plot(N+1:N+1000,Y)
xlim([0,N+1000])
title('Simulated Returns')
hold off
The simulation shows increased volatility over the forecast horizon.
Step 3. Plot conditional variances.
figure(2)
plot(V0)
hold on
plot(N+1:N+1000,V)
xlim([0,N+1000])
title('Simulated Conditional Variances')
hold off
The increased volatility in the simulated returns is due to larger conditional variances over the forecast
horizon.
Step 4. Plot standardized innovations.
Standardize the innovations using the square root of the conditional variance process. Plot the
standardized innovations over the forecast horizon.
figure(3)
plot(E./sqrt(V))
xlim([0,1000])
title('Simulated Standardized Innovations')
The fitted model assumes the standardized innovations follow a standardized Student's t distribution. Thus, the simulated innovations have more large values than would be expected from a Gaussian innovation distribution.
See Also
arima | estimate | infer | simulate
Related Examples
Specify Conditional Mean and Variance Model on page 4-78
Estimate Conditional Mean and Variance Models on page 4-124
Forecast Conditional Mean and Variance Model on page 4-181
Concepts
Monte Carlo Simulation of Conditional Mean Models on page 4-139
Presample Data for Conditional Mean Model Simulation on page 4-142
Monte Carlo Forecasting of Conditional Mean Models on page 4-166
See Also
arima | estimate | forecast | simulate
Related Examples
Simulate Multiplicative ARIMA Models on page 4-158
Simulate Conditional Mean and Variance Models on page 4-162
Concepts
Monte Carlo Simulation of Conditional Mean Models on page 4-139
Presample Data for Conditional Mean Model Simulation on page 4-142
MMSE Forecasting of Conditional Mean Models on page 4-167
Consider forecasting the process $y_{N+1}, y_{N+2},\ldots,y_{N+h}$ over a future horizon of length $h$.
Let $\hat y_{t+1}$ denote a forecast for the process at time $t+1$, conditional on the history of the process up to time $t$, $H_t$, and the exogenous covariate series up to time $t+1$, $X_{t+1}$, if a regression component is included in the model.
The minimum mean square error (MMSE) forecast is the forecast $\hat y_{t+1}$ that minimizes the expected square loss,
$$E\left[(y_{t+1}-\hat y_{t+1})^2 \,\middle|\, H_t, X_{t+1}\right].$$
Given sufficient presample responses, forecast automatically infers the presample innovations. In general, the longer the presample response series you provide, the better the inferred presample innovations will be. If you provide presample responses and exogenous covariate data, but not enough, forecast sets the presample innovations equal to zero.
Consider generating forecasts for an AR(2) process,
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t.$$
Given presample observations $y_{N-1}$ and $y_N$, forecasts are recursively generated as follows:
$$\hat y_{N+1} = c + \phi_1 y_N + \phi_2 y_{N-1}$$
$$\hat y_{N+2} = c + \phi_1 \hat y_{N+1} + \phi_2 y_N$$
$$\hat y_{N+3} = c + \phi_1 \hat y_{N+2} + \phi_2 \hat y_{N+1}$$
$$\vdots$$
For a stationary AR process, this recursion converges to the unconditional mean of the process,
$$\mu = \frac{c}{1-\phi_1-\phi_2}.$$
For an MA(12) process,
$$y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_{12}\varepsilon_{t-12},$$
you need 12 presample innovations to initialize the forecasts. All innovations from time $N+1$ and greater are set to their expectation, zero. Thus, for an MA(12) process, the forecast for any time more than 12 steps in the future is the unconditional mean, $\mu$.
Forecast Error
The forecast mean square error for an $s$-step-ahead forecast is given by
$$\mathrm{MSE} = E\left[(y_{t+s}-\hat y_{t+s})^2 \,\middle|\, H_t, X_{t+s}\right] = \sigma_\varepsilon^2\sum_{j=0}^{s-1}\psi_j^2,$$
where $\psi(L) = 1 + \psi_1 L + \psi_2 L^2 + \cdots$ is the infinite-degree MA lag operator polynomial of the process.
For stationary processes, the coefficients of the infinite lag operator polynomial are absolutely
summable, and the MSE converges to the unconditional variance of the process.
For nonstationary processes, the series does not converge, and the forecast error grows over time.
See Also
arima | forecast
Related Examples
Forecast Multiplicative ARIMA Model on page 4-177
Convergence of AR Forecasts on page 4-171
Concepts
Monte Carlo Forecasting of Conditional Mean Models on page 4-166
Convergence of AR Forecasts
This example shows how to forecast a stationary AR(12) process using forecast. Evaluate the asymptotic
convergence of the forecasts, and compare forecasts made with and without using presample data.
Step 1. Specify an AR(12) model.
Specify the AR(12) model
$$y_t = 3 + 0.7y_{t-1} + 0.25y_{t-12} + \varepsilon_t,$$
where the innovations are Gaussian with variance 2. Generate a realization of length 300 from the process. Discard the first 250 observations as burn-in.
model = arima('Constant',3,'AR',{0.7,0.25},'ARLags',[1,12],...
    'Variance',2);
rng('default')
Y = simulate(model,300);
Y = Y(251:300);
figure(1)
plot(Y)
xlim([0,50])
title('Simulated AR(12) Process')
Step 2. Forecast the process using presample data.
Generate forecasts (and forecast errors) for a 150-step time horizon. Use the simulated series as
presample data.
[Yf,YMSE] = forecast(model,150,'Y0',Y);
upper = Yf + 1.96*sqrt(YMSE);
lower = Yf - 1.96*sqrt(YMSE);
figure(2)
plot(Y,'Color',[.75,.75,.75])
hold on
plot(51:200,Yf,'r','LineWidth',2)
plot(51:200,[upper,lower],'k--','LineWidth',1.5)
xlim([0,200])
hold off
The MMSE forecast decays sinusoidally and begins converging to the unconditional mean, given by
$$\mu = \frac{c}{1-\phi_1-\phi_{12}} = \frac{3}{1-0.7-0.25} = 60.$$
The forecast MSE converges to the unconditional variance of the process ($\sigma_y^2$). You can calculate the variance using the impulse response function. The impulse response function is based on the infinite-degree MA representation of the AR(12) process.
The last few values of YMSE show the convergence toward the unconditional variance.
ARpol = LagOp({1,-.7,-.25},'Lags',[0,1,12]);
IRF = cell2mat(toCellArray(1/ARpol));
sig2e = 2;
variance = sum(IRF.^2)*sig2e
variance =
7.9938
YMSE(145:end)
ans =
7.8870
7.8899
7.8926
7.8954
7.8980
7.9006
Convergence is not reached within 150 steps, but the forecast MSE is approaching the theoretical
unconditional variance.
Step 4. Forecast without using presample data.
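The forecast call for this step is cut off in this extraction. A sketch consistent with the plot below, omitting the presample argument 'Y0' (an assumption):
[Yf2,YMSE2] = forecast(model,150);
upper2 = Yf2 + 1.96*sqrt(YMSE2);
lower2 = Yf2 - 1.96*sqrt(YMSE2);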
figure(3)
plot(Y,'Color',[.75,.75,.75])
hold on
plot(51:200,Yf2,'r','LineWidth',2)
plot(51:200,[upper2,lower2],'k--','LineWidth',1.5)
xlim([0,200])
hold off
The convergence of the forecast MSE is the same without using presample data. However, all MMSE forecasts are the unconditional mean. This is because forecast initializes the AR model with the unconditional mean when you do not provide presample data.
See Also
arima | forecast | LagOp | simulate | toCellArray
Related Examples
Simulate Stationary Processes on page 4-144
Forecast Multiplicative ARIMA Model on page 4-177
load Data_Airline
Y = log(Dataset.PSSG);
N = length(Y);
model = arima('Constant',0,'D',1,'Seasonality',12,...
    'MALags',1,'SMALags',12);
fit = estimate(model,Y);
For details about model selection and specification for this data, see:
Specify Multiplicative ARIMA Model on page 4-53
Estimate Multiplicative ARIMA Model on page 4-109
Step 2. Forecast airline passenger counts.
Use the fitted model to generate MMSE forecasts and corresponding mean square errors over a
60-month (5-year) horizon. Use the observed series as presample data. By default, forecast infers
presample innovations using the specified model and observations.
[Yf,YMSE] = forecast(fit,60,'Y0',Y);
upper = Yf + 1.96*sqrt(YMSE);
lower = Yf - 1.96*sqrt(YMSE);
figure(1)
plot(Y,'Color',[.75,.75,.75])
hold on
h1 = plot(N+1:N+60,Yf,'r','LineWidth',2);
h2 = plot(N+1:N+60,upper,'k--','LineWidth',1.5);
plot(N+1:N+60,lower,'k--','LineWidth',1.5)
xlim([0,N+60])
title('Forecast and 95% Forecast Interval')
legend([h1,h2],'Forecast','95% Interval','Location','NorthWest')
hold off
The MMSE forecast shows airline passenger counts continuing to grow over the forecast horizon. The
confidence bounds show that a decline in passenger counts is plausible, however. Because this is a
nonstationary process, the width of the forecast intervals grows over time.
Step 3. Compare MMSE and Monte Carlo forecasts.
Simulate 500 sample paths over the same forecast horizon. Compare the simulation mean to the MMSE
forecast.
rng('default')
res = infer(fit,Y);
Ysim = simulate(fit,60,'numPaths',500,'Y0',Y,'E0',res);
Ybar = mean(Ysim,2);
simU = prctile(Ysim,97.5,2);
simL = prctile(Ysim,2.5,2);
figure(2)
h1=plot(Yf,'Color',[.85,.85,.85],'LineWidth',5);
hold on
h2 = plot(Ybar,'k--','LineWidth',1.5);
xlim([0,60])
plot([upper,lower],'Color',[.85,.85,.85],'LineWidth',5)
plot([simU,simL],'k--','LineWidth',1.5)
title('Comparison of MMSE and Monte Carlo Forecasts')
legend([h1,h2],'MMSE','Monte Carlo','Location','NorthWest')
hold off
The MMSE forecast and simulation mean are virtually indistinguishable. There are slight discrepancies between the theoretical 95% forecast intervals and the simulation-based 95% forecast intervals.
See Also
arima | estimate | forecast | infer | simulate
Related Examples
Specify Multiplicative ARIMA Model on page 4-53
Estimate Multiplicative ARIMA Model on page 4-109
Simulate Multiplicative ARIMA Models on page 4-158
Model Seasonal Lag Effects Using Indicator Variables on page 4-113
Check Fit of Multiplicative ARIMA Model on page 3-80
Concepts
MMSE Forecasting of Conditional Mean Models on page 4-167
Monte Carlo Forecasting of Conditional Mean Models on page 4-166
Load the NASDAQ data included with the toolbox. Fit a conditional mean and variance model to the
data.
load Data_EquityIdx
nasdaq = Dataset.NASDAQ;
r = price2ret(nasdaq);
N = length(r);
model = arima('ARLags',1,'Variance',garch(1,1),...
    'Distribution','t');
fit = estimate(model,r,'Variance0',{'Constant0',0.001});
[E0,V0] = infer(fit,r);
For details about model selection and specification for this data, see:
Specify Conditional Mean and Variance Model on page 4-78
Estimate Conditional Mean and Variance Models on page 4-124
Step 2. Forecast returns and conditional variances.
Use forecast to compute MMSE forecasts of the returns and conditional variances for a 1000-period
future horizon. Use the observed returns and inferred residuals and conditional variances as presample
data.
[Y,YMSE,V] = forecast(fit,1000,'Y0',r,'E0',E0,'V0',V0);
upper = Y + 1.96*sqrt(YMSE);
lower = Y - 1.96*sqrt(YMSE);
figure(1)
subplot(2,1,1)
plot(r,'Color',[.75,.75,.75])
hold on
plot(N+1:N+1000,Y,'r','LineWidth',2)
plot(N+1:N+1000,[upper,lower],'k--','LineWidth',1.5)
xlim([0,N+1000])
title('Forecasted Returns')
hold off
subplot(2,1,2)
plot(V0,'Color',[.75,.75,.75])
hold on
plot(N+1:N+1000,V,'r','LineWidth',2);
xlim([0,N+1000])
title('Forecasted Conditional Variances')
hold off
The conditional variance forecasts converge to the asymptotic variance of the GARCH conditional
variance model. The forecasted returns converge to the estimated model constant (the unconditional
mean of the AR conditional mean model).
Concepts
MMSE Forecasting of Conditional Mean Models on page 4-167
Obtain statistics from the properties and methods of mdl. For example, see the mdl.Diagnostics and
mdl.Residuals properties.
robustfit into LinearModel.fit
Previous Syntax.
[b,stats] = robustfit(X,y,wfun,tune,const)
Current Syntax.
mdl = LinearModel.fit(X,y,'robust','on') % bisquare
Or to use the wfun weight and the tune tuning parameter:
opt.RobustWgtFun = wfun;     % weight function name, for example 'huber'
opt.Tune = tune;             % optional
mdl = LinearModel.fit(X,y,'robust',opt)
Obtain statistics from the properties and methods of mdl. For example, see the mdl.Diagnostics and
mdl.Residuals properties.
stepwisefit into LinearModel.stepwise
Previous Syntax.
[b,se,pval,inmodel,stats,nextstep,history] = stepwisefit(X,y,Name,Value)
Current Syntax.
mdl = LinearModel.stepwise(ds,modelspec,Name,Value)
or
mdl = LinearModel.stepwise(X,y,modelspec,Name,Value)
Obtain statistics from the properties and methods of mdl. For example, see the mdl.Diagnostics and
mdl.Residuals properties.
glmfit into GeneralizedLinearModel.fit
Previous Syntax.
[b,dev,stats] = glmfit(X,y,distr,param1,val1,...)
Current Syntax.
mdl = GeneralizedLinearModel.fit(X,y,distr,...)
Obtain statistics from the properties and methods of mdl. For example, the deviance is mdl.Deviance,
and to compare mdl against a constant model, use devianceTest(mdl).
nlinfit into NonLinearModel.fit
Previous Syntax.
[beta,r,J,COVB,mse] = nlinfit(X,y,fun,beta0,options)
Current Syntax.
mdl = NonLinearModel.fit(X,y,fun,beta0,'Options',options)
Equivalent values of the previous outputs:
beta: mdl.Coefficients.Estimate
r: mdl.Residuals.Raw
covb: mdl.CoefficientCovariance
mse: mdl.MSE
mdl does not provide the Jacobian (J) output. The primary purpose of J was to pass it into nlparci or
nlpredci to obtain confidence intervals for the estimated coefficients (parameters) or predictions. Obtain
those confidence intervals as:
parci = coefCI(mdl)
[pred,predci] = predict(mdl)
The multiple linear regression model has the form
$$y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i, \qquad i = 1,\ldots,n,$$
where
$y_i$ is the ith response.
$\beta_k$ is the kth coefficient, where $\beta_0$ is the constant term in the model. Sometimes, design matrices might include information about the constant term. However, LinearModel.fit or LinearModel.stepwise by default includes a constant term in the model, so you must not enter a column of 1s into your design matrix X.
$X_{ij}$ is the ith observation on the jth predictor variable, $j = 1,\ldots,p$.
$\varepsilon_i$ is the ith noise term, that is, random error.
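For instance, a hedged sketch of fitting without a column of 1s (carsmall is a sample data set used throughout this chapter):
load carsmall
X = [Weight,Horsepower];          % n-by-2 raw predictors, no ones column
mdl = LinearModel.fit(X,MPG);     % fits MPG ~ 1 + x1 + x2; intercept added internally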
In general, a linear regression model can be a model of the form
$$y_i = \beta_0 + \sum_{k=1}^{K}\beta_k f_k(X_{i1},X_{i2},\ldots,X_{ip}) + \varepsilon_i, \qquad i = 1,\ldots,n,$$
where each $f_k$ is a scalar-valued function of the independent variables $X_{ij}$. The functions $f_k(X)$ might be in any form, including nonlinear functions or polynomials. The linearity in linear regression models refers to the linearity of the coefficients $\beta_k$. That is, the response variable, $y$, is a linear function of the coefficients, $\beta_k$.
Some examples of linear models are:
$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$$
$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i}^3 + \beta_4 X_{2i}^2 + \varepsilon_i$$
$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \beta_4 X_{1i}^2 + \varepsilon_i$$
The following types of models, however, are not linear models, because they are not linear in the unknown coefficients $\beta_k$: for example, models in which a coefficient appears as an exponent, such as
$$y_i = \beta_0 + \beta_1 X_{1i}^{\beta_2} + \varepsilon_i,$$
or inside another nonlinear function of the parameters.
The usual assumptions are that the noise terms $\varepsilon_i$ have mean zero and constant variance $\sigma^2$, so that
$$E(y_i) = \beta_0 + \sum_{k=1}^{K}\beta_k f_k(X_{i1},\ldots,X_{ip})
\quad\text{and}\quad
V(y_i) = V(\varepsilon_i) = \sigma^2.$$
So the variance of $y_i$ is the same for all levels of $X_{ij}$. The responses $y_i$ are uncorrelated.
The fitted linear function is
$$\hat y_i = b_0 + \sum_{k=1}^{K} b_k f_k(X_{i1},X_{i2},\ldots,X_{ip}), \qquad i = 1,\ldots,n,$$
where $\hat y_i$ is the estimated response and the $b_k$ are the fitted coefficients. The coefficients are estimated so as to minimize the mean squared difference between the prediction vector $\hat y$ and the true response vector $y$. This method is called the method of least squares. Under the assumptions on the noise terms, these coefficients also maximize the likelihood of the prediction vector.
In a linear regression model of the form $y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$, the coefficient $\beta_k$ expresses the impact of a one-unit change in predictor variable $X_k$ on the mean of the response, $E(y)$, provided that all other variables are held constant. The sign of the coefficient gives the direction of the effect. For example, if the linear model is $E(y) = 1.8 - 2.35X_1 + X_2$, then $-2.35$ indicates a 2.35-unit decrease in the mean response with a one-unit increase in $X_1$, given $X_2$ is held constant. If the model is $E(y) = 1.1 + 1.5X_1^2 + X_2$, the coefficient of $X_1^2$ indicates a 1.5-unit increase in the mean of $y$ with a one-unit increase in $X_1^2$, given all else is held constant. However, in the case of $E(y) = 1.1 + 2.1X_1 + 1.5X_1^2$, it is difficult to interpret the coefficients similarly, because it is not possible to hold $X_1$ constant while $X_1^2$ changes, or vice versa.
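A quick numeric check of the first interpretation (an illustrative sketch using the example mean function):
Ey = @(X1,X2) 1.8 - 2.35*X1 + X2;    % the example mean function above
delta = Ey(2,5) - Ey(1,5)            % returns -2.35: a 2.35-unit decrease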
References
[1] Neter, J., M. H. Kutner, C. J. Nachtsheim, and W. Wasserman. Applied Linear Statistical Models.
IRWIN, The McGraw-Hill Companies, Inc., 1996.
[2] Seber, G. A. F. Linear Regression Analysis. Wiley Series in Probability and Mathematical Statistics.
John Wiley and Sons, Inc., 1977.
See Also
LinearModel | LinearModel.fit | LinearModel.stepwise
Related Examples
Interpret Linear Regression Results on page 9-63
Regression Using Dataset Arrays on page 9-50
Linear Regression with Interaction Effects on page 9-53
Regression with Categorical Covariates on page 2-59
Linear Regression Workflow on page 9-43
Linear Regression
In this section...
Prepare Data on page 9-11
Choose a Fitting Method on page 9-13
Choose a Model or Range of Models on page 9-14
Fit Model to Data on page 9-20
Examine Quality and Adjust the Fitted Model on page 9-20
Predict or Simulate Responses to New Data on page 9-39
Share Fitted Models on page 9-42
Linear Regression Workflow on page 9-43
Prepare Data
To begin fitting a regression, put your data into a form that fitting functions expect. All regression
techniques begin with input data in an array X and response data in a separate vector y, or input data in a
dataset array ds and response data as a column in ds. Each row of the input data represents one
observation. Each column represents one predictor (variable).
For a dataset array ds, indicate the response variable with the 'ResponseVar' name-value pair:
mdl = LinearModel.fit(ds,'ResponseVar','BloodPressure');
% or
mdl = GeneralizedLinearModel.fit(ds,'ResponseVar','BloodPressure');
The response variable is the last column by default.
You can use numeric categorical predictors. A categorical predictor is one that takes values from a fixed
set of possibilities.
For a numeric array X, indicate the categorical predictors using the 'Categorical' name-value pair. For
example, to indicate that predictors 2 and 3 out of six are categorical:
mdl = LinearModel.fit(X,y,'Categorical',[2,3]);
% or
mdl = GeneralizedLinearModel.fit(X,y,'Categorical',[2,3]);
% or equivalently
mdl = LinearModel.fit(X,y,'Categorical',logical([0 1 1 0 0 0]));
For a dataset array ds, fitting functions assume that these data types are categorical:
- Logical
- Categorical (nominal or ordinal)
- String or character array
If you want to indicate that a numeric predictor is categorical, use the 'Categorical' name-value pair.
Represent missing numeric data as NaN. To represent missing data for other data types, see Missing
Group Values on page 2-53.
Dataset Array for Input and Response Data
For example, to create a dataset array from an Excel spreadsheet:
ds = dataset('XLSFile','hospital.xls',...
    'ReadObsNames',true);
To create a dataset array from workspace variables:
load carsmall
ds = dataset(MPG,Weight);
ds.Year = ordinal(Model_Year);
Numeric Matrix for Input Data, Numeric Vector for Response
For example, to create numeric arrays from workspace variables:
load carsmall
X = [Weight Horsepower Cylinders Model_Year];
y = MPG;
To create numeric arrays from an Excel spreadsheet:
[X Xnames] = xlsread('hospital.xls');
y = X(:,4);      % response y is systolic pressure
X(:,4) = [];     % remove y from the X matrix
For LinearModel.stepwise, the model specification you give is the starting model, which the stepwise procedure tries to improve. If you do not give a model specification, the default starting model is 'constant', and the default upper bounding model is 'interactions'. Change the upper bounding model using the Upper name-value pair.
Note: There are other ways of selecting models, such as using lasso, lassoglm, sequentialfs, or plsregress.
'constant': Model contains only a constant (intercept) term.
'linear': Model contains an intercept and linear terms for each predictor.
'interactions': Model contains an intercept, linear terms, and all products of pairs of distinct predictors (no squared terms).
'purequadratic': Model contains an intercept, linear terms, and squared terms.
'quadratic': Model contains an intercept, linear terms, interactions, and squared terms.
'polyijk': Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, etc. Use numerals 0 through 9. For example, 'poly2111' has a constant plus all linear and product terms, and also contains terms with predictor 1 squared.
For example, to specify an interaction model using LinearModel.fit with matrix predictors:
mdl = LinearModel.fit(X,y,'interactions');
To specify a model using LinearModel.stepwise and a dataset array ds of predictors, suppose you want
to start from a constant and have a linear model upper bound. Assume the response variable in ds is in
the third column.
mdl2 = LinearModel.stepwise(ds,'constant',...
    'Upper','linear','ResponseVar',3);
Terms Matrix
A terms matrix is a T-by-(P + 1) matrix specifying terms in a model, where T is the number of terms, P is the number of predictor variables, and the plus one is for the response variable. The value of T(i,j) is the exponent of variable j in term i. For example, suppose there are three predictor variables A, B, and C:
[0 0 0 0]   % constant term or intercept
[0 1 0 0]   % B; equivalently, A^0 * B^1 * C^0
[1 0 1 0]   % A*C
[2 0 0 0]   % A^2
[0 1 2 0]   % B*(C^2)
The 0 at the end of each term represents the response variable. In general:
If you have the variables in a dataset array, then a 0 in the terms matrix must represent the response variable, according to the position of the response variable in the dataset array. For example:
Load sample data and define the dataset array.
load hospital
ds = dataset(hospital.Sex,hospital.BloodPressure(:,1),hospital.Age,...
hospital.Smoker,'VarNames',{'Sex','BloodPressure','Age','Smoker'});
Represent the linear model 'BloodPressure ~ 1 + Sex + Age + Smoker' in a terms matrix. The response
variable is in the second column of the data set array, so there must be a column of zeros for the
response variable in the second column of the term matrix.
T = [0 0 0 0; 1 0 0 0; 0 0 1 0; 0 0 0 1]

T =
     0     0     0     0
     1     0     0     0
     0     0     1     0
     0     0     0     1
Redefine the dataset array.
ds = dataset(hospital.BloodPressure(:,1),hospital.Sex,hospital.Age,...
hospital.Smoker,'VarNames',{'BloodPressure','Sex','Age','Smoker'});
Now, the response variable is the first term in the data set array. Specify the same linear model,
'BloodPressure ~ 1 + Sex + Age + Smoker', using a term matrix.
T = [0 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1]

T =
     0     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
If you have the predictor and response variables in a matrix and column vector, then you must include
a 0 for the response variable at the end of each term. For example:
Load sample data and define the matrix of predictors.
load carsmall
X = [Acceleration,Weight];
Specify the model 'MPG ~ Acceleration + Weight + Acceleration:Weight + Weight^2' using a terms matrix, and fit the model to the data. This model includes the main effect and two-way interaction terms for the variables Acceleration and Weight, and a second-order term for the variable Weight.
T = [0 0 0; 1 0 0; 0 1 0; 1 1 0; 0 2 0]

T =
     0     0     0
     1     0     0
     0     1     0
     1     1     0
     0     2     0
Fit a linear model.
mdl = LinearModel.fit(X,MPG,T)

mdl =
Linear regression model:
    y ~ 1 + x1*x2 + x2^2

Estimated Coefficients:
                   Estimate       SE            tStat      pValue
    (Intercept)    48.906         12.589        3.8847     0.00019665
    x1             0.54418        0.57125       0.95261    0.34337
    x2             -0.012781      0.0060312     -2.1192    0.036857
    x1:x2          -0.00010892    0.00017925    -0.6076    0.545
    x2^2           9.7518e-07     7.5389e-07    1.2935     0.19917

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 4.1
R-squared: 0.751, Adjusted R-Squared 0.739
F-statistic vs. constant model: 67, p-value = 4.99e-26
Only the intercept and x2 term, which corresponds to the Weight variable, are significant at the 5%
significance level.
Now, perform a stepwise regression with a constant model as the starting model and a linear model with
interactions as the upper model.
T = [0 0 0; 1 0 0; 0 1 0; 1 1 0];
mdl = LinearModel.stepwise(X,MPG,[0 0 0],'upper',T)
1. Adding x2, FStat = 259.3087, pValue = 1.643351e-28

mdl =
Linear regression model:
    y ~ 1 + x2

Estimated Coefficients:
                   Estimate      SE           tStat      pValue
    (Intercept)    49.238        1.6411       30.002     2.7015e-49
    x2             -0.0086119    0.0005348    -16.103    1.6434e-28

Number of observations: 94, Error degrees of freedom: 92
Root Mean Squared Error: 4.13
R-squared: 0.738, Adjusted R-Squared 0.735
F-statistic vs. constant model: 259, p-value = 1.64e-28
The results of the stepwise regression are consistent with the results of LinearModel.fit in the previous
step.
Formula
For a dataset array, specify the response variable using the 'ResponseVar' name-value pair. The default
is the last column in the array.
For example,
mdl = LinearModel.fit(X,y,'linear',...
    'RobustOpts','on','CategoricalVars',3);
mdl2 = LinearModel.stepwise(ds,'constant',...
    'ResponseVar','MPG','Upper','quadratic');
There is one point with large Cook's distance. Identify it and remove it from the model. You can use the Data Cursor to click the outlier and identify it, or identify it programmatically:
[~,larg] = max(mdl.Diagnostics.CooksDistance);
mdl2 = LinearModel.fit(ds,'MPG ~ Cylinders*Weight + Weight^2',...
    'Exclude',larg);
plotResiduals(mdl,'probability')
The two potential outliers appear on this plot as well. Otherwise, the probability plot seems reasonably
straight, meaning a reasonable fit to normally distributed residuals.
You can identify the two outliers and remove them from the data:
outl = find(mdl.Residuals.Raw > 12)
outl =
90
97
To remove the outliers, use the Exclude name-value pair:
mdl2 = LinearModel.fit(ds,'MPG ~ Cylinders*Weight + Weight^2',...
    'Exclude',outl);
Examine a residuals plot of mdl2:
plotResiduals(mdl2)
The new residuals plot looks fairly symmetric, without obvious problems. However, there might be
some serial correlation among the residuals. Create a new plot to see if such an effect exists.
plotResiduals(mdl2,'lagged')
The scatter plot shows many more crosses in the upper-right and lower-left quadrants than in the other
two quadrants, indicating positive serial correlation among the residuals.
Another potential issue is when residuals are large for large observations. See if the current model has
this issue.
plotResiduals(mdl2,'fitted')
There is some tendency for larger fitted values to have larger residuals. Perhaps the model errors are
proportional to the measured values.
Plots to Understand Predictor Effects
This example shows how to understand the effect each predictor has on a regression model using a
variety of available plots.
load carsmall
ds = dataset(Weight,MPG,Cylinders);
ds.Cylinders = ordinal(ds.Cylinders);
mdl = LinearModel.fit(ds,'MPG ~ Cylinders*Weight + Weight^2');
Examine a slice plot of the responses. This displays the effect of each predictor separately.
plotSlice(mdl)
You can drag the individual predictor values, which are represented by dashed blue vertical lines. You can also choose between simultaneous and non-simultaneous confidence bounds. This plot shows that changing Weight from about 2500 to 4732 lowers MPG by about 30 (the location of the upper blue circle). It also shows that changing the number of cylinders from 8 to 4 raises MPG by about 10 (the lower blue circle). The horizontal blue lines represent confidence intervals for these predictions. The predictions come from averaging over one predictor as the other is changed. In cases such as this, where the two predictors are correlated, be careful when interpreting the results.
Instead of viewing the effect of averaging over a predictor as the other is changed, examine the joint
interaction in an interaction plot.
plotInteraction(mdl,'Weight','Cylinders')
The interaction plot shows the effect of changing one predictor with the other held fixed. In this case, the
plot is much more informative. It shows, for example, that lowering the number of cylinders in a
relatively light car (Weight = 1795) leads to an increase in mileage, but lowering the number of
cylinders in a relatively heavy car (Weight = 4732) leads to a decrease in mileage.
For an even more detailed look at the interactions, look at an interaction plot with predictions. This
plot holds one predictor fixed while varying the other, and plots the effect as a curve. Look at the
interactions for various fixed numbers of cylinders.
plotInteraction(mdl,'Cylinders','Weight','predictions')
load carsmall
ds = dataset(Weight,MPG,Cylinders);
ds.Cylinders = ordinal(ds.Cylinders);
mdl = LinearModel.fit(ds,'MPG ~ Cylinders*Weight + Weight^2');
Create an added variable plot with Weight^2 as the added variable.
plotAdded(mdl,'Weight^2')
This plot shows the results of fitting both Weight^2 and MPG to the terms other than Weight^2. The reason to use plotAdded is to understand what additional improvement in the model you get by adding Weight^2. The coefficient of a line fit to these points is the coefficient of Weight^2 in the full model. The Weight^2 predictor is just over the edge of significance (pValue < 0.05), as you can see in the coefficients table display. You can see that in the plot as well. The confidence bounds look like they could not contain a horizontal line (constant y), so a zero-slope model is not consistent with the data.
Create an added variable plot for the model as a whole.
plotAdded(mdl)
The model as a whole is very significant, so the bounds don't come close to containing a horizontal line. The slope of the line is the slope of a fit to the predictors projected onto their best-fitting direction, or in other words, the norm of the coefficient vector.
Change Models
There are two ways to change a model:
step: Add or subtract terms one at a time, where step chooses the most important term to add or remove.
addTerms and removeTerms: Add or remove specified terms. Give the terms in any of the forms described in Choose a Model or Range of Models on page 9-14.
If you created a model using LinearModel.stepwise, step can have an effect only if you give different
upper or lower models. step does not work when you fit a model using RobustOpts.
For example, start with a linear model of mileage from the carbig data:
load carbig
ds = dataset(Acceleration,Displacement,Horsepower,Weight,MPG);
mdl = LinearModel.fit(ds,'linear','ResponseVar','MPG')
mdl =
Linear regression model:
MPG ~ 1 + Acceleration + Displacement + Horsepower + Weight
Estimated Coefficients:
                    Estimate      SE            tStat       pValue
    (Intercept)     45.251        2.456         18.424      7.0721e-55
    Acceleration    -0.023148     0.1256        -0.1843     0.85388
    Displacement    -0.0060009    0.0067093     -0.89441    0.37166
    Horsepower      -0.043608     0.016573      -2.6312     0.008849
    Weight          -0.0052805    0.00081085    -6.5123     2.3025e-10

Number of observations: 392, Error degrees of freedom: 387
Root Mean Squared Error: 4.25
R-squared: 0.707, Adjusted R-Squared 0.704
F-statistic vs. constant model: 233, p-value = 9.63e-102
Try to improve the model using step for up to 10 steps:
mdl1 = step(mdl,'NSteps',10)
1. Adding Displacement:Horsepower, FStat = 87.4802, pValue = 7.05273e-19
mdl1 =

Linear regression model:
    MPG ~ 1 + Acceleration + Weight + Displacement*Horsepower

Estimated Coefficients:
                               Estimate      SE            tStat      pValue
    (Intercept)                61.285        2.8052        21.847     1.8593e-69
    Acceleration               -0.34401      0.11862       -2.9       0.0039445
    Displacement               -0.081198     0.010071      -8.0623    9.5014e-15
    Horsepower                 -0.24313      0.026068      -9.3265    8.6556e-19
    Weight                     -0.0014367    0.00084041    -1.7095    0.088166
    Displacement:Horsepower    0.00054236    5.7987e-05    9.3531     7.0527e-19

Number of observations: 392, Error degrees of freedom: 386
Root Mean Squared Error: 3.84
R-squared: 0.761, Adjusted R-Squared 0.758
F-statistic vs. constant model: 246, p-value = 1.32e-117
mdl2 =

Linear regression model:
    MPG ~ 1 + Displacement*Horsepower

Estimated Coefficients:
                               Estimate      SE           tStat      pValue
    (Intercept)                53.051        1.526        34.765     3.0201e-121
    Displacement               -0.098046     0.0066817    -14.674    4.3203e-39
    Horsepower                 -0.23434      0.019593     -11.96     2.8024e-28
    Displacement:Horsepower    0.00058278    5.193e-05    11.222     1.6816e-25

Number of observations: 392, Error degrees of freedom: 388
Root Mean Squared Error: 3.94
R-squared: 0.747, Adjusted R-Squared 0.745
F-statistic vs. constant model: 381, p-value = 3e-115
mdl2 uses just Displacement and Horsepower, and has nearly as good a fit to the data as mdl1 in the
Adjusted R-Squared metric.
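A minimal sketch of the addTerms and removeTerms route mentioned above (assuming the mdl fit from the carbig dataset array; both term strings are illustrative, given in Wilkinson notation):

mdlAdd = addTerms(mdl,'Displacement:Horsepower'); % add a specified interaction term
mdlSub = removeTerms(mdlAdd,'Acceleration');      % remove a specified main effect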
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
mdl = LinearModel.fit(X,MPG);
2 Create a three-row array of predictors from the minimal, mean, and maximal values. There are some NaN values, so use functions that ignore NaN values.

Xnew = [nanmin(X);nanmean(X);nanmax(X)]; % new data

3 Find the predicted model responses and confidence intervals on the predictions.

[NewMPG NewMPGCI] = predict(mdl,Xnew)
NewMPG =

   34.1345
   23.4078
    4.7751

NewMPGCI =

   31.6115   36.6575
   22.9859   23.8298
    0.6134    8.9367
The confidence bounds on the mean response are narrower than those for the minimum or maximum responses, which is quite sensible.
feval
When you construct a model from a dataset array, feval is often more convenient for predicting mean
responses than predict. However, feval does not provide confidence bounds.
This example shows how to predict mean responses using the feval method.
1 Load the carbig data and make a default linear model of the response MPG to the Acceleration,
Displacement, Horsepower,and Weight predictors.
load carbig
ds = dataset(Acceleration,Displacement,Horsepower,Weight,MPG);
mdl = LinearModel.fit(ds,'linear','ResponseVar','MPG');
2 Create a three-row array of predictors from the minimal, mean, and maximal values. There are some NaN values, so use functions that ignore NaN values.

X = [Acceleration,Displacement,Horsepower,Weight];
Xnew = [nanmin(X);nanmean(X);nanmax(X)]; % new data

3 Find the predicted responses.

NewMPG = feval(mdl,Xnew)

NewMPG =

   37.7959
   24.7615
   -0.7783
NewMPG = random(mdl,Xnew)
NewMPG =
32.2931
24.8628
19.9715
Clearly, the predictions for the third (maximal) row of Xnew are not reliable.
mdl.Formula
ans = MPG ~ 1 + Acceleration + Displacement + Horsepower + Weight
hospital.xls is an Excel spreadsheet containing patient names, sex, age, weight, blood pressure, and dates
of treatment in an experimental protocol. First read the data into a dataset array.
patients = dataset('XLSFile','hospital.xls','ReadObsNames',true);
Examine the first row of data.
patients(1,:)
ans =

               name       sex    age    wgt    smoke
    YPL-320    'SMITH'    'm'    38     176    1

               sys    dia    trial1    trial2    trial3    trial4
    YPL-320    124    93     18        -99       -99       -99
The sex and smoke fields seem to have two choices each. So change these fields to nominal.
patients.smoke = nominal(patients.smoke,{'No','Yes'}); patients.sex = nominal(patients.sex);
Step 2. Create a fitted model.
Your goal is to model the systolic pressure as a function of a patient's age, weight, sex, and smoking status. Create a linear formula for 'sys' as a function of 'age', 'wgt', 'sex', and 'smoke'.

modelspec = 'sys ~ age + wgt + sex + smoke';
mdl = LinearModel.fit(patients,modelspec)
mdl =
Linear regression model:
sys ~ 1 + sex + age + wgt + smoke
Estimated Coefficients:
                   Estimate     SE          tStat       pValue
    (Intercept)    118.28       7.6291      15.504      9.1557e-28
    sex_m          0.88162      2.9473      0.29913     0.76549
    age            0.08602      0.06731     1.278       0.20438
    wgt            -0.016685    0.055714    -0.29947    0.76524
    smoke_Yes      9.884        1.0406      9.498       1.9546e-15

Number of observations: 100, Error degrees of freedom: 95
F-statistic vs. constant model: 24.5, p-value = 5.99e-14
The sex, age, and weight predictors have rather high p-values, indicating that some of these predictors
might be unnecessary.
Step 3. Locate and remove outliers.
See if there are outliers in the data that should be excluded from the fit. Plot the residuals.
plotResiduals(mdl)
Try to obtain a simpler model, one with fewer predictors but the same predictive accuracy. step looks for a better model by adding or removing one term at a time. Allow step to take up to 10 steps.
mdl1 = step(mdl,'NSteps',10)
1. Removing wgt, FStat = 4.6001e-05, pValue = 0.9946
2. Removing sex, FStat = 0.063241, pValue = 0.80199
mdl1 =
Linear regression model:
    sys ~ 1 + age + smoke

Estimated Coefficients:
                   Estimate    SE          tStat     pValue
    (Intercept)    115.11      2.5364      45.383    1.1407e-66
    age            0.10782     0.064844    1.6628    0.09962
    smoke_Yes      10.054      0.97696     10.291    3.5276e-17

Number of observations: 99, Error degrees of freedom: 96
Root Mean Squared Error: 4.61
R-squared: 0.536, Adjusted R-Squared 0.526
F-statistic vs. constant model: 55.4, p-value = 1.02e-16
step took two steps and then stopped, meaning it could not improve the model further by adding or subtracting a single term.
Plot the effectiveness of the simpler model on the training data.
plotResiduals(mdl1)
Suppose you have four new people, aged 25, 30, 40, and 65, and the first and third smoke. Predict their
systolic pressure using mdl1.
ages = [25;30;40;65];
smoker = {'Yes';'No';'Yes';'No'};
systolicnew = feval(mdl1,ages,smoker)
systolicnew =
127.8561
118.3412
129.4734
122.1149
To make predictions, you need only the variables that mdl1 uses.
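feval returns point predictions only; to also get confidence bounds for the same new patients, predict accepts new data as a dataset array whose variable names match the model terms. A minimal sketch (dsnew is a hypothetical dataset array built from the vectors above):

dsnew = dataset(ages,nominal(smoker),'Varnames',{'age','smoke'}); % hypothetical new data
[sysPred,sysCI] = predict(mdl1,dsnew)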
Step 6. Share the model.
You might want others to be able to use your model for prediction. Access the terms in the linear model.
coefnames = mdl1.CoefficientNames
coefnames =
'(Intercept)' 'age' 'smoke_Yes'
View the model formula.
mdl1.Formula
ans =
sys ~ 1 + age + smoke
load imports-85
Store predictor and response variables in dataset array.
ds = dataset(X(:,7),X(:,8),X(:,9),X(:,15),'Varnames',...
{'curb_weight','engine_size','bore','price'});
Fit a linear regression model.
Fit a linear regression model that explains the price of a car in terms of its curb weight, engine size, and
bore.
LinearModel.fit(ds,'price~curb_weight+engine_size+bore')
ans =
Linear regression model:
price ~ 1 + curb_weight + engine_size + bore
Estimated Coefficients:
                   Estimate      SE           tStat      pValue
    (Intercept)    64.095        3.703        17.309     2.0481e-41
    curb_weight    -0.0086681    0.0011025    -7.8623    2.42e-13
    engine_size    -0.015806     0.013255     -1.1925    0.23452
    bore           -2.6998       1.3489       -2.0015    0.046711

Number of observations: 201, Error degrees of freedom: 197
Root Mean Squared Error: 3.95
R-squared: 0.674, Adjusted R-Squared 0.669
F-statistic vs. constant model: 136, p-value = 1.14e-47
The following command also returns the same result because LinearModel.fit, by default, assumes that the response variable is in the last column of the dataset array ds.

LinearModel.fit(ds)
Recreate dataset array and repeat analysis.
This time, put the response variable in the first column of the dataset array.
ds = dataset(X(:,15),X(:,7),X(:,8),X(:,9),'Varnames',...
{'price','curb_weight','engine_size','bore'});
When the response variable is not in the last column of ds, you must define its location. Here, for example, LinearModel.fit would by default assume that bore, the last variable, is the response. You can define the response variable in the model using either:
LinearModel.fit(ds,'ResponseVar','price');
or
LinearModel.fit(ds,'ResponseVar',logical([1 0 0 0]));
Perform stepwise regression.
LinearModel.stepwise(ds,'quadratic','lower','price~1',...
'ResponseVar','price')
1. Removing bore^2, FStat = 0.01282, pValue = 0.90997
2. Removing engine_size^2, FStat = 0.078043, pValue = 0.78027
3. Removing curb_weight:bore, FStat = 0.70558, pValue = 0.40195
ans =
Linear regression model:
price ~ 1 + curb_weight*engine_size + engine_size*bore + curb_weight^2
Estimated Coefficients:
                               Estimate       SE            tStat      pValue
    (Intercept)                131.13         14.273        9.1873     6.23
    curb_weight                -0.043315      0.0085114     -5.0891    8.46
    engine_size                -0.17102       0.13844       -1.2354    0
    bore                       -12.244        4.999         -2.4493    0.
    curb_weight:engine_size    -6.3411e-05    2.6577e-05    -2.386     0.
    engine_size:bore           0.092554       0.037263      2.4838     0.
    curb_weight^2              8.0836e-06     1.9983e-06    4.0451     7.54

Number of observations: 201, Error degrees of freedom: 194
Root Mean Squared Error: 3.59
R-squared: 0.735, Adjusted R-Squared 0.726
F-statistic vs. constant model: 89.5, p-value = 3.58e-53
The initial model is a quadratic formula, and the lowest model considered is the constant. Here,
LinearModel.stepwise performs a backward elimination technique to determine the terms in the model.
The final model is price ~ 1 + curb_weight*engine_size + engine_size*bore + curb_weight^2, which
corresponds to
$$P = \beta_0 + \beta_C C + \beta_E E + \beta_B B + \beta_{CE} CE + \beta_{EB} EB + \beta_{C^2} C^2 + \varepsilon,$$

where P is price, C is curb weight, E is engine size, B is bore, $\beta_i$ is the coefficient for the corresponding term in the model, and $\varepsilon$ is the error term. The final model includes all three main effects, the interaction effects for curb weight and engine size and for engine size and bore, and the second-order term for curb weight.
See Also
LinearModel | LinearModel.fit | LinearModel.stepwise |
Related Examples
Examine Quality and Adjust the Fitted Model on page 9-20
Interpret Linear Regression Results on page 9-63
Concepts
Linear Regression Output and Diagnostic Statistics on page 9-71
load hospital
To retain only the first column of blood pressure, store data in a new dataset array.
ds = dataset(hospital.Sex,hospital.Age,hospital.Weight,hospital.Smoker,...
hospital.BloodPressure(:,1),'Varnames',{'Sex','Age','Weight','Smoker',...
'BloodPressure'});
Perform stepwise linear regression.
For the initial model, use the full model with all terms and their pairwise interactions.
mdl = LinearModel.stepwise(ds,'interactions')
1. Removing Sex:Smoker, FStat = 0.050738, pValue = 0.8223
2. Removing Weight:Smoker, FStat = 0.07758, pValue = 0.78124
3. Removing Age:Weight, FStat = 1.9717, pValue = 0.16367
4. Removing Sex:Age, FStat = 0.32389, pValue = 0.57067
5. Removing Age:Smoker, FStat = 2.4939, pValue = 0.11768
mdl =
Linear regression model:
BloodPressure ~ 1 + Age + Smoker + Sex*Weight
Estimated Coefficients:
                       Estimate    SE          tStat      pValue
    (Intercept)        133.17      10.337      12.883     1.76e-22
    Sex_Male           -35.269     17.524      -2.0126    0.047015
    Age                0.11584     0.067664    1.712      0.090198
    Weight             -0.1393     0.080211    -1.7367    0.085722
    Smoker_1           9.8307      1.0229      9.6102     1.2391e-15
    Sex_Male:Weight    0.2341      0.11192     2.0917     0.039162

Number of observations: 100, Error degrees of freedom: 94
Root Mean Squared Error: 4.72
R-squared: 0.53, Adjusted R-Squared 0.505
F-statistic vs. constant model: 21.2, p-value = 4e-14
The final model in formula form is BloodPressure ~ 1 + Age + Smoker + Sex*Weight. This model
includes all four main effects (Age, Smoker, Sex, Weight) and the two-way interaction between Sex and
Weight. This model corresponds to
$$BP = \beta_0 + \beta_A X_A + \beta_{Sm} I_{Sm} + \beta_S I_S + \beta_W X_W + \beta_{SW} X_W I_S + \varepsilon,$$

where $X_A$ is Age, $X_W$ is Weight, $I_{Sm}$ is the indicator variable for smoking (1 for a smoker), and $I_S$ is the indicator variable for sex (1 for a male patient). For a nonsmoking male patient ($I_{Sm} = 0$, $I_S = 1$), the model reduces to

$$BP = (\beta_0 + \beta_S) + \beta_A X_A + (\beta_W + \beta_{SW}) X_W = 97.901 + 0.11584\,X_A + 0.0948\,X_W,$$

and for a nonsmoking female patient ($I_{Sm} = 0$, $I_S = 0$) it reduces to

$$BP = \beta_0 + \beta_A X_A + \beta_W X_W = 133.17 + 0.11584\,X_A - 0.1393\,X_W.$$
As seen from these models, $\beta_{Sm}$ and $\beta_S$ show how much the intercept of the response function changes when the corresponding indicator variable takes the value 1 instead of 0. $\beta_{SW}$, however, shows the effect of the Weight variable on the response variable when the indicator variable for sex takes the value 1 instead of 0. You can explore the main and interaction effects in the final model using the methods of the LinearModel class, as follows.
Plot prediction slice plots.
figure()
plotSlice(mdl)
This plot shows the main effects for all predictor variables. The green line in each panel shows the
change in the response variable as a function of the predictor variable when all other predictor variables
are held constant. For example, for a smoking male patient aged 37.5, the expected blood pressure
increases as the weight of the patient increases, given all else the same.
The dashed red curves in each panel show the 95% confidence bounds for the predicted response values.
The horizontal dashed blue line in each panel shows the predicted response for the specific value of the
predictor variable corresponding to the vertical dashed blue line. You can drag these lines to get the
predicted response values at other predictor values, as shown next.
For example, the predicted value of the response variable is 118.3497 when a patient is female,
nonsmoking, age 40.3788, and weighs 139.9545 pounds. The values in the square brackets, [114.621,
122.079], show the lower and upper limits of a 95% confidence interval for the estimated response. Note
that, for a nonsmoking female patient, the expected blood pressure decreases as the weight increases,
given all else is held constant.
Plot main effects.
figure()
plotEffects(mdl)
This plot displays the main effects. The circles show the magnitude of the effect and the blue lines show
the upper and lower confidence limits for the main effect. For example, being a smoker increases the
expected blood pressure by 10 units, compared to being a nonsmoker, given all else is held constant.
Expected blood pressure increases about two units for males
compared to females, again, given other predictors held constant. An increase in age from 25 to 50
causes an expected increase of 4 units, whereas a change in weight from 111 to 202 causes about a
4-unit decrease in the expected blood pressure, given all else held constant.
Plot interaction effects.
figure()
plotInteraction(mdl,'Sex','Weight')
This plot displays the impact of a change in one factor given the other factor is fixed at a value.
Note Be cautious while interpreting the interaction effects. When there is not enough data on all factor
combinations or the data is highly correlated, it might be difficult to determine the interaction effect of
changing one factor while keeping the other fixed. In such cases, the estimated interaction effect is an
extrapolation from the data.
The blue circles show the main effect of a specific term, as in the main effects plot. The red circles show
the impact of a change in one term for fixed values of the other term. For example, in the bottom half of
this plot, the red circles show the impact of a weight change in female and male patients, separately. You
can see that an increase in a female's weight from 111 to 202 pounds causes about a 14-unit decrease in
the expected blood pressure, while an increase of the same amount in the weight of a male patient causes
about a 5-unit increase in the expected blood pressure, again given other predictors are held constant.
Plot prediction effects.
figure()
plotInteraction(mdl,'Sex','Weight','predictions')
This plot shows the effect of changing one variable as the other predictor variable is held constant. In
this example, the last figure shows the response variable, blood pressure, as a function of weight, when
the variable sex is fixed at males and females. The lines for males and females cross, which indicates a strong interaction between weight and sex. You can see that the expected blood pressure
increases as the weight of a male patient increases, but decreases as the weight of a female patient
increases.
load carsmall
X = [Weight,Horsepower,Acceleration];
Fit a linear regression model.
lm = LinearModel.fit(X,MPG,'linear')
lm =
Linear regression model:
    y ~ 1 + x1 + x2 + x3
Estimated Coefficients:
                   Estimate      SE           tStat        pValue
    (Intercept)    47.977        3.8785       12.37        4.8957e-21
    x1             -0.0065416    0.0011274    -5.8023      9.8742e-08
    x2             -0.042943     0.024313     -1.7663      0.08078
    x3             -0.011583     0.19333      -0.059913    0.95236

Number of observations: 93, Error degrees of freedom: 89
Root Mean Squared Error: 4.09
R-squared: 0.752, Adjusted R-Squared 0.744
F-statistic vs. constant model: 90, p-value = 7.38e-27
This linear regression output display shows the following.

y ~ 1 + x1 + x2 + x3

Linear regression model in formula form, using Wilkinson notation. Here it corresponds to

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon.$$
First column (under Estimated Coefficients): Terms included in the model.

Estimate: Coefficient estimates for each corresponding term in the model. For example, the estimate for the constant term (intercept) is 47.977.

SE: Standard error of the coefficients.

tStat: t-statistic for each coefficient to test the null hypothesis that the corresponding coefficient is zero against the alternative that it is different from zero, given the other predictors in the model. Note that tStat = Estimate/SE. For example, the t-statistic for the intercept is 47.977/3.8785 = 12.37.

pValue: p-value for the t-statistic of the hypothesis test that the corresponding coefficient is equal to zero or not. For example, the p-value of the t-statistic for x2 is greater than 0.05, so this term is not significant at the 5% significance level given the other terms in the model.

Number of observations: Number of rows without any NaN values. For example, Number of observations is 93 because the MPG data vector has 6 NaN values and one of the data vectors, Horsepower, has one NaN value for a different observation.

Error degrees of freedom: n - p, where n is the number of observations and p is the number of coefficients in the model, including the intercept. For example, the model has four coefficients, so the error degrees of freedom is 93 - 4 = 89.

Root Mean Squared Error: Square root of the mean squared error, which estimates the standard deviation of the error distribution.
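As a quick cross-check of the tStat relationship, a minimal sketch (assuming the fitted model lm from this example; Coefficients is a dataset array of the estimates):

c = lm.Coefficients;
[c.Estimate ./ c.SE, c.tStat] % the two columns agree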
R-squared and Adjusted R-squared: Coefficient of determination and adjusted coefficient of determination, respectively. For example, the R-squared value suggests that the model explains approximately 75% of the variability in the response variable MPG.

F-statistic vs. constant model: Test statistic for the F-test on the regression model. It tests for a significant linear regression relationship between the response variable and the predictor variables.

p-value: p-value for the F-test on the model. For example, the model is significant with a p-value of 7.3816e-27.
You can request this display by using disp. For example, if you name your model lm, then you can
display the outputs using disp(lm).
Perform analysis of variance (ANOVA) for the model.
anova(lm,'summary')
ans =
                SumSq     DF    MeanSq    F         pValue
    Total       6004.8    92    65.269
    Model       4516      3     1505.3    89.987    7.3816e-27
    Residual    1488.8    89    16.728
This ANOVA display shows the following.

SumSq: Sum of squares for the regression model, Model, the error term, Residual, and the total, Total.

DF: Degrees of freedom for each term. Degrees of freedom is n - 1 for the total, p - 1 for the model, and n - p for the error term, where n is the number of observations and p is the number of coefficients in the model, including the intercept. For example, the MPG data vector has six NaN values and one of the data vectors, Horsepower, has one NaN value for a different observation, so the total degrees of freedom is 93 - 1 = 92. There are four coefficients in the model, so the model DF is 4 - 1 = 3, and the DF for the error term is 93 - 4 = 89.
MeanSq: Mean squared error for each term. Note that MeanSq = SumSq/DF. For example, the mean squared error for the error term is 1488.8/89 = 16.728. The square root of this value is the root mean squared error in the linear regression display, or 4.09.

F: F-statistic value, which is the same as F-statistic vs. constant model in the linear regression display. In this example, it is 89.987, and in the linear regression display this F-statistic value is rounded to 90.

pValue: p-value for the F-test on the model. In this example, it is 7.3816e-27.
Note If there are higher-order terms in the regression model, anova partitions the model SumSq into the
part explained by the higher-order terms and the rest. The corresponding F-statistics are for testing the
significance of the linear terms and higher-order terms as separate groups.
If the data includes replicates, or multiple measurements at the same predictor values, then anova
partitions the error SumSq into the part for the replicates and the rest. The corresponding F-statistic is
for testing the lack-of-fit by comparing the model residuals with the model-free variance estimate
computed on the replicates.
See the anova method for details.
Decompose ANOVA table for model terms.
anova(lm)
ans =
             SumSq       DF    MeanSq      F            pValue
    x1       563.18      1     563.18      33.667       9.8742e-08
    x2       52.187      1     52.187      3.1197       0.08078
    x3       0.060046    1     0.060046    0.0035895    0.95236
    Error    1488.8      89    16.728
This anova display shows the following.

First column: Terms included in the model.

SumSq: Sum of squared error for each term except for the constant.

DF: Degrees of freedom. In this example, DF is 1 for each term in the model and n - p for the error term, where n is the number of observations and p is the number of coefficients in the model, including the intercept. For example, the DF for the error term in this model is 93 - 4 = 89. If any of the variables in the model is a categorical variable, the DF for that variable is the number of indicator variables created for its categories (number of categories - 1).

MeanSq: Mean squared error for each term. Note that MeanSq = SumSq/DF. For example, the mean squared error for the error term is 1488.8/89 = 16.728.

F: F-values for each coefficient. The F-value is the ratio of the mean square of each term to the mean squared error, that is, F = MeanSq(xi)/MeanSq(Error). Each F-statistic has an F distribution, with numerator degrees of freedom equal to the DF value for the corresponding term, and denominator degrees of freedom n - p, where n is the number of observations and p is the number of coefficients in the model. In this example, each F-statistic has an F(1, 89) distribution.

pValue: p-value for each hypothesis test on the coefficient of the corresponding term in the linear model. For example, the p-value for the F-statistic for x2 is 0.08078, which is not significant at the 5% significance level given the other terms in the model.
Display coefficient confidence intervals.
coefCI(lm)
ans =
40.2702 55.6833
-0.0088 -0.0043
-0.0913 0.0054
-0.3957 0.3726
The values in each row are the lower and upper confidence limits, respectively, for the default 95% confidence intervals for the coefficients. For example, the first row shows the lower and upper limits, 40.2702 and 55.6833, for the intercept, $\beta_0$. Likewise, the second row shows the limits for $\beta_1$, and so on. Confidence intervals provide a measure of precision for linear regression coefficient estimates. A 100(1 - $\alpha$)% confidence interval gives the range within which the corresponding regression coefficient lies with 100(1 - $\alpha$)% confidence.
You can also change the confidence level. Find the 99% confidence intervals for the coefficients.
coefCI(lm,0.01)
ans = 37.7677 58.1858
-0.0095 -0.0036
-0.1069 0.0211
-0.5205 0.4973
Perform hypothesis test on coefficients.
Test the null hypothesis that all predictor variable coefficients are equal to zero versus the alternate
hypothesis that at least one of them is different from zero.
[p,F,d] = coefTest(lm)
p =

   7.3816e-27

F =

   89.9874

d =

     3
Here, coefTest performs an F-test for the hypothesis that all regression coefficients (except for the intercept) are zero versus the alternative that at least one differs from zero, which is essentially the overall hypothesis test on the model. It returns p, the p-value, F, the F-statistic, and d, the numerator degrees of freedom. The F-statistic and p-value are the same as the ones in the linear regression display and ANOVA for the model. The degrees of freedom is 4 - 1 = 3 because there are four coefficients (including the intercept) in the model, and the intercept is excluded from the test.
Now, perform a hypothesis test on the coefficients of the first and second predictor variables.
H = [0 1 0 0; 0 0 1 0];
[p,F,d] = coefTest(lm,H)
p =

   5.1702e-23

F =

   96.4873

d =

     2
The numerator degrees of freedom is the number of coefficients tested, which is 2 in this example. The results indicate that at least one of the coefficients of x1 and x2 differs from zero.
See Also
LinearModel | LinearModel.fit | LinearModel.stepwise | anova |
Related Examples
Examine Quality and Adjust the Fitted Model on page 9-20
Concepts
Linear Regression Output and Diagnostic Statistics on page 9-71
Residuals: raw, Pearson, studentized, standardized (fields r, studres, standres)
t-statistics: tstat
Delete-1 variance: s2_i
Durbin-Watson statistic: dwstat
F-statistic: fstat
Hat matrix: hatmat
Leverage: leverage
Cook's Distance

Purpose

Cook's distance is useful for identifying outliers in the X values (observations for predictor variables). It also shows the influence of each observation on the fitted response values. An observation with Cook's distance larger than three times the mean Cook's distance might be an outlier.
Definition
Cook's distance is the scaled change in fitted values. Each element in CooksDistance is the normalized change in the vector of coefficients due to the deletion of an observation. The Cook's distance, $D_i$, of observation i is

$$D_i = \frac{\sum_{j=1}^{n}\big(\hat{y}_j - \hat{y}_{j(i)}\big)^2}{p\,\mathrm{MSE}},$$

where

$\hat{y}_j$ is the jth fitted response value.

$\hat{y}_{j(i)}$ is the jth fitted response value, where the fit does not include observation i.

MSE is the mean squared error.

p is the number of coefficients in the regression model.
Cook's distance is algebraically equivalent to the following expression:

$$D_i = \frac{r_i^2}{p\,\mathrm{MSE}}\left[\frac{h_{ii}}{(1-h_{ii})^2}\right],$$

where $r_i$ is the ith residual, and $h_{ii}$ is the ith leverage value. CooksDistance is an n-by-1 column vector in the Diagnostics dataset array of the LinearModel object.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

Display the Cook's distance values by indexing into the property using dot notation,

mdl.Diagnostics.CooksDistance

Plot the Cook's distance values using
plotDiagnostics(mdl,'cookd')
For details, see the plotDiagnostics method of the LinearModel class.
Example
Load the sample data and define the independent and response variables.
load hospital
X = double(hospital(:,2:5)); y = hospital.BloodPressure(:,1);
Fit the linear regression model.
mdl = LinearModel.fit(X,y);
Plot the Cooks distance values.
plotDiagnostics(mdl,'cookd')
The dashed line in the figure corresponds to the recommended threshold value, 3*mean(mdl.Diagnostics.CooksDistance). The plot has some observations with Cook's distance values greater than the threshold value, which for this example is 3*(0.0108) = 0.0324. In particular, there are two Cook's distance values that are relatively higher than the others and exceed the threshold value. You might want to find and omit these from your data and rebuild your model.
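One way to do that, as a minimal sketch (assuming the mdl, X, and y above; the 'Exclude' argument of LinearModel.fit omits flagged observations):

infl = mdl.Diagnostics.CooksDistance > 3*mean(mdl.Diagnostics.CooksDistance);
mdlExcl = LinearModel.fit(X,y,'Exclude',infl); % refit without the flagged points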
Find the observations with Cook's distance values that exceed the threshold value.
find((mdl.Diagnostics.CooksDistance)>3*mean(mdl.Diagnostics.CooksDistance))
ans =
2
13
28
44
58
70
71
84
93
95
Find the observations with Cook's distance values that are relatively larger than the other observations with Cook's distances exceeding the threshold value.
find((mdl.Diagnostics.CooksDistance)>5*mean(mdl.Diagnostics.CooksDistance))
ans =
2
84

Return to Summary of Measures.
$$b_i \pm t_{\left(1-\alpha/2,\;n-p\right)}\,SE(b_i),$$

where $b_i$ is the coefficient estimate, $SE(b_i)$ is the standard error of the coefficient estimate, and $t_{(1-\alpha/2,\,n-p)}$ is the $100(1-\alpha/2)$th percentile of the t-distribution with n - p degrees of freedom. n is the number of observations and p is the number of regression coefficients.
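The same intervals can be reproduced directly from this definition; a minimal sketch (assuming a fitted model mdl, whose DFE property stores n - p):

alpha = 0.05;
est = mdl.Coefficients.Estimate;
se = mdl.Coefficients.SE;
tcrit = tinv(1 - alpha/2, mdl.DFE); % t percentile from the definition
[est - tcrit*se, est + tcrit*se]    % matches coefCI(mdl)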
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can obtain the default 95% confidence intervals for coefficients using
coefCI(mdl)
You can also change the confidence level using
coefCI(mdl,alpha)
For details, see the coefCI and coefTest methods of the LinearModel class.
Example
Load the sample data and fit a linear regression model.
load hald
mdl = LinearModel.fit(ingredients,heat);
Display the 95% coefficient confidence intervals.
coefCI(mdl)
ans =
-99.1786 223.9893
-0.1663 3.2685
-1.1589 2.1792
-1.6385 1.8423
-1.7791 1.4910
The values in each row are the lower and upper confidence limits, respectively, for the default 95% confidence intervals for the coefficients. For example, the first row shows the lower and upper limits, -99.1786 and 223.9893, for the intercept, $\beta_0$. Likewise, the second row shows the limits for $\beta_1$, and so on. Display the 90% confidence intervals for the coefficients ($\alpha$ = 0.1).
coefCI(mdl,0.1)
ans =
-67.8949 192.7057
0.1662 2.9360
-0.8358 1.8561
-1.3015 1.5053
-1.4626 1.1745
The confidence interval limits become narrower as the confidence level decreases.
Return to Summary of Measures.
Coefficient Covariance
Purpose
Estimated coefficient variances and covariances capture the precision of regression coefficient estimates.
The coefficient variances and their square root, the standard errors, are useful in testing hypotheses for
coefficients.
Definition
The estimated covariance matrix is

$$\mathrm{MSE}\,(X'X)^{-1},$$

where MSE is the mean squared error, and X is the matrix of observations on the predictor variables. CoefficientCovariance, a property of the fitted model, is a p-by-p covariance matrix of regression coefficient estimates. p is the number of coefficients in the regression model. The diagonal elements are the variances of the individual coefficients.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can display the coefficient covariances using
mdl.CoefficientCovariance
Example
Load the sample data and define the predictor and response variables.
load hospital
y = hospital.BloodPressure(:,1); X = double(hospital(:,2:5));
Fit a linear regression model.
mdl = LinearModel.fit(X,y);
Display the coefficient covariance matrix.
mdl.CoefficientCovariance
ans =
27.5113 11.0027 -0.1542 -0.2444 0.2702
11.0027 8.6864 0.0021 -0.1547 -0.0838
-0.1542 0.0021 0.0045 -0.0001 -0.0029
-0.2444 -0.1547 -0.0001 0.0031 -0.0026
0.2702 -0.0838 -0.0029 -0.0026 1.0829
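As a cross-check of the definition, a minimal sketch (assuming the mdl and X from this example; a column of ones represents the intercept in the design matrix):

Xd = [ones(size(X,1),1) X]; % design matrix with intercept column
mdl.MSE * inv(Xd'*Xd)       % matches mdl.CoefficientCovariance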
Return to Summary of Measures.
SSE is the sum of squared error, SSR is the sum of squared regression, SST is the sum of squared total, n
is the number of observations, and p is the number of regression coefficients (including the intercept).
Because R-squared increases with added predictor variables in the regression model, the adjusted R-squared adjusts for the number of predictor variables in the model. This makes it more useful for comparing models with a different number of predictors.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can obtain either R-squared value as a scalar by indexing into the property using dot notation, for example,

mdl.Rsquared.Ordinary
mdl.Rsquared.Adjusted
You can also obtain the SSE, SSR, and SST using the properties with the same name.
mdl.SSE
mdl.SSR
mdl.SST
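Both statistics can be reproduced from these sums of squares; a minimal sketch (assuming a fitted model mdl; NumObservations is n and DFE is n - p):

R2 = 1 - mdl.SSE/mdl.SST; % ordinary R-squared
n = mdl.NumObservations;
R2adj = 1 - (mdl.SSE/mdl.DFE)/(mdl.SST/(n-1)) % adjusted R-squared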
Example
Load the sample data and define the response and independent variables.
load hospital
y = hospital.BloodPressure(:,1); X = double(hospital(:,2:5));
Fit a linear regression model.
mdl = LinearModel.fit(X,y)
mdl =
Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4
Estimated Coefficients:
                   Estimate     SE          tStat       pValue
    (Intercept)    117.4        5.2451      22.383      1.1667e-39
    x1             0.88162      2.9473      0.29913     0.76549
    x2             0.08602      0.06731     1.278       0.20438
    x3             -0.016685    0.055714    -0.29947    0.76524
    x4             9.884        1.0406      9.498       1.9546e-15

Number of observations: 100, Error degrees of freedom: 95
Root Mean Squared Error: 4.81
R-squared: 0.508, Adjusted R-Squared 0.487
F-statistic vs. constant model: 24.5, p-value = 5.99e-14
The R-squared and adjusted R-squared values are 0.508 and 0.487, respectively. The model explains about 50% of the variability in the response variable.
Access the R-squared and adjusted R-squared values using the property of the fitted LinearModel object.
mdl.Rsquared.Ordinary
ans =
0.5078
mdl.Rsquared.Adjusted
ans =
0.4871
The adjusted R-squared value is smaller than the ordinary R-squared value.

Return to Summary of Measures.
$$\mathrm{CovRatio}_i = \frac{\det\!\big(\mathrm{MSE}_{(i)}\,(X_{(i)}'X_{(i)})^{-1}\big)}{\det\!\big(\mathrm{MSE}\,(X'X)^{-1}\big)},$$

where $\mathrm{MSE}_{(i)}$ and $X_{(i)}$ are the mean squared error and the predictor matrix with observation i deleted. CovRatio is an n-by-1 vector in the Diagnostics dataset array of the fitted LinearModel object. Each element is the ratio of the generalized variance of the estimated coefficients when the corresponding element is deleted to the generalized variance of the coefficients using all the data.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:
Display the CovRatio by indexing into the property using dot notation
mdl.Diagnostics.CovRatio
Plot the delete-1 change in covariance using
plotDiagnostics(mdl,'CovRatio')
For details, see the plotDiagnostics method of the LinearModel class.
Example
Load the sample data and define the response and predictor variables.
load hospital
y = hospital.BloodPressure(:,1); X = double(hospital(:,2:5));
Fit a linear regression model and plot the delete-1 change in covariance.

mdl = LinearModel.fit(X,y);
plotDiagnostics(mdl,'CovRatio')

For this example, the threshold limits are 1 + 3*5/100 = 1.15 and 1 - 3*5/100 = 0.85. There are a few points beyond the limits, which might be influential points.
Find the observations that are beyond the limits.
find((mdl.Diagnostics.CovRatio)>1.15|(mdl.Diagnostics.CovRatio)<0.85)
ans =
2
14
84
93
96
Return to Summary of Measures.
coefficient. The absolute value of a Dfbetas indicates the magnitude of the difference relative to the
estimated standard deviation of the regression coefficient. A Dfbetas value larger than 3/sqrt(n) in
absolute value indicates that the observation has a large influence on the corresponding coefficient.
Definition
Dfbetas for coefficient j and observation i is the ratio of the difference in the estimate of coefficient j using all observations and the one obtained by removing observation i, to the standard error of the coefficient estimate obtained by removing observation i:

$$\mathrm{Dfbetas}_{ij} = \frac{b_j - b_{j(i)}}{\mathrm{SE}\big(b_{j(i)}\big)},$$

where $b_j$ is the estimate for coefficient j, $b_{j(i)}$ is the estimate for coefficient j by removing observation i, and $\mathrm{SE}(b_{j(i)})$ is computed from $\mathrm{MSE}_{(i)}$, the mean squared error of the regression fit by removing observation i, and $h_{ii}$, the leverage value for observation i. Dfbetas is an n-by-p matrix in the Diagnostics dataset array of the fitted LinearModel object. Each cell of Dfbetas corresponds to the Dfbetas value for the corresponding coefficient obtained by removing the corresponding observation.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can obtain the Dfbetas values as an n-by-p matrix by indexing into the property using dot notation,
mdl.Diagnostics.Dfbetas
Example
Load the sample data and define the response and independent variables.
load hospital
y = hospital.BloodPressure(:,1); X = double(hospital(:,2:5));
Fit a linear regression model.
mdl = LinearModel.fit(X,y);
Find the Dfbetas values that are high in absolute value.
[row,col] = find(abs(mdl.Diagnostics.Dfbetas)>3/sqrt(100));
disp([row col])

     2     1
    28     1
    84     1
    93     1
     2     2
    13     3
    84     3
     2     4
    84     4
Return to Summary of Measures.
$$\mathrm{Dffits}_i = sr_i\sqrt{\frac{h_{ii}}{1-h_{ii}}},$$

where $sr_i$ is the studentized residual, and $h_{ii}$ is the leverage value of the fitted LinearModel object. Dffits is an n-by-1 column vector in the Diagnostics dataset array of the fitted LinearModel object. Each element in Dffits is the change in the fitted value caused by deleting the corresponding observation and scaling by the standard error.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

Display the Dffits values by indexing into the property using dot notation
mdl.Diagnostics.Dffits
Plot the delete-1 scaled change in fitted values using
plotDiagnostics(mdl,'Dffits')
For details, see the plotDiagnostics method of the LinearModel class.
Example
Load the sample data and define the response and independent variables.
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
Fit a linear regression model.
mdl = LinearModel.fit(X,y);
Plot the Dffits values.
plotDiagnostics(mdl,'Dffits')
The influential threshold limit for the absolute value of Dffits in this example is 2*sqrt(5/100) = 0.45.
Again, there are some observations with Dffits values beyond the recommended limits.
Find the Dffits values that are large in absolute value.
find(abs(mdl.Diagnostics.Dffits)>2*sqrt(5/100))
ans =
2
13
28
44
58
70
71
84
93
95
Return to Summary of Measures.
$$S^2_{(i)} = \mathrm{MSE}_{(i)} = \frac{\sum_{j\neq i}\big(y_j - \hat{y}_{j(i)}\big)^2}{n-p-1},$$

where $y_j$ is the jth observed response value. S2_i is an n-by-1 vector in the Diagnostics dataset array of the fitted LinearModel object. Each element in S2_i is the mean squared error of the regression obtained by deleting that observation.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:
Display the S2_i vector by indexing into the property using dot notation
mdl.Diagnostics.S2_i
Plot the delete-1 variance values using
plotDiagnostics(mdl,'S2_i')

For details, see the plotDiagnostics method of the LinearModel class.
Example
Load the sample data and define the response and independent variables.
load hospital
y = hospital.BloodPressure(:,1); X = double(hospital(:,2:5));
Fit a linear regression model.
mdl = LinearModel.fit(X,y);
Display the MSE value for the model.
mdl.MSE
ans =
23.1140
Plot the S2_i values.
plotDiagnostics(mdl,'S2_i')
This plot makes it easy to compare the S2_i values to the MSE value of 23.114, indicated by the
horizontal dashed lines. You can see how deleting one observation changes the error variance.
Return to Summary of Measures.
Durbin-Watson Test
Purpose
The Durbin-Watson test assesses whether or not the residuals are autocorrelated.
Definition
The Durbin-Watson test statistic, DW, is

$$DW = \frac{\sum_{i=1}^{n-1}\big(r_{i+1}-r_i\big)^2}{\sum_{i=1}^{n} r_i^2},$$

where $r_i$ is the ith raw residual.

How To

After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can perform the test using

dwtest(mdl)

For details, see the dwtest method of the LinearModel class.
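The statistic can also be computed directly from the raw residuals; a minimal sketch (assuming any fitted model mdl):

r = mdl.Residuals.Raw;
DW = sum(diff(r).^2) / sum(r.^2) % same value dwtest reports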
Example
Load the sample data and fit a linear regression model.
load hald
mdl = LinearModel.fit(ingredients,heat);
Perform a two-sided Durbin-Watson test to determine if there is any autocorrelation among the residuals
of the linear model, mdl.
[p,DW] = dwtest(mdl,'exact','both')
p=
0.6285
DW =
2.0526
The value of the Durbin-Watson test statistic is 2.0526. The p-value of 0.6285 suggests that the residuals are not autocorrelated.
F-statistic
Purpose
In linear regression, the F-statistic is the test statistic for the analysis of variance (ANOVA) approach to
test the significance of the model or the components in the model.
Definition
The F-statistic in the linear model output display is the test statistic for testing the statistical significance
of the model. The F-statistic values in the anova display are for assessing the significance of the terms or
components in the model.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:
Find the F-statistic vs. constant model in the output display or by using
disp(mdl)
Display the ANOVA for the model using
anova(mdl,'summary')
Obtain the F-statistic values for the components, except for the constant term using
anova(mdl)
For details, see the anova method of the LinearModel class.
Example
Load the sample data.
load carbig
ds = dataset(Acceleration,Cylinders,Weight,MPG);
ds.Cylinders = ordinal(Cylinders);
Fit a linear regression model.
mdl = LinearModel.fit(ds,'MPG~Acceleration*Weight+Cylinders+Weight^2')
mdl =
This display decomposes the ANOVA table into the model terms. The corresponding F-statistics in the F column are for assessing the statistical significance of each term. The F-test for Cylinders tests whether at least one of the coefficients of the indicator variables for the cylinder categories is different from zero. That is, it tests whether different numbers of cylinders have a significant effect on MPG. The degrees of freedom for each model term is the numerator degrees of freedom for the corresponding F-test. Most of the terms have 1 degree of freedom, but the degrees of freedom for Cylinders is 4, because there are four indicator variables for this term.
Return to Summary of Measures.
Hat Matrix
Purpose
The hat matrix provides a measure of leverage. It is useful for investigating whether one or more
observations are outlying with regard to their X values, and therefore might be excessively influencing
the regression results.
Definition
The hat matrix is also known as the projection matrix because it projects the vector of observations, y, onto the vector of predictions, $\hat{y}$, thus putting the "hat" on y. The hat matrix H is defined in terms of the data matrix X:

$$H = X(X^TX)^{-1}X^T,$$

and determines the fitted or predicted values, since

$$\hat{y} = Hy = Xb.$$

The diagonal elements of H, $h_{ii}$, are called leverages and satisfy

$$0 \le h_{ii} \le 1, \qquad \sum_{i=1}^{n} h_{ii} = p,$$

where p is the number of coefficients, and n is the number of observations (rows of X) in the regression model. HatMatrix is an n-by-n matrix in the Diagnostics dataset array.
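As a cross-check of these properties, a minimal sketch (assuming the hospital data used in the surrounding examples; the column of ones represents the intercept):

load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = LinearModel.fit(X,y);
Xd = [ones(size(X,1),1) X];  % design matrix with intercept
H = Xd*((Xd'*Xd)\Xd');       % hat matrix
max(abs(diag(H) - mdl.Diagnostics.Leverage)) % effectively zero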
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

Display the HatMatrix by indexing into the property using dot notation

mdl.Diagnostics.HatMatrix

When n is large, HatMatrix might be computationally expensive. In those cases, you can obtain the diagonal values directly, using

mdl.Diagnostics.Leverage

Return to Summary of Measures.
Leverage
Purpose
Leverage is a measure of the effect of a particular observation on the regression predictions due to the
position of that observation in the space of the inputs. In general, the farther a point is from the center of
the input space, the more leverage it has. Because the sum of the leverage values is p, an observation i
can be considered as an outlier if its leverage substantially exceeds the mean leverage value, p/n, for
example, a value larger than 2*p/n.
Definition
The leverage of observation i is the value of the ith diagonal term, $h_{ii}$, of the hat matrix, H, where

$$H = X(X^TX)^{-1}X^T.$$

The diagonal terms satisfy

$$0 \le h_{ii} \le 1, \qquad \sum_{i=1}^{n} h_{ii} = p,$$

where p is the number of coefficients in the regression model, and n is the number of observations. The minimum value of $h_{ii}$ is 1/n for a model with a constant term. If the fitted model goes through the origin, then the minimum leverage value is 0 for an observation at x = 0.

It is possible to express the fitted values, $\hat{y}$, in terms of the observed values, y, since

$$\hat{y} = Hy = Xb.$$

Hence, $h_{ii}$ expresses how much the observation $y_i$ affects the fitted value $\hat{y}_i$. A large value of $h_{ii}$ indicates that the ith case is distant from the center of all X values for all n cases and has more leverage. Leverage is an n-by-1 column vector in the Diagnostics dataset array.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

Display the Leverage vector by indexing into the property using dot notation
mdl.Diagnostics.Leverage
Plot the leverage for the values fitted by your model using
plotDiagnostics(mdl)
See the plotDiagnostics method of the LinearModel class for details.
Example
Load the sample data and define the response and independent variables.
load hospital
y = hospital.BloodPressure(:,1); X = double(hospital(:,2:5));
Fit a linear regression model.
mdl = LinearModel.fit(X,y);
Plot the leverage values.
plotDiagnostics(mdl)
Residuals
Purpose
Residuals are useful for detecting outlying y values and checking the linear regression assumptions with
respect to the error term in the regression model. High-leverage observations have smaller residuals
because they often shift the regression line or surface closer to them. You can also use residuals to detect
some forms of heteroscedasticity and autocorrelation.
Definition
The Residuals matrix is an n-by-4 dataset array containing a table of four types of residuals, with one
row for each observation.
Raw Residuals. Observed minus fitted values, that is,

$$r_i = y_i - \hat{y}_i.$$

Pearson Residuals. Raw residuals divided by the root mean squared error, that is,

$$pr_i = \frac{r_i}{\sqrt{\mathrm{MSE}}},$$

where $r_i$ is the raw residual and MSE is the mean squared error.

Standardized Residuals. Standardized residuals are raw residuals divided by their estimated standard deviation. The standardized residual for observation i is

$$st_i = \frac{r_i}{\sqrt{\mathrm{MSE}\,(1-h_{ii})}},$$

where MSE is the mean squared error and $h_{ii}$ is the leverage value for observation i.

Studentized Residuals. Studentized residuals are the raw residuals divided by an independent estimate of the residual standard deviation. The residual for observation i is divided by an estimate of the error standard deviation based on all observations except for observation i:

$$sr_i = \frac{r_i}{\sqrt{\mathrm{MSE}_{(i)}\,(1-h_{ii})}},$$

where $\mathrm{MSE}_{(i)}$ is the mean squared error of the regression fit calculated by removing observation i, and $h_{ii}$ is the leverage value for observation i. The studentized residual $sr_i$ has a t-distribution with n - p - 1 degrees of freedom.
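The residual types can be reproduced directly from these definitions; a minimal sketch (assuming any fitted model mdl; S2_i supplies the MSE with each observation deleted):

r = mdl.Residuals.Raw;
h = mdl.Diagnostics.Leverage;
pr = r ./ sqrt(mdl.MSE);                         % Pearson
st = r ./ sqrt(mdl.MSE .* (1 - h));              % standardized
sr = r ./ sqrt(mdl.Diagnostics.S2_i .* (1 - h)); % studentized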
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

Find the Residuals dataset array in the mdl object.
Obtain any of these columns as a vector by indexing into the property using dot notation, for example,
mdl.Residuals.Raw
Plot any of the residuals for the values fitted by your model using
plotResiduals(mdl)
For details, see the plotResiduals method of the LinearModel class.
Example
Load the sample data and store the independent and response variables in a dataset array.
load imports-85
ds = dataset(X(:,7),X(:,8),X(:,9),X(:,15),'Varnames',...
{'curb_weight','engine_size','bore','price'});

Fit a linear regression model.
mdl = LinearModel.fit(ds)
mdl =
Linear regression model:
price ~ 1 + curb_weight + engine_size + bore
Estimated Coefficients:
                   Estimate      SE           tStat      pValue
    (Intercept)    64.095        3.703        17.309     2.0481e-41
    curb_weight    -0.0086681    0.0011025    -7.8623    2.42e-13
    engine_size    -0.015806     0.013255     -1.1925    0.23452
    bore           -2.6998       1.3489       -2.0015    0.046711
Plot the histogram of the raw residuals.

plotResiduals(mdl)

The histogram shows that the residuals are slightly right skewed.

Plot the box plot of all four types of residuals.

Res = double(mdl.Residuals);
boxplot(Res)

Plot the normal probability plot of the residuals.

plotResiduals(mdl,'probability')

This normal probability plot also shows the deviation from normality and the skewness on the right tail of the distribution of residuals.

Plot the residuals versus lagged residuals.

plotResiduals(mdl,'lagged')
This graph shows a trend, which indicates a possible correlation among the residuals. You can further
check this using dwtest(mdl). Serial correlation among residuals usually means that the model can be
improved.
Plot the symmetry plot of residuals.

plotResiduals(mdl,'symmetry')

Plot the residuals versus the fitted values.

plotResiduals(mdl,'fitted')

The increase in the variance as the fitted values increase suggests possible heteroscedasticity.
Return to Summary of Measures.
t-statistic
Purpose
In linear regression, the t-statistic is useful for making inferences about the regression coefficients. The hypothesis test on coefficient i tests the null hypothesis that it is equal to zero, meaning the corresponding term is not significant, versus the alternative hypothesis that the coefficient is different from zero.
Definition
For a hypothesis test on coefficient i, with

$$H_0: \beta_i = 0$$
$$H_1: \beta_i \neq 0,$$

the t-statistic is

$$t = \frac{b_i}{SE(b_i)},$$

where $SE(b_i)$ is the standard error of the estimated coefficient $b_i$.
How To
After obtaining a fitted model, say, mdl, using LinearModel.fit or LinearModel.stepwise, you can:

Find the coefficient estimates, the standard errors of the estimates (SE), and the t-statistic values of hypothesis tests for the corresponding coefficients (tStat) in the output display.

Call for the display using
display(mdl)
Example
Load the sample data and fit the linear regression model.
load hald
mdl = LinearModel.fit(ingredients,heat)
mdl =
Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4
Estimated Coefficients:
                   Estimate    SE         tStat       pValue
    (Intercept)    62.405      70.071     0.8906      0.39913
    x1             1.5511      0.74477    2.0827      0.070822
    x2             0.51017     0.72379    0.70486     0.5009
    x3             0.10191     0.75471    0.13503     0.89592
    x4             -0.14406    0.70905    -0.20317    0.84407
Number of observations: 13, Error degrees of freedom: 8 Root Mean Squared Error: 2.45
R-squared: 0.982, Adjusted R-Squared 0.974
F-statistic vs. constant model: 111, p-value = 4.76e-07
You can see that for each coefficient, tStat = Estimate/SE. The p-values for the hypothesis tests are in the pValue column. Each t-statistic tests for the significance of each term given the other terms in the model. According to these results, none of the coefficients seem significant at the 5% significance level, although the R-squared value for the model is really high at 0.982. This often indicates possible multicollinearity among the predictor variables.
Use stepwise regression to decide which variables to include in the model.
load hald
mdl = LinearModel.stepwise(ingredients,heat)
1. Adding x4, FStat = 22.7985, pValue = 0.000576232
2. Adding x1, FStat = 108.2239, pValue = 1.105281e-06
mdl =
Linear regression model:
    y ~ 1 + x1 + x4
Estimated Coefficients:
                   Estimate    SE          tStat      pValue
    (Intercept)    103.1       2.124       48.54      3.3243e-13
    x1             1.44        0.13842     10.403     1.1053e-06
    x4             -0.61395    0.048645    -12.621    1.8149e-07
Number of observations: 13, Error degrees of freedom: 10 Root Mean Squared Error: 2.73
R-squared: 0.972, Adjusted R-Squared 0.967
F-statistic vs. constant model: 177, p-value = 1.58e-08
In this example, LinearModel.stepwise starts with the constant model (the default) and uses forward selection to incrementally add x4 and x1. Each predictor variable in the final model is significant given that the other one is in the model. The algorithm stops when adding none of the other predictor variables significantly improves the model. For details on stepwise regression, see LinearModel.stepwise.
See Also
LinearModel | LinearModel.fit | LinearModel.stepwise | plotDiagnostics | plotResiduals | anova | coefCI
| coefTest | dwtest |
Related Examples
Examine Quality and Adjust the Fitted Model on page 9-20
Interpret Linear Regression Results on page 9-63
Stepwise Regression
In this section...
Stepwise Regression to Select Appropriate Models on page 9-111
Compare Large and Small Stepwise Models on page 9-111
mdl1 =
Linear regression model:
    MPG ~ 1 + Horsepower*Weight

Estimated Coefficients:
                         Estimate      SE            tStat      pValue
    (Intercept)          63.558        2.3429        27.127     1.2343e-91
    Horsepower           -0.25084      0.027279      -9.1952    2.3226e-18
    Weight               -0.010772     0.00077381    -13.921    5.1372e-36
    Horsepower:Weight    5.3554e-05    6.6491e-06    8.0542     9.9336e-15

Number of observations: 392, Error degrees of freedom: 388
Root Mean Squared Error: 3.93
R-squared: 0.748, Adjusted R-Squared 0.746
F-statistic vs. constant model: 385, p-value = 7.26e-116
Create a mileage model stepwise starting from the full interaction model.
mdl2 = LinearModel.stepwise(ds,'interactions','ResponseVar','MPG')
1. Removing Acceleration:Displacement, FStat = 0.024186, pValue = 0.8765
2. Removing Displacement:Weight, FStat = 0.33103, pValue = 0.56539
3. Removing Acceleration:Horsepower, FStat = 1.7334, pValue = 0.18876
4. Removing Acceleration:Weight, FStat = 0.93269, pValue = 0.33477
5. Removing Horsepower:Weight, FStat = 0.64486, pValue = 0.42245
mdl2 =
Linear regression model:
    MPG ~ 1 + Acceleration + Weight + Displacement*Horsepower

Estimated Coefficients:
                               Estimate      SE            tStat      pValue
    (Intercept)                61.285        2.8052        21.847     1.8593e-69
    Acceleration               -0.34401      0.11862       -2.9       0.0039445
    Displacement               -0.081198     0.010071      -8.0623    9.5014e-15
    Horsepower                 -0.24313      0.026068      -9.3265    8.6556e-19
    Weight                     -0.0014367    0.00084041    -1.7095    0.088166
    Displacement:Horsepower    0.00054236    5.7987e-05    9.3531     7.0527e-19

Number of observations: 392, Error degrees of freedom: 386
Root Mean Squared Error: 3.84
R-squared: 0.761, Adjusted R-Squared 0.758
F-statistic vs. constant model: 246, p-value = 1.32e-117
Notice that:
mdl1 has four coefficients (the Estimate column), and mdl2 has six coefficients.
The adjusted R-squared of mdl1 is 0.746, which is slightly less (worse) than that of mdl2, 0.758.
Create a mileage model stepwise with a full quadratic model as the upper bound, starting from the full
quadratic model:
mdl3 = LinearModel.stepwise(ds,'quadratic',...
'ResponseVar','MPG','Upper','quadratic');
Compare the three model complexities by examining their formulas.
mdl1.Formula
ans =
MPG ~ 1 + Horsepower*Weight
mdl2.Formula

ans =
MPG ~ 1 + Acceleration + Weight + Displacement*Horsepower
The models have similar residuals. It is not clear which fits the data better. Interestingly, the more
complex models have larger maximum deviations of the residuals:
Rrange1 = [min(mdl1.Residuals.Raw),max(mdl1.Residuals.Raw)];
Rrange2 = [min(mdl2.Residuals.Raw),max(mdl2.Residuals.Raw)];
Rrange3 = [min(mdl3.Residuals.Raw),max(mdl3.Residuals.Raw)];
Rranges = [Rrange1;Rrange2;Rrange3]
Rranges =
-10.7725 14.7314
-11.4407 16.7562
-12.2723 16.7927
Load the moore data. The predictor data is in the first five columns, and the response data is in the sixth.

load moore
X = [moore(:,1:5)];
y = moore(:,6);
Step 2. Fit robust and nonrobust models.
Fit two linear models to the data, one using robust fitting, one not.
mdl = LinearModel.fit(X,y); % not robust
mdlr = LinearModel.fit(X,y,'RobustOpts','on');
Step 3. Examine model residuals.
Find the index of the outlier. Examine the weight of the outlier in the robust fit.
[~,outlier] = max(mdlr.Residuals.Raw); mdlr.Robust.Weights(outlier)
ans =
0.0246
This weight is much less than a typical weight of an observation:
median(mdlr.Robust.Weights)
ans =
    0.9718
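To see the downweighting across all observations, a minimal sketch (assuming mdl and mdlr from this example):

plot(mdl.Residuals.Raw, mdlr.Robust.Weights, 'x')
xlabel('Raw residual (ordinary fit)')
ylabel('Robust weight')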
Ridge Regression
In this section...
Introduction to Ridge Regression on page 9-119
Ridge Regression on page 9-119
When predictor variables are highly correlated, the least-squares estimate

$$\hat{\beta} = (X^TX)^{-1}X^Ty$$

becomes highly sensitive to random errors in the observed response y, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.
Ridge regression addresses the problem by estimating regression coefficients using

$$\hat{\beta} = (X^TX + kI)^{-1}X^Ty,$$

where k is the ridge parameter and I is the identity matrix. Small positive values of k improve the conditioning of the problem and reduce the variance of the estimates. While biased, the reduced variance of ridge estimates often results in a smaller mean squared error when compared to least-squares estimates.
The Statistics Toolbox function ridge carries out ridge regression.
Ridge Regression
For example, load the data in acetylene.mat, with observations of the predictor variables x1, x2, x3, and
the response variable y:
load acetylene
Plot the predictor variables against each other:
subplot(1,3,1)
plot(x1,x2,'.')
xlabel('x1'); ylabel('x2'); grid on; axis square
subplot(1,3,2)
plot(x1,x3,'.')
xlabel('x1'); ylabel('x3'); grid on; axis square
subplot(1,3,3)
plot(x2,x3,'.')
xlabel('x2'); ylabel('x3'); grid on; axis square
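A minimal sketch of producing a ridge trace for this data; the interaction design built with x2fx and the range of ridge parameters k are illustrative assumptions:

D = x2fx([x1 x2 x3],'interaction'); % assumed design: main effects plus interactions
D(:,1) = [];                        % remove the constant column; ridge adds its own
k = 0:1e-5:5e-3;                    % assumed range of ridge parameters
b = ridge(y,D,k);                   % standardized coefficients, one column per k
figure
plot(k,b')
xlabel('Ridge parameter k'); ylabel('Standardized coefficient')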
The estimates stabilize to the right of the plot. Note that the coefficient of the x2x3 interaction term changes sign at a ridge parameter value of about 5e-4.
Elastic net is a related technique. Use elastic net when you have several highly correlated variables.
lasso provides elastic net regularization when you set the Alpha name-value pair to a number strictly
between 0 and 1.
See Lasso and Elastic Net Details on page 9-134.
For lasso regularization of regression ensembles, see regularize.
Lasso Regularization
To see how lasso identifies and discards unnecessary predictors:
1 Generate 200 samples of five-dimensional artificial data X from exponential distributions with various
means:
rng(3,'twister') % for reproducibility
X = zeros(200,5);
for ii = 1:5
    X(:,ii) = exprnd(ii,200,1);
end
2 Generate response data Y = X*r + eps, where r has just two nonzero components, and the noise eps is
normal with standard deviation 0.1:
r = [0;2;0;-3;0];
Y = X*r + randn(200,1)*.1;
3 Fit a cross-validated sequence of models with lasso, and plot the result:
[b fitinfo] = lasso(X,Y,'CV',10);
lassoPlot(b,fitinfo,'PlotType','Lambda','XScale','log');
The plot shows the nonzero coefficients in the regression for various values of the Lambda
regularization parameter. Larger values of Lambda appear on the left side of the graph, meaning more
regularization, resulting in fewer nonzero regression coefficients.
The dashed vertical lines represent the Lambda value with minimal mean squared error (on the right),
and the Lambda value with minimal mean squared error plus one standard deviation. This latter value is
a recommended setting for Lambda. These lines appear only when you perform cross validation. Cross
validate by setting the 'CV' name-value pair. This example uses 10-fold cross validation.
The upper part of the plot shows the degrees of freedom (df), meaning the number of nonzero
coefficients in the regression, as a function of Lambda. On the left, the large value of Lambda causes all
but one coefficient to be 0. On the right all five coefficients are nonzero, though the plot shows only two
clearly. The other three coefficients are so small that you cannot visually distinguish them from 0.
For small values of Lambda (toward the right in the plot), the coefficient values are close to the least-squares estimate. See step 5 on page 9-126.
Find the Lambda value of the minimal cross-validated mean squared error plus one standard deviation. Examine the MSE and coefficients of the fit at that Lambda.

4 Calculate the cross-validated lasso fit for the near infrared spectra data. (Reference: Kalivas, John H., "Two Data Sets of Near Infrared Spectra," Chemometrics and Intelligent Laboratory Systems, v.37 (1997), pp. 255-259.)

tic
[b fitinfo] = lasso(NIR,octane,'CV',10); % A time-consuming operation
toc

Elapsed time is 226.876926 seconds.
5 Plot the result:
lassoPlot(b,fitinfo,'PlotType','Lambda','XScale','log');
You can see the suggested value of Lambda is over 1e-2, and the Lambda with minimal MSE is under
1e-2. These values are in the fitinfo structure:
fitinfo.LambdaMinMSE
ans =
    0.0057
fitinfo.Lambda1SE
ans =
    0.0190
6 Examine the quality of the fit for the suggested value of Lambda:
lambdaindex = fitinfo.Index1SE;
fitinfo.MSE(lambdaindex)
ans =
0.0532
fitinfo.DF(lambdaindex)
ans =
    11
The fit uses just 11 of the 401 predictors, and achieves a cross-validated MSE of 0.0532.
7 Examine the plot of cross-validated MSE:
lassoPlot(b,fitinfo,'PlotType','CV');
% Use a log scale for MSE to see small MSE values better
set(gca,'YScale','log');
As Lambda increases (toward the left), MSE increases rapidly. The coefficients are reduced too much
and they do not adequately fit the responses.
As Lambda decreases, the models are larger (have more nonzero coefficients). The increasing MSE
suggests that the models are overfitted.
The default set of Lambda values does not include values small enough to include all predictors. In this
case, there does not appear to be a reason to look at smaller values. However, if you want smaller values
than the default, use the LambdaRatio parameter, or supply a sequence of Lambda values using the
Lambda parameter. For details, see the lasso reference page.
8 To compute the cross-validated lasso estimate faster, use parallel computing (available with a Parallel Computing Toolbox license):
matlabpool open
Starting matlabpool using the 'local' configuration ... connected to 4 labs.
opts = statset('UseParallel',true);
tic;
[b fitinfo] = lasso(NIR,octane,'CV',10,'Options',opts);
toc
Elapsed time is 107.539719 seconds.
Computing in parallel is more than twice as fast on this problem using a quad-core processor.
Lasso is a regularization technique for linear regression that constrains the size of the estimated coefficients. Therefore, it resembles ridge regression. Lasso is a shrinkage estimator: it generates coefficient estimates that are biased to be small. Nevertheless, a lasso
estimator can have smaller mean squared error than an ordinary least-squares estimator when you apply
it to new data.
Unlike ridge regression, as the penalty term increases, lasso sets more coefficients to zero. This means
that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to
stepwise regression and other model selection and dimensionality reduction techniques.
Elastic net is a related technique. Elastic net is a hybrid of ridge regression and lasso regularization. Like
lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies
have suggested that the elastic net technique can outperform lasso on data with highly correlated
predictors.
Definition of Lasso
The lasso technique solves this regularization problem. For a given value of λ, a nonnegative parameter, lasso solves the problem

$$\min_{\beta_0,\,\beta}\left(\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{T}\beta\right)^{2}+\lambda\sum_{j=1}^{p}\left|\beta_j\right|\right),$$
where
- N is the number of observations.
- y_i is the response at observation i.
- x_i is the data, a vector of p values at observation i.
- λ is a nonnegative regularization parameter corresponding to one value of Lambda.
- The parameters β_0 and β are a scalar and a p-vector, respectively.
As λ increases, the number of nonzero components of β decreases.
The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
Definition of Elastic Net
The elastic net technique solves this regularization problem. For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

$$\min_{\beta_0,\,\beta}\left(\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{T}\beta\right)^{2}+\lambda P_{\alpha}(\beta)\right),$$

where

$$P_{\alpha}(\beta)=\frac{1-\alpha}{2}\|\beta\|_{2}^{2}+\alpha\|\beta\|_{1}=\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\beta_j^{2}+\alpha\left|\beta_j\right|\right).$$
Elastic net is the same as lasso when α = 1. As α shrinks toward 0, elastic net approaches ridge regression. For other values of α, the penalty term P_α(β) interpolates between the L1 norm of β and the squared L2 norm of β.
References
[1] Tibshirani, R. "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.
[2] Zou, H., and T. Hastie. "Regularization and Variable Selection via the Elastic Net." Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.
[3] Friedman, J., R. Tibshirani, and T. Hastie. "Regularization Paths for Generalized Linear Models via Coordinate Descent." Journal of Statistical Software, Vol. 33, No. 1, 2010. http://www.jstatsoft.org/v33/i01
[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition.
Springer, New York, 2008.
Partial Least Squares
In this section...
Introduction to Partial Least Squares on page 9-137
Partial Least Squares on page 9-138
Partial least squares (PLS) finds combinations of the predictors that have a large covariance with the response values.
PLS therefore combines information about the variances of both the predictors and the responses, while
also considering the correlations among them.
PLS shares characteristics with other regression and feature transformation techniques. It is similar to
ridge regression in that it is used in situations with correlated predictors. It is similar to stepwise
regression (or more general feature selection techniques) in that it can be used to select a smaller set of
model terms. PLS differs from these methods, however, by transforming the original predictor space into
the new component space.
The Statistics Toolbox function plsregress carries out PLS regression. For example, consider the data on
biochemical oxygen demand in moore.mat, padded with noisy versions of the predictors to introduce
correlations:
load moore
y = moore(:,6);               % Response
X0 = moore(:,1:5);            % Original predictors
X1 = X0 + 10*randn(size(X0)); % Correlated predictors
X = [X0,X1];
Use plsregress to perform PLS regression with the same number of components as predictors, then plot
the percentage variance explained in the response as a function of the number of components:
[XL,yl,XS,YS,beta,PCTVAR] = plsregress(X,y,10);
plot(1:10,cumsum(100*PCTVAR(2,:)),'-bo');
xlabel('Number of PLS components');
ylabel('Percent Variance Explained in y');
Choosing the number of components in a PLS model is a critical step. The plot gives a rough indication,
showing nearly 80% of the variance in y explained by the first component, with as many as five
additional components making significant contributions.
The following computes the six-component model:
[XL,yl,XS,YS,beta,PCTVAR,MSE,stats] = plsregress(X,y,6);
yfit = [ones(size(X,1),1) X]*beta;
plot(y,yfit,'o')
plot(y,yfit,'o')
The scatter shows a reasonable correlation between fitted and observed responses, and this is confirmed
by the R2 statistic:
TSS = sum((y-mean(y)).^2);
RSS = sum((y-yfit).^2);
Rsquared = 1 - RSS/TSS
Rsquared =
    0.8421
A plot of the weights of the ten predictors in each of the six components shows that two of the
components (the last two computed) explain the majority of the variance in X:
plot(1:10,stats.W,'o-');
legend({'c1','c2','c3','c4','c5','c6'},'Location','NW')
xlabel('Predictor');
ylabel('Weight');
A plot of the mean-squared errors suggests that as few as two components may provide an adequate
model:
[axes,h1,h2] = plotyy(0:6,MSE(1,:),0:6,MSE(2,:));
set(h1,'Marker','o')
set(h2,'Marker','o')
legend('MSE Predictors','MSE Response')
xlabel('Number of Components')
The calculation of mean-squared errors by plsregress is controlled by optional parameter name/value
pairs specifying cross-validation type and the number of Monte Carlo repetitions.
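For example, a sketch of a cross-validated call (the fold count and repetition count are illustrative):
[XL,yl,XS,YS,beta,PCTVAR,MSE] = plsregress(X,y,6,'CV',10,'MCReps',1);
Here 'CV' requests 10-fold cross validation and 'MCReps' sets the number of Monte Carlo repetitions.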
Prepare Data
To begin fitting a regression, put your data into a form that fitting functions expect. All regression
techniques begin with input data in an array X and response data in a separate vector y, or input data in a
dataset array ds and response data as a column in ds. Each row of the input data represents one
observation. Each column represents one predictor (variable).
For a dataset array ds, indicate the response variable with the 'ResponseVar' name-value pair:
mdl = LinearModel.fit(ds,'ResponseVar','BloodPressure');
% or
mdl = GeneralizedLinearModel.fit(ds,'ResponseVar','BloodPressure');
The response variable is the last column by default.
You can use numeric categorical predictors. A categorical predictor is one that takes values from a fixed
set of possibilities.
For a numeric array X, indicate the categorical predictors using the 'Categorical' name-value pair. For
example, to indicate that predictors 2 and 3 out of six are categorical:
mdl = LinearModel.fit(X,y,'Categorical',[2,3]);
% or
mdl = GeneralizedLinearModel.fit(X,y,'Categorical',[2,3]);
% or equivalently
mdl = LinearModel.fit(X,y,'Categorical',logical([0 1 1 0 0 0]));
For a dataset array ds, fitting functions assume that these data types are categorical:
- Logical
- Categorical (nominal or ordinal)
- String or character array
If you want to indicate that a numeric predictor is categorical, use the 'Categorical' name-value pair.
Represent missing numeric data as NaN. To represent missing data for other data types, see Missing
Group Values on page 2-53.
For a 'binomial' model with data matrix X, the response y can be:
- Binary column vector: Each entry represents success (1) or failure (0).
- Two-column matrix of integers: The first column is the number of successes in each observation; the second column is the number of trials in that observation.
For a 'binomial' model with dataset ds:
- Use the ResponseVar name-value pair to specify the column of ds that gives the number of successes in each observation.
- Use the BinomialSize name-value pair to specify the column of ds that gives the number of trials in each observation.
Both pairs are combined in the sketch after this list.
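A minimal sketch of such a call, assuming a dataset array ds with hypothetical variables Successes, Trials, and Dose:
mdl = GeneralizedLinearModel.fit(ds,'linear',...
    'Distribution','binomial',...
    'ResponseVar','Successes',...  % hypothetical success-count column
    'BinomialSize','Trials');      % hypothetical trial-count column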
Dataset Array for Input and Response Data
For example, to create a dataset array from an Excel spreadsheet:
ds = dataset('XLSFile','hospital.xls',...
    'ReadObsNames',true);
To create a dataset array from workspace variables:
load carsmall
ds = dataset(MPG,Weight);
ds.Year = ordinal(Model_Year);
Numeric Matrix for Input Data, Numeric Vector for Response
For example, to create numeric arrays from workspace variables:
from workspace variables:
load carsmall
X = [Weight Horsepower Cylinders Model_Year];
y = MPG;
To create numeric arrays from an Excel spreadsheet:
[X Xnames] = xlsread('hospital.xls');
y = X(:,4);     % response y is systolic pressure
X(:,4) = [];    % remove y from the X matrix
Notice that the nonnumeric entries, such as sex, do not appear in X.
Estimated Coefficients:
                   Estimate     SE            tStat     pValue
    (Intercept)    -7.3628      0.66815       -11.02    3.0701e-28
    x1             0.0023039    0.00021352    10.79     3.8274e-27
12 observations, 10 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 241, p-value = 2.25e-54
You can perform the same fit using a custom link function that performs identically to the 'probit' link
function:
s = {@norminv,@(x)1./normpdf(norminv(x)),@normcdf};
g = GeneralizedLinearModel.fit(x,[y n],...
    'linear','distr','binomial','link',s)
g =
Generalized Linear regression model: link(y) ~ 1 + x1
Distribution = Binomial
Estimated Coefficients:
                   Estimate     SE            tStat     pValue
    (Intercept)    -7.3628      0.66815       -11.02    3.0701e-28
    x1             0.0023039    0.00021352    10.79     3.8274e-27
12 observations, 10 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 241, p-value = 2.25e-54
The two models are the same.
Equivalently, you can write s as a structure instead of a cell array of function handles:
s.Link = @norminv;
s.Derivative = @(x) 1./normpdf(norminv(x));
s.Inverse = @normcdf;
g = GeneralizedLinearModel.fit(x,[y n],...
'linear','distr','binomial','link',s)
g =
Generalized Linear regression model: link(y) ~ 1 + x1
Distribution = Binomial
Estimated Coefficients:
                   Estimate     SE            tStat     pValue
    (Intercept)    -7.3628      0.66815       -11.02    3.0701e-28
    x1             0.0023039    0.00021352    10.79     3.8274e-27
12 observations, 10 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 241, p-value = 2.25e-54
A model specification of the form 'polyijk' is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, etc. Use numerals 0 through 9. For example, 'poly2111' has a constant plus all linear and product terms, and also contains terms with predictor 1 squared. A brief illustration appears below.
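For instance, a sketch using a two-predictor polynomial specification (the data and variable choices are illustrative):
load carsmall
X = [Weight Horsepower];
mdl = LinearModel.fit(X,MPG,'poly21')
% 'poly21' includes the intercept, x1, x2, x1^2, and x1*x2 terms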
Terms Matrix
A terms matrix is a T-by-(P + 1) matrix specifying terms in a model, where T is the number of terms, P is the number of predictor variables, and the additional column is for the response variable. The value of T(i,j) is the exponent of variable j in term i. For example, if there are three predictor variables A, B, and C:
[0 0 0 0] % constant term or intercept
[0 1 0 0] % B; equivalently, A^0 * B^1 * C^0
[1 0 1 0] % A*C
[2 0 0 0] % A^2
[0 1 2 0] % B*(C^2)
The 0 at the end of each term represents the response variable. In general:
If you have the variables in a dataset array, then a 0 must appear in the position of the response variable within each row of the terms matrix. For example:
Load sample data and define the dataset array.
load hospital
ds = dataset(hospital.Sex,hospital.BloodPressure(:,1),hospital.Age,...
hospital.Smoker,'VarNames',{'Sex','BloodPressure','Age','Smoker'});
Represent the linear model 'BloodPressure ~ 1 + Sex + Age + Smoker' in a terms matrix. The response variable is in the second column of the dataset array, so there must be a column of zeros for the response variable in the second column of the terms matrix.
T = [0 0 0 0; 1 0 0 0; 0 0 1 0; 0 0 0 1]
T =
     0     0     0     0
     1     0     0     0
     0     0     1     0
     0     0     0     1
Redefine the dataset array.
ds = dataset(hospital.BloodPressure(:,1),hospital.Sex,hospital.Age,...
hospital.Smoker,'VarNames',{'BloodPressure','Sex','Age','Smoker'});
Now, the response variable is the first variable in the dataset array. Specify the same linear model, 'BloodPressure ~ 1 + Sex + Age + Smoker', using a terms matrix.
T = [0 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1]
T =
     0     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
If you have the predictor and response variables in a matrix and column vector, then you must include
a 0 for the response variable at the end of each term. For example:
Load sample data and define the matrix of predictors.
load carsmall
X = [Acceleration,Weight];
Specify the model 'MPG ~ Acceleration + Weight + Acceleration:Weight + Weight^2' using a terms matrix and fit the model to the data. This model includes the main effect and two-way interaction terms for the variables Acceleration and Weight, and a second-order term for the variable Weight.
T = [0 0 0; 1 0 0; 0 1 0; 1 1 0; 0 2 0]
T =
     0     0     0
     1     0     0
     0     1     0
     1     1     0
     0     2     0
Fit a linear model.
mdl = LinearModel.fit(X,MPG,T)
mdl =
Linear regression model: y ~ 1 + x1*x2 + x2^2
Estimated Coefficients:
                   Estimate       SE            tStat      pValue
    (Intercept)    48.906         12.589        3.8847     0.00019665
    x1             0.54418        0.57125       0.95261    0.34337
    x2             -0.012781      0.0060312     -2.1192    0.036857
    x1:x2          -0.00010892    0.00017925    -0.6076    0.545
    x2^2           9.7518e-07     7.5389e-07    1.2935     0.19917
Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 4.1
R-squared: 0.751, Adjusted R-Squared 0.739
F-statistic vs. constant model: 67, p-value = 4.99e-26
Only the intercept and x2 term, which corresponds to the Weight variable, are significant at the 5%
significance level.
Now, perform a stepwise regression with a constant model as the starting model and a linear model with
interactions as the upper model.
T = [0 0 0; 1 0 0; 0 1 0; 1 1 0];
mdl = LinearModel.stepwise(X,MPG,[0 0 0],'upper',T)
1. Adding x2, FStat = 259.3087, pValue = 1.643351e-28
mdl =
Linear regression model: y ~ 1 + x2
Estimated Coefficients:
                   Estimate      SE           tStat      pValue
    (Intercept)    49.238        1.6411       30.002     2.7015e-49
    x2             -0.0086119    0.0005348    -16.103    1.6434e-28
Number of observations: 94, Error degrees of freedom: 92
Root Mean Squared Error: 4.13
R-squared: 0.738, Adjusted R-Squared 0.735
F-statistic vs. constant model: 259, p-value = 1.64e-28
The results of the stepwise regression are consistent with the results of LinearModel.fit in the previous
step.
Formula
A formula for a model specification is a string of the form
'Y ~ terms',
where
Y is the response name.
terms contains
- Variable names
- + to include the next variable
- - to exclude the next variable
- : to define an interaction, a product of terms
- * to define an interaction and all lower-order terms
- ^ to raise the predictor to a power, exactly as in * repeated, so ^ includes lower-order terms as well
- () to group terms
Tip Formulas include a constant (intercept) term by default. To exclude a constant term from the model,
include -1 in the formula.
Examples:
'Y ~ A + B + C' is a three-variable linear model with intercept.
'Y ~ A + B + C - 1' is a three-variable linear model without intercept.
'Y ~ A + B + C + B^2' is a three-variable model with intercept and a B^2 term.
'Y ~ A + B^2 + C' is the same as the previous example, since B^2 includes a B term.
'Y ~ A + B + C + A:B' includes an A*B term.
'Y ~ A*B + C' is the same as the previous example, since A*B = A + B + A:B.
'Y ~ A*B*C - A:B:C' has all interactions among A, B, and C, except the three-way interaction.
'Y ~ A*(B + C + D)' has all linear terms, plus products of A with each of the other variables.
The pValue for (Intercept), x2 and x3 are larger than 0.01. These three predictors were not used to create the response data y. The pValue for x3 is just over .05, so might be regarded as possibly significant.
The display contains the Chi-square statistic.
Diagnostic Plots
Diagnostic plots help you identify outliers, and see other problems in your model or fit. To illustrate
these plots, consider binomial regression with a logistic link function.
The logistic model is useful for proportion data. It defines the relationship between the proportion p and
the weight w by:
log[p/(1 - p)] = b1 + b2*w
This example fits a binomial model to data. The data are derived from carbig.mat, which contains
measurements of large cars of various weights. Each weight in w has a corresponding number of cars in
total and a corresponding number of poor-mileage cars in poor.
It is reasonable to assume that the values of poor follow binomial distributions, with the number of trials
given by total and the percentage of successes depending on w. This distribution can be accounted for in
the context of a logistic model by using a generalized linear model with link function log(mu/(1 - mu)) = Xb. This link function is called 'logit'.
w = [2100 2300 2500 2700 2900 3100 ...
3300 3500 3700 3900 4100 4300]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
mdl = GeneralizedLinearModel.fit(w,[poor total],...
    'linear','Distribution','binomial','link','logit')
mdl =
Generalized Linear regression model: logit(y) ~ 1 + x1
Distribution = Binomial
Estimated Coefficients:
                   Estimate     SE            tStat      pValue
    (Intercept)    -13.38       1.394         -9.5986    8.1019e-22
    x1             0.0041812    0.00044258    9.4474     3.4739e-21
12 observations, 10 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 242, p-value = 1.3e-54
See how well the model fits the data.
plotSlice(mdl)
This is typical of a regression with points ordered by the predictor variable. The leverage of each point
on the fit is higher for points with relatively extreme predictor values (in either direction) and low for
points with average predictor values. In examples with multiple predictors and with points not ordered
by predictor value, this plot can help you identify which observations have high leverage because they
are outliers as measured by their predictor values.
Residuals: Model Quality for Training Data
There are several residual plots to help you discover errors, outliers, or correlations in the model or data.
The simplest residual plots are the default histogram plot, which shows the range of the residuals and
their frequencies, and the probability plot, which shows how the distribution of the residuals compares
to a normal distribution with matched variance.
This example shows residual plots for a fitted Poisson model. The data construction has two out of five
predictors not affecting the response, and no intercept term:
rng('default') % for reproducibility
X = randn(100,5);
mu = exp(X(:,[1 4 5])*[2;1;.5]);
y = poissrnd(mu);
mdl = GeneralizedLinearModel.fit(X,y,...
'linear','Distribution','poisson');
Examine the residuals:
plotResiduals(mdl)
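The probability plot mentioned above is available through a plot-type argument:
plotResiduals(mdl,'probability') % residuals vs. a matched normal distribution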
1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X, so you expect the model not to show much dependence on those predictors.
The scale of the first predictor overwhelms the plot. Disable it using the Predictors menu.
2 Generate some new data, and evaluate the predictions from the data.
Xnew = randn(3,5) + repmat([1 2 3 4 5],[3,1]); % new data
[ynew,ynewci] = predict(mdl,Xnew)
ynew =
   1.0e+04 *
    0.1130
    1.7375
    3.7471
ynewci =
   1.0e+04 *
    0.0821    0.1555
    1.2167    2.4811
    2.8419    4.9407
feval
When you construct a model from a dataset array, feval is often more convenient for predicting mean
responses than predict. However, feval does not provide confidence bounds.
This example shows how to predict mean responses using the feval method.
1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X, so you expect the model not to show much dependence on these predictors. Construct the model stepwise to include the relevant predictors automatically.
2 Generate some new data, and evaluate the predictions from the data.
Xnew = randn(3,5) + repmat([1 2 3 4 5],[3,1]); % new data
ynew = feval(mdl,Xnew(:,1),Xnew(:,4),Xnew(:,5)) % only need predictors 1, 4, and 5
ynew =
   1.0e+04 *
    0.1130
    1.7375
    3.7471
Equivalently,
ynew = feval(mdl,Xnew(:,[1 4 5])) % only need predictors 1,4,5
ynew =
1.0e+04 *
0.1130
1.7375
3.7471
random
The random method generates new random response values for specified predictor values. The
distribution of the response values is the distribution used in the model. random calculates the mean of
the distribution from the predictors, estimated coefficients, and link function. For distributions such as
normal, the model also provides an estimate of the variance of the response. For the binomial and
Poisson distributions, the variance of the response is determined by the mean; random does not use a
separate dispersion estimate.
This example shows how to simulate responses using the random method.
1 Create a model from some predictors in artificial data. The data do not use the second and third columns in X, so you expect the model not to show much dependence on these predictors. Construct the model stepwise to include the relevant predictors automatically.
2 Generate some new data, and evaluate the predictions from the data.
Xnew = randn(3,5) + repmat([1 2 3 4 5],[3,1]); % new data
ysim = random(mdl,Xnew)
ysim =
        1111
       17121
       37457
The predictions from random are Poisson samples, so are integers.
3 Evaluate the random method again. The result changes.
ysim = random(mdl,Xnew)
ysim =
1175
17320
37126
mdl =
Generalized Linear regression model:
    log(y) ~ 1 + x1 + x4 + x5
    Distribution = Poisson

Estimated Coefficients:
                   Estimate    SE          tStat     pValue
    (Intercept)    0.17604     0.062215    2.8295    0.004662
    x1             1.9122      0.024638    77.614    0
    x4             0.98521     0.026393    37.328    5.6696e-305
    x5             0.61321     0.038435    15.955    2.6473e-57

100 observations, 96 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 4.97e+04, p-value = 0
You can access the model description programmatically, too. For example,
mdl.Coefficients.Estimate
ans =
0.1760
1.9122
0.9852
0.6132
mdl.Formula
ans =
log(y) ~ 1 + x1 + x4 + x5
Load the Fisher iris data. Extract the rows that have classification versicolor or virginica. These are rows 51 to 150. Create logical response variables that are true for versicolor flowers.
load fisheriris
X = meas(51:end,:); % versicolor and virginica
y = strcmp('versicolor',species(51:end));
Some p-values in the pValue column are not very small. Perhaps the model can be simplified.
See if some 95% confidence intervals for the coefficients include 0. If so, perhaps these model terms
could be removed.
confint = coefCI(mdl)
confint =
-8.3984 93.6740
-2.2881 7.2185
-2.2122 15.5739
-18.8339 -0.0248
-37.6277 1.0554
Only two of the predictors have coefficients whose confidence intervals do not include 0.
The coefficients of 'x1' and 'x2' have the largest p-values. Test whether both coefficients could be zero.
M = [0 1 0 0 0 % picks out coefficient for column 1
0 0 1 0 0]; % picks out coefficient for column 2
p = coefTest(mdl,M)
p =
    0.1442
The p-value of about 0.14 is not very small. Drop those terms from the model.
mdl1 = removeTerms(mdl,'x1 + x2')
mdl1 =
Generalized Linear regression model:
    logit(y) ~ 1 + x3 + x4
    Distribution = Binomial
Estimated Coefficients:
                   Estimate    SE        tStat      pValue
    (Intercept)    45.272      13.612    3.326      0.00088103
    x3             -5.7545     2.3059    -2.4956    0.012576
    x4             -10.447     3.7557    -2.7816    0.0054092
100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 118, p-value = 2.3e-26
mdl2 =
Generalized Linear regression model:
    logit(y) ~ 1 + x2 + x3 + x4
    Distribution = Binomial

Estimated Coefficients:
                   Estimate    SE        tStat      pValue
    (Intercept)    50.527      23.995    2.1057     0.035227
    x2             8.3761      4.7612    1.7592     0.078536
    x3             -7.8745     3.8407    -2.0503    0.040334
    x4             -21.43      10.707    -2.0014    0.04535

100 observations, 96 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 125, p-value = 5.4e-27
GeneralizedLinearModel.stepwise included 'x2' in the model, because it neither adds nor removes terms
with p-values between 0.05 and 0.10.
Step 4. Look for outliers and exclude them.
See if the model coefficients change when you fit a model excluding this point.
oldCoeffs = mdl2.Coefficients.Estimate;
mdl3 = GeneralizedLinearModel.fit(X,y,'linear',...
Use mdl2 to predict the probability that a flower with average measurements is versicolor. Generate
confidence intervals for your prediction.
[newf newc] = predict(mdl2,mean(X))
newf =
0.5086
newc =
0.1863 0.8239
The model gives almost a 50% probability that the average flower is versicolor, with a wide confidence
interval about this estimate.
For details about lasso and elastic net computations and algorithms, see Generalized Linear Model
Lasso and Elastic Net on page 9-195. For a discussion of generalized linear models, see What Are
Generalized Linear Models? on page 9-143.
The green circle and dashed line locate the Lambda with minimal
cross-validation error. The blue circle and dashed line locate the point with minimal cross-validation
error plus one standard deviation.
Find the nonzero model coefficients corresponding to the two identified points.
minpts = find(B(:,FitInfo.IndexMinDeviance))
minpts =
3
5
6
10
11
15
16
min1pts = find(B(:,FitInfo.Index1SE))
min1pts =
5
10
15
The coefficients from the minimal plus one standard error point are exactly those coefficients used to
create the data.
Find the values of the model coefficients at the minimal plus one standard error point.
B(min1pts,FitInfo.Index1SE)
ans =
0.2903
0.0789
0.2081
The values of the coefficients are, as expected, smaller than the original [0.4,0.2,0.3]. Lasso works by
shrinkage, which biases predictor coefficients toward zero. See Lasso and Elastic Net Details on
page 9-134.
The constant term is in the FitInfo.Intercept vector.
FitInfo.Intercept(FitInfo.Index1SE)
ans =
    1.0879
The constant term is near 1, which is the value used to generate the data.
Load the ionosphere data. The response Y is a cell array of 'g' or 'b' strings. Convert the cells to logical values, with true representing 'g'. Remove the first two columns of X because they have some awkward statistical properties, which are beyond the scope of this discussion.
load ionosphere
Ybool = strcmp(Y,'g');
X = X(:,3:end);
Construct a regularized binomial regression using 25 Lambda values and 10-fold cross validation. This
process can take a few minutes.
rng('default') % for reproducibility
[B,FitInfo] = lassoglm(X,Ybool,'binomial',...
    'NumLambda',25,'CV',10);
Step 3. Examine plots to find appropriate regularization.
lassoPlot can give both a standard trace plot and a cross-validated deviance plot. Examine both plots.
lassoPlot(B,FitInfo,'PlotType','CV');
The plot identifies the minimum-deviance point with a green circle and dashed line as a function of the
regularization parameter Lambda. The blue circled point has minimum deviance plus no more than one
standard deviation.
lassoPlot(B,FitInfo,'PlotType','Lambda','XScale','log');
The trace plot shows nonzero model coefficients as a function of the regularization parameter Lambda.
Because there are 32 predictors and a linear model, there are 32 curves. As Lambda increases to the left,
lassoglm sets various coefficients to zero, removing them from the model.
The trace plot is somewhat compressed. Zoom in to see more detail.
xlim([.01 .1])
ylim([-3 3])
As Lambda increases toward the left side of the plot, fewer nonzero coefficients remain.
Find the number of nonzero model coefficients at the Lambda value with minimum deviance plus one standard deviation. The regularized model coefficients are in the FitInfo.Index1SE column of the B matrix, as in the sketch below.
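The extraction commands are omitted in this copy; a minimal sketch, assuming the B and FitInfo outputs of lassoglm above:
indx = FitInfo.Index1SE; % index of the minimum-deviance-plus-one-SE Lambda
B0 = B(:,indx);          % model coefficients at that Lambda
nnz(B0)                  % number of nonzero coefficients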
The constant term is in the FitInfo.Index1SE entry of the FitInfo.Intercept vector. Call that value cnst.
The model is logit(mu) = log(mu/(1 - mu)) = X*B0 + cnst. Therefore, for predictions, mu = exp(X*B0 + cnst)./(1 + exp(X*B0 + cnst)).
The glmval function evaluates model predictions. It assumes that the first model coefficient relates to the
constant term. Therefore, create a coefficient vector with the constant term first.
cnst = FitInfo.Intercept(indx); B1 = [cnst;B0];
Step 5. Examine residuals.
Plot the training data against the model predictions for the regularized lassoglm model.
preds = glmval(B1,X,'logit');
hist(Ybool - preds) % plot residuals
title('Residuals from lassoglm model')
Step 6. Alternative: Use identified predictors in a least-squares generalized linear model.
Instead of using the biased predictions from the model, you can make an unbiased model using just the
identified predictors.
predictors = find(B0); % indices of nonzero predictors
mdl = GeneralizedLinearModel.fit(X,Ybool,'linear',...
    'Distribution','binomial','PredictorVars',predictors)
mdl =
Generalized Linear regression model:
    y ~ [Linear formula with 15 terms in 14 predictors]
    Distribution = Binomial
Estimated Coefficients:
Estimate SE tStat pValue (Intercept) -2.9367 0.50926 -5.7666 8.0893e-09
x1 2.492 0.60795 4.099 4.1502e-05
x3 2.5501 0.63304 4.0284 5.616e-05
x4 0.48816 0.50336 0.9698 0.33215
x5 0.6158 0.62192 0.99015 0.3221
x6 2.294 0.5421 4.2317 2.3198e-05
x7 0.77842 0.57765 1.3476 0.1778
x12 1.7808 0.54316 3.2786 0.0010432
x16 -0.070993 0.50515 -0.14054 0.88823
x20 -2.7767 0.55131 -5.0365 4.7402e-07
x24 2.0212 0.57639 3.5067 0.00045372
x25 -2.3796 0.58274 -4.0835 4.4363e-05
x27 0.79564 0.55904 1.4232 0.15467
x29 1.2689 0.55468 2.2876 0.022162
x32 -1.5681 0.54336 -2.8859 0.0039035
lassoPlot(B,S,'PlotType','Lambda','XScale','log')
The right (green) vertical dashed line represents the Lambda providing the smallest cross-validated
deviance. The left (blue) dashed line has the minimal deviance plus no more than one standard
deviation. This blue line has many fewer predictors:
[S.DF(S.Index1SE) S.DF(S.IndexMinDeviance)]
ans =
50 86
You asked lassoglm to fit using 100 different Lambda values. How many did it use?
size(B)
ans =
4000 84
lassoglm stopped after 84 values because the deviance was too small for small Lambda values. To avoid
overfitting, lassoglm halts when the deviance of the fitted model is too small compared to the deviance
in the binary responses, ignoring the predictor variables.
You can force lassoglm to include more terms by explicitly providing a set of Lambda values.
minLambda = min(S.Lambda);
explicitLambda = [minLambda*[.1 .01 .001] S.Lambda];
[B2,S2] = lassoglm(obs,y,'binomial','Lambda',explicitLambda,...
    'LambdaRatio',1e-4,'CV',10,'Options',opt);
length(S2.Lambda)
ans =
87
lassoglm used the three smaller values in fitting.
To save time, you can use:
- Fewer Lambda values, meaning fewer fits
- Fewer cross-validation folds
- A larger value for LambdaRatio
Use serial computation and all three of these time-saving methods:
tic
[Bquick,Squick] = lassoglm(obs,y,'binomial','NumLambda',25,...
    'LambdaRatio',1e-2,'CV',5);
toc
Elapsed time is 51.708074 seconds.
Graphically compare the new results to the first results.
lassoPlot(Bquick,Squick,'PlotType','CV');
lassoPlot(Bquick,Squick,'PlotType','Lambda','XScale','log')
Lasso is a shrinkage estimator: it generates coefficient estimates that are biased to be small. Nevertheless, a
lasso estimator can have smaller error than an ordinary maximum likelihood estimator when you apply it
to new data.
Unlike ridge regression, as the penalty term increases, the lasso technique sets more coefficients to zero.
This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an
alternative to stepwise regression and other model selection and dimensionality reduction techniques.
Elastic net is a related technique. Elastic net is akin to a hybrid of ridge regression and lasso
regularization. Like lasso, elastic net can generate reduced models by generating zero-valued
coefficients. Empirical studies suggest that the elastic net technique can outperform lasso on data with
highly correlated predictors.
Definition of Lasso for Generalized Linear Models
For a nonnegative value of λ, lasso solves the problem

$$\min_{\beta_0,\,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda\sum_{j=1}^{p}\left|\beta_j\right|\right),$$
where
- Deviance is the deviance of the model fit to the responses using intercept β_0 and predictor coefficients β. The formula for Deviance depends on the distr parameter you supply to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized log likelihood.
- N is the number of observations.
- λ is a nonnegative regularization parameter corresponding to one value of Lambda.
- The parameters β_0 and β are a scalar and a p-vector, respectively.
As λ increases, the number of nonzero components of β decreases.
The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
Definition of Elastic Net for Generalized Linear Models
For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

$$\min_{\beta_0,\,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda P_{\alpha}(\beta)\right),$$

where

$$P_{\alpha}(\beta)=\frac{1-\alpha}{2}\|\beta\|_{2}^{2}+\alpha\|\beta\|_{1}=\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\beta_j^{2}+\alpha\left|\beta_j\right|\right).$$
Elastic net is the same as lasso when α = 1. For other values of α, the penalty term P_α(β) interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches ridge regression.
References
[1] Tibshirani, R. "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.
[2] Zou, H., and T. Hastie. "Regularization and Variable Selection via the Elastic Net." Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.
[3] Friedman, J., R. Tibshirani, and T. Hastie. "Regularization Paths for Generalized Linear Models via Coordinate Descent." Journal of Statistical Software, Vol. 33, No. 1, 2010. http://www.jstatsoft.org/v33/i01
[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition.
Springer, New York, 2008.
[5] McCullagh, P., and J. A. Nelder. Generalized Linear Models, 2nd edition. Chapman & Hall/CRC
Press, 1989.
Nonlinear Regression
In this section...
What Are Parametric Nonlinear Regression Models? on page 9-198
Prepare Data on page 9-199
Represent the Nonlinear Model on page 9-200
Choose Initial Vector beta0 on page 9-203
Fit Nonlinear Model to Data on page 9-203
Examine Quality and Adjust the Fitted Model on page 9-204
Predict or Simulate Responses to New Data on page 9-208
Nonlinear Regression Workflow on page 9-212
In contrast, nonparametric models do not attempt to characterize the relationship between predictors and
response with model parameters. Descriptions are often graphical, as in the case of Classification Trees
and Regression Trees on page 15-30.
NonLinearModel.fit attempts to find values of the parameters β that minimize the mean squared differences between the observed responses y and the predictions of the model f(X,β). To do so, it needs a starting value beta0, which it then iteratively modifies toward a vector with minimal mean squared error.
Prepare Data
To begin fitting a regression, put your data into a form that fitting functions expect. All regression
techniques begin with input data in an array X and response data in a separate vector y, or input data in a
dataset array ds and response data as a column in ds. Each row of the input data represents one
observation. Each column represents one predictor (variable).
For a dataset array ds, indicate the response variable with the 'ResponseVar' name-value pair:
mdl = LinearModel.fit(ds,'ResponseVar','BloodPressure');
The response variable is the last column by default.
You cannot use numeric categorical predictors for nonlinear regression. A categorical predictor is one
that takes values from a fixed set of possibilities.
Represent missing data as NaN for both input data and response data.
Dataset Array for Input and Response Data
For example, to create a dataset array from an Excel spreadsheet:
ds = dataset('XLSFile','hospital.xls',...
    'ReadObsNames',true);
To create a dataset array from workspace variables:
load carsmall
ds = dataset(Weight,Model_Year,MPG);
Numeric Matrix for Input Data and Numeric Vector for Response
For example, to create numeric arrays from workspace variables:
load carsmall
X = [Weight Horsepower Cylinders Model_Year];
y = MPG;
To create numeric arrays from an Excel spreadsheet:
[X Xnames] = xlsread('hospital.xls');
y = X(:,4);     % response y is systolic pressure
X(:,4) = [];    % remove y from the X matrix
Notice that the nonnumeric entries, such as sex, do not appear in X.
plotResiduals(mdl) gives the difference between the fitted model and the data.
There are also properties of mdl that relate to the model quality.
mdl.RMSE gives the root mean square error between the data and the fitted model.
mdl.Residuals.Raw gives the raw residuals.
mdl.Diagnostics contains several fields, such as Leverage and CooksDistance, that can help you
identify particularly interesting observations.
This example shows how to examine a fitted nonlinear model using diagnostic, residual, and slice plots.
1 Load the reaction data.
load reaction
2 Create a nonlinear model of rate as a function of reactants using the hougen.m function, starting from
beta0 = ones(5,1);.
beta0 = ones(5,1);
mdl = NonLinearModel.fit(reactants,...
    rate,@hougen,beta0);
3 Make a leverage plot of the data and model.
plotDiagnostics(mdl)
1 Generate sample data from a Cauchy distribution.
rng('default')
X = rand(100,1);
X = tan(pi*X - pi/2);
2 Generate the response according to the model y = b1*(pi/2 + atan((x - b2)/b3)) and add noise to the response, as sketched below.
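The response-generation and fitting commands do not appear in this copy. A sketch consistent with the reported estimates, assuming true parameters [12,5,10], unit-variance noise, and an illustrative starting vector:
modelfun = @(b,x) b(1)*(pi/2 + atan((x - b(2))/b(3))); % assumed model form
y = modelfun([12 5 10],X) + randn(100,1);              % assumed true parameters and noise
beta0 = [15 10 5];                                     % hypothetical starting values
mdl = NonLinearModel.fit(X,y,modelfun,beta0)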
Estimated Coefficients:
          Estimate    SE         tStat     pValue
    b1    12.082      0.80028    15.097    3.3151e-27
    b2    5.0603      1.0825     4.6747    9.5063e-06
    b3    9.64        0.46499    20.732    2.0382e-37
Number of observations: 100, Error degrees of freedom: 97
Root Mean Squared Error: 1.02
R-Squared: 0.92, Adjusted R-Squared 0.918
F-statistic vs. constant model: 558, p-value = 6.11e-54
The fitted values are within a few percent of the parameters [12,5,10].
4 Examine the fit:
plotSlice(mdl)
predict
The predict method predicts the mean responses and, if requested, gives confidence bounds. For
example, to find the predicted response values and predicted confidence intervals about the response at
X values [-15;5;12]:
Xnew = [-15;5;12];
[ynew,ynewci] = predict(mdl,Xnew)
ynew =
5.4122
18.9022
26.5161
ynewci =
    4.8233    6.0010
18.4555 19.3490
25.0170 28.0151
The confidence intervals are reflected in the slice plot.
feval
The feval method predicts the mean responses. feval is often more convenient to use than predict when
you construct a model from a dataset array. For example,
1 Create the nonlinear model from a dataset array.
ds = dataset({X,'X'},{y,'y'});
mdl2 = NonLinearModel.fit(ds,modelfun,beta0);
2 Find the predicted model responses (CDF) at X values [-15;5;12].
Xnew = [-15;5;12];
ynew = feval(mdl2,Xnew)
ynew =
5.4122
18.9022
26.5161
random
The random method simulates new random response values, equal to the mean prediction plus a random
disturbance with the same variance as the training data. For example,
Xnew = [-15;5;12];
ysim = random(mdl,Xnew)
ysim =
6.0505
19.0893
25.4647
Rerun the random method. The results change.
ysim = random(mdl,Xnew)
ysim =
6.3813
19.2157
26.6541
The root mean squared error is fairly low compared to the range of observed values.
mdl.RMSE
ans =
    0.1933
[min(rate) max(rate)]
ans =
0.0200 14.3900
Examine a residuals plot.
plotResiduals(mdl)
Remove the outlier from the fit using the Exclude name-value pair.
mdl1 = NonLinearModel.fit(reactants,...
rate,@hougen,ones(5,1),'Exclude',6)
mdl1 =
To see the effect of each predictor on the response, make a slice plot using plotSlice(mdl).
plotSlice(mdl)
plotSlice(mdl1)
The plots look very similar, with slightly wider confidence bounds for mdl1. This difference is
understandable, since there is one less data point in the fit, representing over 7% fewer observations.
Step 6. Predict for new data.
Create some new data and predict the response from both models.
Xnew = [200,200,200;100,200,100;500,50,5];
[ypred yci] = predict(mdl,Xnew)
ypred =
1.8762
6.2793
1.6718
yci =
1.6283 2.1242
5.9789 6.5797
1.5589 1.7846
[ypred1 yci1] = predict(mdl1,Xnew)
ypred1 =
1.8984
6.2555
1.6594
yci1 =
1.6260 2.1708
5.9323 6.5787
1.5345 1.7843
Even though the model coefficients are dissimilar, the predictions are nearly identical.
Mixed-Effects Models
In this section...
Introduction to Mixed-Effects Models on page 9-219
Mixed-Effects Model Hierarchy on page 9-220
Specifying Mixed-Effects Models on page 9-221
Specifying Covariate Models on page 9-224
Choosing nlmefit or nlmefitsa on page 9-226
Using Output Functions with Mixed-Effects Models on page 9-229
Mixed-Effects Models Using nlmefit and nlmefitsa on page 9-234
Examining Residuals for Model Verification on page 9-249
For example, consider a model of the elimination of a drug from the bloodstream. The model uses time t as a predictor and the concentration of the drug C as the response. The nonlinear model term $C_0 e^{-rt}$ combines parameters $C_0$ and $r$, representing, respectively, an initial concentration and an elimination rate. If data is collected across multiple individuals, it is reasonable to assume that the elimination rate is a random variable $r_i$ depending on individual $i$, varying around a population mean $\bar{r}$. The term $C_0 e^{-rt}$ becomes

$$C_0 e^{-[\bar{r} + (r_i - \bar{r})]t} = C_0 e^{-(\beta + b_i)t},$$

with a fixed effect $\beta = \bar{r}$ and a random effect $b_i = r_i - \bar{r}$.
Random effects are useful when data falls into natural groups. In the drug elimination model, the groups
are simply the individuals under study. More sophisticated models might group data by an individual's age, weight, diet, etc. Although the groups are not the focus of the study, adding random effects to a
model extends the reliability of inferences beyond the specific sample of individuals.
Mixed-effects models account for both fixed and random effects. As with all regression models, their
purpose is to describe a response variable as a function of the predictor variables. Mixed-effects models,
however, recognize correlations within sample subgroups. In this way, they provide a compromise
between ignoring data groups entirely and fitting each group with a separate model.
$$y_{ij} = f(\varphi, x_{ij}) + \varepsilon_{ij},$$

where $y_{ij}$ is the response, $x_{ij}$ is a vector of predictors, $\varphi$ is a vector of model parameters, and $\varepsilon_{ij}$ is the measurement or process error. The index $j$ ranges from 1 to $n_i$, where $n_i$ is the number of observations in group $i$. The function $f$ specifies the form of the model. Often, $x_{ij}$ is simply an observation time $t_{ij}$.
The errors are usually assumed to be independent and identically, normally distributed, with constant variance.
Estimates of the parameters in $\varphi$ describe the population, assuming those estimates are the same for all groups. If, however, the estimates vary by group, the model becomes

$$y_{ij} = f(\varphi_i, x_{ij}) + \varepsilon_{ij},$$

where the group-specific parameters combine fixed effects $\beta$ and random effects $b_i$ through design matrices, for example $\varphi_i = A_i\beta + B_i b_i$.
If the design matrices also differ among observations, the model becomes

$$\varphi_{ij} = A_{ij}\beta + B_{ij}b_i,$$
$$y_{ij} = f(\varphi_{ij}, x_{ij}) + \varepsilon_{ij}.$$
Some of the group-specific predictors in $x_{ij}$ may not change with observation $j$. Calling those $v_i$, the model becomes

$$y_{ij} = f(\varphi_{ij}, x_{ij}, v_i) + \varepsilon_{ij}.$$
Stated in its general, group-level form, the nonlinear mixed-effects model is

$$y_i = f(\varphi_i, X_i) + \varepsilon_i, \qquad b_i \sim N(0, \Psi), \qquad \varepsilon_i \sim N(0, \sigma^2).$$
This formulation of the nonlinear mixed-effects model uses the following notation:
- $\varphi_i$: A vector of group-specific model parameters
- $\beta$: A vector of fixed effects, modeling population parameters
- $b_i$: A vector of multivariate normally distributed group-specific random effects
- $A_i$: A group-specific design matrix for combining fixed effects
- $B_i$: A group-specific design matrix for combining random effects
- $X_i$: A data matrix of group-specific predictor values
- $y_i$: A data vector of group-specific response values
- $f$: A general, real-valued function of $\varphi_i$ and $X_i$
- $\varepsilon_i$: A vector of group-specific errors, assumed to be independent, identically, normally distributed, and independent of $b_i$
- $\Psi$: A covariance matrix for the random effects
- $\sigma^2$: The error variance, assumed to be constant across observations
For example, expanding the drug elimination model to a sum of two exponentials gives a response of the form

$$y_{ij} = C_{p_i} e^{-R_{p_i} t_{ij}} + C_{q_i} e^{-R_{q_i} t_{ij}} + \varepsilon_{ij}.$$
Choosing which parameters to model with random effects is an important consideration when building a
mixed-effects model. One technique is to add random effects to all parameters, and use estimates of their
variances to determine their significance in the model. An alternative is to fit the model separately to
each group, without random effects, and look at the variation of the parameter estimates. If an estimate
varies widely across groups, or if confidence intervals for each group have minimal overlap, the
parameter is a good candidate for a random effect.
To introduce fixed effects $\beta$ and random effects $b_i$ for all model parameters, reexpress the model as follows:

$$\begin{aligned} y_{ij} &= [\bar{C}_p + (C_{p_i} - \bar{C}_p)]\, e^{-[\bar{R}_p + (R_{p_i} - \bar{R}_p)]\,t_{ij}} + [\bar{C}_q + (C_{q_i} - \bar{C}_q)]\, e^{-[\bar{R}_q + (R_{q_i} - \bar{R}_q)]\,t_{ij}} + \varepsilon_{ij} \\ &= (\beta_1 + b_{1i})\, e^{-(\beta_2 + b_{2i})\,t_{ij}} + (\beta_3 + b_{3i})\, e^{-(\beta_4 + b_{4i})\,t_{ij}} + \varepsilon_{ij}. \end{aligned}$$
In the notation of the general model:

$$y_i = \begin{bmatrix} y_{i1} \\ \vdots \\ y_{in_i} \end{bmatrix},\quad X_i = \begin{bmatrix} t_{i1} \\ \vdots \\ t_{in_i} \end{bmatrix},\quad \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_4 \end{bmatrix},\quad b_i = \begin{bmatrix} b_{1i} \\ \vdots \\ b_{4i} \end{bmatrix},$$

where $n_i$ is the number of observations of individual $i$. In this case, the design matrices $A_i$ and $B_i$ are, at least initially, 4-by-4 identity matrices. Design matrices may be altered, as necessary, to introduce weighting of individual effects, or time dependency.
Fitting the model and estimating the covariance matrix $\Psi$ often leads to further refinements. A relatively small estimate for the variance of a random effect suggests that it can be removed from the model. Likewise, relatively small estimates for covariances among certain random effects suggest that a full covariance matrix is unnecessary. Since random effects are unobserved, $\Psi$ must be estimated indirectly. Specifying a diagonal or block-diagonal covariance pattern for $\Psi$ can improve convergence and efficiency of the fitting algorithm.
Statistics Toolbox functions nlmefit and nlmefitsa fit the general nonlinear mixed-effects model to data,
estimating the fixed and random effects. The functions also estimate the covariance matrix for the
random effects. Additional diagnostic outputs allow you to assess tradeoffs between the number of
model parameters and the goodness of fit.
For example, suppose the first model parameter depends on a group-specific covariate $w_i$. With design matrices, the group parameters are

$$\begin{bmatrix} \varphi_{1i} \\ \varphi_{2i} \\ \varphi_{3i} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & w_i \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} b_{1i} \\ b_{2i} \\ b_{3i} \end{bmatrix}.$$

Thus, the parameter $\varphi_i$ for any individual in the $i$th group is

$$\begin{bmatrix} \varphi_{1i} \\ \varphi_{2i} \\ \varphi_{3i} \end{bmatrix} = \begin{bmatrix} \beta_1 + \beta_4 w_i \\ \beta_2 \\ \beta_3 \end{bmatrix} + \begin{bmatrix} b_{1i} \\ b_{2i} \\ b_{3i} \end{bmatrix}.$$
To specify a covariate model, use the 'FEGroupDesign' option. 'FEGroupDesign' is a p-by-q-by-m array specifying a different p-by-q fixed-effects design matrix for each of the m groups. Using the previous example, the array resembles the construction sketched below.
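The figure from the original is not reproduced here. A minimal sketch of constructing such an array, assuming three parameters (p = 3), four fixed effects (q = 4), and hypothetical covariate values w(1),...,w(m):
w = [0.5 1.2 2.0];           % hypothetical group covariates, one per group
m = numel(w);
A = zeros(3,4,m);            % p-by-q-by-m: one 3-by-4 design matrix per group
for i = 1:m
    A(:,:,i) = [1 0 0 w(i)   % phi_1i = beta_1 + beta_4*w_i
                0 1 0 0      % phi_2i = beta_2
                0 0 1 0];    % phi_3i = beta_3
end
% Pass A to nlmefit using the 'FEGroupDesign' option, for example:
% [phi,PSI] = nlmefit(X,y,group,[],model,phi0,'FEGroupDesign',A);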
3 Maximization step: Choose new parameter estimates to maximize the log likelihood function given the simulated values of the random effects.
Both nlmefit and nlmefitsa attempt to find parameter estimates to maximize a likelihood function, which
is difficult to compute. nlmefit deals with the problem by approximating the likelihood function in
various ways, and maximizing the approximate function. It uses traditional optimization techniques that
depend on things like convergence criteria and iteration limits.
nlmefitsa, on the other hand, simulates random values of the parameters in such a way that in the long run they converge to the values that maximize the exact likelihood function. The results are random, and traditional convergence tests don't apply. Therefore nlmefitsa provides options to plot the results as the
simulation progresses, and to restart the simulation multiple times. You can use these features to judge
whether the results have converged to the accuracy you desire.
Parameters Specific to nlmefitsa
The following parameters are specific to nlmefitsa. Most control the stochastic algorithm.
- Cov0: Initial value for the covariance matrix PSI. Must be an r-by-r positive definite matrix. If empty, the default value depends on the values of BETA0.
- ComputeStdErrors: true to compute standard errors for the coefficient estimates and store them in the output STATS structure, or false (default) to omit this computation.
- LogLikMethod: Specifies the method for approximating the log likelihood.
- NBurnIn: Number of initial burn-in iterations during which the parameter estimates are not recomputed. Default is 5.
- NIterations: Controls how many iterations are performed for each of three phases of the algorithm.
- NMCMCIterations: Number of Markov Chain Monte Carlo (MCMC) iterations.
Model and Data Requirements
There are some differences in the capabilities of nlmefit and nlmefitsa. Therefore some data and models
are usable with either function, but some may require you to choose just one of them.
- Error models: nlmefitsa supports a variety of error models. For example, the standard deviation of the response can be constant, proportional to the function value, or a combination of the two. nlmefit fits models under the assumption that the standard deviation of the response is constant. One of the error models, 'exponential', specifies that the log of the response has a constant standard deviation. You can fit such models using nlmefit by providing the log response as input, and by rewriting the model function to produce the log of the nonlinear function value.
- Random effects: Both functions fit data to a nonlinear function with parameters, and the parameters may be simple scalar values or linear functions of covariates. nlmefit allows any coefficients of the linear functions to have both fixed and random effects. nlmefitsa supports random effects only for the constant (intercept) coefficient of the linear functions, but not for slope coefficients. So in the example in Specifying Covariate Models on page 9-224, nlmefitsa can treat only the first three beta values as random effects.
- Model form: nlmefit supports a very general model specification, with few restrictions on the design matrices that relate the fixed coefficients and the random effects to the model parameters. nlmefitsa is more restrictive:
  - The fixed effect design must be constant in every group (for every individual), so an observation-dependent design is not supported.
  - The random effect design must be constant for the entire data set, so neither an observation-dependent design nor a group-dependent design is supported.
  - As mentioned under Random effects, the random effect design must not specify random effects for slope coefficients. This implies that the design must consist of zeros and ones.
  - The random effect design must not use the same random effect for multiple coefficients, and cannot use more than one random effect for any single coefficient.
  - The fixed effect design must not use the same coefficient for multiple parameters. This implies that it can have at most one nonzero value in each column.
If you want to use nlmefitsa for data in which the covariate effects are random, include the covariates directly in the nonlinear model expression. Don't include the covariates in the fixed or random effect design matrices.
- Convergence: As described under Model form, nlmefit and nlmefitsa have different approaches to measuring convergence. nlmefit uses traditional optimization measures, and nlmefitsa provides diagnostics to help you judge the convergence of a random simulation.
In practice, nlmefitsa tends to be more robust, and less likely to fail on difficult problems. However,
nlmefit may converge faster on problems where it converges at all. Some problems may benefit from a
combined strategy, for example by running nlmefitsa for a while to get reasonable parameter estimates,
and using those as a starting point for additional iterations using nlmefit.
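A minimal sketch of that combined strategy, reusing the data and model names from the example later in this section (time, concentration, subject, nlme_model, phi0):
% Stochastic fit first, for robust starting estimates
[phi1,PSI1] = nlmefitsa(time,concentration,subject,[],nlme_model,phi0);
% Refine with the deterministic algorithm, starting from phi1
[phi2,PSI2,stats] = nlmefit(time,concentration,subject,[],nlme_model,phi1);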
2 Use statset to set the value of OutputFcn to be a function handle, that is, the name of the function preceded by the @ sign. For example, if the output function is outfun.m, the command is
opt = statset('OutputFcn',@outfun);
Stopping an Iteration Based on GUI Input
If you design a GUI to perform nlmefit iterations, you can make the output function stop when a user clicks a Stop button on the GUI. For example, the following code implements a dialog to cancel calculations:
function retval = stop_outfcn(beta,str,status)
persistent h stop;
if isequal(str.inner.state,'none')
switch(status)
case 'init'
% Initialize dialog
stop = false;
h = msgbox('Press STOP to cancel calculations.',...
'NLMEFIT: Iteration 0 ');
button = findobj(h,'type','uicontrol');
set(button,'String','STOP','Callback',@stopper)
pos = get(h,'Position');
pos(3) = 1.1 * pos(3);
set(h,'Position',pos)
drawnow
case 'iter'
% Display iteration number in the dialog title
set(h,'Name',sprintf('NLMEFIT: Iteration %d',...
str.iteration))
drawnow;
case 'done'
% Delete dialog
delete(h);
end
end
if stop
% Stop if the dialog button has been pressed
delete(h)
end
retval = stop;
function stopper(varargin)
% Set flag to stop when button is pressed
stop = true;
disp('Calculation stopped.')
end
end
Sample Output Function
nlmefitoutputfcn is the sample Statistics Toolbox output function for nlmefit and nlmefitsa. It initializes or updates a plot with the fixed effects (BETA) and the variance of the random effects (diag(STATUS.Psi)). For nlmefit, the plot also includes the log-likelihood (STATUS.fval).
nlmefitoutputfcn is the default output function for nlmefitsa. To use it with nlmefit, specify a function handle for it in the options structure:
opt = statset('OutputFcn',@nlmefitoutputfcn, ...)
beta = nlmefit( ..., 'Options',opt, ...)
To prevent nlmefitsa from using this function, specify an empty value for the output function:
opt = statset('OutputFcn',[], ...)
beta = nlmefitsa( ..., 'Options',opt, ...)
nlmefitoutputfcn stops nlmefit or nlmefitsa if you close the figure that it produces.
Specifying Mixed-Effects Models on page 9-221 discusses a useful model for this type of data. Construct the model via an
anonymous function as follows:
model = @(phi,t)(phi(1)*exp(-exp(phi(2))*t) + ...
phi(3)*exp(-exp(phi(4))*t));
Use the nlinfit function to fit the model to all of the data, ignoring subject-specific effects:
phi0 = [1 2 1 1];
[phi,res] = nlinfit(time,concentration,model,phi0);
numObs = length(time);
numParams = 4;
df = numObs-numParams;
mse = (res'*res)/df
mse =
0.0304
tplot = 0:0.01:8;
plot(tplot,model(phi,tplot),'k','LineWidth',2)
hold off
A box plot of residuals by subject shows that the boxes are mostly above or below zero, indicating that
the model has failed to account for subject-specific effects:
colors = 'rygcbm';
h = boxplot(res,subject,'colors',colors,'symbol','o');
set(h(~isnan(h)),'LineWidth',2)
hold on
boxplot(res,subject,'colors','k','symbol','ko')
grid on
xlabel('Subject')
ylabel('Residual')
hold off
To account for subject-specific effects, fit the model separately to the data for each subject:
phi0 = [1 2 1 1];
PHI = zeros(4,6);
RES = zeros(11,6);
for I = 1:6
tI = time(subject == I);
cI = concentration(subject == I);
[PHI(:,I),RES(:,I)] = nlinfit(tI,cI,model,phi0);
end
PHI
PHI =
2.0293 2.8277 5.4683 2.1981 3.5661 3.0023
0.5794 0.8013 1.7498 0.2423 1.0408 1.0882
0.1915 0.4989 1.6757 0.2545 0.2915 0.9685
-1.7878 -1.6354 -0.4122 -1.6026 -1.5069 -0.8731
numParams = 24;
df = numObs-numParams;
mse = (RES(:)'*RES(:))/df
mse =
    0.0057
gscatter(time,concentration,subject)
xlabel('Time (hours)')
ylabel('Concentration (mcg/ml)')
PHI gives estimates of the four model parameters for each of the six subjects. The estimates vary
considerably, but taken as a 24-parameter model of the data, the mean-squared error of 0.0057 is a
significant reduction from 0.0304 in the original four-parameter model.
A box plot of residuals by subject shows that the larger model accounts for most of the subject-specific
effects:
h = boxplot(RES,'colors',colors,'symbol','o');
set(h(~isnan(h)),'LineWidth',2)
hold on
boxplot(RES,'colors','k','symbol','ko')
grid on
xlabel('Subject')
ylabel('Residual')
hold off
PSI =
    0.3264         0         0         0
         0    0.0250         0         0
         0         0    0.0124         0
         0         0         0    0.0000
stats =
           dfe: 57
logl: 54.5882
mse: 0.0066
rmse: 0.0787
errorparam: 0.0815
aic: -91.1765
bic: -93.0506
covb: [4x4 double]
sebeta: [0.2558 0.1066 0.1092 0.2244]
          ires: [66x1 double]
pres: [66x1 double]
iwres: [66x1 double]
pwres: [66x1 double]
The mean-squared error of 0.0066 is comparable to the 0.0057 of the
24-parameter model without random effects, and significantly better than the
0.0304 of the four-parameter model without random effects.
The estimated covariance matrix PSI shows that the variance of the fourth random effect is essentially
zero, suggesting that you can remove it to simplify the model. To do this, use the REParamsSelect
parameter to specify the indices of the parameters to be modeled with random effects in nlmefit:
[phi,PSI,stats] = nlmefit(time,concentration,subject, ...
    [],nlme_model,phi0, ...
    'REParamsSelect',[1 2 3])
phi =
2.8277
0.7728
0.4605
-1.3460
PSI =
0.3270 0 0
0 0.0250 0
0 0 0.0124
stats =
           dfe: 58
          logl: 54.5875
           mse: 0.0066
          rmse: 0.0780
    errorparam: 0.0815
           aic: -93.1750
           bic: -94.8410
          covb: [4x4 double]
        sebeta: [0.2560 0.1066 0.1092 0.2244]
          ires: [66x1 double]
          pres: [66x1 double]
         iwres: [66x1 double]
         pwres: [66x1 double]
The log-likelihood logl is almost identical to what it was with random effects for all of the parameters,
the Akaike information criterion aic is reduced from -91.1765 to -93.1750, and the Bayesian information
criterion bic is reduced from -93.0506 to -94.8410. These measures support the decision to drop the
fourth random effect.
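As a cross-check (an added illustration, not from the original text), these criteria can be reproduced from the reported log-likelihood. Taking k as the number of free parameters (fixed effects, free elements of PSI, and the error parameter) and the BIC sample size as the M = 6 subjects reproduces the reported values; treat this bookkeeping as an assumption to verify against the nlmefit documentation.
% Hedged sketch: reconstructing aic and bic for the simplified model.
logl = 54.5875;
k = 4 + 3 + 1;               % fixed effects + diagonal PSI terms + error parameter
M = 6;                       % number of subjects (groups)
aic = -2*logl + 2*k          % -93.1750
bic = -2*logl + k*log(M)     % matches the reported -94.8410 up to rounding of logl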
Refitting the simplified model with a full covariance matrix allows for identification of correlations
among the random effects. To do this, use the CovPattern parameter to specify the pattern of nonzero
elements in the covariance matrix:
[phi,PSI,stats] = nlmefit(time,concentration,subject, ...
    [],nlme_model,phi0, ...
    'REParamsSelect',[1 2 3], ...
    'CovPattern',ones(3))
phi =
2.8148
0.8293
0.5613
-1.1407
PSI =
0.4767 0.1152 0.0499
0.1152 0.0321 0.0032
0.0499 0.0032 0.0236
stats =
           dfe: 55
          logl: 58.4731
           mse: 0.0061
          rmse: 0.0782
    errorparam: 0.0781
           aic: -94.9462
           bic: -97.2369
          covb: [4x4 double]
        sebeta: [0.3028 0.1103 0.1179 0.1662]
          ires: [66x1 double]
          pres: [66x1 double]
         iwres: [66x1 double]
         pwres: [66x1 double]
         cwres: [66x1 double]
The estimated covariance matrix PSI shows that the random effects on the first two parameters have a
relatively strong correlation, and both have a relatively weak correlation with the last random effect.
This structure in the covariance matrix is more apparent if you convert PSI to a correlation matrix using
corrcov:
RHO = corrcov(PSI)
RHO =
    1.0000    0.9316    0.4706
    0.9316    1.0000    0.1178
    0.4706    0.1178    1.0000
clf; imagesc(RHO)
set(gca,'XTick',[1 2 3],'YTick',[1 2 3])
title('{\bf Random Effect Correlation}')
h = colorbar;
set(get(h,'YLabel'),'String','Correlation');
Incorporate this
structure into the model by changing the specification of the covariance pattern to block-diagonal:
P = [1 1 0;1 1 0;0 0 1]   % Covariance pattern

P =
     1     1     0
     1     1     0
     0     0     1
[phi,PSI,stats,b] = nlmefit(time,concentration,subject, ...
[],nlme_model,phi0, ...
'REParamsSelect',[1 2 3], ...
'CovPattern',P)
phi =
2.7830
0.8981
0.6581
-1.0000
PSI =
0.5180 0.1069 0
0.1069 0.0221 0
0 0 0.0454
stats =
           dfe: 57
          logl: 58.0804
           mse: 0.0061
          rmse: 0.0768
    errorparam: 0.0782
           aic: -98.1608
           bic: -100.0350
          covb: [4x4 double]
        sebeta: [0.3171 0.1073 0.1384 0.1453]
          ires: [66x1 double]
          pres: [66x1 double]
         iwres: [66x1 double]
         pwres: [66x1 double]
         cwres: [66x1 double]
b =
-0.8507 -0.1563 1.0427 -0.7559 0.5652 0.1550
-0.1756 -0.0323 0.2152 -0.1560 0.1167 0.0320
-0.2756 0.0519 0.2620 0.1064 -0.2835 0.1389
The block-diagonal covariance structure reduces aic from -94.9462 to
-98.1608 and bic from -97.2369 to -100.0350 without significantly affecting the log-likelihood. These
measures support the covariance structure used in the final model.
The output b gives predictions of the three random effects for each of the six subjects. These are
combined with the estimates of the fixed effects in phi to produce the mixed-effects model.
Use the following commands to plot the mixed-effects model for each of the six subjects. For
comparison, the model without random effects is also shown.
PHI = repmat(phi,1,6) + ...                       % Fixed effects
      [b(1,:);b(2,:);b(3,:);zeros(1,6)];          % plus predicted random effects
RES = zeros(11,6);                                % Residuals
colors = 'rygcbm';
for I = 1:6
    fitted_model = @(t)(PHI(1,I)*exp(-exp(PHI(2,I))*t) + ...
                        PHI(3,I)*exp(-exp(PHI(4,I))*t));
    RES(:,I) = concentration(subject==I) - fitted_model(time(subject==I));
    plot(tplot,fitted_model(tplot),colors(I))
    hold on
end
plot(tplot,model(phi,tplot),'k')                  % model without random effects
hold off
If obvious outliers in the data (visible in previous box plots) are ignored, a normal probability plot of the
residuals shows reasonable agreement with model assumptions on the errors:
clf; normplot(RES(:))
The upper probability plots look straight, meaning the residuals are normally distributed. The bottom
histograms match the superimposed normal density curves. So you can conclude that the error model
matches the data.
For comparison, fit the data using a constant error model instead of the proportional model that
created the data.
The upper probability plots are not straight, indicating the residuals are not normally distributed. The
bottom histogram plots are fairly close to the superimposed normal density plots.
For another comparison, fit the data to a different structural model from the one that created the data.
Not only are the upper probability plots not straight, but the histogram plot is quite skewed compared to
the superimposed normal density. These residuals are not normally distributed, and do not match the
model.
But the nonlinear model can also be transformed to a linear one by taking the log of both sides, to get
log(y) = log(p1) + p2*x. That's tempting, because we can fit that linear model by ordinary linear least
squares. The coefficients we'd get from a linear least squares fit would be log(p1) and p2.
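This excerpt uses modelFun without showing its definition. From the identity log(y) = log(p1) + p2*x, the model must have the form y = p1*exp(p2*x), so a matching definition (a reconstruction, since the original line is not shown here) would be:
% Reconstructed from log(y) = log(p1) + p2*x  =>  y = p1*exp(p2*x)
modelFun = @(p,x) p(1)*exp(p(2)*x);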
paramEstsLin = [ones(size(x)), x] \ log(y);
paramEstsLin(1) = exp(paramEstsLin(1))
paramEstsLin =
11.9312
-0.4462
How did we do? We can superimpose the fit on the data to find out.
xx = linspace(min(x), max(x));
yyLin = modelFun(paramEstsLin, xx);
plot(x,y,'o', xx,yyLin,'-');
xlabel('x'); ylabel('y');
legend({'Raw data','Linear fit on the log scale'},'location','NE');
Something seems to have gone wrong, because the fit doesn't really follow the trend that we can see in
the raw data. What kind of fit would we get if we used nlinfit to do nonlinear least squares instead?
We'll use the previous fit as a rough starting point, even though it's not a great fit.
paramEsts = nlinfit(x, y, modelFun, paramEstsLin)
paramEsts =
8.8145
-0.2885
yy = modelFun(paramEsts,xx);
plot(x,y,'o', xx,yyLin,'-', xx,yy,'-');
xlabel('x'); ylabel('y');
legend({'Raw data','Linear fit on the log scale', ...
'Nonlinear fit on the original scale'},'location','NE');
The fit using nlinfit more or less passes through the center of the scatter of data points. A residual plot
shows an approximately even scatter about zero.
r = y-modelFun(paramEsts,x);
plot(x,r,'+', [min(x) max(x)],[0 0],'k:');
xlabel('x'); ylabel('residuals');
So what went wrong with the linear fit? The problem is in the log transform. If we plot the data and the
two fits on the log scale, we can see that there's an extreme outlier.
plot(x,log(y),'o', xx,log(yyLin),'-', xx,log(yy),'-');
xlabel('x'); ylabel('log(y)');
ylim([-5,3]);
legend({'Raw data', 'Linear fit on the log scale', ...
'Nonlinear fit on the original scale'},'location','SW');
That observation is not an outlier in the original data, so what happened to make it one on the log scale?
The log transform is exactly the right thing to straighten out the trend line. But the log is a very
nonlinear transform, and so symmetric measurement errors on the original scale have become
asymmetric on the log scale. Notice that the outlier had the smallest y value on the original scale -- close
to zero. The log transform has "stretched out" that smallest y value more than its neighbors. We made
the linear fit on the log scale, and so it is very much affected by that outlier.
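To see that stretching concretely, here is a small illustrative calculation (the numbers are made up, not taken from the example's data):
% Hypothetical values: a y near zero, perturbed up and down by equal errors.
y0 = 0.05;  e = 0.04;
up   = log(y0 + e) - log(y0)    % about 0.59
down = log(y0) - log(y0 - e)    % about 1.61: the downward error is stretched far more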
Had the measurement at that one point been slightly different, the two fits might have been much more
similar. For example,
y(11) = 1;
paramEsts = nlinfit(x, y, modelFun, [10;-.3])
paramEsts =
8.7618
-0.2833
paramEstsLin = [ones(size(x)), x] \ log(y);
paramEstsLin(1) = exp(paramEstsLin(1))
paramEstsLin =
9.6357
-0.3394
yy = modelFun(paramEsts,xx);
yyLin = modelFun(paramEstsLin, xx);
plot(x,y,'o', xx,yyLin,'-', xx,yy,'-');
xlabel('x'); ylabel('y');
legend({'Raw data', 'Linear fit on the log scale', ...
'Nonlinear fit on the original scale'},'location','NE');
Still, the two fits are different. Which one is "right"? To answer that, suppose that instead of additive
measurement errors, measurements of y were affected by multiplicative errors. These errors would not
be symmetric, and least squares on the original scale would not be appropriate. On the other hand, the
log transform would make the errors symmetric on the log scale, and the linear least squares fit on that
scale is appropriate.
So, which method is "right" depends on what assumptions you are willing to make about your data. In
practice, when the noise term is small relative to the trend, the log transform is "locally linear" in the
sense that y values near the same x value will not be stretched out too asymmetrically. In that case, the
two methods lead to essentially the same fit. But when the noise term is not small, you should consider
what assumptions are realistic, and choose an appropriate fitting method.
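As a closing sketch (added for illustration; the generating parameters and noise level below are assumptions, not from the example), you can test this advice by simulating multiplicative errors and comparing the two fitting approaches:
% Hedged sketch: simulate y = p1*exp(p2*x) under multiplicative lognormal
% errors, then compare the log-scale linear fit with nlinfit on the raw scale.
rng default                                  % for reproducibility
pTrue = [10; -0.3];                          % assumed true parameters
x = linspace(0.5,8,30)';
modelFun = @(p,x) p(1)*exp(p(2)*x);
y = modelFun(pTrue,x) .* exp(0.3*randn(size(x)));   % multiplicative noise

% Linear least squares on the log scale (matches this error model)
pLin = [ones(size(x)), x] \ log(y);
pLin(1) = exp(pLin(1));

% Nonlinear least squares on the original scale
pNlin = nlinfit(x,y,modelFun,pLin);

% With multiplicative noise, the log-scale fit tends to recover pTrue better
[pTrue(:), pLin(:), pNlin(:)]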