Parameter Estimation in Time Series Models

I The general model we have considered is the ARIMA(p, d, q) model.
I Now assume we have chosen appropriate values of p, d, and q
(possibly based on evidence from the ACF, PACF, and/or
EACF plots).
I Assume that our observed time series data Y1, . . . , Yn follow a
stationary ARMA(p, q) model.
I In the case of nonstationary original data, we can assume that
taking d differences has produced differenced data that displays
stationarity.
I Our objective is to estimate the unknown parameters in that
stationary ARMA(p, q) model.
Method of Moments Estimation
I One of the easiest methods of parameter estimation is the
method of moments (MOM).
I The basic idea is to find expressions for the sample moments
and for the population moments and equate them:
(1/n) Σ_{i=1}^n Xi^r = E(X^r)

I The E(X^r) expression will be a function of one or more unknown
parameters.
I If there are, say, 2 unknown parameters, we would set up
MOM equations for r = 1, 2, and solve these 2 equations
simultaneously for the two unknown parameters.
I In the simplest case, if there is only 1 unknown parameter to
estimate, then we equate the sample mean to the true mean
of the process and solve for the unknown parameter.
MOM with AR models

I First, we consider autoregressive models.


I In the simplest case, the AR(1) model, given by
Yt = φYt−1 + et , the true lag-1 autocorrelation ρ1 = φ.
I For this type of model, a method-of-moments estimator would
simply equate the true lag-1 autocorrelation to the sample
lag-1 autocorrelation r1 .
I So our MOM estimator of the unknown parameter φ would be
φ̂ = r1 .
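
A minimal sketch in R (the simulated series ar1.s and the parameter value
0.6 are illustrative assumptions, not from the lecture):

# MOM estimate of phi in an AR(1): equate the sample lag-1 autocorrelation to phi
set.seed(123)
ar1.s <- arima.sim(model = list(ar = 0.6), n = 200)   # simulated AR(1) series
r1 <- acf(ar1.s, plot = FALSE)$acf[2]                 # acf[1] is lag 0, acf[2] is lag 1
phi.hat <- r1
phi.hat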
MOM with an AR(2) model

I In the AR(2) model, we have unknown parameters φ1 and φ2 .


I From the Yule-Walker equations,

ρ1 = φ1 + ρ1 φ2 and ρ2 = ρ1 φ1 + φ2

I In the method of moments, we will replace the true lag-1 and lag-2
autocorrelations, ρ1 and ρ2, by the sample autocorrelations r1 and r2,
respectively.
MOM with an AR(2) model, continued

I That gives the equations

r1 = φ1 + r1 φ2 and r2 = r1 φ1 + φ2

which are then solved for φ1 and φ2 to obtain

φ̂1 = r1(1 − r2)/(1 − r1²)   and   φ̂2 = (r2 − r1²)/(1 − r1²)

I The general AR(p) model is estimated in a similar way, with the
Yule-Walker equations being used to obtain the Yule-Walker estimates
φ̂1 , φ̂2 , . . . , φ̂p .

# Yule-Walker (MOM) estimation of an AR(p) model in R, for a series x
# and a chosen order p:
ar(x, order.max = p, aic = FALSE, method = 'yw')
MOM with MA Models

I We run into problems when trying to use the method of moments to
estimate the parameters of moving average models.
I Consider the simple MA(1) model, Yt = et − θet−1 .
I The true lag-1 autocorrelation in this model is ρ1 = −θ/(1 + θ²).
I If we equate ρ1 to r1 , we get a quadratic equation in θ.
I If |r1 | < 0.5, then only one of the two real solutions satisfies
the invertibility condition |θ| < 1.
I That solution is θ̂ = (−1 + √(1 − 4r1²)) / (2r1).
I But if |r1 | = 0.5, no invertible solution exists, and if |r1 | > 0.5,
then no real solution at all exists, and the method of moments
fails to give any estimator of θ.
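
A hedged sketch of this estimator as a small R function (the function name
mom.ma1 is illustrative):

# MOM estimator for MA(1): the invertible root of the quadratic in theta
mom.ma1 <- function(r1) {
  if (abs(r1) >= 0.5) stop("no invertible MOM solution when |r1| >= 0.5")
  (-1 + sqrt(1 - 4 * r1^2)) / (2 * r1)
}
mom.ma1(-0.4)   # returns 0.5, since rho1 = -0.5/(1 + 0.5^2) = -0.4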
More MOM Problems with MA Models

I With higher-order MA(q) models, the set of equations for estimating
θ1 , . . . , θq is highly nonlinear and could only be solved numerically.
I There would be many solutions, only one of which is invertible.
I In any case, for MA(q) models, the method of moments
usually produces poor estimates, so it is not recommended to
use MOM to estimate MA models.
MOM Estimation of Mixed ARMA Models

I Consider only the simplest mixed model, the ARMA(1, 1) model.
I Since ρ2 /ρ1 = φ, a MOM estimator of φ is φ̂ = r2 /r1 .
I Then the equation

r1 = (1 − θφ̂)(φ̂ − θ) / (1 − 2θφ̂ + θ²)
can be used to solve for an estimate of θ.
I This is a quadratic equation in θ, and so we again keep only the
invertible solution (if any exists) as our θ̂.
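
A hedged sketch of these two steps in R (the helper function and the use of
polyroot() are my own illustration; the quadratic coefficients come from
rearranging the r1 equation above):

# MOM estimation for ARMA(1,1): phi.hat = r2/r1, then the invertible root
# of the quadratic in theta implied by the r1 equation:
# (r1 - phi)*theta^2 + (1 + phi^2 - 2*r1*phi)*theta + (r1 - phi) = 0
mom.arma11 <- function(r1, r2) {
  phi   <- r2 / r1
  roots <- polyroot(c(r1 - phi, 1 + phi^2 - 2 * r1 * phi, r1 - phi))
  roots <- Re(roots[abs(Im(roots)) < 1e-8])       # keep real roots only
  theta <- roots[abs(roots) < 1]                  # keep invertible root, if any
  c(phi = phi, theta = if (length(theta)) theta[1] else NA)
}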
MOM Estimation of the Noise Variance

I We still must estimate the variance σe² of our error component.
I For any model, we first estimate the variance of the time
series process itself, γ0 = var (Yt ), by the sample variance
s² = (1/(n − 1)) Σ_{t=1}^n (Yt − Ȳ)²

I Then we can take advantage of known relationships among the parameters
in our specified model to obtain a formula for σ̂e².
Formulas for MOM Noise Variance Estimators in Common
Models

I For AR(p) models, σ̂e² = (1 − φ̂1 r1 − φ̂2 r2 − · · · − φ̂p rp ) s².
I For the AR(1) model, this reduces to σ̂e² = (1 − r1²) s².
I For MA(q) models,

σ̂e² = s² / (1 + θ̂1² + θ̂2² + · · · + θ̂q²).

I For ARMA(1, 1) models,

σ̂e² = s² (1 − φ̂²) / (1 − 2φ̂θ̂ + θ̂²).
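
A minimal sketch for the AR(1) case, reusing the simulated series ar1.s from
the earlier sketch (names are illustrative):

# MOM estimate of the noise variance in an AR(1) model
s2 <- var(ar1.s)                               # sample variance of the series
r1 <- acf(ar1.s, plot = FALSE)$acf[2]          # lag-1 sample autocorrelation
sigma2.hat <- (1 - r1^2) * s2                  # MOM estimate of sigma_e^2
sigma2.hat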
MOM Estimation in Some Simulated Time Series

I The course web page has R code to estimate the parameters in several
simulated AR, MA, and ARMA models.
I The estimates of the AR parameters are good, but the
estimates of the MA parameters are poor.
I In general, MOM estimators for models with MA terms are
inefficient.
Least Squares Estimation

I Since method-of-moments estimation performs poorly for some models, we
examine another method of parameter estimation: Least Squares.
I We first consider autoregressive models.
I We assume our time series is stationary (or that the time
series has been transformed so that the transformed data can
be modeled as stationary).
I To account for the possibility that the mean is nonzero, we
subtract µ from each observation and treat µ as a parameter
to be estimated.
LS Estimation for the AR(1) Model

I Consider the mean-centered AR(1) model:

Yt − µ = φ(Yt−1 − µ) + et

I The least squares method seeks the parameter values that minimize the
sum of squared differences:

Sc(φ, µ) = Σ_{t=2}^n [(Yt − µ) − φ(Yt−1 − µ)]²

I This criterion is called the conditional sum-of-squares (CSS) function.
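
A hedged numerical sketch: the same Sc(φ, µ) can be minimized directly with
optim() (the simulated series, its mean of 10, and the starting values are
illustrative assumptions, not from the lecture):

# Direct numerical minimization of Sc(phi, mu) for an AR(1) model
css.ar1 <- function(par, y) {
  phi <- par[1]; mu <- par[2]
  sum(((y[-1] - mu) - phi * (head(y, -1) - mu))^2)   # sum over t = 2, ..., n
}
y <- arima.sim(model = list(ar = 0.6), n = 200) + 10  # simulated AR(1), mean 10
optim(c(0, mean(y)), css.ar1, y = y)$par              # returns (phi.hat, mu.hat)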
LS Estimation of µ for the AR(1) Model

I Taking the derivative of CSS with respect to µ, setting equal to 0 and
solving for µ, we obtain the LS estimator of µ:

µ̂ = [ Σ_{t=2}^n Yt − φ Σ_{t=2}^n Yt−1 ] / [ (n − 1)(1 − φ) ]

I For large n, this µ̂ ≈ Ȳ , regardless of the value of φ.


LS Estimation of φ for the AR(1) Model

I Taking the derivative of CSS with respect to φ, setting equal to 0 and
solving for φ, we obtain the LS estimator of φ:

φ̂ = Σ_{t=2}^n (Yt − Ȳ)(Yt−1 − Ȳ) / Σ_{t=2}^n (Yt−1 − Ȳ)²

I This estimator is almost identical to r1.


I So, especially for large n, the LS and MOM estimators are
nearly identical in the AR(1) model.
I In the general AR(p) model, the LS estimators of µ and of
φ1, . . . , φp are approximately equal to the MOM estimators,
especially for large samples.

# Least-squares estimation of the simulated MA(1) series:
arima(ma1.4.s, order = c(0, 0, 1), method = 'CSS', include.mean = F)
LS Estimation for Moving Average Models

I Consider now the MA(1) model:

Yt = et − θet−1

I Recall that this can be written as

Yt = −θYt−1 − θ²Yt−2 − θ³Yt−3 − · · · + et .

I So a least squares estimator of θ can be obtained by finding the value
of θ that minimizes

Sc(θ) = Σ [Yt + θYt−1 + θ²Yt−2 + θ³Yt−3 + · · · ]²

I But this is nonlinear in θ, and the infinite series causes technical
problems.
LS Estimation for Moving Average Models
I Instead, we proceed by conditioning on one previous value of
et . Note that
et = Yt + θet−1
I If we set e0 = 0, then we have the set of recursive equations
e1 = Y1 , e2 = Y2 + θe1 , . . . , en = Yn + θen−1 .
I Since we know Y1 , Y2 , . . . , Yn (these are the observed data
values) and can calculate the e1 , e2 , . . . , en recursively, the
only unknown quantity here is θ.
I We can do a numerical search for the value of θ (within the invertible
range between −1 and 1) that minimizes Σ et², conditional on e0 = 0
(a sketch is given after this list).
I A similar approach works for higher-order MA(q) models,
except that we assume e0 = e−1 = · · · = e−q = 0 and the
numerical search is multidimensional, since we are estimating
θ1 , . . . , θ q .
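
A hedged sketch of this one-dimensional search for the MA(1) case (the
simulated series is an illustrative assumption; note the sign-convention
remark in the comments):

# Conditional LS for MA(1): compute e_t recursively (with e_0 = 0) and
# search for the theta in (-1, 1) minimizing the sum of squared e_t.
css.ma1 <- function(theta, y) {
  e <- numeric(length(y))
  e[1] <- y[1]
  for (t in 2:length(y)) e[t] <- y[t] + theta * e[t - 1]
  sum(e^2)
}
# R's arima.sim uses Y_t = e_t + theta_R e_{t-1}, so ma = -0.7 corresponds
# to theta = 0.7 in the Y_t = e_t - theta e_{t-1} parameterization used here.
y <- arima.sim(model = list(ma = -0.7), n = 200)
optimize(css.ma1, interval = c(-1, 1), y = y)$minimum   # theta.hat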
LS Estimation for ARMA Models

I With the ARMA(1, 1) model:

Yt = φYt−1 + et − θet−1 ,

we note that
et = Yt − φYt−1 + θet−1
and minimize Sc(φ, θ) = Σ_{t=2}^n et²; note that the sum starts at
t = 2 to avoid having to choose an “initial” value Y0 .
I With the general ARMA(p, q) model, the procedure is similar,
except that we assume ep = ep−1 = · · · = ep+1−q = 0, and we
estimate φ1 , . . . , φp , θ1 , . . . , θq .
I For large samples, when the parameter sets yield invertible
models, the initial values for ep , ep−1 , . . . , ep+1−q have little
effect on the final parameter estimates.
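
In R, for instance, a conditional least squares fit of an ARMA(1, 1) can be
requested through the method argument of arima() (the series name y is a
placeholder):

# Conditional least-squares (CSS) fit of an ARMA(1,1) to a series y:
arima(y, order = c(1, 0, 1), method = 'CSS')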
Maximum Likelihood Estimation

I On the other hand, for small to moderate sample sizes (and for
stochastic seasonal models), assuming ep = ep−1 = · · · = ep+1−q = 0 can
greatly affect the final parameter estimates, which is undesirable.
I In those cases, rather than using least squares, it may be
advantageous to use maximum likelihood (ML) estimation.
I An advantage of ML estimation is that it uses all of the
information in the data (not just the first few moments as in
MOM).
I Also, many large-sample results are known about the sampling
distribution of ML estimators.
I A disadvantage of ML estimation is that we must assume the
form of the joint probability distribution of the time series
process.
Maximum Likelihood in Time Series Models

I The likelihood function is the joint density function of the data, but
treated as a function of the unknown parameters, given the observed
data Y1 , . . . , Yn .
I For the models we have studied, the likelihood L is a function of the
φ’s, θ’s, µ, and σe², given the observed Y1 , . . . , Yn .
I The maximum likelihood estimates (MLEs) are the values of
the parameters that maximize this likelihood function, i.e.,
that are the “most likely” parameter values given the data we
actually observed.
Maximum Likelihood in the AR(1) Model

I In the AR(1) model with an unknown but constant mean, the parameters
we must estimate are φ, µ, and σe².
I To perform ML estimation in the AR(1) model, we must
assume a distribution for our data.
I The typical assumption is that the {et } in the AR(1) model are iid
N(0, σe²) random variables.
I Under this assumption, the likelihood function (details are given on
page 159) is:

L(φ, µ, σe²) = (2π σe²)^(−n/2) (1 − φ²)^(1/2) exp[ −S(φ, µ) / (2σe²) ]

where

S(φ, µ) = Σ_{t=2}^n [(Yt − µ) − φ(Yt−1 − µ)]² + (1 − φ²)(Y1 − µ)².
MLE’s in the AR(1) Model

I This S(φ, µ) is called the unconditional sum-of-squares function.
I We must find estimates φ̂, µ̂, and σ̂e2 that maximize the
likelihood function (in practice, we typically maximize the
log-likelihood function, which produces equivalent estimates).
I The estimator of the noise variance σe², in terms of the other
estimates, is

σ̂e² = S(φ̂, µ̂) / n.
I Note that dividing by n − 2 rather than n produces a less
biased estimator, but for large sample sizes, this makes little
practical difference.
MLE’s in the AR(1) Model

I We still need to estimate φ and µ.


I Comparing the unconditional sum-of-squares function to the
conditional sum-of-squares function we saw earlier, note that
S(φ, µ) = Sc(φ, µ) + (1 − φ²)(Y1 − µ)², so for large sample
sizes, S(φ, µ) ≈ Sc (φ, µ).
I This implies that our ML estimates of φ and µ will be very
similar to the LS estimates, at least for large sample sizes.
I The likelihood function for general ARMA models is more
complicated, but ML estimates can usually be found in these
models.
I In practice, for AR models, MA models, or general ARMA or
ARIMA models, we can often find either the LS estimates or
the ML estimates easily using R.
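
For instance, a minimal sketch (the simulated AR(1) series is an illustrative
assumption):

# ML and conditional least-squares fits of an AR(1) model in R:
ar1.s <- arima.sim(model = list(ar = 0.6), n = 200)
arima(ar1.s, order = c(1, 0, 0), method = 'ML')    # maximum likelihood
arima(ar1.s, order = c(1, 0, 0), method = 'CSS')   # conditional least squares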
Properties of the Estimators

I Recall that LS estimators and ML estimators become approximately equal
for large samples.
I So the large-sample properties of LS estimators and ML
estimators are identical for basic ARMA-type models.
I For large n, these estimators are approximately unbiased and
normally distributed.
I Note: For AR models, MOM estimators have the same large-sample
properties as LS and ML estimators.
I But for models with MA terms, MOM estimators have poor
performance and should not be used!
I For some common models, variance and correlation results for
the estimators are given on page 161.
Properties of the Estimators in AR(1) and MA(1) models

I For example, for the AR(1) model, var(φ̂) ≈ (1 − φ²)/n, and for the
MA(1) model, var(θ̂) ≈ (1 − θ²)/n.
I Clearly, the variance of the estimator decreases (i.e., the
precision improves) as n increases.
I For the AR(1) model, the variance of the estimator φ̂ will be low when
the true φ is near ±1.
I For the MA(1) model, the variance of the estimator θ̂ will be low when
the true θ is near ±1.
Large-sample Inference about the Model Parameters
I When the model parameters are estimated by the ML method,
then the ML estimators are approximately normally distributed
when n is large.
I So we can use normal-based inference to get, say, confidence
intervals for the true values of the parameters.
I For example, it may be of interest to know whether 0 is a
plausible value of some parameter.
I For large samples, a (1 − α)100% CI for a parameter takes the
form:
estimate ± (zα/2 )(estimated standard error)
I For example, in an AR(1) model, a 95% CI for φ is:
φ̂ ± 1.96 √((1 − φ̂²)/n)
I For example, in an MA(1) model, a 90% CI for θ is:
θ̂ ± 1.645 √((1 − θ̂²)/n)
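
A hedged sketch of computing such an interval from an arima() fit in R (the
simulated series is an illustrative assumption; the standard error is taken
from the fit's var.coef component):

# Large-sample 95% CI for phi in a fitted AR(1) model:
fit <- arima(arima.sim(model = list(ar = 0.6), n = 200),
             order = c(1, 0, 0), method = 'ML')
phi.hat <- fit$coef["ar1"]
se.phi  <- sqrt(diag(fit$var.coef))["ar1"]
phi.hat + c(-1, 1) * 1.96 * se.phi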
Small-sample Inference about the Model Parameters

I The ML estimators are not necessarily approximately normally
distributed when n is small.
I So when n is small, we can use a more general approach,
bootstrap-based inference, to get confidence intervals for the
true values of the parameters.
I Section 7.6 gives details about bootstrap intervals.
I Some R examples give code for calculating 95% bootstrap CIs
for ARIMA-type model parameters using four different
methods; note that Method IV makes the fewest assumptions
about the error distribution.
I The bootstrap method also makes it possible to construct CIs
about relevant functions of the model parameters.
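
As one illustration only, a minimal parametric bootstrap sketch for φ in an
AR(1) model; this generic approach is not necessarily any of the textbook's
four methods, and all names and values are illustrative:

# Parametric bootstrap percentile CI for phi in an AR(1) model:
set.seed(1)
y   <- arima.sim(model = list(ar = 0.6), n = 50)
fit <- arima(y, order = c(1, 0, 0), method = 'ML', include.mean = FALSE)
boot.phi <- replicate(999, {
  y.star <- arima.sim(model = list(ar = fit$coef["ar1"]), n = length(y),
                      sd = sqrt(fit$sigma2))
  arima(y.star, order = c(1, 0, 0), method = 'ML', include.mean = FALSE)$coef["ar1"]
})
quantile(boot.phi, c(0.025, 0.975))   # 95% percentile bootstrap CI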
