
Statistics 3B (STA03B3)

STOCHASTIC PROCESSES

Lecture 18

Chapter 3 → ARIMA Models (Part 5)

Dr V. van Appel
Department of Statistics
Faculty of Science, University of Johannesburg

Outline

Building ARIMA Models


Example: Analysis of GNP Data

Building ARIMA Models

There are a few basic steps to fitting ARIMA models to time series
data. These steps involve
▶ plotting the data,
▶ possibly transforming the data,
▶ identifying the dependence orders of the model,
▶ parameter estimation,
▶ diagnostics, and
▶ model choice.
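In R, with the astsa package used later in this lecture, this workflow can be sketched
roughly as follows. This is a schematic outline only (the gnp series shipped with astsa
and the orders 1, 0, 0 are placeholders, not a recommendation); the final model-choice
step then compares competing candidates via AIC, AICc, or BIC.

> library(astsa)        # provides acf2(), sarima(), and a quarterly 'gnp' series
> x <- diff(log(gnp))   # plot and, if necessary, transform (here: log, then difference)
> plot(x)               # time plot of the (transformed) series
> acf2(x)               # sample ACF and PACF suggest preliminary values of p, d, q
> sarima(x, 1, 0, 0)    # estimation and residual diagnostics for one candidate model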

▶ First, as with any data analysis, we should construct a time
plot of the data, and inspect the graph for any anomalies.
▶ If, for example, the variability in the data grows with time, it
will be necessary to transform the data to stabilize the
variance.
▶ In such cases, the Box-Cox class of power transformations
could be employed.
▶ Also, the particular application might suggest an appropriate
transformation.
▶ For example, the transformation

  ∇[log(xt)] = log(xt) − log(xt−1)

  is frequently used; it is called the return or growth rate, and it
  will be a relatively stable process (see the short R sketch after
  this list).
▶ After suitably transforming the data, the next step is to
identify preliminary values of the autoregressive order, p, the
order of differencing, d, and the moving average order, q.
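As a small illustration of the transformations mentioned above (a sketch, not part of
the lecture code; it assumes the quarterly gnp series shipped with the astsa package,
and the Box-Cox line assumes the separate forecast package):

> library(astsa)
> plot(diff(log(gnp)))   # growth rate of GNP: its variability is far more stable than gnp itself
> # A Box-Cox power transformation is the more general option, e.g. with the
> # 'forecast' package: lambda <- forecast::BoxCox.lambda(gnp); forecast::BoxCox(gnp, lambda)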
Model identification
▶ A time plot of the data will typically suggest whether any
differencing is needed.
▶ If differencing is called for, then difference the data once,
d = 1, and inspect the time plot of ∇xt .
▶ If additional differencing is necessary, then try differencing
again and inspect a time plot of ∇2 xt .

▶ Be careful not to overdifference because this may introduce
dependence where none exists.
▶ For example, xt = wt is serially uncorrelated, but
∇xt = wt − wt−1 is MA(1).
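A quick simulation (a sketch, not from the notes) makes the point visible: differencing
white noise manufactures an MA(1) with θ = −1, whose lag-one autocorrelation is
θ/(1 + θ²) = −0.5.

> set.seed(1)
> w <- rnorm(500)            # white noise: no serial correlation
> acf(w, main = NA)          # sample ACF negligible at all nonzero lags
> acf(diff(w), main = NA)    # overdifferenced series: a spike of about -0.5 at lag 1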

▶ In addition to time plots, the sample ACF can help in
indicating whether differencing is needed.
▶ Because the polynomial ϕ(z)(1 − z)^d has a unit root, the
sample ACF, ρ̂(h), will not decay to zero quickly as h increases.
▶ Thus, a slow decay in ρ̂(h) is an indication that differencing
may be needed.
▶ When preliminary values of d have been settled, the next step
is to look at the sample ACF and PACF of ∇d xt for whatever
values of d have been chosen.

▶ Using the following Table as a guide, preliminary values of p
and q are chosen.
▶ Note that it cannot be the case that both the ACF and PACF
cut off.
▶ Because we are dealing with estimates, it will not always be
clear whether the sample ACF or PACF is tailing off or cutting
off.
▶ Also, two models that are seemingly different can actually be
very similar.
▶ With this in mind, we should not worry about being so precise
at this stage of the model fitting.
▶ At this point, a few preliminary values of p, d, and q should
be at hand, and we can start estimating the parameters.

Table: Behavior of the ACF and PACF for ARMA Models

           AR(p)                   MA(q)                   ARMA(p, q)
  ACF      Tails off               Cuts off after lag q    Tails off
  PACF     Cuts off after lag p    Tails off               Tails off
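The patterns in the table can be verified numerically with the base R function
ARMAacf(), which returns theoretical ACF (or PACF, with pacf = TRUE) values for a
specified ARMA model; the coefficients below are illustrative only:

> ARMAacf(ma = c(0.5, 0.3), lag.max = 6)                # MA(2): ACF cuts off after lag 2
> ARMAacf(ma = c(0.5, 0.3), lag.max = 6, pacf = TRUE)   # ...while its PACF tails off
> ARMAacf(ar = 0.7, lag.max = 6, pacf = TRUE)           # AR(1): PACF cuts off after lag 1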

Example: Analysis of GNP Data
▶ In this example, we consider the analysis of quarterly U.S.
GNP from 1947(1) to 2002(3), n = 223 observations.
▶ The data are real U.S. gross national product in billions of
chained 1996 dollars and have been seasonally adjusted.
▶ The data were obtained from the Federal Reserve Bank of St.
Louis (http://research.stlouisfed.org/).
▶ The following Figure shows a plot of the data, say, yt .

> library(astsa)
> gnp1 <- Data3$GNP[1:223]                    # course data object (astsa also ships a 'gnp' series)
> gnp  <- ts(gnp1, start = c(1947, 1), frequency = 4)   # quarterly series, 1947(1)-2002(3)
> par(mfrow = c(2, 1))
> plot(gnp, ylab = "Billions of Dollars", type = "l", xlab = "Time")
> acf(gnp, lag.max = 48, main = NA)           # sample ACF out to 48 quarters (12 years)

Figure: Top: Quarterly U.S. GNP from 1947(1) to 2002(3). Bottom: Sample ACF of the GNP data. Lag is in terms of years.
▶ Because the strong trend tends to obscure other effects, it is
difficult to see any other variability in the data except for
periodic large dips in the economy.
▶ When reports of GNP and similar economic indicators are given,
it is often the growth rate (percent change), rather than the
actual (or adjusted) value, that is of interest.
▶ The growth rate, say, xt = ∇ log(yt ) is plotted in the
following Figure, and it appears to be a stable process.

> gnpgr <- diff(log(gnp))   # growth rate; the result is already a quarterly ts (starting 1947 Q2)
> plot(gnpgr, ylab = "GNP Growth Rate", type = "l", xlab = "Time")

Figure: U.S. GNP quarterly growth rate. The horizontal line displays the average growth of the process, which is close to 1%.
▶ The sample ACF and PACF of the quarterly growth rate are
plotted in the following Figure.
▶ Inspecting the sample ACF and PACF, we might feel that the
ACF is cutting off at lag 2 and the PACF is tailing off.
▶ This would suggest the GNP growth rate follows an MA(2)
process, or log GNP follows an ARIMA(0, 1, 2) model.
▶ Rather than focus on one model, we will also suggest that it
appears that the ACF is tailing off and the PACF is cutting off
at lag 1.
▶ This suggests an AR(1) model for the growth rate, or
ARIMA(1, 1, 0) for log GNP.
▶ As a preliminary analysis, we will fit both models.
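The ACF/PACF pair just described can be produced, for example, with astsa's acf2(),
which draws the sample ACF and PACF of a series in a single figure (one convenient
option; base acf() and pacf() work equally well):

> acf2(gnpgr, max.lag = 24)   # sample ACF and PACF of the growth rate (6 years of lags)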

Figure: Sample ACF and PACF of the GNP quarterly growth rate. Lag is in terms of years.
Using MLE to fit the MA(2) model for the growth rate, xt , the
estimated model is

x̂t = 0.008(0.001) + 0.303(0.065) ŵt−1 + 0.204(0.064) ŵt−2 + ŵt (1)

where σ̂w = 0.0094 is based on 219 degrees of freedom.


▶ The values in parentheses are the corresponding estimated
standard errors.
▶ All of the regression coefficients are significant, including the
constant.
▶ We make a special note of this because, as a default, some
computer packages do not fit a constant in a differenced
model.
▶ That is, these packages assume, by default, that there is no
drift.

▶ In this example, not including a constant leads to the wrong
conclusions about the nature of the U.S. economy.
▶ Not including a constant assumes the average quarterly growth
rate is zero, whereas the U.S. GNP average quarterly growth rate
is about 1% (as can easily be seen in the Figure above).
The estimated AR(1) model is

x̂t = 0.008(0.001) (1 − 0.347) + 0.347(0.063) x̂t−1 + ŵt (2)

where σ̂w = 0.0095 is based on 220 degrees of freedom; note that
the constant in (2) is 0.008(1 − 0.347) = 0.005.
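Both fits can be reproduced with astsa's sarima(), which prints the coefficient table
(estimates and standard errors) together with the information criteria used later for
model choice; a sketch:

> ma2 <- sarima(gnpgr, 0, 0, 2)   # MA(2) for the growth rate, i.e. ARIMA(0,1,2) for log GNP
> ar1 <- sarima(gnpgr, 1, 0, 0)   # AR(1) for the growth rate, i.e. ARIMA(1,1,0) for log GNP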

Model diagnostics
▶ We will discuss diagnostics next, but assuming both of these
models fit well, how are we to reconcile the apparent
differences of the estimated models (1) and (2)?
▶ In fact, the fitted models are nearly the same.
▶ To show this, consider an AR(1) model of the form in (2)
without a constant term; that is,

xt = 0.35xt−1 + wt ,

and write it in its causal form,

  xt = ∑_{j=0}^{∞} ψj wt−j ,

where we recall that ψj = 0.35^j.

Thus,

  ψ0 = 1,      ψ1 = 0.350,  ψ2 = 0.123,  ψ3 = 0.043,
  ψ4 = 0.015,  ψ5 = 0.005,  ψ6 = 0.002,  ψ7 = 0.001,
  ψ8 = ψ9 = ψ10 = 0.000,

and so forth. Thus

xt ≈ 0.35wt−1 + 0.12wt−2 + wt ,

which is similar to the fitted MA(2) model in (1).
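The ψ-weights listed above can be computed directly with the base R function
ARMAtoMA():

> ARMAtoMA(ar = 0.35, ma = 0, lag.max = 10)   # psi_1, ..., psi_10 for an AR(1) with phi = 0.35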

Residual analysis
The next step in model fitting is diagnostics.
▶ This investigation includes the analysis of the residuals as well
as model comparisons.
▶ Again, the first step involves a time plot of the innovations (or
residuals) or of the standardized innovations.
▶ If the model fits well, the standardized residuals should behave
as an iid sequence with mean zero and variance one.
▶ The time plot should be inspected for any obvious departures
from this assumption.
▶ Investigation of marginal normality can be accomplished
visually by looking at a histogram of the residuals.
▶ In addition to this, a normal probability plot or a Q-Q plot can
help in identifying departures from normality.
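A hand-rolled version of these residual checks might look as follows (a sketch using
base R; astsa's sarima() produces equivalent diagnostics automatically, as noted later):

> fit  <- arima(gnpgr, order = c(0, 0, 2))      # MA(2) fit to the growth rate
> sres <- residuals(fit) / sqrt(fit$sigma2)     # standardized residuals
> par(mfrow = c(2, 2))
> plot(sres, ylab = "Standardized residuals")   # time plot: look for departures from iid behavior
> hist(sres, main = NA)                         # rough check of marginal normality
> qqnorm(sres); qqline(sres)                    # normal Q-Q plot
> acf(sres, main = NA)                          # residual ACF with approximate error bounds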

Residual analysis
▶ We could also inspect the sample autocorrelations of the
residuals, say, ρ̂e (h), for any patterns or large values.
▶ Recall that, for a white noise sequence, the sample
autocorrelations are approximately independently and normally
distributed with zero means and variances 1/n.
▶ Hence, a good check on the correlation structure of the
residuals is to plot ρ̂e(h) versus h along with the error bounds
of ±2/√n.
▶ The residuals from a model fit, however, will not quite have
the properties of a white noise sequence and the variance of
ρ̂e (h) can be much less than 1/n.

Figure: Diagnostics of the residuals from the MA(2) fit to the GNP growth rate.
The Ljung-Box-Pierce Q-statistic
▶ In addition to plotting ρ̂e (h), we can perform a general test
that takes into consideration the magnitudes of ρ̂e (h) as a
group.
▶ For example, it may be the case that, individually, each ρ̂e(h)
is small in magnitude, say, each one is just slightly less than
2/√n in magnitude, but, collectively, the values are large.
▶ The Ljung-Box-Pierce Q-statistic given by

     Q = n(n + 2) ∑_{h=1}^{H} ρ̂²e(h) / (n − h)                (3)

can be used to perform such a test.


▶ The value H in (3) is chosen somewhat arbitrarily, typically,
H = 20.
▶ Under the null hypothesis of model adequacy, asymptotically
(n → ∞), Q ∼ χ²_{H−p−q}.
The Ljung-Box-Pierce Q-statistic
▶ Thus, we would reject the null hypothesis at level α if the
value of Q exceeds the (1 − α)-quantile of the χ²_{H−p−q}
distribution.
▶ The basic idea is that if wt is white noise, then n ρ̂²w(h), for
h = 1, 2, . . . , H, are asymptotically independent χ²_1 random
variables.
▶ This means that n ∑_{h=1}^{H} ρ̂²w(h) is approximately a χ²_H
random variable.
▶ Because the test involves the ACF of residuals from a model
fit, there is a loss of p + q degrees of freedom; the other
values in (3) are used to adjust the statistic to better match
the asymptotic chi-squared distribution.
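In R, this test is available as Box.test() in the base stats package; using the arima()
fit from the residual-analysis sketch above (p + q = 2 for the MA(2) model):

> Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 2)   # H = 20, df = 20 - 2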

Example: Diagnostics for the GNP Growth Rate


▶ We will focus on the MA(2) fit; the analysis of the AR(1)
residuals is similar.
▶ The above Figure displays a plot of the standardized residuals,
the ACF of the residuals, a normal Q-Q plot of the standardized
residuals, and the p-values associated with the Q-statistic, (3),
at lags H = 3 through H = 20 (with corresponding degrees of
freedom H − 2).

▶ Inspection of the time plot of the standardized residuals in the
above Figure shows no obvious patterns.
▶ Notice that there may be outliers, with a few values exceeding
3 standard deviations in magnitude.
▶ The ACF of the standardized residuals shows no apparent
departure from the model assumptions, and the Q-statistic is
never significant at the lags shown.
▶ The normal Q-Q plot of the residuals shows that the
assumption of normality is reasonable, with the exception of
the possible outliers.
▶ The model appears to fit well.
▶ The diagnostics shown in the above Figure are a byproduct of
the sarima command from the previous example.

Example: Model Choice for the U.S. GNP Series



▶ Recall that two models, an AR(1) and an MA(2), fit the GNP
growth rate well.
▶ To choose the final model, we compare the AIC, the AICc,
and the BIC for both models.
▶ These values are a byproduct of the sarima runs displayed at
the end of the above Example.
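If the sarima output is not at hand, the same comparison can be sketched with base R's
arima(), AIC(), and BIC(); AICc is not built in, but equals AIC + 2k(k + 1)/(n − k − 1),
with k the number of estimated parameters:

> fit_ar1 <- arima(gnpgr, order = c(1, 0, 0))
> fit_ma2 <- arima(gnpgr, order = c(0, 0, 2))
> c(AR1 = AIC(fit_ar1), MA2 = AIC(fit_ma2))   # AIC (and similarly AICc) favors the MA(2)
> c(AR1 = BIC(fit_ar1), MA2 = BIC(fit_ma2))   # BIC penalizes the extra parameter, favoring the AR(1)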

▶ The AIC and AICc both prefer the MA(2) fit, whereas
the BIC prefers the simpler AR(1) model.
▶ It is often the case that the BIC will select a model of
smaller order than the AIC or AICc.
▶ In either case, it is not unreasonable to retain the
AR(1) because pure autoregressive models are easier to
work with.

Questions?
