Lab: Box-Jenkins Methodology - Test Data Set 1: Time Series and Forecast
[Chart: time series of Test Data Set 1, observations 1-96, vertical axis -4.0 to 4.0, showing the actual series and the forecast.]
In keeping with the principles of the Box-Jenkins method, the analysis will follow the
usual sequence, illustrated overleaf.
Ø The series is clearly stationary so we may go directly to the second part of Phase I,
model selection.
Ø Use only the last 98 observations in the model-building and testing phases.
1. Compute the ACF and PACF of the time series and use these to select from
amongst the available level I ARMA models ARMA(1,0), ARMA(1,1) and
ARMA(0,1).
2. Perform an analysis of variance for each model to compute the model and error
sums of squares and test the significance of each model.
3. Compute the Akaike Information Criterion (AIC) and the Bayes Information
Criterion (BIC) for each model and use these to estimate the model parameters
and determine the model which best fits the data.
It may help you to perform the analysis in the following way:
Phase I: Identification
    Data Preparation
    Ø Transform data to stabilize variance
    Ø Difference data to obtain a stationary series
    Model Selection
    Ø Use ACF and PACF to identify appropriate models
Phase II: Estimation and Testing
    Estimation
    Ø Derive MLE parameter estimates for each model
    Ø Use model selection criteria to choose the best model
    Diagnostics
    Ø Check ACF/PACF of residuals
    Ø Do portmanteau and other tests of residuals
    Ø Are the residuals white noise? If not, return to model selection.
1. The ACF and the PACF of the time series are shown below. The positive,
geometrically decaying pattern of the ACF, coupled with the single significant
PACF coefficient φ₁₁, strongly suggests an AR(1) {= ARMA(1,0)} process.
[Chart: ACF and PACF of the series with upper and lower 95% confidence bounds, lags 1-19.]
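For readers working outside the spreadsheet, the ACF and PACF used for identification can be computed directly. The sketch below is a plain-Python illustration (the helper names are my own, not from the lab's workbook); the PACF is obtained via the Durbin-Levinson recursion.

```python
import random

def acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y) / n
    out = []
    for k in range(1, max_lag + 1):
        ck = sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / n
        out.append(ck / c0)
    return out

def pacf(y, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = [1.0] + acf(y, max_lag)
    phi = {(1, 1): r[1]}
    pac = [r[1]]
    for k in range(2, max_lag + 1):
        num = r[k] - sum(phi[(k - 1, j)] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[(k - 1, j)] * r[j] for j in range(1, k))
        phi[(k, k)] = num / den
        for j in range(1, k):
            phi[(k, j)] = phi[(k - 1, j)] - phi[(k, k)] * phi[(k - 1, k - j)]
        pac.append(phi[(k, k)])
    return pac

# Simulate an AR(1) series with the coefficient the lab later estimates (0.766).
rng = random.Random(42)
y, prev = [], 0.0
for _ in range(500):
    prev = 0.766 * prev + rng.gauss(0, 1)
    y.append(prev)

r = acf(y, 10)
p = pacf(y, 10)
# For an AR(1): the ACF decays geometrically, the PACF cuts off after lag 1.
```

On such a simulated series the lag-1 ACF is large and positive while the PACF beyond lag 1 stays inside the confidence bounds, reproducing the pattern seen in the chart.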
y_t = a y_{t-1} + ε_t + β ε_{t-1}
where, in the case of an ARMA(1,0) model, β is zero, while in the case of an
ARMA(0,1) model, a = 0.
2. The model sum of squares is SSM = Σ(ŷ_t − ȳ)², giving the analysis of
variance table:

ANOVA    DF    SS        MS          F           p
Model     1    127.35    127.3496    130.3163    1E-19
Error    96     93.81      0.977234
Total    97    221.16
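The F ratio in the table is simply MS_model divided by MS_error. A quick check of the arithmetic (values copied from the table, so small rounding differences are expected; the function name is my own):

```python
def anova_f(ss_model, df_model, ss_error, df_error):
    """Mean squares and F ratio from the ANOVA sums of squares."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return ms_model, ms_error, ms_model / ms_error

ms_m, ms_e, f_stat = anova_f(127.3496, 1, 93.81, 96)
# ms_e comes out near 0.9772 and f_stat near 130.32, matching the
# table up to the rounding of the printed sums of squares.
```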
3. We can now compute the Akaike Information Criterion using the Excel formula
=n*LN(SSE)+2*m in cell M3. For comparison, we compute the Schwarz
Bayesian Information Criterion (BIC) in cell M4 using the Excel formula
=n*LN(SSE)+m*LN(n).
So far, we have been working with a dummy value of our model coefficients.
Now that we have computed the formula for the AIC (BIC) we can proceed to find
the maximum likelihood estimates of the coefficients. We do this by using Excel
SOLVER to find the coefficient values which minimize the AIC (or BIC).
To run SOLVER, go to the Forecasting command bar and choose Solver. The
following dialog box appears:
Using a similar technique to estimate the parameters for all three models, and the
corresponding AIC (and BIC), we arrive at the results shown in the table below.
These clearly indicate that, using either the AIC or BIC criteria, the preferred
model is the ARMA(1,0) {= AR(1)} model:
y_t = 0.766 y_{t-1} + ε_t
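SOLVER's minimisation can be mimicked outside Excel. The sketch below (helper names are my own) estimates the AR(1) coefficient by minimising the conditional sum of squared one-step errors over a grid, then produces the h-step forecast â^h · y_n for a zero-mean AR(1):

```python
def ar1_sse(y, a):
    """Conditional sum of squared one-step errors for y_t = a*y_{t-1} + e_t."""
    return sum((y[t] - a * y[t - 1]) ** 2 for t in range(1, len(y)))

def fit_ar1(y, lo=-0.99, hi=0.99, steps=2000):
    """Grid search for the coefficient minimising the SSE (a SOLVER stand-in)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: ar1_sse(y, a))

def forecast_ar1(y_last, a, h):
    """h-step-ahead forecast of a zero-mean AR(1): a**h * y_n."""
    return a ** h * y_last

# A toy series that is exactly AR(1) with coefficient 0.766 and no noise,
# so the SSE minimum sits at the true coefficient:
y = [2.0]
for _ in range(20):
    y.append(0.766 * y[-1])

a_hat = fit_ar1(y)
```

A grid search is crude next to SOLVER's gradient-based routine, but for a one-parameter model it makes the estimation step transparent.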
[Chart: ACF and PACF of the residuals, lags 1-19, vertical axis -0.25 to 0.25.]
We can use the portmanteau tests to verify that the 20 ACF coefficients are
collectively insignificant. The Excel formula for the Box-Pierce statistic Q(20) is
=n*SUMPRODUCT($I$11:$I$30,$I$11:$I$30), which returns a value of 9.24 in
cell P4. [Alternatively you can use the Box-Pierce function]. The Box-Pierce
statistic has a χ2 distribution with 20 - 1 = 19 degrees of freedom. Using the Excel
formula =CHIDIST(P4,B30-m) in cell Q4, we find that the probability of the
statistic taking this value or larger is 96.9%. So we accept the hypothesis that the
residual ACF coefficients are insignificantly different from zero.
A similar test using the Ljung-Box statistic can be performed using the Excel
formula =n*(n+2)*SUMPRODUCT($I$11:$I$30,$I$11:$I$30,1/(n-
$B$11:$B$30)) in cell P5. Again the conclusion is that the residual ACF
coefficients are insignificant at the 95.6% level.
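Both portmanteau statistics are simple sums over the residual autocorrelations r_k, so the two Excel formulas transcribe directly (function names are mine):

```python
def box_pierce(r, n):
    """Box-Pierce Q = n * sum(r_k^2) over the h residual autocorrelations."""
    return n * sum(rk ** 2 for rk in r)

def ljung_box(r, n):
    """Ljung-Box Q* = n*(n+2) * sum(r_k^2 / (n-k)), k = 1..h."""
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))

# Toy example: three residual autocorrelations of 0.1 with n = 98 observations.
r = [0.1, 0.1, 0.1]
q_bp = box_pierce(r, 98)   # 98 * 0.03 = 2.94
q_lb = ljung_box(r, 98)    # slightly larger: the small-sample correction inflates Q
# Either statistic is then compared against a chi-square distribution
# with h - m degrees of freedom, as in the Excel CHIDIST call above.
```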
We can also check for serial correlation amongst the residuals using the Durbin-
Watson statistic:
DW = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²
This can be computed using the DurbinWatson function or directly using the
following Excel formula:
=SUMPRODUCT($E$14:$E$110-$E$13:$E$109,$E$14:$E$110-$E$13:$E$109)
/ SUMPRODUCT($E$13:$E$110,$E$13:$E$110)
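The same statistic is a two-line computation in Python (the function name is my own):

```python
def durbin_watson(e):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

# Alternating residuals push DW above 2 (negative serial correlation):
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 12 / 4 = 3.0
# Identical residuals give DW = 0 (extreme positive serial correlation);
# white-noise residuals give DW near 2.
```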
Finally, Theil's U statistic compares the model's forecast errors with those of
the naïve no-change forecast. Writing FPE_{t+1} = (f_{t+1} − y_t)/y_t for the
forecast relative change and APE_{t+1} = (y_{t+1} − y_t)/y_t for the actual
relative change,

U = √[ Σ_{t=1}^{n−1} (FPE_{t+1} − APE_{t+1})² / Σ_{t=1}^{n−1} APE_{t+1}² ]
  = √[ Σ_{t=1}^{n−1} ((f_{t+1} − y_{t+1})/y_t)² / Σ_{t=1}^{n−1} ((y_{t+1} − y_t)/y_t)² ]
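A direct transcription of the U statistic (function name mine; f[t] here holds the forecast of y[t+1] in 0-based lists) makes its benchmark interpretation easy to verify: the naïve forecast f_{t+1} = y_t gives U = 1 by construction, and a perfect forecast gives U = 0.

```python
import math

def theil_u(y, f):
    """Theil's U, where f[t] is the forecast of y[t+1], t = 0..n-2."""
    num = sum(((f[t] - y[t + 1]) / y[t]) ** 2 for t in range(len(y) - 1))
    den = sum(((y[t + 1] - y[t]) / y[t]) ** 2 for t in range(len(y) - 1))
    return math.sqrt(num / den)

y = [1.0, 2.0, 3.0, 4.0]
naive = y[:-1]      # no-change forecast: f_{t+1} = y_t
perfect = y[1:]     # hypothetical perfect forecast: f_{t+1} = y_{t+1}
# theil_u(y, naive) is exactly 1; theil_u(y, perfect) is exactly 0.
```

Values of U below 1 therefore indicate that the fitted model forecasts better than simply carrying the last observation forward.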