Box–Jenkins (BJ) Methodology: ARMA and ARIMA Modelling
“Let the data speak for themselves”
Unlike a traditional regression model, the BJ approach allows 𝑌𝑡 to be explained
by its own past (lagged) values and by the lagged values of 𝑒𝑡, where 𝑒𝑡 is an
uncorrelated random error term with ZERO mean and CONSTANT variance, that is,
a white noise error term.
AR means Autoregressive
Autoregressive (AR) Model
In an autoregressive model, the independent variables are all lagged values of
the dependent variable; there are no other independent variables.
𝑌𝑡 = 𝛼 + 𝛽1 𝑌𝑡−1 + 𝛽2 𝑌𝑡−2 + 𝑒𝑡
𝑌𝑡 = 𝛼 + 𝛽1 𝑌𝑡−1 + 𝛽2 𝑌𝑡−2 + ⋯ + 𝛽𝑝 𝑌𝑡−𝑝 + 𝑒𝑡
The second equation is the autoregressive model of order p, AR(p). The first
equation is the special case where the value of “p” is 2, that is, AR(2).
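As a minimal sketch of the AR(p) equation above, the following Python function simulates such a process. The name simulate_ar, the zero start-up values and the normally distributed errors are assumptions made for illustration only.

```python
import random

def simulate_ar(alpha, betas, n, sigma=1.0, seed=0):
    """Simulate Y_t = alpha + betas[0]*Y_{t-1} + ... + betas[p-1]*Y_{t-p} + e_t."""
    rng = random.Random(seed)
    p = len(betas)
    y = [0.0] * p  # start-up values (an assumption of this sketch)
    for _ in range(n):
        e_t = rng.gauss(0.0, sigma)      # white-noise error term
        lags = y[-1 : -p - 1 : -1]       # Y_{t-1}, Y_{t-2}, ..., Y_{t-p}
        y.append(alpha + sum(b * l for b, l in zip(betas, lags)) + e_t)
    return y[p:]                         # drop the start-up values

# AR(2) with hypothetical coefficients beta1 = 0.6 and beta2 = 0.2
series = simulate_ar(alpha=0.5, betas=[0.6, 0.2], n=200)
```

With sigma set to 0 the error term vanishes and the series follows the deterministic part of the equation, which is a quick way to check the lag indexing.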
MA means Moving Average
Moving Average (MA) Model
A moving average model uses lagged error terms as independent variables.
𝑌𝑡 = 𝛼 + 𝛽1 𝑒𝑡−1 + 𝛽2 𝑒𝑡−2 + ⋯ + 𝛽𝑞 𝑒𝑡−𝑞 + 𝑒𝑡
In the above equation, q different lagged error terms are used as independent
variables. The independent variables 𝑒𝑡−1 through 𝑒𝑡−𝑞 are uncorrelated error
terms. 𝑒𝑡 is the typical error term found in every regression, and we assume it
is not correlated with the other error terms.
Moving average models are abbreviated MA(q), where q is the number of lagged
error terms present in the model. The name “moving average” comes from the
fact that a weighted average of past error terms is combined with the mean of
the dependent variable to produce a moving average of the dependent variable.
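The MA(q) equation can be sketched in the same way. The function name simulate_ma and the normally distributed errors are assumptions for illustration, and the coefficients in the example are hypothetical.

```python
import random

def simulate_ma(alpha, betas, n, sigma=1.0, seed=0):
    """Simulate Y_t = alpha + betas[0]*e_{t-1} + ... + betas[q-1]*e_{t-q} + e_t."""
    rng = random.Random(seed)
    q = len(betas)
    # Draw n + q errors so that the first observation already has q lagged errors.
    e = [rng.gauss(0.0, sigma) for _ in range(n + q)]
    y = []
    for t in range(q, n + q):
        lagged = sum(betas[j] * e[t - 1 - j] for j in range(q))
        y.append(alpha + lagged + e[t])
    return y

# MA(2) with hypothetical coefficients beta1 = 0.5 and beta2 = 0.3
series = simulate_ma(alpha=2.0, betas=[0.5, 0.3], n=5000)
sample_mean = sum(series) / len(series)  # should lie close to alpha
```

Because every error term has zero mean, the series fluctuates around alpha, the mean of the dependent variable.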
Assumption
This methodology is based on the assumption that the time series (variable)
under study is “STATIONARY”. If the variable is not stationary, we take its
first difference. A variable that becomes stationary after first differencing
is said to be integrated of order 1, or I(1).
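Differencing itself is simple arithmetic. A minimal sketch follows; the helper name difference and the sample numbers are made up for illustration.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass returns Y_t - Y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out

levels = [100, 102, 105, 109]          # hypothetical trending series
first_diff = difference(levels)        # [2, 3, 4]
second_diff = difference(levels, d=2)  # [1, 1]
```

Each pass shortens the series by one observation, so a series differenced d times loses d observations.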
Autoregressive Moving Average (ARMA) and
Autoregressive Integrated Moving Average (ARIMA) Process
As discussed, the BJ method assumes that the variable is stationary. If this
is the case, the ARMA process is used. If the variable is not stationary at
level, we have to difference it (assume first difference here) to make it
stationary; the variable is then I(1), or integrated of order 1, and the ARIMA
process is used.
ARMA(p, q): AR(p) and MA(q) are used where p and q are determined by PACF
and ACF respectively.
ARIMA(p, d, q): AR(p), I(d) and MA(q) are used where p, d and q are
determined by PACF, order of differencing and ACF respectively.
In short: if the variable is stationary at level, ARMA is used. If the variable
becomes stationary only after differencing, ARIMA is used.
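Written out, the two processes combine the AR and MA equations given earlier. In the sketch below the MA coefficients are written as \theta_j to keep them apart from the AR coefficients \beta_i; that notation is an assumption of this sketch, since the notes use \beta in both equations.

```latex
% ARMA(p, q) for a series Y_t that is stationary at level
Y_t = \alpha + \sum_{i=1}^{p} \beta_i Y_{t-i} + \sum_{j=1}^{q} \theta_j e_{t-j} + e_t

% ARIMA(p, 1, q): the same equation applied to the first difference
\Delta Y_t = \alpha + \sum_{i=1}^{p} \beta_i \Delta Y_{t-i} + \sum_{j=1}^{q} \theta_j e_{t-j} + e_t,
\qquad \Delta Y_t = Y_t - Y_{t-1}
```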
STEPS:
1. Identification
ARMA (p, q)
ARIMA (p, d, q)
The value of “p” is determined by some criterion such as the Partial
Autocorrelation Function (PACF).
PACF: It gives the correlation between the dependent variable and its lagged
values, while holding the effect of shorter lags constant. The first correlation
value is for 𝒀𝒕 & 𝒀𝒕−𝟏, the second is for 𝒀𝒕 & 𝒀𝒕−𝟐, then 𝒀𝒕 & 𝒀𝒕−𝟑, and so forth.
The correlation between 𝒀𝒕 & 𝒀𝒕−𝟐 does not include the effect of 𝒀𝒕 & 𝒀𝒕−𝟏.
That is why it is known as the Partial Autocorrelation Function.
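One common way to compute sample partial autocorrelations is the Durbin-Levinson recursion, sketched below. This is only an illustration, not necessarily how any particular package computes them; the function names and the simulated AR(1) series (coefficient 0.7) are assumptions.

```python
import random

def acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(y, max_lag):
    """Partial autocorrelations at lags 1 .. max_lag (Durbin-Levinson)."""
    r = acf(y, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        # Remove the part of r_k already explained by the shorter lags.
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Simulated AR(1) series with a hypothetical coefficient of 0.7
rng = random.Random(0)
y = [0.0]
for _ in range(2000):
    y.append(0.7 * y[-1] + rng.gauss(0.0, 1.0))
partial = pacf(y, 3)
```

For an AR(1) series the first partial autocorrelation should be close to the AR coefficient and the later ones close to zero, which is the cut-off pattern used to choose p.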
We determine the optimal number of lagged error terms to include in a
moving average model from its autocorrelation function (ACF). The ACF is
different from the PACF used with the autoregressive model.
ACF: It gives the correlation coefficient between the dependent variable and
the same variable at different lags, but the effect of shorter lags is not held
constant. This means that the effect of shorter lags is included in the numbers
given by the autocorrelation function. The correlation between 𝒀𝒕 & 𝒀𝒕−𝟐
includes the effect of the correlation between 𝒀𝒕 & 𝒀𝒕−𝟏. This is the opposite
of the PACF described earlier, where the effect of shorter lags is not included.
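The sample ACF can be sketched in a few lines. The helper names, the simulated MA(1) series (coefficient 0.8) and the approximate 95% band of 1.96 divided by the square root of n are assumptions of this illustration.

```python
import math
import random

def acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def significant_lags(r, n, z=1.96):
    """Lags k >= 1 whose autocorrelation lies outside +/- z / sqrt(n)."""
    band = z / math.sqrt(n)
    return [k for k in range(1, len(r)) if abs(r[k]) > band]

# Simulated MA(1) series with a hypothetical coefficient of 0.8
rng = random.Random(1)
e = [rng.gauss(0.0, 1.0) for _ in range(2001)]
y = [e[t] + 0.8 * e[t - 1] for t in range(1, 2001)]
r = acf(y, 5)
lags = significant_lags(r, len(y))
```

For an MA(1) series the autocorrelation at lag 1 should be clearly different from zero while the higher lags stay small, which is the cut-off pattern used to choose q.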
2. Model’s Estimation
After finding p (from the PACF), d (from the unit root test) and q (from the
ACF), suppose AR(1), d(1) and MA(1) have been found. Keep in mind that we then
have to run all possible models up to AR(1) and MA(1) for forecasting, namely:
ARIMA (1, 1, 1)
ARIMA (1, 1, 0)
ARIMA (0, 1, 1)
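The list of candidate models above can be generated mechanically. This is a small sketch; the helper name candidate_orders is hypothetical, and it assumes every combination up to the identified p and q is tried except the one with no AR and no MA term.

```python
from itertools import product

def candidate_orders(p_max, d, q_max):
    """All (p, d, q) with p <= p_max and q <= q_max, excluding p = q = 0."""
    return [(p, d, q)
            for p, q in product(range(p_max, -1, -1), range(q_max, -1, -1))
            if p + q > 0]

orders = candidate_orders(p_max=1, d=1, q_max=1)
# orders == [(1, 1, 1), (1, 1, 0), (0, 1, 1)]
```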
To choose the appropriate model we have to look at 5 things:
1. Significant Coefficient
2. SIGMASQ (Minimum) – which shows the volatility
3. Adj. R2 (Maximum)
4. AIC (Minimum)
5. SIC (Minimum)
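The two information criteria can be sketched from a model's residuals. The formulas below assume Gaussian errors and the per-observation scaling that EViews reports; other software may scale AIC and SIC differently. The function names, the residuals and the parameter count k are hypothetical.

```python
import math

def gaussian_loglik(residuals):
    """Log likelihood under normal errors, with sigma^2 estimated as SSR / T."""
    T = len(residuals)
    sigma2 = sum(r * r for r in residuals) / T
    return -0.5 * T * (1.0 + math.log(2.0 * math.pi) + math.log(sigma2))

def aic(loglik, k, T):
    """Akaike criterion per observation: -2l/T + 2k/T."""
    return -2.0 * loglik / T + 2.0 * k / T

def sic(loglik, k, T):
    """Schwarz criterion per observation: -2l/T + k*log(T)/T."""
    return -2.0 * loglik / T + k * math.log(T) / T

resid = [1.0, -1.0] * 5            # hypothetical residuals, T = 10
ll = gaussian_loglik(resid)
a = aic(ll, k=2, T=len(resid))
s = sic(ll, k=2, T=len(resid))
```

Both criteria penalise extra parameters; SIC penalises them more heavily than AIC once T exceeds about 8 observations, since log(T) is then greater than 2.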
Models’ Comparison
Differenced CPI            ARIMA(1, 1, 1)   ARIMA(1, 1, 0)   ARIMA(0, 1, 1)
Significant Coefficient    0                1                1
SIGMASQ (Minimum)          0.6365           0.6690           0.6366
Adj. R2 (Maximum)          0.1236           0.0994           0.1430
AIC (Minimum)              2.5581           2.5640           2.5165
SIC (Minimum)              2.7140           2.6810           2.6335
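The comparison can also be tallied in code using the figures from the table. The rule used here (one point per criterion to the best model, ties sharing the point, and a higher count of significant coefficients treated as better) is an assumption of this sketch, not a rule stated in these notes.

```python
models = {
    "ARIMA(1,1,1)": {"significant": 0, "sigmasq": 0.6365, "adj_r2": 0.1236,
                     "aic": 2.5581, "sic": 2.7140},
    "ARIMA(1,1,0)": {"significant": 1, "sigmasq": 0.6690, "adj_r2": 0.0994,
                     "aic": 2.5640, "sic": 2.6810},
    "ARIMA(0,1,1)": {"significant": 1, "sigmasq": 0.6366, "adj_r2": 0.1430,
                     "aic": 2.5165, "sic": 2.6335},
}

# (criterion, True if a larger value is better)
criteria = [("significant", True), ("sigmasq", False), ("adj_r2", True),
            ("aic", False), ("sic", False)]

def tally_wins(models, criteria):
    """Count, for each model, the criteria on which it is best (ties count)."""
    wins = {name: 0 for name in models}
    for key, larger_is_better in criteria:
        pick = max if larger_is_better else min
        best = pick(m[key] for m in models.values())
        for name, m in models.items():
            if m[key] == best:
                wins[name] += 1
    return wins

wins = tally_wins(models, criteria)
best_model = max(wins, key=wins.get)
```

On the table's figures this tally favours ARIMA(0, 1, 1): it has the lowest AIC and SIC, the highest Adj. R2 and a significant coefficient, while its SIGMASQ is only 0.0001 above the minimum.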
3. Diagnostic test
The error term (residuals) should be stationary. We can check this through a
conventional unit root test or Q-statistics.
If the error term is not stationary, we have to repeat all the above steps.
If the error term is stationary, go on to FORECASTING.
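The Q-statistic check can be sketched as follows, assuming the Ljung-Box form of the statistic. The function name and the simulated series are illustrative only. A large Q relative to the chi-square critical value (about 18.31 at the 5% level with 10 degrees of freedom) indicates that autocorrelation remains; for residuals from a fitted ARMA model the degrees of freedom are usually reduced by p + q.

```python
import random

def ljung_box_q(resid, max_lag):
    """Ljung-Box Q = n(n+2) * sum over k of r_k^2 / (n - k), k = 1..max_lag."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [v - mean for v in resid]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # behaves like good residuals
walk, level = [], 0.0
for e in noise:                                    # random walk: non-stationary
    level += e
    walk.append(level)

q_noise = ljung_box_q(noise, 10)
q_walk = ljung_box_q(walk, 10)
```

The white-noise series should give a small Q, while the random walk gives a very large one, signalling that the model would need to be re-identified.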
4. Forecasting
After estimating the model, go to the forecast step (watch STATIC and DYNAMIC
FORECAST with EViews for the step-by-step procedure:
https://youtu.be/eUH1ioLAaYY)
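The difference between the two forecast types can be sketched for a simple AR(1) model. A static forecast uses the actual lagged value at each step, while a dynamic forecast feeds its own previous forecast back in. The coefficient values and data below are hypothetical, and the function names are not EViews commands.

```python
def static_forecast(alpha, beta, actual):
    """One-step-ahead forecasts: each uses the ACTUAL previous value."""
    return [alpha + beta * actual[t - 1] for t in range(1, len(actual))]

def dynamic_forecast(alpha, beta, start, steps):
    """Multi-step forecasts: each uses the PREVIOUS FORECAST as the lag."""
    out, prev = [], start
    for _ in range(steps):
        prev = alpha + beta * prev
        out.append(prev)
    return out

actual = [2.0, 4.0, 3.0]                               # hypothetical observations
static = static_forecast(1.0, 0.5, actual)             # [2.0, 3.0]
dynamic = dynamic_forecast(1.0, 0.5, start=4.0, steps=3)  # [3.0, 2.5, 2.25]
```

Static forecasts can only be produced where actual data exist, whereas dynamic forecasts can run beyond the sample, at the cost of accumulating forecast error.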