Jmp045 Modeling Gold Prices
Modeling Gold Prices
ARIMA/ARMA Models, Model Comparison
Key ideas:
This case study deals with univariate time series modeling, in which a time series is modeled using only
its own past values. Univariate models are specialized models in which past values (lags) of a series are
treated as the independent variables. ARIMA/ARMA is a popular family of univariate models, used
extensively for analyzing the characteristics of time series data and for forecasting. This study analyzes
time series data with JMP.
Background
Hari, a research assistant at a leading university, has been asked by his professor to prepare a report on
gold prices in the United States. The professor wants Hari to look at the price of gold over a five-year
period, analyze the characteristics of gold prices and suggest a suitable univariate model that fits the
data.
The Task
Hari is entrusted with the following tasks:
Gold price is a continuous time series variable, whereas the date is time variable.
Analysis
Descriptive statistics
Let’s explore the data using the Distribution platform and Graph Builder in JMP.
To create, Analyze>Distribution>Y = GP>OK. Under the red triangle next to Distributions, select Stack to align the output
horizontally. Under the red triangle next to Summary Statistics, select Customize Summary Statistics and then N, Skewness,
Kurtosis, Minimum and Maximum. Click OK.
To create, Graph>Graph Builder>Y = GP, X = Date. Select line graph from the chart options. Click Done.
The basic descriptive characteristics of the data are presented in Exhibit 1 and the graph showing
movement of gold prices during the five years is given in Exhibit 2. The gold prices fluctuated between
$1,073 and $2,051 during the five years, with a mean of $1,388 and standard deviation of $217. Exhibit 2
shows an upward trend in prices during this period, especially after 2019.
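For readers who want to check the same summary statistics outside JMP, the following is a minimal sketch in Python with NumPy. The `summary_statistics` helper and the simulated `gp` series are hypothetical stand-ins, not the gold price data, and the moment-based skewness and kurtosis used here may differ slightly from the values JMP reports if JMP applies small-sample adjustments.

```python
import numpy as np

def summary_statistics(values):
    """Return N, mean, std dev, skewness, kurtosis, min and max of a 1-D series."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    z = (x - mean) / x.std(ddof=0)          # standardize with the population std dev
    return {
        "N": x.size,
        "Mean": mean,
        "Std Dev": x.std(ddof=1),            # sample standard deviation
        "Skewness": np.mean(z ** 3),         # moment-based, no small-sample adjustment
        "Kurtosis": np.mean(z ** 4) - 3.0,   # excess kurtosis (0 for a normal distribution)
        "Minimum": x.min(),
        "Maximum": x.max(),
    }

# Hypothetical stand-in for the GP column; NOT the actual gold price data.
rng = np.random.default_rng(0)
gp = 1300.0 + np.cumsum(rng.normal(0.0, 10.0, size=1250))

stats = summary_statistics(gp)
for name, value in stats.items():
    print(f"{name:>9}: {value:,.2f}")
```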
Stationarity of the data
Stationarity of the data series is a prerequisite for most econometric models. From Exhibit 2, the gold
price does not appear to be a stationary series. To confirm this, the augmented Dickey-Fuller (ADF) test
is used.
The null hypothesis for the ADF test is that the series has a unit root, that is, the series is
non-stationary. The ADF test returns three test statistics:
• Zero Mean ADF: A test against a random walk with zero mean.
• Single Mean ADF: A test against a random walk with a non-zero mean.
• Trend ADF: A test against a random walk with a non-zero mean and a linear trend.
The test statistic is expected to be negative; the null hypothesis is rejected only when the statistic is
more negative than (less than) the critical value. The values shown for the Zero Mean, Single Mean and
Trend ADF in JMP are the Tau statistics associated with the Dickey-Fuller test. Because Dickey and Fuller
produced tables of critical values for the distribution of the Tau statistic, and because any associated
p-values would only be approximations, the JMP developers decided not to display approximate p-values
for these statistics. For large samples, the 5% critical values for the ADF test are -2.86 with a constant
but no trend (Single Mean) and -3.41 with a constant and a trend (Trend).
The results of the ADF test for gold prices are given in Exhibit 3. The test statistics of all three ADF
tests are above (less negative than) the 5% critical values. Hence, the null hypothesis cannot be rejected,
and it is concluded that the gold price series is non-stationary.
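The decision rule above can be illustrated with a small sketch in Python. The `adf_tau` function below is a hypothetical helper, not JMP's implementation: it estimates the ADF regression by ordinary least squares and returns the Tau statistic on the lagged level. The number of lagged differences (`lags=1`) and the simulated series are assumptions chosen for illustration; the critical values are the large-sample 5% values quoted in the text.

```python
import numpy as np

def adf_tau(y, lags=1, deterministic="single_mean"):
    """Tau statistic from the ADF regression
    dy_t = [c] + [b*t] + gamma*y_{t-1} + sum_j delta_j*dy_{t-j} + e_t.
    deterministic: "zero_mean", "single_mean" (constant) or "trend" (constant + trend).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = dy.size - lags                               # usable observations
    columns = [y[lags:-1]]                           # lagged level y_{t-1}
    for j in range(1, lags + 1):
        columns.append(dy[lags - j:dy.size - j])     # lagged differences dy_{t-j}
    if deterministic in ("single_mean", "trend"):
        columns.append(np.ones(n))
    if deterministic == "trend":
        columns.append(np.arange(1, n + 1, dtype=float))
    X = np.column_stack(columns)
    target = dy[lags:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])              # t-ratio of gamma

# Simulated examples (assumptions, not the gold price data).
rng = np.random.default_rng(1)
walk = 1300.0 + np.cumsum(rng.normal(0.0, 10.0, size=1000))   # has a unit root
noise = rng.normal(0.0, 10.0, size=1000)                       # stationary

CRITICAL_5PCT = {"single_mean": -2.86, "trend": -3.41}         # values quoted in the text
for name, series in (("random walk", walk), ("white noise", noise)):
    for spec, crit in CRITICAL_5PCT.items():
        tau = adf_tau(series, lags=1, deterministic=spec)
        verdict = "reject" if tau < crit else "fail to reject"
        print(f"{name:>11} | {spec:<11} tau = {tau:8.2f} -> {verdict} unit root")
```

The null hypothesis is rejected only when Tau falls below the critical value, which is why a stationary series produces a strongly negative statistic.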
Exhibit 4 First Difference of Daily Gold Prices (2016-2020)
To create, Graph>Graph Builder>Y = FDGP, X = Date. Select line graph from the chart options and click Done.
From Exhibit 4, the first-differenced series appears to be stationary. To confirm this, an ADF test is
performed.
Exhibit 5 gives the results of the ADF test of FDGP. Since the test statistics are all more negative than
the critical values, the null hypothesis is rejected. So, we can conclude that the first difference of gold
prices is a stationary series.
ARIMA & ARMA models
An autoregressive (AR) process is one where the current value of a variable depends on its past values.
The number of past values (lags) that determine the current value is known as the order of the AR model.
Thus, an AR (3) model would use three past values of the data for modeling the current value. In more
general terms an AR(p) model is specified as:
$$y_t = \alpha + \sum_{i=1}^{p} \beta_i \, y_{t-i} + u_t$$
A moving average (MA) process is one where the current value of a variable depends on the past and
current values of the white noise disturbance terms (error terms). The number of past white noise terms
included in the model is known as the order of the MA model. An MA (q) model with (q) lags is specified
as:
$$y_t = \alpha + \sum_{i=1}^{q} \phi_i \, u_{t-i} + u_t$$
The autoregressive moving average (ARMA) process is a combination of the AR and MA processes. In
the ARMA model, the current value of a variable depends on the past values of the variable itself and the
past and current white noise disturbance terms. ARMA (p,q) model represents an ARMA process with (p)
lags of AR terms and (q) lags of MA terms. The model is specified as:
$$y_t = \alpha + \sum_{i=1}^{p} \beta_i \, y_{t-i} + \sum_{i=1}^{q} \phi_i \, u_{t-i} + u_t$$
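To make the recursion concrete, here is a minimal simulation sketch in Python. The function name `simulate_arma`, the burn-in length and the coefficient values are assumptions chosen for illustration; the symbols alpha, beta and phi follow the ARMA specification above. Passing an empty list for `phi` gives a pure AR(p) process, and an empty list for `beta` gives a pure MA(q) process.

```python
import numpy as np

def simulate_arma(alpha, beta, phi, n, rng, burn_in=200):
    """Simulate y_t = alpha + sum_i beta_i*y_{t-i} + sum_i phi_i*u_{t-i} + u_t
    with standard normal white noise u_t, and return the last n values."""
    beta = np.asarray(beta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    p, q = beta.size, phi.size
    start = max(p, q)                       # first index with enough history
    total = n + burn_in + start
    u = rng.normal(0.0, 1.0, size=total)    # white noise disturbances
    y = np.zeros(total)
    for t in range(start, total):
        ar_part = sum(beta[i] * y[t - 1 - i] for i in range(p))
        ma_part = sum(phi[i] * u[t - 1 - i] for i in range(q))
        y[t] = alpha + ar_part + ma_part + u[t]
    return y[-n:]                            # discard the burn-in period

rng = np.random.default_rng(42)
ar1 = simulate_arma(alpha=1.0, beta=[0.5], phi=[], n=5000, rng=rng)         # AR(1)
ma2 = simulate_arma(alpha=0.0, beta=[], phi=[0.4, 0.2], n=5000, rng=rng)    # MA(2)
arma = simulate_arma(alpha=0.0, beta=[0.5], phi=[0.4], n=5000, rng=rng)     # ARMA(1,1)

# For a stationary AR(1), the long-run mean is alpha / (1 - beta_1) = 2.0 here.
print("AR(1) sample mean:", round(float(ar1.mean()), 3))
```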
Building ARMA models involves three steps: identification, estimation and diagnostic checking.
Identification deals with choosing the right order of the model that captures the dynamic features of the
data. The order of the model can be decided via a graphical method by plotting the autocorrelation
function (ACF) and partial autocorrelation function (PACF). Another method for deciding the order is to
use information criteria. Once the order is identified, the parameters of the model are estimated. Finally,
diagnostic checking is done by testing the residuals of the selected model for autocorrelation.
An ARMA model is suitable for stationary data. For non-stationary data, the autoregressive integrated
moving average (ARIMA) model is used, in which the order of integration is built into the model. For an
ARIMA (p,d,q) model, (d) represents the order of integration, (p) the number of AR lags, and (q) the
number of MA lags. So, if the data turns stationary on first differencing, (d) would be specified as 1. For
example, ARIMA (2,1,2) would be used for modeling a time series that is integrated of first order, with
two lags each of the AR and MA terms.
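The role of (d) can be seen in a short sketch: with d = 1, the model works on the first difference of the series, and the levels are recovered by cumulating those differences from the starting value. The price values below are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

# Hypothetical price levels (not the gold price data).
gp = np.array([1300.0, 1304.5, 1301.0, 1310.2, 1308.7])

# First difference: FDGP_t = GP_t - GP_{t-1}. One observation is lost.
fdgp = np.diff(gp)

# Integration (the "I" in ARIMA) reverses the differencing:
# cumulate the differences and add back the initial level.
rebuilt = gp[0] + np.concatenate(([0.0], np.cumsum(fdgp)))

print("first differences:", fdgp)
print("levels recovered: ", np.allclose(rebuilt, gp))
```

This is why an ARMA (p,q) model for the differenced series and an ARIMA (p,1,q) model for the original series describe the same dynamics.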
Exhibit 6 ACF and PACF Plots for Gold Prices
Exhibit 7 ACF and PACF Plots for First Difference of Gold Prices
The ACF and PACF plots of gold prices (GP) and first difference (FDGP) are shown in Exhibits 6 and 7.
We can see that the ACF plot of GP in Exhibit 6 does not decay, since GP is a non-stationary series. The
ACF and PACF of FDGP shown in Exhibit 7 do not show any significant spikes up to the fourth lag. The
decay pattern of the ACF and PACF is also unclear. Unfortunately, with real data a clear pattern is rarely
seen, which makes ACF and PACF plots difficult to interpret. In such cases, the order of the model is
determined using information criteria.
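For intuition about what these plots display, the sketch below computes the sample ACF and PACF with NumPy. The helper names `acf` and `pacf` and the simulated AR(1) series are assumptions for illustration; the PACF is obtained with the Durbin-Levinson recursion, and the band of plus or minus 1.96 divided by the square root of n is the usual approximate 95% bound for judging whether a spike is significant.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([1.0] + [(x[k:] @ x[:-k]) / denom for k in range(1, max_lag + 1)])

def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = np.zeros(max_lag + 1)
    out[0] = 1.0
    phi_prev = np.array([])
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi = np.array([phi_kk])
        else:
            num = r[k] - phi_prev @ r[k - 1:0:-1]
            den = 1.0 - phi_prev @ r[1:k]
            phi_kk = num / den
            phi = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        out[k] = phi_kk
        phi_prev = phi
    return out

# Simulated AR(1) series with coefficient 0.7 (an assumption, not the FDGP data).
rng = np.random.default_rng(2)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()

bound = 1.96 / np.sqrt(n)
r, pr = acf(x, 5), pacf(x, 5)
for k in range(1, 6):
    print(f"lag {k}: ACF = {r[k]: .3f}  PACF = {pr[k]: .3f}  (bound = {bound:.3f})")
```

For a textbook AR(1) process the ACF decays geometrically while the PACF cuts off after lag 1; real series such as FDGP rarely show such a clean pattern.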
Model building
We shall build various ARMA models using the first difference of gold prices (FDGP), which is the
stationary series, by specifying the parameters of autoregressive order (p) and moving average order (q).
The differencing order (d) is set at 0 for all these models. The models estimated here include AR (1), MA
(1), ARMA (1,1), AR (2), MA (2), ARMA (1,2), ARMA (2,1), ARMA (2,2), AR (3), MA (3), ARMA (1,3),
ARMA (2,3), ARMA (3,1), ARMA (3,2) and ARMA (3,3).
To create, Analyze>Specialized Modeling>Time Series>Y = FDGP, X, Time ID = Date. Click OK. Under the red triangle next to
Time Series FDGP, choose ARIMA for the option to specify the values for p and q. Once you have specified them, click Estimate.
Select the ARIMA option again from the dropdown and repeat the process to build different ARMA models.
In this case, ARMA models are estimated using the first difference of gold prices (FDGP), which is the
stationary series.
Information criteria are measures of model fit that include a penalty for adding extra parameters.
Hence, the objective is always to minimize the value of the information criterion. Among the several
criteria available, the most popular is Akaike’s information criterion (AIC). The AIC values of these
models are compared to identify the most suitable one, which is the one with the lowest AIC value. By
default, JMP sorts the models by the AIC statistic in increasing order. The various ARMA models, along
with the fit indices, are shown in Exhibit 9.
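The trade-off between fit and penalty can be sketched in a few lines of Python. This is a simplified illustration and not JMP's estimation method: it fits only pure AR(p) models by least squares (models with MA terms need iterative estimation), uses a Gaussian log-likelihood with AIC = -2 log L + 2k, and the helper `fit_ar_ols` and the simulated AR(2) series are assumptions. AIC values computed this way will not match the numbers in a JMP report, but the ranking logic is the same.

```python
import numpy as np

def fit_ar_ols(y, p, max_p):
    """Fit AR(p) with an intercept by least squares and return (coefficients, AIC).
    The first max_p observations are dropped for every p so that all models
    are compared on the same sample."""
    y = np.asarray(y, dtype=float)
    target = y[max_p:]
    n = target.size
    lag_columns = [y[max_p - i:y.size - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lag_columns)
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coefs
    sigma2 = resid @ resid / n                        # ML estimate of the error variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = p + 2                                         # intercept, p AR terms, variance
    return coefs, -2.0 * loglik + 2.0 * k

# Simulated AR(2) series (an assumption, not the FDGP data).
rng = np.random.default_rng(3)
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

aics = {p: fit_ar_ols(y, p, max_p=4)[1] for p in range(0, 5)}
for p, value in sorted(aics.items(), key=lambda item: item[1]):
    print(f"AR({p}): AIC = {value:.2f}")
best = min(aics, key=aics.get)
print("lowest AIC at p =", best)
```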
Exhibit 9 Model Comparison Using AIC
It can be observed from Exhibit 9 that ARMA (3,2) has the minimum AIC value of 10062.421. Hence, the
most suitable ARMA model for gold prices is identified as ARMA (3,2).
Exhibit 10 ARMA (3,2) Model for FDGP and ARIMA (3,1,2) Model for GP
To create, Analyze>Specialized Modeling>Time Series>Y = FDGP, X, Time ID = Date. Click OK. Under the red triangle next to
Time Series FDGP, choose ARIMA. Put p = 3 and q = 2 to estimate the ARMA (3,2) model. Similarly build the ARIMA (3,1,2) model
using GP data.
In Exhibit 10, we can observe that the parameter estimates for both models are the same. Hence, the
choice between an ARMA model fitted to the differenced series and an ARIMA model fitted to the original
series is immaterial, provided the order of integration is correctly specified. Exhibit 10 also shows that
the Prob>|t| values for all the AR and MA terms are less than 0.05. Hence, the null hypothesis that the
AR and MA terms are not significant is rejected at the 5% level. All three AR terms and both MA terms
are statistically significant in the model.
Diagnostic checking
Diagnostic checking of the model is done by looking at the autocorrelation of the residuals. Once you
build an ARMA or ARIMA model in JMP, it will by default provide residuals information along with model
summary and parameter estimates.
Exhibit 11 shows the ACF and PACF of the residuals of the ARMA (3,2) model. It can be observed that
there is no significant autocorrelation in the residuals (as seen in the second column, AutoCorr). Thus,
the ARMA (3,2) model is correctly specified and adequately captures the dynamic features of the data series.
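A common way to summarize residual autocorrelation in a single number is the Ljung-Box Q statistic, which is not discussed in this case study and is shown here only as a hedged sketch. The function `ljung_box_q` and the simulated series are assumptions. Q is compared with a chi-square critical value; 18.307 is the 5% value for 10 degrees of freedom, and when the test is applied to ARMA residuals the degrees of freedom are usually reduced by the number of estimated AR and MA terms.

```python
import numpy as np

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    n = e.size
    denom = e @ e
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = (e[k:] @ e[:-k]) / denom       # lag-k sample autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# Simulated examples (assumptions, not residuals from the gold price model).
rng = np.random.default_rng(4)
white = rng.normal(size=500)                 # what well-behaved residuals look like
correlated = np.zeros(500)
for t in range(1, 500):
    correlated[t] = 0.9 * correlated[t - 1] + rng.normal()

CHI2_5PCT_10DF = 18.307                      # 5% critical value, 10 degrees of freedom
for name, series in (("white noise", white), ("AR(1) series", correlated)):
    q = ljung_box_q(series, max_lag=10)
    verdict = "autocorrelation detected" if q > CHI2_5PCT_10DF else "no evidence of autocorrelation"
    print(f"{name:>12}: Q(10) = {q:9.2f} -> {verdict}")
```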
Summary
Statistical insights
To summarize, in this case, the scheme of analysis using JMP involved the following:
Implications
Hari can draw the following conclusions from the analysis:
• The gold prices in the United States are non-stationary in nature and exhibited an upward trend
during the five-year period.
• The gold price series turned stationary on first differencing.
• An ARMA (3,2) model captures the behavior of gold prices in the United States; the same can be
used for forecasting gold prices.
This study used the Distribution platform to display histograms and summary statistics; it also used Graph
Builder to visualize the data as a time series. The Time Series platform, which is under Specialized
Modeling, was used to conduct an augmented Dickey-Fuller test for stationarity. Transformations were
applied to create new columns, followed by checking the stationarity of the transformed series. ARMA
and ARIMA models were built by specifying the parameters. A Model Comparison report was used to
select the model.
Exercise
The price of silver for a five-year period (2016-2020) is available in SP.jmp. Perform the scheme of
analysis explained in this study. Identify the best ARMA model for silver price and perform diagnostic
checking.
JMP and all other JMP Statistical Discovery LLC product or service names are registered trademarks or trademarks of JMP Statistical Discovery LLC in the USA and other
countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2022 JMP Statistical Discovery LLC.
All rights reserved.