
Value at Risk Methods

Module 2
1. Univariate Time Series Analysis
Elena Goldman
Applied Statistics and Econometrics II
Department of Economics
New York University
Agenda
• Univariate Time Series Analysis
• Autocorrelation Function and Q test (Box-Ljung Test)
• Applications for the Efficient Market Hypothesis and Volatility Clustering
• Nonstationary vs. Stationary Time Series
• Autoregressive Process
• Unit Root Tests
• Moving Average Process
• ARMA/ARIMA Process
• Time-Varying Volatility
• Historical, EWMA and ARCH/GARCH Volatility
Autocorrelation Function (ACF) and Box-Ljung Q test
• Correlation of a variable with its own lags is called autocorrelation. For example, ρ_j = cor(y_t, y_{t−j}) is the autocorrelation at lag j. The autocorrelation function of order k is ACF = ρ_j for orders j = 1, 2, …, k.
• Let cov(y_t, y_{t−j}) = γ_j.
• For every lag j = 1, …, k the ACF gives the correlation of y_t and y_{t−j}, i.e. ρ_j = cov(y_t, y_{t−j})/var(y_t) = γ_j/γ_0.
• As with the ordinary correlation coefficient, the ACF can take values between −1 and 1. An autocorrelation close to 1 indicates very strong correlation of y_t with its own past values.
• If it is close to zero, there is no relation to past values of the variable. A time series with no autocorrelation is called white noise.

• The Box-Ljung Q test is based on a normalized sum of squared autocorrelations and has an asymptotic χ² distribution:

Q = n(n + 2) Σ_{j=1}^{k} ρ̂_j² / (n − j),   where ρ̂_j = cor(y_t, y_{t−j})

• The underlying assumption of the Q test under the null hypothesis is that y_t is independent and identically distributed (iid).
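As a quick check of the formula, the Q statistic can be computed by hand and compared with R's built-in `Box.test`; the simulated white-noise series and seed below are illustrative:

```r
# Ljung-Box Q computed manually vs. Box.test (simulated iid series)
set.seed(42)
y <- rnorm(500)                                    # white noise under the null
n <- length(y)
rho <- acf(y, lag.max = 10, plot = FALSE)$acf[-1]  # sample autocorrelations, lags 1..10
Q <- n * (n + 2) * sum(rho^2 / (n - 1:10))         # Q = n(n+2) * sum rho_j^2 / (n - j)
Box.test(y, lag = 10, type = "Ljung-Box")          # same statistic with chi-squared p-value
```

The two computations agree because `Box.test` uses the same sample autocorrelations.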
Testing Efficient Market Hypothesis
• The weak form of the efficient market hypothesis (EMH) says that returns are almost unpredictable from their past history.
• Test for autocorrelation between returns at time t and past returns at times t−1, t−2, …, t−k.
• Ljung-Box Q test:
Null Hypothesis H0: Q = 0 (no autocorrelation up to order k), consistent with the EMH
Alternative Hypothesis Ha: Q > 0 (presence of autocorrelation)
Correlogram (ACF plot)

q = acf(ret,10)  #<== Compute and plot autocorrelation for 10 lags starting from lag 0.

plot(q[1:10])    #<== Plot autocorrelation for 10 lags starting from lag 1.

First, we obtain the autocorrelation function ACF = ρ_j of orders j = 1, 2, …, k = 10. Some autocorrelations are statistically significant, as their values fall outside the 95% confidence intervals, but most are insignificant and small in size.
Testing Efficient Market Hypothesis
• Obtain Q stat and p-value for the Box-Ljung test of no autocorrelation. The null
hypothesis of Q=0 supports the EMH of no predictability of returns from past returns
based on no autocorrelation.

• Test SP500 2012/01/03–2018/07/25: for k = 10, Q = 14.25, p = .16, so we cannot reject the null at 5%.

• Test SP500 2005/01/03–2018/07/25: for k = 10, Q = 71.16, p = .000, so we reject the null at 5% (or any other reasonable level).
• Advanced:
The statistically significant autocorrelation rejects the hypothesis of returns being iid
(independent and identically distributed). This may be due to variance changing over time
and does not prove predictability of returns!
Testing for Volatility Clustering
• Periods of high volatility persist for a certain period of time; likewise, periods of low volatility last for a certain period of time. This is called volatility clustering. News on future values and risks moves asset prices, and news arrives in clusters at high and low frequency.
• For squared returns we find a statistically significant and sizable ACF at all orders 1, …, 10.

• Apply the Q test to squared returns:
Null Hypothesis H0: Q = 0 (no autocorrelation of returns² up to order k): no volatility clustering
Alternative Hypothesis Ha: Q > 0: volatility clustering
• Test SP500 returns
• At k=10, Q=3205.6, p=0 for 2005-2018
• At k=10, Q=482.2, p=0 for 2012-2018
• In both cases reject H0 and find volatility clustering!
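The same pattern can be reproduced on simulated data with time-varying variance; the GARCH-style parameter values below are illustrative, not estimates from SP500:

```r
# Simulated returns with volatility clustering: raw returns show little
# autocorrelation, squared returns show strong autocorrelation.
# Parameters (1e-6, 0.1, 0.85) are illustrative only.
set.seed(1)
n <- 2000
r <- numeric(n)
sig2 <- rep(1e-4, n)
for (t in 2:n) {
  sig2[t] <- 1e-6 + 0.1 * r[t - 1]^2 + 0.85 * sig2[t - 1]  # GARCH(1,1)-style variance
  r[t] <- sqrt(sig2[t]) * rnorm(1)
}
Box.test(r,   lag = 10, type = "Ljung-Box")  # returns are serially uncorrelated by construction
Box.test(r^2, lag = 10, type = "Ljung-Box")  # squared returns: Q large, p near 0
```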
Nonstationary vs. Stationary Time Series
• In finance and macroeconomics, we find that many time series are nonstationary.
• They do not mean-revert, and their distribution changes over time. For example, the mean or variance may be growing with time t.
• On the contrary, the distribution of a stationary y_t and its unconditional moments, such as the mean, variance, autocovariance and higher moments, do not depend on time: E(y_t) = μ for all t, etc.
• The formal definition of weak (or covariance) stationarity is as follows: y_t is stationary if

(1) E(y_t) = μ for all t (the time series has a constant mean which is the same at all times),

(2) var(y_t) < ∞,

(3) cov(y_t, y_{t−s}) = γ_s is not a function of time t (the correlation between two values of the series s periods apart depends only on s and not on t).
Nonstationary vs. Stationary Time Series
• AAPL price and EURO are trending and are non-stationary.
• AAPL returns and EURO returns are mean-reverting and are stationary.

Correlogram
• If the ACF is close to 1 for many lags without going down, the time series Y_t is non-stationary, as for the AAPL price and EURO series.
• If the ACF is strictly less than 1 and goes down, the time series is stationary and reverts to the mean, like the returns series.
Consumer Price Index and Inflation
ACF results
• When the ACF is close to one and does not go to zero quickly (exponentially), the time series Y_t is non-stationary, as for the AAPL price and EURO exchange rate series. The ACF of a non-stationary time series shows slow decay.
• On the contrary, if the ACF is strictly less than one and goes down exponentially, the time series is stationary and reverts to the mean.
• Based on their zero autocorrelation, both the AAPL and EURO returns series are stationary, and more strictly are white noise.
• The last two graphs, for the CPI and inflation, likewise show non-stationary (slow decay) and stationary behavior, respectively. The ACF for the CPI goes down from 1 to only 0.6 after 36 lags, showing high persistence. The inflation ACF looks stationary, but there is periodicity, with the ACF higher at lags 12, 24, 36, exhibiting seasonality. Peaks of the ACF are 12 months apart, and troughs are likewise 12 months apart, due to the seasonal nature of consumer prices.
Nonstationary vs. Stationary Time Series
• If a series is stationary, one can use time series models for forecasting, which mean-revert in the long run.
• For a nonstationary time series one cannot produce a long-term forecast, as the series trends without limit and does not mean-revert.
• Non-stationary time series can be differenced to make them stationary.
• Prices (non-stationary) can be differenced to produce returns (stationary).
Partial autocorrelation function (PACF)
• The partial autocorrelation function (PACF) measures the correlation between y_t and y_{t−s} after removing the effect of the intermediate lags 1, …, (s−1). This way we can find whether there is a direct relation between y_t and its lag s.
• One way to estimate the partial autocorrelation for lag s is to run a linear regression of y_t on the previous lags y_{t−1}, y_{t−2}, …, y_{t−s} and take the estimated coefficient of y_{t−s}.
• For the first partial autocorrelation (s = 1) the resulting estimated coefficient is identical to the first autocorrelation.
• Regression of a variable on its own past lags is called autoregression, to be discussed next.
• The PACF is useful in determining how many lags there are in an autoregressive process. All three price series show that only the first lag is statistically significant. For the returns, none of the PACF values is significant (except for lag 8, where it is close to the lower bound of the 95% interval). For inflation, the PACF exhibits a periodic structure indicating seasonality.
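Both properties of the PACF can be sketched on simulated data: the lag-1 PACF equals the lag-1 ACF, and the lag-s PACF is approximately the OLS coefficient on lag s. The AR(1) coefficient and seed below are illustrative:

```r
# PACF sketch on a simulated AR(1) series (phi = 0.6, illustrative)
set.seed(7)
y <- as.numeric(arima.sim(model = list(ar = 0.6), n = 2000))
p <- pacf(y, lag.max = 2, plot = FALSE)$acf      # partial autocorrelations, lags 1..2
a1 <- acf(y, lag.max = 1, plot = FALSE)$acf[2]   # lag-1 autocorrelation
# Regress y_t on its first two lags; coefficient on the 2nd lag ~ lag-2 PACF
d <- data.frame(y = y[3:2000], l1 = y[2:1999], l2 = y[1:1998])
fit <- lm(y ~ l1 + l2, data = d)
```

The regression and PACF estimates differ slightly in finite samples but agree closely for a series of this length.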
PACF
Autoregressive Process AR(1)
• The autoregressive model of order 1, written as AR(1):

Y_t = α + φY_{t−1} + ε_t    (1)

• The error term is assumed to be white noise, which means it is independent and identically distributed with a Normal distribution, zero mean and constant variance σ². So E(ε_t) = 0, var(ε_t) = σ² and cov(ε_t, ε_s) = 0 for s ≠ t.
• The disturbance term is also called the innovation.
• If the coefficient φ = 1, then we say that Y has a unit root or is a random walk.
• Random walk with drift:

Y_t = α + Y_{t−1} + ε_t

• Difference by taking ΔY_t = Y_t − Y_{t−1}:

ΔY_t = α + ρY_{t−1} + ε_t,   where ρ = φ − 1
AR(1)
• We can see that if φ = 1 (unit root) then ρ = 0 and we get:

ΔY_t = α + ε_t

Thus, if φ = 1 the process ΔY_t has constant mean α with a white noise error. The differenced process is stationary, fluctuating around its constant mean.
• In case Y has a unit root we can work with the differenced data, as the differences ΔY_t are stationary.
• In the unit root test, we work with the differenced series and test whether ρ = 0 to see if a unit root is present. The alternative to a unit root is the stationarity condition −1 < φ < 1, which is equivalent to −2 < ρ < 0.
Random Walk Model

• A random walk model is a special case of the AR(1) process with the unit root φ = 1:

Y_t = Y_{t−1} + ε_t

• This model holds for many financial series, such as stock prices and exchange rates. The
intuition behind this model is as follows: changes in Y (stock price) are unpredictable
based on past information about prices, so there are no arbitrage opportunities for
investors as only news 𝜀! drives changes in prices and this news is not known prior to
time t. This is equivalent to the weak-form efficient market hypothesis (EMH) that we
analyzed with the Q test.

• The forecast from a random walk model equals the last observed value of Y, since future movements are unpredictable and can go in any direction. The random walk model thus provides a naïve forecast, which is the same for all future horizons.
Random Walk Model
• The Random Walk with Drift is a random walk process with a constant term:

Y_t = α + Y_{t−1} + ε_t

• Here the drift parameter (the constant α) pushes the process up or down. For example, α may represent the average return an investor earns on a particular stock based on its risk, as predicted by the CAPM or multifactor models.

• In both versions the random walk process describes a stochastic trend in a non-stationary variable Y_t. The process can wander in any direction and is unpredictable.
ACF of AR(1) Process with unit root
• For AR(1), if the coefficient φ = 1 then Y has a unit root.
• If Y has a unit root (φ = 1), then the autocorrelations of Y_t with any lag Y_{t−s} will be near one and will not drop much as the lag length increases. Moreover, both the variance and the autocorrelation are functions of time; the variance in particular grows linearly with time. The ACF is close to 1 for small s and decays slowly as s increases.
• Substituting repeatedly (with Y_0 = 0):

Y_t = α + Y_{t−1} + ε_t = αt + ε_1 + ε_2 + … + ε_{t−1} + ε_t

E(Y_t) = αt

γ_0 = var(Y_t) = σ²t

ACF_s = (t − s)σ² / √(tσ² · (t − s)σ²) = ((t − s)/t)^{1/2}

• In such a case Y is non-stationary and has long memory, which means the effect of past news does not die out with time.
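The formulas above can be checked by simulating many random walk paths (here with α = 0 and σ = 1; the number of paths and the horizon are illustrative):

```r
# Monte Carlo check of var(Y_t) = sigma^2 * t and ACF_s = sqrt((t - s)/t)
# for a driftless random walk (alpha = 0, sigma = 1).
set.seed(11)
npaths <- 5000
tmax <- 100
eps <- matrix(rnorm(npaths * tmax), npaths, tmax)
Y <- t(apply(eps, 1, cumsum))      # each row is one random walk path
v100 <- var(Y[, 100])              # theory: sigma^2 * t = 100
rho_s <- cor(Y[, 100], Y[, 90])    # theory: sqrt((t - s)/t) = sqrt(90/100) ~ 0.95
```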
Representation of AR(1) process
• For simplicity assume α = 0. Substituting repeatedly:

Y_t = φ(φY_{t−2} + ε_{t−1}) + ε_t = φ²Y_{t−2} + φε_{t−1} + ε_t

Y_t = φ²(φY_{t−3} + ε_{t−2}) + φε_{t−1} + ε_t = φ³Y_{t−3} + φ²ε_{t−2} + φε_{t−1} + ε_t

Y_t = φ^t Y_0 + φ^{t−1}ε_1 + ⋯ + φε_{t−1} + ε_t

Y_t = Σ_{i=0}^{∞} φ^i ε_{t−i}

• γ_0 = var(Y_t) = var(Σ_{i=0}^{∞} φ^i ε_{t−i}) = Σ_{i=0}^{∞} φ^{2i} var(ε_{t−i}) = σ² Σ_{i=0}^{∞} φ^{2i}

• If −1 < φ < 1, summing the infinite geometric progression gives

γ_0 = var(Y_t) = σ²/(1 − φ²)

• For |φ| ≥ 1 this variance is infinite.
• Alternatively, assuming the process is stationary, so that the unconditional mean and variance are not functions of time, we can find the same result by taking the variance of both sides of the AR(1) equation:

Y_t = φY_{t−1} + ε_t

γ_0 = var(Y_t) = φ² var(Y_{t−1}) + var(ε_t)

γ_0 = φ²γ_0 + σ²

• Thus,

γ_0 = var(Y_t) = σ²/(1 − φ²)
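A quick simulation check of the variance formula, with illustrative values φ = 0.7 and σ = 1:

```r
# Check var(Y_t) = sigma^2 / (1 - phi^2) for a stationary AR(1)
set.seed(4)
phi <- 0.7
y <- as.numeric(arima.sim(model = list(ar = phi), n = 100000))
v_theory <- 1 / (1 - phi^2)   # ~ 1.96 with sigma^2 = 1
```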
ACF of AR(1) Process
• If α = 0, then E(Y_t) = E(Y_{t−s}) = 0.

γ_1 = cov(Y_t, Y_{t−1}) = E(Y_t Y_{t−1}) = E[(φY_{t−1} + ε_t)Y_{t−1}] = φE(Y_{t−1}²) + E(ε_t Y_{t−1}) = φγ_0

The last term is zero since ε_t is white noise and is independent of Y_{t−1}.

• The autocorrelation:

ρ_1 = cor(Y_t, Y_{t−1}) = γ_1/γ_0 = φ

γ_2 = E(Y_t Y_{t−2}) = φE(Y_{t−1}Y_{t−2}) = φγ_1 = φ²γ_0

γ_s = φ^s γ_0

ρ_s = cor(Y_t, Y_{t−s}) = γ_s/γ_0 = φ^s
ACF of AR(1) Process
• If −1 < φ < 1 then Y is stationary: ACF = φ^s with |φ^s| < 1, and as the lag length s increases the ACF tends to zero.
• For example, if φ = 0.5 then φ¹ = 0.5, φ² = 0.25, φ³ = 0.125, …, so as s = 1, 2, 3, … increases the ACF = φ^s tends to zero exponentially.
• When φ < 0, the autocorrelation function oscillates (switches sign) and also goes to zero rapidly. For example, if φ = −0.5 then φ¹ = −0.5, φ² = 0.25, φ³ = −0.125.
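Both the exponential decay and the oscillation under φ < 0 show up in the sample ACFs of simulated AR(1) series (coefficients ±0.5, as in the examples above):

```r
# Sample ACFs of simulated AR(1) series: decay for phi = 0.5, oscillation for phi = -0.5
set.seed(8)
y_pos <- as.numeric(arima.sim(model = list(ar =  0.5), n = 100000))
y_neg <- as.numeric(arima.sim(model = list(ar = -0.5), n = 100000))
a_pos <- acf(y_pos, lag.max = 3, plot = FALSE)$acf   # a_pos[s + 1] ~ 0.5^s
a_neg <- acf(y_neg, lag.max = 3, plot = FALSE)$acf   # a_neg[s + 1] ~ (-0.5)^s
```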
Forecasting
• Forecasting with AR(1) in equation (1) is based on substituting the previously observed value of Y, or the previously forecasted value when observed data are not available. For example, with T observations:

• 1-step forecast: Ŷ_{T+1} = α + φY_T
• 2-step forecast: Ŷ_{T+2} = α + φŶ_{T+1} = α(1 + φ) + φ²Y_T
• k-step forecast: Ŷ_{T+k} = α(1 + φ + ⋯ + φ^{k−1}) + φ^k Y_T

• If the process is stationary (|φ| < 1), the long-run forecast as k → ∞ tends to the unconditional mean, i.e. μ = E(Y_t) = α + φE(Y_{t−1}) = α + φμ, so

μ = α/(1 − φ)    (2)
• If Yt has a unit root then it exhibits stochastic-trend behavior; in other words, Yt can trend without limit in the future, and in the case of α = 0 the forecast for any horizon is the last available observation YT, which is called a naïve forecast.
• There is no long-run forecast for a unit-root process, since the mean in equation (2) does not exist. The uncertainty (standard deviation) of the forecast increases with the square root of the forecast horizon. Thus, the naïve forecast has large uncertainty.
• If Yt has a unit root, then differencing it by taking ΔY_t = Y_t − Y_{t−1} (or a growth-rate transformation) will often result in ΔY_t being stationary.
• If differencing the unit-root process once yields a stationary time series, the original process Yt is referred to as difference stationary, or integrated of order one, I(1). If a series needs to be differenced twice to obtain stationarity, it is called integrated of order two, I(2).
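The forecast recursion above can be sketched directly; α, φ and the last observation Y_T below are assumed values:

```r
# AR(1) k-step forecast: Yhat_{T+k} = alpha + phi * Yhat_{T+k-1}
# Assumed values: alpha = 1, phi = 0.8, last observation Y_T = 10
alpha <- 1
phi <- 0.8
yT <- 10
fc <- numeric(50)
prev <- yT
for (k in 1:50) {
  prev <- alpha + phi * prev
  fc[k] <- prev
}
mu <- alpha / (1 - phi)   # unconditional mean from equation (2): here 5
```

The forecast path converges geometrically from Y_T toward μ, which is the mean-reversion property of a stationary AR(1).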
Autoregressive Process AR(p)
• The autoregressive process with p lags is called AR(p) and is written as:

Y_t = α + φ_1Y_{t−1} + φ_2Y_{t−2} + ⋯ + φ_pY_{t−p} + ε_t,   ε_t ~ iid N(0, σ²)

• The model captures dependence of the time series variable on its past p lags (e.g. current inflation depends on the previous 12 months of inflation).
• A process may be identified as AR(p) if its ACF decays exponentially to zero (possibly with oscillations switching sign) and its PACF has a significant spike at lag p but not beyond lag p.
• An AR(p) process can be estimated using linear regression (OLS) or Maximum Likelihood.
• The number of lags p is selected by minimizing the Akaike or Schwarz information criterion.
Maximum Likelihood
• Time series models are often estimated by maximizing the log-likelihood function of the observed data.
• If we assume a Normal distribution for the AR(p) process, the log-likelihood function (up to an additive constant) is given by:

log L(α, φ_1, …, φ_p, σ²) = −(1/2) Σ_t [ log(σ²) + (y_t − α − φ_1y_{t−1} − ⋯ − φ_p y_{t−p})²/σ² ]

• We maximize this function with respect to the AR parameters α, φ_1, …, φ_p, σ². Models that achieve a higher value of the log-likelihood are preferred.
• But if models have different numbers of parameters, there must be some adjustment or penalty for the number of estimated parameters in order to avoid overfitting.
Model Choice using Information Criteria
• The Akaike (AIC) and Schwarz (SIC) information criteria are computed from the negative of the log-likelihood with an added penalty for the number of parameters included in the model:

AIC(p) = −2 log likelihood + 2k/T
SIC(p) = −2 log likelihood + 2k log(T)/T

• where k is the number of estimated parameters, T is the number of observations, and the log-likelihood plays a role similar to the sum of squared residuals in a regression model.
• The smaller the information criterion, the better the model.
• Moreover, the SIC imposes a higher penalty for extra parameters, so based on this criterion a simpler (more parsimonious) model may be selected compared to other information criteria.
Example: Selection of p for AR(p)
• Below are AR(p) models for monthly inflation estimated by Maximum Likelihood. Minimizing AIC gives the optimal p = 11 lags, and minimizing SIC gives the optimal p = 2 lags:

p       1     2      3     4     5     6     7     8     9     10    11     12
k       3     4      5     6     7     8     9     10    11    12    13     14
AIC(p)  147.8 135.2  136.4 138.3 134.3 135.1 135.7 129.3 130.7 127.8 126.9* 127.9
SIC(p)  158.0 148.8* 153.5 158.8 158.2 162.3 166.4 163.3 168.2 168.7 171.2  175.6

• Since overfitted models often result in poor forecasting, we select the simpler AR(2) model based on the stricter SIC criterion.
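A sketch of the same selection procedure on simulated data, using `stats::arima` (which fits by maximum likelihood) with the generic `AIC`/`BIC` functions; note that these R functions use the unnormalized form −2 log L + penalty, and the AR(2) coefficients below are illustrative:

```r
# Lag selection sketch: fit AR(p) for p = 1..6 and compare information criteria.
# BIC (R's name for the SIC-type criterion) penalizes extra parameters more heavily.
set.seed(2)
y <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 1000)
fits <- lapply(1:6, function(p) arima(y, order = c(p, 0, 0)))
aic <- sapply(fits, AIC)
bic <- sapply(fits, BIC)
best_bic <- which.min(bic)   # BIC's heavier penalty favors a parsimonious model
```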
Autoregressive Process AR(p) in differenced form
• An alternative way of writing the AR(p) model is in differenced form, with ΔY_t = Y_t − Y_{t−1}:

ΔY_t = α + ρY_{t−1} + γ_1ΔY_{t−1} + ⋯ + γ_{p−1}ΔY_{t−p+1} + ε_t

where the coefficients in this alternative representation are functions of the original coefficients φ_1, φ_2, …, φ_p. For example, ρ = φ_1 + φ_2 + ⋯ + φ_p − 1.

• Similar to AR(1), here ρ = 0 implies that the time series Y contains a unit root and exhibits a stochastic trend. The stationarity condition is given by −2 < ρ < 0.
• We can see from the differenced equation that if a time series contains a unit root (ρ = 0), then a regression model involving only the differenced terms ΔY is appropriate, as the term Y_{t−1} drops out of the equation.
• In case of a unit root one can difference the data (by creating ΔY_t) to obtain a regression with only stationary variables.
• We need to test the differenced series ΔY_t for stationarity to make sure it is indeed stationary; otherwise we difference one more time until we obtain a stationary variable.
• The differenced form is preferable, as this specification is less likely to run into multicollinearity problems when the right-hand-side variables are highly correlated. In financial and macroeconomic time series, Y is highly correlated with its own lags but ΔY is not.
Autoregressive Processes
Autoregressive process with time trend, t = 1, 2, 3, …:

Y_t = α + δt + φ_1Y_{t−1} + φ_2Y_{t−2} + ⋯ + φ_pY_{t−p} + ε_t

• Non-stationarity in a unit-root process shows up as stochastic-trend behavior.
• Non-stationarity may also be due to a deterministic trend. For example, the mean increases with time in the case of population growth. When Y has no unit root and the stationarity condition −2 < ρ < 0 holds after incorporating a time trend, we call the series trend stationary.

The autoregressive process can be written in differenced form:

ΔY_t = α + δt + ρY_{t−1} + γ_1ΔY_{t−1} + ⋯ + γ_{p−1}ΔY_{t−p+1} + ε_t
Unit Root Test
• There are 3 cases for the specification of the AR(p) model above:
Case 1: No constant term (no drift) and no linear trend in the model
Case 2: Constant term (drift) but no linear trend
Case 3: Constant term (drift) and linear trend

• The null and alternative hypotheses for the unit root test:

H0: ρ = 0 (UNIT ROOT)
Ha: ρ < 0 (stationarity)
Augmented Dickey-Fuller (ADF) test
• When we run a linear regression for the AR(p) differenced equation we obtain parameter estimates, standard errors, t-statistics and p-values. While it is fine to use t-statistics and p-values for testing the significance of the drift α, the trend δ, and the coefficients γ_1, …, γ_{p−1} on lags of ΔY, it turns out that the t-ratio computed for the coefficient ρ does not follow a t-distribution, and the p-value reported in the Least Squares results is incorrect.
• The t-statistic of ρ under the null hypothesis of a unit root follows a DF-t distribution, and we must perform the test using different critical values and p-values.
• The Augmented Dickey-Fuller (ADF) test uses the t-statistic from the regression and compares it to a DF-t critical value that can be found in statistical packages.
• Example of a unit root test with the AR(1) model:

o Run the regression ΔY_t = α + ρY_{t−1} + ε_t
o Compare the t-ratio of ρ̂ to the corresponding Dickey-Fuller critical value, or use the p-value.
o If t-ratio > critical value or p-value > significance level, do not reject H0 of a unit root (find that Y is nonstationary).
o If t-ratio < critical value or p-value ≤ significance level, reject H0 of a unit root (find that Y is stationary).
• If the unit-root hypothesis ρ = 0 has not been rejected, the conclusion is that at least one unit root exists, and we test the differenced series ΔY for a possible second unit root, continuing until we obtain a stationary process.
• The critical values of the ADF test depend on the model specification, with or without constant and trend: Cases 1, 2, 3.
• Many unit-root packages select the number of lags automatically using the AIC or SIC information criteria.
Unit Root Test Example
• Using p-values we find that the price of AAPL is non-stationary (unit root) and the return is stationary.
• The tseries library below automatically incorporates Case 3 (constant and a linear trend):
library(tseries)
adf.test(price_AAPL)
Augmented Dickey-Fuller Test
data: price_AAPL
Dickey-Fuller = -1.2098, Lag order = 10, p-value = 0.9049
alternative hypothesis: stationary

adf.test(ret_AAPL)
Augmented Dickey-Fuller Test
data: ret_AAPL
Dickey-Fuller = -10.653, Lag order = 10, p-value = 0.01
alternative hypothesis: stationary
Case 2
• Case 2: with drift and no trend, which is type="c"
library(fUnitRoots)
adfTest(coredata(price_AAPL),lags=2, type="c")
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 2
STATISTIC:
Dickey-Fuller: 0.3215
P VALUE: 0.9783
adfTest(coredata(ret_AAPL),lags=2, type="c")
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 2
STATISTIC:
Dickey-Fuller: -19.9323
P VALUE: 0.01
Automatic lag selection (BIC, same as SIC)
library(urca)
adf=ur.df(y=price_AAPL, type = "drift", selectlags = "BIC")
summary(adf)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Value of test-statistic is: 0.2656 2.4133
Critical values for test statistics:
      1pct  5pct 10pct
tau2 -3.43 -2.86 -2.57

adf=ur.df(y=ret_AAPL, type = "drift", selectlags = "BIC")
summary(adf)
Value of test-statistic is: -23.8644 284.7548
• For the price of AAPL, t-ratio > critical value, and in fact the t-ratio is positive (0.2656 > −2.57), so we cannot reject the null of a unit root.
• For the returns of AAPL we reject the null hypothesis, as t-ratio < critical value (−23.86 < −2.57).
• Thus, we confirm that the AAPL price is non-stationary and has a unit root, while the returns are stationary.
ADF Unit Root Test Results

                t-ratio   10% critical value   DF p-value   Unit Root
SP500           -0.276    -2.57                0.92         Yes
Nikkei 225      -1.537    -2.57                0.49         Yes
EURO/Dollar     -1.940    -2.57                0.34         Yes
Unemployment    -1.060    -2.57                0.66         Yes
1 Year T-Bond   -2.485    -2.57                0.13         Yes
Inflation       -9.794    -2.57                <0.01        No

All series above except inflation are nonstationary and have a unit root.
Notes: Data for SP500, Nikkei 225 and EURO/Dollar are daily between 2014-01-02 and 2018-08-10; Unemployment, Inflation and 1 Year T-Bond Yield are monthly between 2000-01-01 and 2018-08-01.
KPSS test
• In addition to the ADF test there are other popular tests, such as the Phillips-Perron and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests.
• Both are available in the urca library. Unlike the other tests, in the KPSS test the null hypothesis is that the data are stationary and the alternative is that there is a unit root. If the test statistic is greater than the critical value, we reject the null hypothesis (and find the series is not stationary). The KPSS results below confirm that the inflation series is stationary, as the test statistic is not in the 10% tail.
kpss=ur.kpss(y=infl, type = c("mu"), lags = c("short"))
summary(kpss)

#######################
# KPSS Unit Root Test #
#######################
Test is of type: mu with 4 lags.
Value of test-statistic is: 0.1613
Critical value for a significance level of:
10pct 5pct 2.5pct 1pct
critical values 0.347 0.463 0.574 0.739
Moving Average processes MA(q)
• The moving average process is based on past errors (news) instead of past observations of Y. The moving average process of order q is called MA(q) and is given by

Y_t = α + ε_t + θ_1ε_{t−1} + ⋯ + θ_qε_{t−q}

• Here, as in the autoregressive process, the error term is white noise with a Normal distribution: ε_t ~ iid N(0, σ²).
• The moving average process MA(q) is stationary, since it is a linear combination of white noise variables.
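Stationarity of the MA process can be illustrated by simulation. The check below also uses the standard MA(1) results var(Y_t) = (1 + θ²)σ² and ρ_1 = θ/(1 + θ²), which are stated here for the check rather than derived in the slides; θ = 0.6 is illustrative:

```r
# MA(1) simulation: stationary for any finite theta (here theta = 0.6, sigma = 1)
set.seed(5)
theta <- 0.6
y <- as.numeric(arima.sim(model = list(ma = theta), n = 100000))
a1 <- acf(y, lag.max = 1, plot = FALSE)$acf[2]   # theory: theta / (1 + theta^2)
```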
AR(∞) Representation
• It is possible to write any stationary AR(p) model as an MA(∞) model. For example, for a stationary AR(1) model in zero-mean form (subtract μ defined in (2) from both sides of equation (1)):

Y_t − μ = φ(Y_{t−1} − μ) + ε_t = ε_t + φε_{t−1} + φ²ε_{t−2} + ⋯

• We can also write any invertible MA(q) process as an AR(∞) process. For example, the MA(1) process Y_t = α + ε_t + θε_{t−1} is invertible if |θ| < 1. In this case more recent observations have higher weight than more distant ones, and the equation can be written as

ε_t = Y_t − α − θε_{t−1} = (Y_t − α) − θ(Y_{t−1} − α) + θ²(Y_{t−2} − α) − θ³(Y_{t−3} − α) + ⋯
• One can use the autocorrelation functions (ACF and PACF) to identify an MA(q) process; the pattern is the opposite of that of AR(p).
• A process may be identified as MA(q) if its PACF decays exponentially to zero (possibly with oscillations switching sign) and its ACF has a significant spike at lag q but not beyond lag q.
• A more formal way is to use information criteria for selecting the number of lags in the MA process, as for the AR process.
Autoregressive and Moving Average Process ARMA(p,q)
• The ARMA(1,1) model captures an infinite number of lags of an AR(p) or MA(q) process with only a few parameters to estimate:

Y_t = α + φ_1Y_{t−1} + ε_t + θ_1ε_{t−1}

• ARMA(p,q) combines the AR(p) process with a moving average MA(q) process for the past q lags of the news ε_t:

Y_t = α + φ_1Y_{t−1} + ⋯ + φ_pY_{t−p} + ε_t + θ_1ε_{t−1} + ⋯ + θ_qε_{t−q}

• It can be estimated using maximum likelihood.
• While this model has richer time series properties, tests of a unit root are similar, with φ = 1 implying a unit root and |φ| < 1 implying stationarity. This is because an MA process with a limited number of lags is stationary.
• If Y_t modeled by an ARMA process is non-stationary with a unit root, it can be differenced and written as an Autoregressive Integrated Moving Average ARIMA(p,d,q) model. Here d is the number of times Y_t needs to be differenced to become stationary.
• If only one difference is necessary (d = 1), the ARIMA(p,1,q) model is an ARMA(p,q) model for ΔY_t. If the resulting ARMA model for ΔY_t is stationary, it can be used for further modeling and forecasting.
Exponential Smoothing

• The exponential smoothing model is widely used in practice and is related to the ARIMA(0,1,1) model with no constant term and |θ| < 1. We can re-write the ARIMA(0,1,1) model as AR(∞):
• ARIMA(0,1,1): Y_t − Y_{t−1} = ε_t + θε_{t−1}
• AR(∞): Y_t = (1 + θ)(Y_{t−1} − θY_{t−2} + θ²Y_{t−3} − ⋯) + ε_t

• The autoregressive process above has exponentially decaying weights on the previous observations Y_{t−1}, Y_{t−2}, Y_{t−3}, with the largest weight assigned to the most recent observation. The one-day forecast from this model is

Ŷ_{t+1} = (1 + θ)(Y_t − θY_{t−1} + θ²Y_{t−2} − ⋯) = (1 + θ)Y_t − θŶ_t

• The exponential smoothing forecast is thus a weighted average of the most recent observation and the most recent forecast. Writing λ = −θ, the model assigns weight λ to the previous period's forecast and (1 − λ) to the previous period's observation. λ is a smoothing parameter that is restricted to take values between 0 and 1.
Exponential Smoothing
• Ŷ_{t+1} = λŶ_t + (1 − λ)Y_t
• The exponential smoothing model is non-stationary, and the forecast is the same for any horizon k > 1: a naïve forecast.
• For example, for k = 2 we substitute the forecasted value for the observed value of Y_{t+1} and get: Ŷ_{t+2} = λŶ_{t+1} + (1 − λ)Ŷ_{t+1} = Ŷ_{t+1}.
• The uncertainty of this forecast, measured by the standard deviation, grows with the square root of the time horizon.
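A minimal sketch of the recursion above, with an assumed λ = 0.8 and a short illustrative series (the forecast is initialized at the first observation):

```r
# Exponential smoothing: Yhat_{t+1} = lambda * Yhat_t + (1 - lambda) * Y_t
exp_smooth <- function(y, lambda, init = y[1]) {
  f <- numeric(length(y) + 1)
  f[1] <- init
  for (t in seq_along(y)) {
    f[t + 1] <- lambda * f[t] + (1 - lambda) * y[t]
  }
  f   # f[t + 1] is the forecast made at time t for time t + 1
}
fc <- exp_smooth(c(10, 12, 11, 13), lambda = 0.8)
```

A larger λ puts more weight on the running forecast and less on the latest observation, producing a smoother forecast path.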
Up Next: Time-Varying Volatility
Annualized Volatility Computation
• A common assumption is that returns are not correlated over time, i.e. that the weak form of the efficient market hypothesis approximately holds and returns are not predictable from past returns.
• In this case the variance over n days equals n times the variance of one day. In particular, to compute the variance for one year, assuming there are n = 252 trading days in a year in the US market, we get annualized variance = 252 × daily variance. Taking the square root of the variance gives the standard deviation, or volatility, of an asset:
• Annualized Volatility = √252 × daily volatility
Historical Volatility
• Historical volatility is based on a rolling standard deviation of returns r_t using a k-day window. As before, we use log returns computed from the adjusted stock price p_t, in percent:

r_t = 100 log(p_t/p_{t−1}) = 100(log p_t − log p_{t−1})

• Rolling standard deviation of returns using a k-day window (ignoring the mean of daily returns, μ):

σ_t = √(252 Σ_{i=t−k+1}^{t} r_i²/k)

• If k is small, volatility does not change much within the window, and the estimated volatility reflects only the most recent news.
• If k is large, news is averaged over a long period and recent news becomes less important.
• A large k gives a smoother estimate of volatility.
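A rolling-window sketch of this computation (ignoring the mean, as above); the simulated returns, window length k = 60 and seed are illustrative:

```r
# Rolling historical volatility: k-day window of squared returns, annualized
# with sqrt(252). Simulated daily returns in percent (sd = 1, i.e. 1% daily vol).
set.seed(3)
r <- rnorm(1000, sd = 1)
k <- 60
vol <- sapply(k:length(r), function(t) sqrt(252 * mean(r[(t - k + 1):t]^2)))
```

With constant true volatility the estimates hover around √252 ≈ 15.9 (percent, annualized); on real data the series drifts with the volatility regime.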
Exponential Smoothing EWMA
• Exponential smoothing, or exponentially weighted moving average (EWMA), model:

σ_t² = λσ_{t−1}² + (1 − λ)(r_{t−1} − μ)²

• This is like the exponential smoothing equation Ŷ_{t+1} = λŶ_t + (1 − λ)Y_t applied to the variance, with Y_t → (r_{t−1} − μ)² and Ŷ_t → σ_{t−1}².
• It is not a stable model, but it is very useful for producing a 1-day forecast.
• A larger λ implies a smoother volatility forecast for time t, as a smaller weight (1 − λ) is given to the news.
• λ = .94 for the S&P 500 was recommended by RiskMetrics.
• λ can be estimated using Maximum Likelihood Estimation (MLE).
EWMA Volatility Model
Example: Updating EWMA Volatility
Suppose λ = .94, μ = 0 and the model is:

σ_t² = .06 r_{t−1}² + .94 σ_{t−1}²

Today's annualized volatility is 20% and the market return is −1%. Find the forecast of tomorrow's volatility from this model.
Answer: sqrt(252*(.06*(-.01)^2+.94*.2^2/252)) = 19.78%.
• First convert volatilities to daily variances.
• Today's daily variance is .2^2/252 = .0001587.
• Today's squared return is (-.01)^2 = .0001.
• Tomorrow's variance is .06*.0001+.94*.0001587 = .0001552.
• Tomorrow's annualized volatility is sqrt(.0001552*252) = 0.1978, or 19.78%.
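The same arithmetic carried out in R (λ = .94, μ = 0, as in the example):

```r
# EWMA update from the worked example
lambda <- 0.94
sig2_today <- 0.20^2 / 252                # today's daily variance from 20% annual vol
r_today <- -0.01                          # today's return, -1%
sig2_tom <- (1 - lambda) * r_today^2 + lambda * sig2_today
ann_vol <- sqrt(252 * sig2_tom)           # tomorrow's annualized volatility, ~19.78%
```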
EWMA estimates of SP500 index volatility
Two smoothing parameters: 0.94 (in black) and 0.99 (in red).
Larger 𝜆 implies smoother volatility forecast
Returns with time-varying volatility
• Let the time series Y_t represent the returns r_t with a variance σ_t² that is time dependent and exhibits volatility clustering.
• Two equations are involved in the modelling: a returns equation and a variance equation. In the simplest case the returns equation is assumed to have no explanatory variables and no AR or ARMA dependence, and includes only a constant term μ and an error term u_t representing news about the asset price (as expected under the EMH).
• The focus is on the evolution of the variance, given in the second equation:

Y_t = r_t = μ + u_t
σ_t² = a time series process discussed below

• Often the mean is omitted and μ = 0 is assumed if the focus is purely on the variance.
ARCH Model
• The Autoregressive Conditional Heteroscedasticity (ARCH) model was introduced
by Engle (1982). It produces a forecast of the variance at time t based on information
captured by squared return innovations (u_t² = (r_t − μ)²) over the past p lags. The
weights on the previous squared returns are estimated within the model.
• “Conditional” means that the prediction for time t is based on previous information
at time t−1, thus the forecast is conditional on information available before time t.
• “Heteroscedasticity” means changing variance.

• ARCH(p) model:

σ_t² = ω + α₁ u_{t−1}² + α₂ u_{t−2}² + … + α_p u_{t−p}²
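To see why this squared-return feedback generates volatility clustering, one can simulate an ARCH(1) path. This is an illustrative sketch only; ω and α are made-up parameters, not estimates from the course data.

```python
import numpy as np

rng = np.random.default_rng(0)
omega, alpha = 1e-5, 0.5            # illustrative ARCH(1) parameters
n = 1000
u = np.zeros(n)                     # return innovations
sig2 = np.zeros(n)                  # conditional variances
sig2[0] = omega / (1 - alpha)       # start at the long-run variance
u[0] = np.sqrt(sig2[0]) * rng.standard_normal()
for t in range(1, n):
    sig2[t] = omega + alpha * u[t - 1] ** 2   # a large shock raises next-day variance
    u[t] = np.sqrt(sig2[t]) * rng.standard_normal()
```

Plotting u shows alternating quiet and turbulent spells even though the underlying shocks are i.i.d. Normal.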
GARCH
• GARCH, or the Generalized ARCH model, was introduced by
Bollerslev (1986).
• The forecast of the variance at time t is modelled as a weighted average
of the long-run average variance, today’s variance forecast (σ_{t−1}²) and
today’s squared news from the returns (u_{t−1}² = (r_{t−1} − μ)²).
• The GARCH model is an extension of the ARCH and EWMA models. It is
more flexible than ARCH: with three parameters it allows
modelling of an ARCH with an infinite number of lags. It also allows mean
reversion, unlike the EWMA model.
GARCH(1,1) Volatility Model
• Generalized Autoregressive Conditional Heteroscedasticity (GARCH(1,1)):

σ_t² = ω + α u_{t−1}² + β σ_{t−1}²

• The main properties the GARCH coefficients should satisfy for a good model are:
• Positivity: ω > 0, α ≥ 0, β ≥ 0
• Stability or mean reversion: (α + β) < 1

• The GARCH parameters ω, α, β can be estimated using the MLE.

• The GARCH model does not have an asymmetric (leverage) effect, so its extensions
such as GJR-GARCH or EGARCH are frequently used.
GARCH(1,1) Volatility Model
• Example: Updating GARCH Volatility
Suppose the GARCH(1,1) model is (assuming μ = 0):

σ_t² = .00001 + .05 r_{t−1}² + .9 σ_{t−1}²

Today’s annualized volatility is 20% and the market return is −3%. Find the forecast of tomorrow’s
volatility from this model.
Answer: sqrt(252*(.00001 + .05*(-.03)^2 + .9*.2^2/252)) = 22.3%.
• First convert volatilities to daily variances.
• Today’s daily variance is .2^2/252 = .0001587.
• Today’s squared return is (-.03)^2 = .0009.
• Tomorrow’s variance is .00001 + .05*.0009 + .9*.0001587 = .0001978.
• Tomorrow’s annualized volatility is sqrt(.0001978*252) = .223 or 22.3%.
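As with the EWMA case, the GARCH update can be verified numerically (an illustrative Python check using the example's parameter values):

```python
omega, alpha, beta = 1e-5, 0.05, 0.9    # parameters from the example

var_today = 0.20 ** 2 / 252             # 20% annualized vol -> daily variance
r2_today = (-0.03) ** 2                 # today's squared return (mu = 0)

var_next = omega + alpha * r2_today + beta * var_today   # GARCH(1,1) update
vol_next = (252 * var_next) ** 0.5                       # re-annualize

print(round(100 * vol_next, 1))         # 22.3 (percent)
```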
GARCH Model as ARMA Representation

• The GARCH equation can be represented as an ARMA model with the squared
news from the returns, u_t², as the dependent variable. In particular, the
GARCH(1,1) model can be written as an ARMA(1,1) model:

u_t² = ω + (α + β) u_{t−1}² − β z_{t−1} + z_t

where z_t = u_t² − σ_t². As we can see, the autoregressive coefficient
(α + β) measures the persistence of the model, and this process is
stationary when (α + β) < 1.
Long Run (Unconditional) Volatility
• If the GARCH model is stable, satisfying the condition (α + β) < 1, then the long-run forecast of the variance is given by:

σ² = ω / (1 − α − β)

• This result can be obtained by taking the expectation of the GARCH(1,1) equation and noting that σ² = E(u_t²) = E(u_{t−1}²)
and E(z_t) = E(z_{t−1}) = 0. The variance σ² above is also called unconditional as it does not depend on information at time t.

• For the estimated GARCH(1,1) model σ_t² = .00001 + .05 r_{t−1}² + .9 σ_{t−1}²:

The long-run (unconditional) variance: σ² = .00001/(1 − .05 − .9) = .0002

The unconditional annualized volatility: sqrt(.0002*252) = .224 or 22.4%
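The long-run figures quoted above follow directly from the formula (a quick illustrative check in Python):

```python
omega, alpha, beta = 1e-5, 0.05, 0.9

lr_var = omega / (1 - alpha - beta)     # long-run (unconditional) daily variance
lr_vol = (252 * lr_var) ** 0.5          # annualized

print(round(lr_var, 6), round(100 * lr_vol, 1))   # 0.0002 22.4
```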
Forecasting

• The out-of-sample 2-day volatility forecast based on information up to time t is given by the following
formula:

σ²_{t+2|t} = ω + α E(u²_{t+1}) + β σ²_{t+1|t} = ω + (α + β) σ²_{t+1|t} = σ² + (α + β)(σ²_{t+1|t} − σ²).

• Here E(u²_{t+1}) is the conditional forecast (expectation) of u²_{t+1} based on information at time t. This is the same as
the conditional variance forecast σ²_{t+1|t}. Instead of ω we plug in its expression through the long-run unconditional
variance: σ² = ω/(1 − α − β) → ω = σ² − (α + β) σ².

• Using a similar re-arrangement, we can derive a formula for the k-day forecast conditional on time t:

σ²_{t+k|t} = σ² + (α + β)^{k−1} (σ²_{t+1|t} − σ²).

• When k is large the forecast converges to the long-run unconditional variance σ² if the model is stationary and
(α + β) < 1.
• In the special case of (α + β) = 1, any future forecast k steps ahead from time t is σ²_{t+k|t} = σ²_{t+1|t}. Such a model does
not have long-run mean reversion; it behaves like a random walk.
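The k-day formula and its convergence can be checked numerically. This is an illustrative Python sketch using the σ²_{t+1|t} = .0001978 and σ² = .0002 values from the running example; the function name is my own.

```python
def garch_var_forecast(var_1, var_lr, persistence, k):
    """k-day-ahead conditional variance forecast for GARCH(1,1):
    sigma^2_{t+k|t} = sigma^2 + (alpha+beta)^(k-1) * (sigma^2_{t+1|t} - sigma^2)."""
    return var_lr + persistence ** (k - 1) * (var_1 - var_lr)

var_1, var_lr, ab = 0.0001978, 0.0002, 0.95

print(garch_var_forecast(var_1, var_lr, ab, 1))              # equals var_1
print(round(garch_var_forecast(var_1, var_lr, ab, 500), 7))  # ~0.0002: converged
```

With persistence (α + β) = 1 the same function returns σ²_{t+1|t} for every horizon, reproducing the random-walk special case.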
GARCH(p,q) Model

• The GARCH(p,q) model includes p lags for the ARCH term (past squared
returns) and q lags for the GARCH term (past conditional variances):

σ_t² = ω + α₁ u_{t−1}² + … + α_p u_{t−p}² + β₁ σ_{t−1}² + … + β_q σ_{t−q}²

• In order to find the optimal number of lags when fitting the model to data,
information criteria are commonly used.
Maximum Likelihood
• With a Normal distribution for modelling returns, the procedure involves maximizing the log-likelihood
function given by

log Likelihood(ω, α, β) ∝ −(1/2) Σ_t [ log(σ_t²) + (r_t − μ)²/σ_t² ].

• We maximize this function with respect to the GARCH parameters ω, α, β that enter the σ_t² equation as
specified in the GARCH model.
• The Maximum Likelihood method maximizes the probability of observing the actual returns, using the
probability density function for returns under a specific distributional assumption, such as the Normal or
t distribution.
• Once the parameter estimates are found at the optimum, the asymptotic standard errors of the
parameters are calculated using the assumption that, for a large number of observations, the distribution
of each parameter converges to Normal.
• This method is also called Quasi Maximum Likelihood since, even though the returns themselves
may not be Normal, the parameters ω, α, β converge to Normal for large samples.
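The Gaussian (quasi-)likelihood can be sketched directly. This is an illustrative Python sketch, with μ fixed at 0 and the variance recursion seeded with the sample variance (a common but not unique convention); in practice one would minimize this function with a numerical optimizer such as scipy.optimize.minimize.

```python
import numpy as np

def garch_neg_loglik(params, r):
    """Negative Gaussian log-likelihood (up to a constant) for GARCH(1,1), mu = 0.
    r must be a float NumPy array of returns; params = (omega, alpha, beta)."""
    omega, alpha, beta = params
    u2 = r ** 2
    sig2 = np.empty_like(r)
    sig2[0] = r.var()                   # seed the variance recursion
    for t in range(1, len(r)):
        sig2[t] = omega + alpha * u2[t - 1] + beta * sig2[t - 1]
    # -log L up to a constant: 0.5 * sum(log sigma_t^2 + u_t^2 / sigma_t^2)
    return 0.5 * np.sum(np.log(sig2) + u2 / sig2)
```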
Estimating and Testing GARCH for SP500
returns
Results for GARCH (1,1) Model with Normal distribution

library(fGarch)
m2=garchFit(~garch(1,1), data=y, trace=F) # Fit a GARCH(1,1) model
summary(m2)

Results for GARCH (1,1) Model


Estimate Std. Error t-value Pr(>|t|)
mu 0.060911 0.012611 4.830 1.36e-06 ***
omega 0.021526 0.003388 6.354 2.10e-10 ***
alpha1 0.117153 0.011566 10.129 < 2e-16 ***
beta1 0.862849 0.012369 69.759 < 2e-16 ***
------
Information Criterion Statistics:
AIC BIC SIC HQIC
2.593567 2.600760 2.593565 2.596137
GARCH (1,1) with Normal Distribution Results
• All GARCH parameters are positive and the model is stable since
alpha+beta=.117+.863=.98<1
• Schwarz Information Criterion SIC =2.594.
• When GARCH models are estimated the method is based on maximizing the log likelihood
function for observing actual returns in the data. Models which achieve the highest value of
the log likelihood are preferred.
• But if models have different numbers of parameters, there needs to be some adjustment or
penalty for the number of estimated parameters in order to avoid overfitting. The information
criteria are computed from the negative of the log likelihood with an added penalty for the number
of parameters included in the model.
• The smaller the information criterion, the better the model. Moreover, the SIC criterion
imposes a higher penalty for extra parameters than the AIC does. Based on the minimum SIC
criterion, a simpler (more parsimonious) model may be selected.
GARCH (1,1) with t-distribution Results
m3=garchFit(~garch(1,0), data = y, trace=F) # Fit an ARCH(1) model
m5=garchFit(~garch(1,1), data = y, trace=F, cond.dist="std") # Fit
GARCH(1,1) with t-distribution

Results for GARCH (1,1) Model with t-distribution


Estimate Std. Error t value Pr(>|t|)
mu 0.07663 0.01135 6.749 1.48e-11 ***
omega 0.01265 0.00348 3.634 0.000279 ***
alpha1 0.12077 0.01492 8.095 6.66e-16 ***
beta1 0.87709 0.01381 63.531 < 2e-16 ***
shape 5.22991 0.50649 10.326 < 2e-16 ***

Information Criterion Statistics:


AIC BIC SIC HQIC
2.536763 2.545753 2.536758 2.539975
GARCH (1,1) with t-distribution Results
• The parameter shape measures the degrees of freedom of the fitted t
distribution.
• Degrees of freedom = 5.2, thus the distribution of the residuals is fat-tailed.
• The smaller the number of degrees of freedom, the more fat-tailed the
distribution.
• Using the minimum of the Schwarz Information Criterion, the GARCH model with t
distribution is selected over the model with Normal distribution (here
SIC = 2.537).
SP500 GARCH (1,1) Volatility
• Plot the annualized
GARCH(1,1) volatility
over time:
m2@sigma.t*sqrt(252)

We observe the highest peak of
volatility at the end of 2008
and the next peak in August 2011,
when the US debt-ceiling standoff
and the downgrade of the US Treasury
credit rating happened.
VLAB GARCH volatility: http://vlab.stern.nyu.edu

• https://vlab.stern.nyu.edu/analysis/VOL.SPX%3AIND-R.GARCH
Model Diagnostic
• Finally, in order to make sure that the GARCH model describes the financial data well and captures the time-
varying volatility in volatility clusters, we create a new variable:

standardized residuals: ε_t = (r_t − μ)/σ_t

• They should not be autocorrelated and should not exhibit volatility clustering. Use the Ljung-Box Q test
(output from garchFit).

Standardised Residuals Tests for GARCH (1,1):


Statistic p-Value
Ljung-Box Test R Q(10) 26.74455 0.002857889
Ljung-Box Test R^2 Q(10) 12.02807 0.2831826
• The p-value for R^2 is above 5% (or any reasonable significance level). Thus, there is no volatility clustering in
the standardized residuals.
• The p-value for R is much smaller than 5%. Thus, there is autocorrelation in the residuals, which may be analyzed
further using an autoregressive model; alternatively, there may be other important factors omitted from the returns
equation.
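The Ljung-Box statistic reported in the garchFit output can be reproduced from first principles. This is an illustrative Python sketch of the standard formula Q = n(n+2) Σ_{k=1}^{m} ρ̂_k²/(n−k), compared under H0 to a chi-squared distribution with m degrees of freedom.

```python
import numpy as np

def ljung_box_q(x, m):
    """Ljung-Box Q statistic for the first m sample autocorrelations of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom   # lag-k sample autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q
```

A strongly autocorrelated series gives a large Q (and a tiny p-value), while i.i.d. noise gives Q near its expected value m.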
Normality assumption
• Are there fat tails compared to the SP500 returns themselves?
library(fBasics)
basicStats(y) #<== Compute descriptive statistics of returns
Skewness -0.375528
Kurtosis 11.951403
basicStats(resi) #<== Compute descriptive statistics of Standardized residuals
Skewness -0.557790

Kurtosis 1.923420

• The fourth moment, measured by excess kurtosis, goes down from about 12 for the returns to 1.9 for the standardized
residuals (the kurtosis itself goes down from 12+3=15 to 1.9+3=4.9). The negative skewness (asymmetry) remains in the residuals.
• Thus, the GARCH model explains some of the extremes found in the returns data as measured by kurtosis. After computing the
standardized residuals (filtering out GARCH, or “de-garching”), the kurtosis is reduced roughly threefold but is still above the Normal
distribution kurtosis of 3, so the residuals are still fat-tailed. One could fit a t-distribution to get a better fit for the standardized
residuals.
• As for the negative skewness, a better model for capturing this phenomenon (also related to risk aversion) is an asymmetric
volatility model that forecasts higher volatility after negative news than after positive news. The GJR-GARCH and EGARCH
models are examples.
Forecasting k-days ahead
• If the GARCH model is stable, volatility should converge in the long run to its
unconditional level.
• If the current level of volatility is below the unconditional level, the GARCH prediction is
for volatility to go up (see the graph for the 1-year forecast on 2018/7/30). Otherwise,
it will go down.
• SP500 volatility is expected to rise in the
next year. This is because the current level of
volatility is below its long-run value.

k=252 # 1 year forecast


sigf=predict(m2,k)
