Differencing and Unit Root Tests
In the Box-Jenkins approach to analyzing time series, a key question is whether to difference the data, i.e., to replace the raw data $\{x_t\}$ by the differenced series $\{x_t - x_{t-1}\}$. Experience indicates that most economic time series tend to wander and are not stationary, but that differencing often yields a stationary result. A key example, which often provides a fairly good description of actual data, is the random walk, $x_t = x_{t-1} + \epsilon_t$, where $\{\epsilon_t\}$ is white noise, assumed here to be independent, each having the same distribution (e.g., normal, $t$, etc.). The random walk is said to have a unit root.
To understand what this means, let's recall the condition for stationarity of an AR($p$) model. The AR($p$) process
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \epsilon_t$$
will be stationary if the largest root of the equation (in the complex variable $z$)
$$z^p = \phi_1 z^{p-1} + \phi_2 z^{p-2} + \cdots + \phi_{p-1} z + \phi_p \qquad (1)$$
satisfies $|z| < 1$. So stationarity is related to the location of the roots of Equation (1).
We can think of the random walk as an AR(1) process, $x_t = \phi x_{t-1} + \epsilon_t$ with $\phi = 1$. But since it has $\phi = 1$, the random walk is not stationary. Indeed, for an AR(1) to be stationary, it is necessary that all roots of the equation $z = \phi$ have absolute value less than 1. Since the root of the equation $z = \phi$ is just $\phi$, we see that the AR(1) is stationary if and only if $-1 < \phi < 1$. For the random walk, we have a unit root, that is, a root equal to one. The first difference of a random walk is stationary, however, since $x_t - x_{t-1} = \epsilon_t$ is white noise.
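To make the contrast concrete, here is a minimal simulation sketch in Python; the sample size and the value $\phi = .9$ are illustrative choices, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)
n = 500
eps = rng.normal(size=n)            # independent innovations

# Random walk: x_t = x_{t-1} + eps_t (phi = 1, a unit root)
rw = np.cumsum(eps)

# Stationary AR(1): x_t = 0.9 x_{t-1} + eps_t (|phi| < 1)
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + eps[t]

# The random walk wanders far from its starting point, while the AR(1)
# keeps returning to zero; compare the spread of the two sample paths.
print(rw.min(), rw.max())
print(ar.min(), ar.max())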
In general, we say that a time series $\{x_t\}$ is integrated of order 1, denoted by I(1), if $\{x_t\}$ is not stationary but the first difference $\{x_t - x_{t-1}\}$ is stationary and invertible. If $\{x_t\}$ is I(1), it is considered important to difference the data, primarily because we can then use all of the methodologies developed for stationary time series to build a model for, or otherwise analyze, the differenced series. This, in turn, improves our understanding (e.g., provides better forecasts) of the original series, $\{x_t\}$. For example, in the Box-Jenkins ARIMA($p$, 1, $q$) model, the differenced series is modeled as a stationary ARMA($p$, $q$) process. In practice, then, we need to decide whether to build a stationary model for the raw data or for the differenced data.
More generally, there is the question of how many times we need to difference the data. A series is said to be integrated of order $d$, denoted by I($d$), where $d$ is an integer with $d \geq 1$, if the series and all its differences up to the $(d-1)$st are nonstationary, but the $d$th difference is stationary. A series is said to be integrated of order zero, denoted by I(0), if the series is both stationary and invertible. (The importance of invertibility will be discussed later.) If the series $\{x_t\}$ is I($d$) with $d \geq 1$, then the differenced series $\{x_t - x_{t-1}\}$ is I($d-1$).
For an example of an I(2) process, consider the AR(2) series $x_t = 2x_{t-1} - x_{t-2} + \epsilon_t$. This process is not stationary. Equation (1) becomes $z^2 = 2z - 1$, that is, $z^2 - 2z + 1 = 0$. Factoring this gives $(z-1)(z-1) = 0$, so the equation has two unit roots. Since the largest root (i.e., one) does not have absolute value less than one, the process is not stationary. It can be shown that the first difference is nonstationary as well. The second difference is
$$x_t - x_{t-1} - [x_{t-1} - x_{t-2}] = x_t - 2x_{t-1} + x_{t-2},$$
which is equal to $\epsilon_t$ by the definition of our AR(2) process. Since the second difference is white noise, $\{x_t\}$ is an ARIMA(0, 2, 0). Since the second difference is stationary, $\{x_t\}$ is I(2). In general, for any ARIMA process which is integrated of order $d$, Equation (1) will have exactly $d$ unit roots. In practice, however, the only integer values of $d$ which seem to occur frequently are 0 and 1. So here, we will limit our discussion to the question of whether or not to difference the data one time.
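The I(2) example is easy to verify numerically. The sketch below (seed and sample size are arbitrary) simulates the AR(2) above and checks that its second difference reproduces the innovations exactly.

import numpy as np

rng = np.random.default_rng(1)
n = 200
eps = rng.normal(size=n)

# Simulate x_t = 2 x_{t-1} - x_{t-2} + eps_t, starting from x_0 = x_1 = 0
x = np.zeros(n)
for t in range(2, n):
    x[t] = 2 * x[t - 1] - x[t - 2] + eps[t]

# Differencing twice gives x_t - 2 x_{t-1} + x_{t-2}
d2 = np.diff(x, n=2)

print(np.allclose(d2, eps[2:]))     # True: the second difference is eps_t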
If we fail to take a difference when the process is nonstationary, regressions on time will often yield a spuriously significant linear trend, and our forecast intervals will be much too narrow (optimistic) at long lead times. For an example of the first phenomenon, recall that for the Deflated Dow Jones series, we got a $t$-statistic for the slope of 5.27 (creating the illusion of a very strong indication of trend), but the mean of the first differences was not significantly different from zero. For an example of the second phenomenon, let's compare a random walk with a stationary AR(1) model. For a random walk, the best linear $h$-step forecast is $f_{n,h} = x_n$, and the variance of the forecast error is $\operatorname{var}[x_{n+h} - x_n] = h \operatorname{var}[\epsilon_t]$, which goes to $\infty$ as $h$ increases. The width of the forecast intervals will be proportional to $\sqrt{h}$, indicating that our uncertainty about the future value of the series grows without bound as the lead time is increased. On the other hand, for the stationary AR(1) process $x_t = \phi x_{t-1} + \epsilon_t$ with $-1 < \phi < 1$, the best linear $h$-step forecast is $f_{n,h} = \phi^h x_n$, which goes to zero as $h$ increases. The variance of the forecast error is $\operatorname{var}[x_{n+h} - \phi^h x_n]$, which tends to $\operatorname{var}[x_t]$, a finite constant. So as the lead time $h$ is increased, the width of the $h$-step prediction intervals grows without bound for a random walk, but remains bounded for a stationary AR(1). Clearly, then, if our series were really a random walk, but we failed to difference it and modeled it instead as a stationary AR(1), then our prediction intervals would give us much more faith in our ability to predict at long lead times than is actually warranted.
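Under Gaussian innovations both interval widths can be written in closed form, and a short sketch makes the contrast plain ($\sigma = 1$ and $\phi = .9$ are illustrative).

import numpy as np

sigma, phi, z = 1.0, 0.9, 1.96
h = np.array([1, 5, 10, 50, 100])

# Random walk: forecast error variance is h * sigma^2, so the 95% interval
# width 2 * z * sigma * sqrt(h) grows without bound as h increases
width_rw = 2 * z * sigma * np.sqrt(h)

# Stationary AR(1): error variance sigma^2 (1 - phi^(2h)) / (1 - phi^2),
# which tends to var[x_t] = sigma^2 / (1 - phi^2), a finite constant
width_ar = 2 * z * sigma * np.sqrt((1 - phi ** (2 * h)) / (1 - phi ** 2))

print(width_rw)     # proportional to sqrt(h)
print(width_ar)     # bounded above by 2 * z * sigma / sqrt(1 - phi^2)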
It is also undesirable to take a difference when the process is stationary. Problems arise here because the difference of a stationary series is not invertible, i.e., cannot be represented as an AR($\infty$). For example, if $x_t = .9x_{t-1} + \epsilon_t$, so that $\{x_t\}$ is really a stationary AR(1), then the first difference $\{z_t\}$ is the non-invertible ARMA(1, 1) process $z_t = .9z_{t-1} + \epsilon_t - \epsilon_{t-1}$, which has more parameters than the original process. (Recall that an ARMA($p$, $q$) is invertible if the largest root of the equation $z^q = \theta_1 z^{q-1} + \cdots + \theta_q$ has absolute value less than 1.) Due to the non-invertibility of $\{z_t\}$, its parameters will be difficult to estimate, and it will be difficult to construct a satisfactory model for the differenced series.
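The ARMA(1, 1) identity for the differenced series can be checked numerically; in this sketch (arbitrary seed and sample size) the simulated difference satisfies $z_t - .9 z_{t-1} = \epsilon_t - \epsilon_{t-1}$ exactly.

import numpy as np

rng = np.random.default_rng(2)
n = 300
eps = rng.normal(size=n)

# Stationary AR(1): x_t = .9 x_{t-1} + eps_t
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + eps[t]

z = np.diff(x)                      # z_t = x_t - x_{t-1}

# Verify z_t = .9 z_{t-1} + eps_t - eps_{t-1}
lhs = z[1:] - 0.9 * z[:-1]
rhs = eps[2:] - eps[1:-1]
print(np.allclose(lhs, rhs))        # True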
Ideally, then, what we would like is a way to decide whether the series is stationary, or integrated of order 1. A method in widespread use today is to declare the series nonstationary if the sample autocorrelations decay slowly. If this pattern is observed, then the series is differenced and the autocorrelations of the differenced series are examined to make sure that they decay rapidly, thereby indicating that the differenced series is stationary. This method is somewhat ad hoc, however. What is really needed is a more objective way of deciding between the two hypotheses, I(0) and I(1), without making any further assumptions. Unfortunately, each of these hypotheses covers a vast range of possibilities, and any classical approach to discriminate between them seems doomed to failure unless we limit the possibilities in some way.
A test involving much more narrowly-specified null and alternative hypotheses was proposed by Dickey and Fuller in 1979. In its most basic form, the Dickey-Fuller test compares the null hypothesis
$$H_0: x_t = x_{t-1} + \epsilon_t,$$
i.e., that the series is a random walk without drift, against the alternative hypothesis
$$H_1: x_t = c + \phi x_{t-1} + \epsilon_t,$$
where $c$ and $\phi$ are constants with $|\phi| < 1$. According to $H_1$, the process is a stationary AR(1) with mean $\mu = c/(1-\phi)$ (taking expectations on both sides gives $\mu = c + \phi\mu$). Writing $c = \mu(1-\phi)$, we can express $H_1$ as
$$x_t = \mu(1-\phi) + \phi x_{t-1} + \epsilon_t,$$
so that
$$x_t - \mu = \phi(x_{t-1} - \mu) + \epsilon_t.$$
Note that by making the random walk the null hypothesis, Dickey and Fuller are expressing a prefer-
ence for differencing the data unless a strong case can be made that the raw series is stationary. This is
consistent with the conventional wisdom that, most of the time, the data do require differencing. A
Type I error corresponds to deciding the process is stationary when it is actually a random walk. In this
case, we will fail to recognize that the data should be differenced, and will build a stationary model for
our nonstationary series. A Type II error corresponds to deciding the process is a random walk when it
is actually stationary. Here, we will be inclined to difference the data, even though differencing is not
desirable.
We should mention two additional important differences between the AR(1) and the random walk. Whereas the innovation $\epsilon_t$ has a temporary (exponentially decaying) effect on the AR(1), it has a permanent effect on the random walk. Whereas the expected length of time between crossings of $\mu$ is finite for the AR(1) (so the AR(1) fluctuates around its mean of $\mu$), the expected length of time between crossings of any particular level is infinite for the random walk (so the random walk has a tendency to wander).
The Dickey-Fuller test is easy to perform. Given data $x_1, \ldots, x_n$, we run an ordinary linear regression of the observations $(x_2, \ldots, x_n)$ of the "dependent variable" $\{x_t\}$, against the observations $(x_1, \ldots, x_{n-1})$ of the "independent variable" $\{x_{t-1}\}$, together with a constant term. Under both $H_0$ and $H_1$, the model can be written as
$$x_t = c + \phi x_{t-1} + \epsilon_t,$$
and $H_0$ corresponds to $\phi = 1$, $c = 0$. The test statistic is
$$\hat\tau = (\hat\phi - 1)/s,$$
where $s$ is the estimated standard error for $\hat\phi$. Note that $\hat\tau$ is easy to calculate, since $\hat\phi$ and $s$ can be obtained directly from the output of the standard computer regression packages.
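As a sketch of how $\hat\tau$ might be computed with a modern regression package (statsmodels is one choice; the function name here is ours, not standard):

import numpy as np
import statsmodels.api as sm

def dickey_fuller_tau(x):
    y = x[1:]                       # dependent variable: x_2, ..., x_n
    X = sm.add_constant(x[:-1])     # constant plus x_1, ..., x_{n-1}
    fit = sm.OLS(y, X).fit()
    phi_hat = fit.params[1]         # estimated slope phi-hat
    s = fit.bse[1]                  # its estimated standard error
    return (phi_hat - 1) / s        # tau-hat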
For the Deflated Dow data, regressing $x_2, \ldots, x_{547}$ on $x_1, \ldots, x_{546}$, the regression output gives an R-Square statistic of .9907, indicating a very strong linear relationship between $\{x_t\}$ and $\{x_{t-1}\}$. Note that we do NOT use the $t$ statistic (240.8140) from the output, since this was computed relative to the null hypothesis $\phi = 0$; here, we need to test $\phi = 1$.
The $\hat\tau$ statistic can be used to test $H_0$ versus $H_1$. The percentiles of $\hat\tau$ under $H_0$ are given in the attached table. The null hypothesis is rejected if $\hat\tau$ is less than the tabled value. The tabulations for finite $n$ were based on simulation, assuming the $\epsilon_t$ are iid Gaussian. The tabled values for the asymptotic distribution ($n = \infty$) are valid as long as the $\epsilon_t$ are iid with finite variance. (No Gaussian assumption is needed here.) It should be noted that $\hat\tau$ does not have a $t$ distribution in finite samples, and does not have a standard normal distribution asymptotically. In fact, the asymptotic distribution is longer-tailed than the standard normal. For example, the asymptotic .01 percentage point of $\hat\tau$ is at $-3.43$, instead of $-2.326$ for a standard normal. Thus, use of the standard normal table would result in an excess of spurious rejections of the random walk hypothesis.
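The long tails are easy to see by simulating the null distribution. A rough Monte Carlo sketch, reusing the dickey_fuller_tau function above (the replication count and sample size are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
taus = np.array([dickey_fuller_tau(np.cumsum(rng.normal(size=250)))
                 for _ in range(2000)])
print(np.quantile(taus, 0.01))      # near -3.4, well below the normal -2.326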
For the Deflated Dow data, we obtained $\hat\tau = -0.9024$, which is not significant according to the table. So we are not able to reject the random walk hypothesis. As usual in statistical hypothesis testing, this does not mean that we should conclude that the series is a random walk. In fact, from our earlier analysis we have strong statistical evidence that the series is not a random walk, since the lag-1 autocorrelation for the first differences is highly significant. All we can conclude from the Dickey-Fuller test is that there is no strong evidence to support the hypothesis $H_1$ that the series is a stationary AR(1).
This is the type of alternative that the test was designed to detect. The question of whether the first
difference has any autocorrelation is another issue altogether, and the test was not designed to detect
this type of failure of the random walk hypothesis. In any case, the results of the test indicate that it
would be a good idea to difference the data. We could have come to this same conclusion by examin-
ing the ACF of the raw data, but the Dickey-Fuller test provides a more objective basis for making this
decision.
As an illustration of the long tails in the distribution of $\hat\tau$, consider the random walk data ($n = 547$) which was used in the last handout for comparison with the Dow and Deflated Dow series. For this series, the regression yields $\hat\phi = .9913$ with estimated standard error $s = .0057$. We therefore get $\hat\tau = (.9913 - 1)/.0057 = -1.53$. If $\hat\tau$ had a standard normal distribution, we would obtain a $p$-value of .063 (one-sided), indicating some evidence in favor of the alternative hypothesis (that the series is a stationary AR(1)). Of course, we know that this series was in fact a random walk, and so it is somewhat distressing that we are almost being led to commit a Type I error. But when we use the true distribution of $\hat\tau$ under the null hypothesis (see table) we find that the actual significance level is substantially greater than .10, although the table is not precise enough to allow us to find the exact $p$-value.
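If the Monte Carlo draws from the sketch above are at hand, the attained significance of the observed statistic can be approximated directly (subject to simulation error):

print(np.mean(taus <= -1.53))       # roughly one-half, far above .10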
Of course, the null and alternative hypotheses $H_0$ and $H_1$ described above are too narrow to be very useful in a wide variety of situations. Often, we will want to consider differencing the data because we hope the difference may be stationary, but we do not want to commit ourselves to the assumption that the series is either a random walk or a stationary AR(1). Fortunately, although we will not describe the details here, there is a similar test known as the Augmented Dickey-Fuller test, which compares the null hypothesis that the series is nonstationary with differences following a stationary AR($p$), against the alternative that the series itself is a stationary AR($p+1$), where $p \geq 0$ is known. If $p = 1$, for example, the null hypothesis would be that the series is nonstationary, but its first difference is a stationary AR(1); the alternative hypothesis would be that the series is a stationary AR(2). In retrospect, it seems that the Deflated Dow series is better described by the above null hypothesis than by the one which was actually tested, i.e., the random walk. But it is never a good idea to change a statistical hypothesis after looking at the data; it can destroy the validity of the test. Furthermore, the use of the random walk as a null hypothesis for financial time series seems wise as a general rule.
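For reference, an augmented test of this kind is available ready-made in statsmodels. A sketch, where x is assumed to hold the series as a 1-d array; fixing maxlag=1 with autolag=None corresponds to the $p = 1$ case above.

from statsmodels.tsa.stattools import adfuller

stat, pvalue, usedlag, nobs, crit = adfuller(x, maxlag=1, autolag=None,
                                             regression='c')
print(stat, pvalue, crit)           # test statistic, p-value, critical values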
In the ordinary Dickey-Fuller ($\hat\tau$) test, the series is assumed to be free of deterministic trend, under both the null and alternative hypotheses. Many actual series do have trend, however, and it is of interest to study the nature of this trend. Perhaps the most important issue is the way in which the trend is combined with the random aspects of the series. In the case of a random walk with drift, $x_t = c + x_{t-1} + \epsilon_t$ where $\{\epsilon_t\}$ is zero mean white noise, there is a mixture of deterministic and stochastic trend, the process has a unit root, and the forecast intervals grow without bound as the lead time increases. Differencing $\{x_t\}$ yields a stationary series, so $\{x_t\}$ is said to be difference stationary. (This terminology will be contrasted with trend stationarity, described next.)
Another way to combine trend and randomness is to start with a deterministic linear trend and bury it in white noise: $x_t = \beta_0 + \beta_1 t + \epsilon_t$. This is a standard linear regression (trend-line) model, which can be analyzed without using time series methods. If the parameters $(\beta_0, \beta_1, \operatorname{var}[\epsilon_t])$ are known, then the forecast of $x_{n+h}$ is simply $f_{n,h} = \beta_0 + \beta_1(n+h)$. If the $\epsilon_t$ are normally distributed, a forecast interval is given by $f_{n,h} \pm z_{\alpha/2}\sqrt{\operatorname{var}[\epsilon_t]}$, whose width does not depend on the lead time. More generally, any series
$$x_t = \beta_0 + \beta_1 t + y_t$$
formed by adding a deterministic linear trend to a stationary, invertible, zero mean "noise" series $\{y_t\}$ is said to be trend stationary. Trend stationary series do not contain a unit root. The width of their forecast intervals for large $h$ is $2 z_{\alpha/2} \sqrt{\operatorname{var}[y_t]}$, which does not tend to infinity. Trend stationary series are not difference stationary, since it can be shown that the difference of $\{y_t\}$ is not invertible. Since the trend stationary series obeys a regression model with autocorrelated errors, we can use generalized least squares (a popular linear regression technique) to estimate the trend and assess its statistical significance.
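A sketch of such a generalized least squares fit, using the GLSAR class from statsmodels (x is assumed to hold the series; rho=1 specifies an AR(1) error structure):

import numpy as np
import statsmodels.api as sm

t = np.arange(1, len(x) + 1)
X = sm.add_constant(t)                # design matrix: constant and time
model = sm.GLSAR(x, X, rho=1)         # linear trend with AR(1) errors
fit = model.iterative_fit(maxiter=8)  # alternate GLS steps and rho updates
print(fit.params, fit.bse)            # trend estimates and standard errors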
Here, we show how to test a specific form of difference stationarity against a specific form of trend stationarity, using a variant ($\hat\tau_\tau$) of the Dickey-Fuller test. The null hypothesis is
$$H_0: x_t = c + x_{t-1} + \epsilon_t,$$
i.e., a random walk with drift, and the alternative hypothesis is
$$H_1: x_t = \beta_0 + \beta_1 t + y_t; \qquad y_t = \phi y_{t-1} + \epsilon_t,$$
where $|\phi| < 1$. Under $H_1$, $\{x_t\}$ is trend stationary, and the "noise" term is AR(1). (If we put $\phi = 0$, we get the ordinary trend-line model discussed above.) Substituting $y_t = x_t - \beta_0 - \beta_1 t$ into the AR(1) equation for $\{y_t\}$, $H_1$ can be written as
$$x_t = \alpha_0 + \alpha_1 t + \phi x_{t-1} + \epsilon_t, \qquad (2)$$
where $\alpha_0$ and $\alpha_1$ are constants. (Specifically, $\alpha_0 = \beta_0(1-\phi) + \beta_1\phi$ and $\alpha_1 = \beta_1(1-\phi)$.) If we put $\phi = 1$, Equation (2) reduces to
$$x_t = \beta_1 + x_{t-1} + \epsilon_t,$$
i.e., a random walk with drift. Thus, we want to test the null hypothesis that $\phi = 1$ versus the alternative that $\phi < 1$.
To perform the test, we run an ordinary linear regression of the "dependent variable" $\{x_t\}$ against the explanatory variables time ($t$) and $\{x_{t-1}\}$, together with a constant term. The observations on $\{x_t\}$ are $(x_2, \ldots, x_n)$, and those on $\{x_{t-1}\}$ are $(x_1, \ldots, x_{n-1})$. The test statistic is
$$\hat\tau_\tau = (\hat\phi - 1)/s,$$
where $s$ is the estimated standard error of $\hat\phi$. Although this may appear to be the same as the ordinary Dickey-Fuller statistic, it is actually different because of the presence of time as an explanatory variable. The percentiles of $\hat\tau_\tau$ under the null hypothesis ($\phi = 1$) are given in the attached table. The null hypothesis is rejected if $\hat\tau_\tau$ is less than the tabled value. The percentiles of $\hat\tau_\tau$ are considerably less than the corresponding percentiles of $\hat\tau$, indicating the effects of including time as an explanatory variable. For example, the asymptotic .01 percentage point of $\hat\tau_\tau$ is $-3.96$, compared with $-3.43$ for $\hat\tau$.
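A sketch of the $\hat\tau_\tau$ computation, parallel to the earlier dickey_fuller_tau function (again the function name is ours):

import numpy as np
import statsmodels.api as sm

def dickey_fuller_tau_tau(x):
    n = len(x)
    y = x[1:]                                    # x_2, ..., x_n
    time = np.arange(2, n + 1)                   # the time variable
    X = sm.add_constant(np.column_stack([time, x[:-1]]))
    fit = sm.OLS(y, X).fit()
    return (fit.params[2] - 1) / fit.bse[2]      # (phi-hat - 1) / s

(The augmented version of this test, with constant and trend, is also available ready-made as adfuller(x, regression='ct') in statsmodels.)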
The log10 Dow data seems to contain a trend, but what is the nature of this trend? Would it be
more appropriate to model this data as a random walk with drift, or as a trend line plus stationary
AR(1) errors? In our original analysis of this data, we first tried an ordinary trend-line model, and found
a highly significant trend. We then questioned the validity of this finding, since the Durbin-Watson
statistic showed strong error autocorrelation. We could have pursued the use of a trend stationary model
(i.e., linear trend plus autocorrelated errors) for this series, by re-estimating the trend line using generalized least squares. This still would not have answered the question as to whether such a model is more appropriate than a random walk with drift, however. To address this question, we now run the $\hat\tau_\tau$ test.
We regress the dependent variable $(x_2, \ldots, x_{547})$ on a constant, Time, and x.lag, where Time denotes $(2, \ldots, 547)$ and x.lag is $(x_1, \ldots, x_{546})$. The resulting $\hat\tau_\tau$ is not less than the tabled value of $-3.42$, so we do not reject the null hypothesis of random walk with drift at level .05. In fact, examination of the table reveals that our observed $\hat\tau_\tau$ is not small at all, with a $p$-value around .9, indicating that there is virtually no evidence in favor of trend stationarity for this series. This does not mean that the log10 Dow data is actually a random walk with drift. (Indeed, we previously found strong evidence that the differences of this data are not uncorrelated, even though they seem to have a nonzero expectation.) It just means that we cannot reject the random walk with drift in favor of trend stationarity.