Autocorrelation
Asif Tariq
What is Autocorrelation?
• In cross-section studies, data are often collected on the basis of a random sample of cross-sectional units, such as households, so there is no prior reason to believe that the error term pertaining to one household is correlated with the error term of another household.
• The situation, however, is likely to be very different if we are dealing with time series data,
for the observations in such data follow a natural ordering over time so that successive
observations are likely to exhibit intercorrelations.
• In the regression context, the classical linear regression model assumes that such autocorrelation does not exist in the disturbances ui.
• Symbolically:
Cov(ui, uj) = 0 for i ≠ j
• Put simply, the classical model assumes that the disturbance term relating to any
observation is not influenced by the disturbance term relating to any other observation.
Examples of no autocorrelation
• For example, consider quarterly time series data involving the regression of output on labor and capital inputs. If, say, there is a labor strike affecting output in one quarter, there is no reason to believe that this disruption will be carried over to the next quarter. That is, if output is lower this quarter, there is no reason to expect it to be lower next quarter.
• If, however, the disturbances are interdependent, we have autocorrelation. Symbolically:
Cov(ui, uj) ≠ 0 for i ≠ j
• In this situation, the disruption caused by a strike this quarter may very well affect output next quarter, or the increases in the consumption expenditure of one family may very well prompt another family to increase its consumption expenditure if it wants to keep up with the Joneses.
Autocorrelation vs Serial correlation
• Tintner defines autocorrelation as “lag correlation of a given series with itself, lagged by a number of time units,’’ whereas he reserves the term serial correlation for “lag correlation between two different series.’’
• Thus, correlation between two time series such as u1 , u2 , . . . , u10 and u2 , u3 , . . . , u11 , where
the former is the latter series lagged by one time period, is autocorrelation, whereas
correlation between time series such as u1 , u2 , . . . , u10 and v2, v3, . . . , v11 , where u and v are
two different time series, is serial correlation.
Why autocorrelation?
• There are several reasons:
• Inertia: In a recession, when economic recovery starts, most economic time series start moving upward. In this upswing, the value of a series at one point in time is greater than its previous value. Thus there is a “momentum’’ built into them, and it continues until something happens (e.g., an increase in interest rates or taxes, or both) to slow them down.
• Specification bias (excluded variables): Suppose the true relation is
Yt = b1 + b2X2t + b3X3t + ut (1)
where Y is the consumption of Pepsi, X2 is its own price, and X3 is the price of Thums-up, but we instead run
Yt = b1 + b2X2t + vt (2)
• Now if Eq. (1) is the true relation, running Eq. (2) is tantamount to letting vt = b3X3t + ut.
• And to the extent the price of Thums-up affects the consumption of Pepsi, the error or disturbance term v will reflect a systematic pattern, thus creating (false) autocorrelation.
• A simple test of this would be to run both Eqs. (1) and (2) and see whether the autocorrelation, if any, observed in model (2) disappears when model (1) is run.
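A minimal simulation sketch of this check, using Python with numpy and statsmodels (all variable names and numbers are purely illustrative, not from the slides): generate data from a “true” two-regressor model, fit both the full model (1) and the model (2) that omits the trending regressor X3, and compare the Durbin–Watson statistics of the two sets of residuals.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(42)
n = 100
t = np.arange(n)

# X2: own price, X3: price of the substitute (a smoothly trending series,
# so omitting it leaves a systematic pattern in the error)
x2 = 10 + rng.normal(size=n)
x3 = 5 + 0.05 * t + rng.normal(scale=0.2, size=n)
u = rng.normal(size=n)                       # well-behaved true error
y = 2.0 - 0.8 * x2 + 0.6 * x3 + u            # "true" relation, Eq. (1)

full = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()   # Eq. (1)
short = sm.OLS(y, sm.add_constant(x2)).fit()                         # Eq. (2), X3 omitted

print("DW, full model :", durbin_watson(full.resid))    # close to 2
print("DW, short model:", durbin_watson(short.resid))   # well below 2
```

Because X3 moves smoothly over time, its omission leaves a systematic pattern in the residuals of model (2), which the low Durbin–Watson value picks up.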
Specification Bias
• Incorrect functional form: Suppose the “true’’ or correct model in a cost-output study is
Marginal costi = b1 + b2 Outputi + b3 Outputi² + ui (3)
but we fit the linear model
Marginal costi = a1 + a2 Outputi + vi (4)
so that vi = b3 Outputi² + ui and the error term reflects the omitted squared term in a systematic way.
• Lags: Similarly, current consumption often depends, among other things, on consumption in the previous period:
Consumptiont = b1 + b2 Incomet + b3 Consumptiont−1 + ut (5)
• A regression such as Eq. (5) is known as an autoregression because one of the explanatory variables is the lagged value of the dependent variable.
• Now if we neglect the lagged term in Eq. (5), the resulting error term will reflect a systematic pattern due to the influence of lagged consumption on current consumption.
Data transformation
• As an example of this, consider the following model:
Yt = b0 + b1Xt + ut (6)
• Since Eq. (6) holds true at every time period, it also holds true in the previous time period, (t − 1). So we can write:
Yt−1 = b0 + b1Xt−1 + ut−1 (7)
• Subtracting Eq. (7) from Eq. (6) gives:
∆Yt = b1∆Xt + ∆ut (8)
where ∆, known as the first difference operator, tells us to take successive differences of the variables in question.
Data transformation
• For empirical purposes, we write Eq. (8) as:
∆Yt = b1∆Xt + vt (9)
where vt = ∆ut = ut − ut−1.
• Now even if the error term ut in Eq. (6) satisfies the standard OLS assumptions, particularly the assumption of no autocorrelation, it can be shown that the error term vt in Eq. (9) is autocorrelated.
• The point of this example is that sometimes autocorrelation may be induced as a result of transforming the original model.
• It may be noted here that models involving lagged regressands, such as Eq. (5), are known as dynamic regression models.
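A small numerical sketch of why the differencing induces autocorrelation (simulated data; the only assumption is that ut is i.i.d., as in the text): the differenced error vt = ut − ut−1 has a first-order autocorrelation of about −0.5 even though ut itself is uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=100_000)   # u_t: i.i.d. errors satisfying the OLS assumptions
v = np.diff(u)                 # v_t = u_t - u_{t-1}, the error of the differenced model

# First-order autocorrelation of v_t: theory gives -0.5 when u_t is i.i.d.
rho_v = np.corrcoef(v[1:], v[:-1])[0, 1]
print(round(rho_v, 3))         # approximately -0.5
```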
Nonstationarity
• When dealing with time series data, we may have to determine whether a given time series is stationary.
• Loosely speaking, a time series is stationary if its characteristics (e.g., mean, variance, and
covariance) are time invariant; that is, they do not change over time. If that is not the case,
we have a nonstationary time series.
• In a regression model such as Eq. (6), it is quite possible that both Y and X are nonstationary
and therefore the error u is also nonstationary. In that case, the error term will exhibit
autocorrelation.
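In practice, stationarity is often checked with a unit-root test such as the augmented Dickey–Fuller test. A minimal sketch with statsmodels (the two simulated series are only for illustration):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
stationary = rng.normal(size=200)              # white noise: stationary
random_walk = np.cumsum(rng.normal(size=200))  # random walk: nonstationary

for name, series in [("white noise", stationary), ("random walk", random_walk)]:
    stat, pvalue, *_ = adfuller(series)
    print(f"{name}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
# A small p-value rejects the unit-root (nonstationarity) hypothesis.
```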
Consequences
• Unbiasedness: Autocorrelation does not destroy the unbiasedness property of the OLS estimators; they remain unbiased.
• Bestness: The OLS estimators are no longer best because they no longer have minimum variance. Hence they are not BLUE.
• If we disregard the problem of autocorrelation and believe that all usual assumptions are valid,
following problems will arise:
• True variances of the OLS estimators increase. Estimated variances of OLS estimators are smaller
(biased downwards).
• The standard error of the estimated slope coefficient(s) will be much smaller than it should be if it is computed using the usual OLS formula.
• In most cases, R2 will be overestimated (indicating a better fit than the one that truly exists).
• The usual t and F statistics tend to be inflated and are therefore invalid.
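A small Monte Carlo sketch of the main consequence (the AR(1) design and all parameter values are assumptions chosen only for illustration): with positively autocorrelated errors, the usual OLS formula understates the true sampling variability of the slope estimator.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, reps, rho = 50, 2000, 0.8
x = sm.add_constant(np.linspace(0, 1, n))

slopes, reported_se = [], []
for _ in range(reps):
    e = rng.normal(size=n)
    u = np.empty(n)
    u[0] = e[0]
    for t in range(1, n):            # AR(1) errors: u_t = rho*u_{t-1} + e_t
        u[t] = rho * u[t - 1] + e[t]
    y = 1.0 + 2.0 * x[:, 1] + u
    res = sm.OLS(y, x).fit()
    slopes.append(res.params[1])
    reported_se.append(res.bse[1])

print("true sd of slope estimates :", np.std(slopes).round(3))
print("average OLS standard error :", np.mean(reported_se).round(3))  # noticeably smaller
```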
Detection- Durbin Watson Test
• The most celebrated test for detecting serial correlation is that developed by statisticians Durbin
and Watson.
d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}    (10)
which is simply the ratio of the sum of squared differences in successive residuals to the RSS.
• Note that in the numerator of the d statistic the number of observations is n − 1 because one
observation is lost in taking successive differences.
• A great advantage of the d statistic is that it is based on the estimated residuals, which are
routinely computed in regression analysis.
• Because of this advantage, it is now a common practice to report the Durbin–Watson d along
with summary measures, such as R2, adjusted R2, t, and F.
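A direct implementation sketch of Eq. (10) in Python (the residual vector shown is hypothetical; in practice it would come from a fitted regression):

```python
import numpy as np

def dw_statistic(resid):
    """Durbin-Watson d: sum of squared successive differences over the RSS (Eq. 10)."""
    resid = np.asarray(resid)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Example with hypothetical residuals from some fitted regression:
u_hat = np.array([0.5, 0.3, 0.4, -0.1, -0.3, -0.2, 0.1, 0.2])
print(round(dw_statistic(u_hat), 3))   # a value well below 2 here, suggesting positive autocorrelation
```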
Detection- Durbin Watson Test
• The limits of d are 0 and 4.
• Expanding the numerator of Eq. (10):
d = \frac{\sum \hat{u}_t^2 + \sum \hat{u}_{t-1}^2 - 2 \sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}    (11)
• Since Σ û_t² and Σ û_{t−1}² differ in only one observation, they are approximately equal. Therefore, setting Σ û_{t−1}² ≈ Σ û_t², we get:
d \approx \frac{2 \sum \hat{u}_t^2 - 2 \sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}    (12)
where ≈ means approximately equal to.
d \approx 2\left(1 - \frac{\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}\right)    (13)
Detection- Durbin Watson Test
• Now let us define:
\hat{\rho} = \frac{\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}    (14)
• Substituting Eq. (14) into Eq. (13) gives:
d \approx 2(1 - \hat{\rho})    (15)
• But since −1 ≤ ρ̂ ≤ 1, Eq. (15) implies that:
0 ≤ d ≤ 4
• These are the bounds of d; i.e., any estimated d value must lie within these limits.
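A quick numerical check sketch of this approximation (the simulated residuals are hypothetical): d computed exactly from Eq. (10) and via 2(1 − ρ̂) from Eqs. (14)–(15) nearly coincide once the series is long enough for the end-point terms dropped in Eq. (12) to be negligible.

```python
import numpy as np

# Hypothetical residuals: an AR(1)-like series of length 200
rng = np.random.default_rng(0)
u_hat = np.empty(200)
u_hat[0] = rng.normal()
for t in range(1, 200):
    u_hat[t] = 0.5 * u_hat[t - 1] + rng.normal()

d_exact = np.sum(np.diff(u_hat) ** 2) / np.sum(u_hat ** 2)      # Eq. (10)
rho_hat = np.sum(u_hat[1:] * u_hat[:-1]) / np.sum(u_hat ** 2)   # Eq. (14)
d_approx = 2 * (1 - rho_hat)                                    # Eq. (15)

print(round(d_exact, 3), round(d_approx, 3))   # the two values are very close
```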
Durbin Watson Test- Decision rule
• It is apparent from Eq. (15) that if ρ̂ = 0, d = 2; that is, if there is no serial correlation, d is expected to be about 2. Therefore, as a rule of thumb, if d is found to be 2 in an application, one may assume that there is no first-order autocorrelation, either positive or negative.
• If ρ̂ = +1, indicating perfect positive correlation in the residuals, d ≈ 0. Therefore, the closer d is to 0, the greater the evidence of positive serial correlation; i.e., as ρ̂ → +1, d → 0.
• If ρ̂ = −1, that is, perfect negative correlation among successive residuals, d ≈ 4. Hence, the closer d is to 4, the greater the evidence of negative serial correlation; i.e., as ρ̂ → −1, d → 4.
• In summary
If d = 2 → No autocorrelation
If d = 0 → Positive autocorrelation
If d = 4 → Negative autocorrelation
Durbin-Watson Test – Decision Rule
H0: ρ = 0 (no autocorrelation) against H1: ρ > 0 (positive autocorrelation) or H1: ρ < 0 (negative autocorrelation).

Value of d              Decision
0 ≤ d < dL              Reject H0: evidence of positive autocorrelation
dL ≤ d ≤ dU             Zone of indecision
dU < d < 4 − dU         Do not reject H0: no first-order autocorrelation
4 − dU ≤ d ≤ 4 − dL     Zone of indecision
4 − dL < d ≤ 4          Reject H0: evidence of negative autocorrelation

(dL and dU are the lower and upper critical values from the Durbin–Watson tables.)
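A sketch of this decision rule as a small helper function (the function and its name are made up for illustration; dL and dU must be taken from the Durbin–Watson tables for the chosen α, k, and n):

```python
def dw_decision(d, d_L, d_U):
    """Classify a Durbin-Watson d using the tabulated lower/upper bounds."""
    if d < d_L:
        return "Reject H0: evidence of positive autocorrelation"
    if d < d_U:
        return "Zone of indecision"
    if d <= 4 - d_U:
        return "Do not reject H0: no first-order autocorrelation"
    if d <= 4 - d_L:
        return "Zone of indecision"
    return "Reject H0: evidence of negative autocorrelation"

# Example: for n = 40, k = 2, alpha = 0.05 the tables give roughly d_L ≈ 1.39, d_U ≈ 1.60
print(dw_decision(0.9, 1.39, 1.60))   # falls in the positive-autocorrelation region
```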
Durbin-Watson Test – The Crux
• In an OLS regression:
Yt = β0 + β1X1t + β2X2t + … + βkXkt + ut
• Obtain the residuals ût.
• Assume the errors follow an AR(1) process:
ut = ρ ut−1 + εt,   −1 < ρ < 1
H0: ρ = 0 (no autocorrelation)
H1: ρ ≠ 0 (autocorrelation, either positive or negative)
• Calculate the d statistic as in Eq. (10).
• Compare the d statistic with the critical values of d.
• To find the critical values of d, we need:
• α (1% or 5%),
• k (the number of explanatory variables excluding the constant), and
• n (the sample size).
Durbin-Watson Test - Steps
• The mechanics of the Durbin–Watson test are as follows:
• Step 1: Run the OLS regression and obtain the residuals.
• Step 2: Compute d-statistic from Eq. (10). (Most computer programs now do this
routinely.)
• Step 3: Now follow the decision rule and make the conclusion.
▪ Decision rule:
If Durbin-Watson d is close to 4 → Negative AC
If Durbin-Watson d is close to 0 → Positive AC
If Durbin-Watson d is close to 2 → No AC
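A minimal end-to-end sketch of these steps using statsmodels (the data here are simulated with AR(1) errors purely for illustration; in practice you would use your own Y and X):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Step 0: some illustrative data with AR(1) errors
rng = np.random.default_rng(3)
n = 60
x = np.linspace(0, 10, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

# Step 1: run the OLS regression and obtain the residuals
res = sm.OLS(y, sm.add_constant(x)).fit()

# Step 2: compute the d statistic of Eq. (10)
d = durbin_watson(res.resid)
print(f"Durbin-Watson d = {d:.3f}")

# Step 3: compare d with the tabulated d_L and d_U (for the chosen alpha, k, n) and conclude
```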
Remedy
• The remedy depends on the knowledge one has about the nature of interdependence among
the disturbances, that is, knowledge about the structure of autocorrelation.
• When ρ is known:
• Generalised least squares (GLS).
• When ρ is not known:
• The first-difference method,
• The Cochrane–Orcutt two-step procedure,
• The Cochrane–Orcutt iterative procedure,
• The Durbin two-step procedure, and
• The Hildreth–Lu scanning or search procedure.
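As one illustration, statsmodels’ GLSAR class estimates ρ from the OLS residuals and re-fits by feasible GLS iteratively, in the spirit of the Cochrane–Orcutt iterative procedure (a sketch with simulated data; it is not a line-by-line implementation of the textbook procedure):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 100
x = sm.add_constant(np.linspace(0, 10, n))
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()   # AR(1) disturbances
y = 2.0 + 0.8 * x[:, 1] + u

model = sm.GLSAR(y, x, rho=1)            # assume an AR(1) structure for the errors
res = model.iterative_fit(maxiter=10)    # re-estimate rho and refit until convergence
print("estimated rho:", np.round(model.rho, 3))
print(res.params)
```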
Remedy – HAC Standard Errors
• If autocorrelation is present in our data, we can still apply the OLS method and obtain corrected standard errors, which are known as heteroscedasticity and autocorrelation consistent (HAC) standard errors or Newey–West standard errors.
• Most standard software packages, including EViews, enable us to compute the HAC standard
errors. However, it is to be remembered that the HAC standard errors may not be valid when
the sample size is small.
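For example, in statsmodels HAC (Newey–West) standard errors can be requested directly when fitting by OLS; a minimal sketch (the simulated data and the lag length maxlags=4 are arbitrary choices for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
X = sm.add_constant(rng.normal(size=(80, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=80)

ols_fit = sm.OLS(y, X).fit()                                          # usual OLS standard errors
hac_fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})   # Newey-West standard errors

print(ols_fit.bse.round(3))
print(hac_fit.bse.round(3))   # same coefficients, corrected standard errors
```

The coefficient estimates are identical to plain OLS; only the standard errors (and hence the t statistics) change.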