Econometria 2
2) Discuss the assumptions underlying the classical linear regression model (CLRM) and the properties
of the OLS estimators.
3) What is meant by "autocorrelation"? Discuss the Durbin-Watson statistic.
4) What are the differences between autoregressive and moving average models? Please include
equations to justify your answer.
5) Modelling and forecasting stock market volatility has been the subject of vast empirical and
theoretical investigations.
6) Discuss the properties of GARCH models.
7) Simultaneous equation models: discuss the difference between the structural form and reduced
form.
8) Building ARMA models (the Box-Jenkins approach). Please discuss.
9) How do we measure multicollinearity and what are the solutions to the problem of multicollinearity?
10) What is the difference between in-sample and out-of-sample forecasting?
11) What is heteroscedasticity? Discuss whether it is an issue and how to solve it.
12) Discuss the vector autoregressive model.
Answers:
1) If a random sample of size N: y1, y2, y3, …, yN is drawn from a population that is normally distributed with
mean μ and variance σ², the sample mean ȳ is also normally distributed with mean μ and variance σ²/N. In
fact, an important rule in statistics known as the central limit theorem states that the sampling distribution
of the mean of any random sample of observations will tend towards the normal distribution with mean
equal to the population mean, μ, as the sample size tends to infinity. This theorem is a very powerful result
because it states that the sample mean ȳ will follow a normal distribution even if the original observations
(y1, y2, …, yN) did not. This means that we can use the normal distribution as a kind of benchmark when
testing hypotheses.
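To make the central limit theorem concrete, here is a minimal Python sketch (not part of the original notes; the exponential population, the sample size N = 50 and the number of replications are assumptions chosen only for illustration). It draws repeated samples from a clearly non-normal population and checks that the sample means have mean close to μ and variance close to σ²/N.

```python
import numpy as np

rng = np.random.default_rng(42)

# Population: exponential with mean mu = 1 and variance sigma^2 = 1 (clearly non-normal)
mu, sigma2 = 1.0, 1.0
N = 50              # size of each sample
n_samples = 10_000  # number of repeated samples

# Draw n_samples samples of size N and compute each sample mean
sample_means = rng.exponential(scale=mu, size=(n_samples, N)).mean(axis=1)

# The CLT predicts mean mu and variance sigma^2 / N for the sample mean
print("mean of sample means:", sample_means.mean())     # approx 1.0
print("variance of sample means:", sample_means.var())  # approx 1/50 = 0.02
```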
2) Discuss the assumptions underlying the classical linear regression model (CLRM) and the properties
of the OLS estimators.
Regression analysis is a very important tool because regression is concerned with describing and
evaluating the relationship between a given variable (y, the dependent variable) and one or more other
variables (the x's, the independent variables).
In fact, one of the goals of econometricians is to understand what kind of relationship there is between
variables. To describe it, we need the straight line that best fits the data. To find it, we may start from the equation
of a line (y = α + βx) and add a random disturbance term (which makes the model more realistic). The
equation to consider is then y_t = α + βx_t + u_t, where α and β are the parameters of the model and u_t is the random
disturbance term.
α and β, the parameters of the model, are chosen in such a way as to minimise the vertical distances
between the data points and the fitted line. This could be done "by eye", but that would be
tedious and inaccurate. For this reason, we use the method known as ordinary least squares (OLS),
which is a technique that finds the line that is as close as possible to the
set of data being considered.
In order to use OLS, a model that is linear is required (the model must be linear in the parameters, α and β,
but not necessarily in the variables, x and y).
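As a minimal sketch of how OLS picks α̂ and β̂ (the simulated data and the true parameter values below are made up purely for illustration), the estimates for the bivariate model can be computed directly from the usual closed-form formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from y_t = alpha + beta * x_t + u_t with known parameters
alpha_true, beta_true = 2.0, 0.5
x = rng.normal(size=200)
u = rng.normal(scale=1.0, size=200)
y = alpha_true + beta_true * x + u

# OLS closed-form estimates for the bivariate model
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(alpha_hat, beta_hat)  # should be close to 2.0 and 0.5
```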
The model y_t = α + βx_t + u_t, together with the assumptions listed below, is known as the classical linear regression
model (CLRM). We observe data for x_t and y_t, but since y_t also depends on u_t (the unobservable error terms) we
must be specific about how the u_t are generated. We usually make the following set of assumptions about
the u_t's:
(1) E(u_t) = 0 — the errors have zero mean;
(2) var(u_t) = σ² < ∞ — the variance of the errors is constant and finite (homoscedasticity);
(3) cov(u_i, u_j) = 0 for i ≠ j — the errors are uncorrelated with one another (no autocorrelation);
(4) cov(u_t, x_t) = 0 — there is no relationship between the error and the corresponding x.
As long as assumption 1 holds, assumption 4 can be equivalently written E(x_t u_t) = 0. Both formulations imply
that the regressor is orthogonal to (i.e., unrelated to) the error term.
An alternative assumption to 4, which is slightly stronger, is that the xt’s are non-stochastic or fixed in
repeated samples.
A fifth assumption is required to make valid inferences about the population parameters (the actual α and
β) from the sample parameters (α̂ and β̂) estimated using a finite amount of data, namely that the
disturbances follow a normal distribution: u_t ~ N(0, σ²).
If assumptions 1-4 hold, then the estimators α̂ and β̂ determined by OLS will have a number of desirable
properties and are known as best linear unbiased estimators (BLUE). This acronym means:
- "Estimator": α̂ and β̂ are estimators of the true values of α and β;
- "Linear": α̂ and β̂ are linear estimators, i.e., linear combinations of the observations on y;
- "Unbiased": on average, the estimated values of α̂ and β̂ will be equal to their true values;
- "Best": the OLS estimator β̂ has minimum variance among the class of linear unbiased estimators.
Under assumptions 1-4 listed above, the OLS estimator can thus be shown to be consistent, unbiased and efficient.
Although unbiasedness and efficiency have just been mentioned, let's now go into more detail.
Consistency
The least squares estimators α̂ and β̂ are consistent. One way to state this algebraically for β̂ (with the
obvious modifications made for α̂) is:
lim_{T→∞} Pr(|β̂ − β| > δ) = 0, for all δ > 0
where β̂ is the estimator, β is the true parameter value, and δ is an arbitrary fixed distance between β̂ and the true value.
This is a technical way of stating that the probability (Pr) that β̂ is more than some arbitrary fixed distance
δ away from its true value tends to zero as the sample size tends to infinity, for all positive values of δ. Thus,
β is the probability limit of β̂.
Unbiasedness
The least squares estimators of α and β are unbiased, i.e., E(α̂) = α and E(β̂) = β.
Thus, on average, the estimated values of the coefficients will be equal to their true values. That is, there is
no systematic overestimation or underestimation of the true coefficients.
An unbiased estimator will also be consistent if its variance falls as the sample size increases.
Efficiency
An estimator β̂ of a parameter β is said to be efficient if no other estimator has a smaller variance. Broadly
speaking, if the estimator is efficient, it will be minimising the probability that it is a long way off from the true value
of β; in other words, the variation in the parameter estimate from one sample drawn from the population to
another would be minimised.
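A simple Monte Carlo sketch (all numbers below are illustrative assumptions, not part of the original notes) can be used to see unbiasedness and consistency in action: across many simulated samples the average of β̂ is close to the true β, and the spread of β̂ shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 1.0, 0.8   # true parameter values (assumed for the simulation)

def ols_beta(T):
    """Estimate beta by OLS on one simulated sample of size T."""
    x = rng.normal(size=T)
    y = alpha + beta * x + rng.normal(size=T)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

for T in (25, 100, 1000):
    betas = np.array([ols_beta(T) for _ in range(5000)])
    # Unbiasedness: mean of the estimates is ~0.8; consistency: their spread falls with T
    print(T, betas.mean().round(3), betas.std().round(3))
```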
3) What is meant by "autocorrelation"? Discuss the Durbin-Watson statistic.
The third assumption that is made of the CLRM's disturbance terms is that the covariance between the error
terms over time is zero. In other words, it is assumed that the errors are uncorrelated with one another. If
the errors are not uncorrelated with one another, it is said that they are "autocorrelated" or that
they are "serially correlated"; this is a problem because it is a violation of the third assumption. A test of this
assumption is therefore required. Again, the population disturbances cannot be observed, so tests for
autocorrelation are conducted on the residuals, û. Before one can proceed to see how formal tests for
autocorrelation are formulated, the concept of the lagged value of a variable needs to be defined.
The lag of a variable is simply its value in a previous period: for example, the value of u lagged one period is u_{t−1}.
In order to test for autocorrelation, it is necessary to investigate whether any relationship exists between the
current value of the residual, û_t, and any of its previous values, û_{t−1}, û_{t−2}, … The first step is to consider possible
relationships between the current residual and the immediately previous one, û_{t−1}, via a graphical
exploration. Thus û_t is plotted against û_{t−1}, and û_t is plotted over time. Some stereotypical patterns that may
be found in the residuals are discussed below.
Positive autocorrelation is indicated by a cyclical residual plot over time: on average, if the residual at time t−1 is
positive, the residual at time t is likely to be positive as well; similarly, if the residual at t−1 is negative, the
residual at t is also likely to be negative. In a plot of û_t against û_{t−1}, most of the points would lie in the first and
third quadrants, while a positively autocorrelated series of residuals would not cross the time-axis very frequently.
Negative autocorrelation is indicated by an alternating pattern in the residuals: on average, if the residual at time
t−1 is positive, the residual at time t is likely to be negative, and vice versa. In a plot of û_t against û_{t−1}, most of
the points would lie in the second and fourth quadrants, and a negatively autocorrelated series of residuals would
cross the time-axis more frequently than if they were distributed randomly.
Finally, if there is no pattern in the residuals at all, which is what is desirable to see, the points in the plot of û_t
against û_{t−1} are randomly spread across all four quadrants, and the time-series plot of the residuals crosses the
x-axis neither too frequently nor too rarely.
One of the simplest tests for detecting autocorrelation is the Durbin-Watson (DW) test.
The Durbin-Watson test is a test for first-order autocorrelation, i.e., it assumes that the relationship is between
an error and the previous one: u_t = ρu_{t−1} + v_t, where v_t ~ N(0, σ_v²).
The DW test statistic tests H0: ρ = 0 against H1: ρ ≠ 0.
Thus, under the null hypothesis, the errors at time t − 1 and t are independent of one another, and if this
null were rejected, it would be concluded that there was evidence of a relationship between successive
residuals.
It is also possible to express the DW statistic as an approximate function of the estimated value of ρ:
DW ≈ 2(1 − ρ̂), where ρ̂ is the estimated correlation coefficient that would have been obtained from an
estimation of the equation u_t = ρu_{t−1} + v_t seen above.
Since ρ̂ is a correlation, it must be that −1 ≤ ρ̂ ≤ 1, i.e., ρ̂ is bounded to lie between −1 and +1.
Substituting these limits for ρ̂ into DW ≈ 2(1 − ρ̂) gives the
corresponding limits for DW as 0 ≤ DW ≤ 4.
Consider now the implication of DW taking one of three important values (0, 2 and 4):
- ρ̂ = 0, DW = 2: this is the case where there is no autocorrelation in the residuals. Roughly
speaking, the null hypothesis would not be rejected if DW is near 2, i.e., there is little evidence of
autocorrelation.
- ρ̂ = 1, DW = 0: this corresponds to the case where there is perfect positive autocorrelation in the
residuals.
- ρ̂ = −1, DW = 4: this corresponds to the case where there is perfect negative autocorrelation in the
residuals.
The DW test statistic does not follow a standard statistical distribution such as a t, F or χ². Instead, DW has two critical values:
an upper critical value (dU) and a lower critical value (dL), and there is also an intermediate region where
the null hypothesis of no autocorrelation can neither be rejected nor not rejected. The rejection, non-
rejection and inconclusive regions are summarised below.
So, to reiterate, the null hypothesis is rejected and the existence of positive autocorrelation presumed if DW
is less than the lower critical value; the null hypothesis is rejected and the existence of negative
autocorrelation presumed if DW is greater than 4 minus the lower critical value; the null hypothesis is not
rejected and no significant residual autocorrelation is presumed if DW is between the upper and 4 minus
the upper limits.
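The DW statistic is easily computed on the residuals of a fitted regression. The Python sketch below uses statsmodels (the simulated data and the AR(1) error coefficient of 0.6 are assumptions made only for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
T = 300
x = rng.normal(size=T)

# Build errors that follow u_t = rho * u_{t-1} + v_t with rho = 0.6 (positive autocorrelation)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(res.resid)
print("DW statistic:", dw)   # well below 2, pointing to positive autocorrelation
```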
4) What are the differences between autoregressive and moving average models? Please include
equations to justify your answer.
An autoregressive model is one where the current value of a variable, y, depends only upon the values that the
variable took in previous periods plus an error term. An autoregressive model of order p, denoted AR(p), can be
expressed as:
y_t = μ + φ1 y_{t−1} + φ2 y_{t−2} + … + φp y_{t−p} + u_t, where u_t is a white noise disturbance term, or, more compactly
(using the lag operator L) as:
y_t = μ + Σ_{i=1}^{p} φ_i L^i y_t + u_t
Stationarity is a desirable property of an estimated AR model, for several reasons. One important reason is
that a model whose coefficients are non-stationary will exhibit the unfortunate property that previous
values of the error term will have a non-declining effect on the current value of yt as time progresses. This is
arguably counter-intuitive and empirically implausible in many cases.
The condition for the stationarity of a general AR(p) model is that the roots of the "characteristic
equation", 1 − φ1 z − φ2 z² − … − φp z^p = 0, all lie outside the unit circle.
A moving average model is one of the simplest time series models we can have. Let u_t (t = 1, 2, 3, …) be a white noise
process with E(u_t) = 0 and var(u_t) = σ². Then:
y_t = μ + u_t + θ1 u_{t−1} + θ2 u_{t−2} + … + θq u_{t−q}
is a qth-order moving average model, denoted MA(q). This can be expressed
using sigma notation as:
y_t = μ + Σ_{i=1}^{q} θ_i u_{t−i} + u_t
In fact, a moving average model is simply a linear combination of white noise processes, so that y_t depends
on the current and previous values of a white noise disturbance term. Using the lag operator notation, the previous
equation can be written as:
y_t = μ + Σ_{i=1}^{q} θ_i L^i u_t + u_t
A white noise process is one with no discernible structure. A definition of a white noise process is:
E(yt)= μ
Var(yt)= σ2
γ_{t−r} = σ² if t = r, and γ_{t−r} = 0 otherwise
Thus, a white noise process has constant mean and variance, and zero autocovariances, except at lag zero.
The principal difference between the two models is that in the moving average model y_t depends on
current and lagged values of the white noise error term only, while in the autoregressive model y_t depends on
its own previous values plus an error term:
AR(p): y_t = μ + Σ_{i=1}^{p} φ_i y_{t−i} + u_t
MA(q): y_t = μ + Σ_{i=1}^{q} θ_i u_{t−i} + u_t
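To illustrate the difference, the sketch below (coefficient values chosen arbitrarily, not taken from the notes) simulates an AR(1) and an MA(1) series and compares their sample autocorrelation functions: the AR(1) acf dies away geometrically, while the MA(1) acf is close to zero beyond lag 1.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)
T = 2000
u = rng.normal(size=T)

# AR(1): y_t = 0.7 * y_{t-1} + u_t
y_ar = np.zeros(T)
for t in range(1, T):
    y_ar[t] = 0.7 * y_ar[t - 1] + u[t]

# MA(1): y_t = u_t + 0.7 * u_{t-1}
y_ma = u.copy()
y_ma[1:] += 0.7 * u[:-1]

print("AR(1) acf:", acf(y_ar, nlags=5).round(2))  # declines geometrically
print("MA(1) acf:", acf(y_ma, nlags=5).round(2))  # cuts off after lag 1
```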
5) Modelling and forecasting stock market volatility has been the subject of vast empirical and
theoretical investigations.
Volatility has attracted so much empirical and theoretical research because it is a very important concept
in finance. In fact, volatility, as measured by the standard deviation or variance of returns, is often used as a
crude measure of the total risk of financial assets. Many value-at-risk models for measuring market risk
require the estimation or forecast of a volatility parameter, and the volatility of stock market prices also enters
directly into the Black–Scholes formula for deriving the prices of traded options. Several approaches to modelling
and forecasting volatility are available:
- Historical volatility, which involves calculating the variance (or standard deviation) of returns in the usual
way over some historical period; this then becomes the volatility forecast for all future periods. The
historical average variance was traditionally used as the volatility input to options pricing models.
- One particular non-linear model in widespread use in finance is the ARCH model. The ARCH
model is a non-linear model that tries to explain volatility in financial markets. It is used when the variance
of the errors is not constant (heteroscedasticity). The ARCH(1) conditional variance equation is: σ²_t = α0 + α1 u²_{t−1}.
The above model is known as an ARCH(1), since the conditional variance depends on only one lagged
squared error. To see why this class of models is useful, recall the second assumption of the CLRM,
which is the hypothesis of homoscedasticity: if the variance of the errors is constant, this is known as
homoscedasticity; if the variance of the errors is not constant, this is known as heteroscedasticity. If the errors are
heteroscedastic, but assumed homoscedastic, an implication would be that standard error estimates could
be wrong. It is unlikely in the context of financial time series that the variance of the errors will be constant
over time, and hence it makes sense to consider a model that does not assume that the variance is
constant, and which describes how the variance of the errors evolves.
- The GARCH model, which allows the conditional variance to depend on its own previous lags in addition
to lagged squared errors; the simplest case, the GARCH(1,1) model, is discussed in the answer to the next
question.
6) Discuss the properties of GARCH models.
The most popular non-linear financial models are the ARCH and GARCH models used for modelling and
forecasting volatility, and switching models, which allow the behaviour of a series to follow different
processes at different points in time.
As discussed in the previous answer, the ARCH(1) model lets the conditional variance of the errors depend on one
lagged squared error, σ²_t = α0 + α1 u²_{t−1}, and this class of models is useful precisely because the assumption of a
constant error variance (homoscedasticity) is unlikely to hold for financial time series.
The GARCH model allows the conditional variance to be dependent upon its own previous lags, so that the
conditional variance equation in the simplest case, the GARCH(1,1) model, is: σ²_t = α0 + α1 u²_{t−1} + β σ²_{t−1}
σ²_t is known as the conditional variance since it is a one-period-ahead estimate of the variance calculated
on the basis of any past information thought relevant.
Using the GARCH model it is possible to interpret the current fitted variance, σ²_t, as a weighted function of a
long-term average value (dependent on α0), information about volatility during the previous period (α1 u²_{t−1})
and the fitted variance from the model during the previous period (β σ²_{t−1}).
More generally speaking, a GARCH(1,1) model will usually be sufficient to capture the volatility clustering in the data.
The GARCH model is better than the ARCH model because it is less likely to breach the non-negativity
constraints on the parameters, it is more parsimonious, and it avoids overfitting.
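A hedged sketch of fitting a GARCH(1,1) model in Python, assuming the third-party `arch` package is installed (the simulated "returns" series below is a placeholder; in practice one would use actual asset returns):

```python
import numpy as np
from arch import arch_model  # third-party 'arch' package (assumed available)

rng = np.random.default_rng(4)
# Placeholder returns series; in practice this would be a series of asset returns
returns = rng.standard_t(df=8, size=1000)

# GARCH(1,1) with a constant mean: sigma^2_t = omega + alpha * u^2_{t-1} + beta * sigma^2_{t-1}
am = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = am.fit(disp="off")
print(res.params)                       # omega, alpha[1], beta[1] estimates
print(res.conditional_volatility[:5])   # fitted sigma_t for the first few observations
```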
7) Simultaneous equation models: discuss the difference between the structural form and reduced
form.
To introduce the discussion of structural and reduced forms, take as a reference point the familiar structural model
given by the equation y = Xβ + u.
All the variables contained in the X matrix are assumed to be exogenous, that is, their
values are determined outside the equation; y, on the other hand, is an endogenous variable. This is a
rather simplistic working definition of exogeneity, and, in fact, there are various alternatives.
An example from economics is the demand and supply of a good (without explaining in detail all
the variables):
Q_dt = α + βP_t + γS_t + u_t
Q_st = λ + µP_t + kT_t + v_t
Q_dt = Q_st
where S_t is the price of a substitute good and T_t is an exogenous variable affecting supply.
Now, assuming that the market always clears, that is, that the market is always in equilibrium, and dropping
the time subscripts for simplicity, the previous equations can be written as:
Q = α + βP + γS + u
Q = λ + µP + kT + v
Those two equations together comprise a simultaneous structural form of the model, or a set of structural
equations. These are the equations incorporating the variables that economic or financial theory suggests
should be related to one another in a relationship of this form. The point is that price and quantity are
determined simultaneously (price affects quantity and quantity affects price).
A set of reduced form equations corresponding to the previous equations can be obtained by solving them for P
and for Q (separately). There will be a reduced form equation for each endogenous variable in the system.
Solving each structural equation for P and setting the two expressions equal gives:
Q/β − α/β − γS/β − u/β = Q/µ − λ/µ − kT/µ − v/µ
Rearranging this expression gives the reduced form equation for Q; solving the two structural equations for Q and
equating in the same way gives the reduced form for P. In each reduced form equation, the endogenous variable is
expressed as a function only of the exogenous variables (S and T) and the disturbances.
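The algebra can be checked symbolically. A sketch using sympy (symbol names follow the equations above) solves the two structural equations for the endogenous variables P and Q, yielding the reduced forms in terms of S, T and the disturbances:

```python
import sympy as sp

# Endogenous: P, Q.  Exogenous variables and disturbances: S, T, u, v.
P, Q, S, T, u, v = sp.symbols("P Q S T u v")
alpha, beta, gamma, lam, mu, k = sp.symbols("alpha beta gamma lambda mu k")

demand = sp.Eq(Q, alpha + beta * P + gamma * S + u)   # Q = alpha + beta*P + gamma*S + u
supply = sp.Eq(Q, lam + mu * P + k * T + v)           # Q = lambda + mu*P + k*T + v

reduced = sp.solve([demand, supply], [P, Q], dict=True)[0]
print(sp.simplify(reduced[P]))   # P as a function of S, T, u, v only
print(sp.simplify(reduced[Q]))   # Q as a function of S, T, u, v only
```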
8) Building ARMA models (the Box-Jenkins approach). Please discuss.
Recall from the answer to question 4 that an autoregressive model of order p, AR(p), is one where the current value
of a variable depends on its own previous values plus a white noise error term,
y_t = μ + Σ_{i=1}^{p} φ_i y_{t−i} + u_t,
while a moving average model of order q, MA(q), is a linear combination of current and lagged white noise terms,
y_t = μ + Σ_{i=1}^{q} θ_i u_{t−i} + u_t,
where the white noise process u_t has constant mean and variance and zero autocovariances except at lag zero.
By combining the AR(p) and MA(q) models, an ARMA(p, q) model is obtained. Using the lag operator, the model can
be written as: φ(L) y_t = μ + θ(L) u_t, where φ(L) = 1 − φ1 L − φ2 L² − … − φp L^p and θ(L) = 1 + θ1 L + θ2 L² + … + θq L^q.
The characteristics of an ARMA process will be a combination of those from the AR and MA parts. Note that
the pacf (that measures the correlation between an observation k periods ago and the current observation,
after controlling for observations at intermediate lags, i.e., all lags < k) is particularly useful in this context.
The acf (that reveals how the correlation between any two values of the signal changes as their separation
changes) alone can distinguish between a pure autoregressive and a pure moving average process.
However, an ARMA process will have a geometrically declining acf (autocorrelation function), as will a pure
AR process. So, the pacf is useful for distinguishing between an AR(p) process and an ARMA (p, q) process –
the former will have a geometrically declining autocorrelation function, but a partial autocorrelation
function which cuts off to zero after p lags, while the latter will have both autocorrelation and partial
autocorrelation functions which decline geometrically.
Box and Jenkins (1976) were the first to approach the task of estimating an ARMA model in a systematic
manner. Their approach was a practical and pragmatic one, involving three steps: (1) Identification (2)
Estimation (3) Diagnostic checking.
Step 1
This involves determining the order of the model required to capture the dynamic features of the data.
Graphical procedures are used (plotting the data over time and plotting the acf and pacf) to determine the
most appropriate specification.
Step 2
This involves estimation of the parameters of the model specified in step 1. This can be done using least
squares or another technique, known as maximum likelihood, depending on the model.
Step 3
This involves model checking – i.e., determining whether the model specified and estimated is adequate.
Box and Jenkins suggest two methods: overfitting and residual diagnostics. Overfitting involves deliberately
fitting a larger model than that required to capture the dynamics of the data as identified in stage 1. If the
model specified at step 1 is adequate, any extra terms added to the ARMA model would be insignificant.
Residual diagnostics imply checking the residuals for evidence of linear dependence which, if present,
would suggest that the model originally specified was inadequate to capture the features of the data. The
acf, pacf or Ljung–Box tests could be used.
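A compact sketch of the three Box-Jenkins steps in Python (the simulated ARMA(1,1) series and the candidate orders are purely illustrative assumptions):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(5)
T = 500
u = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):           # simulate an ARMA(1,1): y_t = 0.6 y_{t-1} + u_t + 0.3 u_{t-1}
    y[t] = 0.6 * y[t - 1] + u[t] + 0.3 * u[t - 1]

# Step 1: identification - inspect the acf and pacf
print(acf(y, nlags=5).round(2), pacf(y, nlags=5).round(2))

# Step 2: estimation - fit candidate ARMA(p, q) models and compare information criteria
for order in [(1, 0, 0), (0, 0, 1), (1, 0, 1), (2, 0, 2)]:
    fit = ARIMA(y, order=order).fit()
    print(order, round(fit.aic, 1))

# Step 3: diagnostic checking - Ljung-Box test on the residuals of the chosen model
chosen = ARIMA(y, order=(1, 0, 1)).fit()
print(acorr_ljungbox(chosen.resid, lags=[10]))
```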
It is usually the objective to form a parsimonious model, which is one that describes all of the features of
data of interest using as few parameters (i.e., as simple a model) as possible. A parsimonious model is
desirable because:
-Estimating unnecessary parameters uses up degrees of freedom: a model which contains
irrelevant lags of the variable or of the error term (and therefore unnecessary parameters) will usually lead
to increased coefficient standard errors, implying that it will be more difficult to find significant relationships
in the data.
-Models that are profligate might be inclined to fit to data-specific features which would not be replicated
out-of-sample. This means that the models may appear to fit the data very well, with perhaps a high value
of R² (a measure of the goodness of fit of the model), but would give very inaccurate forecasts.
9) How do we measure multicollinearity and what are the solutions to the problem of multicollinearity?
An implicit assumption when using OLS is that the explanatory variables are not correlated with one another.
Normally a small degree of correlation between them is present, but it will not cause an excessive loss of
precision. There is, however, a problem when the explanatory variables are highly correlated with one another,
known as multicollinearity, which causes problems for the estimation and interpretation of the regression.
Perfect multicollinearity occurs when there is an exact relationship between two or more variables. In this
case, it is not possible to estimate all of the coefficients in the model.
Near multicollinearity occurs when there is a non-negligible, but not perfect, relationship between two or
more explanatory variables. Note that a high correlation between the dependent variable and one of the
independent variables is not multicollinearity.
Testing for multicollinearity is particularly complex. Two of the simplest methods are:
-Examining the matrix of correlations between the individual explanatory variables: if multicollinearity is suspected,
the most likely culprit is a high correlation between two of the variables.
-Calculating the variance inflation factors (VIF), which provide an estimate of the extent to which the variance of
a parameter estimate is inflated because the explanatory variables are correlated (see the sketch below).
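As a sketch of the VIF calculation (the simulated regressors below, with x3 deliberately constructed to be nearly a linear combination of x1, are assumptions made only for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
T = 200
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
x3 = 0.95 * x1 + 0.05 * rng.normal(size=T)   # nearly collinear with x1

X = sm.add_constant(np.column_stack([x1, x2, x3]))
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print(vifs)   # VIFs for x1 and x3 will be very large; a common rule of thumb flags VIF > 10
```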
If near multicollinearity is present but ignored, the likely consequences are:
1) R² (a measure of the goodness of fit of the model) will be high, but the individual coefficients will have high
standard errors, so that the regression "looks good" as a whole, but the individual variables are not
significant.
2) The regression becomes very sensitive to small changes in the specification.
3) Near multicollinearity will make confidence intervals for the parameters very wide, and significance tests might
therefore give inappropriate conclusions, making it difficult to draw sharp inferences.
Possible solutions to the problem of near multicollinearity include:
1) Ignore it: the presence of near multicollinearity does not affect the BLUE properties of the
OLS estimator, i.e., it will still be consistent, unbiased and efficient.
2) Drop one of the collinear variables, so that the problem disappears.
3) Transform the highly correlated variables into a ratio and include only the ratio, and not the
individual variables, in the regression.
4) Since near multicollinearity is more a problem with the data than with the model (there is
insufficient information in the sample to obtain precise estimates of all of the coefficients), collect more data,
for example a longer run or a higher frequency of observations.
10) What is the difference between in-sample and out-of-sample forecasting?
Forecasting simply means an attempt to determine the values that a series is likely to take in the future. Of course,
forecasts might also usefully be made in a cross-sectional environment. Although the discussion below
refers to time-series data, some of the arguments will carry over to the cross-sectional context. Determining
the forecasting accuracy of a model is an important test of its adequacy.
Forecasts are made essentially because they are useful! Financial decisions often involve a long-term
commitment of resources, the returns to which will depend upon what happens in the future.
Two broad classes of forecasting approach can be distinguished:
-Econometric (structural) forecasting, which relates a dependent variable to one or more independent variables.
-Time series forecasting, which involves trying to forecast the future values of a series given its previous values
and/or previous values of an error term.
In-sample forecasts are those generated for the same set of data that was used to estimate the model’s
parameters. One would expect the ‘forecasts’ of a model to be relatively good in-sample, for this reason.
Therefore, a sensible approach to model evaluation through an examination of forecast accuracy is not to
use all of the observations in estimating the model parameters, but rather to hold some observations back.
The latter sample, sometimes known as a holdout sample, would be used to construct out-of-sample
forecasts.
To better understand this difference, consider an example: suppose we have monthly returns on the
FTSE for 120 months (1990–1999). We could either use all of them to estimate the model (and generate
only in-sample forecasts), or we could hold some observations back.
A sensible choice in this case would be to use the data from 1990M1 until 1998M12 to
estimate the model parameters, and then to forecast the observations for 1999 from the estimated
parameters. Of course, where each of the in-sample and out-of-sample periods should start and finish is
somewhat arbitrary and at the discretion of the researcher.
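A hedged sketch of this split in Python (the AR(1) data-generating process and the 120-observation sample are assumptions mirroring the example above, not actual FTSE data):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
T = 120                      # e.g., 120 monthly observations
y = np.zeros(T)
for t in range(1, T):        # illustrative AR(1) "returns" series
    y[t] = 0.3 * y[t - 1] + rng.normal()

train, holdout = y[:108], y[108:]      # first 108 months for estimation, last 12 held back

model = ARIMA(train, order=(1, 0, 0)).fit()
in_sample = model.predict()                          # fitted values over the estimation sample
out_of_sample = model.forecast(steps=len(holdout))   # genuine out-of-sample forecasts

# Compare forecasts with the corresponding observations, e.g., via mean squared error
print("in-sample MSE:", np.mean((in_sample - train) ** 2))
print("out-of-sample MSE:", np.mean((out_of_sample - holdout) ** 2))
```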
11) What is the heteroscedasticity? Discuss whether that is an issue and how to solve it.
In fact, if the errors have a constant variance we have the homoscedasticity, but, if the errors do not have a
constant variance we have the heteroscedasticity, that is a problem because is a violation of the second
assumption of the CLRM.
Heteroscedasticity can be detected in two main ways:
1) Graphical methods
2) Formal statistical tests
Unfortunately, one rarely knows the cause or the form of the heteroscedasticity, so a plot of the residuals is likely to
reveal little.
For this reason, the best way to detect heteroscedasticity is through formal tests.
Fortunately, there are a number of formal statistical tests for heteroscedasticity, and one of the simplest
is the Goldfeld–Quandt test. Their approach is based on splitting the total sample of length T
into two sub-samples of length T1 and T2. The regression model is estimated on each sub-sample and the
two residual variances, s1² and s2², are calculated. The null hypothesis is that the variances of the disturbances are
equal, H0: σ1² = σ2².
The test statistic is GQ = s1²/s2², which is distributed as an F(T1 − k, T2 − k) under the null.
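A sketch of the test using statsmodels (the heteroscedastic data-generating process below is an assumption for illustration; `het_goldfeldquandt` splits the sample into two parts and returns the F statistic and its p-value):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(8)
T = 400
x = np.sort(rng.uniform(1, 10, size=T))
u = rng.normal(scale=x, size=T)        # error standard deviation grows with x: heteroscedastic
y = 1.0 + 0.5 * x + u

X = sm.add_constant(x)
F, pval, _ = het_goldfeldquandt(y, X)  # H0: equal variances in the two sub-samples
print(F, pval)                         # small p-value -> reject H0, evidence of heteroscedasticity
```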
If the errors are heteroscedastic, but this fact is ignored and the researcher proceeds with estimation and
inference, the OLS estimators will still give unbiased (and also consistent) coefficient estimates, but they are
no longer best linear unbiased estimators (BLUE), that is, they no longer have the minimum variance
among the class of unbiased estimators. This is because the error variance, σ², plays no part in the proof that the OLS
estimator is consistent and unbiased, but σ² does appear in the formulae for the coefficient variances.
If the form (i.e., the cause) of the heteroscedasticity is known, then an alternative estimation method which
takes this into account can be used. One possibility is called generalised least squares (GLS). For example,
suppose that the error variance was related to another variable z_t by the expression var(u_t) = σ² z_t².
All that would be required to remove the heteroscedasticity would be to divide the regression equation
through by z_t.
GLS can be viewed as OLS applied to transformed data that satisfy the OLS assumptions. GLS is also known
as weighted least squares (WLS), since under GLS a weighted sum of the squared residuals is minimised,
whereas under OLS it is an unweighted sum.
Other remedies include transforming the variables into logarithms or dividing by some other measure of "size",
which can reduce the dependence of the error variance on the scale of the variables.
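A sketch of the GLS/WLS idea in statsmodels (the assumed variance structure var(u_t) = σ²z_t² corresponds to weights of 1/z_t²; the data are simulated only for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 300
x = rng.uniform(1, 5, size=T)
z = x                                   # suppose the error variance is proportional to z_t^2 = x_t^2
u = rng.normal(scale=z, size=T)
y = 2.0 + 1.5 * x + u

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()
wls_res = sm.WLS(y, X, weights=1.0 / z**2).fit()   # weights inversely proportional to the error variance

print(ols_res.bse)   # OLS standard errors (computed as if the errors were homoscedastic)
print(wls_res.bse)   # WLS/GLS standard errors under the assumed variance structure
```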
12) Discuss the vector autoregressive model.
A VAR is a systems regression model (i.e., there is more than one dependent variable) that can be
considered a kind of hybrid between the univariate time series models and the simultaneous equations
models.
The simplest case that can be entertained is a bivariate VAR, where there are only two variables, y1t and
y2t, each of whose current values depend on different combinations of the previous k values of both
variables, and error terms:
y1t = β10 + β11 y1t−1 + … + β1k y1t−k + α11 y2t−1 + … + α1k y2t−k + u1t
y2t = β20 + β21 y2t−1 + … + β2k y2t−k + α21 y1t−1 + … + α2k y1t−k + u2t
Although, for simplicity, it is often assumed that E(u1t u2t) = 0, so that the disturbances are uncorrelated
across equations, it is common and more realistic to allow them to be contemporaneously correlated, so that
cov(u1t, u2t) = σ12.
Another useful facet of VAR models is the compactness with which the notation can be expressed. For
example, consider the case from above where k = 1, so that each variable depends only upon the
immediately previous values of y1t and y2t , plus an error term. This could be written as:
y1t = β10 + β11 y1t−1 + α11 y2t−1 + u1t
y2t = β20 + β21 y2t−1 + α21 y1t−1 + u2t
Or:
[ y1t ]   [ β10 ]   [ β11  α11 ] [ y1t−1 ]   [ u1t ]
[ y2t ] = [ β20 ] + [ α21  β21 ] [ y2t−1 ] + [ u2t ]
or, even more compactly, as
y_t = β0 + β1 y_{t−1} + u_t
where y_t, β0 and u_t are g×1 vectors and β1 is a g×g matrix of coefficients.
In this equation, there are g = 2 variables in the system. Extending the model to the case where there are k
lags of each variable in each equation is also easily accomplished using this notation:
y_t = β0 + β1 y_{t−1} + β2 y_{t−2} + … + βk y_{t−k} + u_t
where again y_t, β0 and u_t are g×1 and each βi is a g×g coefficient matrix.
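A sketch of estimating a bivariate VAR(1) with statsmodels (the data-generating coefficients below are illustrative assumptions):

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(10)
T = 500
y = np.zeros((T, 2))
for t in range(1, T):   # simulate a bivariate VAR(1) with assumed coefficients
    y[t, 0] = 0.5 * y[t - 1, 0] + 0.2 * y[t - 1, 1] + rng.normal()
    y[t, 1] = 0.1 * y[t - 1, 0] + 0.4 * y[t - 1, 1] + rng.normal()

model = VAR(y)
results = model.fit(1)                     # each equation estimated by OLS with one lag
print(results.coefs)                       # the (g x g) coefficient matrix for lag 1
print(results.forecast(y[-1:], steps=4))   # four-step-ahead forecasts
```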
VAR models have several advantages compared with univariate time series models or simultaneous
equations structural models:
-The researcher does not need to specify which variables are endogenous or exogenous – all are
endogenous.
-VARs allow the value of a variable to depend on more than just its own lags or combinations of white noise
terms, so VARs are more flexible than univariate AR models.
-Provided that there are no contemporaneous terms on the RHS of the equations and that the disturbances
are uncorrelated across equations, it is possible to simply use OLS separately on each equation.
-The forecasts generated by VARs are often better than ‘traditional structural’ models.
However, VAR models also have drawbacks:
-VARs are a-theoretical, since they use little theoretical information about the relationships between the
variables to guide the specification of the model. An upshot of this is that VARs are less amenable to
theoretical analysis and therefore to policy prescriptions. It is also often not clear how the VAR coefficient
estimates should be interpreted.
-There may happen to be too many parameters to estimate, with the risk that, for small samples, there may
be large standard errors and thus wide confidence intervals for the model coefficients.
-If the statistical significance of the coefficients is to be examined, it is essential that all of the components in
the VAR are stationary.
For choosing the optimal lag length for a VAR, two broad approaches may be used: cross-equation restrictions and
information criteria.
1) Cross-equation restrictions:
It is worth noting here that in the spirit of VAR, the models should be as unrestricted as possible. A VAR with
different lag lengths for each equation could be viewed as a restricted VAR.
A possible approach would be to specify the same number of lags in each equation and then to determine the
model order using a likelihood ratio test (a method that works by finding the most likely
values of the parameters given the actual data). A disadvantage of this approach is that the test is
cumbersome and requires a normality assumption for the disturbances.
2) Information criteria:
An alternative approach to selecting the appropriate VAR lag length is to use an information
criterion. Information criteria require no normality assumptions concerning the distributions of the
errors. Instead, the criteria trade off a fall in the RSS of each equation as more lags are added against an
increase in the value of the penalty term. The univariate criteria could be applied separately to each equation but,
again, it is usually deemed preferable to require the number of lags to be the same for each equation. This requires
the use of multivariate versions of the information criteria, which can be defined as:
MAIC = ln|Σ̂| + 2k'/T
MSBIC = ln|Σ̂| + (k'/T) ln(T)
MHQIC = ln|Σ̂| + (2k'/T) ln(ln(T))
where Σ̂ is the variance–covariance matrix of the residuals, T is the number of observations and k' is the total
number of regressors in all equations.
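With statsmodels, the multivariate information criteria can be compared across lag lengths via `select_order`. A sketch (reusing an illustrative bivariate VAR(1) process like the one simulated in the example above; the maximum lag of 8 is an arbitrary assumption):

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(11)
T = 500
y = np.zeros((T, 2))
for t in range(1, T):                           # same kind of illustrative VAR(1) process as above
    y[t] = 0.5 * y[t - 1] + rng.normal(size=2)

model = VAR(y)
order = model.select_order(maxlags=8)           # computes AIC, BIC (SBIC), HQIC and FPE for each lag
print(order.summary())
print(order.selected_orders)                    # e.g., {'aic': 1, 'bic': 1, ...} for this process

results = model.fit(maxlags=8, ic="hqic")       # estimate with the lag length chosen by HQIC
print(results.k_ar)
```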