Econometria 2
2) Discuss the assumptions underlying the classical linear regression model (CLRM) and the properties
of the OLS estimators.
3) What is meant by "autocorrelation"? Discuss the Durbin-Watson statistic.
4) What are the differences between autoregressive and moving average models? Please include
equations to justify your answer.
5) Modelling and forecasting stock market volatility has been the subject of vast empirical and
theoretical investigations.
6) Discuss the properties of GARCH models.
7) Simultaneous equation models: discuss the difference between the structural form and reduced
form.
8) Building ARMA models (the Box-Jenkins approach). Please discuss.
9) How do we measure multicollinearity and what are the solutions to the problem of multicollinearity?
10) What is the difference between in-sample and out-of-sample forecasting?
11) What is heteroscedasticity? Discuss whether it is an issue and how to solve it.
12) Discuss the vector autoregressive model.
Answers:
1) If a random sample of size N: y1, y2, y3, …, yN is drawn from a population that is normally distributed with
mean μ and variance σ², the sample mean ȳ is also normally distributed with mean μ and variance σ²/N. In
fact, an important rule in statistics known as the central limit theorem states that the sampling distribution
of the mean of any random sample of observations will tend towards the normal distribution with mean
equal to the population mean, μ, as the sample size tends to infinity. This theorem is a very powerful result
because it states that the sample mean ȳ will follow a normal distribution even if the original observations
(y1, y2, …, yN) did not. This means that we can use the normal distribution as a kind of benchmark when
testing hypotheses.
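To make the central limit theorem concrete, here is a minimal Python sketch (not part of the original notes; the exponential population, the sample size N = 50 and the number of replications are assumptions chosen only for illustration). It draws repeated samples from a clearly non-normal population and checks that the sample means have mean close to μ and variance close to σ²/N.

```python
import numpy as np

rng = np.random.default_rng(42)

# Population: exponential with mean mu = 1 and variance sigma^2 = 1 (clearly non-normal)
mu, sigma2 = 1.0, 1.0
N = 50              # size of each sample
n_samples = 10_000  # number of repeated samples

# Draw n_samples samples of size N and compute each sample mean
sample_means = rng.exponential(scale=mu, size=(n_samples, N)).mean(axis=1)

# The CLT predicts mean mu and variance sigma^2 / N for the sample mean
print("mean of sample means:", sample_means.mean())     # approx 1.0
print("variance of sample means:", sample_means.var())  # approx 1/50 = 0.02
```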
2) Discuss the assumptions underlying the classical linear regression model (CLRM) and the properties
of the OLS estimators.
Regression analysis is a very important tool because regression is concerned with describing and
evaluating the relationship between a given variable (y, the dependent variable) and one or more other
variables (the x's, the independent variables).
In fact, one of the goals of econometricians is to understand what kind of relationship there is between
variables. To describe it, we need the straight line that best fits the data. To find it, we may start from the equation
of a line (y = α + βx) and add a random disturbance term (which makes the model more realistic). The
equation to consider is then y_t = α + βx_t + u_t, where α and β are the parameters of the model and u_t is the random
disturbance term.
α and β, the parameters of the model, are chosen in such a way as to minimise the vertical distances
between the data points and the fitted line. This could be done "by eye", but that would be
tedious and inaccurate. For this reason, we use the method known as ordinary least squares (OLS),
which is a technique that finds the line that is as close as possible to the
set of data being considered.
In order to use OLS, a model that is linear is required (the model must be linear in the parameters, α and β,
but not necessarily in the variables, x and y).
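As a minimal sketch of how OLS picks α̂ and β̂ (the simulated data and the true parameter values below are made up purely for illustration), the estimates for the bivariate model can be computed directly from the usual closed-form formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from y_t = alpha + beta * x_t + u_t with known parameters
alpha_true, beta_true = 2.0, 0.5
x = rng.normal(size=200)
u = rng.normal(scale=1.0, size=200)
y = alpha_true + beta_true * x + u

# OLS closed-form estimates for the bivariate model
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(alpha_hat, beta_hat)  # should be close to 2.0 and 0.5
```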
The model y_t = α + βx_t + u_t, together with the assumptions listed below, is known as the classical linear regression
model (CLRM). We observe data for x_t and y_t, but since y_t also depends on u_t (the unobservable error terms) we
must be specific about how the u_t are generated. We usually make the following set of assumptions about
the u_t's:
(1) E(u_t) = 0 — the errors have zero mean;
(2) var(u_t) = σ² < ∞ — the variance of the errors is constant and finite (homoscedasticity);
(3) cov(u_i, u_j) = 0 for i ≠ j — the errors are uncorrelated with one another (no autocorrelation);
(4) cov(u_t, x_t) = 0 — there is no relationship between the error and the corresponding x.
As long as assumption 1 holds, assumption 4 can be equivalently written E(x_t u_t) = 0. Both formulations imply
that the regressor is orthogonal to (i.e., unrelated to) the error term.
An alternative assumption to 4, which is slightly stronger, is that the xt’s are non-stochastic or fixed in
repeated samples.
A fifth assumption is required to make valid inferences about the population parameters (the actual α and
β) from the sample parameters (α̂ and β̂) estimated using a finite amount of data, namely that the
disturbances follow a normal distribution: u_t ~ N(0, σ²).
If assumptions 1-4 hold, then the estimators α̂ and β̂ determined by OLS will have a number of desirable
properties and are known as best linear unbiased estimators (BLUE). This acronym means:
- "Estimator": α̂ and β̂ are estimators of the true values of α and β;
- "Linear": α̂ and β̂ are linear estimators, i.e., linear combinations of the observations on y;
- "Unbiased": on average, the estimated values of α̂ and β̂ will be equal to their true values;
- "Best": the OLS estimator β̂ has minimum variance among the class of linear unbiased estimators.
Under assumptions 1-4 listed above, the OLS estimator can thus be shown to be consistent, unbiased and efficient.
Although unbiasedness and efficiency have just been mentioned, let's now go into more detail.
Consistency
The least squares estimators α̂ and β̂ are consistent. One way to state this algebraically for β̂ (with the
obvious modifications made for α̂) is:
lim_{T→∞} Pr(|β̂ − β| > δ) = 0, for all δ > 0
where β̂ is the estimator, β is the true parameter value, and δ is an arbitrary fixed distance between β̂ and the true value.
This is a technical way of stating that the probability (Pr) that β̂ is more than some arbitrary fixed distance
δ away from its true value tends to zero as the sample size tends to infinity, for all positive values of δ. Thus,
β is the probability limit of β̂.
Unbiasedness
The least squares estimators of α and β are unbiased, i.e., E(α̂) = α and E(β̂) = β.
Thus, on average, the estimated values of the coefficients will be equal to their true values. That is, there is
no systematic overestimation or underestimation of the true coefficients.
An unbiased estimator will also be consistent if its variance falls as the sample size increases.
Efficiency
An estimator β̂ of a parameter β is said to be efficient if no other estimator has a smaller variance. Broadly
speaking, if the estimator is efficient, it will be minimising the probability that it is a long way off from the true value
of β; in other words, the variation in the parameter estimate from one sample drawn from the population to
another would be minimised.
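A simple Monte Carlo sketch (all numbers below are illustrative assumptions, not part of the original notes) can be used to see unbiasedness and consistency in action: across many simulated samples the average of β̂ is close to the true β, and the spread of β̂ shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 1.0, 0.8   # true parameter values (assumed for the simulation)

def ols_beta(T):
    """Estimate beta by OLS on one simulated sample of size T."""
    x = rng.normal(size=T)
    y = alpha + beta * x + rng.normal(size=T)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

for T in (25, 100, 1000):
    betas = np.array([ols_beta(T) for _ in range(5000)])
    # Unbiasedness: mean of the estimates is ~0.8; consistency: their spread falls with T
    print(T, betas.mean().round(3), betas.std().round(3))
```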
3) What is meant by "autocorrelation"? Discuss the Durbin-Watson statistic.
The third assumption that is made of the CLRM's disturbance terms is that the covariance between the error
terms over time is zero. In other words, it is assumed that the errors are uncorrelated with one another. If
the errors are not uncorrelated with one another, it is said that they are "autocorrelated" or that
they are "serially correlated"; this is a problem because it is a violation of the third assumption. A test of this
assumption is therefore required. Again, the population disturbances cannot be observed, so tests for
autocorrelation are conducted on the residuals, û. Before one can proceed to see how formal tests for
autocorrelation are formulated, the concept of the lagged value of a variable needs to be defined.
The lag of a variable is simply its value in a previous period: for example, the value of u lagged one period is u_{t−1}.
In order to test for autocorrelation, it is necessary to investigate whether any relationship exists between the
current value of the residual, û_t, and any of its previous values, û_{t−1}, û_{t−2}, … The first step is to consider possible
relationships between the current residual and the immediately previous one, û_{t−1}, via a graphical
exploration. Thus û_t is plotted against û_{t−1}, and û_t is plotted over time. Some stereotypical patterns that may
be found in the residuals are discussed below.
Positive autocorrelation is indicated by a cyclical residual plot over time: on average, if the residual at time t−1 is
positive, the residual at time t is likely to be positive as well; similarly, if the residual at t−1 is negative, the
residual at t is also likely to be negative. In a plot of û_t against û_{t−1}, most of the points would lie in the first and
third quadrants, while a positively autocorrelated series of residuals would not cross the time-axis very frequently.
Negative autocorrelation is indicated by an alternating pattern in the residuals: on average, if the residual at time
t−1 is positive, the residual at time t is likely to be negative, and vice versa. In a plot of û_t against û_{t−1}, most of
the points would lie in the second and fourth quadrants, and a negatively autocorrelated series of residuals would
cross the time-axis more frequently than if they were distributed randomly.
Finally, if there is no pattern in the residuals at all, which is what is desirable to see, the points in the plot of û_t
against û_{t−1} are randomly spread across all four quadrants, and the time-series plot of the residuals crosses the
x-axis neither too frequently nor too rarely.
One of the simplest tests for detecting autocorrelation is the Durbin-Watson (DW) test.
The Durbin-Watson test is a test for first-order autocorrelation, i.e., it assumes that the relationship is between
an error and the previous one: u_t = ρu_{t−1} + v_t, where v_t ~ N(0, σ_v²).
The DW test statistic tests H0: ρ = 0 against H1: ρ ≠ 0.
Thus, under the null hypothesis, the errors at time t − 1 and t are independent of one another, and if this
null were rejected, it would be concluded that there was evidence of a relationship between successive
residuals.
It is also possible to express the DW statistic as an approximate function of the estimated value of ρ:
DW ≈ 2(1 − ρ̂), where ρ̂ is the estimated correlation coefficient that would have been obtained from an
estimation of the equation u_t = ρu_{t−1} + v_t seen above.
Since ρ̂ is a correlation, it must be that −1 ≤ ρ̂ ≤ 1, i.e., ρ̂ is bounded to lie between −1 and +1.
Substituting these limits for ρ̂ into DW ≈ 2(1 − ρ̂) gives the
corresponding limits for DW as 0 ≤ DW ≤ 4.
Consider now the implication of DW taking one of three important values (0, 2 and 4):
- ρ̂ = 0, DW = 2: this is the case where there is no autocorrelation in the residuals. Roughly
speaking, the null hypothesis would not be rejected if DW is near 2, i.e., there is little evidence of
autocorrelation.
- ρ̂ = 1, DW = 0: this corresponds to the case where there is perfect positive autocorrelation in the
residuals.
- ρ̂ = −1, DW = 4: this corresponds to the case where there is perfect negative autocorrelation in the
residuals.
The DW test statistic does not follow a standard statistical distribution such as a t, F or χ². Instead, DW has two critical values:
an upper critical value (dU) and a lower critical value (dL), and there is also an intermediate region where
the null hypothesis of no autocorrelation can neither be rejected nor not rejected. The rejection, non-
rejection and inconclusive regions are summarised below.
So, to reiterate, the null hypothesis is rejected and the existence of positive autocorrelation presumed if DW
is less than the lower critical value; the null hypothesis is rejected and the existence of negative
autocorrelation presumed if DW is greater than 4 minus the lower critical value; the null hypothesis is not
rejected and no significant residual autocorrelation is presumed if DW is between the upper and 4 minus
the upper limits.
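The DW statistic is easily computed on the residuals of a fitted regression. The Python sketch below uses statsmodels (the simulated data and the AR(1) error coefficient of 0.6 are assumptions made only for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
T = 300
x = rng.normal(size=T)

# Build errors that follow u_t = rho * u_{t-1} + v_t with rho = 0.6 (positive autocorrelation)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(res.resid)
print("DW statistic:", dw)   # well below 2, pointing to positive autocorrelation
```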
4) What are the differences between autoregressive and moving average models? Please include
equations to justify your answer.
An autoregressive model is one where the current value of a variable, y, depends only upon the values that the
variable took in previous periods plus an error term. An autoregressive model of order p, denoted AR(p), can be
expressed as:
y_t = μ + φ1 y_{t−1} + φ2 y_{t−2} + … + φp y_{t−p} + u_t, where u_t is a white noise disturbance term, or, more compactly
(using the lag operator L) as:
y_t = μ + Σ_{i=1}^{p} φ_i L^i y_t + u_t
Stationarity is a desirable property of an estimated AR model, for several reasons. One important reason is
that a model whose coefficients are non-stationary will exhibit the unfortunate property that previous
values of the error term will have a non-declining effect on the current value of yt as time progresses. This is
arguably counter-intuitive and empirically implausible in many cases.
The condition for the stationarity of a general AR(p) model is that the roots of the "characteristic
equation", 1 − φ1 z − φ2 z² − … − φp z^p = 0, all lie outside the unit circle.
A moving average model is one of the simplest time series models we can have. Let u_t (t = 1, 2, 3, …) be a white noise
process with E(u_t) = 0 and var(u_t) = σ². Then:
y_t = μ + u_t + θ1 u_{t−1} + θ2 u_{t−2} + … + θq u_{t−q}
is a qth-order moving average model, denoted MA(q). This can be expressed
using sigma notation as:
y_t = μ + Σ_{i=1}^{q} θ_i u_{t−i} + u_t
In fact, a moving average model is simply a linear combination of white noise processes, so that y_t depends
on the current and previous values of a white noise disturbance term. Using the lag operator notation, the previous
equation can be written as:
y_t = μ + Σ_{i=1}^{q} θ_i L^i u_t + u_t
A white noise process is one with no discernible structure. A definition of a white noise process is:
E(yt)= μ
Var(yt)= σ2
γ_{t−r} = σ² if t = r, and γ_{t−r} = 0 otherwise
Thus, a white noise process has constant mean and variance, and zero autocovariances, except at lag zero.
The principal difference between the two models is that in the moving average model y_t depends on
current and lagged values of the white noise error term only, while in the autoregressive model y_t depends on
its own previous values plus an error term:
AR(p): y_t = μ + Σ_{i=1}^{p} φ_i y_{t−i} + u_t
MA(q): y_t = μ + Σ_{i=1}^{q} θ_i u_{t−i} + u_t
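To illustrate the difference, the sketch below (coefficient values chosen arbitrarily, not taken from the notes) simulates an AR(1) and an MA(1) series and compares their sample autocorrelation functions: the AR(1) acf dies away geometrically, while the MA(1) acf is close to zero beyond lag 1.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)
T = 2000
u = rng.normal(size=T)

# AR(1): y_t = 0.7 * y_{t-1} + u_t
y_ar = np.zeros(T)
for t in range(1, T):
    y_ar[t] = 0.7 * y_ar[t - 1] + u[t]

# MA(1): y_t = u_t + 0.7 * u_{t-1}
y_ma = u.copy()
y_ma[1:] += 0.7 * u[:-1]

print("AR(1) acf:", acf(y_ar, nlags=5).round(2))  # declines geometrically
print("MA(1) acf:", acf(y_ma, nlags=5).round(2))  # cuts off after lag 1
```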
5) Modelling and forecasting stock market volatility has been the subject of vast empirical and
theoretical investigations.
Volatility has attracted so much empirical and theoretical research because it is a very important concept
in finance. In fact, volatility, as measured by the standard deviation or variance of returns, is often used as a
crude measure of the total risk of financial assets. Many value-at-risk models for measuring market risk
require the estimation or forecast of a volatility parameter, and the volatility of stock market prices also enters
directly into the Black–Scholes formula for deriving the prices of traded options. Several approaches to modelling
and forecasting volatility are available:
- Historical volatility, which involves calculating the variance (or standard deviation) of returns in the usual
way over some historical period; this then becomes the volatility forecast for all future periods. The
historical average variance was traditionally used as the volatility input to options pricing models.
- One particular non-linear model in widespread use in finance is the ARCH model. The ARCH
model is a non-linear model that tries to explain volatility in financial markets. It is used when the variance
of the errors is not constant (heteroscedasticity). The ARCH(1) conditional variance equation is: σ²_t = α0 + α1 u²_{t−1}.
The above model is known as an ARCH(1), since the conditional variance depends on only one lagged
squared error. To see why this class of models is useful, recall the second assumption of the CLRM,
which is the hypothesis of homoscedasticity: if the variance of the errors is constant, this is known as
homoscedasticity; if the variance of the errors is not constant, this is known as heteroscedasticity. If the errors are
heteroscedastic, but assumed homoscedastic, an implication would be that standard error estimates could
be wrong. It is unlikely in the context of financial time series that the variance of the errors will be constant
over time, and hence it makes sense to consider a model that does not assume that the variance is
constant, and which describes how the variance of the errors evolves.
- The GARCH model, which allows the conditional variance to depend on its own previous lags in addition
to lagged squared errors; the simplest case, the GARCH(1,1) model, is discussed in the answer to the next
question.
6) Discuss the properties of GARCH models.
The most popular non-linear financial models are the ARCH and GARCH models used for modelling and
forecasting volatility, and switching models, which allow the behaviour of a series to follow different
processes at different points in time.
As discussed in the previous answer, the ARCH(1) model lets the conditional variance of the errors depend on one
lagged squared error, σ²_t = α0 + α1 u²_{t−1}, and this class of models is useful precisely because the assumption of a
constant error variance (homoscedasticity) is unlikely to hold for financial time series.
The GARCH model allows the conditional variance to be dependent upon its own previous lags, so that the
conditional variance equation in the simplest case, the GARCH(1,1) model, is: σ²_t = α0 + α1 u²_{t−1} + β σ²_{t−1}
σ²_t is known as the conditional variance since it is a one-period-ahead estimate of the variance calculated
on the basis of any past information thought relevant.
Using the GARCH model it is possible to interpret the current fitted variance, σ²_t, as a weighted function of a
long-term average value (dependent on α0), information about volatility during the previous period (α1 u²_{t−1})
and the fitted variance from the model during the previous period (β σ²_{t−1}).
More generally speaking, a GARCH(1,1) model will usually be sufficient to capture the volatility clustering in the data.
The GARCH model is better than the ARCH model because it is less likely to breach the non-negativity
constraints on the parameters, it is more parsimonious, and it avoids overfitting.
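A hedged sketch of fitting a GARCH(1,1) model in Python, assuming the third-party `arch` package is installed (the simulated "returns" series below is a placeholder; in practice one would use actual asset returns):

```python
import numpy as np
from arch import arch_model  # third-party 'arch' package (assumed available)

rng = np.random.default_rng(4)
# Placeholder returns series; in practice this would be a series of asset returns
returns = rng.standard_t(df=8, size=1000)

# GARCH(1,1) with a constant mean: sigma^2_t = omega + alpha * u^2_{t-1} + beta * sigma^2_{t-1}
am = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = am.fit(disp="off")
print(res.params)                       # omega, alpha[1], beta[1] estimates
print(res.conditional_volatility[:5])   # fitted sigma_t for the first few observations
```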
7) Simultaneous equation models: discuss the difference between the structural form and reduced
form.
To introduce the discussion of structural and reduced forms, take as a reference point the familiar structural model
given by the equation y = Xβ + u.
All the variables contained in the X matrix are assumed to be exogenous, that is, their
values are determined outside the equation; y, on the other hand, is an endogenous variable. This is a
rather simplistic working definition of exogeneity, and, in fact, there are various alternatives.
An example from economics is the demand and supply of a good (without explaining in detail all
the variables):
Q_dt = α + βP_t + γS_t + u_t
Q_st = λ + µP_t + kT_t + v_t
Q_dt = Q_st
where S_t is the price of a substitute good and T_t is an exogenous variable affecting supply.
Now, assuming that the market always clears, that is, that the market is always in equilibrium, and dropping
the time subscripts for simplicity, the previous equations can be written as:
Q = α + βP + γS + u
Q = λ + µP + kT + v
Those two equations together comprise a simultaneous structural form of the model, or a set of structural
equations. These are the equations incorporating the variables that economic or financial theory suggests
should be related to one another in a relationship of this form. The point is that price and quantity are
determined simultaneously (price affects quantity and quantity affects price).
A set of reduced form equations corresponding to the previous equations can be obtained by solving them for P
and for Q (separately). There will be a reduced form equation for each endogenous variable in the system.
Solving each structural equation for P and setting the two expressions equal gives:
Q/β − α/β − γS/β − u/β = Q/µ − λ/µ − kT/µ − v/µ
Rearranging this expression gives the reduced form equation for Q; solving the two structural equations for Q and
equating in the same way gives the reduced form for P. In each reduced form equation, the endogenous variable is
expressed as a function only of the exogenous variables (S and T) and the disturbances.
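The algebra can be checked symbolically. A sketch using sympy (symbol names follow the equations above) solves the two structural equations for the endogenous variables P and Q, yielding the reduced forms in terms of S, T and the disturbances:

```python
import sympy as sp

# Endogenous: P, Q.  Exogenous variables and disturbances: S, T, u, v.
P, Q, S, T, u, v = sp.symbols("P Q S T u v")
alpha, beta, gamma, lam, mu, k = sp.symbols("alpha beta gamma lambda mu k")

demand = sp.Eq(Q, alpha + beta * P + gamma * S + u)   # Q = alpha + beta*P + gamma*S + u
supply = sp.Eq(Q, lam + mu * P + k * T + v)           # Q = lambda + mu*P + k*T + v

reduced = sp.solve([demand, supply], [P, Q], dict=True)[0]
print(sp.simplify(reduced[P]))   # P as a function of S, T, u, v only
print(sp.simplify(reduced[Q]))   # Q as a function of S, T, u, v only
```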
8) Building ARMA models (the Box-Jenkins approach). Please discuss.
Recall from the answer to question 4 that an autoregressive model of order p, AR(p), is one where the current value
of a variable depends on its own previous values plus a white noise error term,
y_t = μ + Σ_{i=1}^{p} φ_i y_{t−i} + u_t,
while a moving average model of order q, MA(q), is a linear combination of current and lagged white noise terms,
y_t = μ + Σ_{i=1}^{q} θ_i u_{t−i} + u_t,
where the white noise process u_t has constant mean and variance and zero autocovariances except at lag zero.
By combining the AR(p) and MA(q) models, an ARMA(p, q) model is obtained. Using the lag operator, the model can
be written as: φ(L) y_t = μ + θ(L) u_t, where φ(L) = 1 − φ1 L − φ2 L² − … − φp L^p and θ(L) = 1 + θ1 L + θ2 L² + … + θq L^q.
The characteristics of an ARMA process will be a combination of those from the AR and MA parts. Note that
the pacf (that measures the correlation between an observation k periods ago and the current observation,
after controlling for observations at intermediate lags, i.e., all lags < k) is particularly useful in this context.
The acf (that reveals how the correlation between any two values of the signal changes as their separation
changes) alone can distinguish between a pure autoregressive and a pure moving average process.
However, an ARMA process will have a geometrically declining acf (autocorrelation function), as will a pure
AR process. So, the pacf is useful for distinguishing between an AR(p) process and an ARMA (p, q) process –
the former will have a geometrically declining autocorrelation function, but a partial autocorrelation
function which cuts off to zero after p lags, while the latter will have both autocorrelation and partial
autocorrelation functions which decline geometrically.
Box and Jenkins (1976) were the first to approach the task of estimating an ARMA model in a systematic
manner. Their approach was a practical and pragmatic one, involving three steps: (1) Identification (2)
Estimation (3) Diagnostic checking.
Step 1
This involves determining the order of the model required to capture the dynamic features of the data.
Graphical procedures are used (plotting the data over time and plotting the acf and pacf) to determine the
most appropriate specification.
Step 2
This involves estimation of the parameters of the model specified in step 1. This can be done using least
squares or another technique, known as maximum likelihood, depending on the model.
Step 3
This involves model checking – i.e., determining whether the model specified and estimated is adequate.
Box and Jenkins suggest two methods: overfitting and residual diagnostics. Overfitting involves deliberately
fitting a larger model than that required to capture the dynamics of the data as identified in stage 1. If the
model specified at step 1 is adequate, any extra terms added to the ARMA model would be insignificant.
Residual diagnostics imply checking the residuals for evidence of linear dependence which, if present,
would suggest that the model originally specified was inadequate to capture the features of the data. The
acf, pacf or Ljung–Box tests could be used.
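A compact sketch of the three Box-Jenkins steps in Python (the simulated ARMA(1,1) series and the candidate orders are purely illustrative assumptions):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(5)
T = 500
u = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):           # simulate an ARMA(1,1): y_t = 0.6 y_{t-1} + u_t + 0.3 u_{t-1}
    y[t] = 0.6 * y[t - 1] + u[t] + 0.3 * u[t - 1]

# Step 1: identification - inspect the acf and pacf
print(acf(y, nlags=5).round(2), pacf(y, nlags=5).round(2))

# Step 2: estimation - fit candidate ARMA(p, q) models and compare information criteria
for order in [(1, 0, 0), (0, 0, 1), (1, 0, 1), (2, 0, 2)]:
    fit = ARIMA(y, order=order).fit()
    print(order, round(fit.aic, 1))

# Step 3: diagnostic checking - Ljung-Box test on the residuals of the chosen model
chosen = ARIMA(y, order=(1, 0, 1)).fit()
print(acorr_ljungbox(chosen.resid, lags=[10]))
```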
It is usually the objective to form a parsimonious model, which is one that describes all of the features of
data of interest using as few parameters (i.e., as simple a model) as possible. A parsimonious model is
desirable because:
-Estimating unnecessary parameters uses up degrees of freedom: a model which contains
irrelevant lags of the variable or of the error term (and therefore unnecessary parameters) will usually lead
to increased coefficient standard errors, implying that it will be more difficult to find significant relationships
in the data.
-Models that are profligate might be inclined to fit to data-specific features which would not be replicated
out-of-sample. This means that the models may appear to fit the data very well, with perhaps a high value
of R² (a measure of the goodness of fit of the model), but would give very inaccurate forecasts.
9) How do we measure multicollinearity and what are the solutions to the problem of multicollinearity?
An implicit assumption when using OLS is that the explanatory variables are not correlated with one another.
Normally a small degree of correlation between them is present, but it will not cause an excessive loss of
precision. There is, however, a problem when the explanatory variables are highly correlated with one another,
known as multicollinearity, which causes problems for the estimation and interpretation of the regression.
Perfect multicollinearity occurs when there is an exact relationship between two or more variables. In this
case, it is not possible to estimate all of the coefficients in the model.
Near multicollinearity occurs when there is a non-negligible, but not perfect, relationship between two or
more explanatory variables. Note that a high correlation between the dependent variable and one of the
independent variables is not multicollinearity.
Testing for multicollinearity is particularly complex. Two of the simplest methods are:
-Examining the matrix of correlations between the individual explanatory variables: if multicollinearity is suspected,
the most likely culprit is a high correlation between two of the variables.
-Calculating the variance inflation factors (VIF), which provide an estimate of the extent to which the variance of
a parameter estimate is inflated because the explanatory variables are correlated (see the sketch below).
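As a sketch of the VIF calculation (the simulated regressors below, with x3 deliberately constructed to be nearly a linear combination of x1, are assumptions made only for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
T = 200
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
x3 = 0.95 * x1 + 0.05 * rng.normal(size=T)   # nearly collinear with x1

X = sm.add_constant(np.column_stack([x1, x2, x3]))
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print(vifs)   # VIFs for x1 and x3 will be very large; a common rule of thumb flags VIF > 10
```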
If near multicollinearity is present but ignored, the likely consequences are:
1) R² (a measure of the goodness of fit of the model) will be high, but the individual coefficients will have high
standard errors, so that the regression "looks good" as a whole, but the individual variables are not
significant.
2) The regression becomes very sensitive to small changes in the specification.
3) Near multicollinearity will make confidence intervals for the parameters very wide, and significance tests might
therefore give inappropriate conclusions, making it difficult to draw sharp inferences.
Possible solutions to the problem of near multicollinearity include:
1) Ignore it: the presence of near multicollinearity does not affect the BLUE properties of the
OLS estimator, i.e., it will still be consistent, unbiased and efficient.
2) Drop one of the collinear variables, so that the problem disappears.
3) Transform the highly correlated variables into a ratio and include only the ratio, and not the
individual variables, in the regression.
4) Since near multicollinearity is more a problem with the data than with the model (there is
insufficient information in the sample to obtain precise estimates of all of the coefficients), collect more data,
for example a longer run or a higher frequency of observations.
10) What is the difference between in-sample and out-of-sample forecasting?
Forecasting simply means an attempt to determine the values that a series is likely to take in the future. Of course,
forecasts might also usefully be made in a cross-sectional environment. Although the discussion below
refers to time-series data, some of the arguments will carry over to the cross-sectional context. Determining
the forecasting accuracy of a model is an important test of its adequacy.
Forecasts are made essentially because they are useful! Financial decisions often involve a long-term
commitment of resources, the returns to which will depend upon what happens in the future.
Two broad classes of forecasting approach can be distinguished:
-Econometric (structural) forecasting, which relates a dependent variable to one or more independent variables.
-Time series forecasting, which involves trying to forecast the future values of a series given its previous values
and/or previous values of an error term.
In-sample forecasts are those generated for the same set of data that was used to estimate the model’s
parameters. One would expect the ‘forecasts’ of a model to be relatively good in-sample, for this reason.
Therefore, a sensible approach to model evaluation through an examination of forecast accuracy is not to
use all of the observations in estimating the model parameters, but rather to hold some observations back.
The latter sample, sometimes known as a holdout sample, would be used to construct out-of-sample
forecasts.
To better understand this difference, consider an example: suppose we have monthly returns on the
FTSE for 120 months (1990–1999). We could either use all of them to estimate the model (and generate
only in-sample forecasts), or we could hold some observations back.
A sensible choice in this case would be to use the data from 1990M1 until 1998M12 to
estimate the model parameters, and then to forecast the observations for 1999 from the estimated
parameters. Of course, where each of the in-sample and out-of-sample periods should start and finish is
somewhat arbitrary and at the discretion of the researcher.
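A hedged sketch of this split in Python (the AR(1) data-generating process and the 120-observation sample are assumptions mirroring the example above, not actual FTSE data):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
T = 120                      # e.g., 120 monthly observations
y = np.zeros(T)
for t in range(1, T):        # illustrative AR(1) "returns" series
    y[t] = 0.3 * y[t - 1] + rng.normal()

train, holdout = y[:108], y[108:]      # first 108 months for estimation, last 12 held back

model = ARIMA(train, order=(1, 0, 0)).fit()
in_sample = model.predict()                          # fitted values over the estimation sample
out_of_sample = model.forecast(steps=len(holdout))   # genuine out-of-sample forecasts

# Compare forecasts with the corresponding observations, e.g., via mean squared error
print("in-sample MSE:", np.mean((in_sample - train) ** 2))
print("out-of-sample MSE:", np.mean((out_of_sample - holdout) ** 2))
```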
11) What is the heteroscedasticity? Discuss whether that is an issue and how to solve it.
In fact, if the errors have a constant variance we have the homoscedasticity, but, if the errors do not have a
constant variance we have the heteroscedasticity, that is a problem because is a violation of the second
assumption of the CLRM.
Heteroscedasticity can be detected in two main ways:
1) Graphical methods
2) Formal statistical tests
Unfortunately, one rarely knows the cause or the form of the heteroscedasticity, so a plot of the residuals is likely to
reveal little.
For this reason, the best way to detect heteroscedasticity is through formal tests.
Fortunately, there are a number of formal statistical tests for heteroscedasticity, and one of the simplest
is the Goldfeld–Quandt test. Their approach is based on splitting the total sample of length T
into two sub-samples of length T1 and T2. The regression model is estimated on each sub-sample and the
two residual variances, s1² and s2², are calculated. The null hypothesis is that the variances of the disturbances are
equal, H0: σ1² = σ2².
The test statistic is GQ = s1²/s2², which is distributed as an F(T1 − k, T2 − k) under the null.
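A sketch of the test using statsmodels (the heteroscedastic data-generating process below is an assumption for illustration; `het_goldfeldquandt` splits the sample into two parts and returns the F statistic and its p-value):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(8)
T = 400
x = np.sort(rng.uniform(1, 10, size=T))
u = rng.normal(scale=x, size=T)        # error standard deviation grows with x: heteroscedastic
y = 1.0 + 0.5 * x + u

X = sm.add_constant(x)
F, pval, _ = het_goldfeldquandt(y, X)  # H0: equal variances in the two sub-samples
print(F, pval)                         # small p-value -> reject H0, evidence of heteroscedasticity
```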
If the errors are heteroscedastic, but this fact is ignored and the researcher proceeds with estimation and
inference, the OLS estimators will still give unbiased (and also consistent) coefficient estimates, but they are
no longer best linear unbiased estimators (BLUE), that is, they no longer have the minimum variance
among the class of unbiased estimators. This is because the error variance, σ², plays no part in the proof that the OLS
estimator is consistent and unbiased, but σ² does appear in the formulae for the coefficient variances.
If the form (i.e., the cause) of the heteroscedasticity is known, then an alternative estimation method which
takes this into account can be used. One possibility is called generalised least squares (GLS). For example,
suppose that the error variance was related to another variable z_t by the expression var(u_t) = σ² z_t².
All that would be required to remove the heteroscedasticity would be to divide the regression equation
through by z_t.
GLS can be viewed as OLS applied to transformed data that satisfy the OLS assumptions. GLS is also known
as weighted least squares (WLS), since under GLS a weighted sum of the squared residuals is minimised,
whereas under OLS it is an unweighted sum.
Other remedies include transforming the variables into logarithms or dividing by some other measure of "size",
which can reduce the dependence of the error variance on the scale of the variables.
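A sketch of the GLS/WLS idea in statsmodels (the assumed variance structure var(u_t) = σ²z_t² corresponds to weights of 1/z_t²; the data are simulated only for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 300
x = rng.uniform(1, 5, size=T)
z = x                                   # suppose the error variance is proportional to z_t^2 = x_t^2
u = rng.normal(scale=z, size=T)
y = 2.0 + 1.5 * x + u

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()
wls_res = sm.WLS(y, X, weights=1.0 / z**2).fit()   # weights inversely proportional to the error variance

print(ols_res.bse)   # OLS standard errors (computed as if the errors were homoscedastic)
print(wls_res.bse)   # WLS/GLS standard errors under the assumed variance structure
```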
12) Discuss the vector autoregressive model.
A VAR is a systems regression model (i.e., there is more than one dependent variable) that can be
considered a kind of hybrid between the univariate time series models and the simultaneous equations
models.
The simplest case that can be entertained is a bivariate VAR, where there are only two variables, y1t and
y2t, each of whose current values depend on different combinations of the previous k values of both
variables, and error terms:
y1t = β10 + β11 y1t−1 + … + β1k y1t−k + α11 y2t−1 + … + α1k y2t−k + u1t
y2t = β20 + β21 y2t−1 + … + β2k y2t−k + α21 y1t−1 + … + α2k y1t−k + u2t
Although, for simplicity, it is often assumed that E(u1t u2t) = 0, so that the disturbances are uncorrelated
across equations, it is common and more realistic to allow them to be contemporaneously correlated, so that
cov(u1t, u2t) = σ12.
Another useful facet of VAR models is the compactness with which the notation can be expressed. For
example, consider the case from above where k = 1, so that each variable depends only upon the
immediately previous values of y1t and y2t , plus an error term. This could be written as:
y1t = β10 + β11 y1t−1 + α11 y2t−1 + u1t
y2t = β20 + β21 y2t−1 + α21 y1t−1 + u2t
Or:
[ y1t ]   [ β10 ]   [ β11  α11 ] [ y1t−1 ]   [ u1t ]
[ y2t ] = [ β20 ] + [ α21  β21 ] [ y2t−1 ] + [ u2t ]
or, even more compactly, as
y_t = β0 + β1 y_{t−1} + u_t
where y_t, β0 and u_t are g×1 vectors and β1 is a g×g matrix of coefficients.
In this equation, there are g = 2 variables in the system. Extending the model to the case where there are k
lags of each variable in each equation is also easily accomplished using this notation:
y_t = β0 + β1 y_{t−1} + β2 y_{t−2} + … + βk y_{t−k} + u_t
where again y_t, β0 and u_t are g×1 and each βi is a g×g coefficient matrix.
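A sketch of estimating a bivariate VAR(1) with statsmodels (the data-generating coefficients below are illustrative assumptions):

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(10)
T = 500
y = np.zeros((T, 2))
for t in range(1, T):   # simulate a bivariate VAR(1) with assumed coefficients
    y[t, 0] = 0.5 * y[t - 1, 0] + 0.2 * y[t - 1, 1] + rng.normal()
    y[t, 1] = 0.1 * y[t - 1, 0] + 0.4 * y[t - 1, 1] + rng.normal()

model = VAR(y)
results = model.fit(1)                     # each equation estimated by OLS with one lag
print(results.coefs)                       # the (g x g) coefficient matrix for lag 1
print(results.forecast(y[-1:], steps=4))   # four-step-ahead forecasts
```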
VAR models have several advantages compared with univariate time series models or simultaneous
equations structural models:
-The researcher does not need to specify which variables are endogenous or exogenous – all are
endogenous.
-VARs allow the value of a variable to depend on more than just its own lags or combinations of white noise
terms, so VARs are more flexible than univariate AR models.
-Provided that there are no contemporaneous terms on the RHS of the equations and that the disturbances
are uncorrelated across equations, it is possible to simply use OLS separately on each equation.
-The forecasts generated by VARs are often better than ‘traditional structural’ models.
However, VAR models also have drawbacks:
-VARs are a-theoretical, since they use little theoretical information about the relationships between the
variables to guide the specification of the model. An upshot of this is that VARs are less amenable to
theoretical analysis and therefore to policy prescriptions. It is also often not clear how the VAR coefficient
estimates should be interpreted.
-There may happen to be too many parameters to estimate, with the risk that, for small samples, there may
be large standard errors and thus wide confidence intervals for the model coefficients.
-If the statistical significance of the coefficients is to be examined, it is essential that all of the components in
the VAR are stationary.
For choosing the optimal lag length for a VAR, two broad approaches may be used: cross-equation restrictions and
information criteria.
1) Cross-equation restrictions:
It is worth noting here that in the spirit of VAR, the models should be as unrestricted as possible. A VAR with
different lag lengths for each equation could be viewed as a restricted VAR.
A possible approach would be to specify the same number of lags in each equation and then to determine the
model order using a likelihood ratio test (a method that works by finding the most likely
values of the parameters given the actual data). A disadvantage of this approach is that the test is
cumbersome and requires a normality assumption for the disturbances.
2) Information criteria:
An alternative approach to selecting the appropriate VAR lag length is to use an information
criterion. Information criteria require no normality assumptions concerning the distributions of the
errors. Instead, the criteria trade off a fall in the RSS of each equation as more lags are added against an
increase in the value of the penalty term. The univariate criteria could be applied separately to each equation but,
again, it is usually deemed preferable to require the number of lags to be the same for each equation. This requires
the use of multivariate versions of the information criteria, which can be defined as:
MAIC = ln|Σ̂| + 2k'/T
MSBIC = ln|Σ̂| + (k'/T) ln(T)
MHQIC = ln|Σ̂| + (2k'/T) ln(ln(T))
where Σ̂ is the variance–covariance matrix of the residuals, T is the number of observations and k' is the total
number of regressors in all equations.
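With statsmodels, the multivariate information criteria can be compared across lag lengths via `select_order`. A sketch (reusing an illustrative bivariate VAR(1) process like the one simulated in the example above; the maximum lag of 8 is an arbitrary assumption):

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(11)
T = 500
y = np.zeros((T, 2))
for t in range(1, T):                           # same kind of illustrative VAR(1) process as above
    y[t] = 0.5 * y[t - 1] + rng.normal(size=2)

model = VAR(y)
order = model.select_order(maxlags=8)           # computes AIC, BIC (SBIC), HQIC and FPE for each lag
print(order.summary())
print(order.selected_orders)                    # e.g., {'aic': 1, 'bic': 1, ...} for this process

results = model.fit(maxlags=8, ic="hqic")       # estimate with the lag length chosen by HQIC
print(results.k_ar)
```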