Assignment QMT533
ASSESSMENT 2
GROUP MEMBERS:
GROUP: N4CS2416T2
1. Inertia
Inertia, or sluggishness, is a salient feature of most economic time series and a common reason for autocorrelation. Data such as GNP, production, price indexes, employment, and unemployment exhibit business cycles. Starting at the bottom of a recession, when economic recovery begins, most of these series start moving upward. In this upswing, the value of a series at one point in time is greater than its previous values, and these successive observations are likely to be interdependent.
2. Specification bias (excluded variables)
Autocorrelation can also arise when a relevant variable such as $X_{3t}$ is excluded from the model. Its effect is then absorbed into the disturbance, $v_t = \beta_3 X_{3t} + u_t$, so the error or disturbance term will follow a systematic pattern, creating false autocorrelation due to the exclusion of the $X_{3t}$ variable from the model.
4. Cobweb Phenomenon
The Cobweb phenomenon happens when the quantity supplied of many agricultural commodities in period t depends on their price in period t-1. This is because the decision to plant a crop in period t is made on the basis of the price prevailing in period t-1, while the actual supply of the commodity only becomes available in period t. Production is influenced by price: when the price increases, production also increases, and when the price decreases, production decreases. If the price in period t turns out to be low, farmers will produce less in period t+1 than they did in period t. Thus, disturbances in the case of the Cobweb phenomenon are not expected to be random; they will exhibit a systematic pattern and thus cause a problem of autocorrelation.
5. Lags
Many times in business and economic research the lagged values of the dependent
variable are used as the independent variables. For example, to study the effect of
tastes and habits on consumption in a period t, consumption in period t-1 is used as
an explanatory variable since consumers do not change their consumption habits
easily. If the lagged terms are not included in the above consumption function, the
resulting error term will reflect a systematic pattern due to the impact of habits and
tastes on current consumption and therefore there will be an autocorrelation problem.
6. Manipulation of data
Often, raw data are manipulated in empirical analysis. For example, in time-series regression that uses quarterly data, the data are usually obtained from monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging introduces smoothness into the data by dampening the fluctuations in the monthly data. This smoothness may itself lead to a systematic pattern in the disturbances, thereby introducing autocorrelation. Interpolation or extrapolation of data is another example of data manipulation.
7. Non-stationarity
It is quite possible that both Y and X are non-stationary and therefore, the error is
also non-stationary. In this case, the error term will show an autocorrelation problem.
● HETEROSCEDASTICITY
The homoscedasticity assumption states that the variance of the unobserved error, u, conditional on the explanatory variables, is constant (Wooldridge, 2018, p. 285). Homoscedasticity fails whenever the variance of the unobserved factors changes across different segments of the population, where the segments are determined by the different values of the explanatory variables.
1. Heteroscedasticity occurs when the variance of the error term is not the same for all observations.
● Ordinary Least Squares (OLS) regression assumes that all residuals are drawn from a population that has a constant variance (homoscedasticity).
● For example, as individuals' incomes increase, the variability of their savings also tends to increase, so the error variance is not constant across income levels.
● MULTICOLLINEARITY
Multicollinearity occurs when the independent variables in a regression model are correlated.
This correlation is concerning because the independent variables should be independent. A
high degree of correlation between variables can cause problems when fitting the model and
analyzing the results. When a researcher or analyst tries to discover how well each
independent variable can be used to predict or understand the dependent variable in a
statistical model, multicollinearity can lead to biased or incorrect conclusions.
In general, when two or more independent variables are related to each other, at least one of them is redundant because of their tendency to contribute overlapping information. Although it is possible to obtain least-squares estimates when multicollinearity occurs, the use of statistical testing techniques and the interpretation of the regression coefficients become troublesome, because it is difficult to say which variable has the most impact on the dependent variable. Multicollinearity is therefore mostly a data problem rather than a model problem.
● Data-based multicollinearity, for example, arises from the way the data are collected rather than from the specification of the model itself.
(b) Consequences
● AUTOCORRELATION
1. The Ordinary Least Squares (OLS) estimators of the βs are still unbiased and consistent, but the minimum variance property is not satisfied.
2. The OLS estimators will be inefficient and therefore no longer the best linear unbiased estimators (BLUE).
3. The estimated variances of the regression coefficients will be biased and inconsistent, and will be greater than the variances obtained by other estimation methods; therefore, hypothesis testing is no longer valid. In most cases, $R^2$ will be overestimated, indicating a better fit than the one that truly exists, and the t-statistics and F-statistics tend to be inflated.
4. The variance of the random term $u$ may be underestimated if the $u$'s are autocorrelated. That is, the estimated error variance
$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-2}$$
is likely to underestimate the true $\sigma^2$.
5. If the $u$'s are autocorrelated, the OLS estimators are not asymptotically efficient; that is, the βs are not asymptotically efficient.
● HETEROSCEDASTICITY
1. The regression parameter estimates are still unbiased, but they are no longer efficient.
2. Because the usual formulas for the variances of the coefficients are biased under heteroscedasticity, we cannot use them to conduct significance tests or to construct confidence intervals; such tests are no longer reliable.
3. The prediction of the dependent variable (Y) for a particular value of the independent variable (X), based on the estimated 𝛽 from the original data, will be inefficient because of the large variance of the estimates.
4. The Generalized Least Squares (GLS) estimator of 𝛽 is more efficient than the Ordinary Least Squares (OLS) estimator of 𝛽 in the presence of heteroscedasticity.
● MULTICOLLINEARITY
(c) Detection
● AUTOCORRELATION
i) Formal
There are two methods that can be used to detect autocorrelation formally: the Run test and the Durbin-Watson test.
1. Run test
The Run test, also known as the Geary test, is a nonparametric test. In this test, a run is defined as an uninterrupted sequence of one symbol (sign), and the length of a run is the number of elements in it. Negative serial correlation is indicated when there are too many runs, as it means that the residuals change sign frequently. Meanwhile, too few runs mean that the residuals change sign rarely, indicating positive serial correlation.
1) Hypothesis:
H0: The residuals are random (no serial correlation).
H1: The residuals are not random.
2) Test statistic:
R = Number of runs
3) Confidence interval:
$$E(R) \pm 1.96\sqrt{\mathrm{Var}(R)}$$
where
$$E(R) = \frac{2N_1 N_2}{N} + 1, \qquad \mathrm{Var}(R) = \frac{2N_1 N_2\,(2N_1 N_2 - N)}{N^2 (N-1)}$$
$N_1$ = number of + residuals, $N_2$ = number of − residuals, and $N = N_1 + N_2$.
4) Decision rule:
Reject H0 if R does not fall in the interval.
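As a rough illustration, the sketch below implements the run counting and the confidence interval above in R on simulated data; the function name runs_test and the simulated model are purely illustrative and not part of any package.

```r
# Minimal sketch of the runs (Geary) test applied to OLS residuals,
# using the E(R) and Var(R) formulas given above.
runs_test <- function(res) {
  s  <- sign(res); s <- s[s != 0]              # signs (+/-) of residuals, zeros dropped
  N1 <- sum(s > 0); N2 <- sum(s < 0); N <- N1 + N2
  R  <- 1 + sum(diff(s) != 0)                  # observed number of runs
  ER <- 2 * N1 * N2 / N + 1                    # E(R)
  VR <- 2 * N1 * N2 * (2 * N1 * N2 - N) / (N^2 * (N - 1))  # Var(R)
  ci <- ER + c(-1, 1) * 1.96 * sqrt(VR)        # 95% confidence interval
  list(runs = R, ci = ci, reject_H0 = (R < ci[1] || R > ci[2]))
}

set.seed(1)
x   <- 1:50
y   <- 2 + 0.5 * x + arima.sim(list(ar = 0.8), n = 50)  # AR(1) errors -> autocorrelation
fit <- lm(y ~ x)
runs_test(residuals(fit))                      # few runs expected here, so H0 is rejected
```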
2. Durbin-Watson test
The Durbin-Watson statistic (d statistic) provides a test for first-order autocorrelation only. It is important to note the assumptions underlying the d statistic.
5. The regression model does not include the lagged value(s) of the dependent variable as one of the explanatory variables.
According to the rule of thumb, if the value of the Durbin-Watson statistic lies between 1.8 and 2.2, there is no autocorrelation. The formula for the Durbin-Watson statistic is given as follows:
$$d = \frac{\sum_{t=2}^{n}\left(\hat{u}_t - \hat{u}_{t-1}\right)^2}{\sum_{t=1}^{n}\hat{u}_t^2}$$
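A minimal R sketch of the d statistic on simulated data follows; the helper dw_stat is illustrative, and lmtest::dwtest() reports the same statistic together with a p-value.

```r
# Compute the Durbin-Watson d statistic directly from the residuals.
dw_stat <- function(res) sum(diff(res)^2) / sum(res^2)

set.seed(2)
x   <- 1:60
y   <- 1 + 0.3 * x + arima.sim(list(ar = 0.7), n = 60)  # positively autocorrelated errors
fit <- lm(y ~ x)
dw_stat(residuals(fit))          # well below 1.8 here, suggesting positive autocorrelation
# library(lmtest); dwtest(fit)   # formal test of H0: no first-order autocorrelation
```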
ii) Informal
When there is positive autocorrelation, the plot of the residuals over time shows a snake-like pattern, with long stretches of residuals staying above or below zero. Meanwhile, the plot shows a zigzag pattern, with the residuals changing sign frequently, when there is negative autocorrelation, as shown on the right.
The correlogram in Figure 3 is for the data shown in Figure 2. We can see in this plot
that at lag 0, the correlation is 1. It means that data is correlated with itself. At lag 1,
the correlation is shown to be around 0.5. There are also negative correlations when
the points are at lags 3, 4 and 5 apart.
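These informal checks can be reproduced in R with a residual plot and the built-in acf() correlogram; the sketch below uses simulated data rather than the series behind Figures 2 and 3.

```r
# Residual plot over time and correlogram for a model with autocorrelated errors.
set.seed(3)
x   <- 1:80
y   <- 5 + 0.2 * x + arima.sim(list(ar = 0.6), n = 80)
fit <- lm(y ~ x)
res <- residuals(fit)

plot(res, type = "b", xlab = "Time", ylab = "Residuals")  # snake-like / zigzag pattern
abline(h = 0, lty = 2)
acf(res, main = "Correlogram of residuals")               # lag-0 correlation is always 1
```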
● HETEROSCEDASTICITY
Heteroscedasticity occurs when the variance of the disturbance term differs across observations. There are a few ways to detect heteroscedasticity problems. The hypotheses for all of the formal tests for heteroscedasticity are as follows:
H0: The error variance is constant (homoscedasticity).
H1: The error variance is not constant (heteroscedasticity).
(Formal way)
1. Park Test
The Park test is used to detect heteroscedasticity when there is some variable Z that might explain the differing variances of the residuals. Park suggests the functional form
$$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i,$$
where $v_i$ is the stochastic disturbance term. Since $\sigma_i^2$ is generally not known, Park suggests using $\hat{u}_i^2$ as a proxy and running the regression
$$\ln \hat{u}_i^2 = \alpha + \beta \ln X_i + v_i.$$
Test procedure:
Step 1: Run the regression using ordinary least squares (OLS), $Y_i = \beta_0 + \beta_1 X_i + u_i$, to obtain the residuals, $\hat{u}_i$.
Step 2: Run the regression of $\ln \hat{u}_i^2$ on $\ln X_i$ and test the significance of $\beta$; a statistically significant $\beta$ indicates heteroscedasticity.
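A minimal R sketch of the Park test on simulated heteroscedastic data (the error standard deviation grows with x); variable names are illustrative.

```r
# Park test: regress ln(u_hat^2) on ln(X) and check whether the slope is significant.
set.seed(4)
x   <- runif(100, 1, 10)
y   <- 2 + 3 * x + rnorm(100, sd = x)      # error variance grows with x
fit <- lm(y ~ x)

park <- lm(log(residuals(fit)^2) ~ log(x)) # auxiliary Park regression
summary(park)$coefficients                 # a significant slope suggests heteroscedasticity
```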
2. Glejser Test
The Glejser test checks whether the amount of random error increases proportionately with changes in one or more exogenous variables. The test is carried out by regressing the absolute values of the main regression equation's ordinary least squares residuals on the variable(s) in question, for example
$$|\hat{u}_i| = \beta_0 + \beta_1 X_i + v_i.$$
Test procedure:
Step 1: Run the regression using ordinary least squares (OLS), $Y_i = \beta_0 + \beta_1 X_i + u_i$, to obtain the residuals, $\hat{u}_i$.
Step 2: Regress $|\hat{u}_i|$ on $X_i$ and test the significance of $\beta_1$; a statistically significant coefficient indicates heteroscedasticity.
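A minimal R sketch of the Glejser test under the same kind of simulated heteroscedasticity; only the auxiliary regression differs from the Park sketch above.

```r
# Glejser test: regress |u_hat| on X and test the slope.
set.seed(5)
x   <- runif(100, 1, 10)
y   <- 2 + 3 * x + rnorm(100, sd = x)
fit <- lm(y ~ x)

glejser <- lm(abs(residuals(fit)) ~ x)
summary(glejser)$coefficients              # a significant slope suggests heteroscedasticity
```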
3. Spearman's Rank Correlation Test
Heteroscedasticity can also be detected using Spearman's rank correlation coefficient,
$$r_s = 1 - 6\left[\frac{\sum d_i^2}{n(n^2 - 1)}\right],$$
where $d_i$ is the difference between the ranks assigned to the two characteristics of the $i$-th observation (here, $|\hat{u}_i|$ and $X_i$) and $n$ is the number of observations.
Test procedure:
Step 1: Run the regression using ordinary least squares (OLS), $Y_i = \beta_0 + \beta_1 X_i + u_i$, to obtain the residuals, $\hat{u}_i$.
Step 2: Rank both $|\hat{u}_i|$ and $X_i$ in ascending or descending order, then compute the Spearman rank correlation coefficient given above.
Step 3: Assuming that the population rank correlation coefficient $\rho_s$ is zero and that $n > 8$, the significance of the sample $r_s$ may be assessed using the t-test
$$t = \frac{r_s\sqrt{n-2}}{\sqrt{1 - r_s^2}},$$
which follows the t distribution with $n - 2$ degrees of freedom.
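A minimal R sketch of the rank-correlation test, computing $r_s$ between the absolute residuals and X and then the t statistic above, again on simulated data.

```r
# Spearman rank correlation test for heteroscedasticity.
set.seed(6)
x   <- runif(60, 1, 10)
y   <- 1 + 2 * x + rnorm(60, sd = x)
fit <- lm(y ~ x)

rs <- cor(abs(residuals(fit)), x, method = "spearman")  # Spearman's r_s
n  <- length(x)
t_stat <- rs * sqrt(n - 2) / sqrt(1 - rs^2)             # t with n - 2 degrees of freedom
c(r_s = rs, t = t_stat)                                 # compare |t| with the t critical value
```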
4. Goldfeld-Quandt Test
Test procedure:
Step 1: Sort the observations by the value of X, starting with the smallest value.
Step 2: Omit c central observations and separate the remaining (n − c) observations into two groups.
Step 3: Run separate ordinary least squares (OLS) regressions for group 1 and group 2 and obtain the residual sum of squares of each, RSS1 and RSS2.
Step 4: Compute the ratio of the two residual sums of squares, each divided by its degrees of freedom; under the null hypothesis of homoscedasticity this ratio follows the F distribution, and a large value leads to rejecting H0.
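A minimal R sketch of the Goldfeld-Quandt test via lmtest::gqtest(), which compares the two groups' residual sums of squares after omitting a central fraction of the ordered observations; the data are simulated.

```r
# Goldfeld-Quandt test: F ratio of the two groups' residual sums of squares.
library(lmtest)

set.seed(7)
x   <- sort(runif(100, 1, 10))        # observations already ordered by X
y   <- 2 + 3 * x + rnorm(100, sd = x)
fit <- lm(y ~ x)

gqtest(fit, fraction = 0.2)           # omit the central 20% of observations
```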
5. Breusch-Pagan-Godfrey Test
Test procedure:
Step 1: Run the regression using ordinary least squares (OLS), $Y_i = \beta_0 + \beta_1 X_i + u_i$, to obtain the residuals, $\hat{u}_i$.
Step 5: Obtain the ESS (explained sum of squares) from the auxiliary regression and compute the test statistic $\Theta = \tfrac{1}{2}\,\mathrm{ESS}$, which is asymptotically distributed as chi-square under the null hypothesis of homoscedasticity.
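A minimal R sketch of the Breusch-Pagan-Godfrey test; lmtest::bptest() with studentize = FALSE corresponds to the original ESS/2 version described above. The data are simulated.

```r
# Breusch-Pagan-Godfrey test for heteroscedasticity.
library(lmtest)

set.seed(8)
x   <- runif(100, 1, 10)
y   <- 2 + 3 * x + rnorm(100, sd = x)
fit <- lm(y ~ x)

bptest(fit, studentize = FALSE)   # H0: homoscedasticity; a small p-value rejects H0
```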
Test procedure:
Step 1: Run the regression using ordinary least squares (OLS), $Y_i = \beta_0 + \beta_1 X_i + u_i$, to obtain the residuals, $\hat{u}_i$.
Step 2: Run the auxiliary regression of $\hat{u}_i^2$ on the explanatory variable(s) and obtain its $R^2$.
Step 3: Calculate the test statistic $n \times R^2$. If $n \times R^2$ exceeds the critical chi-square value, reject H0.
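A minimal R sketch of the $n \times R^2$ form of the statistic, computed by hand from an auxiliary regression of the squared residuals; including the squared regressor here follows White's general test, which is an assumption since the intermediate steps are not spelled out above.

```r
# LM statistic n * R^2 from an auxiliary regression of u_hat^2.
set.seed(9)
x   <- runif(100, 1, 10)
y   <- 2 + 3 * x + rnorm(100, sd = x)
fit <- lm(y ~ x)

aux     <- lm(residuals(fit)^2 ~ x + I(x^2))   # auxiliary regression
lm_stat <- length(x) * summary(aux)$r.squared  # n * R^2
c(statistic = lm_stat,
  p.value   = pchisq(lm_stat, df = 2, lower.tail = FALSE))  # df = regressors in aux
```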
(Informal way)
If the residuals, when plotted, are distributed randomly about zero without any systematic pattern, this indicates that there is no heteroscedasticity in the data. The figure is shown below:
Figure 4
a)
This panel shows no clear pattern, and the nature of the relationship is unknown.
b)
This panel shows a linear increase in the spread of the residuals, indicating the presence of heteroscedasticity.
c)
This panel shows heteroscedasticity with a quadratic relationship.
d)
● MULTICOLLINEARITY
Formal
Multicollinearity arises for various reasons, and addressing it starts with detecting it. There are a few ways of detecting multicollinearity.
Firstly, when the value of $R^2$ is high, approximately more than 0.8, the F test in most cases will reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t-tests will show that none or few of the partial slope coefficients are statistically different from zero.
Secondly, check the significance of each individual coefficient and the significance of the overall model. If none of the individual coefficients is significant but the overall model is significant, it can be concluded that multicollinearity exists.
Thirdly, the variance inflation factor (VIF) measures how much the variance of an estimated coefficient is inflated because the corresponding predictor is explained by the other predictors. The VIF is calculated for each predictor variable as
$$VIF_j = \frac{1}{1 - R_j^2},$$
where $R_j^2$ is the coefficient of determination from regressing the $j$-th predictor on the remaining predictors. A VIF value greater than 10 indicates the presence of multicollinearity. Alternatively, the tolerance statistic (TOL), which is the reciprocal of the variance inflation factor, can be used as a measure of multicollinearity; a tolerance value of less than 0.2 indicates multicollinearity. The formula for the tolerance statistic is
$$TOL_j = \frac{1}{VIF_j} = 1 - R_j^2.$$
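A minimal R sketch of the VIF and tolerance calculations using car::vif(); the predictors are simulated so that x2 is nearly collinear with x1.

```r
# VIF and tolerance (TOL = 1/VIF) for each predictor.
library(car)

set.seed(10)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(100)
y  <- 1 + 2 * x1 + 3 * x3 + rnorm(100)

fit <- lm(y ~ x1 + x2 + x3)
v   <- vif(fit)
rbind(VIF = v, TOL = 1 / v)       # VIF > 10 or TOL < 0.2 signals multicollinearity
```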
If the pair-wise or zero-order correlation coefficient between two regressors is high, say in
excess of 0.8, then multicollinearity is a serious problem. The problem with this criterion is
that, although high zero-order correlations may suggest collinearity, it is not necessary that
they be high to have collinearity in any specific case.
Informal (graph/table/diagram)
Informal ways of detecting multicollinearity include using scatter plots and correlation matrices. A scatter plot can show the type of relationship between the independent variables: if it shows a linear relationship between two variables, this indicates the presence of multicollinearity. It is good practice to use scatter plots to see how the various variables in a regression model are related.
Figure 5: Scatter Plot
The figure above shows the scatter plot of blood pressure, age, weight and body surface area. The figure below shows the correlations of blood pressure, age, weight, body surface area, duration of hypertension, basal pulse and stress index.
As we can see, there is a strong correlation between body surface area (BSA) and weight, where r = 0.875. Moreover, weight and pulse are fairly correlated, since the value of r = 0.659. Lastly, none of the pairwise correlations among age, weight, duration and stress are particularly strong, since the value of r is less than 0.40 in each case.
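These informal checks can be reproduced in R with a scatter-plot matrix and a correlation matrix; the sketch below uses simulated predictors rather than the blood-pressure data in Figure 5.

```r
# Scatter-plot matrix and pairwise correlations of the predictors.
set.seed(11)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.2)   # strongly related to x1
x3 <- rnorm(100)
X  <- data.frame(x1, x2, x3)

pairs(X)               # roughly linear clouds indicate correlated predictors
round(cor(X), 3)       # pairwise correlations; |r| > 0.8 is a warning sign
```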
(d) Remedial measures
● AUTOCORRELATION
When autocorrelated error terms are found to be present:
2. Try to find out whether the autocorrelation is pure autocorrelation and not the result of misspecification of the model.
● HETEROSCEDASTICITY
The unbiasedness and consistency qualities of the Ordinary Least Square (OLS) estimator
are not destroyed by heteroscedasticity, since it remains unbiased and consistent in the
existence of heteroscedasticity. However, they are no longer efficient, even asymptotically.
The standard hypothesis testing approach is suspect due to its inefficiency. As a result,
some heteroscedasticity remedial actions should be implemented. There are two
approaches to remediation: one when the variance is known and the other when the
variance is unknown.
i) Variance is known.
Heteroscedasticity is present if $V(u_i) = \sigma_i^2$, that is, if the error variance differs across observations. Weighted least squares (WLS), which is a special case of Generalized Least Squares (GLS), can be used to correct heteroscedasticity given the variance values. Weighted least squares is simply the OLS method of estimation applied to the transformed model. When heteroscedasticity is identified using a statistical test, the best option is to transform the original model so that the transformed disturbance term has constant variance. The transformed model is nothing more than a modification of the data. The transformed error term, like a homoscedastic one, has a constant variance, as the following equation shows:
$$V(u_i^*) = V\!\left(\frac{u_i}{\sigma_i}\right) = \frac{1}{\sigma_i^2}\,V(u_i) = \frac{\sigma_i^2}{\sigma_i^2} = 1$$
Since the individual error variances are not always known a priori, this method has limitations. With sufficient sample information, however, reasonable estimates of the true error variances can be made and used in place of the unknown variances.
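A minimal R sketch of weighted least squares under the assumption that the error variance is proportional to $x^2$, so that weighting by $1/x^2$ makes the transformed disturbance homoscedastic; the data are simulated.

```r
# Weighted least squares as a special case of GLS, with a known variance structure.
set.seed(12)
x <- runif(100, 1, 10)
y <- 2 + 3 * x + rnorm(100, sd = x)    # error standard deviation proportional to x

ols <- lm(y ~ x)                       # unweighted fit, inefficient here
wls <- lm(y ~ x, weights = 1 / x^2)    # weights = 1 / sigma_i^2 (up to scale)
summary(wls)$coefficients
```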
● MULTICOLLINEARITY
For multicollinearity, there are several remedial measures to address the violation of the basic regression assumptions.
1. Dropping a variable(s)
When faced with severe multicollinearity, the simplest remedy is to drop one or more of the correlated independent variables. However, after dropping the variable(s), some information may be lost, and the omitted variables can result in biased coefficient estimates for the remaining independent variables.
2. Do nothing
Secondly, leave the model as it is, despite the multicollinearity. The presence of multicollinearity does not affect the predictive ability of the fitted model as a whole; it mainly affects the precision of the individual coefficient estimates, so if prediction is the purpose, the model can be left unchanged.
3. Obtain more data
Another remedy is to obtain more data and increase the sample size. It is possible that in another sample involving the same variables, collinearity may not be as serious as in the first sample. Increasing the sample size will decrease the standard errors and may attenuate the collinearity problem.
4. Transform variable(s)
Transformation is used to correct model inadequacies. In this remedy, the highly correlated variables are transformed, for example into first differences of logs.
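A minimal R sketch of the transformation idea: two trending, highly correlated level series are replaced by their first differences of logs (growth rates), which are usually much less correlated. The series are simulated.

```r
# First differences of logs as a remedy for collinear trending variables.
set.seed(13)
n  <- 60
x1 <- 50 + cumsum(runif(n, 0.5, 1.5))  # upward-trending, strictly positive series
x2 <- x1 + rnorm(n, sd = 0.5)          # almost the same trend

dlx1 <- diff(log(x1))                  # first differences of logs
dlx2 <- diff(log(x2))
c(levels = cor(x1, x2), log_diffs = cor(dlx1, dlx2))  # correlation usually drops sharply
```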
(e) Examples for autocorrelation, heteroscedasticity and multicollinearity
● AUTOCORRELATION
This data set is used to show an example of autocorrelation problems using R software. This Crime Rate data set was collected in the United States of America. The variables are mostly discrete, as they measure populations per 1,000; there are also some continuous variables, such as those measuring expenditure. It consists of 14 variables, where the dependent variable is CrimeRate, and it has 47 observations in total. Based on Table 1, the fitted regression model is used to detect autocorrelation problems.
Based on Figure 7, the data show an autocorrelation problem, detected by using the Durbin-Watson test. According to the rule of thumb, if the value of the Durbin-Watson statistic lies outside the range 1.8 to 2.2, there is an autocorrelation problem. Since the Durbin-Watson value is equal to 0.9830947, autocorrelation exists in the model.
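A hedged sketch of how the check in Figure 7 might be run in R; the file name, the data frame name crime, and the use of all remaining columns as regressors are assumptions, since the actual script is only shown in the figure.

```r
# Durbin-Watson test on the Crime Rate regression (illustrative script).
library(lmtest)

# crime <- read.csv("crime.csv")              # hypothetical file name
# fit   <- lm(CrimeRate ~ ., data = crime)    # CrimeRate regressed on the other 13 variables
# dwtest(fit)                                 # DW statistic reported above as 0.9830947
```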
● HETEROSCEDASTICITY
This data set is used to show an example of heteroscedasticity problems using R software. This Clothing Sales data set was collected in the Netherlands. The data are on the annual sales of men's fashion stores. It consists of 8 variables, where the dependent variable is Tsales, which refers to annual sales in Dutch guilders, and the independent variables are sales per square metre (sales), gross profit margin (margin), number of full-timers (nfull), total number of hours worked (hours), investment in shop premises (inv1), investment in automation (inv2), and sales floor space of the store (ssize). This data set has 392 observations in total. Based on Table 2, the fitted regression model is used to detect heteroscedasticity problems.
Table 2 : Clothing Sales data set.
Figure 8: The scripts and command in R studio to check heteroscedasticity
Since the p-value = 2.2e-16 is less than α = 0.05, the null hypothesis is rejected. Therefore, heteroscedasticity is present in the error variance.
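A hedged sketch of how the heteroscedasticity check in Figure 8 might look in R, assuming a Breusch-Pagan test was used; the file name, the data frame name clothing, and the lower-case variable names are assumptions.

```r
# Breusch-Pagan test on the Clothing Sales regression (illustrative script).
library(lmtest)

# clothing <- read.csv("clothing.csv")        # hypothetical file name
# fit <- lm(tsales ~ sales + margin + nfull + hours + inv1 + inv2 + ssize,
#           data = clothing)
# bptest(fit)                                 # p-value of about 2.2e-16 rejects H0
```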
● MULTICOLLINEARITY
This data set is used to show an example of multicollinearity problems using R software. This GPA and Medical School Admission data set was collected at a midwestern liberal arts college. It consists of 7 variables, where the dependent variable is GPA, which refers to the Grade Point Average of the college students, and the independent variables are the Verbal reasoning subscore (VR), Physical sciences subscore (PS), Writing sample subscore (WS), Biological sciences subscore (BS), the score on the MCAT exam, which is the sum of (VR + PS + WS + BS), as MCAT, and the number of medical schools applied to as Apps. This data set has 55 observations in total. Based on Table 3, the fitted regression model is used to detect multicollinearity problems.
Table 3 : GPA and Medical School Admission data set.
Figure 9 : The scripts and command in R studio to check multicollinearity problems.
Based on Zach (2021), the error in vif.default(model) occurs when a multicollinearity problem exists in a regression model; it happens when two or more predictor variables in the model are highly correlated (here, exactly linearly dependent, since MCAT is the sum of the four subscores). Based on Figure 9, a multicollinearity problem exists in this model, because this error occurs when running the test.
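A hedged sketch of the check in Figure 9; the file and data frame names are assumptions. Because MCAT is the exact sum of VR, PS, WS and BS, car::vif() stops with the aliased-coefficients error, and base R's alias() reveals the exact linear dependency.

```r
# VIF check on the GPA and Medical School Admission regression (illustrative script).
library(car)

# gpa <- read.csv("gpa.csv")                                   # hypothetical file name
# fit <- lm(GPA ~ VR + PS + WS + BS + MCAT + Apps, data = gpa)
# alias(fit)                        # shows MCAT = VR + PS + WS + BS
# vif(fit)                          # error: "there are aliased coefficients in the model"
# vif(update(fit, . ~ . - MCAT))    # dropping the redundant variable makes vif() work
```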
Reference
Zach. (2021, October 21). How to fix in R: There are aliased coefficients in the model. Statology. Retrieved January 4, 2022, from https://www.statology.org/r-aliased-coefficients-in-the-model/
PennState Eberly College of Science. (n.d.). 10.7 - Detecting multicollinearity using variance inflation factors. STAT 462. Retrieved January 4, 2022, from https://online.stat.psu.edu/stat462/node/180/
Frost, J. (2019, March 15). Heteroscedasticity in regression analysis. Statistics By Jim. Retrieved January 4, 2022, from https://statisticsbyjim.com/regression/heteroscedasticity-regression/
INTRODUCTION
A) THE NATURE OF AUTOCORRELATION, HETEROSCEDASTICITY AND MULTICOLLINEARITY
B) CONSEQUENCES OF AUTOCORRELATION, HETEROSCEDASTICITY AND MULTICOLLINEARITY
C) DETECTION OF AUTOCORRELATION, HETEROSCEDASTICITY AND MULTICOLLINEARITY
D) REMEDIAL MEASURES
E) EXAMPLE FOR AUTOCORRELATION, HETEROSCEDASTICITY AND MULTICOLLINEARITY
REFERENCES