HS Breakdown
Heteroscedasticity (HS) is a common issue in regression analysis, where the variance of the errors or residuals is not constant across all levels of the independent variables. Instead, the variance may exhibit a systematic pattern or change with the values of the independent variables.
1. Definition: Heteroscedasticity occurs when the spread or dispersion of the residuals is not
consistent across the range of predicted values. This violates the assumption of
homoscedasticity, which assumes that the variance of the error term is constant.
c. Invalid hypothesis tests: Heteroscedasticity can lead to invalid hypothesis tests, such as t-
tests or F-tests, as the assumption of constant variance required for these tests is violated.
As a result, the p-values associated with these tests may be inaccurate, leading to incorrect
conclusions.
a. Residual plots: Plotting the residuals against the predicted values or against the
independent variables can reveal patterns or trends. If the spread of the residuals varies
systematically with the predicted values or exhibits a specific pattern, it indicates
heteroscedasticity.
b. Breusch-Pagan test: The Breusch-Pagan test is a statistical test that assesses the
presence of heteroscedasticity in a regression model. It regresses the squared residuals on a
set of independent variables (an auxiliary regression) and tests whether those variables
jointly explain the variation in the squared residuals. If the test statistic is significant, it
suggests the presence of heteroscedasticity.
c. White test: The White test is another statistical test for heteroscedasticity. It regresses
the squared residuals on the independent variables, their squares, and their cross-products
in an auxiliary regression. If the test statistic is significant, it indicates heteroscedasticity.
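As a rough sketch of the Breusch-Pagan idea, using simulated data and the common LM = n·R² form of the statistic (the data-generating numbers and variance pattern here are illustrative assumptions, not from the notes):

```python
import numpy as np

# Simulated heteroscedastic data: error sd grows with x (illustrative assumption)
rng = np.random.default_rng(0)
n = 500
x = np.linspace(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)   # Var(u_i) rises with x_i

# Step 1: fit the main regression by OLS and keep the residuals
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Step 2: auxiliary regression of the squared residuals on the regressors
e2 = resid ** 2
g_hat, *_ = np.linalg.lstsq(X, e2, rcond=None)
ss_res = np.sum((e2 - X @ g_hat) ** 2)
ss_tot = np.sum((e2 - e2.mean()) ** 2)
r2_aux = 1.0 - ss_res / ss_tot

# Step 3: LM statistic = n * R^2 of the auxiliary regression, compared with a
# chi-square critical value (df = 1 regressor here; 5% critical value = 3.841)
lm = n * r2_aux
print(lm > 3.841)   # a significant statistic suggests heteroscedasticity
```

In practice one would use a packaged implementation rather than hand-rolling the auxiliary regression, but the mechanics are exactly this: square the residuals, regress them on the explanatory variables, and scale the resulting R² by n.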
a. Weighted least squares (WLS): WLS is a technique that accounts for heteroscedasticity
by assigning weights to observations based on their estimated variances. Observations with
higher variances receive lower weights, reducing the impact of influential observations on
the regression estimates.
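A minimal numpy sketch of WLS, under the assumption that the error standard deviation is proportional to x (so the weights 1/xᵢ² are known up to scale); the simulated data and all numbers are illustrative:

```python
import numpy as np

# Simulated data where the error sd is proportional to x (illustrative assumption)
rng = np.random.default_rng(1)
n = 500
x = np.linspace(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones(n), x])

# WLS with weights w_i = 1 / Var(u_i), here proportional to 1 / x_i^2:
# multiplying each row by sqrt(w_i) makes the transformed errors homoscedastic,
# so plain OLS on the transformed data is the WLS estimator.
sw = 1.0 / x                    # square roots of the weights
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# OLS for comparison: also unbiased, but less efficient under heteroscedasticity
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_wls, beta_ols)
```

High-variance observations get small weights, so they pull the fitted line around less, which is exactly the "lower weights for higher variances" idea described above.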
When the residual variance is not constant, as indicated by Var(uᵢ) = σᵢ² (a variance that
changes with i), it implies the presence of heteroscedasticity. This has implications for the
traditional ordinary least squares (OLS) results in regression analysis. Here's how
heteroscedasticity influences the OLS results:
1. Coefficient estimates remain unbiased but less reliable: Heteroscedasticity does not bias
the point estimates of the regression coefficients themselves; the OLS estimator remains
unbiased in its presence. However, heteroscedasticity affects the precision and reliability of
the coefficient estimates, because the usual formulas for their standard errors are no longer
valid.
What will the difference look like on scatter plots, when there is HS and when there isn't?
When examining scatter plots, the presence or absence of heteroscedasticity can manifest in
different ways. Here's how scatter plots may appear under heteroscedasticity and when
homoscedasticity is present:
1. Heteroscedasticity:
Scatter plots under heteroscedasticity exhibit a systematic change in the spread or
dispersion of the residuals across the range of predicted values. Here are some
characteristics of scatter plots indicating heteroscedasticity:
a. Fan-shaped pattern: The scatter plot may show a fan-shaped pattern, where the spread
of the residuals widens or narrows as the predicted values increase or decrease. This
suggests that the variability of the residuals is not constant across the range of predicted
values.
b. Increasing or decreasing spread: The spread of the residuals may systematically increase
or decrease as the predicted values increase or decrease. This indicates that the variability of
the residuals is related to the magnitude of the predicted values.
2. Homoscedasticity:
Scatter plots under homoscedasticity exhibit a consistent spread or dispersion of the
residuals across the range of predicted values. Here are some characteristics of scatter plots
indicating homoscedasticity:
a. Random scattering of residuals: The residuals are randomly scattered around the
horizontal line of zero residual, without any discernible pattern or trend. The spread of the
residuals remains relatively constant across different levels of predicted values.
b. Even balance of residuals: The residuals fall evenly above and below the zero residual
line, without any systematic variation. The spread of the residuals is consistent, suggesting
a constant variance of the errors.
By visually inspecting scatter plots, one can gain an initial indication of the presence of
heteroscedasticity. However, it is important to complement visual analysis with statistical
tests, such as the Breusch-Pagan test or White test, to confirm the presence of
heteroscedasticity and quantify its severity.
Sources of HS
1. Error learning models
2. Impact of income
3. Data collection techniques
4. Outliers
5. Distribution of the response variable / error
6. Model specification
1. Error learning models: Heteroscedasticity can arise from error learning models, where the
magnitude of the error term in the regression model depends on past errors or residuals.
For example, in financial markets, traders may adjust their strategies based on previous
errors, leading to changing volatility or dispersion of the residuals over time.
2. Impact of income: Heteroscedasticity can occur when the variability of the residuals is
influenced by income or wealth levels. For example, in economic studies, the income effect
may result in greater variability in the dependent variable as income increases, leading to
heteroscedasticity.
4. Outliers: The presence of outliers in the data can contribute to heteroscedasticity. Outliers
are extreme values that deviate significantly from the general pattern of the data. These
outliers may have a disproportionate impact on the variability of the residuals, leading to
heteroscedasticity.
5. Distribution of the response variable / error: Heteroscedasticity can arise due to the
inherent characteristics of the response variable or the error term. Certain distributions,
such as the exponential or Poisson distribution, exhibit increasing or decreasing variance
with the mean, resulting in heteroscedasticity. Similarly, when the error term is non-constant
across the range of the independent variables, it can lead to heteroscedasticity.
Consequences of HS
1. The usual estimator of Var(β̂₂) is biased
2. Inference might be misleading
Under heteroscedasticity, GLS (defined below), not OLS, yields BLUE estimates.
GLS, as the name suggests, generalizes the OLS estimation method by incorporating a weight
matrix that reflects the structure of heteroscedasticity. The weight matrix is typically
estimated using methods such as the Weighted Least Squares (WLS) approach or Maximum
Likelihood Estimation (MLE) based on assumptions about the variance structure. By applying
GLS, the estimates of the coefficients can be adjusted for the heteroscedasticity, resulting in
more accurate and efficient inference.
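In the standard textbook notation (a sketch, with Ω denoting the error covariance matrix), the GLS estimator generalizes the OLS formula as:

```latex
\hat{\beta}_{\mathrm{GLS}}
  = \left( X^{\top} \Omega^{-1} X \right)^{-1} X^{\top} \Omega^{-1} y,
\qquad
\Omega = \operatorname{diag}\left(\sigma_1^{2}, \dots, \sigma_n^{2}\right)
\text{ under pure heteroscedasticity.}
```

When Ω is diagonal, as here, GLS reduces to WLS with weights wᵢ = 1/σᵢ², which is why WLS is the usual practical route.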
Detection of HS
1. Informal methods
The nature of the problem
Graphical methods
A strategy on the way to the formal methods:
– Divide the data into 2 equal parts, based on the values of X.
– Calculate the mean of û² for each of the parts.
– Check for a clear difference between the 2 mean values.
– Meaning? (Interpreted in the discussion of informal methods below.)
– Similarly, divide the data into 3 approximately equal parts based on the values of X,
calculate the mean of û² for each part, and check for a clear difference among the 3 mean
values.
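The split-and-compare strategy above can be sketched as follows (simulated data; the variance pattern and all numbers are illustrative assumptions):

```python
import numpy as np

# Simulated heteroscedastic data: error sd grows with x (illustrative assumption)
rng = np.random.default_rng(2)
n = 500
x = np.linspace(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)

# Fit by OLS and compute the squared residuals
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ beta_hat) ** 2

# Divide the data into 2 equal parts based on the values of X,
# then compare the mean squared residual in each part
order = np.argsort(x)
low, high = order[: n // 2], order[n // 2:]
mean_low, mean_high = e2[low].mean(), e2[high].mean()
print(mean_high / mean_low)   # a ratio far from 1 suggests heteroscedasticity
```

The same idea extends to three parts by splitting `order` into thirds and comparing the three means.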
2. Formal methods
Park test
Formalises the graphical method: the residual variance σᵢ² is taken to be a function of the
explanatory variable Xᵢ. Park suggests modelling this as a secondary regression, commonly
written in the log form
ln(ûᵢ²) = α + β ln(Xᵢ) + vᵢ
Test β for significance. If it is significant, this suggests heteroscedasticity. Why?
1. Informal methods:
a. The nature of the problem: Heteroscedasticity is often associated with patterns in the
residuals, such as increasing or decreasing spread or fan-shaped patterns. Observing such
patterns in the residuals can provide an informal indication of heteroscedasticity.
b. Graphical methods: Plotting the residuals against the predicted values or the independent
variables can help visualize the presence of heteroscedasticity. If the spread of the residuals
systematically changes with the predicted values or exhibits a clear pattern, it suggests
heteroscedasticity.
c. Dividing the data into two or three equal parts based on the values of the independent
variable (X) and calculating the mean of the squared residuals (û²) for each part: If there is
a significant difference in the mean values of û² between the parts, it indicates
heteroscedasticity. This method aims to identify whether the spread of the residuals varies
across different ranges or levels of the independent variable.
2. Formal methods:
a. Park test: The Park test formalizes the graphical method by estimating a secondary
regression to test for the significance of the coefficient (β) of the independent variable (X).
The Park test involves the following steps:
i. Fit the initial regression model and obtain the residuals (û).
ii. Estimate a secondary regression by regressing û² (the squared residuals) on the
independent variable (X).
iii. Test the coefficient (β) of X in the secondary regression for statistical significance using a
t-test or other appropriate test.
iv. If the coefficient (β) of X in the secondary regression is statistically significant, it suggests
heteroscedasticity. The significance indicates that the variance of the residuals is related to
the values of the independent variable.
The Park test provides a formal way to assess heteroscedasticity by examining whether there
is a significant relationship between the squared residuals and the independent variable. If
there is a significant relationship, it indicates that the variability of the residuals is not
constant across different levels of the independent variable, suggesting the presence of
heteroscedasticity.
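Steps i-iv can be sketched using the common log-log form of the Park auxiliary regression (simulated data; the 1.96 cutoff is the usual 5% normal approximation, and the numbers are illustrative):

```python
import numpy as np

# Simulated heteroscedastic data (illustrative assumption: error sd proportional to x)
rng = np.random.default_rng(3)
n = 500
x = np.linspace(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)

# i. Fit the initial regression and obtain the residuals
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ b) ** 2

# ii. Secondary regression: ln(u^2) on ln(X)
Z = np.column_stack([np.ones(n), np.log(x)])
g, *_ = np.linalg.lstsq(Z, np.log(u2), rcond=None)

# iii. t-statistic for the slope coefficient beta
res = np.log(u2) - Z @ g
s2 = res @ res / (n - 2)
se_beta = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
t_beta = g[1] / se_beta

# iv. |t| beyond the 5% cutoff suggests heteroscedasticity
print(abs(t_beta) > 1.96)
```

Here the true error sd is proportional to x, so the log-variance slope is positive and the test should flag heteroscedasticity clearly.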
In summary, informal methods involve observing graphical patterns in the residuals, dividing
the data into parts based on the independent variable, and comparing the mean values of
squared residuals. Formal methods, such as the Park test, provide a statistical framework to
test for the significance of the relationship between the squared residuals and the
independent variable, providing a more rigorous assessment of heteroscedasticity.
Goldfeld-Quandt test
The Goldfeld-Quandt test is a commonly used formal test for heteroscedasticity. It is based
on the premise that if heteroscedasticity is present, the variance of the error term may differ
between two subgroups of the data based on a specific criterion, typically a partition of the
independent variable.
1. Partition the data: The first step of the Goldfeld-Quandt test involves dividing the data
into two subgroups based on a specific criterion. The most common criterion is the values of
the independent variable. The data is usually sorted in ascending or descending order based
on the independent variable, and in practice a block of central observations is often omitted
so that the two subgroups contrast the low- and high-variance regions more sharply.
2. Estimate two separate regressions: Next, two separate regressions are estimated, one for
each subgroup. The regression models are typically the same as the original regression
model, but they are estimated using only the data from the respective subgroup.
3. Calculate the test statistic: The Goldfeld-Quandt test statistic is computed from the ratio
of the residual sums of squares (RSS) of the two subgroup regressions, each divided by its
degrees of freedom:
GQ = (RSS2 / df2) / (RSS1 / df1)
where RSS2 and df2 come from the second (suspected higher-variance) subgroup regression,
and RSS1 and df1 from the first. When the two subgroups are the same size, this reduces to
the simple ratio RSS2 / RSS1.
4. Hypothesis testing: The test statistic is compared to a critical value from the F-distribution
at a chosen significance level. The critical value depends on the degrees of freedom
associated with the two regression models and the desired significance level. If the test
statistic exceeds the critical value, it suggests evidence of heteroscedasticity.
The Goldfeld-Quandt test provides a formal statistical test for the presence of
heteroscedasticity by comparing the variances of the error term between two subgroups of
the data. The test is widely used because it is relatively straightforward to implement and
interpret. However, it is important to note that the Goldfeld-Quandt test assumes that the
error terms are normally distributed and independent.
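The four steps can be sketched as follows (simulated data; dropping a central block of observations and the equal-size split are illustrative choices, not requirements):

```python
import numpy as np

# Simulated heteroscedastic data: error variance grows with x (illustrative assumption)
rng = np.random.default_rng(4)
n = 500
x = np.linspace(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones(n), x])

# 1. Partition: sort by x and drop a central block to sharpen the contrast
order = np.argsort(x)
c = n // 5                      # omit the middle fifth
m = (n - c) // 2                # observations per subgroup
low, high = order[:m], order[-m:]

# 2.-3. Fit each subgroup separately and compute its residual sum of squares
def rss(idx):
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    r = y[idx] - X[idx] @ b
    return r @ r

# 4. GQ statistic: ratio of RSS per degree of freedom (df = m - 2 here),
# to be compared with an F critical value at the chosen significance level
df = m - 2
gq = (rss(high) / df) / (rss(low) / df)
print(gq)   # well above 1 here, consistent with heteroscedasticity
```

Because the simulated error variance rises with x, the high-x subgroup has a much larger RSS per degree of freedom, so the statistic lands far above 1.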