HS Breakdown

The document discusses heteroscedasticity, which refers to unequal variance of the residuals across the values of the predictor variables in a regression model, violating the assumption of homoscedasticity. Heteroscedasticity leaves coefficient estimates unbiased but distorts their standard errors, rendering hypothesis tests unreliable. It can be detected graphically using residual plots or statistically using tests such as the Breusch-Pagan test. Weighted least squares and robust standard errors help address heteroscedasticity.


Heteroscedasticity refers to a violation of the assumption of homoscedasticity in regression analysis, where the variance of the errors or residuals is not constant across all levels of the independent variables. Instead, the variance may exhibit a systematic pattern or change with the values of the independent variables.

1. Definition: Heteroscedasticity occurs when the spread or dispersion of the residuals is not
consistent across the range of predicted values. This violates the assumption of
homoscedasticity, which assumes that the variance of the error term is constant.

2. Consequences: Heteroscedasticity can have several implications for regression analysis:

a. Biased standard errors: When heteroscedasticity is present, the estimated standard errors of the coefficient estimates can be biased. This affects the reliability of hypothesis tests and confidence intervals associated with the coefficients, potentially leading to incorrect inferences about the significance of the variables.

b. Inefficient coefficient estimates: Heteroscedasticity can lead to inefficiency in the estimation of the coefficients. In particular, the ordinary least squares (OLS) estimators may no longer be the Best Linear Unbiased Estimators (BLUE), meaning they may not be the most efficient estimators available.

c. Invalid hypothesis tests: Heteroscedasticity can lead to invalid hypothesis tests, such as t-tests or F-tests, as the assumption of constant variance required for these tests is violated. As a result, the p-values associated with these tests may be inaccurate, leading to incorrect conclusions.

d. Inaccurate confidence intervals: Heteroscedasticity affects the construction of confidence intervals for the coefficients, making them wider or narrower than they should be. This can result in overconfidence or underconfidence in the precision of the coefficient estimates.

3. Detection: There are several graphical and statistical methods to detect heteroscedasticity:

a. Residual plots: Plotting the residuals against the predicted values or against the
independent variables can reveal patterns or trends. If the spread of the residuals varies
systematically with the predicted values or exhibits a specific pattern, it indicates
heteroscedasticity.
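
As an illustration, a residual-versus-fitted plot can be produced as follows. This is a minimal sketch in Python using statsmodels and matplotlib, with simulated (hypothetical) data whose error spread grows with x:

```python
# Minimal sketch: residual-vs-fitted plot on simulated heteroscedastic data.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x, 200)   # error spread grows with x (heteroscedastic)

X = sm.add_constant(x)                  # add intercept column
results = sm.OLS(y, X).fit()

plt.scatter(results.fittedvalues, results.resid, s=10)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs fitted: widening spread suggests heteroscedasticity")
plt.show()
```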

b. Breusch-Pagan test: The Breusch-Pagan test is a statistical test for heteroscedasticity in a regression model. It regresses the squared residuals on a set of explanatory variables in an auxiliary regression and tests whether those variables jointly explain the squared residuals (the LM statistic is n times the R² of the auxiliary regression). If the test statistic is significant, it suggests the presence of heteroscedasticity.
c. White test: The White test is another statistical test for heteroscedasticity. In an auxiliary regression, it regresses the squared residuals on the independent variables, their squares, and their cross products. If the test statistic is significant, it indicates heteroscedasticity.
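
A minimal sketch of how both tests might be run with statsmodels' het_breuschpagan and het_white functions; the data here are simulated and purely illustrative:

```python
# Sketch: Breusch-Pagan and White tests via statsmodels on simulated data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x, 200)       # error variance grows with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Breusch-Pagan: auxiliary regression of squared residuals on the regressors.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")

# White: auxiliary regression also includes squares (and cross products).
w_stat, w_pvalue, wf_stat, wf_pvalue = het_white(resid, X)
print(f"White LM p-value: {w_pvalue:.4f}")
```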

4. Remedial measures: When heteroscedasticity is detected, there are several approaches to address the issue:

a. Weighted least squares (WLS): WLS is a technique that accounts for heteroscedasticity by assigning weights to observations based on their estimated error variances. Observations with higher variances receive lower weights, reducing the influence of noisy, high-variance observations on the regression estimates.
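
A minimal WLS sketch, under the illustrative assumption that Var(uᵢ) is proportional to xᵢ², so the appropriate weights are 1/xᵢ²:

```python
# Sketch: weighted least squares when Var(u_i) is assumed proportional to x_i**2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x, 200)   # error sd proportional to x

X = sm.add_constant(x)
# If Var(u_i) = sigma^2 * x_i^2, the weights are 1 / x_i^2.
wls_results = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls_results.summary())
```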

b. Transformations: Transforming the dependent variable or the independent variables can sometimes help mitigate heteroscedasticity. Examples include taking logarithms, square roots, or inverse transformations.
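
For instance, a log transformation of the dependent variable often stabilizes a variance that grows with the mean. A minimal sketch on simulated data with multiplicative errors:

```python
# Sketch: log-transforming y to stabilize variance (assumes y > 0; data simulated).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 200)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.2, 200))  # multiplicative errors

X = sm.add_constant(x)
log_results = sm.OLS(np.log(y), X).fit()   # errors are additive on the log scale
print(log_results.params)
```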

c. Heteroscedasticity-consistent standard errors: Robust standard errors, such as White's heteroscedasticity-consistent standard errors or Huber-White standard errors, provide consistent estimates of the standard errors in the presence of heteroscedasticity. These robust standard errors can be used to obtain reliable hypothesis tests and confidence intervals.
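
A minimal sketch of requesting heteroscedasticity-consistent standard errors in statsmodels via the cov_type argument (the data are simulated):

```python
# Sketch: OLS with heteroscedasticity-consistent (White/HC) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x, 200)

X = sm.add_constant(x)
robust_results = sm.OLS(y, X).fit(cov_type="HC1")  # HC0-HC3 variants available
print(robust_results.bse)   # robust standard errors
```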

Var(uᵢ) = σᵢ² indicates that the residual variance is not constant.


How does this influence the traditional OLS results?

When the residual variance is not constant, as indicated by Var(uᵢ) = σᵢ², it implies the presence of heteroscedasticity. This has implications for the traditional ordinary least squares (OLS) results in regression analysis. Here's how heteroscedasticity influences the OLS results:

1. Coefficient estimates remain unbiased: Heteroscedasticity does not bias the point estimates of the regression coefficients. The OLS estimator remains unbiased in the presence of heteroscedasticity. However, heteroscedasticity affects the precision and reliability of the coefficient estimates.

2. Inefficient estimation and unreliable standard errors: Heteroscedasticity violates the assumption of homoscedasticity, which requires constant error variance. Consequently, OLS is no longer the most efficient linear unbiased estimator, and the conventional OLS formulas for the standard errors of the coefficient estimates are no longer valid. As a result, hypothesis tests based on these standard errors may lead to incorrect conclusions regarding the statistical significance of the coefficients.

3. Incorrect inference: The violation of homoscedasticity undermines the validity of hypothesis tests and confidence intervals based on the assumption of constant error variance. The traditional t-tests, F-tests, and confidence intervals may provide inaccurate p-values and interval estimates, potentially leading to incorrect inferences.
4. Distorted standard errors: Under heteroscedasticity, the conventional OLS standard errors can be biased upward or downward, depending on how the error variance relates to the regressors. Where the standard errors are inflated, variables may appear statistically insignificant because of the exaggerated uncertainty attached to their estimates; where they are deflated, significance may be overstated.

5. Invalid hypothesis tests: Heteroscedasticity can render hypothesis tests invalid, particularly those that assume constant error variance. This includes t-tests for individual coefficients and F-tests for overall model significance. The calculated p-values based on the assumption of homoscedasticity may not accurately reflect the true significance levels.

To address the issues associated with heteroscedasticity, alternative estimation techniques and statistical tests that account for heteroscedasticity, such as weighted least squares (WLS), robust standard errors, or tests like the White test, can be employed. These techniques provide more reliable estimates of the coefficient standard errors and improve the accuracy of hypothesis tests and confidence intervals.

What will the difference look like on scatter plots, when there is HS and when there isn't?

When examining scatter plots, the presence or absence of heteroscedasticity can manifest in
different ways. Here's how scatter plots may appear under heteroscedasticity and when
homoscedasticity is present:

1. Heteroscedasticity:
Scatter plots under heteroscedasticity exhibit a systematic change in the spread or
dispersion of the residuals across the range of predicted values. Here are some
characteristics of scatter plots indicating heteroscedasticity:

a. Fan-shaped pattern: The scatter plot may show a fan-shaped pattern, where the spread
of the residuals widens or narrows as the predicted values increase or decrease. This
suggests that the variability of the residuals is not constant across the range of predicted
values.

b. Increasing or decreasing spread: The spread of the residuals may systematically increase
or decrease as the predicted values increase or decrease. This indicates that the variability of
the residuals is related to the magnitude of the predicted values.

c. Outliers with high residuals: Heteroscedasticity can be indicated by the presence of outliers with high residuals in specific regions of the scatter plot. These outliers may have larger residuals due to the increased variability associated with the heteroscedasticity.

2. Homoscedasticity:
Scatter plots under homoscedasticity exhibit a consistent spread or dispersion of the
residuals across the range of predicted values. Here are some characteristics of scatter plots
indicating homoscedasticity:
a. Random scattering of residuals: The residuals are randomly scattered around the
horizontal line of zero residual, without any discernible pattern or trend. The spread of the
residuals remains relatively constant across different levels of predicted values.

b. Uniform distribution of residuals: The residuals are uniformly distributed above and
below the zero residual line, without any systematic variation. The spread of the residuals is
consistent, suggesting a constant variance of the errors.

c. No specific pattern or shape: In contrast to the fan-shaped or systematically changing patterns seen in heteroscedasticity, scatter plots under homoscedasticity display no specific pattern or shape related to the predicted values.

By visually inspecting scatter plots, one can gain an initial indication of the presence of
heteroscedasticity. However, it is important to complement visual analysis with statistical
tests, such as the Breusch-Pagan test or White test, to confirm the presence of
heteroscedasticity and quantify its severity.
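
A minimal simulation sketch that produces both kinds of residual plots side by side, so the fan shape under heteroscedasticity can be compared with the constant band under homoscedasticity (all data simulated):

```python
# Sketch: side-by-side residual plots under homo- and heteroscedasticity.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 300)
X = sm.add_constant(x)
y_homo = 2 + 3 * x + rng.normal(0, 2, 300)      # constant error spread
y_hetero = 2 + 3 * x + rng.normal(0, x, 300)    # spread grows with x

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, y, title in zip(axes, (y_homo, y_hetero),
                        ("Homoscedastic", "Heteroscedastic")):
    res = sm.OLS(y, X).fit()
    ax.scatter(res.fittedvalues, res.resid, s=8)
    ax.axhline(0, color="red", linewidth=1)
    ax.set_title(title)
    ax.set_xlabel("Fitted values")
axes[0].set_ylabel("Residuals")
plt.show()
```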

Sources of HS
1. Error learning models
2. Impact of income
3. Data collection techniques
4. Outliers
5. Distribution of the response variable / error
6. Model specification

1. Error learning models: Heteroscedasticity can arise from error learning models, where the
magnitude of the error term in the regression model depends on past errors or residuals.
For example, in financial markets, traders may adjust their strategies based on previous
errors, leading to changing volatility or dispersion of the residuals over time.

2. Impact of income: Heteroscedasticity can occur when the variability of the residuals is
influenced by income or wealth levels. For example, in economic studies, the income effect
may result in greater variability in the dependent variable as income increases, leading to
heteroscedasticity.

3. Data collection techniques: Heteroscedasticity can be introduced by the data collection techniques used. For instance, surveys or measurements with varying levels of accuracy or precision can lead to differences in the variability of the observed data, resulting in heteroscedasticity.

4. Outliers: The presence of outliers in the data can contribute to heteroscedasticity. Outliers
are extreme values that deviate significantly from the general pattern of the data. These
outliers may have a disproportionate impact on the variability of the residuals, leading to
heteroscedasticity.

5. Distribution of the response variable / error: Heteroscedasticity can arise from the inherent characteristics of the response variable or the error term. Certain distributions, such as the exponential or Poisson distribution, exhibit variance that changes with the mean, resulting in heteroscedasticity. Similarly, when the variance of the error term is not constant across the range of the independent variables, heteroscedasticity follows by definition.

6. Model specification: Heteroscedasticity can also be caused by misspecification of the regression model. If important variables or functional forms are omitted, their effects are absorbed into the error term, which can make the residual variance change systematically with the regressors and thus appear heteroscedastic.

It is important to identify the specific sources of heteroscedasticity in order to appropriately address them. By understanding the factors contributing to heteroscedasticity, researchers can choose appropriate remedial measures or modeling techniques to account for or mitigate its effects.

Consequences of HS
1. Var(β̂₂) is biased
2. Inference might be misleading
GLS (still to be defined), and not OLS, yields BLUE estimates.

When heteroscedasticity is present in a regression model, it has several consequences that can impact the validity of the ordinary least squares (OLS) estimates and the associated inference. Here are the main consequences of heteroscedasticity:

1. Biased estimates of the coefficient variances, Var(β̂): Heteroscedasticity can lead to biased estimates of the variances of the coefficient estimates. The OLS formulas for the coefficient variances assume homoscedasticity, so in the presence of heteroscedasticity these estimates are biased. This means that the estimated variances of the coefficients may not accurately reflect the true variability of the coefficient estimates.

2. Misleading inference: Heteroscedasticity can result in misleading inference, leading to incorrect conclusions about the statistical significance of the coefficients. The t-tests, F-tests, and confidence intervals based on the assumption of homoscedasticity may provide inaccurate p-values and confidence intervals. This can lead to incorrect interpretations of the statistical significance of the variables and potentially incorrect decisions based on the results.

3. Generalized Least Squares (GLS) as a remedy: In the presence of heteroscedasticity, Generalized Least Squares (GLS) is a preferred estimation technique compared to OLS. GLS allows for the estimation of the model parameters while accounting for heteroscedasticity by incorporating information about the variance structure of the errors. GLS estimates are the Best Linear Unbiased Estimators (BLUE) under heteroscedasticity, unlike OLS estimates. By using GLS, efficient estimates of the coefficients and unbiased estimates of their variances can be obtained, addressing the problems associated with heteroscedasticity.

GLS, as the name suggests, generalizes the OLS estimation method by incorporating a weight
matrix that reflects the structure of heteroscedasticity. The weight matrix is typically
estimated using methods such as the Weighted Least Squares (WLS) approach or Maximum
Likelihood Estimation (MLE) based on assumptions about the variance structure. By applying
GLS, the estimates of the coefficients can be adjusted for the heteroscedasticity, resulting in
more accurate and efficient inference.
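
A minimal feasible-GLS sketch: fit OLS, model the log squared residuals as a function of the regressors to estimate the variance structure, then reweight. The variance model chosen here is an illustrative assumption, not a general prescription:

```python
# Sketch of feasible GLS: estimate a variance function, then apply WLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x, 200)
X = sm.add_constant(x)

ols_resid = sm.OLS(y, X).fit().resid

# Step 1: model the log squared residuals as a function of the regressors
# (an assumed variance specification, used here only for illustration).
aux = sm.OLS(np.log(ols_resid**2), X).fit()
sigma2_hat = np.exp(aux.fittedvalues)          # estimated Var(u_i), up to scale

# Step 2: reweight by the inverse estimated variances (FGLS via WLS).
fgls_results = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()
print(fgls_results.params)
```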

In summary, heteroscedasticity introduces bias in the estimation of the coefficient variances, leading to misleading inference based on traditional OLS techniques. GLS estimation provides a remedy by accounting for heteroscedasticity and yielding unbiased and efficient estimates of the coefficients, making it the preferred approach when heteroscedasticity is present in the data.

Detection of HS
1. Informal methods
- The nature of the problem
- Graphical methods
- A strategy paving the way to formal methods:
– Divide the data into 2 equal parts, based on the values of X.
– Calculate the mean of û² for each of the parts.
– Is there a clear difference between the 2 mean values? What would that mean?
– Divide the data into 3 approximately equal parts, based on the values of X.
– Calculate the mean of û² for each of the parts.
– Is there a clear difference between the 3 mean values? What would that mean?
2. Formal methods
- Park test: formalises the graphical method. The residual variance σᵢ² is assumed to be a function of the explanatory variable Xᵢ. Park suggests modelling (as a secondary regression)
ln(ûᵢ²) = α + β ln(Xᵢ) + vᵢ
and testing β for significance. If significant, it suggests heteroscedasticity. Why?

1. Informal methods:
a. The nature of the problem: Heteroscedasticity is often associated with patterns in the
residuals, such as increasing or decreasing spread or fan-shaped patterns. Observing such
patterns in the residuals can provide an informal indication of heteroscedasticity.

b. Graphical methods: Plotting the residuals against the predicted values or the independent
variables can help visualize the presence of heteroscedasticity. If the spread of the residuals
systematically changes with the predicted values or exhibits a clear pattern, it suggests
heteroscedasticity.

c. Dividing the data into two or three roughly equal parts based on the values of the independent variable (X) and calculating the mean of the squared residuals (û²) for each part: If there is a clear difference in the mean values of û² between the parts, it indicates heteroscedasticity. This method aims to identify whether the spread of the residuals varies across different ranges or levels of the independent variable.
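
A minimal sketch of this split-and-compare method on simulated data:

```python
# Sketch of the informal split test: compare mean squared residuals
# across halves of the data defined by the values of X.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x, 200)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
order = np.argsort(x)                        # sort observations by X
halves = np.array_split(resid[order]**2, 2)  # split squared residuals in two
print([h.mean() for h in halves])            # a clear difference hints at HS
```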

2. Formal methods:
a. Park test: The Park test formalizes the graphical method by estimating a secondary regression and testing the significance of its slope coefficient (β). The Park test involves the following steps:
i. Fit the initial regression model and obtain the residuals (û).
ii. Estimate a secondary regression by regressing ln(û²), the log of the squared residuals, on ln(X), the log of the independent variable.
iii. Test the coefficient (β) of ln(X) in the secondary regression for statistical significance using a t-test.
iv. If the coefficient (β) is statistically significant, it suggests heteroscedasticity. The significance indicates that the variance of the residuals is related to the values of the independent variable.

The Park test provides a formal way to assess heteroscedasticity by examining whether there is a significant relationship between the (log) squared residuals and the independent variable. If there is a significant relationship, it indicates that the variability of the residuals is not constant across different levels of the independent variable, suggesting the presence of heteroscedasticity.
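
A minimal sketch of the Park test on simulated data:

```python
# Sketch of the Park test: regress ln(u_hat^2) on ln(X), t-test the slope.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x, 200)

u_hat = sm.OLS(y, sm.add_constant(x)).fit().resid

park = sm.OLS(np.log(u_hat**2), sm.add_constant(np.log(x))).fit()
print(f"beta = {park.params[1]:.3f}, p-value = {park.pvalues[1]:.4f}")
# A significant beta suggests Var(u_i) depends on X, i.e. heteroscedasticity.
```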

In summary, informal methods involve observing graphical patterns in the residuals, dividing
the data into parts based on the independent variable, and comparing the mean values of
squared residuals. Formal methods, such as the Park test, provide a statistical framework to
test for the significance of the relationship between the squared residuals and the
independent variable, providing a more rigorous assessment of heteroscedasticity.

Goldfeld-Quandt test

The Goldfeld-Quandt test is a commonly used formal test for heteroscedasticity. It is based
on the premise that if heteroscedasticity is present, the variance of the error term may differ
between two subgroups of the data based on a specific criterion, typically a partition of the
independent variable.

Here's an overview of the Goldfeld-Quandt test:

1. Partition the data: The first step of the Goldfeld-Quandt test involves dividing the data into two subgroups based on a specific criterion, most commonly the values of the independent variable. The data is sorted in ascending or descending order based on the independent variable, and a number of central observations is often omitted to sharpen the contrast between the two subgroups.

2. Estimate two separate regressions: Next, two separate regressions are estimated, one for
each subgroup. The regression models are typically the same as the original regression
model, but they are estimated using only the data from the respective subgroup.

3. Calculate the test statistic: The Goldfeld-Quandt test statistic is computed as the ratio of the residual sum of squares (RSS) from the second subgroup regression to the RSS from the first subgroup regression:

GQ = RSS2 / RSS1

where RSS2 is the residual sum of squares from the subgroup expected to have the larger error variance (conventionally the higher values of the independent variable), and RSS1 is that from the other subgroup. When the subgroups have different sizes, each RSS is divided by its degrees of freedom.

4. Hypothesis testing: The test statistic is compared to a critical value from the F-distribution
at a chosen significance level. The critical value depends on the degrees of freedom
associated with the two regression models and the desired significance level. If the test
statistic exceeds the critical value, it suggests evidence of heteroscedasticity.

- If GQ > F-critical value, reject the null hypothesis of homoscedasticity.
- If GQ ≤ F-critical value, fail to reject the null hypothesis of homoscedasticity.

The Goldfeld-Quandt test provides a formal statistical test for the presence of
heteroscedasticity by comparing the variances of the error term between two subgroups of
the data. The test is widely used because it is relatively straightforward to implement and
interpret. However, it is important to note that the Goldfeld-Quandt test assumes that the
error terms are normally distributed and independent.
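
A minimal sketch using statsmodels' het_goldfeldquandt, where the illustrative drop fraction omits the central 20% of observations:

```python
# Sketch: Goldfeld-Quandt test via statsmodels (simulated data, sorted by x).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(1, 10, 200))        # sort by the independent variable
y = 2 + 3 * x + rng.normal(0, x, 200)
X = sm.add_constant(x)

# drop=0.2 omits the central 20% of observations to sharpen the contrast.
f_stat, p_value, ordering = het_goldfeldquandt(y, X, drop=0.2)
print(f"GQ F-statistic = {f_stat:.3f}, p-value = {p_value:.4f}")
```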

In conclusion, the Goldfeld-Quandt test is a frequently employed test for heteroscedasticity, providing a formal statistical approach to examine whether the variance of the error term differs between two subgroups of the data based on a specific criterion, such as the values of the independent variable.
