Econ Ex1 Final
Heteroskedasticity occurs when the variance of the error terms in a regression model is not
constant across observations. This violates one of the key assumptions of Ordinary Least Squares
(OLS), known as homoskedasticity, where the variance of the errors is assumed to be constant.
Under the assumption of homoskedasticity (i.e., the error variance is constant), the usual
standard error for an OLS estimate $\hat{\beta}_j$ is given by:
$SE(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[(X'X)^{-1}\right]_{jj}}$
Where:
$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n-k}$
where $\hat{u}_i$ are the residuals (differences between the observed and predicted values
of $Y$), $n$ is the number of observations, and $k$ is the number of parameters in the model.
This formula assumes that the variance of the residuals (or errors) is the same for all
observations. When heteroskedasticity is present, this assumption is violated, and the usual
standard errors are no longer valid.
Key points:
Homoskedasticity assumes the error terms have equal variance across all values of the
independent variables.
Heteroskedasticity means error variances differ, which biases standard errors if
uncorrected.
Robust standard errors adjust for this by taking into account the varying residuals
across observations. They are still based on OLS estimates of the coefficients, but the
adjustment compensates for the non-constant variance.
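A minimal Stata sketch (using the WAGE1 wage regression that appears later in these notes) of how robust standard errors are requested:
use WAGE1, clear
* usual (homoskedasticity-based) standard errors
reg lwage educ exper tenure
* heteroskedasticity-robust standard errors: the coefficients are identical, only the SEs change
reg lwage educ exper tenure, vce(robust)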
Using a logarithmic transformation in regression models is common for addressing issues like
heteroskedasticity and improving the interpretation of coefficients. There are different ways in
which log transformations can be applied, and each has its implications for interpretation.
Similarities:
Both models rely on the same fundamental OLS assumptions (and, under those assumptions,
both yield unbiased estimates), but log-transformed models tend to perform better when
heteroskedasticity or skewness is an issue.
Both models provide estimates for the relationship between the independent and
dependent variables, but the interpretation of these relationships differs.
Differences:
1. Interpretation:
o In a usual linear model, coefficients represent the absolute change in the
dependent variable for a 1-unit change in the independent variable.
o In a log-linear or log-log model, coefficients represent percentage changes (log-
linear) or elasticities (log-log), which can be more meaningful in economics,
finance, and growth models.
2. Effect on Heteroskedasticity:
o A log transformation often reduces heteroskedasticity, improving model
performance and validity of standard errors without requiring robust standard
errors.
o In the usual linear model, robust standard errors are necessary if
heteroskedasticity is present to make valid inferences.
3. Non-linearity:
o Log transformations allow us to capture non-linear relationships between
variables (e.g., diminishing returns), while a usual linear model captures only
linear relationships.
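As an illustrative sketch of a log-log specification (hypothetical; it borrows the consumption and income variable names, cons and inc, from a later question and assumes a dataset containing them):
* generate logged versions of consumption and income
gen lcons = log(cons)
gen linc = log(inc)
* log-log model: the coefficient on linc is the elasticity of cons with respect to inc,
* i.e. a 1% increase in inc is associated with roughly a (coefficient)% increase in cons
reg lcons linc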
Key Takeaways:
Heteroskedasticity is a common issue that can distort inference in OLS, but robust
standard errors can correct for it.
Log transformations are useful for dealing with non-linearity, heteroskedasticity, and
skewness, and they change how we interpret the coefficients (as percentage changes or
elasticities).
The decision to use a log transformation or robust standard errors depends on the data
and the specific objectives of the analysis.
The t-test and F-test are fundamental statistical tools used in multiple
regression analysis for hypothesis testing and inference. Let's explore each
in detail:
The t-Test
The t-test is used to determine whether an individual regression coefficient is
statistically significant, i.e., whether it's significantly different from zero.
Key Components:
1. Null Hypothesis (H₀): Typically assumes that the population
parameter (β) is zero.
2. Alternative Hypothesis (H₁): States that β ≠ 0 (two-sided) or β > 0
or β < 0 (one-sided).
3. Test Statistic: t = (β̂ - β₀) / se(β̂), where β̂ is the estimated coefficient
and se(β̂) is its standard error.
Interpretation:
The t-statistic follows a t-distribution with (n-k-1) degrees of freedom,
where n is the sample size and k is the number of independent
variables.
A larger absolute t-value indicates stronger evidence against the null
hypothesis.
The p-value is the probability of observing a t-statistic as extreme as
the calculated value, assuming the null hypothesis is true.
Example:
In a wage regression model:
use WAGE1, clear
reg lwage educ exper tenure
Here, we would examine the t-statistics for each coefficient (educ, exper, tenure) to
determine their individual significance in explaining log wages.
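As a small sketch of the test-statistic formula above (assuming the regression has just been run), the t statistic for educ can be reproduced from Stata's stored results:
* t = (beta_hat - 0) / se(beta_hat) for H0: the coefficient on educ is zero
display _b[educ] / _se[educ]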
One-sided vs. Two-sided Tests:
Two-sided: Tests whether the coefficient is different from zero in
either direction.
One-sided: Tests whether the coefficient is greater than or less than
zero.
The choice depends on the research question and prior expectations about
the relationship between variables.
The F-Test
The F-test is used to test multiple linear restrictions simultaneously, often to
assess the joint significance of a group of variables.
Key Components:
1. Null Hypothesis (H₀): A set of linear restrictions on the coefficients.
2. Alternative Hypothesis (H₁): At least one of the restrictions in H₀ is
false.
3. Test Statistic: F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)], where
SSR_r and SSR_ur are the sum of squared residuals for the restricted
and unrestricted models, q is the number of restrictions, n is the
sample size, and k is the number of independent variables in the
unrestricted model.
Interpretation:
The F-statistic follows an F-distribution with (q, n-k-1) degrees of
freedom.
A larger F-value provides stronger evidence against the null
hypothesis.
The p-value is the probability of observing an F-statistic as large as the
calculated value, assuming the null hypothesis is true.
Example:
In a Major League Baseball players' salary model:
use MLB1, clear
reg lsalary years gamesyr bavg hrunsyr rbisyr
An F-test could be used to test whether batting average (bavg), home runs (hrunsyr), and
runs batted in (rbisyr) jointly affect salary.
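A minimal sketch of how this joint test could be carried out (assuming the MLB1 regression above has just been run); the second block reproduces the F statistic from the restricted/unrestricted SSR formula given earlier:
* H0: the coefficients on bavg, hrunsyr, and rbisyr are all zero (q = 3 restrictions)
test bavg hrunsyr rbisyr
* manual version using the SSR form of the F statistic
scalar ssr_ur = e(rss)
scalar df_ur = e(df_r)
reg lsalary years gamesyr
scalar ssr_r = e(rss)
display ((ssr_r - ssr_ur)/3) / (ssr_ur/df_ur)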
Applications:
1. Exclusion Restrictions: Testing whether a group of variables can be
excluded from the model.
2. Comparing Nested Models: Determining if additional variables
significantly improve the model's explanatory power.
3. Testing for Structural Breaks: Assessing if coefficients change
across different subsamples.
Relationship between F and t Statistics
For testing a single restriction, the F-statistic is equal to the square of the t-
statistic. This relationship highlights the connection between these two
fundamental tests in regression analysis. Understanding and applying t-tests
and F-tests are crucial for conducting rigorous statistical inference in multiple
regression analysis, allowing researchers to draw meaningful conclusions
about the relationships between variables and the overall fit of their models.
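A quick sketch of this relationship (assuming the WAGE1 regression from the earlier example): testing the single restriction that the coefficient on educ is zero gives an F statistic equal to the squared t statistic.
use WAGE1, clear
reg lwage educ exper tenure
* F-test of the single restriction: coefficient on educ = 0
test educ
* the squared t statistic equals the F statistic reported above
display (_b[educ] / _se[educ])^2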
SLR Assumptions
1. Linearity: The relationship between X and Y is linear in parameters.
2. Random Sampling: The sample is randomly drawn from the
population.
3. Sample Variation in X: The sample outcomes of X are not all the
same value.
4. Zero Conditional Mean: $E(u|X) = 0$, meaning the error term has an
expected value of zero given any value of X.
5. Homoskedasticity: $Var(u|X) = \sigma^2$, meaning the error term
has constant variance given any value of X.
MLR Assumptions
The Multiple Linear Regression model extends SLR to include multiple
independent variables: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + ... + \beta_k X_{ki} + u_i$.
MLR assumptions include all SLR assumptions, plus:
6. No Perfect Collinearity: None of the independent variables is a
perfect linear combination of the others.
7. Normality: The error term u is normally distributed (for exact
statistical inference).
If the error term is not normally distributed in Ordinary Least Squares (OLS) regression, it has
several implications:
1. Efficiency:
- While OLS remains BLUE, it may not be the most efficient estimator when errors are not
normal. Other estimation methods might perform better in such cases.
2. Robustness:
- Non-normal errors, especially those with heavy tails, can make OLS estimates less robust to
outliers.
3. Alternative Approaches:
- If non-normality is a concern, especially in small samples, alternative methods may be more
appropriate (see the sketch after this list), such as:
- Robust regression techniques
- Bootstrapping for inference
- Generalized Linear Models (GLMs)
- Quantile regression
4. Diagnostic Importance:
- While not critical for unbiasedness or consistency, checking for normality remains important
as part of a comprehensive model diagnostic process.
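One hedged example of these alternatives, bootstrapping for inference, re-estimates the earlier wage regression with bootstrap rather than formula-based standard errors (a sketch, assuming WAGE1 as before):
use WAGE1, clear
* bootstrap standard errors with 500 replications
reg lwage educ exper tenure, vce(bootstrap, reps(500))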
In summary, while non-normal errors don't invalidate OLS estimates or their basic properties,
they can affect the reliability of statistical inference, especially in small samples. It's important
to consider the nature and extent of non-normality and potentially use alternative methods if
the violation is severe or the sample size is small.
Which of the following can cause the usual OLS t statistics to be invalid (that is, not to
have t distributions under H0)?
(i) Heteroskedasticity.
(ii) A sample correlation coefficient of .95 between two independent variables that are in
the model.
(iii) Omitting an important explanatory variable.
**Answer:** (i) and (iii). Heteroskedasticity and an omitted important explanatory variable both cause the usual t statistics not to have t distributions under H0; a high sample correlation between two included regressors does not.
**Answer:** C. A restricted model will always have fewer parameters than its unrestricted model.
**Answer:** C. Taking the logarithmic form of variables makes the slope coefficients more responsive to rescaling.
- **A.** There is a perfect collinearity problem in the following regression equation: $$cons = \beta_0 + \beta_1 inc + \beta_2 inc^2 + u$$
- **B.** There is a perfect collinearity problem in the following regression equation: $$\log(cons) = \beta_0 + \beta_1 \log(inc) + \beta_2 \log(inc^2) + u$$
- **C.** No perfect collinearity does not allow the independent variables to be correlated.
- **D.** In multiple regression analysis, we can have exact linear relationships among the independent variables.
**Answer:** B. There is a perfect collinearity problem in $$\log(cons) = \beta_0 + \beta_1 \log(inc) + \beta_2 \log(inc^2) + u$$ because $\log(inc^2) = 2\log(inc)$ is an exact linear function of $\log(inc)$. In contrast, $inc^2$ is not a linear function of $inc$, so the equation in A involves no perfect collinearity.
These answers provide insights into hypothesis testing, transformations, collinearity issues, and
effects of changing units in regression models.
Question 3
In the following equation, gdp refers to gross domestic product, and FDI refers to foreign
direct investment: $\log(gdp) = 2.65 + 0.527\log(bankcredit) + 0.222\,FDI$. Which of the
following statements is then true?
A. If FDI increases by 1, gdp increases by approximately 0.222%, the
amount of bank credit remaining constant.
B. If FDI increases by 1%, gdp increases by approximately 22.2%, the
amount of bank credit remaining constant.
C. If FDI increases by 1, gdp increases by approximately 22.2%, the
amount of bank credit remaining constant.
D. If FDI increases by 1%, gdp increases by approximately
log(0.222)%, the amount of bank credit remaining constant.
Answer: C. If FDI increases by 1, gdp increases by approximately 22.2%, the
amount of bank credit remaining constant. (Because gdp enters in logs and FDI in levels, a
one-unit increase in FDI raises log(gdp) by 0.222, i.e., gdp rises by roughly 100 × 0.222 = 22.2%.)
Question 4
Which of the following can cause OLS estimators to be biased?
A. Heteroskedasticity.
B. Omitting an important explanatory variable.
C. A sample correlation coefficient of .95 between two independent
variables both included in the model.
D. All of the above.
Answer: B. Omitting an important explanatory variable.
Question 5
Which of the following will NOT cause the usual OLS t statistics to
be invalid ("invalid" means not to have t distributions under H₀)?
i. Heteroskedasticity.
ii. Omitting an important explanatory variable.
iii. A sample correlation coefficient of .95 between two independent
variables both included in the model.
iv. None of the above.
Answer: iii. A sample correlation coefficient of .95 between two independent variables both included in the model. (High but imperfect correlation between regressors does not stop the t statistics from having t distributions under H₀, whereas heteroskedasticity and an omitted important variable do.)
Question 6
Female is a binary variable taking on the value one for females and
the value zero for males. Our regression model is
$\log(wage) = (\beta_0 + \delta_0 female) + (\beta_1 + \delta_1 female)educ + u$.
What does $\delta_0 < 0; \delta_1 < 0$ mean?
A. Women earn less than men at low levels of education, but the gap
narrows as education increases.
B. Women earn less than men at low levels of education, and the gap
increases as education increases.
C. Women earn more than men at low levels of education, but the gap
narrows as education increases.
D. Women earn more than men at low levels of education, and the gap
increases as education increases.
Answer: B. Women earn less than men at low levels of education, and the
gap increases as education increases.
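A minimal sketch of how this model could be estimated (assuming the WAGE1 dataset used earlier also contains the female dummy; femeduc is just an illustrative name for the interaction term):
use WAGE1, clear
* interaction between the female dummy and education
gen femeduc = female*educ
reg lwage female educ femeduc
* the coefficient on female estimates delta_0; the coefficient on femeduc estimates delta_1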
Image 3
Question: Scatterplot Interpretation
Question: Consider a scatterplot showing a negative linear
relationship between X and Y; what are $\hat{\beta}$ and $R^2$?
o A. $\hat{\beta} = 1$; $R^2 = 1$
o B. $\hat{\beta} = -1$; $R^2 = 1$
o C. $\hat{\beta} = -1$; $R^2 = -1$
Answer: B. $\hat{\beta} = -1$; $R^2 = 1$
Image 4
Question: Time Series Assumptions
Question: Which statement is correct regarding time series data?
o A. Like cross-sectional observations, we can assume most time
series observations are independently distributed.
o B. The OLS estimator in a time series regression is biased under
the first three Gauss-Markov assumptions.
o C. A trending variable cannot be used as the dependent variable
in multiple regression analysis.
Answer: C. A trending variable cannot be used as the dependent
variable in multiple regression analysis.
These answers provide insights into understanding regression models,
interpreting coefficients, and recognizing potential biases in statistical
analysis.
These notes compare the estimate of β1 from a simple regression of y on x1 alone (β̃1) with the estimate from a multiple regression of y on x1, x2, and x3 (β̂1):
1. If x1 is highly correlated with x2 and x3 (and x2 and x3 have large partial effects on y),
the omitted-variable bias is large, so the two estimates are likely to be very different.
2. If x1 is only weakly correlated with x2 and x3, the bias is small, so the two estimates
are likely to be similar.
3. When x1 is highly correlated with x2 and x3, multicollinearity makes Var(β̂1) >> Var(β̃1),
so se(β̂1) >> se(β̃1).
4. If R1² (the R² from regressing x1 on x2 and x3) is close to zero while x2 and x3 have large
partial effects on y, including them sharply reduces the SSR, so se(β̂1) << se(β̃1).
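A hypothetical sketch of how these comparisons could be checked in Stata (y, x1, x2, and x3 are placeholder names, not variables from any dataset used above):
* simple regression of y on x1 alone
reg y x1
display "simple:   b1 = " _b[x1] "   se = " _se[x1]
* multiple regression adding x2 and x3
reg y x1 x2 x3
display "multiple: b1 = " _b[x1] "   se = " _se[x1]
* comparing the coefficients shows the omitted-variable bias; comparing the SEs shows the variance trade-off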