
Chapter Three

Multiple Linear Regression Analysis:
Estimation and Hypothesis Testing
3.1 Introduction to Multiple Regression Analysis

In the foregoing chapter we considered the simple
regression model, where the dependent variable is
related to one explanatory variable.
In practice the situation is often more involved, in the
sense that there exists more than one variable that
influences the dependent variable.
•The MLRM in its general form may be written as:
Yi =β1+β2X2i +β3X3i + ... +βkXki +ui .....................................Eq 3.1
•The subscript i denotes the ith observation.
Cont…
•For ease of exposition we will write Eq. (3.1) as:
Yi = βX +ui ............................................................................Eq 3.2
•Where βX is a short form for β1+β2X2i +β3X3i + ... +βkXki
•Equation (3.1), or its short form (3.2), is known as the
population or true model. It consists of two components:
(1) a deterministic component, βX, and
(2) a nonsystematic, or random component, Ui.
As shown below, βX can be interpreted as the conditional mean
of Yi, E(Yi | X), conditional upon the given X values.
Therefore, Eq. (3.2) states that an individual Yi value is equal to
the mean value of the population of which he or she is a member,
plus or minus a random term.
Cont…
•For example, suppose Y represents family expenditure on food and X
represents family income. Eq. (3.2) then states that the food
expenditure of an individual family is equal to the mean food
expenditure of all the families with the same level of income,
plus or minus a random component that may vary from
individual to individual and that may depend on several factors.
•In regression analysis our primary objective is to explain the
mean, or average, behavior of Y in relation to the regressors,
that is, how mean Y responds to changes in the values of the X
variables.
Cont…
•It should be emphasized that the causal relationship between Y and the
Xs, if any, should be based on the relevant theory.
•Each slope coefficient measures the (partial) rate of change in the mean
value of Y for a unit change in the value of a regressor, holding the
values of all other regressors constant, hence the adjective partial. How
many regressors are included in the model depends on the nature of the
problem and will vary from problem to problem.

•The error term ui is a catchall for all those variables that cannot be
introduced in the model for a variety of reasons. However, the average
influence of these variables on the regressand is assumed to be
negligible.
Cont…
 The nature of the Y variable
•It is generally assumed that Y is a random variable. It can be measured on four
different scales:
1. Ratio scale,
2. Interval scale,
3. Ordinal scale, and
4. Nominal scale.
 Ratio scale: A ratio scale variable has three properties:
(1) ratio of two variables,
(2) distance between two variables, and
(3) ordering of variables.
 On a ratio scale, if Y takes two values, Y1 and Y2, the ratio (Y2/Y1) and the distance
(Y2 − Y1) are meaningful quantities, as are comparisons or orderings such as Y2 ≤
Y1 or Y2 ≥ Y1. Most economic variables belong to this category. Thus we can talk
about whether GDP is greater this year than the last year, or whether the ratio of
GDP this year to the GDP last year is greater than or less than one.
Cont…
 Interval scale: Interval scale variables do not satisfy the first
property of ratio scale variables. For example, the distance
between two time periods, say, 2007 and 2000 (2007-2000) is
meaningful, but not the ratio 2007/2000. Fixed scale: the difference
between 70Kg and 80Kg is the same interval as that between 80Kg and
90Kg and so on.
 Ordinal scale: Variables on this scale satisfy the ordering
property of the ratio scale, but not the other two properties. For
examples, grading systems, such as A, B, C, or income
classification, such as low income, middle income, and high
income, are ordinal scale variables, but quantities such as grade
A divided by grade B are not meaningful.
 For example, suppose we didn't ask for an exact weight but for the
group of weights a person belongs to, such as 50–60 kg, 60–70 kg, 70–80 kg,
80–90 kg, over 90 kg --- this grouping is ordinal.
Cont…
 Nominal scale: Do not have any of the features of
the ratio scale variables.
 These variables cannot be rank ordered at all.
 Variables such as gender, marital status, and
religion are nominal scale variables (other examples:
hot, sweet, salty).
 Such variables are often called dummy or categorical
variables. They are often "quantified" as 1 or 0, 1
indicating the presence of an attribute and 0
indicating its absence. Thus, we can" quantify"
gender as male = 1 and female = 0, or vice versa.
Cont…
 The nature of X variables or regressors
•The regressors can also be measured on any one of the scales
we have just discussed, although in many applications the
regressors are measured on ratio or interval scales. In the
standard, or classical, linear regression model (CLRM),
which we will discuss shortly, it is assumed that the regressors
are nonrandom, in the sense that their values are fixed in
repeated sampling. As a result, our regression analysis is
conditional, that is, conditional on the given values of the
regressors.
Cont…
 The nature of the stochastic error term, u
•The stochastic error term is a catchall that includes all those variables
that cannot be readily quantified. It may represent variables that
cannot be included in the model for lack of data availability, or errors
of measurement in the data, or intrinsic randomness in human
behavior. Whatever the source of the random term u, it is assumed that
the average effect of the error term on the regressand is negligible.
However, we will have more to say about this shortly.
 The nature of regression coefficients, the βs
•In the CLRM it is assumed that the regression coefficients are some
fixed numbers and not random, even though we do not know their
actual values. It is the objective of regression analysis to estimate their
values on the basis of sample data. A branch of statistics known as
Bayesian statistics treats the regression coefficients as random.
Cont…
 The meaning of linear regression
•For our purpose the term "linear" in the linear regression
model refers to linearity in the regression coefficients,
the βs, and not linearity in the Y and X variables.
•For instance, the Y and X variables can be logarithmic
(e.g. lnX2), or reciprocal (e.g. 1/X3), or raised to a power (e.g.
X2²), where ln stands for natural logarithm, that is,
logarithm to the base e. Linearity in the β coefficients
means that they are not raised to any power (e.g. β2²), or
divided by other coefficients (e.g. β2/β3), or transformed,
such as lnβ4.
3.2 Multivariate Case of the CLRM

3.3 Estimation of the Linear Regression Model


•Having obtained the data, the important question is: how do we
estimate the LRM given in Eq. (3.1)? The method of ordinary least
squares (OLS) is the most commonly used method to estimate the
regression coefficients. To explain this method, we rewrite Eq. (3.1)
as follows:
ui = Yi – (β1+β2X2i+β3X3i+...+βkXki)......................... Eq. 3.3
= Yi –βX
•Equation (3.3) states that the error term is the difference
between the actual Y value and the Y value obtained from the
regression model.
Cont…
•One way to obtain estimates of the β coefficients would be to
make the sum of the error term ui (= Σui) as small as possible,
ideally zero. For theoretical and practical reasons, the method
of OLS does not minimize the sum of the error term, but
minimizes the sum of the squared error term as follows:
Σui2 = Σ(Yi − β1 − β2X2i − β3X3i − ... − βkXki)2 ............... Eq. 3.4
•Where the sum is taken over all observations. We call Σui2 the
error sum of squares (ESS).
• Now in Eq. (3.4) we know the sample values of Yi and the Xs,
but we do not know the values of the β coefficients.
Therefore, to minimize the error sum of squares (ESS) we
have to find those values of the β coefficients that will make
ESS as small as possible.
Cont…
•Obviously, ESS is now a function of the β coefficients. The actual
minimization of ESS involves calculus techniques. We take the
(partial) derivative of ESS with respect to each β coefficient,
equate the resulting equations to zero, and solve these equations
simultaneously to obtain the estimates of the k regression
coefficients. Since we have k regression coefficients, we will have
to solve k equations simultaneously. We need not solve these
equations here, for software packages do that routinely. We will
denote the estimated β coefficients with a lower case b, and
therefore the estimating regression can be written as:
•Yi = b1+ b2X2i + b3X3i+...+ bkXki +ei ............................. Eq. 3.5
•which may be called the sample regression model, the
counterpart of the population model given in Eq. (3.1).
Cont…
•Letting
•Ŷi = b1+ b2X2i + b3X3i+...+ bkXki = bX............................. Eq. 3.6
•we can write Eq. (3.5) as
•Yi = Ŷi + ei = bX+ ei................................................................... Eq. 3.7
• Where Ŷi is an estimator of βX. Just as βX (i.e. E(Yi | X)) can
be interpreted as the population regression function (PRF),
we can interpret bX as the sample regression function
(SRF). We call the b coefficients the estimators of the β
coefficients and ei, called the residual, an estimator of the
error term ui. An estimator is a formula or rule that tells us
how we go about finding the values of the regression
parameters. A numerical value taken by an estimator in a
sample is known as an estimate.
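To make the mechanics concrete, here is a minimal Python sketch, using simulated, purely hypothetical data (not any data set from the text), that obtains the OLS estimates b by solving the normal equations implied by minimizing the sum of squared errors:

```python
import numpy as np

# Hypothetical sample: n observations, two regressors plus an intercept
rng = np.random.default_rng(0)
n = 100
X2 = rng.normal(10, 2, n)           # first regressor
X3 = rng.normal(5, 1, n)            # second regressor
u = rng.normal(0, 1, n)             # error term
Y = 2.0 + 1.5 * X2 - 0.8 * X3 + u   # "population" model with known betas

# Design matrix with a column of ones for the intercept b1
X = np.column_stack([np.ones(n), X2, X3])

# OLS: minimizing the sum of squared errors gives the normal equations
# (X'X) b = X'Y, so b = (X'X)^(-1) X'Y
b = np.linalg.solve(X.T @ X, X.T @ Y)

Y_hat = X @ b        # fitted values, the sample regression function
e = Y - Y_hat        # residuals, estimators of the error term u
print("OLS estimates b:", b)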
Cont…

•Notice carefully that the estimators, the bs, are random variables, for their values
will change from sample to sample. On the other hand, the (population)
regression coefficients or parameters, the βs, are fixed numbers, although we do
not know what they are. On the basis of the sample we try to obtain the best guesses of
them.
•The distinction between population and sample regression function is important,
for in most applications we may not be able to study the whole population for a
variety of reasons, including cost considerations. It is remarkable that in
Presidential elections in the USA, polls based on a random sample of, say, 1,000
people often come close to predicting the actual votes in the elections. In
regression analysis our objective is to draw inferences about the population
regression function on the basis of the sample regression function, for in reality
we rarely observe the population regression function; we only guess what it might
be. This is important because our ultimate objective is to find out what the true
values of the βs may be.
•For this we need a bit more theory, which is provided by the classical linear
regression model discussed next.
3.4 The Classical Linear Regression Model (CLRM)

•The CLRM makes the following assumptions:

•A-1: The regression model is linear in the parameters as in Eq. (3.1);


it may or may not be linear in the variables Y and the Xs.
•A-2: The regressors are assumed to be fixed or nonstochastic in the
sense that their values are fixed in repeated sampling. This
assumption may not be appropriate for all economic data. If X and u
are independently distributed, the results based on the classical
assumptions discussed below hold true, provided our analysis is
conditional on the particular X values drawn in the sample. However,
if X and u are merely uncorrelated, the classical results hold true only
asymptotically (i.e. in large samples).
Cont…
•A-3: Given the values of the X variables, the expected, or mean, value of the
error term is zero. That is,
•E(ui|X) = 0................................................................... Eq. 3.8

•Where, for brevity of expression, X (the bold X) stands for all X variables in the
model. In words, the conditional expectation of the error term, given the values
of the X variables, is zero. Since the error term represents the influence of factors
that may be essentially random, it makes sense to assume that their mean or
average value is zero.
•As a result of this critical assumption, we can write (3.2) as:
• E(Yi | X) = βX + E(ui | X).................................................... Eq. 3.9
• = βX
• which can be interpreted as the model for mean or average value of Yi
conditional on the X values. This is the population (mean) regression function
Cont…
•In regression analysis our main objective is to estimate this function. If
there is only one X variable, you can visualize it as the (population)
regression line. If there is more than one X variable, you will have to
imagine it to be a curve in a multi-dimensional graph. The estimated PRF,
the sample counterpart of Eq. (3.9), is denoted by Ŷi = bX. That is, Ŷi = bX
is an estimator of E(Yi | X).

•A-4: The variance of each ui, given the values of X, is constant, or


homoscedastic (homo means equal and scedastic means variance). That
is,
•var(ui | X) = σ2……………………………………………………. Eq. 3.10

•Note: There is no subscript on σ2


Cont…
•A-5: There is no correlation between two error terms. That is, there is no
autocorrelation. Symbolically,
•Cov (ui, uj| X) = 0 i≠j……………………………………………..Eq. 3.11

•where Cov stands for covariance and i and j are two different observations. Of
course, if i = j, Eq. (3.11) gives the variance of ui given in Eq. (3.10).
•A-6: There are no perfect linear relationships among the X variables. This is the
assumption of no multicollinearity. For example, relationships like X5 = 2X3 +
4X4 are ruled out.

•A-7: The regression model is correctly specified. Alternatively, there is no


specification bias or specification error in the model used in empirical analysis.
It is implicitly assumed that the number of observations, n, is greater than the
number of parameters estimated.
Cont…
•A-8: Although it is not a part of the CLRM, it is assumed that the error term
follows the normal distribution with zero mean and (constant) variance σ2.
Symbolically,
ui ∼ N(0, σ2) ……………………………………………..Eq. 3.12
•On the basis of Assumptions A-1 to A-7, it can be shown that the method of
ordinary least squares (OLS), the method most popularly used in practice,
provides estimators of the parameters of the PRF that have several desirable
statistical properties, such as:
1. The estimators are linear, that is, they are linear functions of the dependent
variable Y. Linear estimators are easy to understand and deal with compared to
nonlinear estimators.
2. The estimators are unbiased, that is, in repeated applications of the method, on
average, the estimators are equal to their true values.
3. In the class of linear unbiased estimators, OLS estimators have minimum
variance. As a result, the true parameter values can be estimated with least
possible uncertainty; an unbiased estimator with the least variance is called an
efficient estimator.
Cont…
•In short, under the assumed conditions, OLS estimators are
BLUE: best linear unbiased estimators. This is the essence of
the well-known Gauss-Markov theorem, which provides a
theoretical justification for the method of least squares.
•With the added Assumption A-8, it can be shown that the OLS
estimators are themselves normally distributed. As a result, we
can draw inferences about the true values of the population
regression coefficients and test statistical hypotheses. With the
added assumption of normality, the OLS estimators are best
unbiased estimators (BUE) in the entire class of unbiased
estimators, whether linear or not. With normality assumption,
CLRM is known as the normal classical linear regression
model (NCLRM).
Cont…
•Before proceeding further, several questions can be raised.
•How realistic are these assumptions?
•What happens if one or more of these assumptions are not
satisfied?
•In that case, are there alternative estimators?
•Why do we confine ourselves to linear estimators only?
•All these questions will be answered as we move forward
in other chapters of this course. But it may be added that at
the beginning of any field of enquiry we need some
building blocks. The CLRM provides one such building
block.
3.5 Variances and Standard Errors of OLS Estimators
•As noted before, the OLS estimators, the bs, are random variables, for
their values will vary from sample to sample. Therefore we need a
measure of their variability. In statistics the variability of a random
variable is measured by its variance σ2, or its square root, the standard
deviation σ. In the regression context the standard deviation of an
estimator is called the standard error, but conceptually it is similar to
standard deviation. For the LRM, an estimate of the variance of the error
term ui, σ2, is obtained as
σˆ2 = Σei2/(n − k) ...................................................... Eq.3.13
• that is, the residual sum of squares (RSS) divided by (n - k), which is
called the degrees of freedom (df), n being the sample size and k
being the number of regression parameters estimated, an intercept
and (k-1) slope coefficients(βs). σˆ is called the standard error of the
regression (SER) or root mean square. It is simply the standard
deviation of the Y values about the estimated regression line and is
often used as a summary measure of "goodness of fit" of the
estimated regression line.
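Continuing the hypothetical sketch above (same X, Y, and residuals e), a minimal computation of σˆ2 = Σei2/(n − k), the SER, and the standard errors of the b coefficients via the standard CLRM formula var(b) = σˆ2 (X′X)⁻¹ might look like this:

```python
n, k = X.shape                  # sample size and number of parameters estimated

RSS = float(e @ e)              # residual sum of squares, sum of e_i^2
sigma2_hat = RSS / (n - k)      # estimate of the error variance, Eq. (3.13)
SER = sigma2_hat ** 0.5         # standard error of the regression

# Variance-covariance matrix of the OLS estimators under the CLRM
cov_b = sigma2_hat * np.linalg.inv(X.T @ X)
se_b = np.sqrt(np.diag(cov_b))  # standard errors of b1, b2, ..., bk
print("sigma^2 hat:", sigma2_hat, "SER:", SER)
print("standard errors:", se_b)
```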
Cont…
•Note that a "hat" or caret over a parameter denotes an estimator of that
parameter.
•It is important to bear in mind that the standard deviation of Y values,
denoted by SY, is expected to be greater than SER, unless the regression
model does not explain much variation in the Y values. If that is the case,
there is no point in doing regression analysis, for in that case the X
regressors have no impact on Y.
•Then the best estimate of Y is simply its mean value, Ȳ. Of course we use
a regression model in the belief that the X variables included in the model
will help us to better explain the behavior of Y that Ȳ alone cannot.
•Given the assumptions of the CLRM, we can easily derive the variances
and standard errors of the b coefficients, but we will not present the actual
formulas to compute them because statistical packages produce them
easily, as we will show with an example.
Probability distributions of OLS estimators

•If we invoke Assumption A-8, ui ~ N(0, σ2), it can be shown that


each OLS estimator of regression coefficients is itself normally
distributed with mean value equal to its corresponding population
value and variance that involves σ2 and the values of the X
variables. In practice, σ2 is replaced by its estimator σˆ2 given in Eq.
(3.13).
•In practice, therefore, we use the t probability distribution rather
than the normal distribution for statistical inference (i.e.
hypothesis testing). But remember that as the sample size
increases, the t distribution approaches the normal distribution.
•The knowledge that the OLS estimators are normally distributed is
valuable in establishing confidence intervals and drawing inferences
about the true values of the parameters. How this is done is shown in the next section.
3.6 Testing Hypotheses about the True or Population Regression Coefficients

•Suppose we want to test the hypothesis that the (population) regression
coefficient βk = 0. To test this hypothesis, we use the t test, whose statistic
is:
t = bk / se(bk)
•where se(bk) is the standard error of bk. This t value has (n − k) degrees of
freedom (df); recall that associated with a t statistic is its degrees of
freedom.
•In the k variable regression, df is equal to the number of observations
minus the number of coefficients estimated.
•Once the t statistic is computed, we can look up the t table to find out the
probability of obtaining such a t value or greater.
•If the probability of obtaining the computed t value is small, say 5% or
less, we can reject the null hypothesis that βk = 0.
•In that case we say that the estimated t value is statistically significant,
that is, significantly different from zero.
Cont…
•The commonly chosen probability values are 10%, 5%, and 1%.
These values are known as the levels of significance (usually denoted
by the Greek letter α (alpha) and also known as a Type I error), hence
the name t tests of significance.
•We need not do this labor manually as statistical packages provide the
necessary output. These software packages not only give the estimated
t values, but also their p (probability) values, which are the exact level
of significance of the t values.
•If a p value is computed, there is no need to use arbitrarily chosen α
values. In practice, a low p value suggests that the estimated
coefficient is statistically significant.
•This would suggest that the particular variable under consideration
has a statistically significant impact on the regressand, holding all
other regressor values constant.
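As a rough illustration, continuing the same hypothetical sketch and using scipy's t distribution, the t statistics and their two-sided p values for H0: βk = 0 could be computed as follows:

```python
from scipy import stats

# t statistic for H0: beta_j = 0, using b and se_b from the sketches above
t_stats = b / se_b
df = n - k
# two-sided p values: probability of a |t| this large or larger under H0
p_values = 2 * stats.t.sf(np.abs(t_stats), df)
print("t statistics:", t_stats)
print("p values:", p_values)
```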
Cont…
•Some software packages, such as Excel and Stata, also compute
confidence intervals for individual regression coefficients usually a 95%
confidence interval (CI). Such intervals provide a range of values that has a
95% chance of including the true population value. 95% (or similar measure)
is called the confidence coefficient (CC), which is simply one minus the value
of the level of significance, α, times 100 - that is, CC = 100(1 - α). The (1- α)
confidence interval for any population coefficient βk is established as follows:
•Pr{bk − tα/2 se(bk) ≤ βk ≤ bk + tα/2 se(bk)} = (1 − α)............................................ Eq.3.14
•where Pr stands for probability and where tα/2 is the value of the t statistic obtained
from the t distribution (table) for α/2 level of significance with appropriate degrees of
freedom, and se(bk) is the standard error of bk.
•In other words, we subtract or add tα/2 times the standard error of bk to bk to obtain
the (1-α) confidence interval for true βk.
•{bk - tα/2se(bk)} is called the lower limit and {bk + tα/2se(bk)} is called the upper limit
of the confidence interval. This is called the two-sided confidence interval.
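Continuing the same hypothetical sketch, a 95% confidence interval for each coefficient, following Eq. (3.14), can be computed as below (tα/2 is taken from scipy's t distribution):

```python
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)   # t_{alpha/2} with (n - k) df

lower = b - t_crit * se_b                 # lower limits of Eq. (3.14)
upper = b + t_crit * se_b                 # upper limits
for j, (lo, hi) in enumerate(zip(lower, upper), start=1):
    print(f"b{j}: 95% CI = ({lo:.4f}, {hi:.4f})")
```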
Cont…
•Confidence intervals thus obtained need to be interpreted carefully. In
particular note the following:
1. The interval in Eq. (3.14) does not say that the probability of the true βk
lying between the given limits is (1- α). Although we do not know what
the actual value of βk is, it is assumed to be some fixed number.
2. The interval in Eq. (3.14) is a random interval- that is; it will vary from
sample to sample because it is based on bk, which is random.
3. Since the confidence interval is random, a probability statement such as
Eq. (3.14) should be understood in the long-run sense - that is in
repeated sampling: if, in repeated sampling, confidence intervals like Eq.
(3.14) are constructed a large number of times on the (1 − α) probability
basis, then in the long run, on average, such intervals will enclose the
true βk in (1 − α) of the cases.
- Any single interval based on a single sample may or may not contain the
true βk.
Cont…
• As noted, the interval in Eq. (3.14) is random. But once we
have a specific sample and once we obtain a specific
numerical value of bk, the interval based on this value is not
random but fixed. So we cannot say that the probability is
(1 − α) that the given fixed interval includes the true parameter.
In this case βk either lies in this interval or it does not; therefore
the probability is 1 or 0.
• Suppose we want to test the hypothesis that all the slope coefficients in
Eq. (3.1), Yi = β1 + β2X2i + β3X3i + ... + βkXki + ui, are
simultaneously equal to zero.


• This is to say that all regressors in the model have no impact
on the dependent variable. In short, the model is not helpful to
explain the behavior of the regressand.
• This is known in the literature as the overall significance of
the regression.
Cont…
•This hypothesis is tested by the F test. Verbally, the F statistic is
defined as:
F = [ESS/(k − 1)] / [RSS/(n − k)] ................................................................. Eq.3.15
•where ESS is the part of the variation in the dependent variable Y explained by
the model and RSS is the part of the variation in Y not explained by the model.
The sum of these is the total variation in Y, called the total sum of squares (TSS).
• As Eq. (3.15) shows, the F statistic has two sets of degrees of freedom, one
for the numerator and one for the denominator. The denominator df is
always (n-k) - the number of observations minus the number of parameters
estimated, including the intercept - and the numerator df is always (k-1) -
that is, the total number of regressors in the model excluding the constant
term, which is the total number of slope coefficients estimated. The
computed F value can be tested for its significance by comparing it with the
F value from the F tables. If the computed F value is greater than its critical
or benchmark F value at the chosen level of α, we can reject the null
hypothesis and conclude that at least one regressor is statistically
significant. Like the p value of the t statistic, most software packages also
present the p value of the F statistic.
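Using the fitted values and residuals from the earlier hypothetical sketch, the F statistic of Eq. (3.15) and its p value could be computed as follows:

```python
Y_bar = Y.mean()
TSS = float(((Y - Y_bar) ** 2).sum())      # total sum of squares
ESS = float(((Y_hat - Y_bar) ** 2).sum())  # explained sum of squares
RSS = float((e ** 2).sum())                # residual sum of squares

F = (ESS / (k - 1)) / (RSS / (n - k))      # Eq. (3.15)
p_value_F = stats.f.sf(F, k - 1, n - k)    # p value of the F statistic
print("F =", F, "p value =", p_value_F)
```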
Cont…
•All this information can be gleaned from the Analysis
of Variance (ANOVA) table that usually accompanies
regression output; an example of this is presented
shortly.
•It is very important to note that the use of the t and F
tests is explicitly based on the assumption that the error
term, ui, is normally distributed, as in Assumption A-8.
If this assumption is not tenable, the t and F
testing procedures are invalid in small samples, although
they can still be used if the sample is sufficiently large
(technically infinite), a point to which we will return in
the discussion of specification errors.
3.7 Partial Correlation Coefficients and their Interpretation

• In probability theory and statistics, partial


correlation measures the degree of association
between two random variables, with the effect of a
set of controlling random variables removed.
• Partial correlation measures the strength of a
relationship between two variables, while controlling
for the effect of one or more other variables.
• For example, you might want to see if there is a
correlation between amount of food eaten and blood
pressure, while controlling for weight or amount of
exercise.
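The following minimal Python sketch, using simulated (hypothetical) food intake, blood pressure, and weight data, computes a partial correlation by the residual method: each variable is first purged of the linear influence of the control variable, and the simple correlation of the residuals is then taken:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear influence of z."""
    Z = np.column_stack([np.ones(len(z)), z])
    # residuals of x and y from their regressions on z
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# hypothetical example: food intake, blood pressure, controlling for weight
rng = np.random.default_rng(1)
weight = rng.normal(70, 10, 200)
food = 20 + 0.5 * weight + rng.normal(0, 3, 200)
bp = 80 + 0.6 * weight + rng.normal(0, 5, 200)
print("simple r:", np.corrcoef(food, bp)[0, 1])
print("partial r (controlling weight):", partial_corr(food, bp, weight))
```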
3.8 Coefficients of Multiple Determination

• The coefficient of multiple determination (symbol: R2) is a numerical index that reflects the degree to
which variation in a response or outcome variable (e.g.,
workers' incomes) is accounted for by its relationship with
two or more predictor variables (e.g., age, gender, years of
education).
• More specifically, it is a measure of the percentage of
variance in a dependent variable that is accounted for by its
relationship with a weighted linear combination of a set
of independent variables.
• Obtained by multiplying the value of the multiple correlation
coefficient (R) by itself, the coefficient of multiple
determination ranges in value from 0 to 1.
• Low values indicate that the outcome is relatively unrelated
to the predictors, whereas values closer to 1 indicate that the
outcome is largely accounted for by the predictors.
Cont…
For example, if R = 0.40, then the coefficient of multiple
determination is 0.40² = 0.16, interpreted to mean that 16% of the
variance in the outcome is explainable by the set of predictors. Also
called multiple correlation coefficient squared; squared multiple
correlation coefficient.
Hence, the coefficient of determination, denoted by R2, is an overall
measure of goodness of fit of the estimated regression line (or plane,
if more than one regressor is involved), that is, it gives the
proportion or percentage of the total variation in the dependent
variable Y (TSS) that is explained by all the regressors. To see how
R2 is computed, let us define:
Total Sum of Squares (TSS) = Σ(Yi − Ȳ)2
Explained Sum of Squares (ESS) = Σ(Ŷi − Ȳ)2
Residual Sum of Squares (RSS) = Σei2
Cont…
• Now it can be shown that
Σ(Yi − Ȳ)2 = Σ(Ŷi − Ȳ)2 + Σei2 .................................................. Eq.3.16
• This equation states that the total variation of the actual Y values
about their sample mean (TSS) is equal to the sum of the total
variation of the estimated Y values about their mean value (which
is the same as Ȳ) and the sum of residuals squared. In words,
• TSS = ESS + RSS.................................................................Eq.3.17
• Now we define R2 as:
R2 = ESS/TSS ....................................................Eq.3.18
R2 = 1 − RSS/TSS.................................................Eq.3.19
• One disadvantage of R2 is that it is an increasing function of the
number of regressors. That is, if you add a variable to the model, the
R2 value increases. So sometimes researchers play the game of
"maximizing" R2, meaning the higher the R2, the better the model.
Cont…
To avoid this temptation, it is suggested that we use an R2 measure that
explicitly takes into account the number of regressors included in the
model. Such an R2 is called an adjusted R2, denoted as R̄2 (R-bar
squared), and is computed from the (unadjusted) R2 as follows:
R̄2 = 1 − (1 − R2)(n − 1)/(n − k) ...........................................................Eq.3.20
The term "adjusted" means adjusted for the degrees of freedom, which
depend on the number of regressors (k) in the model.
Notice two features of R̄ 2:
1. If k > 1, R̄ 2 < R2, that is, as the number of regressors in the model
increases, the adjusted R2 becomes increasingly smaller than the unadjusted
R2. Thus, R̄ 2 imposes a "penalty" for adding more regressors.
2. The unadjusted R2 is always positive, but the adjusted R2 can sometimes be
negative.
Adjusted R2 is often used to compare two or more regression models that have
the same dependent variable. Of course, there are other measures of goodness
of fit as well.
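Continuing the earlier hypothetical sketch (which already holds TSS, RSS, n, and k), R2 and the adjusted R2 of Eq. (3.20) are computed as:

```python
R2 = 1 - RSS / TSS                          # Eq. (3.19), equivalently ESS / TSS
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k)   # Eq. (3.20)
print("R-squared:", R2, "Adjusted R-squared:", adj_R2)
```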
Cont…
• The adjusted (or modified) R2 measures the proportion of the variation
in the dependent variable that the independent variables can explain,
after adjusting for the number of regressors.
• What is special about the adjusted R2 is that it does not give credit for
every independent variable included, but only for those that actually
help explain the variation in the dependent variable.
• Other things equal, the higher the adjusted R2, the better the regression
equation, as it implies that the independent variables chosen can
explain more of the variation in the dependent variable.
• The value of the adjusted R2 will go up with the addition of an
independent variable only when that variable contributes enough to
explaining the variation in the dependent variable.
3.9 Introduction to Multivariate Normal Distribution

• A multivariate normal distribution is the joint distribution of a vector of
normally distributed variables such that any linear combination of the
variables is also normally distributed. It is mostly useful in
extending the central limit theorem to multiple variables, but also
has applications to Bayesian inference and thus machine learning,
where the multivariate normal distribution is used to approximate
the features of some characteristics; for instance, in detecting faces
in pictures.
• The multivariate normal distribution is useful in analyzing the
relationship between multiple normally distributed variables, and
thus has heavy application to biology and economics where the
relationship between approximately-normal variables is of great
interest.
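A small simulation sketch, with an arbitrary (assumed) mean vector and covariance matrix, can illustrate the defining property that any linear combination of a multivariate normal vector is itself normal:

```python
import numpy as np

rng = np.random.default_rng(2)
mean = [0.0, 1.0]
cov = [[1.0, 0.6],
       [0.6, 2.0]]                 # assumed covariance matrix (symmetric, PSD)
draws = rng.multivariate_normal(mean, cov, size=100_000)

# any linear combination a'X of a multivariate normal vector is normal;
# check its mean and variance against the theoretical values a'mu and a'Cov a
a = np.array([2.0, -1.0])
combo = draws @ a
print("sample mean:", combo.mean(), "theoretical:", a @ np.array(mean))
print("sample var :", combo.var(), "theoretical:", a @ np.array(cov) @ a)
```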
• Having covered the basic theory underlying the CLRM, we now
provide a concrete example illustrating the various points discussed
above. This example is a prototype of multiple regression models.
3.10 Point and Interval Forecasting Using Multiple Linear Regression

• An Illustrative Example: the Determinants of Hourly Wages


• For illustrative purposes we use data from the Current Population Survey (CPS),
undertaken by the U.S. Census Bureau, which periodically conducts a variety of
surveys on a variety of topics. In this example we look at a cross-section of 1,289
persons interviewed in March 1995 to study the factors that determine the hourly
wage (in dollars) in this sample. Keep in mind that these 1,289 observations are a
sample from a much bigger population.
• The variables used in the analysis are defined as follows:
• Wage: Hourly wage in dollars, which is the dependent variable.
• The explanatory variables, or regressors, are as follows:
• Female: Gender, coded 1 for female, 0 for male
• Non-white: Race, coded 1 for non white workers, 0 for white workers
• Union: Union status, coded 1 if in a union job, 0 otherwise
• Education: Education (in years)
• Exper: Potential work experience (in years), defined as age minus years of
schooling minus 6. (It is assumed that schooling starts at age 6).
Cont…
• Although many other regressors could be added to the
model, for now we will continue with these variables to
illustrate a prototype multiple regression model. Note that
wage, education, and work experience are ratio scale
variables and female, non white, and union are nominal
scale variables, which are coded as dummy variables. Also
note that the data here are cross-section data. In this course
we will use the Eviews and Stata software packages to
estimate the regression models.
• Although for a given data set they give similar results, there
are some variations in the manner in which they present
them. To familiarize the reader with these packages, in this
chapter we will present results based on both these
packages. In later chapters we may use one or both of these
packages, but mostly Eviews because of its easy accessibility.
Cont…
Using Eviews 6, we obtained the results in Table 1.2.
Table 1.2 Wage regressions.
Dependent Variable: WAGE
Method: Least Squares
Sample: 1 1289
Included observations: 1289
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C               -7.183338     1.015788     -7.071691     0.0000
FEMALE          -3.074875     0.364616     -8.433184     0.0000
NONWHITE        -1.565313     0.509188     -3.074139     0.0022
UNION           1.095976      0.506078     2.165626      0.0305
EDUCATION       1.370301      0.065904     20.79231      0.0000
EXPER           0.166607      0.016048     10.38205      0.0000
R-squared 0.323339 Mean dependent var 12.36585
Adjusted R-squared 0.320702 S.D. dependent var 7.896350
S.E. of regression 6.508137 Akaike info criterion 6.588627
Sum squared resid 54342.54 Schwarz criterion 6.612653
Log likelihood -4240.370 Durbin-Watson stat 1.897513
F-statistic 122.6149 Prob(F-statistic) 0.000000
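For readers who prefer Python to Eviews or Stata, a regression of this form could be estimated with statsmodels roughly as sketched below; the file name wages.csv and the lower-case variable names are assumptions for illustration, not part of the original data set:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file holding the 1,289 observations with the variables
# wage, female, nonwhite, union, education, exper (names are assumptions)
data = pd.read_csv("wages.csv")

model = smf.ols("wage ~ female + nonwhite + union + education + exper",
                data=data).fit()
print(model.summary())        # coefficients, std. errors, t, p, R-squared, F
print(model.conf_int(0.05))   # 95% confidence intervals, as in the Stata output
```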
Cont…
• The format of Eviews is highly standardized.
• The first part of the table shows the name of the
dependent variable, the estimation method (least
squares), the number of observations, and the sample
range. Sometimes we may not use all the sample
observations, and save some observations, called
holdover observations, for forecasting purposes.
• The second part of the table gives the names of the
explanatory variables, their estimated coefficients, the
standard errors of the coefficients, the t statistic of
each coefficient, which is simply the ratio of estimated
coefficient divided by its standard error, and the p
value, or the exact level of significance of the t statistic.
Cont…
• For each coefficient, the null hypothesis is that the population value of that
coefficient (the big β) is zero, that is, the particular regressor has no
influence on the regressand, after holding the other regressor values
constant.
• The smaller the p value, the greater the evidence against the null
hypothesis.
• For example, take the variable experience, Exper. Its coefficient value of
about 0.17 has a t value of about 10.38. If the hypothesis is that the
coefficient value of this variable in the PRF is zero, we can soundly reject
that hypothesis because the p value of obtaining such a t value or higher is
practically zero.
• In this situation we say that the coefficient of the experience variable is
highly statistically significant, meaning that it is highly significantly
different from zero. To put it differently, we can say work experience is an
important determinant of hourly wage, after allowing for the influence of
the other variables in the model - an unsurprising finding.
• If we choose a 5% level of significance, Table 1.2 shows that each of the estimated
coefficients is statistically significantly different from zero, that is, each is an
important determinant of hourly wages.
Cont…
• The third part of Table 1.2 gives some descriptive statistics.
• The R2 (the coefficient of determination) value of about 0.32
means that about 32% of the variation in hourly wages is
explained by the variation in the five explanatory variables.
• It might seem that this R2 value is rather low, but keep in
mind that we have 1,289 observations with varying values
of the regressand and regressors.
• In such a diverse setting the R2 values are typically low, and
they are often low when individual-level data are analyzed.
• This part also gives the adjusted R2 value, which is slightly
lower than the unadjusted R2 values, as noted before.
Since we are not comparing our wage model with any
other model, the adjusted R2 is not of particular
importance.
Cont…
• If we want to test the hypothesis that all the slope
coefficients in the wage regression are
simultaneously equal to zero, we use the F test
discussed previously. In the present example this F
value is about 123. This null hypothesis can be rejected if
the p value of the estimated F value is very low.
• In our example, the p value is practically zero,
suggesting that we can strongly reject the
hypothesis that collectively all the explanatory
variables have no impact on the dependent
variable, hourly wages here.
• At least one regressor has significant impact on the
regressand.
Cont…
• The table also lists several other statistics, such as Akaike and
Schwarz information criteria, which are used to choose
among competing models, the Durbin - Watson statistic,
which is a measure of correlation in the error term, and the
log likelihood statistic, which is useful if we use the ML
method. We will discuss the use of these statistics as we
move along.
• Although Eviews does not do so, other software packages
present a table known as the Analysis of variance (ANOVA)
table, but this table can be easily derived from the
information provided in the third part of Table 1.2.
• However, Stata produces not only the coefficients, their
standard errors, and the aforementioned information, but
also the ANOVA table. It also gives the 95% confidence
interval for each estimated coefficient, as shown in Table 1.3.
Cont…
Table 1.3 Stata output of the wage function.
w Coef. Std. Err. t P>ltl [95% Conf. Interval]
female -3.074875 .3646162 -8.43 0.000 -3.790185 -2.359566
nonwhite -1.565313 .5091875 -3.07 0.002 -2.564245 -.5663817
union 1.095976 .5060781 2.17 0.031 .1031443 2.088807
education 1.370301 .0659042 20.79 0.000 1.241009 1.499593
experience .1666065 .0160476 10.38 0.000 .1351242 .1980889
cons -7.183338 1.015788 -7.07 0.000 -9.176126 -5.190551
Note: |t| means the absolute t value, because t can be positive or negative.
Cont…
• As you can see, there is not much difference
between Eviews and Stata in the estimates of the
regression coefficients. A unique feature of Stata is
that it gives the 95% confidence interval for each
coefficient, computed from Eq. (3.14).
• Consider, for example, the education variable.
Although the single best estimate of the true
education coefficient is 1.3703, the 95% confidence
interval is (1.2410 to 1.4995). Therefore, we can
say that we are 95% confident that the impact of
an additional year of schooling on hourly earnings
is at least $1.24 and at most $ 1.49, ceteris paribus
(holding other things constant).
Cont…
• Impact on mean wage of a unit change in the value of a
regressor.
• The female coefficient of about −3.07 means that, holding all other variables
constant, the average female hourly wage is lower than the
average male hourly wage by about 3 dollars.
• Similarly, ceteris paribus, the average hourly wages of a nonwhite
worker is lower by about $1.56 than a white worker's wage.
• The education coefficient suggests that the average hourly wages
increases by about $1.37 for every additional year of education,
ceteris paribus.
• Similarly, for every additional year of work experience, the average
hourly wage goes up by about 17 cents, ceteris paribus.
• Test of the Overall Significance of the Regression
• To test the hypothesis that all slope coefficients are simultaneously
equal to zero (i.e. all the regressors have zero impact on hourly
wage), Stata produced Table 1.4.
Cont…
Table 1.4. ANOVA Table
Source SS df Ms Number of obs = 1289
Model 25967.2805 5 5193.45611 F(5,1283) = 122.61
Residual 54342.5442 1283 42.3558411 Prob>F = 0.0000
Total 80309.8247 1288 62.3523484 R-square = 0.3233
Adj R-square = 0.3207
Root MSE = 6.5081
Source of variation   Sum of squares (SS)   Degrees of freedom (df)   Mean square (MS)      F
Model                 ESS                   k − 1                     MSE = ESS/(k − 1)     MSE/MSR
Residual              RSS                   n − k                     MSR = RSS/(n − k)
Total                 TSS = ESS + RSS       n − 1
Cont…
• The ANOVA gives the breakdown of the total sum of squares
(TSS) into two components; one explained by the model,
called the explained sum of squares (ESS) - that is the sum of
squares explained by the chosen model, and the other not
explained by the model, called the residual sum of squares
(RSS), terms we have encountered before.
• Now each sum of squares has its associated degrees of
freedom. The TSS has (n-1) df, for we lose one df in
computing the mean value of the dependent variable Y from
the same data.
• ESS has (k -1) degrees of freedom, the k regressors excluding
the intercept term, and
• RSS has (n - k) degrees of freedom, which is equal to the
number of observations, n, minus the number of parameters
estimated (including the intercept).
Cont…
• Now if you divide the ESS by its df and divide RSS by its df,
you obtain the mean sums of squares (MS) of ESS and RSS.
• And if you take the ratio of the two MS, i.e. F = MSE/MSR,
you obtain the F value. It can be shown that, under the null
hypothesis that all slope coefficients are simultaneously equal to
zero, and assuming the error term ui is normally distributed,
the computed F value follows the F distribution with
numerator df of (k − 1) and denominator df of (n − k).
• In our example, this F value is about 123, which is the same
as that obtained from Eviews output.
• As the table shows, the probability of obtaining such an F or
greater is practically zero, suggesting that the null hypothesis
can be rejected. There is at least one regressor that is
significantly different from zero.
Cont…
• If the ANOVA table is not available, we can test the null hypothesis that all
slope coefficients are simultaneously equal to zero, that is,
β2 = β3 = ... = βk = 0, by using an interesting relationship between F and R2,
which is as follows:
F = [R2/(k − 1)] / [(1 − R2)/(n − k)] ......................................................................Eq.3.21
• Since the R2 value is produced by all software packages, it may be easier to
use Eq. (3.21) to test the null hypothesis. For our example the computed
R2 is 0.3233. Using this value, we obtain:
F = (0.3233/5) / (0.6767/1283) ≈ 122.6 ............................................ Eq.3.22
• This value is about the same as that shown in the Stata ANOVA table.
• It should be emphasized that the formula given in Eq. (3.21) is to be used
only if we want to test that all explanatory variables have zero impact on
the dependent variable.
• As noted before, R2 is the proportion of the variation in the dependent
variable explained by the regressor included in the model. This can be
verified if you take the ratio of ESS to TSS from the ANOVA table (=
25967.2805/80309.8247) = R2 = 0.3233.
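As a quick check, plugging the wage-regression values (R2 = 0.3233, n = 1289, k = 6) into Eq. (3.21) reproduces the F statistic; a two-line Python computation:

```python
R2, n, k = 0.3233, 1289, 6                  # values from the wage regression
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))   # Eq. (3.21)
print(F)                                    # about 122.6, matching the ANOVA table
```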
3.11 Relaxing the CLRM basic assumptions
3.11.1 Violations of the Assumptions of Classical
Regression Model
3.11.1.1 Multicollinearity
• One of the assumptions of the classical linear regression model (CLRM)
is that there is no exact linear relationship among the regressors. If
there are one or more such relationships among the regressors we call it
multicollinearity or collinearity, for short. At the outset, we must
distinguish between perfect collinearity and imperfect collinearity. To
explain, consider the k-variable linear regression model:
• Yi = β1+β2X2i+...+βkXki +ui ................................. eq.5.1
• If, for example, X2i +3X3i =1 we have a case of perfect collinearity for X2i =
1-3X3i. Therefore, if we were to include both X2i and X3i in the same
regression model, we will have perfect collinearity, that is, a perfect
linear relationship between the two variables. In situations like this we
cannot even estimate the regression coefficients, let alone perform any
kind of statistical inference.
Cont…
• On the other hand, if we have X2i + 3X3i + Vi = 1, where
Vi is a random error term, we have the case of
imperfect collinearity, for X2i = 1-3X3i-Vi. Therefore, in
this case there is no perfect linear relationship
between the two variables; so to speak, the presence
of the error term Vi dilutes the perfect relationship
between these variables.
• In practice, an exact linear relationship among
regressors is a rarity, but in many applications the
regressors may be highly collinear. This case may be
called imperfect collinearity or near-collinearity.
Therefore, in this section we focus our attention on
imperfect collinearity.
A. Consequences of Imperfect Collinearity

1. OLS estimators are still BLUE, but they have large variances and
covariances, making precise estimation difficult.
2. As a result, the confidence intervals tend to be wider. Therefore, we
may not reject the "zero null hypothesis" (i.e. the true population
coefficient is zero).
3. Because of (1), the t ratios of one or more coefficients tend to be
statistically insignificant.
4. Even though some regression coefficients are statistically
insignificant, the R2 value may be very high.
5. The OLS estimators and their standard errors can be sensitive to
small changes in the data.
6. Adding a collinear variable to the chosen regression model can alter
the coefficient values of the other variables in the model.
In short, when regressors are collinear, statistical inference becomes
shaky, especially so if there is near-collinearity. This should not be
surprising, because if two variables are highly collinear it is very difficult
to isolate the impact of each variable separately on the regressand.
B. Detection of multicollinearity

• There is no unique test of multicollinearity. Some of the diagnostics


discussed in the literature are as follows.
• 1 High R2 but few significant t ratios. But this should not be
surprising in cross-sectional data with several diverse observations.
However, quite a few t ratios are statistically insignificant, perhaps
due to collinearity among some regressors.
• 2 High pair-wise correlations among explanatory variables or
regressors. Recall that the sample correlation coefficient between
variables Y and X is defined as:
• rxy = Σxiyi / √(Σxi2 Σyi2) ........................ eq.5.2
• where the variables are defined as deviations from their mean
values (e.g. xi = Xi − X̄, yi = Yi − Ȳ). We will not produce all these correlations.
Most of the correlation coefficients are not particularly high, but
some are fairly high. It is believed that high pair-wise correlations
between regressors are a sign of collinearity, in which case one may
consider dropping one of the highly correlated regressors.
Cont…
3. Partial correlation coefficients: To hold the other variables constant, we
have to compute partial correlation coefficients. Suppose we have three
variables X1, X2, and X3. Then we will have three pair-wise correlations, r12,
r13, and r23, and three partial correlations, r12.3, r13.2, and r23.1; r23.1, for
example, means the correlation between variables X2 and X3, holding the
value of variable X1 constant. It is quite possible that the correlation
between X2 and X3 (= r23) is high, say, 0.85. But this correlation does not
take into account the presence of the third variable X1.
• If the variable X1 influences both X2 and X3, the high correlation
between the latter two may in fact be due to the common influence of
X1 on both these variables. The partial correlation r23.1 computes the net
correlation between X2 and X3 after removing the influence of X1. In
that case it is quite possible that the high observed correlation of 0.85
between X2 and X3 may be reduced to, say, 0.35.
• However, there is no guarantee that the partial correlations will provide
an infallible guide to multicollinearity. Software packages can compute
partial correlations for a group of variables with simple instructions.
Cont…
• 4. Auxiliary regressions: To find out which of the regressors
are highly collinear with the other regressors included in the
model, we can regress each regressor on the remaining
regressors and obtain the auxiliary regressions result. If we
have 15 regressors, there will be 15 auxiliary regressions.
• We can test the overall significance of each regression by
the F test. The null hypothesis here is that all the regressor
coefficients in the auxiliary regression are zero. If we reject
this hypothesis for one or more of the auxiliary regressions,
we can conclude that the auxiliary regressions with
significant F values are collinear with the other variables in
the model.
• Of course, if we have several regressors, as in our example,
estimating the many auxiliary regressions in practice will be tedious.
Cont…
• 5. The variance inflation (VIF) and tolerance (TOL) factors.
Calculating the variance inflation factor (VIF) for every independent variable is
another way to check for multicollinearity. VIF measures the linear association
between an independent variable and all the other independent variables. A
VIF for any given independent variable is calculated by VIFk = 1/(1 − Rk2) ............eq.5.3
Where Rk2 is the R-squared value obtained by regressing independent variable
Xk on all the other independent variables in the model. Most econometric
software programs have a command that you can execute after estimating a
regression to obtain the VIFs for each independent variable. As a rule of
thumb, VIFs greater than 10 signal a highly likely multicollinearity problem,
and VIFs between 5 and 10 signal a somewhat likely multicollinearity issue.
Tolerance can range from 0 (no independence from other variables) to 1
(complete independence); larger values are desired. The variance inflation
factor (VIF) is the reciprocal of tolerance and is “an index of the amount that
the variance of each regression coefficient is increased” over that with
uncorrelated independent variables. Small values for tolerance and large
values for VIF signal the presence of multicollinearity. The inverse of the VIF is
called tolerance: TOLk = 1/VIFk = 1 − Rk2.
If the VIF of a variable exceeds 10, multicollinearity can be a potential problem
(Hair et al., 2013). As illustrated in Table 4.11, the variance inflation factor for
all the explanatory variables is less than 10. Therefore, it implies that there is no
serious multicollinearity between the explanatory variables.
Table 4.11: VIF and Tolerance Statistics for Multicollinearity
Variables                     Tolerance   Variance Inflation Factor (VIF)
Physical work environment     .556        1.800
Performance feedback          .648        1.544
Job aids                      .404        2.476
Reward                        .361        2.772
Democratic leadership style   .462        2.164
Training                      .478        2.052
Workload                      .815        1.227
Source: SPSS (2021) output
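In Python, VIFs and tolerances could be computed with statsmodels roughly as sketched below; the file name regressors.csv is an assumption, and the data frame is assumed to contain only the explanatory variables:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical data frame containing only the explanatory variables
X_df = pd.read_csv("regressors.csv")            # assumed file name
X_mat = sm.add_constant(X_df)                   # VIFs are computed with a constant

vifs = pd.Series(
    [variance_inflation_factor(X_mat.values, i) for i in range(X_mat.shape[1])],
    index=X_mat.columns,
)
print(vifs.drop("const"))                       # one VIF per regressor
print(1 / vifs.drop("const"))                   # tolerances, TOL = 1 / VIF
```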
D. Remedial measures

• There are several remedies suggested in the literature.


Whether any of them will work in a specific situation is
debatable.
• Since the OLS estimators are BLUE as long as collinearity is
not perfect, it is often suggested that the best remedy is to
do nothing but simply present the results of the fitted
model.
• This is so because very often collinearity is essentially a
data deficiency problem, and in many situations we may
not have choice over the data we have available for
research.
• But sometimes it is useful to rethink the model we have
chosen for analysis, to make sure that all the variables
included in the model are really essential.
Cont…
• The method of principal components (PC) A statistical method, known
as the principal component analysis (PCA), can transform correlated
variables into orthogonal or uncorrelated variables. The orthogonal
variables thus obtained are called the principal components.
• The basic idea behind PCA is simple. It groups the correlated variables
into sub-groups so that variables belonging to any sub-group have a
"common" factor that moves them together. That common factor, which
is not always easy to identify, is what we call a principal component.
There is one PC for each common factor. Hopefully, these common
factors or PCs are fewer in number than the original number of
regressors.
• Drop a redundant variable: On occasion, the simple solution of
dropping one of the multicollinear variables is a good one.
• Increase the size of the sample: Another way to deal with
multicollinearity is to attempt to increase the size of the sample to
reduce the degree of multicollinearity. Although such an increase may
be impossible, it is a useful alternative to be considered when feasible.
3.11.1.2 Heteroscedasticity: Tests and Weighted Least Squares

• One of the problems commonly encountered in cross-sectional


data is heteroscedasticity (unequal variance) in the error term.
There are various reasons for heteroscedasticity, such as the
presence of outliers in the data, or incorrect functional form of
the regression model, or incorrect transformation of data, or
mixing observations with different measures of scale (e.g. mixing
high-income households with low-income households) etc.
A. Consequences of heteroscedasticity
• The classical linear regression model (CLRM) assumes that the
error term Ui in the regression model has homoscedasticity
(equal variance) across observations, denoted by σ2. For
instance, in studying consumption expenditure in relation to
income, this assumption would imply that low-income and high-
income households have the same disturbance variance even
though their average level of consumption expenditure is
different.
Cont…
• However, if the assumption of homoscedasticity, or equal
variance, is not satisfied, we have the problem of
heteroscedasticity, or unequal variance, denoted by σi2 (note
the subscript i).
• Thus, compared to low-income households, high-income
households have not only higher average level of consumption
expenditure but also greater variability in their consumption
expenditure. As a result, in a regression of consumption
expenditure in relation to household income we are likely to
encounter heteroscedasticity.
• As noted, a commonly encountered problem in cross-sectional
data is the problem of heteroscedasticity.
• Heteroscedasticity has the following consequences:
• 1 Heteroscedasticity does not alter the unbiasedness and
consistency properties of OLS estimators.
Cont…
• 2 But OLS estimators are no longer of minimum variance or efficient.
That is, they are not best linear unbiased estimators (BLUE); they are
simply linear unbiased estimators (LUE).
• 3 As a result, the t and F tests based under the standard assumptions
of CLRM may not be reliable, resulting in erroneous conclusions
regarding the statistical significance of the estimated regression
coefficients.
• In the presence of heteroscedasticity, the BLUE estimators are
provided by the method of weighted least squares (WLS). Because of
these consequences, it is important that we check for
heteroscedasticity, which is usually found in cross-sectional data.
• If there is heteroscedasticity, the estimated t values may not be
reliable. Again, keep in mind that the F statistic may not be reliable if
there is heteroscedasticity.
• Also note that a significant F does not mean that each explanatory variable is
statistically significant, only that at least one of them is.
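As a rough sketch of weighted least squares (WLS) with statsmodels, assume, purely for illustration, that the error variance is proportional to income squared, so that weights of 1/income2 are appropriate; the data are simulated, not from any data set in the text:

```python
import numpy as np
import statsmodels.api as sm

# Simulated consumption-income data with heteroscedastic errors
rng = np.random.default_rng(3)
income = rng.uniform(20, 200, 500)
u = rng.normal(0, 0.1 * income)            # error spread grows with income
consumption = 10 + 0.8 * income + u

X = sm.add_constant(income)
ols_res = sm.OLS(consumption, X).fit()
wls_res = sm.WLS(consumption, X, weights=1.0 / income**2).fit()
print(ols_res.params, wls_res.params)
print(ols_res.bse, wls_res.bse)            # compare the standard errors
```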
B. Detection of Heteroscedasticity

• Besides the graphic methods, we can use two commonly used tests of
heteroscedasticity, namely, the Breusch-Pagan and White tests.
• Breusch-Pagan (BP) test- This test involves the following steps:
1. Estimate the OLS regression as usual and obtain the squared OLS residuals, ℮i2, from
this regression. Assume we want to study the factors that determine the
abortion rate across the 50 states in the USA. We consider the following
linear regression model:
ABRi = β1 +β2Reli +β3Pricei +β4Lawsi +β5Fundsi +β6EduCi +β7Incomei + β8Picketi +ui, i=1,
2,... 50 ......eq.5.4. This is the primary regression model.
2. Regress ℮i2 on the k regressors included in the model; the idea here is to see if the
squared residuals (a proxy for the true squared error term) are related to one or more X
variables. You can also choose other regressors that might have some bearing on the
error variance. Now run the following regression:
• ℮i2 = A1+A2Reli+A3Pricei+A4Lawsi+A5Fundsi+A6EduCi+A7Incomei+A8Picketi+vi ….eq.5.5
• Where vi is the error term.
• Save R2 from regression (5.5); call it R2Aux, where aux stands for auxiliary, since Eq.
(5.5) is auxiliary to the primary regression (5.4) (see Table 5.1). The idea behind Eq.
(5.5) is to find out if the squared error term is related to one or more of the
regressors, which might indicate that heteroscedasticity is present in the data.
Cont…
3. The null hypothesis here is that the error variance is homoscedastic -
that is, all the slope coefficients in Eq. (5.5) are simultaneously equal to
zero. You can use the F statistic from this regression with (k-1) and (n-k) in
the numerator and denominator df, respectively, to test this hypothesis. If
the computed F statistic in Eq. (5.5) is statistically significant, we can
reject the hypothesis of homoscedasticity. If it is not, we may not reject
the null hypothesis. As the results in Table 5.1 show, the F statistic (7 df in
the numerator and 42 df in the denominator) is highly significant, for its p
value is only about 2%. Thus we can reject the null hypothesis.
4. Alternatively, you can use the chi-square statistic. It can be shown that
under the null hypothesis of homoscedasticity, the product of R2Aux
(computed in step 2) and the number of observations follows the chi-
square distribution, with df equal to the number of regressors in the
model. If the computed chi-square value has a low p value, we can reject
the null hypothesis of homoscedasticity. As the results in Table 5.1 show,
the observed chi-square value (= n R2Aux) of about 16 has a very low p
value, suggesting that we can reject the null hypothesis of
homoscedasticity.
Table 5.1 The Breusch-Pagan test of heteroscedasticity
Heteroskedasticity Test: Breusch-Pagan-Godfrey
F-statistic 2.823820   Prob. F(7,42) 0.0167
Obs*R-squared 16.00112   Prob. Chi-Square(7) 0.0251
Scaled explained SS 10.57563   Prob. Chi-Square(7) 0.1582

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 10/05/09 Time: 13:14
Sample: 1 50
Included observations: 50

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            16.68558      110.1532     0.151476      0.8803
RELIGION     -0.134865     0.631073     -0.213707     0.8318
PRICE        0.286153      0.162357     1.762492      0.0853
LAWS         -8.566472     17.36257     -0.493387     0.6243
FUNDS        24.30981      20.33533     1.195447      0.2386
EDUC         -1.590385     1.457893     -1.090879     0.2815
INCOME       0.004710      0.003325     1.416266      0.1641
PICKET       -0.576745     0.308155     -1.871606     0.0682

R-squared 0.320022   Mean dependent var 41.89925
Adjusted R-squared 0.206693   S.D. dependent var 57.93043
S.E. of regression 51.59736   Akaike info criterion 10.87046
Sum squared resid 111816.1   Schwarz criterion 11.17639
Log likelihood -263.7616   Durbin-Watson stat 2.060808
F-statistic 2.823820   Prob(F-statistic) 0.016662
Cont…
•A cautionary note: this test is a large-sample test and may not be appropriate in small samples. In sum, it appears that the abortion rate regression does suffer from heteroscedasticity.
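• To illustrate, the BP test can be run with a few lines of Python using statsmodels. This is only a sketch: the pandas DataFrame abortion_df and its column names are hypothetical stand-ins for the data behind Eq. (5.4), not part of the original example.

import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# abortion_df: hypothetical DataFrame holding the variables of Eq. (5.4)
y = abortion_df["abr"]                                   # abortion rate
X = sm.add_constant(abortion_df[["rel", "price", "laws",
                                 "funds", "educ", "income", "picket"]])

ols_res = sm.OLS(y, X).fit()                             # step 1: primary regression (5.4)
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(ols_res.resid, X)
print(lm_stat, lm_pval)   # n*R2(aux) and its chi-square p value (step 4)
print(f_stat, f_pval)     # F version of the test (step 3)

• A small p value for either statistic would lead us to reject the null hypothesis of homoscedasticity, as in Table 5.1.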
 White's test of heteroscedasticity
• We proceed in the spirit of the BP test and regress the
squared residuals on the seven regressors, the squared
terms of these regressors, and the pairwise cross-product
term of each regressor, for a total of 33 coefficients. As in the
BP test, we obtain the R2 value from this regression and
multiply it by the number of observations. Under the null
hypothesis that there is homoscedasticity, this product
follows the chi-square distribution with df equal to the
number of coefficients estimated. The White test is more
general and more flexible than the BP test.
Cont…
•In the present example, if we do not add the squared and cross-
product terms to the auxiliary regression, we obtain nR2 =
15.7812, which has a chi-square distribution with 7 df. The
probability of obtaining such a chi-square value or greater is about
0.03, which is quite low. This would suggest that we can reject the
null hypothesis of homoscedasticity.
•If we add the squared and cross-product terms to the auxiliary
regression, we obtain nR2 = 32.1022, which has a chi-square value
with 33 df. The probability of obtaining such a chi-square value is
about 0.51. In this case, we will not reject the null hypothesis.
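• A minimal sketch of White's test in statsmodels, reusing ols_res and X from the Breusch-Pagan sketch above (het_white automatically adds the squares and pairwise cross-products of the regressors to the auxiliary regression):

from statsmodels.stats.diagnostic import het_white

lm_stat, lm_pval, f_stat, f_pval = het_white(ols_res.resid, X)
print(lm_stat, lm_pval)   # n*R2 from the auxiliary regression and its chi-square p value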
Cont…
•As this exercise shows, White's chi-square test is sensitive to whether
we add or drop the squared and cross-product terms from the auxiliary
regression. Remember that the White test is a large sample test.
Therefore, when we include the regressors and their squared and cross-
product terms, which results in a loss of 33 df, the results of the
auxiliary regression are likely to be very sensitive, which is the case
here. To avoid the loss of so many degrees of freedom, White's test
could be shortened by regressing the squared residuals on the estimated
value of the regressand and its squares. That is, we regress:
•℮i2 =α1 + α2Abortionf + α3Abortionf2 + vi................................... eq.5.6
• where Abortionf = forecast value of abortion rate from Eq. (5.4).
Since the estimated abortion rate is a linear function of the
regressors included in the model of Eq. (5.4), in a way we are
indirectly incorporating the original regressor and their squares in
estimating Eq. (5.6), which is in the spirit of the original White test.
Cont…
• But note that in (5.6) there is no scope for the cross-product term,
thus obviating the cross-product terms as in the original White test.
Therefore the abridged White test saves several degrees of freedom.
The results of this regression are given in Table 5.2.
• The interesting statistic in this table is the F statistic, which is
statistically highly significant, for its p value is very low. So the
abridged White test reinforces the BP test and concludes that the
abortion rate function does indeed suffer from heteroscedasticity. And
this conclusion is arrived at with the loss of fewer degrees of freedom.
• Notice that even though the F statistic is significant, the two partial
slope coefficients are individually not significant. Incidentally, if you
drop the squared ABORTIONF term from (5.6), you will find that the
ABORTIONF term is statistically significant. The reason for this is that
the terms ABORTIONF and its square are functionally related, raising
the spectre of multicollinearity. But keep in mind that multicollinearity
refers to linear relationships between variables and not nonlinear
relationships, as in Eq. (5.6).
Cont…
Table 5.2 Abridged White test
Dependent Variable: RES^2
Method: Least Squares
Sample: 1 50
Included observations: 50
White Heteroskedasticity-Consistent Standard Errors & Covariance

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C              20.20241      27.09320     0.745663      0.4596
ABORTIONF      -1.455268     3.121734     -0.466173     0.6432
ABORTIONF^2    0.107432      0.081634     1.316014      0.1946

R-squared 0.193083   Mean dependent var 41.89925
Adjusted R-squared 0.158746   S.D. dependent var 57.93043
S.E. of regression 53.13374   Akaike info criterion 10.84163
Sum squared resid 132690.1   Schwarz criterion 10.95635
Log likelihood -268.0406   Durbin-Watson stat 1.975605
F-statistic 5.623182   Prob(F-statistic) 0.006464
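• The abridged White regression of Eq. (5.6) is easy to reproduce by hand; a sketch, again reusing ols_res from the earlier Breusch-Pagan sketch:

import numpy as np
import statsmodels.api as sm

fitted = ols_res.fittedvalues                                  # ABORTIONF: fitted abortion rate
Z = sm.add_constant(np.column_stack([fitted, fitted**2]))      # constant, ABORTIONF, ABORTIONF^2
aux_res = sm.OLS(ols_res.resid**2, Z).fit()                    # Eq. (5.6)
print(aux_res.fvalue, aux_res.f_pvalue)   # a significant F points to heteroscedasticity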
Cont…
•It should be noted that whether we use the BP or White or
any other test of heteroscedasticity, these tests will only
indicate whether the error variance in a specific case is
heteroscedastic or not. But these tests do not necessarily
suggest what should be done if we do encounter
heteroscedasticity.
C. Remedial Measures
• Knowing the consequences of heteroscedasticity, it may be
necessary to seek remedial measures. The problem here is
that we do not know the true heteroscedastic variances, σ i2,
for they are rarely observed. If we could observe them, then
we could obtain BLUE estimators by dividing each observation
by the (heteroscedastic) σi and estimate the transformed
model by OLS. This method of estimation is known as the
method of weighted least squares (WLS). Unfortunately, the
true σi2 is rarely known. Then what is the solution?
• In practice, we make educated guesses about what σ i2 might
be and transform the original regression model in such a way
that in the transformed model the error variance might be
homoscedastic. Some of the transformations used in practice
are as follows:
Cont…
1. If the true error variance is proportional to the square of one of the
regressors, we can divide both sides of Eq. (5.4) by that variable and run the
transformed regression. Suppose in Eq. (5.4) the error variance is
proportional to the square of income. We therefore divide Eq. (5.4) by the
income variable on both sides and estimate this regression. We then
subject this regression to heteroscedasticity tests, such as the BP and White
tests. If these tests indicate that there is no evidence of heteroscedasticity,
we may then assume that the transformed error term is homoscedastic.
2. If the true error variance is proportional to one of the regressors, we can
use the so-called square root transformation, that is, we divide both sides of
(5.4) by the square root of the chosen regressor. We then estimate the
regression thus transformed and subject that regression to
heteroscedasticity tests. If these tests are satisfactory, we may rely on this
regression. There are practical problems in the applications of these
procedures. First, how do we know which regressor to pick for
transformation if there are several regressors? We can proceed by trial and
error, but that would be a time-consuming procedure. Second, if some of the values of the chosen regressor are zero, then dividing by zero is obviously not possible.
Cont…
The problem of choosing the regressor can sometimes be avoided by using the estimated
Y value (i.e. Ŷi), which is a weighted average value of all the regressors in the model,
the weights being their regression coefficients, the bs. It may be noted that all these
methods of transformations are somewhat ad hoc. But there is not much we can do
about it, for we are trying to guess what the true error variances are. All we can hope
for is that the guess turns out to be reasonably good.
3. The logarithmic transformation: sometimes, instead of estimating regression (5.4),
we can regress the logarithm of the dependent variable on the regressors, which may
be linear or in log form. The reason for this is that the log transformation compresses
the scales in which the variables are measured, thereby reducing a tenfold difference
between two values to a twofold difference. For example, the number 80 is 10 times the number 8, but ln 80 (≈ 4.3820) is only about twice as large as ln 8 (≈ 2.0794). The one
caveat about using the log transformation is that we can take logs of positive numbers
only.
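• As an illustration of transformation 1, if the error variance were assumed proportional to the square of income, weighted least squares with weights 1/income² estimates the transformed model in one step. A sketch, reusing y, X and the hypothetical abortion_df from the earlier Breusch-Pagan sketch:

import statsmodels.api as sm

# Weights are proportional to the inverse of the assumed error variance
wls_res = sm.WLS(y, X, weights=1.0 / abortion_df["income"]**2).fit()
print(wls_res.summary())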
• White's heteroscedasticity-consistent standard errors or robust standard errors
• If the sample size is large, White has suggested a procedure to obtain
heteroscedasticity-corrected standard errors. In the literature these are known as
robust standard errors. White's routine is now built in several software packages.
The procedure does not alter the values of the coefficients, but corrects the
standard errors to allow for heteroscedasticity. But do not forget that the White procedure is valid, strictly speaking, only in large samples.
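• In statsmodels, robust (heteroscedasticity-consistent) standard errors can be requested directly when fitting by OLS; a sketch reusing y and X from above (HC1 is one of several available variants):

import statsmodels.api as sm

robust_res = sm.OLS(y, X).fit(cov_type="HC1")   # same coefficients, corrected standard errors
print(robust_res.summary())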
3.6.1.3 Autocorrelation-tests and Feasible Generalized Least Squares Estimation
• Time-series studies have some characteristics that make
them more difficult to deal with than cross-sections.
• 1 The order of observations in a time series is fixed. With a
cross-sectional data set, you can enter the observations in
any order you want, but with time series data, you must keep
the observations in chronological order.
• 2. Time-series samples tend to be much smaller than cross-
sectional ones. Most time-series populations have many
fewer potential observations than do cross-sectional ones,
and these smaller data sets make statistical inference more
difficult. In addition, it’s much harder to generate a time-
series observation than a cross-sectional one. After all, it
takes a year to get one more observation in an annual time
series!
Cont…
• 3. The theory underlying time-series analysis can be quite
complex. In part because of the problems mentioned above,
time-series econometrics includes a number of complex
topics that require advanced estimation techniques.
• 4. The stochastic error term in a time-series equation is often
affected by events that took place in a previous time period.
This is serial correlation, the topic of our section, so let’s get
started!
• A common problem in regression analysis involving time
series data is autocorrelation. Recall that one of the
assumptions of the classical linear regression model (CLRM) is
that the error terms, ut, are uncorrelated; that is, the error term at time t is not correlated with the error term at time (t - 1) or any other error term in the past.
Cont…
• If the error terms are correlated, the following
consequences follow:
• 1 The OLS estimators are still unbiased and consistent.
• 2 They are still normally distributed in large samples.
• 3 But they are no longer efficient. That is, they are no
longer BLUE (best linear unbiased estimator). In most cases
OLS standard errors are underestimated, which means the
estimated t values are inflated, giving the appearance that a
coefficient is more significant than it actually may be.
• 4 As a result, as in the case of heteroscedasticity, the
hypothesis-testing procedure becomes suspect, since the
estimated standard errors may not be reliable, even
asymptotically (i.e. in large samples). In consequence, the
usual t and F tests may not be valid.
Cont…
• As in the case of heteroscedasticity, we need to find
out if autocorrelation exists in a specific application
and take corrective action or find alternative
estimating procedures that will produce BLUE
estimators.
• Since we are dealing with time series data, we have
to guard against auto-, or serial, correlation. If there
is autocorrelation in the error term, the estimated
standard errors and, ipso facto (because of that fact), the estimated t values will be suspect.
Therefore, before we accept the results, we need to
check for the presence of autocorrelation.
A. Tests of autocorrelation
• Although there are several tests of autocorrelation, we will discuss only a
few here, namely, the graphical method, the Durbin-Watson test, and
the Breusch-Godfrey (BG) test.
1. Graphical method
• In evaluating regression results it is always good practice to plot the
residuals from the estimated model for clues regarding possible violation
of one or more OLS assumptions. As one author notes: "Anyone who tries
to analyse a time series without plotting it is asking for trouble." For
example, in our discussion of heteroscedasticity, we plotted the squared
residuals against the estimated value of the regressand to find some
pattern in these residuals, which may suggest the type of transformation
one can make of the original model so that in the transformed model we
do not face heteroscedasticity.
• Since autocorrelation refers to correlation among the error terms, u t, a
rough and ready method of testing for autocorrelation is to simply plot
the values of ut chronologically. Unfortunately, we do not observe uts directly. What we observe are their proxies, the ℮ts, which we can obtain from OLS estimation of the model.
Cont…
• Although the ℮ts are not the same thing as uts, they are
consistent estimators of the latter, in the sense that as the
sample size increases; ℮ts converge to their true values, uts.
By plotting the data on ℮ts chronologically we can get a
visual impression of the possibility of autocorrelation.
2. Durbin-Watson d test
• The Durbin–Watson test is used to determine if there is
first-order serial correlation in the error term of an equation
by examining the residuals of a particular estimation of that
equation. The most celebrated, and often over-used, test for
detecting serial correlation was developed by statisticians
Durbin and Watson, and is popularly known as the Durbin-
Watson d statistic, which is defined as:
• d = Σ(et - et-1)2 / Σet2, where the numerator is summed from t = 2 to n and the denominator from t = 1 to n ....................................eq.5.7
Cont…
• This is the ratio of the sum of squared differences in successive
residuals to the residual sum of squares. Note that the df in the
numerator is (n-1), as we lose one observation in taking successive
differences of residuals. Also note that the d value always lies
between 0 and 4.
• The assumptions underlying the d statistic are:
• 1 The regression model includes an intercept term.
• 2 The explanatory variables, or regressors, are fixed in repeated
sampling.
• 3 The error term ut follows the first-order autoregressive (AR(1)) scheme:
• ut = ρut-1 + vt ...................................... eq.5.8
• where ρ (rho) is the coefficient of autocorrelation and it lies in the range -1 ≤ ρ ≤ 1. It is called first-order AR because it involves only the current and one-period lagged error term. vt is a random error term.
Cont…
4. The error term Ut is normally distributed.
5. The regressors do not include the lagged value(s) of
the dependent variable, Yt, that is, regressors do not
include Yt-1, Yt-2 and other lagged terms of Y.
• Based on the sample size and the number of
regressors, Durbin and Watson were able to
establish two critical values of the d statistic, dL and
dU, called the lower and upper limits, so that if the
computed d value lies below the lower limit, or
above the upper limit, or in between the two limits,
a decision could be made about the presence of
autocorrelation.
Cont…
• The decision rules are as follows:
• 1 If d < dL there probably is evidence of positive autocorrelation.
• 2 If d > dU, there probably is no evidence of positive autocorrelation.
• 3 If dL < d < dU, no definite conclusion about positive autocorrelation
may be made.
• 4 If dU < d < 4-dU, there is probably no evidence of positive or negative
autocorrelation.
• 5 If 4-dU< d < 4-dL, no definite conclusion about negative
autocorrelation may be made.
• 6 If 4-dL < d < 4 there probably is evidence of negative autocorrelation.
• As noted, the d value lies between 0 and 4. The closer it is to zero, the
greater is the evidence of positive autocorrelation, and the closer it is
to 4, the greater is the evidence of negative autocorrelation. If d is
about 2, there is no evidence of positive or negative (first-) order
autocorrelation.
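• The d statistic of Eq. (5.7) is easily computed from the residuals; a minimal, self-contained sketch on synthetic residuals (illustrative only):

import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
e = rng.normal(size=100)     # stand-in for the OLS residuals e_t
d = durbin_watson(e)         # sum of squared successive differences / residual sum of squares
print(d)                     # values near 2 suggest no first-order autocorrelation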
Cont…
3. Breusch-Godfrey (BG) general test of autocorrelation
• To avoid some of the restrictive features of the d test, Breusch and
Godfrey have developed a test of autocorrelation that is more general in
that it allows for
• (1) lagged values of the dependent variables to be included as regressors,
• (2) higher-order autoregressive schemes, such as AR (2) and AR (3), and
• (3) moving average terms of the error term, such as ut-1, ut-2, and so on.
• The BG test involves the following steps:
• 1 Estimate the basic regression model by OLS and obtain the residuals, e t.
• 2 Regress et on the regressors in the basic regression model and the p lagged residuals, et-1, et-2, ..., et-p (the autoregressive terms of the auxiliary regression model), and obtain R2 from this auxiliary regression.
• 3 If the sample size is large (technically, infinite), BG have shown that
• (n - p)R2 ~ χ2(p)
• That is, in large samples, (n - p) times R2 follows the chi-square distribution with p degrees of freedom.
Cont…
• 4 As an alternative, we can use the F value obtained from the auxiliary regression to test the null hypothesis H0: ρ1 = ρ2 = ... = ρp = 0 (that is, there is no serial correlation of any order up to p). This F value has (p, n - k - p) degrees of freedom in the numerator and denominator, respectively, where k represents the number of parameters in the basic regression model (including the intercept term).
• Therefore, if in an application the chi-square value thus
computed exceeds the critical chi-square value at the chosen
level of significance, we can reject the null hypothesis of no
autocorrelation, in which case at least one ρ value in the
auxiliary regression is statistically significantly different from
zero. In other words, we have some form of autocorrelation.
Most statistical packages now present the p value of the
estimated chi-square value, so we need not choose the level of significance in advance.
Cont…
• Similarly, if the computed F value exceeds the
critical F value for a given level of significance, we
can also reject the null hypothesis of no
autocorrelation.
• Instead of choosing the level of significance, we
can rely on the p value of the estimated F statistic
and reject the null hypothesis if this p value is
low.
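• In statsmodels the BG test can be applied directly to fitted OLS results; a sketch in which ts_res stands for a hypothetical OLS fit on time series data and nlags = p is the autocorrelation order being tested:

from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(ts_res, nlags=2)
print(lm_pval, f_pval)   # low p values => reject H0 of no serial correlation up to order 2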
B. Remedial measures
• After you determine that autocorrelation is likely, you need to
modify the estimation of your econometric model to obtain
accurate results. The two most common solutions to
autocorrelation are feasible generalized least squares (FGLS) and
serial correlation robust standard errors.
• If we find autocorrelation in an application, we need to take care
of it, for depending on its severity, we may be drawing misleading
conclusions because the usual OLS standard errors could be
severely biased. Now the problem we face is that we do not know
the correlation structure of the error terms u t, since they are not
directly observable.
• Hence, as in the case of heteroscedasticity, we need to resort to
some educated guesswork or some kind of transformation of the
original regression model so that in the transformed model we do
not face the serial correlation problem. There are several
methods that we could try.
Cont…
• First-difference transformation
• If we know the value of ρ, we can subtract from the current value of the error term ρ times the previous value of the error term, that is, form vt = ut - ρut-1. (When ρ = 1 this amounts to taking first differences.) The resulting error term, vt, will satisfy the standard OLS assumptions.
• The transformed model can therefore be estimated by OLS.
All we have to do is transform each variable by subtracting
from its current value ρ times its previous value and run the
regression. The estimators obtained from the transformed
model are BLUE.
Cont…
• Generalized transformation (Feasible generalized least
squares (FGLS) )
• FGLS estimation has several names, depending on the
precise method used to modify the estimation of the
econometric model. The two FGLS techniques used to
address AR(1) autocorrelation are:
• The Cochrane-Orcutt (CO) transformation
• The Prais-Winsten (PW) transformation
• The CO and PW techniques transform the original
model with autocorrelation into one without
autocorrelation. So the goal of the CO and PW
transformations is to make the error term in the
original econometric model uncorrelated.
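• statsmodels implements an iterative feasible GLS estimator, GLSAR, which alternates between estimating ρ from the residuals and re-estimating the quasi-differenced model, in the spirit of the Cochrane-Orcutt procedure. A sketch, where y_t and X_t are hypothetical time-series arrays:

import statsmodels.api as sm

glsar_mod = sm.GLSAR(y_t, sm.add_constant(X_t), rho=1)   # rho=1: AR(1) error structure
glsar_res = glsar_mod.iterative_fit(maxiter=10)          # alternate rho-estimation and GLS
print(glsar_mod.rho)                                     # estimated autocorrelation coefficient
print(glsar_res.params)                                  # FGLS coefficient estimates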
3.6.1.4 Model Specification Errors: (Omission of Relevant Variables, Inclusion of Irrelevant Variables,
Misspecification, etc.)
• Model Specification Errors - One of the assumptions of
the classical linear regression model (CLRM) is that the
model used in the analysis is "correctly specified".
• This is indeed a tall order, for there is no such thing as a
perfect model. An econometric model tries to capture the
main features of an economic phenomenon, taking into
account the underlying economic theory, prior empirical
work, intuition, and research skills.
• If we want to take into account every single factor that
affects a particular object of research, the model will be
so unwieldy as to be of little practical use.
Cont…
• By correct specification we mean one or more of the following:
• 1 The model does not exclude any "core" variables.
• 2 The model does not include superfluous variables.
• 3 The functional form of the model is suitably chosen.
• 4 There are no errors of measurement in the regressand and
regressors.
• 5 Outliers in the data, if any, are taken into account.
• 6 The probability distribution of the error term is well specified.
• 7 What happens if the regressors are stochastic?
• 8 The Simultaneous Equation Problem: the simultaneity bias.
• In what follows we will discuss the consequences of what
happens if one or more of the specification errors are
committed, how we can detect them, and what remedial
measures we can take.
Cont…
1. Omission of Relevant Variables
• We do not deliberately set out to omit relevant variables from
a model. But sometimes they are omitted because we do not
have the data, or because we have not studied the underlying
economic theory carefully, or because we have not studied
prior research in the area thoroughly, or sometimes just
because of carelessness. This is called underfitting a model.
Whatever the reason, omission of important or "core"
variables has the following consequences.
• 1 If the left-out, or omitted, variables are correlated with the
variables included in the model, the coefficients of the
estimated model are biased. Not only that, the bias does not
disappear as the sample size gets larger. In other words, the
estimated coefficients of the misspecified model are biased
as well as inconsistent.
Cont…
• 2 Even if the incorrectly excluded variables are not correlated
with the variables included in the model, the intercept of the
estimated model is biased.
• 3 The disturbance variance σ2 is incorrectly estimated.
• 4 The variances of the estimated coefficients of the misspecified
model are biased. As a result, the estimated standard errors are
also biased.
• 5 In consequence, the usual confidence intervals and
hypothesis-testing procedures become suspect, leading to
misleading conclusions about the statistical significance of the
estimated parameters.
• 6 Furthermore, forecasts based on the incorrect model and the
forecast confidence intervals based on it will be unreliable.
• As you can see, the consequences of omitting relevant variables
can be very serious.
Cont…
• Tests of omitted variables
• Although we have illustrated the consequences of omitting
relevant variables, how do we find out if we have committed
the omission variable bias? There are several tests of
detecting the omission of relevant variables, but we will
consider only two here, namely, Ramsey's RESET test and the
Lagrange multiplier (LM) test.
• Ramsey's RESET test
1. Ramsey's regression specification error test, RESET for short, is a general test of model specification errors.
2. From the (incorrectly) estimated model, we first obtain the estimated, or fitted, values of the regressand, Ŷi.
3. Reestimate the original model including powers of the fitted values (for example, Ŷi2 and Ŷi3) as additional regressors.
4. The initial model is the restricted model and the model in Step 3 is the unrestricted model.
Cont…
5. Under the null hypothesis that the restricted model (i.e. the original model) is correct, we can use the following F test:
• F = [(R2ur - R2r)/m] / [(1 - R2ur)/(n - k)]
• Let R2r and R2ur represent the restricted and unrestricted R2 values, m = number of restrictions, n = number of observations, and k = number of regressors in the unrestricted model. The F statistic in this equation follows the F distribution with m and (n - k) degrees of freedom in the numerator and denominator, respectively.
Cont…
6. If the F test in Step 5 is statistically significant, we can reject the null hypothesis. That is, the restricted model is not appropriate. By the same token, if the F statistic is statistically insignificant, we do not reject the original model.
• The idea behind this test is simple. If the original model is
correctly specified, the added squared and higher powers of the
estimated regressand values should not add anything to the
model. But if one or more coefficients of the added regressors
are significant, this may be evidence of specification error.
• Although simple to apply, the RESET test has two drawbacks.
First, if the test shows that the chosen model is incorrectly
specified, it does not suggest any specific alternative. Second,
the test does not offer any guidance about the number of
powered terms of the estimated values of the regressand to be
included in the unrestricted model.
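• Recent versions of statsmodels provide RESET directly; a sketch in which ols_res is any fitted OLS results object and power=3 adds the squared and cubed fitted values as the extra regressors:

from statsmodels.stats.diagnostic import linear_reset

reset_out = linear_reset(ols_res, power=3, test_type="fitted", use_f=True)
print(reset_out)   # a significant F suggests the original model is misspecified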
Cont…
• The Lagrange multiplier (LM) test
• 1 From the original model, we obtain the estimated residuals, e i.
• 2 If in fact the model is the correct model, then the residuals e i obtained
from this model should not be related to the regressors omitted from that
model.
• 3 We now regress ei on the regressors in the original model and the omitted
variables from the original model. Call this the auxiliary regression, auxiliary
to the original regression.
• 4 If the sample size is large, it can be shown that n (the sample size) times
the R2 obtained from the auxiliary regression follows the chi-square
distribution with df equal to the number of regressors omitted from the
original regression.
• Symbolically, nR2 ~ χ2(m) (asymptotically), where m is the number of regressors omitted from the original model.
• 5 If the computed χ2 value exceeds the critical chi-square value at the chosen level of significance, or if its p value is sufficiently low, we reject the original (or restricted) regression. That is to say, the original model was misspecified.
4. Inclusion of irrelevant or unnecessary variables
• Sometimes researchers add variables in the hope that the R2 value of their model will increase, in the mistaken belief that the higher the R2, the better the model. This is called overfitting a model. But if the variables are not economically meaningful and relevant, such a strategy is not recommended because of the following consequences:
• 1 The OLS estimators of the "incorrect" or overfitted model
are all unbiased and consistent.
• 2 The error variance σ2 is correctly estimated.
• 3 The usual confidence interval and hypothesis testing
procedures remain valid.
• 4 However, the estimated coefficients of such a model are
generally inefficient that is, their variances will be larger
than those of the true model.
Cont…
• Notice the asymmetry in the two types of specification
error - underfitting and overfitting a model. In the
former case the estimated coefficients are biased as
well as inconsistent, the error variance is incorrectly
estimated, and the hypothesis-testing procedure
becomes invalid. In the latter case, the estimated
coefficients are unbiased as well as consistent, the
error variance is correctly estimated, and the
hypothesis-testing procedure remains valid; the only
penalty we pay for the inclusion of irrelevant or
superfluous variables is that the estimated variances, and hence the standard errors, are relatively large and therefore probability inferences about the parameters are less precise.
Cont…
• One may be tempted to conclude that it is better
to include unnecessary variables (the so-called
"kitchen sink approach") than omit relevant
variables. Such a philosophy is not recommended
because the inclusion of unnecessary variables not
only leads to loss of efficiency of the estimators
but may also lead, unwittingly, to the problem of
multicollinearity, not to mention the loss of
degrees of freedom.
5. Misspecification of the functional form of a
regression model
6. Errors of measurement
Cont…
• One of the assumptions of CLRM is that the model used
in the analysis is correctly specified. Although not
explicitly spelled out, this presumes that the values of
the regressand as well as regressors are accurate. That
is, they are not guess estimates, extrapolated,
interpolated or rounded off in any systematic manner or
recorded with errors.
• This ideal, however, is not very often met in practice for
several reasons, such as non-response errors, reporting
errors, missing data, or sheer human errors. Whatever
the reasons for such errors, measurement errors
constitute yet another specification bias, which has
serious consequences, especially if there are such errors in the regressors.
Cont…
• Errors of measurement in the regressand
• Although we will not prove it here, if there are
errors of measurement in the dependent variable,
the following consequences ensue.
• 1 The OLS estimators are still unbiased.
• 2 The variances and standard errors of OLS
estimators are still unbiased.
• 3 But the estimated variances, and ipso facto the
standard errors, are larger than in the absence of
such errors. In short, errors of measurement in the regressand do not pose a very serious threat to OLS estimation.
Cont…
• Errors of measurement in the regressors
• The situation here is more serious, for errors of measurement in the
explanatory variable(s) render OLS estimators biased as well as
inconsistent. Even such errors in a single regressor can lead to biased and
inconsistent estimates of the coefficients of the other regressors in the
model. And it is not easy to establish the size and direction of bias in the
estimated coefficients. It is often suggested that we use instrumental or
proxy variables for variables suspected of having measurement errors.
The proxy variables must satisfy two requirements - that they are highly
correlated with the variables for which they are a proxy and also they are
uncorrelated with the usual equation error u i as well as the measurement
error. But such proxies are not easy to find; we are often in the situation
of complaining about the bad weather without being able to do much
about it. Therefore this remedy may not be always available.
• All we can say about measurement errors, in both the regressand and
regressors, is that we should be very careful in collecting the data and
making sure that some obvious errors are eliminated.
Cont…
7. Outliers, leverage and influence data
• We discussed the basics of the linear regression
model. You may recall that in minimizing the
residual sum of squares (RSS) to estimate the
regression parameters, OLS gives equal weight to
every observation in the sample. But this may
create problems if we have observations that may
not be "typical" of the rest of the sample. Such
observations, or data points, are known as
outliers, leverage or influence points. It is
important that we know what they are, how they
affect the regression results, and how we can detect them.
Cont…
• Outliers: In the context of regression analysis, an outlier is
an observation with a large residual (ei), large in
comparison with the residuals of the rest of the
observations. In a bivariate regression, it is easy to detect
such large residual(s) because of its rather large vertical
distance from the estimated regression line. Remember
that there may be more than one outlier. One can also
consider the squared values of ei, as it avoids the sign
problem - residuals can be positive or negative.
• Leverage: An observation is said to exert (high) leverage if
it is disproportionately distant from the bulk of the
sample observations. In this case such observation(s) can
pull the regression line towards itself, which may distort
the slope of the regression line.
Cont…
• Influence point: If a levered observation in fact pulls the
regression line toward itself, it is called an influence point.
The removal of such a data point from the sample can
dramatically change the slope of the estimated regression
line.
• Detection of outliers- A simple method of detecting
outliers is to plot the residuals and squared residuals
from the estimated regression model. An inspection of
the graph will give a rough and ready method of spotting
outliers, although that may not always be the case
without further analysis. Our objective in discussing the
topic of outliers is to warn the researcher to be on the
lookout for them, because OLS estimates can be greatly
affected by such outliers, especially if they are influential.
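• In statsmodels, standard influence diagnostics (leverage, studentized residuals, Cook's distance) are available from any fitted OLS model; a sketch reusing ols_res from the earlier Breusch-Pagan sketch:

infl = ols_res.get_influence()
diag = infl.summary_frame()                                   # one row of diagnostics per observation
print(diag[["hat_diag", "student_resid", "cooks_d"]].head())  # leverage, outliers, influence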
Cont…
8. Probability distribution of the error term
• The classical normal linear regression model
(CNLRM), an extension of CLRM, assumes that the
error term ui in the regression model is normally
distributed. This assumption is critical if the sample
size is relatively small, for the commonly used tests of
significance, such as t and F, are based on the
normality assumption. It is thus important that we
check whether the error term is normally distributed.
There are several tests of normality, but the most
popularly used test is the Jarque-Bera (JB) test of
normality. This test is a large sample test and may
not be appropriate in small samples.
Cont…
• The formula for the test is as follows:
• JB = n[S2/6 + (K - 3)2/24] ~ χ2(2)
• where n is the sample size, S= skewness coefficient, and K=
kurtosis coefficient. For a normally distributed variable S=
0 and K = 3, its value is zero. Therefore, the closer the value of JB is to zero, the better the normality assumption holds.
value of JB to zero; the better is the normality assumption.
Of course, we can always use the chi-square distribution to
find the exact statistical significance (i.e. the p value) of the
JB statistic. Since in practice we do not observe the true
error term, we use its proxy, ei. The null hypothesis is the
joint hypothesis that S=0 and K=3. Jarque and Bera have
shown that the statistic given from the formula follows the
chi-square distribution with 2 df.
Cont…
• There are two degrees of freedom because we are
imposing two restrictions, namely, that skewness is
zero and kurtosis is 3. Therefore, if in an
application the computed JB statistic (i.e. the chi-
square statistic) exceeds the critical chi-square
value, say, at the 5% level, we reject the hypothesis
that the error term is normally distributed.
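• A sketch of the JB test applied to OLS residuals with statsmodels (reusing ols_res from the earlier sketch):

from statsmodels.stats.stattools import jarque_bera

jb_stat, jb_pval, skew, kurt = jarque_bera(ols_res.resid)
print(jb_stat, jb_pval)   # a low p value rejects the null of normally distributed errors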
• Non-normal error term- If the error term ui is not
normally distributed, it can be stated that the OLS
estimators are still best linear unbiased estimators
(BLUE); that is, they are unbiased and in the class
of linear estimators they have minimum variance.
Cont…
• This is not a surprising finding, for in establishing
the BLUE (recall the Gauss-Markov theorem)
property we did not invoke the normality
assumption. What then is the problem? The
problem is that for the purpose of hypothesis
testing we need the sampling, or probability,
distributions of the OLS estimators. The t and F
tests that we have used all along assume that the
probability distribution of the error term follows
the normal distribution. But if we cannot make
that assumption, we will have to resort to large or
asymptotic sample theory.
Cont…
9. Random or stochastic Regressors
• The CLRM assumes that the regressand is random but
the regressors are nonstochastic, or fixed; that is, we keep
the values of the regressors fixed and draw several
random samples of the dependent variable. For example,
in the regression of consumption expenditure on income,
we assume that income levels are fixed at certain values
and then draw random samples of consumers at the
fixed levels of income and note their consumption
expenditure. In regression analysis our objective is to
predict the mean consumption expenditure at various
levels of fixed income. If we connect these mean
consumption expenditures the line (or curve) thus drawn
represents the (sample) regression line (or curve).
Cont…
• Although the assumption of fixed regressors may be valid in
several economic situations, by and large it may not be
tenable for all economic data. In other words, we assume
that both Y (the dependent variable) and the Xs (the
regressors) are drawn randomly. This is the case of stochastic
or random regressors. The important question that arises is
whether the results of regression analysis based on fixed
regressors also hold if the regressors are as random as the
regressand. If the stochastic regressors and the error term u
are independently distributed, the classical results discussed
(the Gauss-Markov theorem) continue to hold provided we
stress the fact that our analysis is conditional on given values
of the regressors. If, on the other hand, the random
regressors and the error term are uncorrelated, the classical
results hold asymptotically - that is in large samples.
Cont…
10. The Simultaneity Problem
• Our focus thus far has been on single-equation regression
models, in that we expressed a single dependent variable Y
as a function of one or more explanatory variables, the Xs.
If there was any causality between Y and the Xs, it was
implicitly assumed that the direction of causality ran from
the Xs to Y. But there are many situations where such a
unidirectional relationship between Y and the Xs cannot be
maintained, for it is quite possible that some of the Xs
affect Y but in turn Y also affects one or more Xs. In other
words, there may be a feedback relationship between the
Y and X variables. To take into account of such feedback
relationships, we will need more than one regression
equation.
Cont…
• This leads to a discussion of simultaneous equation
regression models - that is, models that take into
account feedback relationships among variables. In
what follows, we discuss briefly why OLS may not be
appropriate to estimate a single equation that may be
embedded in a system of simultaneous equation model
containing two or more equations. When dealing with
simultaneous equation models, we have to learn some
new vocabulary. First, we have to distinguish between
endogenous and exogenous variables. Endogenous
variables are those variables whose values are
determined in the model, and exogenous variables are
those variables whose values are not determined in the model but are given from outside it.
End of Chapter Three