CH-3
•The error term ui is a catchall for all those variables that cannot be
introduced in the model for a variety of reasons. However, the average
influence of these variables on the regressand is assumed to be
negligible.
The nature of the Y variable
•It is generally assumed that Y is a random variable. It can be measured on four
different scales:
1. Ratio scale,
2. Interval scale,
3. Ordinal scale, and
4. Nominal scale.
Ratio scale: A ratio-scale variable has three properties:
(1) the ratio of two values is meaningful,
(2) the distance between two values is meaningful, and
(3) the values can be ordered.
On a ratio scale, if Y takes two values, Y1 and Y2, the ratio (Y2/Y1) and the distance
(Y2 - Y1) are meaningful quantities, as are comparisons or orderings such as Y2 ≤ Y1 or
Y2 ≥ Y1. Most economic variables belong to this category. Thus we can ask whether GDP is
greater this year than last year, or whether the ratio of GDP this year to GDP last year
is greater than or less than one.
Interval scale: Interval-scale variables do not satisfy the first
property of ratio-scale variables. For example, the distance between
two time periods, say 2007 and 2000 (2007 - 2000), is meaningful, but
the ratio 2007/2000 is not. The intervals are fixed: the difference
between 70 kg and 80 kg is the same interval as that between 80 kg and
90 kg, and so on.
Ordinal scale: Variables on this scale satisfy the ordering
property of the ratio scale, but not the other two properties. For
example, grading systems, such as A, B, C, or income
classifications, such as low income, middle income, and high
income, are ordinal-scale variables, but quantities such as grade
A divided by grade B are not meaningful.
For example, suppose we did not ask for an exact weight but only for the
weight group a person belongs to, such as 50-60 kg, 60-70 kg, 70-80 kg,
80-90 kg, or over 90 kg; such grouped data are ordinal.
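•As a rough sketch (not part of the text), such a grouping can be produced from exact, ratio-scale weights with pandas; the sample values and cut points below are made up for illustration.

# Binning exact weights (ratio scale) into ordered weight groups (ordinal scale).
import pandas as pd

weights = pd.Series([54, 63, 71, 88, 95])                     # exact weights in kg
groups = pd.cut(weights,
                bins=[50, 60, 70, 80, 90, float("inf")],
                labels=["50-60", "60-70", "70-80", "80-90", "over 90"])
print(groups)   # ordered categories: they can be ranked, but ratios of them are meaningless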
Nominal scale: Variables on this scale have none of the features of
ratio-scale variables; they cannot be rank ordered at all.
Variables such as gender, marital status, and religion are
nominal-scale variables, as are taste categories such as hot,
sweet, and salty.
Such variables are often called dummy or categorical
variables. They are often "quantified" as 1 or 0, with 1
indicating the presence of an attribute and 0
indicating its absence. Thus, we can "quantify"
gender as male = 1 and female = 0, or vice versa.
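•As a minimal sketch (not from the text), such a 0/1 dummy can be created with pandas; the column name and sample values are made up.

# Coding a nominal variable (gender) as a 0/1 dummy variable.
import pandas as pd

df = pd.DataFrame({"gender": ["male", "female", "female", "male"]})
df["male_dummy"] = (df["gender"] == "male").astype(int)   # male = 1, female = 0
print(df)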
The nature of X variables or regressors
•Notice carefully that the estimators, the bs, are random variables, for their values
will change from sample to sample. On the other hand, the (population)
regression coefficients or parameters, the βs, are fixed numbers, although we do
not know what they are. On the basis of the sample we try to obtain the best guesses of
them.
•The distinction between population and sample regression function is important,
for in most applications we may not be able to study the whole population for a
variety of reasons, including cost considerations. It is remarkable that in
Presidential elections in the USA, polls based on a random sample of, say, 1,000
people often come close to predicting the actual votes in the elections. In
regression analysis our objective is to draw inferences about the population
regression function on the basis of the sample regression function, for in reality
we rarely observe the population regression function; we only guess what it might
be. This is important because our ultimate objective is to find out what the true
values of the βs may be.
•For this we need a bit more theory, which is provided by the classical linear
regression model (CLRM), discussed next.
1.4. The Classical Linear Regression Model (CLRM)
•One of the key assumptions of the CLRM is that E(ui | X) = 0.
•Where, for brevity of expression, X (the bold X) stands for all X variables in the
model. In words, the conditional expectation of the error term, given the values
of the X variables, is zero. Since the error term represents the influence of factors
that may be essentially random, it makes sense to assume that their mean or
average value is zero.
•As a result of this critical assumption, we can write (3.2) as:
• E(Yi | X) = βX + E(ui | X) = βX .................................................... Eq. 3.9
• which can be interpreted as the model for the mean, or average, value of Yi
conditional on the X values. This is the population (mean) regression function (PRF).
•In regression analysis our main objective is to estimate this function. If
there is only one X variable, you can visualize it as the (population)
regression line. If there is more than one X variable, you will have to
imagine it as a curve in a multi-dimensional graph. The estimated PRF,
the sample counterpart of Eq. (3.9), is the sample regression function (SRF),
denoted by Ŷi = bX, where b stands for the estimated coefficients.
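•To make this concrete, here is a minimal sketch of estimating the SRF by OLS with the statsmodels package; since no dataset has been introduced at this point, the data are simulated and the parameter values are arbitrary.

# Estimating the SRF, Ŷi = bX, by OLS on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))   # intercept plus two regressors
beta = np.array([1.0, 2.0, -0.5])              # "true" population parameters (the βs)
y = X @ beta + rng.normal(size=n)              # PRF plus the error term u

results = sm.OLS(y, X).fit()
print(results.params)              # the bs: sample estimates of the βs
print(results.fittedvalues[:5])    # Ŷi = bX for the first few observations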
•A-4: The variance of each ui, given the values of the X variables, is constant, or
homoscedastic: var(ui | X) = σ2 ........................ Eq. 3.10
•A-5: There is no correlation between two error terms:
cov(ui, uj | X) = 0, i ≠ j ........................ Eq. 3.11
•where cov stands for covariance and i and j are two different error terms. Of
course, if i = j, Eq. (3.11) will give the variance of ui given in Eq. (3.10).
•A-6: There are no perfect linear relationships among the X variables. This is the
assumption of no multicollinearity. For example, an exact linear relationship such as
X5 = 2X3 + 4X4 is ruled out.
•On the basis of Assumptions A-1 to A-7, it can be shown that the method of
ordinary least squares (OLS), the method most popularly used in practice,
provides estimators of the parameters of the PRF that have several desirable
statistical properties, such as:
1. The estimators are linear, that is, they are linear functions of the dependent
variable Y. Linear estimators are easy to understand and deal with compared to
nonlinear estimators.
2. The estimators are unbiased, that is, in repeated applications of the method, on
average, the estimators are equal to their true values.
3. In the class of linear unbiased estimators, OLS estimators have minimum
variance. As a result, the true parameter values can be estimated with the least
possible uncertainty; an unbiased estimator with the least variance is called an
efficient estimator.
•In short, under the assumed conditions, OLS estimators are
BLUE: best linear unbiased estimators. This is the essence of
the well-known Gauss-Markov theorem, which provides a
theoretical justification for the method of least squares.
•With the added Assumption A-8, it can be shown that the OLS
estimators are themselves normally distributed. As a result, we
can draw inferences about the true values of the population
regression coefficients and test statistical hypotheses. With the
added assumption of normality, the OLS estimators are best
unbiased estimators (BUE) in the entire class of unbiased
estimators, whether linear or not. With the normality assumption,
the CLRM is known as the normal classical linear regression
model (NCLRM).
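•The phrase "repeated applications of the method" can be illustrated with a small Monte Carlo sketch (not from the text): draw many samples from a known population, re-estimate the slope by OLS in each, and inspect the resulting sampling distribution.

# Monte Carlo sketch of the sampling distribution of an OLS slope estimator.
import numpy as np

rng = np.random.default_rng(123)
beta1, beta2 = 2.0, 0.7                 # true (population) parameters
slopes = []
for _ in range(5000):
    x = rng.uniform(0, 10, size=50)
    u = rng.normal(0, 1, size=50)       # CLRM errors: zero mean, constant variance
    y = beta1 + beta2 * x + u
    b2 = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # OLS slope = S_xy / S_xx
    slopes.append(b2)

slopes = np.array(slopes)
print("mean of b2 across samples:", slopes.mean())   # close to 0.7 (unbiasedness)
print("std. of b2 across samples:", slopes.std())    # sampling variability
# With normal errors (A-8), the b2 estimates are themselves normally distributed.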
•Before proceeding further, several questions can be raised.
•How realistic are these assumptions?
•What happens if one or more of these assumptions are not
satisfied?
•In that case, are there alternative estimators?
•Why do we confine ourselves to linear estimators only?
•All these questions will be answered as we move forward
in other chapters of this course. But it may be added that at
the beginning of any field of enquiry we need some
building blocks. The CLRM provides one such building
block.
1.5. Variances and Standard Errors of OLS estimators
•As noted before, the OLS estimators, the bs, are random variables, for
their values will vary from sample to sample. Therefore we need a
measure of their variability. In statistics the variability of a random
variable is measured by its variance σ2, or its square root, the standard
deviation σ. In the regression context the standard deviation of an
estimator is called the standard error, but conceptually it is similar to
standard deviation. For the LRM, an estimate of the variance of the error
term ui, σ2, is obtained as
σˆ2 = Σei2/(n - k) = RSS/(n - k) ...................................................... Eq. 3.13
• that is, the residual sum of squares (RSS) divided by (n - k), which is
called the degrees of freedom (df), n being the sample size and k
being the number of regression parameters estimated, an intercept
and (k-1) slope coefficients (βs). σˆ is called the standard error of the
regression (SER) or root mean square error. It is simply the standard
deviation of the Y values about the estimated regression line and is
often used as a summary measure of "goodness of fit" of the
estimated regression line.
•Note that a "hat" or caret over a parameter denotes an estimator of that
parameter.
•It is important to bear in mind that the standard deviation of Y values,
denoted by SY, is expected to be greater than SER, unless the regression
model does not explain much variation in the Y values. If that is the case,
there is no point in doing regression analysis, for in that case the X
regressors have no impact on Y.
•Then the best estimate of Y is simply its mean value, Ȳ. Of course, we use
a regression model in the belief that the X variables included in the model
will help us explain the behavior of Y better than Ȳ alone can.
•Given the assumptions of the CLRM, we can easily derive the variances
and standard errors of the b coefficients, but we will not present the actual
formulas to compute them because statistical packages produce them
easily, as we will show with an example.
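•As a small illustration of the kind of output such packages produce, the following sketch (with simulated data, since the example data are not reproduced here) computes Eq. (3.13) by hand and then reads off the same quantity and the standard errors of the b coefficients from statsmodels.

# σˆ2 = RSS/(n - k) (Eq. 3.13), the SER, and the standard errors of the bs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 60, 3                                   # k parameters: intercept + 2 slopes
X = sm.add_constant(rng.normal(size=(n, k - 1)))
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(scale=2.0, size=n)

results = sm.OLS(y, X).fit()
rss = np.sum(results.resid ** 2)               # residual sum of squares
sigma2_hat = rss / (n - k)                     # Eq. (3.13)
ser = np.sqrt(sigma2_hat)                      # standard error of the regression
print(sigma2_hat, ser)
print(results.mse_resid)                       # same σˆ2 as reported by statsmodels
print(results.bse)                             # standard errors of the b coefficients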
Consequences of multicollinearity for the OLS estimators
1. OLS estimators are still BLUE, but they have large variances and
covariances, making precise estimation difficult.
2. As a result, the confidence intervals tend to be wider. Therefore, we
may not reject the "zero null hypothesis" (i.e. the true population
coefficient is zero).
3. Because of (1), the t ratios of one or more coefficients tend to be
statistically insignificant.
4. Even though some regression coefficients are statistically
insignificant, the R2 value may be very high.
5. The OLS estimators and their standard errors can be sensitive to
small changes in the data.
6. Adding a collinear variable to the chosen regression model can alter
the coefficient values of the other variables in the model.
In short, when regressors are collinear, statistical inference becomes
shaky, especially so if there is near-collinearity. This should not be
surprising, because if two variables are highly collinear it is very difficult
to isolate the impact of each variable separately on the regressand.
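•A small simulation (not from the text) illustrates consequences (1) and (3): when a nearly collinear regressor is added, the standard error of the slope coefficient balloons and its t ratio shrinks, even though the estimators remain unbiased.

# Near-collinearity inflating OLS standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100
x2 = rng.normal(size=n)
x3 = 2 * x2 + rng.normal(scale=0.01, size=n)   # x3 is almost an exact multiple of x2
y = 1 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)

alone = sm.OLS(y, sm.add_constant(x2)).fit()                          # x2 only
both = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()    # x2 and x3

print("se(b2) with x2 alone:   ", alone.bse[1])
print("se(b2) with x3 included:", both.bse[1])   # much larger, so the t ratio is much smaller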
Detection of heteroscedasticity
• Besides graphical methods, we can use two commonly used tests of
heteroscedasticity, namely the Breusch-Pagan and White tests.
• Breusch-Pagan (BP) test: This test involves the following steps:
1. Estimate the OLS regression as usual and obtain the squared OLS residuals, ei2, from
this regression. Suppose we want to study the factors that determine the abortion rate
across the 50 states of the USA. We consider the following linear regression model:
ABRi = β1 + β2Reli + β3Pricei + β4Lawsi + β5Fundsi + β6Educi + β7Incomei + β8Picketi + ui,
i = 1, 2, ..., 50 ......Eq. 5.4. This is the primary regression model.
2. Regress ei2 on the k regressors included in the model; the idea here is to see if the
squared residuals (a proxy for the true squared error term) are related to one or more X
variables. You can also choose other regressors that might have some bearing on the
error variance. Now run the following regression:
• ei2 = A1 + A2Reli + A3Pricei + A4Lawsi + A5Fundsi + A6Educi + A7Incomei + A8Picketi + vi ....eq.5.5
• where vi is the error term.
• Save R2 from regression (5.5); call it R2Aux, where aux stands for auxiliary, since Eq.
(5.5) is auxiliary to the primary regression (5.4) (see Table 5.1). The idea behind Eq.
(5.5) is to find out if the squared error term is related to one or more of the
regressors, which might indicate that heteroscedasticity is present in the data.
3. The null hypothesis here is that the error variance is homoscedastic -
that is, all the slope coefficients in Eq. (5.5) are simultaneously equal to
zero. You can use the F statistic from this regression, with (k-1) numerator df and
(n-k) denominator df, to test this hypothesis. If
the computed F statistic in Eq. (5.5) is statistically significant, we can
reject the hypothesis of homoscedasticity. If it is not, we may not reject
the null hypothesis. As the results in Table 5.1 show, the F statistic (7 df in
the numerator and 42 df in the denominator) is highly significant, for its p
value is only about 2%. Thus we can reject the null hypothesis.
4. Alternatively, you can use the chi-square statistic. It can be shown that
under the null hypothesis of homoscedasticity, the product of R2Aux
(computed in step 2) and the number of observations follows the chi-
square distribution, with df equal to the number of regressors in the
model. If the computed chi-square value has a low p value, we can reject
the null hypothesis of homoscedasticity. As the results in Table 5.1 show,
the observed chi-square value (= n R2Aux) of about 16 has a very low p
value, suggesting that we can reject the null hypothesis of
homoscedasticity.
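•The four steps above can be carried out with statsmodels' het_breuschpagan function. The abortion-rate data of Eq. (5.4) are not reproduced in these notes, so the sketch below uses a simulated stand-in dataset with the same dimensions (50 observations, 7 regressors); with the actual data the output would correspond to the statistics in Table 5.1.

# Breusch-Pagan test: n*R2_aux (chi-square) and F versions.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
n, k_slopes = 50, 7                                    # 50 states, 7 regressors as in Eq. (5.4)
X = sm.add_constant(rng.normal(size=(n, k_slopes)))
beta = np.ones(k_slopes + 1)
u = rng.normal(scale=1 + np.abs(X[:, 1]))              # heteroscedastic error term
y = X @ beta + u

results = sm.OLS(y, X).fit()                           # step 1: primary regression

# Steps 2-4: auxiliary regression of the squared residuals on the regressors.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, X)
print(f"n*R2_aux = {lm_stat:.2f}, chi-square p value = {lm_pvalue:.4f}")
print(f"F statistic = {f_stat:.2f}, p value = {f_pvalue:.4f}")
# Small p values reject the null hypothesis of homoscedasticity.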
Table 5.1 The Breusch-Pagan test of heteroscedasticity
Heteroskedasticity Test: Breusch-Pagan-Godfrey
F-statistic 2.823820 Prob. F(7,42) 0.0167
Obs*R-squared 16.00112 Prob. Chi-Square(7) 0.0251
Scaled explained SS 10.57563 Prob. Chi-Square(7) 0.1582
Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 10/05/09 Time: 13:14
Sample: 1 50
Included observations: 50