Econometrics: Damodar N. Gujarati
Damodar N. Gujarati
HamidUllah
A HYPOTHETICAL EXAMPLE
Regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s).
Look at Table 2.1, which refers to a total population of 60 families and their weekly income (X) and weekly consumption expenditure (Y). The 60 families are divided into 10 income groups.
There is considerable variation in weekly consumption expenditure in each income group. But the general picture that one gets is that, despite the variability of weekly consumption expenditure within each income bracket, on the average, weekly consumption expenditure increases as income increases.
The dark circled points in Figure 2.1 show the conditional mean values of Y against the various X values. If we join these conditional mean values, we obtain what is known as the population regression line (PRL), or more generally, the population regression curve. More simply, it is the regression of Y on X. The adjective population comes from the fact that we are dealing in this example with the entire population of 60 families. Of course, in reality a population may have many families.
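To make the notion of a conditional mean concrete, here is a minimal Python sketch; the income levels and expenditure figures are illustrative stand-ins, not the actual entries of Table 2.1. It computes E(Y | X) for each income group, and joining these conditional means is what traces out the population regression line.

# Hypothetical grouped data in the spirit of Table 2.1:
# each weekly income level X maps to the weekly consumption
# expenditures Y of the families observed at that level.
data = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
}

# Conditional mean E(Y | X): average consumption at each income level.
conditional_means = {x: sum(ys) / len(ys) for x, ys in data.items()}

for x, mean_y in sorted(conditional_means.items()):
    print(f"E(Y | X = {x}) = {mean_y:.1f}")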
5. Poor proxy variables: For example, Friedman regards permanent consumption (Yp) as a function of permanent income (Xp). But since data on these variables are not directly observable, in practice we use proxy variables such as current consumption (Y) and current income (X). Because these proxies are measured with error, u may in this case also represent errors of measurement.
6. Principle of parsimony: We would like to keep our regression model as simple as possible. If we can explain the behavior of Y substantially with two or three explanatory variables, and if our theory is not strong enough to suggest what other variables might be included, why introduce more variables? Let ui represent all other variables.
7. Wrong functional form: Often we do not know the form of the functional relationship between the regressand (dependent variable) and the regressors. Is consumption expenditure a linear (in variables) function of income or a nonlinear (in variables) function? If it is the former, Yi = β1 + β2Xi + ui is the proper functional relationship between Y and X, but if it is the latter, Yi = β1 + β2Xi + β3Xi² + ui may be the correct functional form. In two-variable models the functional form of the relationship can often be judged from the scattergram. But in a multiple regression model, it is not easy to determine the appropriate functional form, for graphically we cannot visualize scattergrams in multiple dimensions.
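One informal way to choose between the two candidate forms is to fit both and compare their residual sums of squares. The sketch below is only illustrative (the data are invented, and numpy is assumed to be available); it estimates the linear and the quadratic specifications with numpy.polyfit.

import numpy as np

# Invented consumption (Y) and income (X) data, for illustration only.
X = np.array([80, 100, 120, 140, 160, 180, 200, 220], dtype=float)
Y = np.array([66, 76, 90, 102, 110, 124, 138, 149], dtype=float)

# Linear form:    Y = b1 + b2*X          -> polyfit returns [b2, b1]
b_lin = np.polyfit(X, Y, deg=1)
# Quadratic form: Y = b1 + b2*X + b3*X^2 -> polyfit returns [b3, b2, b1]
b_quad = np.polyfit(X, Y, deg=2)

rss_lin = np.sum((Y - np.polyval(b_lin, X)) ** 2)
rss_quad = np.sum((Y - np.polyval(b_quad, X)) ** 2)
print(rss_lin, rss_quad)  # the smaller residual sum of squares points to the better-fitting form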
But how is the SRF itself determined? First, express (2.6.3) as
ûi = Yi − Ŷi = Yi − β̂1 − β̂2Xi    (3.1.1)
Now given n pairs of observations on Y and X, we would like to determine the SRF in such a manner that it is as close as possible to the actual Y. To this end, we may adopt the following criterion: Choose the SRF in such a way that the sum of the residuals Σûi = Σ(Yi − Ŷi) is as small as possible.
But this is not a very good criterion. If we adopt the criterion of minimizing Σûi, Figure 3.1 shows that the residuals û2 and û3 as well as the residuals û1 and û4 receive the same weight in the sum (û1 + û2 + û3 + û4). A consequence of this is that it is quite possible that the algebraic sum of the ûi is small (even zero) although the ûi are widely scattered about the SRF. To see this, let û1, û2, û3, and û4 in Figure 3.1 take the values of 10, −2, +2, and −10, respectively. The algebraic sum of these residuals is zero although û1 and û4 are scattered more widely around the SRF than û2 and û3.
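The arithmetic of this example is easy to verify; a minimal sketch using the four residual values just quoted:

residuals = [10, -2, 2, -10]             # u1, u2, u3, u4 from Figure 3.1
print(sum(residuals))                    # 0: the algebraic sum vanishes
print(sum(u ** 2 for u in residuals))    # 208: squaring exposes the wide scatter of u1 and u4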
We can avoid this problem if we adopt the least-squares criterion, which states that the SRF can be fixed in such a way that
Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2Xi)²    (3.1.2)
is as small as possible, where the ûi² are the squared residuals.
By squaring ûi, this method gives more weight to residuals such as û1 and û4 in Figure 3.1 than to the residuals û2 and û3. It is obvious from (3.1.2) that
Σûi² = f(β̂1, β̂2)    (3.1.3)
that is, the sum of the squared residuals is some function of the estimators β̂1 and β̂2. To see this, consider Table 3.1 and conduct two experiments.
Example
Since the β̂ values in the two experiments are different, we get different values for the estimated residuals. Now which set of values should we choose? Obviously the β̂'s of the first experiment are the best values, since they yield the smaller Σûi². But we could make endless experiments and then choose the set of β̂ values that gives the least possible value of Σûi² in (3.1.2). Since time, and patience, are generally in short supply, we need to consider some shortcuts to this trial-and-error process. Fortunately, the method of least squares provides us with unique estimates of β̂1 and β̂2 that give the smallest possible value of Σûi².
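To see that Σûi² really is a function of the chosen estimates, one can evaluate it for two candidate pairs (β̂1, β̂2) on the same data, in the spirit of the two experiments of Table 3.1. The numbers below are invented for illustration, not those of the table.

# Invented data, for illustration only.
X = [1, 4, 5, 7]
Y = [4, 6, 7, 9]

def sum_sq_resid(b1, b2):
    # Sum of squared residuals for the trial estimates b1, b2.
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))

print(sum_sq_resid(3.0, 0.8))   # experiment 1
print(sum_sq_resid(1.0, 1.5))   # experiment 2: a different value, so sum(u^2) = f(b1, b2)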
The process of differentiation yields the following equations for estimating β̂1 and β̂2:
ΣYiXi = β̂1ΣXi + β̂2ΣXi²    (3.1.4)
ΣYi = nβ̂1 + β̂2ΣXi    (3.1.5)
where n is the sample size. These simultaneous equations are known as the normal equations. Solving the normal equations simultaneously, we obtain
β̂2 = Σxiyi / Σxi²    and    β̂1 = Ȳ − β̂2X̄
where X̄ and Ȳ are the sample means of X and Y and where we define xi = (Xi − X̄) and yi = (Yi − Ȳ). Henceforth we adopt the convention of letting the lowercase letters denote deviations from mean values.
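Using the deviation-form solution of the normal equations, β̂2 = Σxiyi / Σxi² and β̂1 = Ȳ − β̂2X̄, the estimates can be computed in a few lines of Python. The data here are invented for illustration, not taken from the text's tables.

# Invented data, for illustration only.
X = [80, 100, 120, 140, 160]
Y = [65, 77, 89, 101, 113]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviations from the means: x_i = X_i - X_bar, y_i = Y_i - Y_bar.
b2_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
          / sum((x - x_bar) ** 2 for x in X))
b1_hat = y_bar - b2_hat * x_bar

print(b1_hat, b2_hat)   # these values also satisfy the normal equations (3.1.4) and (3.1.5)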
Derivation of R²
TSS = ESS + RSS    (3.5.3)
where TSS = Σ(Yi − Ȳ)² is the total sum of squares, ESS = Σ(Ŷi − Ȳ)² the explained sum of squares, and RSS = Σûi² the residual sum of squares. Dividing through by TSS gives 1 = ESS/TSS + RSS/TSS, and the coefficient of determination is r² = ESS/TSS = 1 − RSS/TSS.
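The identity is easy to check numerically once the fitted values are in hand. A minimal sketch with invented data, using an OLS fit computed with the formulas above:

# Invented data, for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
b1 = y_bar - b2 * x_bar
Y_hat = [b1 + b2 * x for x in X]

TSS = sum((y - y_bar) ** 2 for y in Y)                 # total sum of squares
ESS = sum((yh - y_bar) ** 2 for yh in Y_hat)           # explained sum of squares
RSS = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))    # residual sum of squares

print(TSS, ESS + RSS)   # equal up to rounding: TSS = ESS + RSS
print(ESS / TSS)        # r squared = ESS / TSS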
Coefficient of Correlation
In correlation analysis, the primary objective is to measure the strength or degree of linear association between two variables. The coefficient of correlation, r, measures this strength of (linear) association.
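As a concrete sketch (invented data, for illustration only), the sample correlation coefficient r = Σxiyi / √(Σxi² · Σyi²) can be computed directly; in the two-variable regression its square equals the coefficient of determination r².

import math

# Invented data, for illustration only.
X = [80, 100, 120, 140, 160]
Y = [70, 65, 90, 95, 110]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)

r = sxy / math.sqrt(sxx * syy)
print(r)        # lies between -1 and +1; r ** 2 is the coefficient of determination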