Short - Notes - Econometric Methods
What is Econometrics?
Econometrics is, fundamentally, the science of economic measurement. It is a set of research tools employed in various disciplines.
In economics, we express our ideas about relationships between economic variables using the mathematical
concept of a function. For example, the relationship between income and consumption can be written as
CONSUMPTION = f(INCOME)
Econometrics is about how we can use theory and data from economics, business, and the social
sciences, along with tools from statistics, to predict outcomes, answer “how much” type questions,
and test hypotheses
In an econometric model the economic relations are not exact. When studying data, we recognize that it has a systematic part and a random, unpredictable component u (called the random error).
For Ex:
The demand for an individual commodity say, the Honda Accord might be expressed as
Qd = 𝑓(P, Ps , Pc , INC)
which says that the quantity of Honda Accords demanded, Qd , is a function f (P, Ps , Pc , INC) of the
price of Honda Accords P, the price of cars that are substitutes Ps , the price of items that are
complements Pc (like gasoline), and the level of income INC.
Qd = 𝑓(P, Ps , Pc , INC) + u
The random error u accounts for the many factors that affect sales that we have omitted from this
simple model, and it also reflects the intrinsic uncertainty in economic activity.
To complete the economic model, we specify an algebraic relationship among our economic variables, for example a linear one:
Qd = β1 + β2P + β3Ps + β4Pc + β5INC + u
Here, β1, β2, β3, β4, β5 are unknown parameters which will be estimated using economic data and an econometric technique.
The Simple Linear Regression Model
Simple linear regression is a linear regression model with a single explanatory variable. It examines the relationship between a y-variable and one x-variable, where the y-variable is called the dependent variable (regressand) and the x-variable is called the independent variable (regressor).
y = β1 + β2 x + u
All the models we study are abstractions from reality, and thus working with models requires assumptions.
Assumptions
A1. The model is linear in parameters, and the data are a random sample drawn from the population:
Y = β1 + β2X + u
*An example of a model that is not linear in parameters: Y = β1X^β2 u
In a regression analysis, one of the objectives is to estimate β2 = ΔE(Yi |xi ) ∕Δxi. – If we are to hope
that a sample of data can be used to estimate the effects of changes in x, then we must observe
some different values of the explanatory variable x in the sample.
The more different values of x and the more variation they exhibit, the better our regression analysis
will be
Linear Regression is the process of finding a line that best fits the data points available on the plot,
so that we can use it to predict output values for given inputs. So, what is the “Best fitting line”? A
Line of best fit is a straight line that represents the best approximation of a scatter plot of data
points and the equation of the line is the fitted linear regression model.
True Model: Y = β1 + β2X + u
Fitted Model: Ŷ = β̂1 + β̂2X
The least squares principle asserts that to fit a line to the data values we should make the sum of the squares of the vertical distances from each point to the line as small as possible.
The intercept and slope of this line, the line that best fits the data using the least squares principle,
are b1 and b2, the least squares estimates of β1 and β2.
The vertical distances from each point to the fitted line are the least squares residuals. – They are
given by
ûi = yi − ŷi = yi − b1 − b2xi
The least squares estimates b1 and b2 have the property that the sum of their squared residuals is
less than the sum of squared residuals for any other line.
The least squares principle says that the estimates b1 and b2 of β1 and β2 are the ones to use, since
the line using them as intercept and slope fits the data best.
– Given the sample observations on y and x, we want to find values for the unknown parameters β1
and β2
The formulas for the least squares estimates of β1 and β2 that give the minimum of the sum of squared residuals are
b2 = Σ(xi − x̄)(yi − ȳ) ∕ Σ(xi − x̄)²   (2.7)
b1 = ȳ − b2x̄   (2.8)
We will call the estimators b1 and b2, given in equations (2.7) and (2.8), the ordinary least squares estimators.
Since the points (yi, xi) have been observed, the sum of squares function S(β1, β2) = Σ(yi − β1 − β2xi)² depends only on the unknown parameters β1 and β2.
Our task is to find, out of all the possible values β1 and β2, the point (b1, b2) at which the sum of
squares function S is a minimum
The minimization:
Setting the partial derivatives of S with respect to β1 and β2 to zero gives two linear equations, the normal equations:
Σyi = Nb1 + b2Σxi   (2A.3)
Σxiyi = b1Σxi + b2Σxi²   (2A.4)
We can find the least squares estimates by solving these two linear equations for b1 and b2.
To solve for b2, multiply (2A.3) by Σxi, multiply (2A.4) by N, then subtract the first equation from the second, and then isolate b2 on the left-hand side:
b2 = (NΣxiyi − ΣxiΣyi) ∕ (NΣxi² − (Σxi)²)   (2A.5)
To solve for b1, given b2, divide both sides of (2A.3) by N and rearrange:
b1 = ȳ − b2x̄
Two useful facts: the first is
Σ(xi − x̄)² = Σxi² − Nx̄²   (2B.1)
The second useful fact is similar to the first, and it is
Σ(xi − x̄)(yi − ȳ) = Σxiyi − Nx̄ȳ   (2B.2)
We also know that x̄ = Σxi∕N and ȳ = Σyi∕N.   (2B.3)
If the numerator and denominator of b2 in (2A.5) are divided by N, then using (2B.1)–(2B.3) we can rewrite b2 in deviation-from-the-mean form as
b2 = Σ(xi − x̄)(yi − ȳ) ∕ Σ(xi − x̄)²
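As a check on the formulas above, the deviation-from-the-mean estimates can be computed directly. The data below are made up purely for illustration:

```python
# A minimal sketch of the least squares formulas, using made-up data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

N = len(x)
xbar, ybar = x.mean(), y.mean()

# b2 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)  -- deviation form
b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
# b1 = ybar - b2 * xbar
b1 = ybar - b2 * xbar

# The least squares residuals sum to (approximately) zero
# whenever an intercept is included.
residuals = y - (b1 + b2 * x)
print(round(b1, 4), round(b2, 4))
```

With these numbers the fitted line is Ŷ = 0.05 + 1.99X, and any other line would give a larger sum of squared residuals.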
Example 1:
Hourly earnings in 1992, EARNINGS, measured in dollars, are regressed on schooling, S, measured
as highest grade completed, for the 570 respondents in EAEF Data Set 21. The Stata output for the
regression is shown below. The scatter diagram and regression line are shown in Figure 2.8.
Interpreting it, the slope coefficient indicates that, as S increases by one unit (of S), EARNINGS
increases by 1.07 units (of EARNINGS). Since S is measured in years, and EARNINGS is measured in
dollars per hour, the coefficient of S implies that hourly earnings increase by $1.07 for every extra
year of schooling
Example 2:
• Ŷi = 14.4773 + 0.0022Xi
• se(β̂1) = 6.1523; se(β̂2) = 0.00032
The slope coefficient means that the number of cell-phone subscriptions increases by 2.2 per 100 persons for a $1,000 increase in per capita income. The intercept value of about 14.47 suggests that even if per capita income were zero, the average number of cell-phone subscribers would be about 14 per 100 persons. The R² value of 0.6024 means that 60.24% of the variation in the cell-phone data can be explained by per capita income.
PC Demand
• Ŷi = −6.5833 + 0.0018Xi
• se(β̂1) = 2.7437; se(β̂2) = 0.00014
The slope coefficient means that the number of PC subscriptions increases by about 1.8 per 100 persons for a $1,000 increase in per capita income. The negative intercept value has no practical significance.
Multiple Regression Model
A2. The covariance between different error terms ei and ej, conditional on X, is zero: cov(ei, ej|X) = 0 for i ≠ j.
A3. The conditional variance of the random error, given X, is constant: var(ei|X) = σ².
A4. The conditional expectation of the random error ei, given all explanatory variable observations X = {xi: i = 1, 2, … , N}, is zero: E(ei|X) = 0.
A5. It is not possible to express one of the explanatory variables as an exact linear function of the
others.
Example 1:
Equation: CM̂i = 263.6415 − 0.0056PGNPi − 2.2316FLRi
• The partial regression coefficient of PGNP is −0.0056 and tells us that with the influence of
FLR held constant. If the per capita GNP goes up by a thousand dollars, on average, the
number of deaths of children under age 5 goes down by about 5.6 per thousand live births.
• The coefficient −2.2316 tells us that holding the influence of PGNP constant, on average, the
number of deaths of children under age 5 goes down by about 2.23 per thousand live births
as the female literacy rate increases by one percentage point.
• The intercept value of about 263, mechanically interpreted, means that if the values of PGNP
and FLR rate were fixed at zero, the mean child mortality rate would be about 263 deaths
per thousand live births. More practically, if the two regressors were fixed at zero, child
mortality will be quite high which makes sense.
• The R2 value of about 0.71 means that about 71 percent of the variation in child mortality is
explained by PGNP and FLR
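The partial-coefficient logic above can be sketched numerically. The data below are invented for illustration (they are not the child-mortality data); with the design matrix in hand, OLS recovers each coefficient holding the other regressor constant:

```python
# A sketch of multiple regression on made-up data: y is generated
# exactly as 10 + 2*x1 - 3*x2, so OLS should recover those values.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 10 + 2 * x1 - 3 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS: minimizes sum of squared residuals

print(np.round(beta, 4))  # intercept, partial slope on x1, partial slope on x2
```

Each slope is a partial regression coefficient: the effect of its regressor with the other held constant, exactly as in the PGNP/FLR interpretation above.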
Example 2
Predict life expectancy in years based on 7 predictors — population estimate, illiteracy (population
percentage), murder and non-negligent manslaughter rate per 100k members of the population,
percent high-school graduates, mean number of days with temperature < 32 degrees Fahrenheit,
and land area in square miles grouped by state
• Life expectancy is inversely related to the murder rate: for every unit increase in the murder and non-negligent manslaughter rate per 100k members of the population, life expectancy decreases by 0.283 years.
• Life expectancy is directly related to the percentage of high-school graduates: for every one-point increase, life expectancy increases by 0.0499 years.
• Life expectancy is inversely related to the number of frost days, i.e., days with temperature below 32°F: for every additional frost day, life expectancy decreases by 0.00691 years.
Multicollinearity
Multicollinearity occurs when two or more independent variables are highly correlated with one another. This means that an independent variable can be predicted from another independent variable in a regression model. For example: height and weight, household income and water consumption, mileage and price of a car, study time and leisure time, etc. Multicollinearity makes it hard to distinguish between the individual effects of the independent variables on the dependent variable.
Example
When consumption expenditure Y is regressed on income X2 and wealth X3 together, R² is high and the F-statistic shows that income and wealth together strongly affect consumption expenditure, and yet neither of the slope coefficients is individually statistically significant. If we drop the wealth variable and regress Y on X2 alone, we obtain
Equation: Ŷi = 24.4545 + 0.5091X2i
Earlier, the income variable was statistically insignificant, whereas now it is highly significant. Similarly, if we instead regress Y on X3 alone, we see that wealth now has a significant impact on consumption expenditure, whereas earlier it appeared to have no effect. These regressions show very clearly that in situations of extreme multicollinearity, dropping a highly collinear variable will often make the other X variable statistically significant.
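One common diagnostic for multicollinearity is the variance inflation factor (VIF), computed from the R² of regressing one explanatory variable on the others. A sketch on made-up income/wealth data, where wealth is deliberately generated as a near-exact linear function of income:

```python
# A sketch of the VIF diagnostic for multicollinearity on simulated data.
import numpy as np

rng = np.random.default_rng(0)
income = rng.normal(50, 10, 200)
wealth = 10 * income + rng.normal(0, 1, 200)  # nearly a linear function of income

def vif(target, others):
    """VIF = 1 / (1 - R^2) from regressing one regressor on the others."""
    X = np.column_stack([np.ones_like(others), others])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

print(vif(wealth, income) > 10)  # a common rule of thumb flags VIF above 10
```

When two regressors are this collinear, the data cannot separate their individual effects, which is why both slopes lose significance when included together.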
Heteroscedasticity
Heteroscedasticity arises when the conditional variance of the Y population varies with X. This situation is known, appropriately, as heteroscedasticity, or unequal spread or variance.
The usual F-tests and t-tests are not valid under heteroscedasticity.
A general test that includes all squares and interactions of the regressors (such as White's test) leads to a large number of estimated parameters (e.g., k = 6 leads to 27 parameters to be estimated).
A common remedy is weighted least squares estimation.
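A sketch of weighted least squares, assuming the textbook case var(ui) = σ²xi; the data are simulated for illustration. Dividing each observation by √xi makes the transformed error homoscedastic, after which ordinary least squares applies:

```python
# A sketch of weighted least squares under var(u_i) = sigma^2 * x_i,
# using simulated data with true intercept 3 and slope 2.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 500)
y = 3 + 2 * x + rng.normal(0, np.sqrt(x))  # error variance grows with x

# Transform: divide the whole equation by sqrt(x_i), then run OLS
# on the transformed intercept and slope columns.
w = 1 / np.sqrt(x)
Xw = np.column_stack([w, w * x])
yw = w * y
beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
print(np.round(beta, 2))  # close to the true values (3, 2)
```

Plain OLS would still be unbiased here, but its standard errors (and hence the t- and F-tests) would be wrong; WLS restores valid inference.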
Autocorrelation
One of the assumptions of the classical model is that the disturbance or error terms of the model are independent. Symbolically, it means that, for the model, cov(ut, us) = 0 for t ≠ s. Autocorrelation occurs when this assumption is violated, i.e., when the error terms are correlated across observations.
Dummy Variables
Variables like sex (male or female) and colour (black, white) are defined on a nominal scale. Such
variables usually indicate the presence or absence of a “quality”. Such variables can be quantified by
artificially constructing the variables that take the values, e.g., 1 and 0 where “1” usually indicates
the presence of an attribute and “0” usually indicates the absence of the attribute.
Usually, the indicator variables take on the values 0 and 1 to identify the mutually exclusive classes
of the explanatory variables. For example, 1 if person is male 0 if person is female, 1 if person is
employed 0 if person is unemployed.
Example: Consider a model of quarterly refrigerator sales with an indicator variable for each quarter, D1 through D4, and no intercept term:
Yi = β1D1i + β2D2i + β3D3i + β4D4i + ui
Interpretation: The estimated coefficients represent the average, or mean, sales of refrigerators
(in thousands of units) in each season (i.e., quarter). Thus, the average sale of refrigerators in the
first quarter, in thousands of units, is about 1,222, that in the second quarter about 1,468, that in
the third quarter about 1,570, and that in the fourth quarter about 1,160.
Incidentally, instead of assigning a dummy for each quarter and suppressing the intercept term to
avoid the dummy variable trap, we could assign only three dummies and include the intercept term.
Suppose we treat the first quarter as the reference quarter and assign dummies to the second, third,
and fourth quarters. This produces the following regression results :
Interpretation : Since we are treating the first quarter as the benchmark, the coefficients attached to
the various dummies are now differential intercepts, showing by how much the average value of Y in
the quarter that receives a dummy value of 1 differs from that of the benchmark quarter. Put
differently, the coefficients on the seasonal dummies will give the seasonal increase or decrease in
the average value of Y relative to the base season.
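The coefficients-as-quarterly-means interpretation can be verified with a small sketch; the sales figures below are made up so that the quarterly means match the values quoted above (1222, 1468, 1570, 1160):

```python
# A sketch of the seasonal-dummy setup: regressing sales on four quarterly
# dummies with NO intercept recovers each quarter's mean (made-up data).
import numpy as np

sales = np.array([1200., 1450., 1580., 1150.,   # year 1, Q1..Q4
                  1244., 1486., 1560., 1170.])  # year 2, Q1..Q4
quarter = np.array([1, 2, 3, 4, 1, 2, 3, 4])

# Build the dummy columns D1..D4 (one column per quarter).
D = (quarter[:, None] == np.array([1, 2, 3, 4])).astype(float)
beta, *_ = np.linalg.lstsq(D, sales, rcond=None)
print(beta)  # each coefficient equals that quarter's mean sales
```

Using three dummies plus an intercept instead would give the same fitted values, with the dummy coefficients now measured as differences from the benchmark quarter.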
THE LOGIT and PROBIT Model
Consider a model of home ownership where X is income and Y = 1 means the family owns a house. The linear probability model (LPM) is
Pi = E(Y = 1 | Xi) = β1 + β2Xi   … (1)
But now consider the following representation of home ownership:
Pi = 1 ∕ (1 + e^(−Zi))   … (2)
where Zi = β1 + β2Xi. Equation (2) can also be written as
Pi = e^(Zi) ∕ (1 + e^(Zi))   … (3)
and represents what is known as the cumulative logistic distribution function.
It is easy to verify that as Zi ranges from −∞ to +∞, Pi ranges between 0 and 1, and that Pi is nonlinearly related to Zi (i.e., to Xi), thus satisfying the two requirements considered earlier. But it seems that in satisfying these requirements we have created an estimation problem, because Pi is nonlinear not only in X but also in the β's, as can be seen clearly from (2).
This means that we cannot use the familiar OLS procedure to estimate the parameters. But this problem is more apparent than real, because (2) can be linearized, as shown below.
If Pi, the probability of owning a house, is given by (3), then (1 − Pi), the probability of not owning a house, is
1 − Pi = 1 ∕ (1 + e^(Zi))   … (4)
Now Pi ∕ (1 − Pi) is simply the odds ratio in favour of owning a house: the ratio of the probability that a family will own a house to the probability that it will not. Thus, if Pi = 0.8, the odds are 4 to 1 in favour of the family owning a house. Dividing the expressions for Pi and (1 − Pi) gives
Pi ∕ (1 − Pi) = e^(Zi)   … (5)
Taking the natural log of (5), we obtain
Li = ln(Pi ∕ (1 − Pi)) = Zi = β1 + β2Xi   … (6)
That is, L, the log of the odds ratio, is not only linear in X, but also (from the estimation viewpoint) linear in the parameters. L is called the logit, and hence the name logit model for models like (6).
Notice these features of the logit model.
1. As P goes from 0 to 1 (i.e., as Z varies from −∞ to +∞), the logit L goes from −∞ to +∞. That is, although the probabilities (of necessity) lie between 0 and 1, the logits are not so bounded.
2. Although L is linear in X, the probabilities themselves are not. This property is in contrast with the LPM (1), where the probabilities increase linearly with X.
3. Although we have included only a single X variable, or regressor, in the
preceding model, one can add as many regressors as may be dictated by the
underlying theory.
4. If L, the logit, is positive, it means that when the value of the regressor(s) increases, the odds that the regressand equals 1 (meaning some event of interest happens) increase. If L is negative, the odds that the regressand equals 1 decrease as the value of X increases. To put it differently, the logit becomes negative and increasingly large in magnitude as the odds ratio decreases from 1 to 0, and becomes increasingly large and positive as the odds ratio increases from 1 to infinity.
5. More formally, the interpretation of the logit model given in (6) is as follows: β2, the slope, measures the change in L for a unit change in X; that is, it tells how the log-odds in favor of owning a house change as income changes by a unit, say $1,000. The intercept β1 is the value of the log-odds in favour of owning a house if income is zero. Like most interpretations of intercepts, this interpretation may not have any physical meaning.
6. Given a certain level of income, say X, if we actually want to estimate not the odds in favor of owning a house but the probability of owning a house itself, this can be done directly from (3) once the estimates of β1 and β2 are available. This, however, raises the most important question: how do we estimate β1 and β2 in the first place? The answer is given in the next section.
7. Whereas the LPM assumes that Pi is linearly related to Xi, the logit model assumes that the log of the odds ratio is linearly related to Xi.
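The logit arithmetic can be illustrated numerically. The β1, β2, and income value below are hypothetical, chosen only for illustration:

```python
# A sketch of the logistic function and the logit: for Z = beta1 + beta2*X,
# P = 1/(1 + e^(-Z)) and L = ln(P/(1-P)) = Z. All numbers are made up.
import math

beta1, beta2 = -4.0, 0.1  # hypothetical intercept and slope
X = 50.0                  # hypothetical income level

Z = beta1 + beta2 * X
P = 1.0 / (1.0 + math.exp(-Z))  # probability of owning a house
odds = P / (1.0 - P)            # odds ratio in favour of owning
L = math.log(odds)              # the logit: equals Z, linear in X and the betas

print(round(P, 4), round(odds, 4), round(L, 4))
# The text's example: P = 0.8 gives odds of 4 to 1.
print(round(0.8 / (1 - 0.8), 1))
```

Note that L equals Z exactly: the log-odds are linear in X even though P itself is not.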
The Probit Model
How is the unobservable index Ii related to the actual decision to own a house? As before, let Y = 1 if the family owns a house and Y = 0 if it does not. Now it is reasonable to assume that there is a critical or threshold level of the index, call it I*, such that if Ii exceeds I*, the family will own a house, and otherwise it will not.
The threshold I*, like Ii, is not observable, but if we assume that it is normally distributed with the same mean and variance, it is possible not only to estimate the parameters of the index given in (1), but also to get some information about the unobservable index itself. The calculation is as follows.
Given the assumption of normality, the probability that I* is less than or equal to Ii can be computed from the standardized normal CDF as
Pi = P(Y = 1 | X) = P(I* ≤ Ii) = F(Ii) = F(β1 + β2Xi)   … (2)
Now to obtain information on Ii, the utility index, as well as on β1 and β2, we take the inverse of (2) to obtain
Ii = F⁻¹(Pi) = β1 + β2Xi
where F⁻¹ is the inverse of the normal CDF. What all this means can be made clear from the figure: in panel (a) we obtain from the ordinate the (cumulative) probability of owning a house given I* ≤ Ii, whereas in panel (b) we obtain from the abscissa the value of Ii given the value of Pi, which is simply the reverse of the former.
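The inversion Ii = F⁻¹(Pi) can be sketched with the standard normal CDF built from math.erf; the index value below and the bisection-based inverse are illustrative, not part of the original text:

```python
# A sketch of the probit link: P = F(I) with F the standard normal CDF,
# and I recovered as F^{-1}(P) by inverting F numerically.
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Invert the normal CDF by bisection (valid because F is monotone)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

I = 1.0           # a hypothetical value of the index I_i
P = norm_cdf(I)   # probability that I* <= I_i
print(round(P, 4))            # panel (a): probability from the index
print(round(norm_ppf(P), 4))  # panel (b): index recovered from the probability
```

This is exactly the two-panel picture described above: forward through F to get a probability, backward through F⁻¹ to get the index.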