
Econometric Methods

What is Econometrics?
Econometrics is fundamentally about economic measurement. It is a set of research tools employed across many disciplines.

In economics, we express our ideas about relationships between economic variables using the mathematical
concept of a function. For example, the relationship between income and consumption can be expressed as

CONSUMPTION = f(INCOME)

Econometrics is about how we can use theory and data from economics, business, and the social
sciences, along with tools from statistics, to predict outcomes, answer “how much” type questions,
and test hypotheses.

The Econometric Model

In an econometric model, economic relations are not exact. When studying data, we recognize that it
contains a systematic part and a random, unpredictable component u (called the random error).

For example:

The demand for an individual commodity, say the Honda Accord, might be expressed as

Qd = 𝑓(P, Ps , Pc , INC)

which says that the quantity of Honda Accords demanded, Qd , is a function f (P, Ps , Pc , INC) of the
price of Honda Accords P, the price of cars that are substitutes Ps , the price of items that are
complements Pc (like gasoline), and the level of income INC.

Hence the econometric model for the demand for Honda Accords is

Qd = 𝑓(P, Ps , Pc , INC) + u

The random error u accounts for the many factors that affect sales that we have omitted from this
simple model, and it also reflects the intrinsic uncertainty in economic activity.

To complete the econometric model, we must specify the algebraic relationship among our economic
variables.

Qd = β1 + β2P + β3Ps + β4Pc + β5INC + u

Here, β1, β2, β3, β4, β5 are unknown coefficients (parameters) which will be estimated using economic
data and econometric techniques.
The Simple Linear Regression Model
Simple linear regression is a linear regression model with a single explanatory variable. It examines
the relationship between a y-variable and one x-variable, where the y-variable is called the
dependent variable (regressand) and the x-variable is called the independent variable (regressor).

y = β1 + β2 x + u

All the models we study are abstractions from reality, and thus working with models requires
assumptions.
Assumptions
A1. The model is linear in parameters, and the data are a random sample drawn from the population.

Y  β1  β2X  u
*Example of a model that is not linear in parameters: Y = β1X^β2 + u

A2. There is some variation in the regressor in the sample.

In a regression analysis, one of the objectives is to estimate β2 = ΔE(Yi|xi) ∕ Δxi. If we are to hope
that a sample of data can be used to estimate the effects of changes in x, then we must observe
some different values of the explanatory variable x in the sample.
The more distinct values of x we observe, and the more variation they exhibit, the better our
regression analysis will be.

A3. The disturbance term has zero expectation

Eui  0 for all i


A4. The disturbance term is homoscedastic

ui  u for all i


We assume the disturbance term is homoscedastic, meaning its value in each observation is drawn
from a distribution with constant population variance.

A5. The values of the disturbance term have independent distributions

ui is distributed independently of uj for all j ≠ i


For example, just because the disturbance term is large and positive in one observation, there
should be no tendency for it to be large and positive in the next (or large and negative, for that
matter, or small and positive, or small and negative).

A6. The disturbance term has a normal distribution



The Line of Best Fit

Linear Regression is the process of finding a line that best fits the data points available on the plot,
so that we can use it to predict output values for given inputs. So, what is the “Best fitting line”? A
Line of best fit is a straight line that represents the best approximation of a scatter plot of data
points and the equation of the line is the fitted linear regression model.

True Model: Y = β1 + β2X + u

Fitted Model: Ŷ = β̂1 + β̂2X

The Least Squares Principle


To estimate β1 and β2 we use the least squares principle.

This principle asserts that to fit a line to the data values we should make the sum of the squares of
the vertical distances from each point to the line as small as possible.

The intercept and slope of this line, the line that best fits the data using the least squares principle,
are b1 and b2, the least squares estimates of β1 and β2.

The vertical distances from each point to the fitted line are the least squares residuals. They are
given by

ûi = yi − ŷi = yi − b1 − b2xi
The least squares estimates b1 and b2 have the property that the sum of their squared residuals is
less than the sum of squared residuals for any other line.

The least squares principle says that the estimates b1 and b2 of β1 and β2 are the ones to use, since
the line using them as intercept and slope fits the data best.

The problem is to find b1 and b2 in a convenient way.

– Given the sample observations on y and x, we want to find values for the unknown parameters β1
and β2

that minimize the “sum of squares” function

The formulas for the least squares estimates of β1 and β2 that give the minimum of the sum of
squared residuals are

b2 = ∑(xi − x̄)(yi − ȳ) ∕ ∑(xi − x̄)²   (2.7)

b1 = ȳ − b2x̄   (2.8)

We will call the estimators b1 and b2, given in equations (2.7) and (2.8), the ordinary least squares
estimators.

Derivation of the Ordinary Least Squares Estimators


Given the sample observations on y and x, we want to find values for the unknown parameters β1
and β2 that minimize the “sum of squares” function
S(β1, β2) = ∑(yi − β1 − β2xi)²   (summing over i = 1, …, n)

Since the points (yi, xi) have been observed, the sum of squares function S depends only on the
unknown parameters β1 and β2.

Our task is to find, out of all the possible values of β1 and β2, the point (b1, b2) at which the sum of
squares function S is a minimum.

The minimization:

Partial derivatives of S with respect to β1 and β2 are

∂S∕∂β1 = −2∑(yi − β1 − β2xi)
∂S∕∂β2 = −2∑xi(yi − β1 − β2xi)   (2A.2)

Algebraically, to obtain the point (b1, b2), we set the derivatives in (2A.2) to zero and replace β1 and β2
by b1 and b2, respectively, to obtain

∑(yi − b1 − b2xi) = 0
∑xi(yi − b1 − b2xi) = 0

Simplifying these gives equations usually known as the normal equations:

Nb1 + (∑xi)b2 = ∑yi   (2A.3)

(∑xi)b1 + (∑xi²)b2 = ∑xiyi   (2A.4)

These two equations have two unknowns b1 and b2.

We can find the least squares estimates by solving these two linear equations for b1 and b2.

To solve for b2, multiply (2A.3) by ∑xi, multiply (2A.4) by N, subtract the first equation from the
second, and isolate b2 on the left-hand side:

b2 = (N∑xiyi − ∑xi∑yi) ∕ (N∑xi² − (∑xi)²)   (2A.5)

To solve for b1, given b2, divide both sides of (2A.3) by N and rearrange:

b1 = ȳ − b2x̄

To convert b2 into deviation from the mean form, we use two useful facts. The first is

∑(xi − x̄)² = ∑xi² − Nx̄²   (2B.1)

The second useful fact is similar to the first:

∑(xi − x̄)(yi − ȳ) = ∑xiyi − Nx̄ȳ   (2B.2)

This result is proven in a similar manner.

We also know that

x̄ = ∑xi ∕ N and ȳ = ∑yi ∕ N   (2B.3)

If the numerator and denominator of b2 in (2A.5) are divided by N, then using (2B.1)–(2B.3) we can
rewrite b2 in deviation from the mean form as

b2 = ∑(xi − x̄)(yi − ȳ) ∕ ∑(xi − x̄)²
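
As a quick numerical check on these formulas, the following Python sketch computes b1 and b2
directly from the deviation-from-the-mean form. The data values are made up purely for illustration.

import numpy as np

# Hypothetical sample: x = income, y = consumption (illustrative values only)
x = np.array([10.0, 12.0, 15.0, 18.0, 20.0])
y = np.array([8.0, 9.5, 11.0, 13.5, 14.0])

x_bar, y_bar = x.mean(), y.mean()
b2 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope, eq. (2.7)
b1 = y_bar - b2 * x_bar                                            # intercept, eq. (2.8)

u_hat = y - (b1 + b2 * x)            # least squares residuals
print(b1, b2, np.sum(u_hat ** 2))    # this sum of squares is the minimum of S

Any other choice of intercept and slope would give a larger sum of squared residuals, which is
exactly the least squares property stated above.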

Example 1:

Hourly earnings in 1992, EARNINGS, measured in dollars, are regressed on schooling, S, measured
as highest grade completed, for the 570 respondents in EAEF Data Set 21. [Stata output and the
scatter diagram with the fitted regression line (Figure 2.8) omitted.] Interpreting the output, the
slope coefficient indicates that as S increases by one unit, EARNINGS increases by 1.07 units. Since
S is measured in years and EARNINGS in dollars per hour, the coefficient of S implies that hourly
earnings increase by $1.07 for every extra year of schooling.

Example 2:

Cell Phone Demand

• Ŷi = 14.4773 + 0.0022Xi
• se(β̂1) = 6.1523; se(β̂2) = 0.00032

The slope coefficient means that the number of cell phone subscriptions increases by 2.2 per 100
persons for a $1,000 increase in per capita income. The intercept value of about 14.47 suggests that
even if per capita income were zero, the average number of cell phone subscriptions would be about
14 per 100 persons. The R² value of 0.6024 means that 60.24 percent of the variation in the cell
phone data can be explained by per capita income.
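
As a worked illustration (using a hypothetical income level), at a per capita income of $10,000 the
fitted equation predicts Ŷ = 14.4773 + 0.0022 × 10,000 ≈ 36.5 subscriptions per 100 persons.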

PC Demand

• Ŷi = −6.5833 + 0.0018Xi
• se(β̂1) = 2.7437; se(β̂2) = 0.00014

The slope coefficient means that the number of PC subscriptions increases by approximately 2 per
100 persons for a $1,000 increase in per capita income. The negative intercept value has no practical
significance.
Multiple Regression Model

Multiple regression simultaneously considers the influence of multiple explanatory variables on a
response variable Y.

The multiple regression model equation is

Y = 0 + 1x1 + 2x2 + ... + pxp + ε


where E(ε) = 0 and Var(ε) = σ². Again, it is assumed that ε is normally distributed.

The regression coefficient 1 is interpreted as the expected change in Y associated with a 1-


unit increase in x1 while x2..., xp are held fixed. Analogous interpretations hold for 2,..., p.
Thus, these coefficients are called partial or adjusted regression coefficients. In contrast, the
simple regression slope is called the marginal (or unadjusted) coefficient.
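
To make the "held fixed" interpretation concrete, here is a minimal Python sketch (using statsmodels)
that fits a multiple regression on simulated data; all variable names and values are made up for
illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)                     # e.g. income
x2 = rng.normal(size=n)                     # e.g. wealth
y = 2.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(scale=0.5, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))   # prepend the intercept column
res = sm.OLS(y, X).fit()
print(res.params)     # each slope is a partial effect, the other regressor held fixed
print(res.rsquared)   # share of variation in y explained by x1 and x2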

Assumptions of the Multiple Regression Model
A1. Econometric Model

Y = 0 + 1x1 + 2x2 + ... + pxp + ε

A2. The covariance between different error terms ei and ej, conditional on X, is zero:

cov(ei, ej|X) = 0 for i ≠ j

A3. The variance of the error term, conditional on X, is a constant

var(ei|X) = σ²

A4. The conditional expectation of the random error ei, given all explanatory variable observations
X = {xi, i = 1, 2, …, N}, is zero.

E(ei|X) = 0

A5. It is not possible to express one of the explanatory variables as an exact linear function of the
others.

Example 1:
Equation: ĈMi = 263.6415 − 0.0056PGNPi − 2.2316FLRi

• The partial regression coefficient of PGNP is −0.0056 and tells us that, with the influence of
FLR held constant, if per capita GNP goes up by a thousand dollars, then on average the
number of deaths of children under age 5 goes down by about 5.6 per thousand live births.
• The coefficient −2.2316 tells us that holding the influence of PGNP constant, on average, the
number of deaths of children under age 5 goes down by about 2.23 per thousand live births
as the female literacy rate increases by one percentage point.
• The intercept value of about 263, mechanically interpreted, means that if PGNP and FLR
were both fixed at zero, the mean child mortality rate would be about 263 deaths per
thousand live births. More practically, it makes sense that child mortality would be quite
high when both regressors are at zero.
• The R2 value of about 0.71 means that about 71 percent of the variation in child mortality is
explained by PGNP and FLR

Example 2
Predict life expectancy in years based on state-level predictors: population estimate, illiteracy
(population percentage), murder and non-negligent manslaughter rate per 100k members of the
population, percent high-school graduates, mean number of days with temperature below 32 degrees
Fahrenheit, and land area in square miles, grouped by state.

• Life expectancy is negatively related to the murder rate: for every unit increase in the
murder and non-negligent manslaughter rate per 100k members of the population, life
expectancy decreases by 0.283 years.
• Life expectancy is positively related to the high-school graduation rate: for every one-point
increase in the percent of high-school graduates, life expectancy increases by 0.0499 years.
• Life expectancy is negatively related to the number of frost days, i.e., days with temperature
below 32°F: for each additional frost day, life expectancy decreases by 0.00691 years.
Multicollinearity
Multicollinearity occurs when two or more independent variables are highly correlated with
one another in a regression model. This means that one independent variable can be predicted
from another independent variable in the model. Examples: height and weight, household income
and water consumption, mileage and price of a car, study time and leisure time, etc.

Multicollinearity is a problem in a regression model because we would not be able to distinguish
between the individual effects of the independent variables on the dependent variable.

Ways to alleviate Multicollinearity

(1) Increase the number of observations.

(2) Reduce it by including further relevant variables in the model.

(3) Drop some of the correlated variables.
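
A simple first diagnostic is to inspect the pairwise correlations among the regressors. The following
Python sketch simulates a nearly collinear income/wealth pair (echoing the example below; all
numbers are made up) and flags the problem.

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
income = rng.normal(50, 10, size=200)
wealth = 10 * income + rng.normal(0, 5, size=200)   # almost an exact multiple of income
consumption = 5 + 0.4 * income + rng.normal(0, 2, size=200)

df = pd.DataFrame({"consumption": consumption, "income": income, "wealth": wealth})
print(df[["income", "wealth"]].corr())   # a correlation near 1.0 signals severe collinearity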

Example

Equation: Ŷi = 24.7747 + 0.9415X2i − 0.0424X3i


It shows that income and wealth together explain about 96 percent of the variation in consumption
expenditure, and yet neither of the slope coefficients is individually statistically significant. This
indicates that there is almost perfect collinearity between X3 and X2.

Now let us see what happens if we regress Y on X2 only:

Equation: Ŷi = 24.4545 + 0.5091X2i

Earlier, the income variable was statistically insignificant, whereas now it is highly significant.

Instead, if we regress Y on X3 alone, we see that wealth now has a significant impact on consumption
expenditure, whereas earlier it had no effect. These regressions show very clearly that in situations of
extreme multicollinearity, dropping the highly collinear variable will often make the other X variable
statistically significant.

Heteroscedasticity

Heteroscedasticity arises when the conditional variance of the Y population varies with X. This
situation is known, appropriately, as heteroscedasticity, or unequal spread (variance).

Consequences of heteroscedasticity for OLS

• OLS is still unbiased and consistent under heteroscedasticity.
• The interpretation of R-squared is also unchanged.
• Heteroscedasticity invalidates the usual variance formulas for the OLS estimators.
• The usual F-tests and t-tests are not valid under heteroscedasticity.
• Under heteroscedasticity, OLS is no longer the best linear unbiased estimator (BLUE); there
may be more efficient linear estimators.

Testing for heteroscedasticity

It may still be interesting to test whether there is heteroscedasticity, because under
heteroscedasticity OLS may no longer be the most efficient linear estimator. Two common tests are:

• Breusch-Pagan test for heteroscedasticity
• White test for heteroscedasticity

Disadvantage of this form of the White test:

• Including all squares and interactions leads to a large number of estimated parameters
(e.g., k = 6 leads to 27 parameters to be estimated).
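
The Breusch-Pagan test is available in statsmodels; here is a minimal sketch on simulated data in
which the error spread grows with x by construction (all values are made up).

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)   # error variance rises with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(lm_pvalue)   # a small p-value rejects the null of homoscedasticity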
• Weighted least squares estimation

Example: Savings and income

• The transformed model is homoscedastic.
• OLS applied to the transformed model is the best linear unbiased estimator.
• OLS in the transformed model is weighted least squares (WLS).

Why is WLS more efficient than OLS in the original model? Observations with a large variance are
less informative than observations with a small variance and therefore should get less weight.
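
A minimal Python sketch of WLS, assuming (purely for illustration) that the error variance is
proportional to income²; statsmodels' WLS takes weights equal to the inverse of that variance.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
income = rng.uniform(1, 10, size=200)
savings = 0.2 + 0.1 * income + rng.normal(scale=0.05 * income)   # noisier at high income

X = sm.add_constant(income)
wls_res = sm.WLS(savings, X, weights=1.0 / income**2).fit()   # down-weight noisy points
print(wls_res.params)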


Autocorrelation

• Under the classical assumptions, the disturbance or error term of the model is independent
across observations; symbolically, for the model this means cov(ui, uj) = 0 for i ≠ j.
Autocorrelation is a violation of this assumption.
• Just as correlation measures the extent of a linear relationship between two variables,
autocorrelation measures the linear relationship between lagged values of a time series. It is
a characteristic of data that shows the degree of similarity between values of the same
variable over successive time intervals.
• Autocorrelation is diagnosed using a correlogram (ACF plot).
• In the presence of autocorrelation in the data, the ordinary least squares (OLS) estimators
lose the BLUE property, so the usual OLS inference is unreliable.
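
A minimal sketch of a correlogram in Python: the series below is simulated AR(1) noise, so its ACF
should show slowly decaying spikes (all parameters are made up).

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(4)
u = np.zeros(300)
for t in range(1, 300):
    u[t] = 0.8 * u[t - 1] + rng.normal()   # autocorrelated disturbances

plot_acf(u, lags=20)   # spikes outside the confidence bands indicate autocorrelation
plt.show()
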
Dummy Variable Models

Variables like sex (male or female) and colour (black, white) are defined on a nominal scale. Such
variables usually indicate the presence or absence of a “quality”. They can be quantified by
artificially constructing variables that take the values 1 and 0, where “1” usually indicates the
presence of an attribute and “0” its absence.

Usually, the indicator variables take on the values 0 and 1 to identify mutually exclusive classes of
the explanatory variable. For example: 1 if a person is male, 0 if female; 1 if a person is employed,
0 if unemployed.
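
Here is a minimal sketch of a regression on a 0/1 dummy; the wage data and the size of the group
gap are invented for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 100
female = rng.integers(0, 2, size=n).astype(float)   # 1 = female, 0 = male
wage = 20 - 3 * female + rng.normal(0, 2, size=n)   # built-in group gap of -3

X = sm.add_constant(female)
res = sm.OLS(wage, X).fit()
print(res.params)   # intercept ≈ mean wage of males; slope ≈ male-female differential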

Example: Consider a model with one quantitative regressor and indicator variables D1 and D2 for the
two classes of a qualitative variable, together with an intercept. The interpretation of the result is
essential. We proceed as follows:

Since every observation belongs to exactly one class, D1 + D2 = 1 for all observations, which is an
exact linear relationship with the intercept column. So multicollinearity is present in such cases.
Hence the rank of the matrix of explanatory variables falls short by 1, so β0, β1 and β2 are
indeterminate, and the least-squares method breaks down. So the proposition of introducing an
indicator variable for each class is intuitive, but it leads to serious consequences. This is known as
the dummy variable trap.
Example: Seasonal Analysis

Equation: Ŷt = 1,222.125D1t + 1,467.500D2t + 1,569.750D3t + 1,160.000D4t

Interpretation: The estimated coefficients represent the average, or mean, sales of refrigerators
(in thousands of units) in each season (i.e., quarter). Thus, the average sale of refrigerators in the
first quarter, in thousands of units, is about 1,222, that in the second quarter about 1,468, that in
the third quarter about 1,570, and that in the fourth quarter about 1,160.

Incidentally, instead of assigning a dummy for each quarter and suppressing the intercept term to
avoid the dummy variable trap, we could assign only three dummies and include the intercept term.
Suppose we treat the first quarter as the reference quarter and assign dummies to the second, third,
and fourth quarters. This produces the following regression results :

Equation: Ŷt = 1,222.13 + 245.37D2t + 347.62D3t − 62.12D4t

Interpretation : Since we are treating the first quarter as the benchmark, the coefficients attached to
the various dummies are now differential intercepts, showing by how much the average value of Y in
the quarter that receives a dummy value of 1 differs from that of the benchmark quarter. Put
differently, the coefficients on the seasonal dummies will give the seasonal increase or decrease in
the average value of Y relative to the base season.
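
As a quick arithmetic check, adding each differential intercept to the benchmark recovers the
quarterly means from the first regression: 1,222.13 + 245.37 ≈ 1,467.5 (second quarter),
1,222.13 + 347.62 ≈ 1,569.75 (third quarter), and 1,222.13 − 62.12 ≈ 1,160 (fourth quarter).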
THE LOGIT and PROBIT Model

THE LOGIT MODEL

 Here X is income and Y = 1 means the family owns a house. But now consider the following
representation of home ownership:

Pi = E(Y = 1 | Xi) = 1 ∕ (1 + e^−(β1 + β2Xi))   (2)

 For ease of exposition, we write (2) as

Pi = 1 ∕ (1 + e^−Zi) = e^Zi ∕ (1 + e^Zi)   (3)

where Zi = β1 + β2Xi.
 Equation (3) represents what is known as the cumulative logistic distribution function. It is
easy to verify that as Zi ranges from −∞ to +∞, Pi ranges between 0 and 1, and that Pi is
nonlinearly related to Zi (i.e., Xi), thus satisfying the two requirements considered earlier.
 But it seems that in satisfying these requirements, we have created an estimation problem,
because Pi is nonlinear not only in X but also in the β's, as can be seen clearly from (2). This
means that we cannot use the familiar OLS procedure to estimate the parameters. But this
problem is more apparent than real, because (2) can be linearized, as follows.
 If Pi, the probability of owning a house, is given by (3), then (1 − Pi), the probability of not
owning a house, is

1 − Pi = 1 ∕ (1 + e^Zi)   (4)
Now Pi ∕ (1 − Pi) is simply the odds ratio in favour of owning a house: the ratio of the probability that a
family will own a house to the probability that it will not. Thus, if Pi = 0.8, the odds are 4 to 1 in
favour of the family owning a house. Taking the ratio of (3) to (4) gives

Pi ∕ (1 − Pi) = e^Zi   (5)

and taking the natural logarithm gives

Li = ln[Pi ∕ (1 − Pi)] = Zi = β1 + β2Xi   (6)
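
As a worked check, at Pi = 0.8 the odds are 0.8 ∕ 0.2 = 4 and the logit is L = ln 4 ≈ 1.386; at Pi = 0.5
the odds are even and L = ln 1 = 0.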

That is, L, the log of the odds ratio, is not only linear in X but also (from the estimation viewpoint)
linear in the parameters. L is called the logit, and hence the name logit model for models like (6).
Notice these features of the logit model:

1. As P goes from 0 to 1 (i.e., as Z varies from −∞ to +∞), the logit L goes from −∞ to +∞.
That is, although the probabilities (of necessity) lie between 0 and 1, the logits are
not so bounded.
2. Although L is linear in X, the probabilities themselves are not. This property is in
contrast with the LPM (1), where the probabilities increase linearly with X.
3. Although we have included only a single X variable, or regressor, in the
preceding model, one can add as many regressors as may be dictated by the
underlying theory.

4. If L, the logit, is positive, it means that when the value of the regressor(s) increases,
the odds that the regressand equals 1 (meaning some event of interest happens)
increase. If L is negative, the odds that the regressand equals 1 decrease as the
value of X increases. To put it differently, the logit becomes negative and
increasingly large in magnitude as the odds ratio decreases from 1 to 0, and becomes
increasingly large and positive as the odds ratio increases from 1 to infinity.
5. More formally, the interpretation of the logit model given in (6) is as follows: β2, the
slope, measures the change in L for a unit change in X; that is, it tells how the log-odds
in favor of owning a house change as income changes by a unit, say $1,000.
The intercept β1 is the value of the log-odds in favour of owning a house if income is
zero. Like most interpretations of intercepts, this interpretation may not have any
physical meaning.
6. Given a certain level of income, say X, if we actually want to estimate not the odds in
favor of owning a house but the probability of owning a house itself, this can be done
directly from (3) once the estimates of β1 and β2 are available. This, however, raises
the most important question: how do we estimate β1 and β2 in the first place? (In
practice, they are estimated by maximum likelihood.)
7. Whereas the LPM assumes that Pi is linearly related to Xi, the logit model assumes
that the log of the odds ratio is linearly related to Xi.
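
A minimal numerical sketch of equations (3) and (6) in Python; the β values and incomes are made
up for illustration.

import numpy as np

b1, b2 = -4.0, 0.1                   # hypothetical intercept and slope
X = np.array([20.0, 40.0, 60.0])     # e.g. income in thousands of dollars

Z = b1 + b2 * X
P = 1.0 / (1.0 + np.exp(-Z))         # cumulative logistic CDF, eq. (3)
L = np.log(P / (1.0 - P))            # the logit, eq. (6)
print(P, L)                          # L reproduces Z: linear in X, while P is not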

THE PROBIT MODEL


The estimating model that emerges from the normal CDF is popularly known as the probit model.
To motivate the probit model, assume that in our home ownership example the decision of
the ith family to own a house or not depends on an unobservable utility index Ii (also known
as a latent variable) that is determined by one or more explanatory variables, say income Xi,
in such a way that the larger the value of the index Ii, the greater the probability of the family
owning a house. We express the index Ii as

Ii = β1 + β2Xi   (1)

where Xi is the income of the ith family.

How is the (unobservable) index related to the actual decision to own a house? As before, let Y = 1 if
the family owns a house and Y = 0 if it does not. It is reasonable to assume that there is a
critical or threshold level of the index, call it Ii*, such that if Ii exceeds Ii*, the family will own a house,
and otherwise it will not.
The threshold Ii*, like Ii, is not observable, but if we assume that it is normally distributed with the
same mean and variance, it is possible not only to estimate the parameters of the index given in (1),
but also to get some information about the unobservable index itself. The calculation is as follows.

Given the assumption of normality, the probability that Ii* is less than or equal to Ii can be computed
from the standardized normal CDF as

Pi = P(Y = 1 | X) = P(Ii* ≤ Ii) = F(Ii) = F(β1 + β2Xi)   (2)

Now, to obtain information on Ii, the utility index, as well as on β1 and β2, we take the inverse of (2) to
obtain

Ii = F⁻¹(Pi) = β1 + β2Xi   (3)

where F⁻¹ is the inverse of the normal CDF. What all this means can be made clear from the figure
(not reproduced here): in panel (a) we obtain from the ordinate the (cumulative) probability of
owning a house given Ii* ≤ Ii, whereas in panel (b) we obtain from the abscissa the value of Ii
given the value of Pi, which is simply the reverse of the former.
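
A minimal numerical sketch of the probit relations (2) and (3) in Python, using scipy's standard
normal CDF and its inverse; the β values and incomes are made up for illustration.

import numpy as np
from scipy.stats import norm

b1, b2 = -2.0, 0.05                  # hypothetical intercept and slope
X = np.array([20.0, 40.0, 60.0])     # e.g. income in thousands of dollars

I = b1 + b2 * X                      # the utility index, eq. (1)
P = norm.cdf(I)                      # probability of owning a house, eq. (2)
I_back = norm.ppf(P)                 # the inverse CDF recovers the index, eq. (3)
print(P, I_back)                     # I_back equals I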
