Chapter 2 Econometrics
Simple Linear Regression Model
• Economic theories are mainly concerned with the relationships among various economic variables.
• These relationships can predict the effect of one variable on another.
• The functional relationships of these variables define the dependence of one variable upon the other in a specific form.
• The specific functional form may be linear, quadratic, logarithmic, exponential, or any other form.
• Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables.
• The basic idea of regression is to estimate the population parameters from a sample.
Stochastic and non-stochastic relationships
• A relationship between X and Y, characterized as Y = f(X), is said to be non-stochastic if for each value of the independent variable (X) there is one and only one corresponding value of the dependent variable (Y). That is,
$Y_i = \alpha + \beta X_i$    (1)
• A relationship between X and Y is said to be stochastic if for a particular value of X there is a whole probability distribution of values of Y:
$Y_i = \alpha + \beta X_i + u_i$    (2)
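The difference between relationships (1) and (2) can be sketched with a small simulation. This is only an illustration: the parameter values `alpha = 2.0`, `beta = 0.5`, `sigma = 1.0` and the function names are hypothetical, and the disturbance is assumed to be normally distributed.

```python
import random

def deterministic_y(x, alpha=2.0, beta=0.5):
    """Non-stochastic relationship: exactly one Y for each X."""
    return alpha + beta * x

def stochastic_y(x, rng, alpha=2.0, beta=0.5, sigma=1.0):
    """Stochastic relationship: Y is the deterministic part plus a random disturbance u."""
    u = rng.gauss(0.0, sigma)  # assumed normal disturbance with mean zero
    return alpha + beta * x + u

rng = random.Random(0)  # fixed seed so the run is repeatable
x = 10.0

fixed_draws = [deterministic_y(x) for _ in range(5)]
random_draws = [stochastic_y(x, rng) for _ in range(5)]

print("non-stochastic:", fixed_draws)   # the same value five times
print("stochastic:    ", random_draws)  # five different values scattered around 7.0
```

For the same X, the non-stochastic function always returns the same Y, while the stochastic one returns a different draw each time.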
The disturbance term u may be the cumulative effect of the following factors:
• Omission of variables from the function
• Random behavior of human beings
• Imperfect specification of the mathematical form of the model
• Error of aggregation
• Error of measurement
Assumptions
i. The model is linear in parameters.
This is because parameters that enter the model non-linearly are difficult to estimate.
ii. $u_i$ is a random real variable.
This means that the value which u may assume in any one period depends on chance; it may be positive, negative or zero.
iii. The mean value of u in any particular period is zero: $E(u_i) = 0$.
That is, the positive and negative values of u cancel each other.
iv. The variance of u is constant in each period: $\mathrm{Var}(u_i) = \sigma^2$.
3/15/2023
v. The random variable (u) has a normal distribution.
This means the values of u (for each X) have a bell-shaped symmetrical distribution about their zero mean and constant variance $\sigma^2$, i.e. $u_i \sim N(0, \sigma^2)$.
vi. The correlation between any two $u_i$ and $u_j$ ($i \neq j$) is zero (the assumption of no autocorrelation).
Algebraically,
$\mathrm{Cov}(u_i, u_j) = E\{[u_i - E(u_i)][u_j - E(u_j)]\} = E(u_i u_j) = 0$
viii. u is independent of the explanatory variables.
This means there is no correlation between the random variable and the explanatory variable. If two variables are unrelated their covariance is zero, so $\mathrm{Cov}(X_i, u_i) = 0$.
ix. The explanatory variables are measured without error:
the regressors are error free, while the Y values may or may not include errors of measurement.
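The assumptions on the disturbance term can be illustrated by drawing a large artificial sample of u and checking its sample moments. This is a sketch under stated assumptions: the sample size, the seed, `sigma = 2.0` and the uniform range for X are all hypothetical choices, and the disturbances are generated to satisfy the assumptions by construction.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 10_000
sigma = 2.0               # hypothetical standard deviation of u

x = [rng.uniform(0.0, 100.0) for _ in range(n)]   # regressor values
u = [rng.gauss(0.0, sigma) for _ in range(n)]     # disturbances drawn independently of x

mean_u = statistics.fmean(u)      # zero-mean assumption: should be close to 0
var_u = statistics.pvariance(u)   # constant-variance assumption: close to sigma**2 = 4

# Sample covariance between X and u (independence from the regressor).
mean_x = statistics.fmean(x)
cov_xu = sum((xi - mean_x) * (ui - mean_u) for xi, ui in zip(x, u)) / n

# Sample covariance between consecutive disturbances (no autocorrelation).
cov_lag1 = sum((u[i] - mean_u) * (u[i + 1] - mean_u) for i in range(n - 1)) / (n - 1)

print(f"mean of u: {mean_u:.4f}")
print(f"variance of u: {var_u:.4f}")
print(f"cov(X, u): {cov_xu:.4f}")
print(f"cov(u_i, u_i+1): {cov_lag1:.4f}")
```

In a finite sample none of these quantities is exactly zero (or exactly $\sigma^2$); the assumptions refer to the population moments.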
We can now use the above assumptions to derive the following basic concepts.
A. The dependent variable $Y_i$ is normally distributed, i.e.
$Y_i \sim N(\alpha + \beta X_i, \sigma^2)$    (2.7)
Proof:
Mean: $E(Y_i) = E(\alpha + \beta X_i + u_i) = \alpha + \beta X_i$, since $E(u_i) = 0$.
Variance: $\mathrm{Var}(Y_i) = E[Y_i - E(Y_i)]^2 = E(u_i^2) = \sigma^2$.
Because $Y_i$ is a linear function of the normally distributed $u_i$, it is itself normally distributed.
Methods of Estimation
• Specifying the model and stating its underlying assumptions is the first stage of any econometric application.
• The next step is the estimation of the numerical values of the parameters of economic relationships.
• The parameters of the simple linear regression model can be estimated by various methods.
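• From equation (4), we get
$\sum Y_i = n\hat{\alpha} + \hat{\beta}\sum X_i$    (6)
• Then dividing by n and rearranging,
$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$    (7)
• From equation (5), we obtain
$\sum X_i Y_i = \hat{\alpha}\sum X_i + \hat{\beta}\sum X_i^2$    (8)
Equations (6) and (8) are called the Normal Equations.
• Substituting $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$ in equation (8), we get:
$\hat{\beta} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$    (9)
OR
$\hat{\beta} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
In deviation form, with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$,
$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}$    (10)
Second Order Condition (SOC):
• $\sum e_i^2$ is minimized when the second-order derivative is positive.
Take the second-order derivative of $\sum e_i^2$ with respect to $\hat{\beta}$:
$\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}^2} = 2\sum X_i^2$
which is positive, so that a true minimum has been obtained.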
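The estimators in equations (7), (9) and (10) can be computed directly from data. The sketch below uses a small made-up data set (the five `x` and `y` values are hypothetical) and checks that the raw-sum form (9) and the deviation form (10) give the same slope, and that the estimates satisfy the two normal equations.

```python
def ols_deviation_form(x, y):
    """Return (alpha_hat, beta_hat) using the deviation-form slope, equation (10)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum_xy / sum_xx
    alpha_hat = y_bar - beta_hat * x_bar      # equation (7)
    return alpha_hat, beta_hat

def slope_raw_sums(x, y):
    """Slope from raw sums, equation (9)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    denominator = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    return numerator / denominator

# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

alpha_hat, beta_hat = ols_deviation_form(x, y)
beta_hat_raw = slope_raw_sums(x, y)

# Residuals from the two normal equations, (6) and (8); both should be zero.
n = len(x)
normal_eq_6 = sum(y) - (n * alpha_hat + beta_hat * sum(x))
normal_eq_8 = sum(xi * yi for xi, yi in zip(x, y)) - (
    alpha_hat * sum(x) + beta_hat * sum(xi ** 2 for xi in x)
)

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
print(f"beta_hat from raw sums = {beta_hat_raw:.4f}")
```

For this sample the slope is 0.6 and the intercept is 2.2 under either formula.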
The ‘closeness’ of an OLS estimate to the true population parameter is measured by the mean and variance of the sampling distribution of the estimates of the different econometric methods.
We assume that we get a very large number of samples, each of size ‘n’; we compute the estimate from each sample and for each econometric method, and we form their distributions.
We next compare the expected values and the variances of these distributions, and we choose among the alternative estimates the one whose distribution is concentrated as closely as possible around the true parameter.
An estimator is called BLUE if it is:
1. Linear: a linear function of the random variable Y.
2. Unbiased: its average or expected value is equal to the true population parameter.
3. Minimum variance: it has the minimum variance in the class of linear and unbiased estimators. An unbiased estimator with the least variance is known as the best (efficient) estimator.
The detailed proofs of these properties are presented below.
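As an informal illustration of the repeated-sampling idea (not a proof of any BLUE property), the sketch below draws many samples of the same size from a known model, estimates the slope by OLS in each, and looks at the mean and variance of the resulting estimates. The true parameters, the fixed regressor values, the number of replications and the normal disturbance are all hypothetical choices.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope in deviation form: sum(x_i * y_i) / sum(x_i ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sum_xy / sum_xx

rng = random.Random(1)                    # fixed seed so the run is repeatable
alpha, beta, sigma = 1.0, 2.0, 3.0        # hypothetical "true" population values
x = [float(i) for i in range(1, 21)]      # regressors held fixed across samples, n = 20
replications = 5000

slopes = []
for _ in range(replications):
    # New disturbances in every sample, same X values.
    y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = statistics.fmean(slopes)     # centre of the sampling distribution
var_slope = statistics.pvariance(slopes)  # spread of the sampling distribution

print(f"true beta = {beta}, mean of estimates = {mean_slope:.4f}")
print(f"variance of estimates = {var_slope:.5f}")
```

The mean of the simulated estimates lies very close to the true slope, which is what unbiasedness describes; comparing `var_slope` across competing estimators would be the simulated counterpart of the minimum-variance comparison.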
Hypothesis Testing
A. Test of Overall Significance (Tests of the ‘goodness of fit’ with $R^2$)
• $R^2$ is the most commonly used measure of the goodness of fit of a regression line.
• It is used to find out how “well” the sample regression line fits the data.
• $R^2$ shows the percentage of the total variation of the dependent variable that can be explained by changes in the explanatory variable(s) included in the model.
• To compute $R^2$, recall that
$Y_i = \hat{Y}_i + e_i$
Or in the deviation form
$y_i = \hat{y}_i + e_i$
$R^2$ is the explained share of the total variation, $R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2}$, OR equivalently
$R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2}$
where $\sum e_i^2$ is the unexplained (residual) variation and $\sum y_i^2$ is the total variation of Y around its mean.
• The value of $R^2$ falls between 0 and 1, i.e. $0 \le R^2 \le 1$.
• If $R^2$ is 0, there is no relationship between the regressand and the regressor whatsoever (i.e. $\hat{\beta} = 0$, or $\hat{Y}_i = \hat{\alpha} = \bar{Y}$).
• $R^2$ of 1 means a perfect fit, that is $\hat{Y}_i = Y_i$ for each i.
• Suppose $R^2 = 0.9$. This means that the regression line gives a good fit to the observed data, since this line explains 90% of the total variation of the Y values around their mean. The remaining 10% of the total variation in Y is unaccounted for by the regression line and is attributed to the factors included in the disturbance variable u.
Derivation of F (ANOVA) regression (Tests for the coefficient of determination, $R^2$)
The largest value that $R^2$ can assume is 1 (in which case all observations fall on the regression line), and the smallest it can assume is zero.
A low value of $R^2$ is an indication that:
• X is a poor explanatory variable in the sense that
– variation in X leaves Y unaffected, or
– while X is a relevant variable, its influence on Y is weak compared to some other variables that are omitted from the regression equation, or
• the regression equation is misspecified (for example, an exponential relationship might be more appropriate).
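The computation of $R^2$ described above can be sketched as follows. The data are hypothetical, and the names `tss`, `ess` and `rss` (total, explained and residual sums of squares) are labels introduced here for illustration. The sketch fits the line by OLS, splits the total variation into its explained and residual parts, and computes $R^2$ both ways.

```python
def r_squared(x, y):
    """Return (ess / tss, 1 - rss / tss) for a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # OLS estimates in deviation form.
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    alpha_hat = y_bar - beta_hat * x_bar

    fitted = [alpha_hat + beta_hat * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    tss = sum((yi - y_bar) ** 2 for yi in y)        # total variation
    ess = sum((fi - y_bar) ** 2 for fi in fitted)   # explained variation
    rss = sum(e ** 2 for e in residuals)            # unexplained variation
    return ess / tss, 1.0 - rss / tss

# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r2_explained, r2_residual = r_squared(x, y)
print(f"R^2 from explained variation: {r2_explained:.4f}")
print(f"R^2 from residual variation:  {r2_residual:.4f}")
```

Both expressions give 0.6 for this sample, so the fitted line explains 60% of the variation of Y around its mean.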
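• The statistical significance of the parameter estimates can be tested in various ways. The most common tests are:
i) Standard error test
ii) Student’s t-test
iii) Confidence interval test
• All of these testing procedures reach the same conclusion.
For the standard error test:
• First: compute the standard errors of the parameters:
$SE(\hat{\beta}) = \sqrt{\mathrm{var}(\hat{\beta})}$
$SE(\hat{\alpha}) = \sqrt{\mathrm{var}(\hat{\alpha})}$
• Second: compare the standard errors with the numerical values of $\hat{\alpha}$ and $\hat{\beta}$.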
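Decision rule:
• If $SE(\hat{\beta}_i) > \frac{1}{2}|\hat{\beta}_i|$, accept the null hypothesis and reject the alternative hypothesis. We conclude that $\hat{\beta}_i$ is statistically insignificant.
• If $SE(\hat{\beta}_i) < \frac{1}{2}|\hat{\beta}_i|$, reject the null hypothesis and accept the alternative hypothesis. We conclude that $\hat{\beta}_i$ is statistically significant.
Numerical example: Suppose that from a sample of size n = 30, we estimate the following supply function.
$Q = 120 + 0.6p + e_i$
SE: (1.7) (0.025)
Test the significance of the slope parameter at the 5% level of significance using the standard error test.
• We can derive the t-value of the OLS estimates:
$t = \frac{\hat{\beta}_i - \beta_i}{SE(\hat{\beta}_i)}$
Since we have two parameters in simple linear regression with an intercept different from zero, our degrees of freedom are n − 2.
Like the standard error test, we formally test the hypothesis $H_0: \beta_i = 0$ against the alternative $H_1: \beta_i \neq 0$ for the slope parameter, and $H_0: \alpha = 0$ against the alternative $H_1: \alpha \neq 0$ for the intercept.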
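For the supply-function example above (slope 0.6 with standard error 0.025), the standard error test can be sketched in a few lines. The rule coded here, "significant when the standard error is less than half the absolute value of the estimate", is the usual rule of thumb for this test and is stated as an assumption; the function name is hypothetical.

```python
def standard_error_test(estimate, std_error):
    """Return True when the estimate is judged significant by the standard error test.

    Assumed rule: significant if SE < |estimate| / 2, insignificant otherwise.
    """
    return std_error < abs(estimate) / 2.0

# Slope and its standard error from the supply-function example.
slope = 0.6
se_slope = 0.025

half_slope = abs(slope) / 2.0            # 0.3
slope_is_significant = standard_error_test(slope, se_slope)

print(f"SE = {se_slope}, half of the estimate = {half_slope}")
print("slope is statistically significant" if slope_is_significant
      else "slope is statistically insignificant")
```

Since 0.025 is well below 0.3, the slope of the supply function is judged statistically significant under this rule.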
To undertake the t-test we follow the following steps.
Step 1: Calculate t*, which is called the calculated value of t, by taking the value of β in the null hypothesis. In our case β = 0, so t* becomes:
$t^* = \frac{\hat{\beta} - 0}{SE(\hat{\beta})} = \frac{\hat{\beta}}{SE(\hat{\beta})}$
Step 2: Choose the level of significance.
• The level of significance is the probability of making a ‘wrong’ decision, i.e. the probability of rejecting the null hypothesis when it is actually true (committing a type I error).
• It is usual in econometric research to choose the 1%, 5% or 10% level of significance. This means that in making the decision, we tolerate being ‘wrong’ in one, five or ten cases out of a hundred.
Step 3: Check whether it is a one-tail or a two-tail test.
• If the inequality sign in H1 is ≠, this implies a two-tail test: divide the chosen level of significance by two and determine the critical value of t, called tc (t-tabulated).
• But if the inequality sign is either > or <, it indicates a one-tail test, and there is no need to divide the chosen level of significance by two to obtain the critical value from the t-table.
Step 4: Obtain the critical value of t, called tc, at $\alpha/2$ and n − 2 degrees of freedom for a two-tail test (here $\alpha$ denotes the chosen level of significance, not the intercept).
Step 5: Compare t* (the computed value of t) and tc (the critical value of t).
• If |t*| > tc, reject H0 and accept H1. The conclusion is that $\hat{\beta}$ is statistically significant.
• If |t*| < tc, accept H0 and reject H1. The conclusion is that $\hat{\beta}$ is statistically insignificant.
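Numerical Example:
Suppose that from a sample of size n = 20 we estimate the following consumption function, where X is the explanatory variable:
$C = 100 + 0.70X + e$
(75.5) (0.21)
The values in the brackets are standard errors. We want to test the null hypothesis $H_0: \beta_i = 0$ against the alternative $H_1: \beta_i \neq 0$ using the t-test at the 5% level of significance.
a. The calculated t-value is
$t^* = \frac{\hat{\beta} - 0}{SE(\hat{\beta})} = \frac{\hat{\beta}}{SE(\hat{\beta})} = \frac{0.70}{0.21} \approx 3.3$
b. Since the alternative hypothesis (H1) is stated with the inequality sign (≠), it is a two-tail test. Hence we divide the level of significance by two, $\alpha/2 = 0.05/2 = 0.025$, to obtain the critical value of t at $\alpha/2 = 0.025$ and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2. From the t-table, tc at the 0.025 level of significance and 18 df is 2.10.
c. Since t* = 3.3 and tc = 2.1, t* > tc. It implies that $\hat{\beta}$ is statistically significant.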
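The arithmetic of this t-test can be reproduced with a short sketch. The slope, its standard error, the sample size and the tabulated critical value 2.10 are taken from the example; the function name and its `beta_null` parameter are hypothetical, and the critical value is entered by hand rather than computed.

```python
def t_test(beta_hat, std_error, t_critical, beta_null=0.0):
    """Return (t_star, significant) for a two-tail test of H0: beta = beta_null."""
    t_star = (beta_hat - beta_null) / std_error
    significant = abs(t_star) > t_critical
    return t_star, significant

n = 20
degrees_of_freedom = n - 2     # two estimated parameters: intercept and slope
t_critical = 2.10              # t-table value at 0.025 and 18 df, as quoted in the example

t_star, significant = t_test(beta_hat=0.70, std_error=0.21, t_critical=t_critical)

print(f"df = {degrees_of_freedom}, t* = {t_star:.2f}, tc = {t_critical}")
print("reject H0: slope is significant" if significant
      else "do not reject H0: slope is insignificant")
```

The computed t* of about 3.33 exceeds 2.10, which matches the conclusion in step c.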
Numerical Example:
Suppose we have estimated a regression line from a sample of 20 observations, with
$\hat{\beta} = 2.88$
$SE(\hat{\beta}) = 0.85$
The confidence interval for the slope is
$\hat{\beta} \pm SE(\hat{\beta})\, t_c$
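The interval $\hat{\beta} \pm SE(\hat{\beta})\,t_c$ can be evaluated for these numbers as sketched below. The confidence level is not stated in the surviving text, so a 95% interval is assumed here; with n = 20 there are 18 degrees of freedom, and the two-tail t-table value of about 2.101 is entered by hand. The function name is hypothetical.

```python
def confidence_interval(beta_hat, std_error, t_critical):
    """Return (lower, upper) for beta_hat +/- std_error * t_critical."""
    margin = std_error * t_critical
    return beta_hat - margin, beta_hat + margin

beta_hat = 2.88
se_beta = 0.85
t_critical = 2.101   # assumed: 95% confidence, 18 degrees of freedom (n - 2 with n = 20)

lower, upper = confidence_interval(beta_hat, se_beta, t_critical)
contains_zero = lower <= 0.0 <= upper

print(f"95% confidence interval for the slope: ({lower:.3f}, {upper:.3f})")
print("zero lies inside the interval" if contains_zero
      else "zero lies outside the interval")
```

Under these assumptions the interval runs from roughly 1.094 to 4.666. Because zero lies outside it, the confidence interval test would treat the slope as statistically significant at the 5% level, in line with the other two tests.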