Chapter 2 Econometrics

The document discusses the concept of simple linear regression, focusing on the estimation of population parameters from sample data and the relationships between economic variables. It outlines the assumptions of regression models, methods of estimation, and statistical properties of Ordinary Least Squares (OLS) estimators, including the Gauss-Markov theorem. Additionally, it covers hypothesis testing for overall significance and individual significance of OLS parameters.

Simple Linear Regression Model

 The basic idea of regression is to estimate the population parameters from a sample.
• Economic theories are mainly concerned with the relationships among various economic variables.
• These relationships can predict the effect of one variable on another.
• The functional relationships of these variables define the dependence of one variable upon the other in a specific form.
• The specific functional forms may be linear, quadratic, logarithmic, exponential, or any other form.
• Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables.

Stochastic and non-stochastic relationships

• A relationship between X and Y, characterized as Y = f(X), is said to be non-stochastic if for each value of the independent variable (X) there is one and only one corresponding value of the dependent variable (Y). That is

Y = α + βX −−−−−−−(1)

• A relationship between X and Y is said to be stochastic if for a particular value of X there is a whole probability distribution of values of Y.

Y = α + βX + u −−−−−−−(2)
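As an illustration (not part of the original slides), here is a minimal Python sketch of the distinction, with hypothetical parameter values α = 2, β = 0.5, σ = 1: the non-stochastic relationship (1) returns exactly one Y for each X, while the stochastic relationship (2) adds a random disturbance, so repeated evaluations at the same X give different Y values.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 0.5, 1.0   # hypothetical parameter values

def y_nonstochastic(x):
    # Equation (1): one and only one Y for each X
    return alpha + beta * x

def y_stochastic(x):
    # Equation (2): a whole distribution of Y values for each X
    return alpha + beta * x + rng.normal(0.0, sigma)

x = 10.0
print(y_nonstochastic(x), y_nonstochastic(x))  # identical values
print(y_stochastic(x), y_stochastic(x))        # differ because of the disturbance u
```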

Types of Regression Models

• A regression model with one explanatory variable is a simple regression model; a model with two or more explanatory variables is a multiple regression model. Each type may be linear or non-linear in form.

Two Variable Linear Regression Model

 The stochastic relationship with one explanatory variable is called the simple linear regression model.
 The true relationship which connects the variables involved has two parts: a deterministic part (α + βXi) and a part represented by the random error term ui.
 Yi = α + βXi + ui is called the population regression function (PRF) because Y and X represent their respective population values, and α and β are called the true population parameters.
 The parameters estimated from the sample values of Y and X are called the estimators of the true parameters and are symbolized as α̂ and β̂.
 Yi = α̂ + β̂Xi + ei is called the SRF (it shows the estimated relationship between Y and X).
 ei represents the sample residual counterpart of ui.

Assumptions

ui may be the cumulative effect of the following factors:
• Omission of variables from the function
• Random behavior of human beings
• Imperfect specification of the mathematical form of the model
• Error of aggregation
• Error of measurement

i. The model is linear in parameters.
 This is because if the parameters are non-linear it is difficult to estimate them.

ii. ui is a random real number.
 This means that the value which ui may assume in any one period depends on chance; it may be positive, negative or zero.

iii. The mean value of ui in any particular period is zero: E(ui) = 0.
 The positive and negative values of ui cancel each other.

iv. The variance of ui is constant in each period: Var(ui) = σ².
 This is called the homoscedasticity assumption, and the constant variance itself is called homoscedastic variance.

v. The random variable ui has a normal distribution: ui ~ N(0, σ²).
 This means the values of ui (for each Xi) have a bell-shaped symmetrical distribution about their zero mean and constant variance σ².

vi. The correlation between any two disturbances ui and uj is zero (the assumption of no autocorrelation). Algebraically,
Cov(ui, uj) = E{[ui − E(ui)][uj − E(uj)]} = E(uiuj) = 0 for i ≠ j.

vii. The Xi values are fixed in repeated sampling, i.e., non-stochastic.

viii. ui is independent of the explanatory variables.
 This means there is no correlation between the random variable and the explanatory variable: Cov(ui, Xi) = E(uiXi) = 0.
 If two variables are unrelated, their covariance is zero.

ix. The explanatory variables are measured without error.
 The regressors are error free, while the Y values may or may not include errors of measurement.

x. The number of observations n must be greater than the number of parameters to be estimated.
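As a hedged illustration (not part of the original slides), the sketch below draws disturbances that satisfy assumptions ii–vi, i.e. independent ui ~ N(0, σ²), and checks the sample mean, variance, and first-order autocorrelation against the assumed values; σ² = 4 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2, n = 4.0, 100_000                   # assumed homoscedastic variance, sample size

u = rng.normal(0.0, np.sqrt(sigma2), n)    # assumptions ii-v: iid N(0, sigma^2) draws

print(u.mean())                            # ~0       (assumption iii: zero mean)
print(u.var())                             # ~4       (assumption iv: constant variance)
print(np.corrcoef(u[:-1], u[1:])[0, 1])    # ~0       (assumption vi: no autocorrelation)
```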

 We can now use the above assumptions to derive the following basic concept.

A. The dependent variable Yi is normally distributed, i.e.
Yi ~ N(α + βXi, σ²) ………………………………(2.7)

Proof:
Mean: E(Yi) = E(α + βXi + ui) = α + βXi, since E(ui) = 0.
Variance: Var(Yi) = E[Yi − E(Yi)]² = E[α + βXi + ui − (α + βXi)]² = E(ui²) = σ² (since E(ui²) = σ²).
Hence Var(Yi) = σ².

Methods of Estimation

• Specifying the model and stating its underlying assumptions are the first stage of any econometric application.
• The next step is the estimation of the numerical values of the parameters of economic relationships.
• The parameters of the simple linear regression model can be estimated by various methods. The commonly used methods are:
– Ordinary least squares method (OLS)
– Maximum likelihood method (MLM)
– Method of moments (MM)

1. Ordinary Least Squares Estimation (OLS)

• Given the model Yi = α + βXi + ui, Y and X represent their respective population values, and α and β are called the true parameters.
• But it is difficult to obtain the population values of Y and X, so we are forced to take the sample values of Y and X.
• The parameters estimated from the sample values of Y and X are called the estimators of the true parameters and are symbolized as α̂ and β̂.
• Estimation of α and β by OLS involves finding values for the estimates α̂ and β̂ which will minimize the sum of squared residuals (Σei²).

Least squares estimates

• The fitted line is Ŷi = α̂ + β̂Xi −−−−−−−(3)
• We know that ei = Yi − Ŷi = Yi − α̂ − β̂Xi.
• In order to minimize Σei², take the partial derivatives with respect to α̂ and β̂ and set them equal to zero:

∂Σei²/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) = 0 −−−−−−−(4)
∂Σei²/∂β̂ = −2ΣXi(Yi − α̂ − β̂Xi) = 0 −−−−−−−(5)

• From equation (4), we get
ΣYi = nα̂ + β̂ΣXi −−−−−(6)
• Dividing by n and rearranging gives
α̂ = Ȳ − β̂X̄ −−−−−−−(7)
• From equation (5), we obtain
ΣXiYi = α̂ΣXi + β̂ΣXi² −−−−(8)
 Equations (6) and (8) are called the normal equations.
• Substituting α̂ = Ȳ − β̂X̄ into equation (8), we get:

β̂ = (nΣXiYi − ΣXiΣYi) / (nΣXi² − (ΣXi)²) −−−−−(9)

or, equivalently,

β̂ = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²)

In deviation form, with xi = Xi − X̄ and yi = Yi − Ȳ,

β̂ = Σxiyi / Σxi² −−−−−−−(10)

Second Order Condition (SOC):
• Σei² is minimized when the SOC is positive.
 Taking the second-order derivative of Σei² with respect to β̂,
∂²Σei²/∂β̂² = 2ΣXi²
which is positive, so that a true minimum has been obtained.
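A minimal numpy sketch of formulas (7), (9) and (10) follows; the data are made up for illustration. The raw-sums form (9) and the deviation form (10) give the same slope estimate.

```python
import numpy as np

# hypothetical sample
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

# equation (9): slope from raw sums
beta_hat = (n * (X * Y).sum() - X.sum() * Y.sum()) / (n * (X**2).sum() - X.sum()**2)

# equation (10): slope in deviation form, with x = X - X̄ and y = Y - Ȳ
x, y = X - X.mean(), Y - Y.mean()
beta_hat_dev = (x * y).sum() / (x**2).sum()   # identical to beta_hat

# equation (7): intercept
alpha_hat = Y.mean() - beta_hat * X.mean()

print(alpha_hat, beta_hat, beta_hat_dev)
```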

2. Statistical Properties of OLS Estimators

 The statistical properties of OLS are based on the classical regression assumptions.
 We would like the OLS estimates, as compared to those of other econometric methods, to be as close as possible to the values of the true population parameters.
 'Closeness' of an OLS estimate to the true population parameter is measured by the mean and variance of the sampling distribution of the estimate under the different econometric methods.
 We assume that we get a very large number of samples, each of size 'n'; we compute the estimate from each sample for each econometric method, and we form their distributions.
 We next compare the expected values and the variances of these distributions, and we choose among the alternative estimates the one whose distribution is concentrated as close as possible around the true parameter.

Gauss-Markov Theorem

 Given the assumptions of the classical regression model, the OLS estimators, in the class of linear and unbiased estimators, have the minimum variance, i.e. the OLS estimators are BLUE (best linear unbiased estimators). The theorem gives theoretical justification for the popularity of OLS.
 An estimator is called BLUE if it is:
1. Linear: a linear function of the random variable Y.
2. Unbiased: its average or expected value is equal to the true population parameter.
3. Minimum variance: it has minimum variance in the class of linear and unbiased estimators. An unbiased estimator with the least variance is known as the best/efficient estimator.
 The detailed proof of these properties is presented below.

[The original slides 17–34 present the algebraic proofs that the OLS estimators are linear, unbiased, and of minimum variance, starting from the formulas used to estimate the OLS estimators; they appear as images in the source and are not recoverable here.]
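In place of the missing slides, here is a standard textbook sketch (not the original derivation) of the three BLUE properties for β̂ in deviation form, using equation (10) and assumptions iii, iv, vi, and vii:

```latex
% Linearity: \hat{\beta} is a linear function of the Y_i with fixed weights k_i
\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} = \sum_i k_i Y_i ,
\qquad k_i = \frac{x_i}{\sum_j x_j^2}

% Unbiasedness: substituting Y_i = \alpha + \beta X_i + u_i and using
% \sum_i k_i = 0 and \sum_i k_i X_i = 1,
\hat{\beta} = \beta + \sum_i k_i u_i
\quad\Longrightarrow\quad
E(\hat{\beta}) = \beta + \sum_i k_i\, E(u_i) = \beta

% Minimum variance: with homoscedastic, uncorrelated errors,
\operatorname{Var}(\hat{\beta}) = \sigma^2 \sum_i k_i^2 = \frac{\sigma^2}{\sum_i x_i^2},
% and any other linear unbiased estimator \tilde{\beta} = \sum_i w_i Y_i
% has variance at least as large as this.
```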

Hypothesis Testing

A. Test of Overall Significance (test of the 'goodness of fit' with R²)

• R² is the most commonly used measure of the goodness of fit of a regression line.
• It is used to find out how "well" the sample regression line fits the data.
• R² shows the percentage of the total variation of the dependent variable that can be explained by changes in the explanatory variable(s) included in the model.
• To compute R², recall that
Yi = Ŷi + ei
or, in deviation form,
yi = ŷi + ei

• By squaring and summing both sides, we obtain
Σyi² = Σŷi² + Σei² + 2Σŷiei
• Since Σŷiei = 0, this reduces to
Σyi² = Σŷi² + Σei²
where
• Σyi² represents the total variation of the actual Y values about their mean (TSS);
• Σŷi² represents the explained variation of the estimated Y values about their mean (ESS);
• Σei² represents the residual or unexplained variation of the Y values about the regression line (RSS).
• Hence, TSS = ESS + RSS.
• Dividing both sides by TSS, we obtain
1 = ESS/TSS + RSS/TSS
• We now define R² as the explained variation as a percentage of the total variation:
R² = ESS/TSS = Σŷi²/Σyi²
• Since ŷi = β̂xi, this can also be written as
R² = β̂²Σxi²/Σyi² = β̂Σxiyi/Σyi²
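Continuing the earlier numpy sketch (same hypothetical data), R² can be computed either as ESS/TSS or as 1 − RSS/TSS; the two agree because TSS = ESS + RSS.

```python
import numpy as np

# same hypothetical sample as the OLS sketch above
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x, y = X - X.mean(), Y - Y.mean()
beta_hat = (x * y).sum() / (x**2).sum()     # equation (10)
alpha_hat = Y.mean() - beta_hat * X.mean()  # equation (7)

Y_hat = alpha_hat + beta_hat * X            # fitted values
e = Y - Y_hat                               # residuals

TSS = (y**2).sum()                          # total variation about the mean
ESS = ((Y_hat - Y.mean())**2).sum()         # explained variation
RSS = (e**2).sum()                          # unexplained (residual) variation

print(ESS / TSS, 1 - RSS / TSS)             # the two R^2 formulas agree
```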

• Equivalently,
R² = 1 − RSS/TSS = 1 − Σei²/Σyi²
• The value of R² falls between 0 and 1, i.e., 0 ≤ R² ≤ 1.
• If R² is 0, there is no relationship between the regressand and the regressor whatsoever (i.e. β̂ = 0, so Ŷi = α̂ = Ȳ for all i).
• An R² of 1 means a perfect fit, that is, Ŷi = Yi for each i.
• Suppose R² = 0.9; this means that the regression line gives a good fit to the observed data, since this line explains 90% of the total variation of the Y values around their mean.
 The remaining 10% of the total variation in Y is unaccounted for by the regression line and is attributed to the factors captured by the disturbance term.

Derivation of the F (ANOVA) test (tests for the coefficient of determination R²)

 The largest value that R² can assume is 1 (in which case all observations fall on the regression line), and the smallest it can assume is zero.
 A low value of R² is an indication that:
 X is a poor explanatory variable, in the sense that
– variation in X leaves Y unaffected, or
– while X is a relevant variable, its influence on Y is weak compared to some other variables that are omitted from the regression equation, or
 the regression equation is misspecified (for example, an exponential relationship might be more appropriate).

B. Tests of Individual Significance (testing the significance of the OLS parameters)

• Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to:
 measure the size of the error, and
 determine the degree of confidence, in order to measure the validity of these estimates.
• This can be done by using various tests. The most common ones are:
i) the standard error test,
ii) the Student's t-test,
iii) the confidence interval test.
• All of these testing procedures reach the same conclusion.

Standard error test

• This test helps us to decide whether the estimates are significantly different from zero.
• Formally, we test the null hypothesis H0: βi = 0 against the alternative hypothesis H1: βi ≠ 0.
• The standard error test may be outlined as follows:
• First: compute the standard errors of the parameters:
SE(β̂) = √var(β̂)
SE(α̂) = √var(α̂)
• Second: compare the standard errors with the numerical values of the estimates.

Decision rule:

 If SE(β̂i) > β̂i/2, accept the null hypothesis and reject the alternative hypothesis. We conclude that β̂i is statistically insignificant.
 If SE(β̂i) < β̂i/2, reject the null hypothesis and accept the alternative hypothesis. We conclude that β̂i is statistically significant.

 Numerical example: suppose that from a sample of size n = 30 we estimate the following supply function:
Q = 120 + 0.6p + ei
SE: (1.7) (0.025)
Test the significance of the slope parameter at the 5% level of significance using the standard error test.
 Here SE(β̂) = 0.025 and β̂/2 = 0.6/2 = 0.3; since 0.025 < 0.3, we reject the null hypothesis and conclude that the slope is statistically significant.

Student's t-test

• Like the standard error test, this test is important for testing the significance of the parameters when the sample size is less than 30 and the unknown population parameter is normally distributed.
• We can derive the t-value of the OLS estimates as
t* = (β̂ − β)/SE(β̂)
with n − k degrees of freedom.
 Since we have two parameters in simple linear regression with an intercept different from zero, our degrees of freedom are n − 2.
 Like the standard error test, we formally test the hypothesis H0: βi = 0 against the alternative H1: βi ≠ 0 for the slope parameter, and H0: α = 0 against the alternative H1: α ≠ 0 for the intercept.
 To undertake the above test, we follow the steps below.

Step 1: Calculate t*, the computed value of t, by taking the value of β in the null hypothesis. In our case β = 0, so t* becomes:
t* = β̂/SE(β̂)

Step 2: Choose the level of significance.
• The level of significance is the probability of making a 'wrong' decision, i.e. the probability of rejecting the null hypothesis while it is true (the probability of committing a Type I error).
• It is usual in econometric research to choose the 1%, 5% or 10% level of significance.
 This means that in making the decision we allow (tolerate) one/five/ten out of a hundred to be 'wrong'.

Step 3: Check whether it is a one-tail or a two-tail test.
• If the inequality sign in H1 is ≠, this implies a two-tail test: divide the chosen level of significance by two and find the critical value of t, called tc (t-tabulated).
• But if the inequality sign is either > or <, it indicates a one-tail test, and there is no need to divide the chosen level of significance by two to obtain the critical value from the t-table.

Step 4: Obtain the critical value of t, called tc, at α/2 and n − 2 degrees of freedom for a two-tail test.

Step 5: Compare t* (the computed value of t) and tc (the critical value of t).
 If |t*| > tc, reject H0 and accept H1. The conclusion is that β̂ is statistically significant.
 If |t*| < tc, accept H0 and reject H1. The conclusion is that β̂ is statistically insignificant.

Numerical example:

Suppose that from a sample of size n = 20 we estimate the following consumption function:
C = 100 + 0.70Y + e
SE: (75.5) (0.21)
The values in the brackets are standard errors. We want to test the null hypothesis H0: βi = 0 against the alternative H1: βi ≠ 0 using the t-test at the 5% level of significance.

a. The t-value for the test statistic is:
t* = (β̂ − 0)/SE(β̂) = 0.70/0.21 ≈ 3.3

b. Since the alternative hypothesis H1 is stated with an inequality sign (≠), it is a two-tail test; hence we divide α/2 = 0.05/2 = 0.025 to obtain the critical value of t at α/2 = 0.025 and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2. From the t-table, tc at the 0.025 level of significance and 18 df is 2.10.

c. Since t* = 3.3 and tc = 2.1, t* > tc. It implies that β̂ is statistically significant.
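A small sketch of the same test in Python, using the example's numbers (β̂ = 0.70, SE = 0.21, n = 20); scipy's t.ppf replaces the printed t-table:

```python
from scipy import stats

beta_hat, se, n = 0.70, 0.21, 20
t_star = beta_hat / se                      # Step 1: t* under H0: beta = 0
alpha = 0.05                                # Step 2: level of significance
t_c = stats.t.ppf(1 - alpha / 2, df=n - 2)  # Steps 3-4: two-tail critical value, 18 df

print(t_star, t_c)                          # ~3.33 vs ~2.10
print("significant" if abs(t_star) > t_c else "insignificant")  # Step 5
```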

Confidence interval test

• In order to determine how close the estimate is to the true parameter, we must construct a confidence interval for the true parameter;
 in other words, we must establish limiting values around the estimate within which the true parameter is expected to lie with a certain "degree of confidence".
• We choose a probability in advance and refer to it as the confidence level (confidence coefficient).
• It is customary in econometrics to choose the 95% confidence level,
• i.e., the confidence limits, computed from the sample, would include the true population parameter in 95% of the cases.
 In 5% of the cases the population parameter will fall outside the confidence interval.

The limits within which the true β lies at the (1 − α)% degree of confidence are:
[β̂ − SE(β̂)tc, β̂ + SE(β̂)tc], where tc is the critical value of t at the α/2 level of significance and n − 2 degrees of freedom.

The test procedure is outlined as follows:
H0: β = 0
H1: β ≠ 0

Decision rule: if the hypothesized value of β in the null hypothesis is within the confidence interval, accept H0 and reject H1; the implication is that β̂ is statistically insignificant. If the hypothesized value of β in the null hypothesis is outside the limits, reject H0 and accept H1; this indicates that β̂ is statistically significant.

Numerical example:

Suppose we have estimated the following regression line from a sample of 20 observations:
Y = 128.5 + 2.88X + e
SE: (38.2) (0.85)
The values in the brackets are standard errors.

a. Construct a 95% confidence interval for the slope parameter.
b. Test the significance of the slope parameter using the constructed confidence interval.

Solution:

a. The limits within which the true β lies at the 95% confidence level are β̂ ± SE(β̂)tc, with
β̂ = 2.88
SE(β̂) = 0.85
tc at the 0.025 level of significance and 18 degrees of freedom is 2.10.
 β̂ ± SE(β̂)tc = 2.88 ± 2.10(0.85) = 2.88 ± 1.79
The confidence interval is (1.09, 4.67).

b. The value of β in the null hypothesis is zero, which lies outside the confidence interval. Hence β̂ is statistically significant.
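The same interval in a few lines of Python (numbers from the example; scipy's t.ppf replaces the t-table):

```python
from scipy import stats

beta_hat, se, n = 2.88, 0.85, 20
t_c = stats.t.ppf(0.975, df=n - 2)          # two-tail 5% critical value, 18 df (~2.10)

lower, upper = beta_hat - t_c * se, beta_hat + t_c * se
print(lower, upper)                         # ~ (1.09, 4.67)
# H0: beta = 0 is rejected because 0 lies outside the interval
print("significant" if not (lower <= 0 <= upper) else "insignificant")
```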
