
Chapter 3

A brief overview of the classical linear regression model
Regression

• Regression is probably the single most important tool at the econometrician's disposal.

But what is regression analysis?

• It is concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable, y) and one or more other variables (usually known as the independent variable(s), x).
Regression is different from Correlation

• If we say y and x are correlated, it means that we are treating y and x in a completely symmetrical way.

• In regression, we treat the dependent variable (y) and the independent variable(s) (x's) very differently. The y variable is assumed to be random or "stochastic" in some way, i.e. to have a probability distribution. The x variables are, however, assumed to have fixed ("non-stochastic") values in repeated samples.
Simple Regression

• For simplicity, say k = 1, where k is the number of explanatory variables. This is the situation where y depends on only one x variable.

• Examples of the kind of relationship that may be of interest include:
  – How asset returns vary with their level of market risk
  – Measuring the long-term relationship between stock prices and dividends.
Simple Regression: An Example

• Suppose that we have the following data on the excess returns on a fund
manager’s portfolio (“fund XXX”) together with the excess returns on a
market index:
Year, t    Excess return on fund XXX    Excess return on market index
           (= rXXX,t − rf,t)            (= rm,t − rf,t)
   1                17.8                          13.7
   2                39.0                          23.2
   3                12.8                           6.9
   4                24.2                          16.8
   5                17.2                          12.3

• We have some intuition that the beta on this fund is positive, and we
therefore want to find whether there appears to be a relationship between x
and y given the data that we have. The first stage would be to form a scatter
plot of the two variables.
Graph (Scatter Diagram)

[Scatter diagram: excess return on fund XXX (vertical axis, 0–45) against excess return on market portfolio (horizontal axis, 0–25).]
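The plot can also be reproduced outside EViews. Below is a minimal Python sketch (not part of the original slides; the variable names are our own) using matplotlib and the five observations from the table:

```python
import matplotlib.pyplot as plt

# Excess returns from the table above
market = [13.7, 23.2, 6.9, 16.8, 12.3]   # x: excess return on market index
fund = [17.8, 39.0, 12.8, 24.2, 17.2]    # y: excess return on fund XXX

plt.scatter(market, fund)
plt.xlabel("Excess return on market portfolio")
plt.ylabel("Excess return on fund XXX")
plt.title("Scatter diagram")
plt.show()
```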
Finding a Line of Best Fit

• We can use the general equation for a straight line,

  y = a + bx

  to get the line that best "fits" the data.

• However, this equation (y = a + bx) is completely deterministic.

• Is this realistic? No. So what we do is to add a random disturbance term, u, into the equation:

  y_t = α + β x_t + u_t,   where t = 1, 2, 3, 4, 5
Why do we include a Disturbance term?

• The disturbance term can capture a number of features:
  – We always leave out some determinants of y_t.
  – There may be errors in the measurement of y_t that cannot be modelled.
  – Random outside influences on y_t which we cannot model.
Determining the Regression Coefficients

• So how do we determine what α and β are?

• Choose α and β so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible):

[Figure: scatter of y against x with a straight line fitted through the points.]
Ordinary Least Squares

• The most common method used to fit a line to the data is known as
OLS (ordinary least squares).

• What we actually do is take each distance and square it (i.e. take the
area of each of the squares in the diagram) and minimise the total sum
of the squares (hence least squares).

• Tightening up the notation, let
  y_t denote the actual data point t,
  ŷ_t denote the fitted value from the regression line, and
  û_t denote the residual, y_t − ŷ_t.
Actual and Fitted Value

[Figure: for a given observation x_i, the actual value y_i, the fitted value ŷ_i on the regression line, and the residual û_i between them.]
How OLS Works

• So we minimise û₁² + û₂² + û₃² + û₄² + û₅², i.e. we minimise ∑_{t=1}^{5} û_t². This is the residual sum of squares (hence "least squares").

• Since û_t is the difference between the actual point, y_t, and the fitted value, ŷ_t, minimising

  ∑ (y_t − ŷ_t)²

  is equivalent to minimising ∑ û_t².
Deriving the OLS Estimator (cont’d)

• After some mathematical manipulation we have

  β̂ = (∑ x_t y_t − T x̄ ȳ) / (∑ x_t² − T x̄²)   and   α̂ = ȳ − β̂ x̄

• This method of finding the optimum is known as ordinary least squares.


• You are NOT required to know the derivation of the estimator.
• For the fund data above, this gives the fitted line

  ŷ_t = −1.74 + 1.64 x_t

  so, for example, the fitted value for a market excess return of 20 is

  ŷ_i = −1.74 + 1.64 × 20 = 31.06
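As a check on the formulae, here is a small Python sketch (ours, not from the slides) that applies the OLS expressions for α̂ and β̂ to the five fund/market observations:

```python
# A minimal sketch verifying the OLS formulae on the five observations above.
x = [13.7, 23.2, 6.9, 16.8, 12.3]   # excess return on market index
y = [17.8, 39.0, 12.8, 24.2, 17.2]  # excess return on fund XXX
T = len(x)

x_bar = sum(x) / T
y_bar = sum(y) / T

# beta_hat = (sum x_t*y_t - T*x_bar*y_bar) / (sum x_t^2 - T*x_bar^2)
beta_hat = (sum(xt * yt for xt, yt in zip(x, y)) - T * x_bar * y_bar) \
           / (sum(xt**2 for xt in x) - T * x_bar**2)
alpha_hat = y_bar - beta_hat * x_bar

print(alpha_hat, beta_hat)        # approx. -1.74 and 1.64
print(alpha_hat + beta_hat * 20)  # fitted value at x = 20: approx. 31.1
                                  # (31.06 using the rounded coefficients)
```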


EViews: Running a linear regression with spot and futures prices
• Run a linear regression of spot on futures: click on spot (the dependent variable), press Ctrl, click on futures (the independent variable), then double click on either.
EViews: Running a linear regression
• The first variable listed is the dependent variable (spot), those following are the independent variables (futures), and c stands for the constant term.
EViews regression output
The Population and the Sample

• The population is the total collection of all objects or people to be studied. For example, if we are interested in predicting the outcome of an election, the population of interest is the entire electorate.

• A sample is a selection of just some items from the population.

• A random sample is a sample in which each individual item in the population is equally likely to be drawn.
The SRF and the PRF

• The population regression function (PRF) is a description of the model that is thought to be generating the actual data and the true relationship between the variables (i.e. the true values of α and β).

• The PRF is y_t = α + β x_t + u_t

• The SRF is ŷ_t = α̂ + β̂ x_t

  and we also know that û_t = y_t − ŷ_t.

• We use the SRF to infer likely values of the PRF.

• We also want to know how "good" our estimates of α and β are.

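To illustrate the PRF/SRF distinction, here is a hedged simulation sketch in Python (ours; the parameter values are arbitrary choices): repeated samples are drawn from a known PRF, and the estimated SRF differs from sample to sample:

```python
import numpy as np

# Draw repeated samples from a known PRF, y_t = alpha + beta*x_t + u_t,
# and observe that the SRF estimates vary from sample to sample.
rng = np.random.default_rng(0)
alpha, beta, T = 2.0, 0.5, 100     # true PRF parameters (our choice)
x = rng.uniform(0, 10, size=T)     # x fixed in repeated samples (CLRM)

for sample in range(3):
    u = rng.normal(0, 1, size=T)   # errors: zero mean, constant variance
    y = alpha + beta * x + u
    beta_hat = (np.sum(x * y) - T * x.mean() * y.mean()) \
               / (np.sum(x**2) - T * x.mean()**2)
    alpha_hat = y.mean() - beta_hat * x.mean()
    print(f"sample {sample}: alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
```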
Linearity

• In order to use OLS, we need a model which is linear in the parameters (α and β). It does not necessarily have to be linear in the variables (y and x).

• Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed etc.

• Some models can be transformed to linear ones by a suitable substitution or manipulation, e.g. the exponential regression model

  Y_t = e^α X_t^β e^{u_t}  ⇔  ln Y_t = α + β ln X_t + u_t

• Then let y_t = ln Y_t and x_t = ln X_t:

  y_t = α + β x_t + u_t
Linear and Non-linear Models

• This is known as the exponential regression model. Here, the coefficients can be interpreted as elasticities.

• Similarly, if theory suggests that y and x should be inversely related:

  y_t = α + β / x_t + u_t

  then the regression can be estimated using OLS by substituting

  z_t = 1 / x_t

• But some models are intrinsically non-linear, e.g.

  y_t = α + β x_t^γ + u_t
Generating new variables in EViews
• Click on Genr (stands for generate).
• Type in the equation for each new variable: lspot=log(spot), lfutures=log(futures).
Running a log-log regression
• Run a log-log regression of lspot on lfutures (i.e. a linear regression of lspot on lfutures).
Regression output
• For a log-log regression, the slope coefficient can be interpreted as an elasticity. Here it means that when the futures price increases by 1%, the spot price will increase by approximately 0.98%.
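The same transformation can be sketched in Python. The data below are hypothetical and only for illustration; the point is that the slope from a regression of log on log is the estimated elasticity:

```python
import numpy as np

# Hypothetical spot and futures price series (illustration only)
spot = np.array([100.0, 102.5, 101.2, 105.8, 107.3, 106.1])
futures = np.array([101.0, 103.2, 102.0, 106.5, 108.1, 106.8])

lspot, lfutures = np.log(spot), np.log(futures)

# OLS slope in a log-log regression = estimated elasticity
T = len(lspot)
beta_hat = (np.sum(lfutures * lspot) - T * lfutures.mean() * lspot.mean()) \
           / (np.sum(lfutures**2) - T * lfutures.mean()**2)
alpha_hat = lspot.mean() - beta_hat * lfutures.mean()
print(f"elasticity estimate: {beta_hat:.3f}")
# A slope near 1 means a 1% rise in the futures price is associated with
# roughly a 1% rise in the spot price.
```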
Estimator or Estimate?

• Estimators are the formulae used to calculate the coefficients

• Estimates are the actual numerical values for the coefficients.


The Assumptions Underlying the
Classical Linear Regression Model (CLRM)
• The model which we have used is known as the classical linear regression model.
• We observe data for x_t, but since y_t also depends on u_t, we must be specific about how the u_t are generated.
• We usually make the following set of assumptions about the u_t's (the unobservable error terms):

  Technical notation       Interpretation
  1. E(u_t) = 0            The errors have zero mean
  2. Var(u_t) = σ²         The variance of the errors is constant and finite over all values of x_t
  3. Cov(u_i, u_j) = 0     The errors are statistically independent of one another
  4. Cov(u_t, x_t) = 0     No relationship between the error and the corresponding x variate
The Assumptions Underlying the
CLRM Again
Properties of the OLS Estimator
Consistency/Unbiasedness/Efficiency
Which estimator is the most efficient?
Are all the estimators unbiased and consistent?
Precision and Standard Errors

• Recall that the OLS estimators are

  β̂ = (∑ x_t y_t − T x̄ ȳ) / (∑ x_t² − T x̄²)   and   α̂ = ȳ − β̂ x̄

• The standard errors of these estimators are given by

  SE(α̂) = s √( ∑ x_t² / (T ∑ (x_t − x̄)²) ) = s √( ∑ x_t² / (T ∑ x_t² − T² x̄²) )

  SE(β̂) = s √( 1 / ∑ (x_t − x̄)² ) = s √( 1 / (∑ x_t² − T x̄²) )

  where s is the estimated standard deviation of the residuals.
Estimating the Variance of the Disturbance Term

• The variance of the random variable u_t is given by

  Var(u_t) = E[(u_t − E(u_t))²]

  which, given assumption 1 that E(u_t) = 0, reduces to

  Var(u_t) = E(u_t²)

• We could estimate this using the average of the u_t²:

  s² = (1/T) ∑ u_t²

• Unfortunately this is not workable since u_t is not observable. We can use its sample counterpart, the residual û_t, which gives

  s² = (1/T) ∑ û_t²

  But this estimator is a biased estimator of σ².
Estimating the Variance of the Disturbance Term (cont'd)

• An unbiased estimator is obtained by dividing by the degrees of freedom, T − 2, rather than T:

  s = √( ∑ û_t² / (T − 2) )

  where s is also known as the standard error of the regression.
Example: How to Calculate the Parameters and
Standard Errors
• Assume we have the following data calculated from a regression of y on a single variable x and a constant over 22 observations.

• Data: ∑ x_t y_t = 830102, T = 22, x̄ = 416.5, ȳ = 86.65, ∑ x_t² = 3919654, RSS = 130.6

• Calculations:

  β̂ = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35

  α̂ = 86.65 − 0.35 × 416.5 = −59.12

• We write ŷ_t = α̂ + β̂ x_t, i.e.

  ŷ_t = −59.12 + 0.35 x_t
Example (cont'd)

• SE(regression): s = √( ∑ û_t² / (T − 2) ) = √( 130.6 / 20 ) = 2.55

  SE(α̂) = 2.55 × √( 3919654 / (22 × (3919654 − 22 × 416.5²)) ) = 3.35

  SE(β̂) = 2.55 × √( 1 / (3919654 − 22 × 416.5²) ) = 0.0079

• We now write the results as

  ŷ_t = −59.12 + 0.35 x_t
         (3.35)   (0.0079)
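The worked example can be verified with a few lines of Python (a sketch of ours, using only the summary statistics quoted above):

```python
from math import sqrt

# Reproduce the worked example from the summary statistics given above.
sum_xy, T, x_bar, y_bar = 830102.0, 22, 416.5, 86.65
sum_x2, rss = 3919654.0, 130.6

beta_hat = (sum_xy - T * x_bar * y_bar) / (sum_x2 - T * x_bar**2)
alpha_hat = y_bar - beta_hat * x_bar
s = sqrt(rss / (T - 2))             # standard error of the regression

denom = sum_x2 - T * x_bar**2       # = sum (x_t - x_bar)^2
se_alpha = s * sqrt(sum_x2 / (T * denom))
se_beta = s * sqrt(1 / denom)

print(f"beta_hat={beta_hat:.2f}, alpha_hat={alpha_hat:.2f}")
# 0.35 and -59.07 (the slide's -59.12 uses beta rounded to 0.35 first)
print(f"s={s:.2f}, SE(alpha)={se_alpha:.2f}, SE(beta)={se_beta:.4f}")
# approx. 2.55, 3.36 and 0.0080; the slide's 3.35 and 0.0079 come from
# rounding s to 2.55 before computing the standard errors.
```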
An Introduction to Statistical Inference

• Suppose we have estimated the following regression results:

  ŷ_t = 20.3 + 0.5091 x_t
        (14.38) (0.2561)
Hypothesis Testing: Some Concepts

• We can use the information in the sample to make inferences about the
population.
• We will always have two hypotheses that go together, the null hypothesis
(denoted H0) and the alternative hypothesis (denoted H1).
• The null hypothesis is the statement or the statistical hypothesis that is actually
being tested. The alternative hypothesis represents the remaining outcomes of
interest.
• For example, suppose given the regression results above, we are interested in the hypothesis that the true value of β is in fact 0.5. We would use the notation

  H0: β = 0.5
  H1: β ≠ 0.5

  This would be known as a two-sided test.
One-Sided Hypothesis Tests

• Sometimes we may have some prior information that, for example, we would expect β > 0.5 rather than β < 0.5. In this case, we would do a one-sided test:

  H0: β = 0.5
  H1: β > 0.5

  or we could have had

  H0: β = 0.5
  H1: β < 0.5

• There are two ways to conduct a hypothesis test: via the test of
significance approach or via the confidence interval approach.
The Probability Distribution of the
Least Squares Estimators
The Probability Distribution of the
Least Squares Estimators (cont'd)

• If the errors are normally distributed, then

  (α̂ − α) / √Var(α̂) ~ N(0, 1)   and   (β̂ − β) / √Var(β̂) ~ N(0, 1)

• But Var(α̂) and Var(β̂) are unknown, so replacing them with their estimated values gives

  (α̂ − α) / SE(α̂) ~ t_{T−2}   and   (β̂ − β) / SE(β̂) ~ t_{T−2}
Testing Hypotheses:
The Test of Significance Approach

• Assume the regression equation is given by y_t = α + β x_t + u_t for t = 1, 2, ..., T. The steps involved in doing a test of significance are:

1. Estimate α̂, β̂ and SE(α̂), SE(β̂) in the usual way.

2. Calculate the test statistic. This is given by the formula

   test statistic = (β̂ − β*) / SE(β̂)

   where β* is the value of β under the null hypothesis.
The Test of Significance Approach (cont’d)

3. We need some tabulated distribution with which to compare the estimated test statistics. Test statistics derived in this way can be shown to follow a t-distribution with T − 2 degrees of freedom.

   As the number of degrees of freedom increases, we need to be less cautious in our approach since we can be more sure that our results are robust.

4. We need to choose a "significance level", often denoted α. This is also sometimes called the size of the test, and it determines the region where we will reject or not reject the null hypothesis that we are testing. It is conventional to use a significance level of 5%, but 10% and 1% are also commonly used.
Determining the Rejection Region for a Test of Significance

5. Given a significance level, we can determine a rejection region and non-rejection region. For a 2-sided test:

[Figure: the distribution f(x) with a 95% non-rejection region in the centre and 2.5% rejection regions in each tail.]
The Rejection Region for a 1-Sided Test (Upper Tail)

[Figure: the distribution f(x) with a 95% non-rejection region and a 5% rejection region in the upper tail.]
The Rejection Region for a 1-Sided Test (Lower Tail)

[Figure: the distribution f(x) with a 95% non-rejection region and a 5% rejection region in the lower tail.]
The Test of Significance Approach: Drawing Conclusions

6. Use the t-tables to obtain a critical value or values with which to compare the test statistic.

7. Finally perform the test. If the test statistic lies in the rejection region then reject the null hypothesis (H0), else do not reject H0.
A Note on the t and the Normal Distribution

• You should all be familiar with the normal distribution and its
characteristic “bell” shape.

• We can scale a normal variate to have zero mean and unit variance by
subtracting its mean and dividing by its standard deviation.

• There is, however, a specific relationship between the t- and the standard normal distribution. Both are symmetrical and centred on zero. The t-distribution has another parameter, its degrees of freedom. We will always know this (for the time being, it is the number of observations minus 2).
What Does the t-Distribution Look Like?

[Figure: the standard normal distribution and the t-distribution plotted together; the t-distribution has the same bell shape but fatter tails.]
The Confidence Interval Approach
to Hypothesis Testing

• An example of its usage: we estimate a parameter, say β, to be 0.93, and a "95% confidence interval" to be (0.77, 1.09). This means that we are 95% confident that the interval will contain the true (but unknown) value of β.

• Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed.
How to Carry out a Hypothesis Test
Using Confidence Intervals

• The test is carried out by forming the confidence interval

  ( β̂ − t_crit × SE(β̂),  β̂ + t_crit × SE(β̂) )

  and rejecting the null hypothesis H0: β = β* if β* lies outside this interval.
Confidence Intervals Versus Tests of Significance

• Note that the test of significance and confidence interval approaches always give the same conclusion, since

  −t_crit × SE(β̂) ≤ β̂ − β* ≤ +t_crit × SE(β̂)

  can be rearranged to give

  β̂ − t_crit × SE(β̂) ≤ β* ≤ β̂ + t_crit × SE(β̂)
Constructing Tests of Significance and
Confidence Intervals: An Example

• Using the regression results above,

  ŷ_t = 20.3 + 0.5091 x_t,   T = 22
        (14.38) (0.2561)

• Using both the test of significance and confidence interval approaches, test the hypothesis that β = 1 against a two-sided alternative.

• The first step is to obtain the critical value. We want t_crit = t_{20;5%}.
Determining the Rejection Region

[Figure: t-distribution with 2.5% rejection regions in each tail; critical values −2.086 and +2.086.]
Testing other Hypotheses

• What if we wanted to test H0: β = 0 or H0: β = 2?

• Note that we can test these with the confidence interval approach. For interest (!), test

  H0: β = 0 vs. H1: β ≠ 0

  H0: β = 2 vs. H1: β ≠ 2
Performing the Test

ˆ t crit SE ( ˆ )
0.5091 2.086 0.2561
( 0.0251,1.0433)
Changing the Size of the Test
Changing the Size of the Test:
The New Rejection Regions

[Figure: t-distribution with 5% rejection regions in each tail; critical values −1.725 and +1.725.]
Changing the Size of the Test:
The Conclusion

• t20;10% = 1.725. The test statistic is (0.5091 − 1)/0.2561 = −1.92, so now, as it lies in the rejection region, we would reject H0.

• Caution should therefore be used when placing emphasis on or making decisions in marginal cases (i.e. in cases where we only just reject or do not reject).
Some More Terminology

• If we reject the null hypothesis at the 5% level, we say that the result
of the test is statistically significant.

• Note that a statistically significant result may be of no practical significance. E.g. if a shipment of cans of beans is expected to weigh 450g per tin, but the actual mean weight of some tins is 449g, the result may be highly statistically significant but presumably nobody would care about 1g of beans.
• We are usually concerned with both statistical and economic
significance.
Performing a Hypothesis Test in EViews
• Null hypothesis, H0: Beta = 0; alternative hypothesis, H1: Beta ≠ 0.
• p-value = 0.0000 < 0.05 (5% level): reject the null.
Performing a Hypothesis Test in EViews
• Null hypothesis, H0: Beta = 1; alternative hypothesis, H1: Beta ≠ 1.
• Type c(1)=1. Note: c(1) stands for the 1st coefficient.
Performing a Hypothesis Test in EViews
• Null hypothesis, H0: Beta = 1; alternative hypothesis, H1: Beta ≠ 1.
• p-value > 0.05 (5% level): fail to reject the null.
The Errors That We Can Make
Using Hypothesis Tests

• We usually reject H0 if the test statistic is statistically significant at a chosen significance level.

• There are two possible errors we could make:
  1. Rejecting H0 when it was really true. This is called a type I error.
  2. Not rejecting H0 when it was in fact false. This is called a type II error.
                                  Reality
                                  H0 is true           H0 is false
Result of   Significant           Type I error (= α)   ✓
test        (reject H0)
            Insignificant         ✓                    Type II error (= β)
            (do not reject H0)
The Trade-off Between Type I and Type II Errors

• The probability of a type I error is just α, the significance level or size of test we chose.

• What happens if we reduce the size of the test (e.g. from a 5% test to a 1% test)? We reduce the chances of making a type I error ... but we also reduce the probability that we will reject the null hypothesis at all, so we increase the probability of a type II error:

  Reduce size of test → more strict criterion for rejection → reject null hypothesis less often → less likely to falsely reject, but more likely to incorrectly not reject.

• So there is always a trade off between type I and type II errors when choosing a significance level. The only way we can reduce the chances of both is to increase the sample size.
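The trade-off can be seen in a small Monte Carlo sketch (ours; the data-generating values are arbitrary choices): with a true β of zero, the rejection rate is the type I error rate, while with a non-zero β, the non-rejection rate is the type II error rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def reject_rate(beta_true, size, n_sims=2000, T=30):
    """Fraction of simulated samples in which H0: beta = 0 is rejected."""
    x = rng.uniform(0, 10, T)
    t_crit = stats.t.ppf(1 - size / 2, T - 2)
    rejections = 0
    for _ in range(n_sims):
        y = 1.0 + beta_true * x + rng.normal(0, 2, T)
        denom = np.sum((x - x.mean())**2)
        b = np.sum((x - x.mean()) * (y - y.mean())) / denom
        a = y.mean() - b * x.mean()
        resid = y - a - b * x
        s = np.sqrt(np.sum(resid**2) / (T - 2))       # SE of the regression
        if abs(b / (s / np.sqrt(denom))) > t_crit:    # t-test of beta = 0
            rejections += 1
    return rejections / n_sims

# Shrinking the size from 5% to 1% lowers the type I error rate (first
# column) but also lowers the rejection rate when beta is really non-zero
# (second column), i.e. it raises the type II error rate.
for size in (0.05, 0.01):
    print(size, reject_rate(0.0, size), reject_rate(0.3, size))
```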
A Special Type of Hypothesis Test: The t-ratio

• The t-ratio is the test statistic for the special (and very common) null hypothesis H0: βi = 0. Since β* = 0 here, the test statistic reduces to the t-ratio = β̂i / SE(β̂i).
The t-ratio: An Example

• Suppose that we have the following parameter estimates, standard errors and t-ratios for an intercept and slope respectively.

                Intercept    Slope
  Coefficient     1.10       -4.40
  SE              1.35        0.96
  t-ratio         0.81       -4.63

• Compare these with t_crit with 15 − 3 = 12 degrees of freedom (2.5% in each tail for a 5% test):

  t_crit = 2.179 at the 5% level
  t_crit = 3.055 at the 1% level

• Do we reject H0: β1 = 0? (No)
  Do we reject H0: β2 = 0? (Yes)
What Does the t-ratio tell us?

• If we reject H0, we say that the result is significant. If the coefficient is not "significant" (e.g. the intercept coefficient in the last regression above), then it means that the variable is not helping to explain variations in y. Variables that are not significant are usually removed from the regression model.

• In practice there are good statistical reasons for always having a constant even if it is not significant. Look at what happens if no intercept is included:

[Figure: scatter of y_t against x_t with the regression line forced through the origin, fitting the data poorly.]
The Exact Significance Level or p-value

• This is equivalent to choosing an infinite number of critical t-values from tables. It gives us the marginal significance level where we would be indifferent between rejecting and not rejecting the null hypothesis.

• If the test statistic is large in absolute value, the p-value will be small, and vice versa. The p-value gives the plausibility of the null hypothesis.

• E.g. suppose a test statistic is distributed as t62 and takes the value 1.47, with p-value = 0.12.

  – Do we reject at the 5% level? ........................... No
  – Do we reject at the 10% level? ......................... No
  – Do we reject at the 20% level? ......................... Yes
