The Fundamentals
of Regression Analysis
A Primer for Antitrust Attorneys
by
Russell Lamb, Ph.D.
Senior Vice President
Nathan Associates Inc.
Arlington, VA.
Outline
• Introduction
• Building a regression model
• Interpreting the regression coefficients
• Estimating the regression coefficients
• Empirical example
• The indicator variable model
• Hypothesis testing
• Assessing the OLS estimators
• Assessing the regression model
• Identification
• Introduction to the forecasting method
• Appendix A - Probability Basics
• Appendix B - Assumptions of the Multiple Linear Regression Model
• Appendix C - Random Sampling
Introduction
Regression analysis
The "how much" question
• Regression analysis is a statistical tool that is used
to understand the relationship among two or more
economic variables.
• Regression analysis aims to answer the "how
much" question. For example, a regression model
might seek to answer how much food expenditure
changes when household income rises by one dollar.
Regression analysis
It all begins with theory
CONSUMPTION = f(INCOME)

[Diagram: INPUT x → FUNCTION f → OUTPUT y = f(x)]
The economic model
The linear model
Food Expenditure = β₀ + β₁·Income

[Figure: the linear model y = β₀ + β₁x, with intercept β₀ and slope β₁ = Δy/Δx; Income on the horizontal axis and Food Expenditure on the vertical axis.]
y_i = β₀ + β₁x_i
where the subscript i runs over observations, i = 1, ..., N;
y_i is the dependent variable;
x_i is an independent or explanatory variable; and
β₀ and β₁ are the unknown population parameters, or regression coefficients.
Introduction to the error term
• Every regression model can be thought of as having two components:
a systematic component, which is obtained from theory, and a
random component [2].
• Since economic theory describes the average behavior of many
individuals or firms, the systematic portion can be thought of as the
expected value of Y given X, which is the mean value of the Ys
associated with a particular value of X. The systematic component is E(Y|X).
• The random component, or error term, captures:
1. All factors affecting the dependent variable that are not included in
the model.
2. Any approximation error that arises from the fact that the
underlying theoretical equation might have a different functional
form than the one chosen for the regression [3].
Multiple regression models
• When studying the effect of an explanatory variable
x on the dependent variable y, we generally want to
"control" for other factors that influence y by
including these factors in the model.
• Including other factors in the model allows us to
isolate and measure the impact of a variable of
interest while accounting for all other factors that are
thought to influence outcomes.
• For this purpose, we use the multiple regression
model, the general form of which is
y_i = β₀ + β₁x_i1 + β₂x_i2 + ... + β_K x_iK + e_i.
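A minimal sketch of estimating such a model by least squares on made-up data (the variable names and coefficient values below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical example: food expenditure explained by income and household size.
rng = np.random.default_rng(0)
n = 200
income = rng.uniform(10, 100, n)           # x_1
household_size = rng.integers(1, 6, n)     # x_2
e = rng.normal(0, 20, n)                   # unobserved error term
food_exp = 40 + 2.5 * income + 15 * household_size + e

# Design matrix with a column of ones so that the first coefficient is the intercept.
X = np.column_stack([np.ones(n), income, household_size])
beta_hat, *_ = np.linalg.lstsq(X, food_exp, rcond=None)
print(beta_hat)   # estimates of beta_0, beta_1, beta_2
```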
Nonlinear models
Food Expenditure = β₀ + β₁·ln(Income)

[Figure: linear-log model. Scatter of food expenditure (vertical axis, roughly 0 to 600) against income (horizontal axis, roughly 0 to 40), with the fitted linear-log curve.]
Data Source: R. Carter Hill, William E. Griffiths, and Guay C. Lim (2011), Principles of Econometrics, 4th Ed.
Models that are linear in the
coefficients
Food Expenditure = β₀ + β₁·ln(Income)
• The coefficients are not raised to any powers (other than one).
• The coefficients are not multiplied or divided by other coefficients.
• The coefficients do not themselves appear inside a function such as a logarithm or exponent (see the examples below).
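Two illustrative (hypothetical) examples of the distinction:

y = β₀ + β₁·x²   (still linear in the coefficients; x² is simply another regressor)
y = β₀ + x^β₁    (not linear in the coefficients, because β₁ appears as an exponent)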
Interpreting the regression coefficients
Simple linear regression model
y = β₀ + β₁x_1 + e
• β₀ is the intercept coefficient, and it indicates the
value of y when x_1 is equal to zero.
• β₁ is the slope coefficient, and it indicates the
amount that the dependent variable y will change
when x_1 increases by one unit.
Interpreting the regression coefficients
Multiple regression models
Key concept
Ceteris paribus [5]
Interpreting the regression coefficients
Nonlinear models
Interpreting the regression coefficients
Nonlinear models
[Figure: four functional forms, each sketched for β₁ > 0. Panel 1: log-linear, ln(y) = β₀ + β₁x. Panel 2: linear-log, y = β₀ + β₁·ln(x). Panel 3: log-log, ln(y) = β₀ + β₁·ln(x). Panel 4: linear, y = β₀ + β₁x.]
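Standard rules of thumb for interpreting β₁ under each of these forms (general results, not tied to any particular dataset):

• Log-linear, ln(y) = β₀ + β₁x: a one-unit increase in x is associated with approximately a 100·β₁ percent change in y.
• Linear-log, y = β₀ + β₁·ln(x): a 1 percent increase in x is associated with a change in y of approximately β₁/100 units.
• Log-log, ln(y) = β₀ + β₁·ln(x): β₁ is the elasticity of y with respect to x, so a 1 percent increase in x is associated with approximately a β₁ percent change in y.
• Linear, y = β₀ + β₁x: a one-unit increase in x is associated with a change in y of β₁ units.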
Estimating the regression coefficients
Linear least squares
Estimating the regression coefficients
Least squares principle
• The least squares principle: fit the line so that the vertical distances from
the data points to the line are, taken together, as small as possible. The
vertical distance from a point to the fitted line is called the residual.
[Figure: statistical fit of a line, ŷ = β₀ + β₁x, showing the residual (y_i − ŷ_i) as the vertical distance from a data point to the fitted line, with intercept β₀ and slope β₁ = Δy/Δx.]
• Specifically, the least squares rule chooses the intercept and slope that
minimize the sum of squared residuals.
• This rule is known as the least squares problem.
• The solution to the least squares problem yields a set of
formulas called the least squares estimators. These
formulas are perfectly general and can be applied to any
sample data.
• When you plug the sample data values into the estimators
you obtain numbers, which are the least squares estimates.
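For the simple regression, the least squares estimators take the familiar closed form b_1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and b_0 = ȳ − b_1·x̄. A minimal sketch applied to made-up data (the numbers are hypothetical):

```python
import numpy as np

# Hypothetical sample: weekly food expenditure (y) and income (x).
x = np.array([12.0, 18.5, 22.0, 30.0, 41.5, 55.0, 63.0, 78.5])
y = np.array([150., 210., 240., 310., 400., 505., 560., 690.])

# Least squares estimators for y_i = b0 + b1 * x_i.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)   # vertical distances from each point to the fitted line
print(b0, b1, np.sum(residuals ** 2))
```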
The indicator-variable model
Introduction
The indicator-variable model
Modeling but-for prices
Hypothetical empirical example
• Since P_it, C_it, and D_it are in their natural-logarithm
transformations, the slope coefficients β₁ and β₂ are
elasticities.
• The coefficient on the indicator variable, β₃, is an
estimate of the impact of the alleged cartel on prices,
while accounting for the influences on prices of
demand (personal disposable income) and
production costs.
• The cartel effect is approximately a 100·β₃ percent change in
prices.
H₀: β_CARTEL ≤ 0 (the values that are not expected to be true)
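A sketch of how such an indicator-variable model might be estimated on simulated data; the data-generating values, variable names, and the use of statsmodels are hypothetical choices made only for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
ln_income = rng.normal(10.0, 0.3, n)                  # demand variable, in logs
ln_cost = rng.normal(2.0, 0.2, n)                     # production cost, in logs
cartel = (rng.uniform(size=n) < 0.4).astype(float)    # indicator: 1 during the alleged cartel period

# Hypothetical data-generating process with a 15 percent overcharge (0.15 on the dummy).
ln_price = 0.5 + 0.6 * ln_income + 0.3 * ln_cost + 0.15 * cartel + rng.normal(0, 0.1, n)

X = sm.add_constant(np.column_stack([ln_income, ln_cost, cartel]))
fit = sm.OLS(ln_price, X).fit()
print(fit.params)                # beta_0, beta_1 (demand elasticity), beta_2 (cost elasticity), beta_3 (cartel)
print(fit.tvalues, fit.pvalues)  # t-statistics and p-values for each coefficient
print(100 * fit.params[-1])      # approximate cartel effect on prices, in percent
```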
Statistical significance
Hypothesis testing
The p-value
Hypothesis testing
The t-statistic
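In its standard form, the t-statistic for testing H₀: β_k = c (often c = 0) is

t = (b_k − c) / se(b_k),

where b_k is the least squares estimate of β_k and se(b_k) is its standard error; the statistic is compared with critical values from the t distribution with N − K degrees of freedom.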
Hypothesis testing
Hypothetical empirical example
[Figure: the probability density function of the least squares estimator b_1.]
Assessing the OLS estimators
Variance
• The variance of an estimator is a measure of the spread of its
probability distribution.
• The smaller the variance of an estimator is, the greater the
precision of that estimator [8].
➢ When comparing two estimators, the one with the smaller variance is
preferred, since this rule gives us a higher probability of obtaining an
estimate that is close to the true parameter value.
[Figure: two possible probability density functions for b_1.]
The Gauss-Markov theorem
Assessing the regression model
Measuring goodness-of-fit
Assessing the regression model
Measuring goodness-of-fit
• The total sum of squares (SST) can be decomposed into
• The sum of squares that is explained by the regression (SSR).
• The sum of squares due to error (SSE), which is that part of SST that
is not explained by the regression.
[Figure: decomposition of the deviation y_i − ȳ into an explained part, ŷ_i − ȳ, and an unexplained part, y_i − ŷ_i, around the fitted regression line.]
Assessing the regression model
Measuring goodness-of-fit
R² = SSR / SST
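A minimal numerical sketch of this decomposition and of R² (the data below are simulated and purely illustrative):

```python
import numpy as np

# Simulated data and a fitted simple regression (illustrative values only).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 3 + 2 * x + rng.normal(0, 2, 100)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # sum of squares explained by the regression
sse = np.sum((y - y_hat) ** 2)          # sum of squares due to error
r_squared = ssr / sst                   # equivalently, 1 - sse / sst
print(r_squared)
print(ssr + sse, sst)                   # SSR + SSE equals SST (up to rounding)
```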
Assessing the regression model
Measuring goodness-of-fit
[Figure: two scatter plots. Left: x and y are not related, and in that case R² would be zero. Right: all of the data points lie on the regression line, and the resulting R² is equal to 1.]
Assessing the regression model
The F-test of overall significance
Assessing the regression model
The F-test of overall significance
• To test the significance of the model, we calculate the F-statistic:
F = [(SST − SSE) / (K − 1)] / [SSE / (N − K)]
where:
K is the number of parameters in the model including the
intercept;
N is the number of observations;
SST is the total sum of squares; and
SSE is the sum of squares due to error (the sum of squared OLS residuals).
• Thus, the F-statistic in a test of overall significance is the ratio of
the explained sum of squares to the residual sum of squares,
adjusted for the number of independent variables and the number
of observations.
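A sketch of the calculation on simulated data; scipy is used only to look up the tail probability of the F distribution, and all numbers are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 200, 3                            # k parameters in the model, including the intercept
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = np.sum((y - X @ beta) ** 2)        # sum of squared OLS residuals
sst = np.sum((y - y.mean()) ** 2)        # total sum of squares

f_stat = ((sst - sse) / (k - 1)) / (sse / (n - k))
p_value = stats.f.sf(f_stat, k - 1, n - k)   # P(F >= f_stat) under the null hypothesis
print(f_stat, p_value)
```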
Assessing the regression model
Hypothetical empirical example
• The R 2 tells us that approximately 81 % of the variation in
In(PRICE) can be explained by the variation in the explanatory
variables.
• The probability of observing an F-statistic as large or larger
than the observed statistic under the null hypothesis is smaller
than 0.00005.
➢ We reject the hypothesis that the slope coefficients are jointly
equal to zero.
Source   |       SS           df       MS         Number of obs =  22,624
---------+---------------------------------       F(3, 22620)   = 32297.51
Model    | 1503.98605          3   501.328683     Prob > F      =  0.0000
Residual | 351.112389     22,620    .01552221     R-squared     =  0.8107
---------+---------------------------------       Adj R-squared =  0.8107
Total    | 1855.09844     22,623    .08200055     Root MSE      =  .12459
Key concept
Identification
Threats to identification
Biasedness
Incorrect functional form
Solution
[Figure: three fits to the same data. Fitted linear relationship: sum of squared residuals 6,868,481.92. Fitted quadratic relationship: sum of squared residuals 5,015,017.86. Fitted log-linear relationship: sum of squared residuals 111.]
Simultaneous causality bias
Problem
Q = β₀ + β₁P + e,
where P is the equilibrium price of the product,
and Q is the equilibrium quantity.
Simultaneous causality bias
Problem
[Figure: supply and demand curves, with the market equilibrium at quantity Q*.]
Simultaneous causality bias
Problem
Demand: Q = α₀ + α₁P + α₂X + e_d
Supply: Q = β₀ + β₁P + e_s
Key concept
Endogenous vs. exogenous variables
Demand: Q = α₀ + α₁P + α₂X + e_d
Supply: Q = β₀ + β₁P + e_s
Simultaneous causality solutions
Reduced-form methods
Demand: Q = α₀ + α₁P + α₂X + e_d
Supply: Q = β₀ + β₁P + e_s
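A sketch of the problem and of the instrumental-variables logic behind reduced-form methods, on simulated supply-and-demand data. The demand shifter X is excluded from the supply equation, so it can serve as an instrument that identifies the supply slope; all parameter values are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
# Hypothetical structural parameters: demand Q = 100 - 2P + 1.5X + e_d, supply Q = 10 + 3P + e_s.
a0, a1, a2 = 100.0, -2.0, 1.5
b0, b1 = 10.0, 3.0
X = rng.normal(5, 2, n)        # demand shifter (e.g., income), excluded from the supply equation
e_d = rng.normal(0, 5, n)
e_s = rng.normal(0, 5, n)

# Solve the two equations for the equilibrium price and quantity in each market.
P = (a0 - b0 + a2 * X + e_d - e_s) / (b1 - a1)
Q = b0 + b1 * P + e_s

# Naive OLS of Q on P mixes supply and demand shifts and recovers neither slope.
ols_slope = np.sum((P - P.mean()) * (Q - Q.mean())) / np.sum((P - P.mean()) ** 2)
# Using X as an instrument recovers the supply slope, because X shifts demand but not supply.
iv_slope = np.sum((X - X.mean()) * (Q - Q.mean())) / np.sum((X - X.mean()) * (P - P.mean()))
print(ols_slope, iv_slope)     # iv_slope should be close to 3, the true supply slope
```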
Introduction to the forecasting method
Introduction to the forecasting method
[Figure: actual and predicted prices by month. The model's predictions track actual prices over the in-sample prediction period and are extended as out-of-sample predictions thereafter; the vertical (price) axis runs from 0.00 to 0.50.]
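A sketch of the forecasting approach on simulated monthly data: the price equation is estimated over a hypothetical benchmark ("clean") period and then used to predict but-for prices in the alleged conspiracy period, with the overcharge measured as actual minus predicted. All names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
months = 48
cost = 1.0 + 0.01 * np.arange(months) + rng.normal(0, 0.02, months)   # hypothetical cost index
price = 0.10 + 0.25 * cost + rng.normal(0, 0.01, months)
price[36:] += 0.05                     # hypothetical overcharge during the last 12 months

benchmark = slice(0, 36)               # in-sample ("clean") period
damages = slice(36, months)            # out-of-sample (alleged conspiracy) period

# Estimate the price equation on the benchmark period only.
X = np.column_stack([np.ones(36), cost[benchmark]])
beta, *_ = np.linalg.lstsq(X, price[benchmark], rcond=None)

# Forecast but-for prices in the damages period and compare them with actual prices.
but_for = beta[0] + beta[1] * cost[damages]
overcharge = price[damages] - but_for
print(overcharge.mean())               # should be close to the 0.05 built into the data
```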
Summary
References
In-text Citations
14. James Stock and Mark Watson (2007), p. 778.
Appendices
Appendix A - Probability Basics
Random variables
Appendix A - Probability Basics
Probability density function
Appendix A - Probability Basics
Probability density function
Probability density function of X:

x    f(x)
1    0.1
2    0.2
3    0.3
4    0.4

[Figure: bar chart of the probability density function for X.]
• For a discrete (i.e., countable) random variable, the pdf indicates the
probability that the random variable X takes on the value x.
Appendix A - Probability Basics
Probability density function
[Figure: probability density function for a continuous random variable, with the shaded area representing Probability(5 ≤ X ≤ 10).]
• For a continuous (i.e., uncountable) random variable, the pdf indicates the
probability of outcomes being in certain ranges.
Appendix A - Probability Basics
Mathematical expectation
Appendix A - Probability Basics
Conditional mean
• With two or more random variables, we need to consider
the joint probability density function.
• For example, say Y is a random variable that takes on the
values 1, 2, 3, and 4, and X is a random variable that
takes on the values 0 and 1. The joint probability
density function of X and Y allows us to say things like
"The probability of Y being equal to 2 when X is equal to 1
is .25."
• The conditional expectation or conditional mean of Y
given X = x is the average value of Y in repeated
sampling where X = x has occurred.
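A minimal sketch of computing a conditional mean from a joint probability table; the joint probabilities below are made up, but chosen so that P(Y = 2 | X = 1) = 0.25 as in the example above:

```python
import numpy as np

y_values = np.array([1, 2, 3, 4])
# Hypothetical joint pmf of (X, Y): rows are X = 0 and X = 1; columns are Y = 1, 2, 3, 4.
joint = np.array([
    [0.150, 0.125, 0.100, 0.125],   # X = 0
    [0.100, 0.125, 0.150, 0.125],   # X = 1
])
assert np.isclose(joint.sum(), 1.0)

p_x1 = joint[1].sum()               # P(X = 1)
cond_pmf = joint[1] / p_x1          # P(Y = y | X = 1) for each y
print(cond_pmf[1])                  # P(Y = 2 | X = 1) = 0.25
print(np.sum(y_values * cond_pmf))  # conditional mean E[Y | X = 1]
```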
Appendix A - Probability Basics
Variance
• The variance of a random variable, denoted as var(Y),
characterizes the spread of the probability density function.
[Figure: two distributions with different variances; the pdf with the smaller variance is more tightly concentrated than the pdf with the larger variance.]
Appendix A - Probability Basics
Correlation
Appendix A - Probability Basics
Statistical independence
Appendix A - Probability Basics
Normality
• The normal distribution, denoted as N(µ, σ²), is a probability density function
that is symmetric and centered around its population mean value µ.
• Random variables that have "bell-shaped" probability density functions are
said to be "normally distributed."
• In the figure below, the random variable Y is normally distributed with mean
µ = E(Y) and variance σ². The area under the normal pdf between
µ − 1.96σ and µ + 1.96σ is 0.95, where σ is the square root of the variance,
or the standard deviation of Y.
[Figure: the normal probability density, with the area under the curve between µ − 1.96σ and µ + 1.96σ shaded.]
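A quick numerical check of the 0.95 figure, using scipy's standard normal distribution:

```python
from scipy.stats import norm

# P(mu - 1.96*sigma <= Y <= mu + 1.96*sigma) for any normal Y reduces to the standard normal case.
print(norm.cdf(1.96) - norm.cdf(-1.96))   # approximately 0.95
```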
Appendix B - Assumptions of the
Multiple Linear Regression Model
• A1. y_i = β₀ + β₁x_i1 + β₂x_i2 + ... + β_K x_iK + e_i,
i = 1, ..., N, correctly describes the relationship between y
and x in the population.
• A2. The data pairs (x_i1, x_i2, ..., x_iK, y_i), i = 1, ..., N, are obtained
by random sampling (see Appendix C).
• A3. E(e | x_1, x_2, ..., x_K) = 0. The expected value of the error
term, conditional on x_1, x_2, ..., x_K, is zero.
• Sometimes y can be above the population regression line,
and sometimes y can be below the population regression
line, but on average, y falls on the population regression
line.
➢ This means that the expected value of
e = y − E(y | x_1, x_2, ..., x_K), conditional on any
value of x_k, is zero.
• If x_k and e are correlated, then it can be shown that
E(e | x_1, x_2, ..., x_K) ≠ 0.
Appendix B - Assumptions of the
Multiple Linear Regression Model
• A4. In the sample, each x_k must take on at least 2 different values,
and the values of each x_k are not exact linear functions of the
other explanatory variables (no perfect multicollinearity).
• Perfect multicollinearity is the condition where variables are
essentially redundant.
• Under perfect multicollinearity, the least squares procedure fails.
• A5. var(e | x_1, x_2, ..., x_K) = σ². The variance of the error term,
conditional on any value of x, is a constant σ²:
➢ It is assumed to be the same for each observation.
➢ It is not directly related to any of the explanatory variables.
Errors with this property are said to be homoskedastic.
• A6. The distribution of the error term is normal.
If assumptions A1 through A5 hold, then the OLS estimators are the best
linear unbiased estimators (BLUE) of the regression coefficients.
Appendix C - Random Sampling
• By random sampling, we mean that the process by which the
data are collected is such that each observation
(y_i, x_i1, x_i2, ..., x_iK) is statistically independent of every
other observation.
• Statistical independence means that knowing the values
(y, x_1, x_2, ..., x_K) for one observation provides no
information about the values of any other observation.
Glossary
• intercept. The value of the dependent variable when each of the independent
variables takes on the value of 0 in a regression equation.
• linear least squares. The estimator of the regression intercept and slope(s)
that minimizes the sum of squared residuals.
• linear-log model. A nonlinear regression function in which the dependent
variable is y and the independent variable is the natural logarithm transformation
of x.