Chapter 1: The Nature of Regression Analysis
REGRESSION ANALYSIS
Basic Econometrics, Haleema Sadia, 2020-10-07
HISTORICAL ORIGIN OF THE TERM
REGRESSION
• The term regression was introduced by Francis
Galton.
• He found that, although there was a tendency for
tall parents to have tall children and for short
parents to have short children, the average height
of children born of parents of a given height
tended to move or “regress” toward the average
height in the population as a whole. This tendency
is called Galton’s law of universal regression.
THE MODERN INTERPRETATION OF
REGRESSION
• Regression analysis is concerned with the study of the
dependence of one variable, the dependent variable,
on one or more other variables, the explanatory
variables, with a view to estimating and/or predicting
the (population) mean or average value of the former
in terms of the known or fixed (in repeated sampling)
values of the latter.
Examples of Regression Analysis
1. Reconsider Galton’s law of universal
regression.
We want to find out how the average height
of sons changes, given the father’s height.
Figure 1.1 Hypothetical distribution of sons’ heights
corresponding to given heights of fathers.
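The idea of an average son's height for each given father's height can be sketched in a few lines. This is a minimal illustration only: the height pairs are invented for the example and the name `conditional_means` is hypothetical, not taken from the slides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (father_height, son_height) pairs in inches, invented for illustration.
pairs = [
    (65, 64), (65, 66), (65, 68),
    (70, 67), (70, 69), (70, 71),
    (75, 70), (75, 72), (75, 74),
]

def conditional_means(data):
    """Return {x: average of the y values observed at that x}."""
    groups = defaultdict(list)
    for x, y in data:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

cond = conditional_means(pairs)
for father, avg_son in cond.items():
    print(f"father {father} in -> average son height {avg_son} in")
```

In this made-up sample, sons of the shortest fathers are on average taller than their fathers and sons of the tallest fathers are on average shorter, which is the pattern Galton described as regression toward the population average.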
Examples of Regression Analysis
2. Consider the heights of boys measured at
fixed ages.
Figure 1.2 Hypothetical distribution of heights
corresponding to selected ages.
Examples of Regression Analysis
5. A labor economist may want to study the rate
of change of money wages in relation to the
unemployment rate.
Figure 1.3
Examples of Regression Analysis
6. From monetary economics it is known that, other things
remaining the same, the higher the rate of inflation π, the lower
the proportion k of their income that people would want to hold in
the form of money, as depicted in Figure 1.4 (next slide).
Figure 1.4 Money holding in relation to
the inflation rate π
STATISTICAL AND DETERMINISTIC
RELATIONSHIPS
• In regression analysis we are concerned with
what is known as statistical dependence among
variables, not the functional or deterministic
dependence found in fields such as classical
physics.
• In statistical relationships among variables we
essentially deal with random or stochastic
variables. These variables have probability
distributions.
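The contrast between the two kinds of dependence can be sketched with a small simulation. The coefficients 2.0 and 0.5, the normal disturbance, and the function names are all assumptions chosen for illustration.

```python
import random

def deterministic_y(x):
    """Functional relationship: the same x always gives exactly the same y."""
    return 2.0 + 0.5 * x

def statistical_y(x, rng):
    """Statistical relationship: y scatters around 2 + 0.5x because a random
    disturbance is added each time y is observed."""
    return 2.0 + 0.5 * x + rng.gauss(0.0, 1.0)

rng = random.Random(0)  # fixed seed so the run is reproducible
x = 10
det_draws = [deterministic_y(x) for _ in range(5)]
stat_draws = [statistical_y(x, rng) for _ in range(5)]

print("deterministic:", det_draws)  # five identical values
print("statistical:  ", stat_draws)  # five different values at the same x
```

For a fixed x, the deterministic rule returns one value, while the statistical rule returns a different draw each time, so y has a probability distribution at each x.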
REGRESSION VERSUS CAUSATION
• Although regression analysis deals with the
dependence of one variable on other
variables, it does not necessarily imply
causation.
• A statistical relationship per se cannot logically
imply causation.
REGRESSION VERSUS CORRELATION
• In correlation analysis we try to measure
the strength or degree of linear association
between two variables. The correlation
coefficient measures this strength of (linear)
association.
• In regression analysis we try to estimate the
average value of one variable on the basis of
the fixed values of other variables.
REGRESSION VERSUS CORRELATION
• In correlation analysis we treat any two
variables symmetrically. There is no distinction
between dependent and explanatory variables,
and both variables are considered random.
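The symmetry of correlation, and the asymmetry of regression, can be checked numerically. The sketch below uses invented data and hypothetical helper names. The slope is computed with the usual least-squares formula, which these slides have not introduced, so treat it as an assumption.

```python
from math import sqrt
from statistics import mean

def cross_products(x, y):
    """Return (S_xy, S_xx, S_yy): sums of cross products and squares of deviations."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy, s_xx, s_yy

def correlation(x, y):
    """Correlation coefficient between x and y."""
    s_xy, s_xx, s_yy = cross_products(x, y)
    return s_xy / sqrt(s_xx * s_yy)

def slope(dependent, explanatory):
    """Least-squares slope from regressing `dependent` on `explanatory`."""
    s_xy, s_xx, _ = cross_products(explanatory, dependent)
    return s_xy / s_xx

# Invented data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy, r_yx = correlation(x, y), correlation(y, x)
slope_y_on_x, slope_x_on_y = slope(y, x), slope(x, y)

print("correlation(x, y) =", r_xy, " correlation(y, x) =", r_yx)
print("slope of y on x =", slope_y_on_x, " slope of x on y =", slope_x_on_y)
```

Swapping the variables leaves the correlation coefficient unchanged but changes the regression slope, because regression requires deciding which variable is the dependent one.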
TERMINOLOGY
Dependent variable      Explanatory variable
Explained variable      Independent variable
Predictand              Predictor
Regressand              Regressor
Response                Stimulus
Endogenous              Exogenous
Outcome                 Covariate
Controlled variable     Control variable
TERMINOLOGY
• In a simple (two-variable) regression analysis
we study the dependence of a variable on only
a single explanatory variable, such as that of
consumption expenditure on real income.
• In a multiple regression analysis we study the
dependence of one variable on more than one
explanatory variable, such as that of money
demand on interest rates, income, and
inflation.
TERMINOLOGY
• The term random is a synonym for the term
stochastic. A random (stochastic) variable is a
variable that can take on any set of values,
positive or negative, with a given probability.
NOTATION
• Y: dependent variable
• X1, X2, … , Xk: explanatory variables
• Xk: kth explanatory variable
• Xki: ith observation on variable Xk (cross-sectional
data)
• Xkt: tth observation on variable Xk (time series data)
• N (or T for time series data): the total number of
observations or values in the population.
• n (or t for time series data): the total number of
observations in the sample.
TYPES OF DATA
• There are mainly three types of data for
empirical analysis:
1. Time series data
2. Cross-sectional data
3. Pooled data
Time series data
• A time series is a set of observations on the
values that a variable takes at different times.
Cross-sectional data
• Cross-sectional data are data on one or more
variables collected at the same point in time.
GPA    Study hours/week
3.5    10
2.7    8
1.9    9
2.3    5
2.0    8
2.2    6
2.5    3
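As an illustration of a simple (two-variable) regression on cross-sectional data, the sketch below regresses GPA on weekly study hours using the seven observations in the table above. It assumes ordinary least squares as the estimator, which these slides have not yet introduced, and the function name `ols_simple` is hypothetical.

```python
from statistics import mean

# Observations copied from the cross-sectional table above.
hours = [10, 8, 9, 5, 8, 6, 3]
gpa = [3.5, 2.7, 1.9, 2.3, 2.0, 2.2, 2.5]

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = ols_simple(hours, gpa)
print(f"estimated average GPA = {intercept:.3f} + {slope:.3f} * hours")
```

The fitted line gives an estimate of the average GPA at each level of study hours. With only seven observations it is a toy example and says nothing about causation.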
Pooled data
• Pooled data contain elements of both
time series and cross-sectional data.

Time    GPA    Study hours/week
2000    2.5    9
2000    2.7    8
2000    2.3    6
2005    1.9    5
2005    3.1    12
2010    2.4    7
2010    2.0    5
2010    3.9    11
2010    1.2    2
• Panel data are a special type of pooled data in
which the same cross-sectional unit is
surveyed over time.

Person    Time    GPA    Study hours/week
1         2010    2.5    9
1         2011    2.7    7
1         2012    2.3    6
2         2010    1.9    8
2         2011    3.1    12
2         2012    2.4    6
3         2010    2.0    5
3         2011    3.9    11
3         2012    1.2    2
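What separates panel data from other pooled data is that each row carries an identifier for the unit being followed. The sketch below stores the panel table above as tuples and groups the rows by person. The helper names and the "balanced" check (every person observed in every year) are assumptions added for illustration.

```python
from collections import defaultdict
from statistics import mean

# (person, year, GPA, study hours/week), copied from the panel table above.
panel = [
    (1, 2010, 2.5, 9), (1, 2011, 2.7, 7), (1, 2012, 2.3, 6),
    (2, 2010, 1.9, 8), (2, 2011, 3.1, 12), (2, 2012, 2.4, 6),
    (3, 2010, 2.0, 5), (3, 2011, 3.9, 11), (3, 2012, 1.2, 2),
]

def by_person(rows):
    """Group rows into {person: [(year, gpa, hours), ...]}."""
    groups = defaultdict(list)
    for person, year, gpa, hours in rows:
        groups[person].append((year, gpa, hours))
    return dict(groups)

def is_balanced(rows):
    """True if every person is observed in every year that appears in the data."""
    years = {year for _, year, _, _ in rows}
    return all({y for y, _, _ in obs} == years for obs in by_person(rows).values())

series = by_person(panel)
mean_gpa = {person: mean(g for _, g, _ in obs) for person, obs in series.items()}

print("balanced panel:", is_balanced(panel))
print("average GPA per person:", mean_gpa)
```

Each person's rows form a short time series, and the rows for any one year form a cross-section, so a panel combines both views of the same units.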
Sources of Data
• Government agencies (Department of
Commerce...)
• International agencies (World Bank...)
• Surveys
The quality of the data used in economics is
often poor, for several reasons:
1. Possibility of observational errors.
2. Approximations and roundoffs.
3. Nonresponse to surveys may cause selectivity
bias.
4. The sampling methods used in obtaining the
data may vary so widely that it is often
difficult to compare results across samples.
5. Economic data are generally available at a
highly aggregate level (GNP...). Such highly
aggregated data may not tell us much about the
individual or micro-level units.
6. Because of confidentiality, certain data can be
published only in highly aggregate form
(health data...).