Lecture 01
- Cross-sectional data
- Time series data
- Panel data
Cross-sectional data
This type of data is characterized by individual units (e.g. companies,
people, countries).
$$\text{salary}_i = \alpha + \beta \cdot \text{yrs.service}_i + \varepsilon_i$$
$$SSE = \sum_{i=1}^{N} \left(\text{salary}_i - (\alpha + \beta \cdot \text{yrs.service}_i)\right)^2$$
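As a minimal sketch of what minimizing the SSE means, the code below fits the simple regression by ordinary least squares and checks that moving the slope away from the OLS estimate increases the SSE. The data are simulated, and the names `yrs`, `sal` and `sse` are illustrative assumptions; this is not the lecture's salary data.

```r
# Simulated stand-in for the salary data (an assumption for illustration only)
set.seed(1)
n   <- 200
yrs <- runif(n, 0, 40)                           # years of service
sal <- 90000 + 800 * yrs + rnorm(n, sd = 20000)  # salary with noise

# Sum of squared errors for a given intercept a and slope b
sse <- function(a, b) sum((sal - (a + b * yrs))^2)

fit   <- lm(sal ~ yrs)         # OLS picks the (a, b) that minimize the SSE
a_hat <- unname(coef(fit)[1])
b_hat <- unname(coef(fit)[2])

sse_ols   <- sse(a_hat, b_hat)
sse_other <- sse(a_hat, b_hat + 50)  # a different slope gives a larger SSE
```

The value `sse_ols` coincides with `sum(resid(fit)^2)`, the residual sum of squares reported by `lm`.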
For the salary data, the estimate of β means that for each additional year of
service the average nine-month academic salary increases by 779.6 dollars. The
intercept, α, can only be interpreted if yrs.service_i can be zero. Let's check:
[Figure: plot of salary against years of service; x-axis: Years of service, 0 to 60]
However, salary cannot be explained by years of service alone. We can
extend the model by including additional explanatory variables and
estimate a multiple regression model:
We note that, according to our model, more years of service lower the salary,
whereas more years since the PhD raise it.
Sometimes explanatory variables are tightly connected (e.g. linear
relationship) and it is impossible to disentangle the individual influences
of explanatory variables. A popular measure of multicollinearity is the
Variance Inflation Factor (VIF): $VIF_k = \frac{1}{1 - R_k^2}$, where $R_k^2$ is the $R^2$ from
regressing the variable $x_k$ on all the remaining regressors.
##                   GVIF Df GVIF^(1/(2*Df))
## rank          2.013193  2        1.191163
## discipline    1.064105  1        1.031555
## yrs.since.phd 7.518936  1        2.742068
## yrs.service   5.923038  1        2.433729
## sex           1.030805  1        1.015285
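The VIF definition can be reproduced by hand with auxiliary regressions. The sketch below does this on simulated regressors (an assumption for illustration; `x1`, `x2`, `x3` and `vif_manual` are hypothetical names). It computes the plain VIF for numeric regressors, not the generalized GVIF reported for factors such as rank.

```r
# Simulated regressors (an assumption for illustration): x2 is built to be
# strongly related to x1, x3 is unrelated to both
set.seed(2)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.4)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_k = 1 / (1 - R_k^2), with R_k^2 from regressing x_k on the other regressors
vif_manual <- sapply(names(X), function(k) {
  aux <- lm(reformulate(setdiff(names(X), k), response = k), data = X)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif_manual, 2))
```

The same numbers appear on the diagonal of the inverse correlation matrix, `diag(solve(cor(X)))`, which is a convenient cross-check.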
##                       Estimate     Pr(>|t|)
## (Intercept)        64919.67653 3.386603e-43
## rankAssocProf       5800.88482 2.500129e-01
## rankProf           34848.87453 3.064268e-08
## disciplineB        14297.08049 2.153495e-09
## yrs.since.phd       1460.65638 1.006698e-02
## I(yrs.since.phd^2)   -23.91633 1.205955e-02
Note: multicollinearity inflates the variances of the estimates $\hat{\beta}_j$
and thus deflates the respective t-values, which can make the coefficients
appear insignificant. One of the solutions is to use the F-test: if seemingly
insignificant parameters are truly zero, the F-test should not reject the joint
null hypothesis $H_0: \beta_i = 0, \beta_j = 0$. If it rejects $H_0$, we have an
indication that the low t-values are due to multicollinearity.
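A hedged sketch of this joint test: two nearly identical simulated regressors are tested jointly by comparing a restricted and an unrestricted fit with `anova()`. The data and the names `restricted`, `unrestricted` and `f_manual` are assumptions made for illustration.

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # nearly a copy of x1
y  <- 1 + x1 + x2 + rnorm(n)

unrestricted <- lm(y ~ x1 + x2)
restricted   <- lm(y ~ 1)        # imposes H0: beta_1 = 0, beta_2 = 0

# Individual t-tests: standard errors are inflated by the collinearity
print(coef(summary(unrestricted)))

# Joint F-test of H0, comparing the restricted and unrestricted fits
ftest <- anova(restricted, unrestricted)
print(ftest)

# The same F statistic computed by hand from the two residual sums of squares
rss_r    <- sum(resid(restricted)^2)
rss_u    <- sum(resid(unrestricted)^2)
q        <- 2                    # number of restrictions
f_manual <- ((rss_r - rss_u) / q) / (rss_u / df.residual(unrestricted))
```

Here the joint null is rejected decisively, because x1 and x2 together explain most of the variation in y even though their separate contributions are hard to tell apart.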
Our final model states that the nine-month academic salary depends on
the academic rank, the discipline and the years since PhD, which enter
together with their square. While the coefficient of yrs.since.phd is positive,
the coefficient of yrs.since.phd^2 is negative, which indicates that the effect
of an additional year since PhD diminishes as yrs.since.phd grows.
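The diminishing effect can be quantified from the two reported coefficients. Holding rank and discipline fixed, the derivative of the fitted salary with respect to yrs.since.phd is b1 + 2 · b2 · yrs.since.phd, which approximates the effect of one more year. The sketch below evaluates it; `marginal_effect` and `turning_point` are illustrative names.

```r
# Coefficients copied from the estimation output above
b1 <- 1460.65638    # yrs.since.phd
b2 <- -23.91633     # I(yrs.since.phd^2)

# Approximate effect of one more year since PhD, evaluated at x years
marginal_effect <- function(x) b1 + 2 * b2 * x

# The effect reaches zero where b1 + 2 * b2 * x = 0
turning_point <- -b1 / (2 * b2)

print(marginal_effect(c(0, 10, 20, 30, 40)))
print(turning_point)
```

With these estimates the effect changes sign at roughly 30.5 years since PhD; beyond that point the fitted salary profile slopes downward.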
We now test the residuals of our model. The model form is accepted as
correct if there are no changes in the variance of the residuals (residual
variance must be constant) and there is no pattern to the residuals with
respect to the predicted values.
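# Salaries is assumed to come from the carData package
data(Salaries, package = "carData")
mdl <- lm(salary ~ rank + discipline
          + yrs.since.phd + I(yrs.since.phd^2),
          data = Salaries)
# Residuals vs fitted values (1) and normal Q-Q plot (2), side by side
par(mfrow = c(1, 2))
plot(mdl, which = c(1, 2))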
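[Figure: diagnostic plots of mdl: Residuals (left panel) and Standardized residuals (right panel), with observations 44, 250 and 365 labelled]
[Figure: a second pair of diagnostic plots, including a quantile-quantile plot with Sample Quantiles on the vertical axis]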
We see that the variance of the weighted residuals is more stable. From the
quantile-quantile plot, the residuals appear to be normally distributed.
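The weighted fit that these remarks refer to is not shown in this section. As a sketch of the general idea only, the code below uses simulated data in which the error standard deviation is assumed to be proportional to x, and passes weights 1/x^2 to `lm()`. The data, the weighting scheme and the helper `var_ratio` are illustrative assumptions, not the lecture's actual specification.

```r
set.seed(4)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + x * rnorm(n)        # error sd grows with x (heteroskedastic)

ols <- lm(y ~ x)
wls <- lm(y ~ x, weights = 1 / x^2)  # weights proportional to 1 / variance

# Ratio of residual variances: large-x half over small-x half of the sample
var_ratio <- function(r) var(r[x > 5.5]) / var(r[x <= 5.5])

ratio_ols <- var_ratio(resid(ols))                  # well above 1
ratio_wls <- var_ratio(resid(wls) * sqrt(1 / x^2))  # weighted residuals, near 1
```

A ratio close to 1 for the weighted residuals is what "more stable variance" looks like numerically; in practice the weights have to be chosen from knowledge of, or a model for, the error variance.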
Time series data
A time series is a sequence of observations that are arranged according to
the time of their outcome. Time series data can be observed at many
frequencies: annual crop yield, quarterly financial reports, daily stock
prices, hourly wind speeds, etc.
The characteristic property of a time series is the fact that the data are
not generated independently, their dispersion varies in time, they are
often governed by a trend and they might have cyclic components.
[Figure: time series plots of gas and woolyrnq; x-axis: Time]
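The dependence, trend and cyclic components mentioned above can be checked numerically. The sketch below uses the AirPassengers series that ships with base R (a different series from the ones plotted here, chosen only because it needs no extra package) and looks at sample autocorrelations and a classical decomposition.

```r
# AirPassengers ships with base R (package datasets)
y <- datasets::AirPassengers

# Sample autocorrelations: values far from zero indicate dependence over time
rho   <- acf(y, lag.max = 24, plot = FALSE)$acf[, 1, 1]
lag1  <- rho[2]    # rho[1] is lag 0, which is always 1
lag12 <- rho[13]   # one seasonal cycle for monthly data

# Classical decomposition into trend, seasonal and remainder components
parts <- decompose(y, type = "multiplicative")
```

Calling `plot(parts)` shows the estimated trend and the repeating seasonal pattern; the multiplicative form is one way to accommodate dispersion that grows with the level of the series.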
In this course, we will use the notation $Y_t$ to indicate an observation on
variable $Y$ at time $t = 1, \dots, T$.
One objective of analyzing economic data is to predict the future values
of economic variables. One approach to do this is to build an
econometric model, describing the relationship between the variable of
interest and other economic quantities, then estimate the model using
sample data and use it as a basis for forecasting. However, this approach
is not always useful.
For example, it may be possible to adequately model the
contemporaneous relationship between unemployment and the inflation
rate, but as long as we cannot predict future inflation rates we are also
unable to forecast future unemployment.
In the first part of this course we will follow a pure time series approach:
we will assume that the current values of an economic variable are
related to its past values only. The emphasis is purely on making use of
the information in past values of a variable for forecasting its future. In
addition to producing forecasts, time series models also produce the
distribution of future values, conditional upon the past, and can thus be
used to evaluate the likelihood of certain events.
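A minimal sketch of this idea, assuming a first-order autoregression fitted to simulated data (the model choice, the threshold 1 and all variable names are illustrative assumptions): the fit yields point forecasts, their standard errors and, under normally distributed errors, the probability of an event.

```r
set.seed(5)
# Simulated AR(1) series (an assumption): Y_t = 0.7 * Y_{t-1} + e_t
y <- arima.sim(model = list(ar = 0.7), n = 200)

fit <- arima(y, order = c(1, 0, 0))   # AR(1) with a mean term
fc  <- predict(fit, n.ahead = 5)      # point forecasts and their standard errors

# Approximate 95% forecast intervals under normally distributed errors
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

# Model-implied probability that the next value exceeds 1, given the past
p_above_1 <- 1 - pnorm(1, mean = fc$pred[1], sd = fc$se[1])
```

The standard errors in `fc$se` grow with the forecast horizon, which reflects the increasing uncertainty about values further in the future.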
In the second part of this course we shall get to know different variants of
regressions with time series variables.
Finally, the most interesting results in econometrics are obtained in the
intersection of cross-sectional and time series methods…
Panel data
Panel data is a dataset containing observations on multiple phenomena observed
over multiple time periods. It combines both dimensions: the same individuals
are followed over a period of time. Whereas time series and cross-sectional data
are one-dimensional, panel data sets are two-dimensional.
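library(plm)  # panel data methods; also ships example data sets
# data(package = "plm")  # lists the data sets available in plm
data("Grunfeld", package = "plm")
head(Grunfeld)  # first rows of the panel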
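To make the two dimensions concrete without any extra package, the sketch below builds a toy balanced panel in base R. The values are simulated and the names `panel`, `firm`, `year` and `inv` are illustrative assumptions, not the Grunfeld data.

```r
# Toy balanced panel (values are simulated, names are illustrative assumptions)
set.seed(6)
panel <- expand.grid(year = 2001:2004, firm = c("A", "B", "C"))
panel$inv <- round(runif(nrow(panel), 10, 100), 1)

# Cross-sectional dimension N and time dimension T
N         <- length(unique(panel$firm))
T_periods <- length(unique(panel$year))

# One row per firm, one column per year: the two dimensions side by side
wide <- reshape(panel, idvar = "firm", timevar = "year", direction = "wide")

# A single year is a cross-section; a single firm is a time series
cross_section <- subset(panel, year == 2001)
time_series   <- subset(panel, firm == "A")
```

In the long layout the panel has N × T rows; fixing the year or the firm recovers the two one-dimensional data types discussed earlier.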
Stock returns
Let $P_t$ be the price of an asset at time $t$. Then, the one-period return is:
$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}}$$
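The return formula translates directly into R. The prices below are hypothetical, chosen so that the result is easy to verify by hand; the log return is added only as a common companion measure.

```r
# Hypothetical prices, chosen so the returns are easy to verify by hand
P <- c(100, 110, 99)

# Simple one-period return: R_t = (P_t - P_{t-1}) / P_{t-1}
R <- diff(P) / head(P, -1)
print(R)   # 0.1 and -0.1

# Log return, a close approximation of R_t for small price changes
r_log <- diff(log(P))
```

Note that a series of T prices yields only T − 1 returns, since the first price has no predecessor.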
suppressPackageStartupMessages({require("TSA")})
data("google")
plot.ts(google, main = "Daily returns of the google stock")
abline(0, 0, col = "red")
[Figure: Daily returns of the google stock; y-axis: google, x-axis: Time]
Sales of shampoo
suppressPackageStartupMessages({require("fma")})
data(shampoo)
plot.ts(shampoo,
main = "Sales of shampoo over a three year period")
[Figure: Sales of shampoo over a three year period; x-axis: Time]
Air passenger numbers 1949 - 1960
require("datasets")
data("AirPassengers")
# The title string is cut off in the source; it is closed here after "passengers"
plot.ts(AirPassengers,
        main = "Monthly totals of international airline passengers")
[Figure: Air passenger numbers 1949 - 1960; x-axis: Time]
Exchange rates
suppressPackageStartupMessages({require("Ecdat")})
data(Forward)
plot.ts(Forward$usdeuro,
main = "Monthly exchange rate USD/Euro")
[Figure: Monthly exchange rate USD/Euro; x-axis: Time]
plot.ts(diff(Forward$usdeuro),
        main = "First Differences of USD/Euro exchange rate")
abline(0, 0, col = "red")
[Figure: First Differences of USD/Euro exchange rate; x-axis: Time]
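What `diff()` does to a wandering series can be seen on a simulated random walk. This is an illustrative assumption and not a model of the USD/Euro rate: the level drifts without a fixed mean, while the first differences are just the increments and fluctuate around zero.

```r
set.seed(7)
e <- rnorm(250, sd = 0.02)   # simulated increments (an illustrative assumption)
x <- 1 + cumsum(e)           # random walk: wanders without a fixed mean level

dx <- diff(x)                # first differences: x_t - x_{t-1}

# Differencing recovers the increments, which fluctuate around zero
same_as_increments <- isTRUE(all.equal(dx, e[-1]))
```

Plotting `x` and `dx` with `plot.ts()` gives a pair of pictures analogous to the level and first-difference plots of the exchange rate.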
Forecasting