Introduction to Econometrics with R

Rajat Tayal

Fourth Quantitative Finance Workshop
December 21-24, 2012
Indian Institute of Technology, Kanpur
23 December 2012
Regression diagnostics
Leverage and standardized residuals
Deletion diagnostics
The function influence.measures()
Testing for heteroskedasticity
Testing for functional form
Testing for autocorrelation
Robust standard errors and tests
Part I
Linear regression
Introduction
The linear regression model, typically estimated by ordinary least squares
(OLS), is the workhorse of applied econometrics. The model is

    y_i = x_i' β + ε_i,   i = 1, 2, . . . , n,        (1)

or, in matrix notation,

    y = X β + ε.                                      (2)

The standard assumptions are

    E(ε | X) = 0,                                     (3)
    Var(ε | X) = σ² I,                                (4)

and, for ordered observations, the exogeneity condition

    E(ε_j | x_i) = 0,   i ≤ j.                        (5)
For cross-sections, exogeneity is typically assumed observation by
observation:

    E(ε_i | x_i) = 0.                                 (6)
> install.packages("AER", dependencies = TRUE)
> library("AER")
> data("Journals")
> names(Journals)
> journals <- Journals[, c("subs", "price")]
> journals$citeprice <- Journals$price/Journals$citations
> summary(journals)
      subs             price          citeprice
 Min.   :   2.0   Min.   :  20.0   Min.   : 0.005223
 1st Qu.:  52.0   1st Qu.: 134.5   1st Qu.: 0.464495
 Median : 122.5   Median : 282.0   Median : 1.320513
 Mean   : 196.9   Mean   : 417.7   Mean   : 2.548455
 3rd Qu.: 268.2   3rd Qu.: 540.8   3rd Qu.: 3.440171
 Max.   :1098.0   Max.   :2120.0   Max.   :24.459459
The model of interest is

    log(subs)_i = β1 + β2 log(citeprice)_i + ε_i.     (7)
Here, the formula of interest is log(subs) ~ log(citeprice). This can be used
both for plotting and for model fitting:
> plot(log(subs) ~ log(citeprice), data = journals)
> jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)
> abline(jour_lm)
abline() extracts the coefficients of the fitted model and adds the
corresponding regression line to the plot.
The function lm() returns a fitted-model object, here stored as jour_lm.
It is an object of class "lm".
> class(jour_lm)
[1] "lm"
> names(jour_lm)
 [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values" "assign"
 [7] "qr"            "df.residual"   "xlevels"       "call"          "terms"         "model"
Function      Description
print()       simple printed display
summary()     standard regression output
coef()        extracting the regression coefficients
residuals()   extracting residuals
fitted()      extracting fitted values
anova()       comparison of nested models
predict()     predictions for new data
plot()        diagnostic plots
confint()     confidence intervals for the regression coefficients
deviance()    residual sum of squares
vcov()        (estimated) variance-covariance matrix
logLik()      log-likelihood (assuming normally distributed errors)
AIC()         information criteria including AIC, BIC/SBC (assuming
              normally distributed errors)
Analysis of variance
> anova(jour_lm)
Analysis of Variance Table

Response: log(subs)
                Df Sum Sq Mean Sq F value    Pr(>F)
log(citeprice)   1 125.93 125.934  224.04 < 2.2e-16 ***
Residuals      178 100.06   0.562
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA table breaks the sum of squares about the mean (for the
dependent variable, here log(subs)) into two parts: a part that is
accounted for by a linear function of log(citeprice) and a part attributed to
residual variation.
Prediction
The standard errors of predictions for new data take into account
both the uncertainty in the regression line and the variation of the
individual points about the line.
Thus, the prediction interval for prediction of new data is larger than
that for prediction of points on the line. The function predict()
provides both types of standard errors.
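Both interval types are selected via the interval argument of predict(); a minimal sketch using the jour_lm model from above (the citeprice value of 2.11 is illustrative, not taken from the slides):

```r
# refit the model from the Journals example
library("AER")
data("Journals")
journals <- Journals[, c("subs", "price")]
journals$citeprice <- Journals$price/Journals$citations
jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)

# confidence interval: uncertainty about the regression line only
predict(jour_lm, newdata = data.frame(citeprice = 2.11),
        interval = "confidence")

# prediction interval: adds the variation of individual points
# about the line, hence it is always wider
predict(jour_lm, newdata = data.frame(citeprice = 2.11),
        interval = "prediction")
```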
Plotting lm objects
The plot() method for class "lm" provides six types of diagnostic plots,
four of which are shown by default.
We set the graphical parameter mfrow to c(2, 2) using the par() function,
creating a 2 x 2 matrix of plotting areas to see all four plots simultaneously:
> par(mfrow = c(2, 2))
> plot(jour_lm)
> par(mfrow = c(1, 1))
The first provides a graph of residuals versus fitted values, the second is a
QQ plot for normality, plots three and four are a scale-location plot and a
plot of standardized residuals against leverages, respectively.
For the CPS1988 wage data, the model of interest is

    log(wage) = β1 + β2 experience + β3 experience²
                + β4 education + β5 ethnicity + ε.    (8)
   education       experience    ethnicity        smsa             region        parttime
 Min.   : 0.00   Min.   :-4.0   cauc:25923   no : 7223   northeast:6441   no :25631
 1st Qu.:12.00   1st Qu.: 8.0   afam: 2232   yes:20932   midwest  :6863   yes: 2524
 Median :12.00   Median :16.0                            south    :8760
 Mean   :13.07   Mean   :18.2                            west     :6091
 3rd Qu.:15.00   3rd Qu.:27.0
 Max.   :18.00   Max.   :63.0
Comparison of models
With more than a single explanatory variable, it is interesting to test for
the relevance of subsets of regressors. For any two nested models, this can
be done using the function anova(). For example, to test for the relevance
of the variable ethnicity, we explicitly fit the model without ethnicity and
then compare both models.
> cps_noeth <- lm(log(wage) ~ experience + I(experience^2) +
+   education, data = CPS1988)
> anova(cps_noeth, cps_lm)
Analysis of Variance Table

Model 1: log(wage) ~ experience + I(experience^2) + education
Model 2: log(wage) ~ experience + I(experience^2) + education + ethnicity
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1  28151 9719.6
2  28150 9598.6  1    121.02 354.91 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This reveals that the effect of ethnicity is significant at any reasonable level.
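The same nested comparison can also be run as a Wald test with waldtest() from the lmtest package (attached when AER is loaded); a minimal, self-contained sketch:

```r
library("AER")  # attaches lmtest, which provides waldtest()
data("CPS1988")

# full model and nested model without ethnicity
cps_lm <- lm(log(wage) ~ experience + I(experience^2) +
               education + ethnicity, data = CPS1988)
cps_noeth <- update(cps_lm, . ~ . - ethnicity)

# reproduces the F statistic of anova(cps_noeth, cps_lm)
waldtest(cps_noeth, cps_lm)
```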
Part II
Linear regression with panel data
Introduction
For illustrating the basic fixed- and random-effects methods, we use the
well-known Grunfeld data (Grunfeld 1958) comprising 20 annual
observations on the three variables real gross investment (invest), real
value of the firm (value), and real value of the capital stock (capital) for
11 large US firms for the years 1935-1954.
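A minimal sketch of the fixed- and random-effects estimators for these data, using the plm package (the specification invest ~ value + capital follows the variables described above):

```r
library("plm")                      # panel data methods
data("Grunfeld", package = "AER")

# declare the panel structure: firm and year index the observations
pgr <- pdata.frame(Grunfeld, index = c("firm", "year"))

# fixed-effects ("within") and random-effects estimators
gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within")
gr_re <- plm(invest ~ value + capital, data = pgr, model = "random")

# Hausman test: checks whether the two estimators differ significantly
phtest(gr_fe, gr_re)
```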
The results suggest that autoregressive dynamics are important for these
data.
Part III
Regression diagnostics
Review
> data("Journals")
> journals <- Journals[, c("subs", "price")]
> journals$citeprice <- Journals$price/Journals$citations
> journals$age <- 2000 - Journals$foundingyear
> jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)
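The heteroskedasticity test listed in the outline can be applied directly to this model; a sketch using bptest() from the lmtest package (attached when AER is loaded):

```r
library("AER")
data("Journals")
journals <- Journals[, c("subs", "price")]
journals$citeprice <- Journals$price/Journals$citations
jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)

# Breusch-Pagan test: the null hypothesis is homoskedastic errors
bptest(jour_lm)
```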
Further tests for autocorrelation are the Box-Pierce test and the
Ljung-Box test, both being implemented in the function Box.test() in
base R.
> Box.test(residuals(consump1), type = "Ljung-Box")

	Box-Ljung test

data:  residuals(consump1)
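A self-contained illustration on simulated data (base R only; the AR coefficient 0.7 and lag = 10 are illustrative choices):

```r
set.seed(42)

# an AR(1) series is autocorrelated, so the Ljung-Box test should reject
x <- arima.sim(model = list(ar = 0.7), n = 200)
Box.test(x, lag = 10, type = "Ljung-Box")

# white noise, by contrast, shows no autocorrelation to detect
Box.test(rnorm(200), lag = 10, type = "Ljung-Box")
```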
Thank You....