WQU - Econometrics - Module2 - Compiled Content
Econometrics is the second course presented in the WorldQuant University (WQU) Master
of Science in Financial Engineering (MScFE) program. In this course, you will apply
statistical techniques to the analysis of econometric data. The course starts with an
introduction to the R statistical programming language that you will use to build
econometric models, including multiple linear regression models, time series models, and
stochastic volatility models. You will learn to develop programs using the R language,
solve statistical problems, understand extreme value distributions used in modeling extreme
portfolio losses, and implement basic algorithmic trading strategies. The course concludes with
a review of applied econometrics in finance and algorithmic trading.
The Econometrics course consists of the following seven modules:
1 Basic Statistics
2 Linear Models
3 Univariate Time Series Models
4 Univariate Volatility Modeling
5 Multivariate Time Series Analysis
6 Introduction to Risk Management
7 Algorithmic Trading
In Module 2, we extend the basic statistics covered earlier and introduce the concepts of
stationarity, unit roots, and linear models (such as regression and logit). We will also
explore how to implement a basic algorithmic trading strategy using predictions from a
linear model.
There are four assumptions in total. I will now explain some of the assumptions in detail.
Before I turn to the other assumptions of the standard model, it is convenient and
standard to write the model of interest in matrix form. If we define the row vector:
x_i = [1  x_i1  x_i2  …  x_ik] and the column vector β = [β0  β1  β2  …  βk]′, we can rewrite the
equation as:
y_i = x_i β + ε_i.
Stacking all n observations, with y the vector of outcomes, X the matrix whose rows are the x_i, and ε the vector of errors, gives the matrix form:
y = Xβ + ε.
The goal here is to find the 𝛽 vector. For the moment, we will assume that the relationship
is exact, which means that 𝜀 = 0. We can use simple linear algebra to recover 𝛽. 𝑋 is an
[𝑛 × (𝑘 + 1)] matrix. First, we pre-multiply by the transpose of 𝑋, which gives us:
𝑋 ′ 𝑦 = 𝑋 ′ 𝑋 𝛽.
(𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦 = (𝑋 ′ 𝑋)−1 𝑋 ′ 𝑋𝛽
𝛽 = (𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦.
Thus, we have a closed form expression for the parameters of the model. This is, however,
only possible when (𝑋 ′ 𝑋)−1 is well-defined. This requires that 𝑋 ′ 𝑋 has linearly
independent rows and columns, which in turn requires that 𝑋 has linearly independent
columns. This leads to the second assumption.
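As an illustration, here is a minimal R sketch (with simulated data and assumed coefficient values) of recovering β through the closed-form expression when the relationship is exact:
set.seed(42)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))            # design matrix with a constant column
beta_true <- c(0.5, 2, -1)                   # assumed parameter values
y <- X %*% beta_true                         # exact relationship, epsilon = 0
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y # (X'X)^(-1) X'y
beta_hat                                     # recovers beta exactly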
Following the same steps as above, but now with errors, yields:
β̂_OLS = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε.
Applying the conditional expectation E(·|X) to the estimator:
E(β̂_OLS | X) = β + (X′X)⁻¹X′E(ε|X),
we see that the OLS estimator is unbiased if, and only if, 𝔼(ε|𝑋) = 0. This outcome is called
strict exogeneity: we say that X is strictly exogenous with respect to ε. It implies that no
element of X is correlated with any element of ε, or more simply, that these two objects
share no common information.
E(ε_i²) = σ² for all i,
E(ε_i ε_j) = 0 for all i ≠ j.
Recall that this setup is for a cross-section. The first condition, homoscedasticity, says that
every error has the same variance. The second says that there is no correlation in the errors
across observational units (say, stock prices at a given moment).
If this had been a univariate time series, we would have considered the “no serial
correlation” condition. That is, each error must be uncorrelated with its own past/future.
In a panel context, we would consider both conditions. To learn more about each of these
assumptions, study the provided notes on this subject.
Many models in econometrics assume a linear relationship between a dependent variable,
𝑌, and one (or more) independent variables, 𝑋. The independent variable can also be
referred to as the explanatory variable, regressor, predictor, stimulus, exogenous, covariate
or control variable. Similarly, the dependent variable, 𝑌, can also go by many names,
including endogenous, predicted, response, outcome and controlled variable:
𝑌 = 𝛽1 + 𝛽2𝑋 + 𝑢,
where u, known as the disturbance term (or error term), is a random (stochastic) variable
that has well-defined probabilistic properties. The disturbance term, u, may represent
all those factors that affect the dependent variable (consumption, say) but are not considered explicitly.
𝑌 = 200 + 2 𝑥.
For example, if advertising increases by $10 000 (one unit of x), then sales will
increase by 2 × $10 000 = $20 000. This is a rather simple model.
The expected value of 𝑌 conditional upon 𝑋 reflects the population values (infinite
sampling) for 𝑎 and 𝑏.
R² is the goodness-of-fit measure we shall define later. Note that the original graph was produced in
Excel, which is handy for creating trend lines in graphs.
A forecaster wants to predict the USD/CAD exchange rate over the next year. He believes
an econometric model would be a good method to use and has researched the various
factors that he thinks affect the exchange rate. From his research and analysis, he
concludes the factors that are most influential are:
• The interest rate differential between the U.S. and Canada (INT),
• The difference in GDP growth rates (GDP), and
• The income growth rate (IGR) differences between the two countries.
We won't go into the detail of how the model is constructed, but after the model is made,
the variables INT, GDP, and IGR can be plugged into the model to generate a forecast. The
coefficients 𝑎, 𝑏, and 𝑐 will determine how much a certain factor affects the exchange rate
as well as the direction of the effect – i.e. whether it is positive or negative. You can see
that this method is probably the most complex and time-consuming approach of the ones
discussed so far. However, once the model is built, new data can be easily acquired and
plugged into the model to generate quick forecasts.
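The passage above does not specify the model's form, but a minimal R sketch of the idea might look as follows; the series INT, GDP, IGR and fx are simulated placeholders, and the coefficients are illustrative only:
set.seed(1)
n <- 60
INT <- rnorm(n)                              # interest rate differential (simulated)
GDP <- rnorm(n)                              # GDP growth differential (simulated)
IGR <- rnorm(n)                              # income growth differential (simulated)
fx  <- 1.3 + 0.05 * INT - 0.02 * GDP + 0.01 * IGR + rnorm(n, sd = 0.02)

model <- lm(fx ~ INT + GDP + IGR)            # estimates the coefficients a, b and c
predict(model, data.frame(INT = 0.5, GDP = 0.2, IGR = -0.1))  # forecast for new readings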
The testing procedure for the ADF test is the same as for the Dickey-Fuller test, but it is
applied to the model:
Δy_t = α + βt + γy_{t−1} + δ1Δy_{t−1} + ⋯ + δ_{p−1}Δy_{t−p+1} + ε_t,
where α is a constant, β the coefficient on a time trend, and p the lag order of the
autoregressive process.
The unit root test is then carried out under the null hypothesis 𝛾 = 0 against the
alternative hypothesis of 𝛾 < 0. Once a value for the test statistic
DF_τ = γ̂ / SE(γ̂)
is computed, it can be compared to the relevant critical value for the Dickey-Fuller test.
If the test statistic is less than (i.e., more negative than) the relevant critical value, then the null
hypothesis of γ = 0 is rejected and no unit root is present. (Note that this test is non-symmetrical,
so we do not consider an absolute value.)
There are different critical DF values depending on the type of test that is being conducted
(some tests do not include a trend, t, for example). A DF critical values table is given below:

Sample size    1%      2.5%    5%      10%     90%     95%     97.5%   99%
500            -3.44   -3.13   -2.87   -2.57   -0.43   -0.07   0.24    0.61
>500           -3.43   -3.12   -2.86   -2.57   -0.44   -0.07   0.23    0.60
Notes
• The null hypothesis of the Augmented Dickey-Fuller is that there is a unit root,
with the alternative that there is no unit root. If the 𝑝-value is above a critical size,
then we cannot reject that there is a unit root.
• The 𝑝-values are obtained through regression surface approximation from
MacKinnon (1994) but using the updated 2010 tables. If the 𝑝-value is close to
significant, then the critical values should be used to judge whether to accept or
reject the null.
Testing for mean reversion
A continuous mean-reverting time series can be represented by an Ornstein-Uhlenbeck
stochastic differential equation:
dx_t = θ(μ − x_t) dt + σ dW_t,
where θ is the rate of reversion to the mean, μ is the mean value of the process, σ is the
variance of the process, and W_t is a Wiener Process or Brownian Motion. In a discrete
setting, the equation states that the change of the price series in the next time period is
proportional to the difference between the mean price and the current price, with the
addition of Gaussian noise. This property motivates the Augmented Dickey-Fuller test,
which we will describe below.
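A minimal R sketch of the discretized equation, with assumed values for θ, μ and σ, illustrates the mean-reverting behaviour:
set.seed(123)
theta <- 5; mu <- 100; sigma <- 2            # assumed parameter values
dt <- 1 / 252                                # daily time step
n <- 1000
x <- numeric(n)
x[1] <- 90                                   # start away from the mean
for (t in 1:(n - 1)) {
  # x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * Z_t
  x[t + 1] <- x[t] + theta * (mu - x[t]) * dt + sigma * sqrt(dt) * rnorm(1)
}
plot(x, type = "l", main = "Simulated Ornstein-Uhlenbeck (mean-reverting) series")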
Augmented Dickey-Fuller (ADF) is based on the idea of testing for the presence of a
unit root in an autoregressive time series sample. It makes use of the fact that if
a price series possesses mean reversion, then the next price level will be proportional to
the current price level. A linear lag model of order p is used for the time series:
Δy_t = α + βt + γy_{t−1} + δ1Δy_{t−1} + ⋯ + δ_{p−1}Δy_{t−p+1} + ε_t,
where α is a constant, β represents the coefficient of a temporal trend, and Δy_t = y(t) − y(t − 1).
The role of the ADF hypothesis test is to consider the null hypothesis that γ = 0,
which would indicate (with 𝛼 = 𝛽 = 0) that the process is a random walk and thus non-
mean reverting. If the hypothesis that 𝛾 = 0 can be rejected, then the following movement
of the price series is proportional to the current price, thus it is unlikely to be a random
walk.
So, how is the ADF test carried out? The first task is to calculate the test statistic (D𝐹τ ),
which is given by the sample proportionality constant γ̂, divided by the standard
error of the sample proportionality constant:
DF_τ = γ̂ / SE(γ̂).
Dickey and Fuller have previously calculated the distribution of this test statistic, which
allows us to determine the rejection of the hypothesis for any chosen percentage critical
value. The test statistic is a negative number and thus in order to be significant beyond
the critical values, the number must be more negative than these values – i.e. less than
the critical values.
A key practical issue for traders is that any constant long-term drift in a price is
of a much smaller magnitude than any short-term fluctuations, and so the drift is often
assumed to be zero (𝛽 = 0) for the model. Since we are considering a lag model
of order 𝑝, we need to actually set 𝑝 to a particular value. It is usually sufficient, for trading
research, to set 𝑝 = 1 to allow us to reject the null hypothesis.
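In R, one way to run the test is with the tseries package (an assumption; urca::ur.df is an alternative). The sketch below applies the ADF test with p = 1 to a simulated random walk, for which the null of a unit root should not be rejected:
library(tseries)                             # provides adf.test()
set.seed(7)
random_walk <- cumsum(rnorm(500))            # a pure unit-root, non-mean-reverting series
adf.test(random_walk, alternative = "stationary", k = 1)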
Here is the output of the Augmented Dickey-Fuller test for Google over the period. The
first value is the calculated test statistic and the second value is the p-value. The third value
is the number of lags used, and the fourth is the number of data points in the sample. The fifth
value, the dictionary, contains the critical values of the test statistic at the 1, 5, and 10 percent
levels respectively.
(-0.12114782916582577, 0.94729096304598859, 0, 3616, {'1%': -3.432159720193857, '5%': -
2.8623396332879718, '10%': -2.56719565730786}, 25373.287077939662)
Since the calculated value of the test statistic is larger than any of the critical values at
the 1, 5, or 10 percent levels, we cannot reject the null hypothesis of 𝛾 = 0 and thus we are
unlikely to have found a mean reverting time series. An alternative means of identifying a
mean reverting time series is provided by the concept of stationarity, which we will now
discuss.
Stationarity
Financial institutions and corporations, as well as individual investors and researchers,
often use financial time series data (such as asset prices, exchange rates, GDP, inflation,
and other macroeconomic indicators) in economic forecasts, stock market analysis, or
studies of the data itself.
However, before such data can be used reliably it often needs to be refined, and an important
property to check is stationarity. Stationarity means that the mean, variance, and
intertemporal correlation structure remain constant over time. Non-stationarities can
come either from deterministic changes (like trend or seasonal fluctuations) or from the
stochastic properties of the process (if, for example, the autoregressive process has a unit
root, that is, one of the roots of the lag polynomial lies on the unit circle). In the first case, we
can remove the deterministic component by de-trending or de-seasonalization.
In the case of a random walk with a drift and deterministic trend, detrending can remove
the deterministic trend and the drift, but the variance will continue to go to infinity. As a
result, differencing must also be applied to remove the stochastic trend.
Using non-stationary time series data in financial models produces unreliable and
spurious results and leads to poor understanding and forecasting. The solution to the
problem is to transform the time series data so that it becomes stationary. If the non-
stationary process is a random walk with or without a drift, it is transformed to stationary
process by differencing. On the other hand, if the time series data analyzed exhibits a
deterministic trend, the spurious results can be avoided by detrending. Sometimes, the
non-stationary series may combine a stochastic and deterministic trend at the same time.
To avoid obtaining misleading results, both differencing and detrending should be applied,
as differencing will remove the trend in the variance and detrending will remove the
deterministic trend.
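A minimal R sketch (with simulated data) of the two transformations: detrending removes the deterministic trend, while differencing removes the stochastic trend:
set.seed(10)
n <- 300
time <- 1:n
y <- 0.05 * time + cumsum(0.02 + rnorm(n))   # deterministic trend + random walk with drift
detrended   <- residuals(lm(y ~ time))       # removes the deterministic trend only
differenced <- diff(y)                       # removes the stochastic trend
plot.ts(cbind(y, detrended, differenced))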
In finance, serial correlation is used by technical analysts to determine how well the past
price of a security predicts the future price.
d = Σ_{t=2}^{T} (e_t − e_{t−1})² / Σ_{t=1}^{T} e_t²,
where 𝑒 is the residual from the regression at time 𝑡, and 𝑇 is the number of
observations in a time series. The critical values of the DW table can be found at a number
of websites, such as http://www.stat.ufl.edu/~winner/tables/DW_05.pdf. Note that most
DW tables assume a constant term in the regression and no lagged dependent variables.
• If 𝑑 < 𝑑𝐿,𝑎 there is statistical evidence that the error terms are positively auto-
correlated.
• If 𝑑 > 𝑑𝑈,𝑎 there is no statistical evidence that the error terms are positively
auto-correlated.
• If 𝑑𝐿,𝑎 < 𝑑 < 𝑑𝑈,𝑎 the test is inconclusive.
Positive serial correlation is the serial correlation in which a positive error for one
observation increases the chances of a positive error for another observation.
To test for negative autocorrelation, the quantity (4 − d) is compared to the same critical values:
• If (4 − 𝑑) < 𝑑𝐿,𝑎 there is statistical evidence that the error terms are negatively
auto-correlated.
• If (4 − 𝑑) > 𝑑𝑈,𝑎 there is no statistical evidence that the error terms are negatively
auto-correlated.
• If 𝑑𝐿,𝑎 < (4 − 𝑑) < 𝑑𝑈,𝑎 the test is inconclusive.
Positive serial correlation is a time series process in which positive residuals tend to be
followed over time by positive error terms and negative residuals tend to be followed over
time by negative residuals. (Positive serial correlation is thought to be more common than
the negative case.)
Negative serial correlation example:
Let's say instead that the computed regression DW statistic was 1.45. This falls in the "inconclusive"
region of the DW analysis and we cannot say one way or the other. Suppose instead that the DW
statistic from the regression was 1.90; then we do not reject the null hypothesis that there is no
serial correlation.
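A minimal R sketch (with simulated, positively autocorrelated errors) of computing d from regression residuals; the lmtest package's dwtest() offers a ready-made version:
set.seed(3)
x <- rnorm(100)
u <- as.numeric(arima.sim(list(ar = 0.6), n = 100))  # AR(1) errors => positive serial correlation
y <- 1 + 2 * x + u
e <- residuals(lm(y ~ x))
d <- sum(diff(e)^2) / sum(e^2)               # Durbin-Watson statistic
d                                            # well below 2, suggesting positive autocorrelation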
For homoscedasticity:
Var(ε | x1, …, xk) = σ².
For heteroscedasticity:
Var(ε | x1, …, xk) = σ² f(x1, …, xk).
For example, the error variance might depend on only some of the regressors, say the first
and the fourth; note that in that case only the first and fourth regressors are implicated in the
heteroscedasticity. In some instances, only one regressor may be responsible for the
heteroscedasticity, so we can observe the relationship graphically.
For heteroscedasticity,
Var(u_i | X_i) = σ_i²,
so, for example with four observations, the error variance-covariance matrix is diagonal with unequal entries:
[ σ1   0    0    0  ]
[ 0    σ2   0    0  ]
[ 0    0    σ3   0  ]
[ 0    0    0    σ4 ]
In the bivariate relationship in the graph below, the error terms increase as x increases.
The problem here is that the heteroscedasticity may not be the result of one variable but
may instead arise from a combination of the regressors:
The White test states that if disturbances are homoscedastic then squared errors are, on
average, constant. Thus, regressing the squared residuals against explanatory variables
should result in a low 𝑅 squared.
Steps of the White test:
a. Regress 𝑌 against your various explanatory variables using OLS.
b. Compute the OLS residuals, 𝑒1 … 𝑒𝑛 , and square them.
c. Regress the squared residuals, e_i², against a constant, all of the explanatory variables,
their squares, and possible interactions (x1 times x2) between the explanatory
variables (p slopes in total).
d. Compute R² from (c).
e. Compare n𝑅 2 to the critical value from the Chi-squared distribution with 𝑝 degrees
of freedom.
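A minimal R sketch of steps (a) to (e), using simulated data with two regressors and heteroscedastic errors:
set.seed(5)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n, sd = 1 + abs(x1))           # error variance depends on x1
e2 <- residuals(lm(y ~ x1 + x2))^2                       # squared OLS residuals
aux <- lm(e2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2)) # auxiliary regression
p  <- 5                                                  # number of slopes in the auxiliary model
n * summary(aux)$r.squared > qchisq(0.95, df = p)        # TRUE => reject homoscedasticity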
The Breusch-Pagan Lagrange-Multiplier (LM) test is the same as the White test except
that the econometrician selects the explanatory variables to include in the auxiliary
equation. This may, or may not, result in a more powerful test than the White test.
Again, apply the Chi-squared test but 𝑝 = 4. Compare n𝑅 2 to the critical value from the
Chi-squared distribution with 𝑝 degrees of freedom.
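A minimal R sketch using the lmtest package's bptest() (an assumption), where the second argument lets the econometrician choose the auxiliary regressors:
library(lmtest)                              # provides bptest()
set.seed(6)
x1 <- rnorm(200)
x2 <- rnorm(200)
y  <- 1 + x1 + x2 + rnorm(200, sd = 1 + abs(x1))
bptest(lm(y ~ x1 + x2), ~ x1, data = data.frame(x1 = x1))   # auxiliary equation uses x1 only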
The Breusch-Pagan F test is as follows: suppose the auxiliary equation being tested regresses the
squared residuals on the selected explanatory variables; an F test of the joint significance of the
slope coefficients in that auxiliary regression is then used to test for heteroscedasticity.
Graphing the squared errors against a suspected cause of the heteroscedasticity may
reveal the nature of the heteroscedasticity. In the case below, the variance of the error
terms increases as the exogenous variable, 𝑥, increases:
Graphical inspection may help to identify situations in which an outlier causes a test to
erroneously show the presence of heteroscedasticity, as in the graph below:
In this case, the squares of the errors are fairly constant, except for the one outlier.
Multicollinearity is a condition where two or more independent variables are
strongly correlated with each other. In an extreme case, called perfect
multicollinearity, one may be a multiple of the other, e.g. z = 3x (variable z is exactly 3
times variable x). OLS cannot estimate separate coefficients for both z and x.
• Problem: 𝑥1 = 2 + 3 ∗ 𝑥2. Does perfect multicollinearity exist?
• Problem: you have two measures of gold produced in Chile over a span of years. In
one year, data assert that 1 000 kg were produced. Another data source states that
for the same year 2 204.62 pounds were produced. (There are 2.20462 pounds in a
kilogram.)
• The variables may measure the same concepts.
• The existence of (imperfect) multicollinearity is not a violation of the OLS assumptions; only perfect multicollinearity is.
Detecting multicollinearity
1 Check for the correlation between the predictor variables. If it is high, this may be a
MC warning.
2 Construct VIF (variance inflation factor) by regressing a predictor variable against
all the other predictor variables you are using. If it is greater than 5, this may be
another sign of MC.
Correcting
1 Remove one of the offending variables if the other can stand in for it. You
have to do this if there exists perfect multicollinearity.
2 Standardize the predictor variables.
VIF_i = 1 / (1 − R_i²).
The regressions are:
x1 = γ1 + γ12 x2 + γ13 x3
x2 = γ2 + γ21 x1 + γ23 x3
x3 = γ3 + γ31 x1 + γ32 x2
Example:
If R_i² = .2, then
VIF = 1/(1 − .2) = 1/.8 = 1.25.
In contrast, if the Pearson product moment correlation coefficient between two predictors is high,
say .95, then R_i² ≈ .9 and
VIF = 1/(1 − .9) = 1/.1 = 10.
Diagnostics
• As a rule of thumb, collinearity is potentially a problem for values of VIF > 10
(sometimes VIF > 5).
• The average VIF is the average of the VIF_i across the K independent variables. Since each
VIF_i ≥ 1, so is the average; for example, if the VIFs are 1.2, 1.4, 1.1, and 1.8, the average VIF is
(1.2 + 1.4 + 1.1 + 1.8)/4 = 1.375.
• Rule of thumb: an average VIF considerably greater than 1 indicates multicollinearity.
• The square root of the VIF indicates how much larger the standard error is compared to
the situation in which that predictor variable was uncorrelated with the other predictor
variables. For example, if VIF_i = 6 then sqrt(VIF_i) ≈ 2.449, so the standard error for
the coefficient of that predictor variable is about 2.449 times as large as it would be if that
(ith) predictor variable were uncorrelated with the other predictor variables.
• Tolerance: Tolerance_i = 1/VIF_i = 1 − R_i².
• If a variable has VIF = 10, then the (low) tolerance of 1/10 = .1 indicates MC, whereas
VIF = 1 indicates a high tolerance of 1/1 = 1.
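A minimal R sketch of computing VIFs from the auxiliary regressions above (the car package's vif() offers a ready-made version):
set.seed(8)
x2 <- rnorm(100)
x3 <- rnorm(100)
x1 <- 0.9 * x2 + 0.3 * rnorm(100)                    # x1 strongly related to x2
vif1 <- 1 / (1 - summary(lm(x1 ~ x2 + x3))$r.squared)
vif2 <- 1 / (1 - summary(lm(x2 ~ x1 + x3))$r.squared)
vif3 <- 1 / (1 - summary(lm(x3 ~ x1 + x2))$r.squared)
c(vif1, vif2, vif3)                                  # vif1 and vif2 are inflated, vif3 is near 1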
Condition number
The condition number (often called kappa) of a data matrix is κ (X) and measures the
sensitivity of the parameter estimates to small changes in the data. It is calculated by
taking the ratio of the largest to the smallest singular values from the singular
value decomposition of X. Kappa is
κ(X) = sqrt( max eigenvalue(XᵀX) / min eigenvalue(XᵀX) ).
A condition number above 30 is considered to be an indication of multicollinearity but
some say as low as 15. (Again, these are rules of thumb).
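A minimal R sketch of the condition number for a simulated design matrix with two strongly correlated columns:
set.seed(9)
x2 <- rnorm(100)
x3 <- rnorm(100)
x1 <- 0.9 * x2 + 0.3 * rnorm(100)
X  <- cbind(1, x1, x2, x3)
eig <- eigen(crossprod(X))$values                    # eigenvalues of X'X
sqrt(max(eig) / min(eig))                            # kappa(X)
kappa(X, exact = TRUE)                               # base R equivalent (ratio of singular values)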
Many research papers use the natural logarithm (ln) of macroeconomic and financial
variables instead of the simple value in the analysis. I will present the benefits of using ln
in data analysis below.
Example 1
We want to see the relation between GDP and M1 money supply in the U.S.
Variables:
M1 for the United States, Index 2009=100 of (National Currency), Annual, Not Seasonally
Adjusted
Real Gross Domestic Product, Index 2009=100 of (Index 2009=100), Annual, Not Seasonally
Adjusted
Frequency: Annual data
Period: 1980 – 2014
Source: FRED database
According to Monetary Theory:
When M1 increases, the GDP increases;
When M1 decreases, the GDP decreases.
Now, we calculate this connection using a linear regression in R. GDP will be the
dependent variable while M1 will be the independent variable.
GDP = coefficient × M1,
where the coefficient must be estimated in the linear model.
We use ln values in the model as they have a much smoother evolution as compared to
raw variables.
R code:
# gdp and m1 are numeric vectors containing the ln values of GDP and M1
# "+ 0" fits the regression through the origin (no intercept)
linear_regression <- lm(gdp ~ m1 + 0)
linear_regression
summary(linear_regression)
File: R_code_GDP_M1, which can be found in the additional files folder.
We very frequently use residuals and predicted values in data analysis.
The analysis of regression residuals may give us some information regarding the
accuracy of predictions, or whether the right type of model was used. If the regression
residuals are normally distributed, then the prediction intervals are considered to be accurate;
if they are not normal, then the prediction intervals are inaccurate.
If the residuals have a random pattern, a linear regression is a good fit. If the residuals are
non-random and have a U-shape or an inverted U shape, the variables can be modeled
better using a non-linear model.
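A minimal R sketch (with simulated data) of extracting residuals and predicted values and producing the two checks described above:
set.seed(11)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
fit  <- lm(y ~ x)
res  <- residuals(fit)
pred <- fitted(fit)
plot(pred, res, main = "Residuals vs fitted")        # random scatter suggests a linear fit is adequate
abline(h = 0, lty = 2)
qqnorm(res); qqline(res)                             # roughly straight line suggests normal residuals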
In this video, we will be comparing homoscedastic and heteroscedastic data generating
processes. Homoscedasticity means that the variance-covariance matrix of the
unexplained part of your variable of interest, 𝜀, is a constant diagonal matrix in the
population. One of the assumptions of the classic linear regression model is
homoscedasticity, or in other words, the same dispersion of errors. Heteroscedasticity, on
the other hand, is when unequal dispersion of errors occurs.
Process 1 is homoscedastic:
𝑦𝑖 = 2 + 0.6𝑥𝑖 + 𝜀𝑖
𝜀𝑖 ~𝑁(0, 𝜎 2 )
Process 2 is heteroscedastic:
𝑦𝑖 = 2 + 0.6𝑥𝑖 + 𝜀𝑖
𝜀𝑖 ~𝑁(0, (1 + 𝑥𝑖 )𝜎 2 )
Note that the variance of the error increases with the size of 𝑥𝑖 .
When we compare scatter plots of data from Process 1 and Process 2, it can clearly be
seen that Process 2 has an unequal dispersion of errors.
Notice in the left-hand graph how the variability of the error around the true expected
value (grey line) is constant across 𝑋. Most of the points are close to the line and the
dispersion around the line does not obviously vary with 𝑋.
Notice in the right-hand graph how the variability of the error around the true expected
value (grey line) increases with 𝑋. Most of the points are still close to the line, but the
dispersion around the line obviously increases with 𝑋.
In this simulated example it is clear to see that there is enough data to accurately identify
the true relationship.
When there is a very large data set (𝑛 = 1000) it is obvious that there is enough
information to accurately infer. Here you see an OLS estimate (cyan) and a GLS estimate
(magenta) of the true slope, but they are so similar you can only see the top one. Both are
consistent, so with this amount of data they will give equally reliable inference.
When we have a very small data set (𝑛 = 20), the estimates can be quite different (OLS
estimates are cyan, GLS estimates are magenta). However, at such a small sample, one
would be unwilling to trust either model.
You might ask: Why does heteroscedasticity matter? It is about how much information
there is in a specific observation.
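A minimal R sketch of the two data generating processes, with GLS implemented as weighted least squares using the known variance form (1 + x):
set.seed(12)
n <- 1000
x <- runif(n, 0, 10)
y1 <- 2 + 0.6 * x + rnorm(n)                         # Process 1: homoscedastic errors
y2 <- 2 + 0.6 * x + rnorm(n, sd = sqrt(1 + x))       # Process 2: error variance grows with x
coef(lm(y2 ~ x))                                     # OLS estimate of the slope
coef(lm(y2 ~ x, weights = 1 / (1 + x)))              # GLS/WLS estimate using weights 1/(1 + x)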
Under homoscedasticity:
Under heteroscedasticity:
P = Pr(y = 1 | x) = e^(α+βx) / (1 + e^(α+βx)),
1 − P = Pr(y = 0 | x) = 1 / (1 + e^(α+βx)).
To derive the logit, the odds of success to failure are taken as follows:
Odds = P / (1 − P) = e^(α+βx),
ln(odds) = ln( P / (1 − P) ) = α + βx.
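A minimal R sketch of the link, for assumed values α = -1 and β = 0.5:
alpha <- -1
beta  <- 0.5
x     <- 2
p <- exp(alpha + beta * x) / (1 + exp(alpha + beta * x))   # P(y = 1 | x)
log(p / (1 - p))                                           # log-odds, equals alpha + beta * x = 0
plogis(alpha + beta * x)                                   # built-in logistic function gives the same p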
Financial assets have a stochastic behaviour. Predicting their trend is a challenge for
quantitative analysts. The logistic regression can be used for predicting market trends as
it provides a binary result. This type of regression is a probabilistic model which assigns
probability to each possible event.
The Dow Jones Index (DJI) trend is predicted using a logistic regression implemented in R.
Algorithm:
1 DJI data is extracted from Yahoo Finance
2 Different indicators are calculated (moving average, standard deviation, RSI, MACD, Bollinger bands)
3 The direction variable is created (1 if the close is above its level 20 days ago, 0 otherwise)
4 The data is split into an in-sample period and an out-of-sample period
5 Data is standardized so that variables measured on larger scales do not have a disproportionately higher impact on the results:
Standardized data = (X − Mean(X)) / Std(X),
where the mean and standard deviation are calculated for each column.
library("quantmod")
getSymbols("^DJI",src="yahoo")
dow_jones<- DJI[,"DJI.Close"]
dow_jones
average10<- rollapply(dow_jones,10,mean)
average10
average20<- rollapply(dow_jones,20,mean)
average20
std10<- rollapply(dow_jones,10,sd)
std20<- rollapply(dow_jones,20,sd)
rsi5<- RSI(dow_jones,5,"SMA")
rsi14<- RSI(dow_jones,14,"SMA")
macd12269<- MACD(dow_jones,12,26,9,"SMA")
macd7205<- MACD(dow_jones,7,20,5,"SMA")
bollinger_bands<- BBands(dow_jones,20,"SMA",2)
direction<- NULL
direction[dow_jones> Lag(dow_jones,20)]<- 1
direction[dow_jones< Lag(dow_jones,20)]<- 0
dow_jones<-
cbind(dow_jones,average10,average20,std10,std20,rsi5,rsi14,macd12269,m
acd7205,bollinger_bands,direction)
dimension<- dim(dow_jones)
dimension
issd<- "2010-01-01"
ised<- "2014-12-31"
ossd<- "2015-01-01"
osed<- "2015-12-31"
isrow<- which(index(dow_jones) >= issd& index(dow_jones) <= ised)
osrow<- which(index(dow_jones) >= ossd& index(dow_jones) <= osed)
isdji<- dow_jones[isrow,]
osdji<- dow_jones[osrow,]
isme<- apply(isdji,2,mean)
isstd<- apply(isdji,2,sd)
isidn<- matrix(1,dim(isdji)[1],dim(isdji)[2])
norm_isdji<- (isdji - t(isme*t(isidn))) / t(isstd*t(isidn))
dm<- dim(isdji)
norm_isdji[,dm[2]] <- direction[isrow]
formula<- paste("direction ~ .",sep="")
model<- glm(formula,family="binomial",norm_isdji)
The Linear Probability Model (LPM) is an easier alternative to Logit or Probit. Suppose the
dependent variable can only assume values of 0 and 1. What if we still want to use a
multiple linear regression model?
Yi = b0 + b1 X1i + b2 X2i + ui
b_j, j = 1, 2 – the slope coefficients.
The predicted probability, b*_0 + b*_1 X_1i + b*_2 X_2i, can fall outside of the range [0, 1].
Example:
Linear probability model, Home Mortgage Disclosure Act (HMDA) data
Mortgage denial v. ratio of debt payments to income (P/I ratio) in a subset of the
HMDA data set (n = 127)
Source: people.ku.edu/~p112m883/pdf/Econ526/Ch11Part1_slides.docx
Example:
Y = −0.40 + 0.09X1 + 0.14X2
You can reproduce a version of the results with the following code. Since we are using
random number generators, your results will differ slightly. This is a good time to consider
the impact that sampling variation may have, even under the best of circumstances. All
variables are normally distributed and independent. Note that we add a fourth unrelated
random variable to the regression to show an example of an insignificant estimate.
# generate the explanatory variables and the error term
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- rnorm(100)
x4 <- rnorm(100)                 # unrelated to y; should come out insignificant
epsilon <- 0.8 * rnorm(100)

# construct the dependent variable and estimate the model by OLS
y <- 0.5 + 0.5 * x1 - 0.3 * x2 + 0.4 * x3 + epsilon
summary(lm(y ~ x1 + x2 + x3 + x4))
We will be interpreting and evaluating the different sections of the estimation output to
reach a conclusion about the model.
Consider the following ordinary least squares estimation output:
Is 42% of the variation explained enough? It depends on the context. In the cross section, it
is reasonably acceptable. In time series it would be far too low in many applications.
Just like the t-tests evaluate the statistical significance of each explanatory variable
individually, the F-test evaluates the joint statistical significance of all explanatory
variables (other than the constant). It has 4 and 95 degrees of freedom.
The 4 corresponds to the four zero restrictions we place on the model, namely that the slope
coefficients are all zero. The 95 is the sample size of 100 minus the 5 parameters estimated in
the model – 4 slope coefficients and 1 constant. The p-value is the probability of obtaining an
F-statistic as large as we did if the null hypothesis (all slope coefficients equal to zero) were
true. We clearly reject this hypothesis that the slope coefficients are all zero.
While most of the variables are statistically significant, x4 is not. In order to get a
model as parsimonious as possible, we should re-estimate the model without x4. If all the
diagnostic tests remain acceptable, that would be an improved model for forecasting.
This video showed you how to evaluate regression output. In the next section we will be
looking at the generalized linear model.
A Taylor rule is a monetary-policy rule that stipulates how much the central bank should
change the nominal interest rate in response to changes in inflation, output, or other
economic conditions.
Several decades ago, John Taylor proposed an equation indicating how the Fed should
adjust its interest rate in response to the evolution of inflation and output. Nowadays, the so-
called "Taylor rule" has dozens of versions that show the connection between the central
bank interest rate and different macroeconomic indicators.
A Taylor rule can be written as follows:
i_t = r* + π^T + α_x x_t + α_π (π_t − π^T),
where:
i_t – federal funds interest rate in the U.S.
π_t – the current inflation rate
π^T – targeted level of average inflation (Taylor assumed it to be 2%)
r* – equilibrium interest rate (Taylor assumed it to be 2%)
x_t – output gap (the difference between GDP and potential GDP)
α_x, α_π – the response coefficients on the output gap and on the inflation gap
Potential GDP – the output level that does not generate inflationary pressures.
We want to implement a Taylor rule for the U.S. economy using a multiple regression.
r - Effective Federal Funds Rate, Percent, Quarterly, Not Seasonally Adjusted
x – a matrix containing two components.
The first component is the output gap:
• Log(Gross Domestic Product, Billions of Dollars, Quarterly, Seasonally Adjusted
Annual Rate)
• Log(Real Potential Gross Domestic Product, Billions of Chained 2009 Dollars,
Quarterly, Not Seasonally Adjusted)
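A hypothetical R sketch of the regression described here (the series names and simulated placeholder data are illustrative; the actual series would be downloaded from FRED):
set.seed(14)
n_q <- 140                                   # quarterly observations, roughly 1980-2014
x1  <- rnorm(n_q, 0, 2)                      # inflation minus the 2% target (placeholder)
x2  <- rnorm(n_q, 0, 0.02)                   # output gap: log GDP minus log potential GDP (placeholder)
ffr <- 2 + 2 + 0.5 * x1 + 0.5 * x2 + rnorm(n_q)   # r* + pi^T + responses + noise
taylor_model <- lm(ffr ~ x1 + x2)
summary(taylor_model)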
Example 1:
We use the potential GDP from FRED database in order to calculate the output gap.
Results:
Dependent variable: y (Federal Funds rate from U.S.)
Independent variables:
x1 – the difference between registered inflation and the desired inflation
Consumer Price Index for All Urban Consumers: All Items, Percent Change from
Year Ago, Quarterly, Seasonally Adjusted – desired inflation rate (assumed 2%)
x2 – output gap
t-test
Null hypothesis: The estimated coefficient is zero
Alternative hypothesis: The estimated coefficient is not zero
x1 coefficient: t is -3.428
x2 coefficient: t is 4.822
constant: t is 2.809
The low probability values indicate that estimated coefficients are statistically different
from zero.
Skew: 0.717
Kurtosis: 2.748
An assumption of the linear regression is that the residuals are normally distributed.
Skewness should be close to zero and kurtosis close to 3 when the residuals are normally
distributed. The Jarque-Bera (JB) test, based on the statistic JB = (n/6)[S² + (K − 3)²/4],
indicates whether the residuals are normally distributed: a JB value close to zero indicates
that the residuals come from a normal distribution. In our case, kurtosis is close to the
desired value while skewness is not close to zero, and JB is not zero. The p-value is,
however, higher than 0.05, therefore we cannot reject the null hypothesis
of normally distributed residuals.
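A minimal R sketch of the test, using the tseries package (an assumption) on residuals from a simulated regression:
library(tseries)                             # provides jarque.bera.test()
set.seed(15)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100)
jarque.bera.test(residuals(lm(y ~ x)))       # a high p-value means we cannot reject normality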
Example 2:
We calculate potential GDP using a Hodrick-Prescott (HP) statistical filter.
HP filter:
• GDP is separated into a cyclical and a trend component by minimizing a loss
function
• Takes into consideration time series data (U.S. GDP) and a parameter
lambda – Hodrick-Prescott smoothing parameter. We have to
indicate lambda’s value before the filter calculates the new values. A value of 1600
is suggested for quarterly data. Ravn and Uhlig suggest using a value of 6.25
(1600/256) for annual data and 129600 (1600*81) for monthly data.
>>> gdp2=[9.213436, 9.237790, 9.245457, 9.256489, 9.259902, 9.272225, 9.272329,
9.278121, 9.290482, 9.299706, 9.309018, 9.315043, 9.326353, 9.338795, 9.360922,
9.377278, 9.391695, 9.407665, 9.422844, 9.438448, 9.458270, 9.470710, 9.488381,
9.501636, 9.521414, 9.532409, 9.540255, 9.551544, 9.563333, 9.576531, 9.586699,
9.594602, 9.593451, 9.603260, 9.605284, 9.585339, 9.573865, 9.570836, 9.573879,
9.586480, 9.594316, 9.608351, 9.619645, 9.631036, 9.631574, 9.646070,
9.654199, 9.666834, 9.677622, 9.686245, 9.697011, 9.700912, 9.711261, 9.718314,
9.733429, 9.745564, 9.743554, 9.760091, 9.773105]
>>> import statsmodels.tsa.filters as filter
>>> filter.hpfilter(gdp2, 1600)
t-test
Null hypothesis: The estimated coefficient is zero
Alternative hypothesis: The estimated coefficient is not zero
x1 coefficient: t is 3.958
x2 coefficient: t is 1.474
constant: t is 8.293
For the x1 coefficient and the constant, the p-value is very small (0.00), which indicates that
these coefficients are statistically different from zero. The p-value for the x2 coefficient
(0.146), however, is higher than in the first example, so x2 is not statistically significant at
conventional levels.
Skew: 0.184
Kurtosis: 2.179
Skew and kurtosis are close to the desired values while the probability is high which
means that we cannot reject the hypothesis that the residuals are normally distributed.
Taylor rules show whether the federal funds rate in a certain period is calibrated adequately,
considering the inflationary pressures in the economy. We calculate the federal funds rate
values indicated by the Taylor rules determined in this chapter.
Example 1:
r_Taylor_rule1 = 0.6859 x1 + 0.3360 x2 + 0.0191
We know x1, x2, and the constant, so we can estimate the interest rate. In this case the
federal funds rate and Taylor rules 1 and 2 are introduced in Python.
Amemiya, T. (1977). "The Maximum Likelihood and the Nonlinear Three-Stage Least Squares Estimator in the General Nonlinear Simultaneous Equation Model." Econometrica, 45(4), 955-968. https://EconPapers.repec.org/RePEc:ecm:emetrp:v:45:y:1977:i:4:p:955-68
Horrace, W. C. and Oaxaca, R. L. (2006). "Results on the Bias and Inconsistency of Ordinary Least Squares for the Linear Probability Model." Economics Letters, 90(3), 321-327.
Jeet, P. and Vats, P. (2017). Learning Quantitative Finance with R. Packt Publishing.
Scott, M. et al. (2013). Financial Risk Modelling and Portfolio Optimization with R. Wiley.
people.ku.edu/~p112m883/pdf/Econ526/Ch11Part1_slides.docx
During each module students will undertake a short case-study assignment, which is then marked by their peers according to a
grading rubric. The question is provided below.
In the following table, the results of 6 models attempt to explain a dependent variable of interest, y. You may assume that there is
sufficient theoretical reason to consider any or all of the explanatory variables x1, x2, x3 and x4 in a model for y, but it is unknown
whether all of them are necessary to effectively model the data generating process of y.
             Model 1              Model 2              Model 3              Model 4              Model 5              Model 6
             Coeff.    p-value    Coeff.    p-value    Coeff.    p-value    Coeff.    p-value    Coeff.    p-value    Coeff.    p-value
Constant     0.06906   0.001575   0.07629   0.000336   0.06969   0.00148    0.07697   0.000312   0.07374   0.000307   0.07472   0.000264
x2           0.23038   <0.0001    0.3267    <0.0001    0.23188   <0.0001    0.32874   <0.0001    0.33752   <0.0001    0.34071   <0.0001
F-statistic  42.55     <0.0001    69.81     <0.0001    39.79     <0.0001    46.5      <0.0001    56.51     <0.0001    42.55     <0.0001
Provide a thorough, rigorous analysis of which of the models is the preferred model for the explanatory variable of interest. Your
analysis should include features of each coefficient, each model, and each of the diagnostic statistics. Do NOT analyse them one-by-
one, but by theme as identified in Module 2 of Econometrics. For the preferred model, give an analysis of the likely correlation among
the explanatory variables.
Your answer will be evaluated on overall coverage, logical progression, and style of presentation.