Trading FX Toolz
Gesina Gorter
1 Introduction 3
1.1 IMC 3
1.2 Pairs trading 4
1.3 Graduation project 5
1.4 Outline 6
2 Trading strategy 7
2.1 Introductory example 8
2.2 Data 14
2.3 Properties of pairs trading 15
2.4 Trading strategy 17
2.5 Conclusion 26
4 Cointegration 35
4.1 Introducing cointegration 35
4.2 Stock price model 39
4.3 Engle-Granger method 48
4.4 Johansen method 55
4.5 Alternative method 62
5 Dickey-Fuller tests 65
5.1 Notions/facts from probability theory 66
5.2 Dickey-Fuller case 1 test 71
5.3 Dickey-Fuller case 2 test 76
5.4 Dickey-Fuller case 3 test 82
5.5 Power of the Dickey-Fuller tests 89
5.6 Augmented Dickey-Fuller test 94
5.7 Power of the Augmented Dickey-Fuller case 2 test 105
7 Results 133
7.1 Results trading strategy 133
7.2 Results testing price process I(1) 137
7.3 Results Engle-Granger cointegration test 138
7.4 Results Johansen cointegration test 140
8 Conclusion 143
Bibliography 152
Chapter 1
Introduction
1.1 IMC
IMC, International Marketmakers Combination, was founded in 1989 and is a diversified financial company. The company started as a market maker on the Amsterdam Options Exchange. Apart from its core business activity, trading, it is also active in asset management, brokerage, product development and derivatives consultancy. IMC Trading is IMC's largest operational unit and has been the core of the company for the past 17 years. IMC Trading trades solely for its own account and benefit. IMC is active in the major markets in Europe and the US and has offices in Amsterdam, Zug (Switzerland), Sydney and Chicago. By trading a large number of different securities in different markets, the company is able to keep its trading risk to a minimum.
1.2 Pairs trading
History Pairs trading or statistical arbitrage was first developed and put
into practice by Nunzio Tartaglia, while working for Morgan Stanley in the
1980s. Tartaglia formed a group of mathematicians, physicists and com-
puter scientists to develop automated trading systems to detect and make
use of mispricings in financial markets. One of the computer scientists on
Tartaglia’s team was the famous David Shaw. Pairs trading was one of the
most profitable strategies that was developed by this team. With members
of the team gradually spreading to other firms, so did the knowledge of pairs
trading. Vidyamurthy [15] presents a very insightful introduction to pairs
trading.
Pure arbitrage is making risk-less use of a mispricing, which is why one could call it a deterministic moneymaking machine. The purest form of arbitrage is profitably buying and selling the exact same security on different exchanges. For example, one could buy a share in Royal Dutch on the Amsterdam exchange at € 25.75 and sell the same share on the Frankfurt exchange at € 26.00. Because shares in Royal Dutch are interchangeable across these exchanges, such a trade results in a flat position and thus risk-less money.
Pairs trading, in contrast, is not risk-less. Two related securities are traded against each other when they appear mispriced relative to one another: the relatively cheap one is bought and the relatively expensive one is sold. The expectation is that the mispricing will correct itself, and when this happens the positions are reversed. The higher the magnitude of the mispricing when the positions are put on, the higher the profit potential.
Example Determining whether two securities form a pair is not trivial, but some securities are obvious pairs. For example, one fundamentally obvious pair is Royal Dutch and Totalfina, both being European oil-producing companies. One can easily argue that the value of both companies is largely determined by the oil price and hence that movements of the two securities should be closely related to each other. In this example, let's assume that historically the value of one share of Totalfina is about 8 times that of a share of Royal Dutch. Assume at time t0 it is possible to trade Royal Dutch at € 26.00 and Totalfina at € 215.00. Because 8 times € 26.00 is € 208.00, we feel that Totalfina is overpriced, or Royal Dutch is underpriced, or both. So we sell one share of Totalfina and buy 8 shares of Royal Dutch, with the expectation that Totalfina becomes cheaper, or Royal Dutch becomes more expensive, or both. Assume at t1 the prices are € 26.00 and € 208.00; we will then have made a profit of € 215.00 − € 208.00 = € 7.00. We would have made the same profit if at t1 the prices were € 26.875 (215 divided by 8) and € 215.00 respectively. In conclusion, this strategy does not say anything about the true value of the stocks, only about their relative prices. In this example a predetermined ratio of 8 was used, based on historical data. How to use historical data to determine this ratio will be discussed in paragraph 2.4.
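The arithmetic of this example can be checked with a few lines of code. The sketch below (hypothetical Python; the prices and the ratio of 8 are those of the example, the function name is ours) recomputes the profit under both closing scenarios.

    # Hypothetical sketch of the Royal Dutch / Totalfina example; prices and the
    # historical ratio of 8 are taken from the text, the function name is ours.
    RATIO = 8.0  # 1 share Totalfina ~ 8 shares Royal Dutch historically

    def pair_pnl(rd_open, tf_open, rd_close, tf_close, ratio=RATIO):
        """Profit of selling 1 Totalfina and buying `ratio` Royal Dutch shares."""
        pnl_short_tf = tf_open - tf_close           # short 1 Totalfina share
        pnl_long_rd = ratio * (rd_close - rd_open)  # long `ratio` Royal Dutch shares
        return pnl_short_tf + pnl_long_rd

    # Scenario 1: Totalfina falls back to 8 times Royal Dutch.
    print(pair_pnl(26.00, 215.00, 26.00, 208.00))   # 7.0
    # Scenario 2: Royal Dutch rises to 215 / 8 = 26.875 instead.
    print(pair_pnl(26.00, 215.00, 26.875, 215.00))  # 7.0

Both scenarios give the same profit of € 7, illustrating once more that only the relative prices matter.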
IMC already trades a lot of pairs, which were found by fundamental analysis and by applying the trading strategy to historical data (backtesting). No statistical analysis was performed. From trading experience, IMC is able to make a distinction between good and bad pairs based on profits. IMC has provided a selection of ten pairs that differ in quality.
1.3 Graduation project
The main focus of this project will be modeling the relationships between stocks, such that we can identify a good pair based on statistical analysis instead of fundamental analysis or backtesting. The resulting relationships will be put in order of the strength of co-movement and profitability.
Although one could study pairs trading between all sorts of financial instruments, such as options, bonds and warrants, this project focuses on trading pairs that consist of two stocks.
1.4 Outline
In the next chapter a trading strategy for pairs is derived; it illustrates how money is made and what properties a good pair has. In chapter 3 some basics of time series analysis are briefly stated, which we will need for the concept of cointegration. Chapter 4 discusses cointegration and two methods for testing it, the Engle-Granger and the Johansen method. In this chapter a start is also made with an alternative method. The Engle-Granger method makes use of the Dickey-Fuller unit root test; the properties of this unit root test are derived in chapter 5. The properties of the Engle-Granger method are found by simulation in chapter 6. IMC has provided 10 pairs for investigation. The results of the trading strategy and the cointegration tests are stated in chapter 7, where the pairs are also put in order of profitability and cointegration. After the conclusions in chapter 8, some suggestions for alternative trading strategies are made in chapter 9. In that chapter we also give some recommendations for further research.
Chapter 2
Trading strategy
2.1 Introductory example
Assume we have two stocks X and Y that form a pair based on fundamental analysis. Also available are the closing prices of these stocks dating back 2 years, which form the time series {xt}_{t=0}^T and {yt}_{t=0}^T shown in figure 2.1. In one year there are approximately 260 trading days, so two years of closing prices form a dataset of approximately 520 observations for each stock.
Figure 2.1: Time series xt and yt.
The first half of the observations is used to determine certain parameters of the trading strategy. The second half is used to backtest the trading strategy based on these parameters, i.e., to test whether the strategy makes money on this pair.
If the price processes of X and Y were perfectly correlated, that is, if X and Y change in the same direction and in the same proportion (for every t > 0, yt = αxt for some α > 0, so the correlation coefficient is +1), the spread st = yt − r̄xt (with r̄ the average ratio of Y to X, here 1.36) would be zero for all t and we could not make any money, because neither X nor Y is ever over- or underpriced. However, perfect correlation is hard to find in real life. Indeed, in this example the stocks are not perfectly correlated, as we can see in figure 2.2.
Figure 2.2: Spread st .
As mentioned before, we like to buy cheap and sell expensive. If the spread
is below zero, stock Y is cheap relative to stock X. The other way around,
if the spread is above zero stock Y is expensive relative to stock X (another
way to put it is that X is cheap in comparison with Y ). So basically the
trading strategy is to buy stock Y and sell stock X at the ratio 1:1.36 if the
spread is a certain amount below zero, which we call threshold Γ. When the
spread comes back to zero, the position is flattened, which means we sell Y
and buy X in the same ratio so there is no position left. In that case, we
have made a profit of Γ. An important requirement is that we can sell shares
we do not own, also called short selling. In summary, we put on a portfolio,
containing one long position and one short, if the spread is Γ or more away
from zero. We flatten the portfolio when the spread comes back to zero. Just
like the average ratio, Γ is determined by the first half of observations. In
this example we determined a Γ of 0.40. The way Γ has been calculated will
be discussed in paragraph 2.4.
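The rule just described can be summarized in a short sketch (hypothetical Python, not IMC's implementation; it assumes the spread values of the backtest half are available as a sequence):

    def backtest_threshold(spread, gamma):
        """Strategy I sketch: put on one spread when |s_t| >= gamma, flatten when
        the spread crosses zero again.  `spread` is the sequence of s_t values for
        the backtest half; returns the total profit (in spread units) and the
        number of trading instances."""
        position = 0        # +1: long the spread (bought Y, sold rbar X); -1: the reverse
        entry = 0.0
        profit, trades = 0.0, 0
        for s in spread:
            if position == 0 and abs(s) >= gamma:
                position = +1 if s < 0 else -1    # buy whichever stock is relatively cheap
                entry = s
                trades += 1
            elif (position == +1 and s >= 0) or (position == -1 and s <= 0):
                profit += position * (s - entry)  # realised move of the spread
                position = 0
                trades += 1
        return profit, trades

With Γ = 0.40 this is, in essence, the rule behind the trading instances of table 2.1.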
After determination of the parameters, the trading strategy is applied to the second half of observations in the dataset. This results in making a profit of Γ thirteen times. In other words, the spread moves away from 0 by at least Γ and back to 0 thirteen times. Note that this involves 26 trading instances, since putting on and flattening a position are two separate trades. Figure 2.3 and table 2.1 show all 26 trading instances. The profit made here is at least 13Γ: we use closing prices instead of intra-day data, so we do not trade at exactly −Γ, 0 and Γ, as we can see in table 2.1.
Figure 2.3: Spread st and Γ.
Table 2.1: Trading instances strategy I.
trade t st position (Y, X) price Y price X profit
1 268 0.69 (-1,+1.36) 31.49 22.63 -
2 282 -0.07 flat 31.37 23.11 0.76
3 284 -0.47 (+1,-1.36) 30.54 22.79 -
4 289 0.01 flat 31.43 23.10 0.48
5 293 0.55 (-1,+1.36) 32.05 23.15 -
6 300 -0.16 flat 32.81 24.23 0.71
7 302 -1.05 (+1,-1.36) 33.57 25.44 -
8 310 0.17 flat 33.56 24.54 1.22
9 311 0.45 (-1,+1.36) 33.58 24.34 -
10 420 -0.30 flat 40.33 29.85 0.75
11 423 -1.15 (+1,-1.36) 40.79 30.82 -
12 428 0.08 flat 43.15 31.65 1.23
13 429 0.65 (-1,+1.36) 43.43 31.44 -
14 432 -0.19 flat 42.60 31.45 0.84
15 434 -0.47 (+1,-1.36) 42.16 31.33 -
16 435 0.04 flat 42.61 31.28 0.51
17 437 0.82 (-1,+1.36) 42.79 30.84 -
18 440 -0.25 flat 44.01 32.52 1.07
19 444 -1.33 (+1,-1.36) 46.53 35.17 -
20 445 0.12 flat 46.17 33.84 1.45
21 446 1.24 (-1,+1.36) 46.32 33.13 -
22 449 -0.17 flat 45.89 33.85 1.41
23 450 -0.63 (+1,-1.36) 45.46 33.87 -
24 467 0.05 flat 46.19 33.91 0.68
25 468 0.48 (-1,+1.36) 47.16 34.31 -
26 519 -0.21 flat 44.95 33.19 0.69
total profit 11.80
Rather than closing the position at 0, one could also choose to reverse the position when the spread reaches Γ in the other direction. Assume we have sold 1 Y and bought 1.36 X because the spread was larger than Γ; we could now wait until the spread reaches −Γ and then buy 2 times Y and sell 2 times 1.36 X. As a result, we are left with a portfolio of long 1 Y and short 1.36 X. This results in one initial trade and 12 trades reversing the position. Note that the profit of reversing the position is 2Γ, so the total profit is at least 12 times 2Γ. These trades are shown in table 2.2.
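The reversing variant can be sketched in the same way (again hypothetical Python, under the same assumptions as the previous code fragment):

    def backtest_reversal(spread, gamma):
        """Strategy II sketch: open at the first |s_t| >= gamma, then reverse the
        position every time the spread reaches gamma on the other side."""
        position, entry = 0, 0.0
        profit, reversals = 0.0, 0
        for s in spread:
            if position == 0 and abs(s) >= gamma:
                position = +1 if s < 0 else -1
                entry = s
            elif (position == +1 and s >= gamma) or (position == -1 and s <= -gamma):
                profit += position * (s - entry)  # at least 2*gamma per reversal
                position, entry = -position, s    # put on the opposite position
                reversals += 1
        return profit, reversals

Each reversal realises a spread move of at least 2Γ, and the final position is left open, as in the description above.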
2.2 Data
The price data which IMC uses is provided by Bloomberg. Bloomberg is a
leading global provider of data, news and analytic tools. Bloomberg provides
real-time and archived financial and market data, pricing, trading, news and
communications tools in a single, integrated package to corporations, news
organizations, financial and legal professionals and individuals around the
world.
Example Consider the following ex-dividend dates and amounts of a certain stock.
date amount
04/28/2003 1.20
04/30/2004 1.40
04/29/2005 1.70
Table 2.3: Calculation of closing prices corrected for dividend.
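The body of table 2.3 is not reproduced above. Purely as an illustration, the sketch below applies one common correction convention, subtracting each dividend from all closing prices dated before its ex-dividend date; this is an assumption and not necessarily the exact convention behind table 2.3.

    import datetime as dt

    # Ex-dividend dates and amounts from the example above.
    DIVIDENDS = [
        (dt.date(2003, 4, 28), 1.20),
        (dt.date(2004, 4, 30), 1.40),
        (dt.date(2005, 4, 29), 1.70),
    ]

    def correct_for_dividends(dates, closes, dividends=DIVIDENDS):
        """Subtract every dividend from all closing prices dated before its
        ex-dividend date, so the corrected series has no jump on ex-dividend days."""
        corrected = []
        for day, price in zip(dates, closes):
            adjustment = sum(amount for ex_date, amount in dividends if day < ex_date)
            corrected.append(price - adjustment)
        return corrected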
Pairs trading is also market neutral: if the overall market goes up 10%, this has no consequences for the strategy and profits of pairs trading. The 10% loss on the short stock is compensated by a 10% gain on the long stock, and the other way around if the overall market goes down. We do not have a preference for up or down movements; we only look at relative pricing.
How to make money with pairs trading was explained in the example in paragraph 2.1. The amount of money made by trading a pair is a measure of the quality of the pair; obviously, more money is better. We make profits if the spread oscillates around zero, often hitting Γ and −Γ. An important issue for the traders is that the spread should not stay away from zero for a long time. Traders are human and tend to get a bit nervous if they have a big position for a long time. There is a chance that the spread will never return to zero, and in that case it costs money to flatten the position.
It is also possible that, after we have put on a position, the spread moves further away from zero instead of coming back. At that time our portfolio is worth less than when we put it on: the value of the long position in Y decreases because Y is getting cheaper (relative to X), and the value of the short position in X decreases because X is now more expensive (relative to Y). So, if we want to flatten our portfolio we have to sell Y for less than we bought it and/or buy X back for more than we sold it.
2.4 Trading strategy
The first half, t = 0, . . . , ⌊T/2⌋, is considered as history and is used to determine the parameters: the ratio r̄ and the threshold Γ.
The ratio r̄ is the average ratio of Y and X over the first half of observations:
r̄ = (1/(⌊T/2⌋ + 1)) Σ_{t=0}^{⌊T/2⌋} yt/xt .
The threshold Γ is determined quite easily: we just try a few values on the 'history' and take the one that gives the best profit based on the 'history'. We calculate the maximum of the absolute spread over the first half of observations, denoted m:
m = max_{t=0,...,⌊T/2⌋} |yt − r̄xt| .
The values of Γ that we are going to try are percentages of m. Table 2.4 shows the percentages and the outcome for the introductory example of paragraph 2.1, where m = 2.01. Because of rounding to two digits it looks as if several values of Γ give the same largest profit, but Γ = 0.40 gives the largest profit.
For the strategy that reverses the position at the opposite threshold, the successive trading times tn are given by:
If s_{tn} ≥ Γ: t_{n+1} = min{t > tn : st ≤ −Γ}.
If s_{tn} ≤ −Γ: t_{n+1} = min{t > tn : st ≥ Γ}.
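Putting the determination of the parameters together, a minimal sketch could look as follows (hypothetical Python; the grid of fractions of m stands in for the percentages of table 2.4, and the in-sample profit is evaluated with the simple threshold rule of paragraph 2.1):

    import numpy as np

    def in_sample_profit(spread, gamma):
        """Profit of the simple threshold rule (enter at +/-gamma, flatten at zero)."""
        pos, entry, profit = 0, 0.0, 0.0
        for s in spread:
            if pos == 0 and abs(s) >= gamma:
                pos, entry = (+1 if s < 0 else -1), s
            elif (pos == +1 and s >= 0) or (pos == -1 and s <= 0):
                profit += pos * (s - entry)
                pos = 0
        return profit

    def fit_parameters(x, y, fractions=np.arange(0.05, 0.55, 0.05)):
        """Determine rbar and Gamma from the first ('history') half of the price
        arrays x and y.  The grid of fractions of m (capped at 0.5*m) is an
        assumed stand-in for the percentages of table 2.4."""
        half = len(x) // 2
        xh, yh = x[:half], y[:half]
        rbar = np.mean(yh / xh)              # average ratio of Y to X
        spread = yh - rbar * xh
        m = np.max(np.abs(spread))           # maximum absolute in-sample spread
        gammas = fractions * m
        profits = [in_sample_profit(spread, g) for g in gammas]
        return rbar, float(gammas[int(np.argmax(profits))])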
Table 2.4: Profits with different Γ.
To determine Γ we simply take the value with the largest profit based on the history, but in practice we do not take Γ larger than 0.5m. This profit is a gross profit: no transaction costs are accounted for. We neglect the transaction costs because it turned out that they hardly had any influence on the value of Γ. This is because IMC does not trade one spread, which in this example was 1 Y and 1.36 X, but a large number of Y and X, for example 1,000 Y and 1,360 X. The costs that IMC incurs consist of two parts: a fixed amount a plus an amount b times the number of traded shares. The costs of trading 1,000 Y and 1,360 X would be 2a + 2,360b. We always trade the same amounts, no matter the value of Γ, so the costs per trade are exactly the same for all Γ. So the more trades the more costs, but the costs are very small compared to the profit. When the profits for the different thresholds are not too close to each other, the Γ chosen when considering the net profits is the same as the Γ chosen when neglecting the costs. Unfortunately, of all the pairs considered in this report, the pair from table 2.4 is the only one where accounting for transaction costs would have made a difference. There are three thresholds, 0.30, 0.40 and 1.20, which result in almost the same profit. Therefore, accounting for transaction costs would have resulted in the threshold with the lowest number of trades, Γ = 1.20. In the remainder of this report we will neglect transaction costs.
Modified trading strategy There are pairs of stocks that work quite well for a certain time, but then the spread walks away from zero and starts to oscillate around a level different from zero. We can see an example in figure 2.10. If we do not do anything, we are probably going to have a position for a long time, which is not desirable, as explained in paragraph 2.3. The figure shows us that the relation between the stocks in the pair has changed: the ratio r̄, determined from the past, is not good anymore. It would be a waste to lose money on these kinds of pairs by closing the position, or to exclude them from trading. A better way is to replace the average ratio r̄ with some kind of moving average ratio.
Figure 2.10: Spread oscillates around a new level.
Assume we have a dataset of closing prices; the first half is used in exactly the same way as described before, giving the average ratio r̄ and threshold Γ. The backtest on the second half of the dataset is slightly different because we use a moving average ratio r̃t, instead of r̄, to calculate the spread.
The moving average ratio we use is r̃t, governed by an adjustment parameter κ. The value of κ is chosen on the basis of the number of trades made on the first half of observations:

# trades   κ        # trades   κ
>15        0        4          6
10-15      1        3          7
8, 9       2        2          8
7          3        1          9
6          4        0          10
5          5
If there were a lot of trades in the first half of observations, we do not expect to need a moving average ratio; the table reflects this. The use of a moving average ratio, and this way of determining its value, has some disadvantages which will be discussed later on.
So the first half of the data set determines three parameters: the average ratio r̄, the threshold Γ and the adjustment parameter κ. In the second half of the data set, the new spread is calculated as:
s̃κ,t = yt − r̃t xt .
Trading the pair goes in the same way as described before; the difference is that the position in X is no longer equal to r̄ but equal to r̃t. The following example will make this clearer.
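The exact formula for r̃t is not reproduced above. As a stand-in that is consistent with the text (κ = 0 reduces to the fixed average ratio r̄, and a larger κ adapts faster to the actual ratio rt = yt/xt), one could use an exponentially weighted update of the ratio, sketched below in hypothetical Python:

    import numpy as np

    def moving_average_ratio(x, y, rbar, kappa):
        """Stand-in for r~_t: an exponentially weighted update of the actual ratio
        r_t = y_t / x_t, started at rbar.  kappa is a fraction (e.g. 0.05 for 5%);
        kappa = 0 reduces to the fixed average ratio rbar."""
        r_tilde = np.empty(len(x))
        level = rbar
        for t in range(len(x)):
            level = (1.0 - kappa) * level + kappa * (y[t] / x[t])
            r_tilde[t] = level
        return r_tilde

    def modified_spread(x, y, rbar, kappa):
        """Spread s~_{kappa,t} = y_t - r~_t * x_t used by the modified strategy."""
        return np.asarray(y) - moving_average_ratio(x, y, rbar, kappa) * np.asarray(x)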
We take the pair from figure 2.10; 520 closing prices of the two stocks are available. The first half of observations gives us three parameters:
r̄ = 1.86,
Γ = 0.77,
κ = 5%.
First we look at what the strategy without the modification does on the second half of observations; table 2.6 shows the trading instances. Two trades are made, with a total profit of € 1.88. The strategy with the modification works better: 7 trades with a total profit of € 5.21. Table 2.7 shows all trading instances. The table also shows that the position in stock X is no longer constant in absolute size. For example, with trade number 1 we put on a position of +1 Y and −1.85 X because r̃t at this time is 1.85. With the second trade we flatten this position and put on a position the other way around, but now r̃t is 1.81, so in total we sell 2 shares of stock Y and buy 1.85 + 1.81 = 3.66 shares of stock X. The profit of these two trades is calculated with the position that is flattened, i.e., (51.81 − 48.70) + 1.85 · (26.80 − 28.06) = 0.77.
Table 2.7 also shows that not all profits per trade are larger than Γ; one trade gave a relatively large loss. This happens because the ratio when the position was put on differs a lot from the ratio when this position is reversed. The ratios differ a lot because the actual ratio rt is moving a lot. We can see all the ratios in figure 2.11: the solid line is the actual ratio rt, the dashed line is the moving average ratio r̃t and the straight dotted line is the average ratio r̄.
Table 2.7: Trading instances modified strategy.
trade t s̃κ,t position (Y, X) price Y price X r̃t profit
1 263 -0.99 (+1,-1.85) 48.70 26.80 1.85 -
2 281 1.07 (-1,+1.81) 51.81 28.06 1.81 0.77
3 358 -0.82 (+1,-1.97) 51.52 26.56 1.97 -2.43
4 392 0.93 (-1,+1.96) 56.38 28.23 1.96 1.57
5 407 -0.94 (+1,-1.98) 55.45 28.52 1.98 1.52
6 459 0.97 (-1,+1.98) 55.27 27.47 1.98 1.92
7 476 -1.31 (+1,-1.99) 57.20 29.38 1.99 1.86
total profit 5.21
Figure 2.11: The actual ratio rt (solid), the moving average ratio r̃t (dashed) and the average ratio r̄ (dotted).
Figures 2.12 and 2.13 show the spread calculated with the average ratio r̄
and calculated with the moving average ratio r̃t with κ = 5% respectively.
Figure 2.12: Spread st. Figure 2.13: Spread s̃κ,t, κ = 5%.
From figure 2.10 it is clear that the average ratio r̄ does not fit anymore: around t = 300 the stocks in the pair get another relation. Replacing the fixed average ratio r̄ by a moving average ratio r̃t resolves this. As we saw in the example, we can lose money if the moving average ratio used to calculate the spread differs a lot between trades. If there is some fundamental change, such a trade will happen once or twice and the loss that is made will be compensated by good trades from that moment on. The advantage of the modified trading strategy is that when the relation between the stocks in a pair changes in some fundamental way, as in the example above (i.e., the spread oscillates around a new level), we are still able to trade the pair with a profit instead of making a loss by closing the position and excluding the pair from trading.
When there is no such fundamental change but we use the modified strategy with κ > 0, it is possible that we throw away money with each trade. This happens if the moving average ratio differs a lot between two succeeding trades. We consider an example; suppose we have 520 observations.
The first half is used to determine the three parameters r̄, Γ and κ:
r̄ = 1.00,
Γ = 0.62,
κ = 7.
Figures 2.14 and 2.15 show the spread for the second half of observations
calculated with the average ratio r̄, which is the same as κ = 0, and κ = 7
respectively.
Figure 2.14: Spread s̃κ,t with κ = 0 (i.e. the spread st). Figure 2.15: Spread s̃κ,t with κ = 7.
Trading the spread with κ = 0 results in four trades with a total profit of € 5.69. However, trading the spread with κ = 7 results in five trades with a total loss of € 4.03; table 2.8 shows the corresponding trading instances. In this example there is a loss on every trade if we use κ = 7, but we make a substantial profit when we use κ = 0. This is a somewhat extreme example, but what is often seen is that when there is no fundamental change between the stocks in the pair, the profit is smaller when using the modified strategy (κ > 0) than with the original strategy (κ = 0). This is a big disadvantage of the modified strategy: it is at the least very difficult to determine whether the relation between the stocks is fundamentally changing. In spite of this disadvantage we use the modified strategy because we do not want to exclude pairs like the one in figure 2.10; we are willing to give up some profit on pairs that do not change much.
Table 2.8: Trading instances modified strategy.
trade t s̃κ,t position (Y, X) price Y price X r̃t profit
1 270 0.71 (-1,+1.01) 10.72 9.94 1.01 -
2 328 -0.63 (+1,-1.18) 10.80 9.72 1.18 -0.30
3 378 0.72 (-1,+0.92) 9.93 10.50 0.92 -1.25
4 449 -0.87 (+1,-1.18) 11.59 10.55 1.18 -1.21
5 487 0.63 (-1,+0.90) 9.54 9.88 0.90 -1.27
total profit -4.03
2.5 Conclusion
In this chapter we have derived a trading strategy that resembles the strategy IMC uses. It is no longer necessary to do a fundamental analysis to find out whether a pair of two stocks is profitable to trade as a pair: we can apply the trading strategy to historical data and see if we would have made a profit had we actually traded the pair. In this way IMC has identified a lot of pairs. We would like to see if we can identify pairs in a more statistical setting, again using historical data of two stocks, not to estimate profits, but to see if the two time series exhibit behavior that could make them a good pair. We will examine the concept of cointegration, but first we need some time series basics.
Chapter 3
This chapter briefly discusses some basics of time series analysis which we will need later. More information can be found in [2] and [3].
White noise A basic stochastic time series {zt} is independent white noise if the zt are independent and identically distributed (i.i.d.) random variables with mean 0 and variance σ², notation zt ∼ i.i.d.(0, σ²). A special case is Gaussian white noise, where each zt in addition has a normal N(0, σ²) distribution.
For a (weakly) stationary time series the mean and the autocovariances do not depend on t:
E(zt) = µ,
E[(zt − µ)(zt−j − µ)] = γj , for all t and j = 0, 1, 2, . . .
MA(q) A q-th order moving average process, denoted MA(q), is characterized by:
zt = µ + ut + θ1 ut−1 + θ2 ut−2 + · · · + θq ut−q ,
where {ut} is white noise (∼ i.i.d.(0, σ²)) and µ and (θ1, θ2, . . . , θq) are constants. The expectation, variance and autocovariances of zt are given by:
E(zt) = µ,
γ0 = (1 + θ1² + θ2² + · · · + θq²)σ²,
γj = (θj + θj+1 θ1 + θj+2 θ2 + · · · + θq θq−j)σ² if j = 1, . . . , q, and γj = 0 if j > q.
AR(1) A first-order autoregressive process, denoted AR(1), satisfies:
zt = c + φzt−1 + ut , (3.2)
where {ut} is independent white noise (∼ i.i.d.(0, σ²)). If |φ| ≥ 1, the consequences of the u's for z accumulate rather than die out over time. Perhaps it is not surprising that when |φ| ≥ 1 there does not exist a causal stationary process for zt with finite variance that satisfies (3.2). If |φ| > 1 the process zt can be written in terms of innovations in the future instead of innovations in the past; that is what is meant by 'there does not exist a causal stationary process'. If φ = 1 and c = 0 the process is called a random walk. When |φ| < 1, the AR(1) model defines a stationary process and has an MA(∞) representation:
zt = c/(1 − φ) + ut + φut−1 + φ²ut−2 + · · · .
The expectation, variance and autocovariances of zt are given by:
µ = c/(1 − φ),
γ0 = σ²/(1 − φ²),
γj = σ²φ^j/(1 − φ²), for j = 1, 2, . . .
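The contrast between |φ| < 1 and φ = 1 is easy to see in a simulation; the sketch below (Gaussian white noise, arbitrary parameter values) generates one stationary AR(1) path and one random walk:

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_ar1(phi, c=0.0, sigma=1.0, T=520):
        """Simulate z_t = c + phi * z_{t-1} + u_t with Gaussian white noise u_t."""
        z = np.zeros(T)
        u = rng.normal(0.0, sigma, T)
        for t in range(1, T):
            z[t] = c + phi * z[t - 1] + u[t]
        return z

    stationary = simulate_ar1(phi=0.8)   # mean-reverting, variance sigma^2 / (1 - phi^2)
    random_walk = simulate_ar1(phi=1.0)  # unit root: shocks accumulate over time
    print(stationary.var(), random_walk.var())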
AR(p) A p-th order autoregressive process, denoted AR(p), satisfies:
zt = c + φ1 zt−1 + φ2 zt−2 + · · · + φp zt−p + ut , (3.3)
where {ut} is white noise. The process is stationary if the roots of the characteristic equation
1 − φ1 x − φ2 x² − · · · − φp x^p = 0, (3.4)
all lie outside the unit circle in the complex plane. This is the generalization of the stationarity condition |φ| < 1 for the AR(1) model. Then the expectation, variance and autocovariances of zt are given by:
µ = c/(1 − φ1 − φ2 − · · · − φp),
γ0 = φ1 γ1 + φ2 γ2 + · · · + φp γp + σ²,
γj = φ1 γj−1 + φ2 γj−2 + · · · + φp γj−p , for j = 1, 2, . . .
If equation (3.4) has a root on the unit circle, we call it a unit root and the process that generates zt a unit root process.
The log likelihood for an AR(k) model is given by:
log L = −(T/2) log(2π) − (T/2) log(σ²) + (1/2) log|Vk⁻¹|
        − (1/(2σ²)) (zk − µk)′ Vk⁻¹ (zk − µk)
        − Σ_{t=k+1}^{T} (zt − c − φ1 zt−1 − · · · − φk zt−k)² / (2σ²).
The first term in (3.5) measures the model fit, the second term gives a penalty to each parameter. The Akaike information criterion AIC(k) is calculated for each model AR(k), with k = 1, 2, . . . , K. The k with the smallest value of AIC(k) is the estimate of the model order.
Two other information criteria are the Schwarz-Bayesian and the Hannan-Quinn information criteria.
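As an illustration of order selection, the sketch below fits AR(k) models by conditional least squares and picks the k with the smallest AIC; the penalty 2(k + 1) is a standard variant and may differ slightly from the exact form of (3.5):

    import numpy as np

    def aic_order(z, K=10):
        """Estimate the AR order: fit AR(k) for k = 1..K by conditional least
        squares and return the k with the smallest AIC(k) = T_eff*log(sigma2) + 2*(k+1)."""
        z = np.asarray(z, dtype=float)
        best_k, best_aic = None, np.inf
        for k in range(1, K + 1):
            # Regress z_t on a constant and z_{t-1}, ..., z_{t-k}.
            Y = z[k:]
            X = np.column_stack([np.ones(len(Y))] + [z[k - i:-i] for i in range(1, k + 1)])
            beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
            resid = Y - X @ beta
            sigma2 = resid @ resid / len(Y)
            aic = len(Y) * np.log(sigma2) + 2 * (k + 1)
            if aic < best_aic:
                best_k, best_aic = k, aic
        return best_k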
First difference operator The first difference operator ∆ is defined by:
∆zt = zt − zt−1 .
Unit root test Statistical tests of the null hypothesis that a time series
is non-stationary against the alternative that it is stationary are called unit
root tests. In this paper we consider the Dickey-Fuller test (DF) and the
Augmented Dickey-Fuller test (ADF).
The assumption of the DF test is that the time series zt follows an AR(1) model:
zt = c + ρzt−1 + ut , (3.6)
where {ut} is white noise. The hypotheses are:
H0 : ρ = 1 against H1 : ρ < 1,
i.e. the null hypothesis is that zt has a unit root (is non-stationary) and the alternative is that it is stationary.
The test statistic of the DF test, S, is the t ratio:
S = (ρ̂ − 1)/σ̂ρ̂ ,
where ρ̂ denotes the OLS estimate of ρ and σ̂ρ̂ the standard error of the estimated coefficient. If zt were stationary, such a t ratio would (approximately) have a t-distribution. But we do not assume that the time series is stationary, because the null hypothesis is that ρ = 1. So the test statistic S need not have a t-distribution. We need to distinguish several cases to derive the distribution of the DF test statistic.
Case 1 :
The true process of zt is a random walk, i.e. zt = zt−1 + ut , and we estimate
the model zt = ρzt−1 +ut . Notice that we only estimate ρ and not a constant c.
Case 2 :
The true process of zt is again a random walk and we estimate the model
zt = c + ρzt−1 + ut . Notice that now we do estimate a constant but it is not
present in the true process.
Case 3 :
The true process of zt is a random walk, but now with drift, i.e. zt =
c + zt−1 + ut , where the true value of c is not zero. We estimate the model
zt = c + ρzt−1 + ut .
Although the differences between the three cases seem small, the effects on the asymptotic distributions of the test statistic are large, as we will see in chapter 5.
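Computing the statistic itself is ordinary OLS; the sketch below returns S for case 1 (no constant) or for cases 2 and 3 (with constant). The critical values, however, must come from the Dickey-Fuller distributions derived in chapter 5, not from a t-table:

    import numpy as np

    def dickey_fuller_stat(z, include_constant=True):
        """t ratio S = (rho_hat - 1) / se(rho_hat) from an OLS regression of z_t on
        z_{t-1} (case 1) or on a constant and z_{t-1} (cases 2 and 3)."""
        z = np.asarray(z, dtype=float)
        Y, z_lag = z[1:], z[:-1]
        X = np.column_stack([np.ones(len(Y)), z_lag]) if include_constant else z_lag[:, None]
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        sigma2 = resid @ resid / (len(Y) - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        rho_hat, se_rho = beta[-1], np.sqrt(cov[-1, -1])
        return (rho_hat - 1.0) / se_rho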
Augmented Dickey-Fuller test The Augmented Dickey-Fuller test tests whether a time series is stationary or not when the time series follows an AR(p) model. One of the assumptions of the Augmented Dickey-Fuller test is that the time series zt follows an AR(p) model:
zt = c + φ1 zt−1 + φ2 zt−2 + · · · + φp zt−p + ut . (3.7)
The null hypothesis is that the autoregressive polynomial
1 − φ1 x − φ2 x² − · · · − φp x^p = 0
has exactly one unit root and all other roots outside the unit circle. Then the unit root cannot be a complex number, because the autoregressive polynomial has real coefficients, so if x = a + bi is a unit root then so is its complex conjugate x̄ = a − bi, which would contradict the null hypothesis that there is exactly one unit root. Two possibilities remain: the unit root is −1 or 1. The first possibility gives an alternating series, which is not realistic for modeling the spread (this becomes clearer in the chapter on cointegration). Thus the single unit root should be equal to 1, which gives us
1 − φ1 − φ2 − · · · − φp = 0. (3.8)
Model (3.7) can be rewritten in the equivalent form
zt = c + ρzt−1 + β1 ∆zt−1 + β2 ∆zt−2 + · · · + βp−1 ∆zt−p+1 + ut , (3.9)
with
ρ = φ1 + · · · + φp ,
βi = −(φi+1 + · · · + φp), for i = 1, . . . , p − 1.
The advantage of writing (3.7) in the equivalent form (3.9) is that under the null hypothesis only one of the regressors, namely zt−1, is I(1), whereas all of the other regressors (∆zt−1, ∆zt−2, . . . , ∆zt−p+1) are stationary. Notice that (3.8) implies that the coefficient ρ is equal to 1. This leads to the same hypotheses as with the regular Dickey-Fuller test:
H0 : ρ = 1 against H1 : ρ < 1.
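In practice the regression (3.9) can be run with a library routine; the sketch below uses the adfuller function from the statsmodels package (assuming that package is available), with the lag order chosen by AIC. This is only a convenience for checking results; the properties of the test itself are derived in chapter 5.

    from statsmodels.tsa.stattools import adfuller

    def adf_case2(z, max_lag=None):
        """Augmented Dickey-Fuller test with a constant (case 2); the number of
        lagged differences is chosen by AIC.  H0: rho = 1 is rejected when the
        statistic is smaller than the relevant critical value."""
        stat, pvalue, used_lag, nobs, crit, _ = adfuller(
            z, maxlag=max_lag, regression="c", autolag="AIC")
        return stat, crit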
Chapter 4
Cointegration
This chapter discusses the concept of cointegration and two methods for test-
ing for cointegration, the Engle-Granger and the Johansen method. Other
methods are described in, for example, [13] and [14]. In the last section
of this chapter a start is made with an alternative method. In this report
this alternative method is used for generating cointegrated data but not for
testing for cointegration, although this is possible.
4.1 Introducing cointegration
Cointegration means that although many developments can cause permanent changes in the individual elements of yt, there is some long-run equilibrium relation tying the individual components together, represented by the linear combination a′yt.
A simple example: consider the system
xt = wt + εx,t ,
yt = wt + εy,t ,
wt = wt−1 + εt ,
where the error processes εx,t, εy,t and εt are independent white noise processes. The series wt is a random walk, so xt and yt are I(1) processes, though the linear combination yt − xt is stationary. This means yt = (xt, yt) is cointegrated with a = (−1, 1).
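This example system is easy to simulate. The sketch below (arbitrary noise variances) generates such a pair; the levels of xt and yt wander like the random walk wt, while the spread yt − xt stays tight:

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_cointegrated(T=520, sigma=1.0, sigma_x=0.5, sigma_y=0.5):
        """Generate x_t = w_t + eps_{x,t}, y_t = w_t + eps_{y,t} with w_t a random
        walk: x and y are I(1), but the spread y - x is stationary."""
        w = np.cumsum(rng.normal(0.0, sigma, T))   # random walk w_t
        x = w + rng.normal(0.0, sigma_x, T)
        y = w + rng.normal(0.0, sigma_y, T)
        return x, y

    x, y = simulate_cointegrated()
    print(np.std(x), np.std(y - x))   # x wanders widely, the spread stays tight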
Correlation Correlation is used in the analysis of co-movements in asset prices but also in the analysis of co-movements in returns. Correlation measures the strength and direction of linear relationships between variables. If xt denotes the price process of a stock, the returns ht are defined by
ht = (xt − xt−1)/xt−1 ,
and since log(1 + ε) ≈ ε as ε → 0, we can approximate this by:
(xt − xt−1)/xt−1 = xt/xt−1 − 1 ≈ log(xt/xt−1).
Correlation can refer to co-movement in the stock returns or in the stock prices themselves; cointegration refers to co-movements in the stock prices themselves or in the logarithm of the stock prices. Cointegration and correlation are related, but they are different concepts. High correlation does not imply cointegration, and neither does cointegration imply high correlation.
In fact, cointegrated series can have correlations that are quite low at times.
For example, a large and diversified portfolio of stocks which are also in
an equity index, where the weights in the portfolio are determined by their
weights in the index, should be cointegrated with the index itself. Although
the portfolio should move in line with the index in the long term, there will
be periods when stocks in the index that are not in the portfolio have excep-
tional price movements. Following this, the empirical correlations between
the portfolio and the index may be rather low for a time.
The simple example at the beginning of this section shows the same, that is, cointegration does not imply high correlation. For illustration purposes it is convenient to look at the differences, ∆xt and ∆yt, instead of the returns or of xt and yt themselves, because in this example the latter do not have constant variances. The variance of ∆xt is
Var(∆xt) = σ² + 2σx² ,
where σ², σx² and σy² denote the variances of εt, εx,t and εy,t respectively.
In the same way, Var(∆yt) = σ² + 2σy². The covariance of ∆xt and ∆yt is given by
Cov(∆xt, ∆yt) = σ² ,
so the correlation between ∆xt and ∆yt equals σ²/√((σ² + 2σx²)(σ² + 2σy²)), which is less than 1; when the variances of εx,t and/or εy,t are much larger than the variance of εt, the correlation will be low while xt and yt are cointegrated.
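A quick numeric check of this claim (parameter values chosen so that the idiosyncratic noise dominates; a sketch, not part of the original analysis):

    import numpy as np

    rng = np.random.default_rng(2)
    T, sigma, sigma_x, sigma_y = 100_000, 0.2, 1.0, 1.0   # idiosyncratic noise dominates

    w = np.cumsum(rng.normal(0.0, sigma, T))
    x = w + rng.normal(0.0, sigma_x, T)
    y = w + rng.normal(0.0, sigma_y, T)
    dx, dy = np.diff(x), np.diff(y)

    theoretical = sigma**2 / np.sqrt((sigma**2 + 2 * sigma_x**2) * (sigma**2 + 2 * sigma_y**2))
    empirical = np.corrcoef(dx, dy)[0, 1]
    print(theoretical, empirical)   # both are small although x and y are cointegrated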
The converse also holds: there may be high correlation between the stock prices and/or the returns without the stock prices being cointegrated. Figure 4.2 shows two stock price processes which are highly correlated, with correlation 0.9957. The correlation between the returns is even equal to 1. But the price processes are clearly not cointegrated: they are not tied together; instead they diverge more and more as time goes on. So correlation does not tell us enough about the long-term relationship between two stocks: they may or may not be moving together over long periods of time, i.e. they may or may not be cointegrated.
From a trading point of view, the 'pair' in figure 4.2 is not a good one. Figures 4.3 and 4.4 show the spread calculated with the average ratio r̄ and with a 10% moving average ratio r̃t respectively. In figure 4.3 it is clear that this 'pair' is not a good one, because the spread is not oscillating around zero. Figure 4.4 looks better, but actually we are losing money with nearly every trade, because the ratios when positions were put on differ a lot from the ratios when the positions were reversed. The ratios differ a lot because the actual ratio rt is moving a lot, which is due to the divergence between the stock prices. So, correlation is not a good way to identify pairs.
100 .
..................
...... .............. . .
.......... .............
.
................... . .
.
.
..... .. ...
. ....... ... ..
........................................
....
. ....................... .. ...... ....
................
...
.... ......... ............. ...........
.......... ............ ........... ............
......................... . .... ..................... . ....
.... ..... . . .. . .
... .
..............
......... ...............................
50 .
...... ..
.......................... ......
.. .
. ........... ....
..............................................................................
.
.... ........... . .
. .
0
Figure 4.2: Highly correlated stock prices.
[Figures 4.3 and 4.4: the spread of this pair calculated with the average ratio r and with the 10% moving average ratio, respectively.]

4.2 Stock price model
In this section a model for stock prices is introduced; it can be found in, among others, [7]. This model is famous for its use in option valuation. We will use it to show that the logarithm of stock prices is integrated of order one and to show that it is more or less justified to assume that stock prices themselves are integrated of order one.
In figure 4.5 the daily closing prices of Royal Dutch Shell are plotted. The
figure shows the jagged behavior that is common to stock prices.
[Figure 4.5: daily closing prices (in euros) of Royal Dutch Shell.]
We first examine the returns of the Royal Dutch Shell stock. Figure 4.6
shows the estimated density of the daily returns with the N (0, 1) density
superimposed, figure 4.7 the empirical distribution function and figure 4.8
the normal QQ-plot. The daily returns were normalized to

ĥt = (ht − µ̂)/σ̂,

where µ̂ and σ̂² are the sample mean and sample variance. These figures
suggest that the marginal distribution of daily returns of the Royal Dutch
Shell stock is close to Gaussian. The QQ-plot indicates that the match is least
accurate at the extremes of the range: the returns have fatter tails than the
normal distribution. Figure 4.9 shows the sample autocorrelation function of
the daily returns. The bounds ±1.96 T^{−1/2} are displayed by the dashed lines,
here T = 520. The figure strongly suggests that the returns are uncorrelated.
Although uncorrelated does not imply independent, we suggest that for
modeling xt we take the returns as normally distributed i.i.d. samples, because
the autocorrelation function for a sample from an i.i.d. noise sequence
looks similar to figure 4.9.
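These diagnostics are easy to reproduce. The sketch below uses a placeholder i.i.d. returns series (the Royal Dutch Shell data themselves are not reproduced here), normalizes the returns and computes the sample autocorrelation function together with the ±1.96 T^{−1/2} bounds.

import numpy as np

def sample_acf(x, max_lag):
    # sample autocorrelations for lags 1..max_lag
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[lag:] * x[:-lag]) / denom for lag in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
T = 520
returns = rng.normal(0.0005, 0.012, T)                      # stand-in for the observed daily returns

h_hat = (returns - returns.mean()) / returns.std(ddof=1)    # normalized returns
acf = sample_acf(h_hat, 25)
bound = 1.96 / np.sqrt(T)
print("lags with |acf| above the bound:", int(np.sum(np.abs(acf) > bound)))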
Figure 4.6: Estimated density.
Figure 4.7: Empirical distribution function.
[Figure 4.8: normal QQ-plot of the daily returns. Figure 4.9: sample autocorrelation function of the daily returns, lags 0–25.]
Given the stock price x(0) = x0 at time t = 0, we would like to come up with
a process that describes the stock price x(t) for all times 0 ≤ t ≤ T. As
a starting point for the model we note that the value of a risk-free invest-
ment D, like putting money on a savings account, changes over a small time
interval δ as
D(t + δ) = D(t) + µδD(t),
where µ is the interest rate.
The efficient market hypothesis states that the current stock price reflects all the information known to investors, so any change in the price is due to new information. We may build this into our model by adding a random fluctuation to the interest rate equation. Let ti = iδ; the discrete-time model becomes

x(ti) = x(ti−1) + µδ x(ti−1) + σ√δ ui x(ti−1),   (4.1)
where the parameter µ > 0 represents an annual upward drift of the stock
prices. The parameter σ ≥ 0 is a constant that determines the strength of
the random fluctuations and is called the volatility. The random fluctuations
u1 , u2 , . . . are i.i.d N (0, 1). Notice that the returns [x(ti ) − x(ti−1 )]/x(ti−1 )
indeed form a normal i.i.d sequence.
We consider the time interval [0, t] with t = Lδ. Assume we know x(0) = x0 ,
the discrete model (4.1) gives us expressions for x(δ), x(2δ), . . . , x(t). To de-
rive a continuous model for the stock price, we let δ → 0 to get a limiting
expression for x(t).
We are interested in the limit δ → 0; we exploit the approximation
log(1 + ε) ≈ ε − ε²/2 + · · · for small ε:

log( x(t)/x0 ) ≈ Σ_{i=0}^{L−1} [ µδ + σ√δ ui − ½σ²δ ui² ].

This is justifiable because E(ui²) is finite. We have ignored terms that involve
powers of δ^{3/2} or higher.
The limiting continuous-time expression for the stock price at fixed time t
becomes

x(t) = x0 exp( (µ − ½σ²)t + σ√t W ),   where W ∼ N(0, 1).
For non-overlapping time intervals, the normal random variables that de-
scribe the changes will be independent. We can describe the evolution of the
stock over any sequence of time points 0 = t0 < t1 < t2 < · · · < tm by
x(ti) = x(ti−1) exp( (µ − ½σ²)(ti − ti−1) + σ√(ti − ti−1) Wi ).   (4.2)
This model guarantees that the stock price is always positive if x0 > 0.
Model (4.2) is used a lot and is often referred to as geometric Brownian motion.
We would like to model the daily closing prices, so we assume that the time intervals ti − ti−1 are equally spaced. That is, we set the time between Friday evening and Monday evening equal to the time between Thursday evening and Friday evening. We can then write (4.2) as

xt = xt−1 exp( (µ − ½σ²)δ + σ√δ ut ),   (4.3)

with δ equal to 1/260, because there are approximately 260 trading days in
a year. This is basically the same as the discrete model (4.1).
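A realization of model (4.3) is straightforward to generate; the sketch below uses the parameter values that appear later in this section (µ = 0.03, σ = 0.18, x0 = 20) and δ = 1/260.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, x0, delta = 0.03, 0.18, 20.0, 1.0 / 260
T = 520                                          # about two years of daily closing prices

u = rng.standard_normal(T)
log_steps = (mu - 0.5 * sigma**2) * delta + sigma * np.sqrt(delta) * u   # increments of log x_t
x = x0 * np.exp(np.cumsum(log_steps))            # model (4.3): x_t = x_{t-1} * exp(...)

# the log-differences are constant + Gaussian white noise, so log x_t is I(1)
print("mean, std of log-differences:", log_steps.mean(), log_steps.std())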
From this model it follows that log xt is integrated of order one, because

log xt = log xt−1 + (µ − ½σ²)δ + σ√δ ut,

hence

log xt − log xt−1 = (µ − ½σ²)δ + σ√δ ut = constant + Gaussian white noise.
This difference process of log xt is I(0) and because the process log xt itself
is not, it follows that log xt is I(1). This is one of the reasons why cointegra-
tion tests are also applied to the logarithms of stock prices. Unfortunately, translating cointegration between the logarithms of two stocks into a trading strategy is less intuitively clear than translating cointegration between two stock prices themselves. When there is cointegration between the stock
prices, trading the pair is very obvious. Let yt = (xt, yt) be the vector of two stock price processes which are cointegrated with cointegrating vector a. We ’normalize’ this vector to (−α, 1), so yt − αxt is a stationary process with mean zero, which means that yt is approximately αxt. It could be that there is a constant in the cointegrating relation; then yt − αxt does not have mean zero. This will be discussed in the next section, for now we assume that the mean is zero. We treat yt − αxt as our spread process described in chapter 2, so we trade the pair (x, y) in the constant ratio α : 1. This is exactly the same as the trading strategy of chapter 2, except that we do not use the average ratio to calculate the spread but the least squares estimator.
If the logarithms of the stock prices xt and yt are cointegrated with cointegrating vector b, which we normalize to (−β, 1), then log yt − β log xt is a stationary process. So log yt is approximately β log xt; we cannot trade logarithms of stocks, so we would like to know the implied relation between xt and yt. Let εt denote the residual process, log yt − β log xt = εt; then

yt = xt^β e^{εt}.
It is not clear how we can trade this relation, at least not with the strategy from
chapter 2. This is the reason why we want to test for cointegration on the
stock prices and not on their logarithms; in order to do that we need xt and yt
to be integrated of order one. In chapter 9 we will make an attempt to come
up with a trading strategy if we have cointegration between the logarithms
of the stock prices.
Model (4.3) does not imply that xt is I(1); this is more easily seen in (4.1). The difference is

xt − xt−1 = µδ xt−1 + σ√δ ut xt−1,

which does not have a constant expectation, so according to the derived stock price model the difference process is not I(0). Fortunately, we look at the stock prices {xt}, t = 0, . . . , T, for fixed T; µ is a small number between 0.01 and 0.1 and typical values of σ are between 0.05 and 0.5, so it is not likely that xt−1 becomes very large or very small. That is why the differences divided by the mean value of xt−1 look a lot like the returns:

(xt − xt−1)/xt−1 ≈ (xt − xt−1)/x̄t−1.

The returns are I(0); this indicates that the difference process ∆xt is also more or less I(0) and the stock price process xt more or less I(1). We consider a realization of model (4.3) with µ = 0.03, σ = 0.18 and x0 = 20, shown in figure 4.10. The differences of this realization are shown in figure 4.11, which looks pretty stationary. This indicates that realizations of model (4.3) behave as if they are I(1), while strictly under the model they are not.
[Figure 4.10: realization of model (4.3) with µ = 0.03, σ = 0.18 and x0 = 20.]
Figure 4.11: Differences of realization of model (4.3).
Another way to show that it is justifiable to assume the stock prices are integrated of order one is to examine the differences instead of the returns. At the beginning of this section we examined the returns of Royal Dutch Shell; let us do the same for the differences ∆xt = xt − xt−1. Figure 4.12 shows the estimated density of the daily differences with the N(0, 1) density superimposed, figure 4.13 the empirical distribution function and figure 4.14 the normal QQ-plot. The daily differences were normalized to

∆x̂t = (∆xt − µ̂)/σ̂,

where µ̂ and σ̂² are the sample mean and sample variance of the differences. Figure 4.15 shows the sample autocorrelation function of the differences. These figures look pretty much the same as the figures for the returns, which suggests that it is justifiable to see the differences of a stock price process as normally distributed i.i.d. samples. This implies that the differences are I(0) and the stock prices I(1).
Figure 4.12: Estimated density.
Figure 4.13: Empirical distribution function.
[Figure 4.14: normal QQ-plot of the daily differences. Figure 4.15: sample autocorrelation function of the differences, lags 0–25.]
So far we have discussed why it is likely that stock price processes are integrated of order one, but we can also do unit root tests on the data we want to test for cointegration. The unit root test we use in this report is the (Augmented) Dickey-Fuller test, introduced in chapter 3. The first test is a Dickey-Fuller test on the price process itself, with the null hypothesis H0 that the process contains a unit root; the outcome of this first test should be that we cannot reject H0. The second test is the same test applied to the differenced price process. The outcome of this second test should be to reject H0, which makes it likely that the price processes are I(1). Which case of the DF-test should be used is discussed in the next section, the critical values of these tests are derived in chapter 5 and the results of these tests for the data used in this report are stated in chapter 7.
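In practice these two tests can be run, for example, with the augmented Dickey-Fuller implementation in statsmodels. The sketch below uses a simulated stand-in price path and the regression with a constant (case 2); with real data the price series would simply replace it.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
# stand-in price path generated from model (4.3); replace with an actual price series
steps = (0.03 - 0.5 * 0.18**2) / 260 + 0.18 * np.sqrt(1 / 260) * rng.standard_normal(520)
x = 20.0 * np.exp(np.cumsum(steps))

stat_lvl, p_lvl, *_ = adfuller(x, regression="c", autolag="AIC")           # first test: levels, expect NOT to reject H0
stat_dif, p_dif, *_ = adfuller(np.diff(x), regression="c", autolag="AIC")  # second test: differences, expect to reject H0

print("levels      : stat = %.2f, p = %.3f" % (stat_lvl, p_lvl))
print("differences : stat = %.2f, p = %.3f" % (stat_dif, p_dif))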
4.3 Engle-Granger method

The Engle-Granger method consists of two steps. The first step is to estimate the cointegrating vector with an OLS regression; the second step is to test whether the residuals of that regression are stationary using a Dickey-Fuller test. If the residuals are stationary, the linear combination a'yt is stationary, which means yt is cointegrated with cointegrating vector a.
Looking from a pairs trading point of view we have two stock price processes, yt = (xt, yt). We would like xt and yt to be cointegrated such that the spread εt = yt − αxt oscillates around zero; again we have ’normalized’ the cointegrating vector a to (−α, 1). A stationary process has a constant expectation, but it is not necessarily equal to zero. In order to get a stationary process with mean zero, we can include a constant in the cointegration relation such that the spread becomes

εt = yt − αxt − α0.
For example, consider the pair (xt , yt ) which is generated with the relation:
yt = 2xt + 20 + εt.
Figure 4.16: Paths xt and yt for α0 = 20.
Estimating the cointegrating relation including a constant gives α̂α0 = 2.01 with constant estimate 19.67; estimating it without a constant gives α̂¬α0 = 2.41.
[Figure 4.17: the spread yt − α̂α0 xt − α̂0 (left) and the spread yt − α̂¬α0 xt (right).]
Figure 4.17 shows the corresponding spread processes: the left is yt − α̂α0 xt − α̂0, the right is yt − α̂¬α0 xt. The left figure of 4.17 looks a lot better, but there is a disadvantage. In section 2.3 about the properties of pairs trading it was described that pairs trading is more or less cash neutral. The trading strategy is cash neutral up to Γ if we neglect costs for short selling; in other words, each trade costs or provides us with Γ. The cash neutral property is a property we would like to keep. If we trade the spread from the left figure of 4.17 it is not cash neutral anymore. Assume that the predetermined threshold Γ is equal to 1. The first time the spread is above 1, the value of x is € 43.73 and the value of y is € 108.59, so the spread at this time is equal to 108.59 − 2.01·43.73 − 19.67 = 1.02. Then we sell y, which provides us with € 108.59, and buy 2.01 x, which costs us 2.01·43.73 = 87.90. So we are left with a positive difference in money of € 20.69. The first time the spread is below −1, the value of x is € 52.66 and y is € 124.51, so the spread at this time is equal to 124.51 − 2.01·52.66 − 19.67 = −1.01. Then we buy y, which costs us € 124.51, and sell 2.01 x, providing us 2.01·52.66 = 105.85. So this trade costs us € 18.66. This way of trading is not cash neutral; each trade costs or provides us approximately α0.
A possibility to resolve this is to neglect α0, so we trade the spread from the right figure of 4.17. In this example that is probably not worthwhile, because that spread has a clear downward trend. Let us therefore consider the two different spreads for a pair generated with α0 = 1, i.e. yt = 2xt + 1 + εt: the spread including the constant, yt − 2xt − 1, and the spread where the constant is neglected, both shown in figure 4.18.
[Figure 4.18: the two spread processes for α0 = 1.]
Now the spread where α0 is neglected looks almost as good as the spread with α0. In conclusion, in order to keep the cash neutral property and the trading strategy from chapter 2, α0 should be close to zero, such that neglecting it still gives a stationary spread process. So when testing real stock price processes xt and yt for cointegration, we only estimate α and test the residual process yt − α̂xt for stationarity. A suggestion for an alternative trading strategy, which is able to trade a pair when α0 cannot be neglected, is given in chapter 9.
By contrast, if (−α, 1) is not a cointegrating vector between x and y, then yt − αxt is I(1) and, from proposition 2 in section 5.6,

(1/T²) Σ_{t=1}^{T} (yt − αxt)² →D λ² ∫_0^1 W(r)² dr,

so the sum of squared residuals diverges at rate T². For the true cointegrating vector the residuals are stationary and the sum of squares grows only at rate T, which is why minimizing the sum of squares (OLS) picks out the cointegrating vector.
Now that we have a method for estimating the cointegrating vector, the second step in the Engle-Granger method is examining the residuals with a Dickey-Fuller test. In chapter 3 it was described that there are several cases, so the question remains which case we use when testing for cointegration. In most literature about cointegration it is not stated which case is used and why, but from the critical values used it can be seen that case 2 is used most often. One discussion, found in Hamilton [6], is the following:
Which case is the ’correct’ case to use to test the null hypothesis
of a unit root? The answer depends on why we are interested in
testing for a unit root. If the analyst has a specific null hypothesis
about the process that generated the data, then obviously this
would guide the choice of test. In the absence of such guidance,
one general principle would be to fit a specification that is a plausible
description of the data under both the null and the alternative.
This principle would suggest using case 4 for a series with an obvious
trend and case 2 for series without a significant trend. Consider,
for example, the nominal interest rate series used in the examples in
this section. There is no economic theory to suggest that nominal
interest rates should exhibit a deterministic time trend, and so a
natural null hypothesis is that the true process is a random walk
without trend. In terms of framing a plausible alternative, it is
difficult to maintain that these data could have been generated by
it = ρit−1 + ut with |ρ| significantly less than 1. If these data were
to be described by a stationary process, surely the process would
have a positive mean. This argues for including a constant term
in the estimated regression, even though under the null hypothesis
the true process does not contain a constant term. Thus, case 2
is a sensible approach for these data.
We do not have a specific null hypothesis, so according to this quote we
should use case 2 because there is no trend in spread processes. In the next
chapter we investigate the power of the three different tests, case 1 through
case 3; maybe we can find another reason to use Dickey-Fuller case 2.
If the number of stocks in a pair is greater than two, n > 2, the Engle-Granger method has a disadvantage. We estimate the cointegrating vector with OLS regression: if yt = (y1t, y2t, . . . , ynt) we regress y1t on (y2t, y3t, . . . , ynt), so the first element of the cointegrating vector is set to be unity. This normalization is not harmless if the first variable y1t does not appear in the cointegrating relation at all, in other words, if its coefficient is actually equal to zero but is set to one.
Even with two stocks there is a related issue: we can regress yt on xt, which gives an estimate α̂, or regress xt on yt, which gives an estimate β̂. The OLS estimate β̂ is not simply the inverse of α̂, meaning that these two regressions will give different estimates of the cointegrating vector. Thus, choosing which variable to call y1 and which to call y2 might end up making a difference for the evidence one finds for cointegration.
For these reasons we discuss the Johansen method in the next section. First a summary is given for testing for cointegration with the Engle-Granger method:

- Regress y1t on (y2t, . . . , ynt); a constant may be included, but with our trading strategy we do not want to include this constant. This regression gives the estimate â.

- The residuals of this regression, which form our spread process, are given by

  et = y1t − â2 y2t − · · · − ân ynt,

  which resembles the real error process εt; under the null hypothesis of no cointegration, εt = ρ εt−1 + ut with ρ = 1.

- Fit an AR(p̂) to the residuals, i.e. estimate the augmented Dickey-Fuller regression of et on et−1 and lagged differences, and compute the t statistic for ρ = 1.

- Compare the outcome with the critical values of the Dickey-Fuller test.
The critical values of the Dickey-Fuller test will be derived and simulated in
the next chapter. Engle-Granger is a two-step method: first we do an OLS
regression and then a Dickey-Fuller test. In chapter 6 we will examine if the
first step influences the critical values, in other words, are the critical values
for Engle-Granger really the same as for Dickey-Fuller.
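For reference, a minimal sketch of the two steps summarized above: an OLS regression without a constant, followed by a Dickey-Fuller style regression with a constant on the residuals. The helper name engle_granger_stat and the simulated pair are only illustrative, and (as just noted) whether plain Dickey-Fuller critical values apply to this statistic is exactly the question examined in chapter 6.

import numpy as np

def engle_granger_stat(y1, y2):
    """Two-step Engle-Granger: regress y1 on y2 without a constant,
    then compute the Dickey-Fuller t statistic of the residuals."""
    a_hat = np.sum(y1 * y2) / np.sum(y2 * y2)         # OLS without constant
    e = y1 - a_hat * y2                                # spread / residual process
    # Dickey-Fuller case 2 regression on the residuals: e_t = c + rho * e_{t-1} + u_t
    e_lag, e_now = e[:-1], e[1:]
    X = np.column_stack([np.ones_like(e_lag), e_lag])
    coef, *_ = np.linalg.lstsq(X, e_now, rcond=None)
    resid = e_now - X @ coef
    s2 = resid @ resid / (len(e_now) - 2)
    se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return a_hat, (coef[1] - 1.0) / se_rho

rng = np.random.default_rng(4)
x = 50 + np.cumsum(rng.normal(0, 0.5, 520))
y = 2.0 * x + rng.normal(0, 1.0, 520)                  # cointegrated pair with alpha = 2
alpha_hat, df_stat = engle_granger_stat(y, x)
print("alpha_hat = %.3f, DF statistic on residuals = %.2f" % (alpha_hat, df_stat))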
4.4 Johansen method

The Johansen method starts from a vector autoregression of order p for yt (model (4.5), with coefficient matrices Φ1, . . . , Φp, a constant c and innovations εt). Model (4.5) can be written as

yt = c + ρ yt−1 + β1 ∆yt−1 + β2 ∆yt−2 + · · · + βp−1 ∆yt−p+1 + εt,

where

ρ = Φ1 + Φ2 + · · · + Φp,
βi = −(Φi+1 + Φi+2 + · · · + Φp), for i = 1, 2, . . . , p − 1,

with

E(εt) = 0,
E(εt ετ') = Ω for t = τ, and 0 otherwise.

Equivalently, in error-correction form,

∆yt = c + β0 yt−1 + β1 ∆yt−1 + · · · + βp−1 ∆yt−p+1 + εt,   (4.7)

with β0 = ρ − In. If yt is cointegrated with h cointegrating relations, β0 has rank h and can be written as

β0 = −BA',   (4.8)

where the columns of the (n × h) matrix A are the cointegrating vectors.
The goal is to choose (Ω, c, β0, β1, . . . , βp−1) so as to maximize the log likelihood (4.9) subject to the constraint that β0 can be written in the form (4.8). The Johansen method calculates the maximum likelihood estimates of (Ω, c, β0, β1, . . . , βp−1) in three steps. The first step is to estimate two auxiliary regressions: regression (4.10) of ∆yt on (∆yt−1, . . . , ∆yt−p+1), with residuals ût, and regression (4.11) of yt−1 on the same regressors, with residuals v̂t. The second step is to calculate the sample covariance matrices of the OLS residuals ût and v̂t,
Σ̂vv = (1/T) Σ_{t=1}^{T} v̂t v̂t',
Σ̂uu = (1/T) Σ_{t=1}^{T} ût ût',
Σ̂uv = (1/T) Σ_{t=1}^{T} ût v̂t',
Σ̂vu = Σ̂uv',

and to find the eigenvalues of the matrix

Σ̂vv⁻¹ Σ̂vu Σ̂uu⁻¹ Σ̂uv,   (4.12)
with the eigenvalues ordered λ̂1 > λ̂2 > · · · > λ̂n. The maximum value attained by the log likelihood function subject to the constraint that there are h cointegrating relations is given by

L0* = −(Tn/2) log(2π) − Tn/2 − (T/2) log|Σ̂uu| − (T/2) Σ_{i=1}^{h} log(1 − λ̂i).   (4.13)
The third step is to calculate the maximum likelihood estimates of the parameters. Let â1, . . . , âh denote the (n × 1) eigenvectors of (4.12) associated with the h largest eigenvalues. These provide a basis for the space of cointegrating relations; that is, the maximum likelihood estimate is that any cointegrating vector can be written as a linear combination of â1, . . . , âh. With Â = [â1 · · · âh], the maximum likelihood estimates of β0 and c satisfy

β̂0 = Σ̂uv Â Â',    ĉ = π̂0 − β̂0.
Now we are ready for hypothesis testing. Under the null hypothesis that there
are exactly h cointegrating relations, the largest value that can be achieved
for the log likelihood function was given by (4.13). Consider the alternative
hypothesis that there are n cointegrating relations. This means that every
linear combination of yt is stationary, in which case yt−1 would appear in
(4.7) without constraints and no restrictions are imposed on β 0 . The value
for the log likelihood function in the absence of constraints is given by

L1* = −(Tn/2) log(2π) − Tn/2 − (T/2) log|Σ̂uu| − (T/2) Σ_{i=1}^{n} log(1 − λ̂i).   (4.14)

A likelihood ratio test of

H0 : h cointegrating relations against H1 : n cointegrating relations

can be based on

2(L1* − L0*) = −T Σ_{i=h+1}^{n} log(1 − λ̂i).   (4.15)
Another approach would be to test the null hypothesis of h cointegrating relations against h + 1 cointegrating relations. A likelihood ratio test of

H0 : h relations against H1 : h + 1 relations

can be based on

2(L1* − L0*) = −T log(1 − λ̂h+1),   (4.16)

where L1* now denotes the maximum of the log likelihood under h + 1 cointegrating relations.
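A sketch of these computations is given below: auxiliary regressions of ∆yt and yt−1 on the lagged differences without a constant term, the eigenvalues of (4.12) and the trace statistics (4.15). The choice p = 2 and the simulated pair are only for illustration.

import numpy as np

def johansen_no_constant(Y, p=2):
    """Eigenvalues of (4.12) and trace statistics for data Y of shape (T, n), no constant term."""
    dY = np.diff(Y, axis=0)                                     # Delta y_t
    T_eff = dY.shape[0] - (p - 1)
    X = np.column_stack([dY[p - 1 - j: dY.shape[0] - j] for j in range(1, p)])   # lagged differences
    dy_t = dY[p - 1:]                                           # dependent variable of (4.10)
    y_lag = Y[p - 1: Y.shape[0] - 1]                            # dependent variable of (4.11)
    beta_u, *_ = np.linalg.lstsq(X, dy_t, rcond=None)
    beta_v, *_ = np.linalg.lstsq(X, y_lag, rcond=None)
    u = dy_t - X @ beta_u                                       # residuals u_hat
    v = y_lag - X @ beta_v                                      # residuals v_hat
    S_uu = u.T @ u / T_eff
    S_vv = v.T @ v / T_eff
    S_uv = u.T @ v / T_eff
    M = np.linalg.solve(S_vv, S_uv.T) @ np.linalg.solve(S_uu, S_uv)   # matrix (4.12)
    lam = np.sort(np.linalg.eigvals(M).real)[::-1]
    trace = [-T_eff * np.sum(np.log(1 - lam[h:])) for h in range(len(lam))]
    return lam, trace

rng = np.random.default_rng(5)
x = 50 + np.cumsum(rng.normal(0, 0.5, 520))
y = 2.0 * x + rng.normal(0, 1.0, 520)
lam, trace = johansen_no_constant(np.column_stack([x, y]))
print("eigenvalues:", lam, " trace statistics for h = 0 and h = 1:", trace)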
Like with the Dickey-Fuller test, we need to distinguish several cases. There are also three cases for the Johansen method, but they are different from the Dickey-Fuller cases:

Case 1: The true value of the constant c in (4.7) is zero, meaning that there is no intercept in any of the cointegrating relations and no deterministic time trend in any of the elements of yt. No constant term is included in the regressions (4.10) and (4.11).

Case 2: The true value of the constant c in (4.7) is such that there are no deterministic time trends in any of the elements of yt. There are no restrictions on the constant term in the estimation of the regressions (4.10) and (4.11).

Case 3: The true value of the constant c in (4.7) is such that one or more elements of yt exhibit a deterministic time trend. There are no restrictions on the constant term in the estimation of the regressions (4.10) and (4.11).
For both tests, which can be based on (4.15) and (4.16), the critical values for the three different cases can be found in [10] and [11]. Unfortunately, the critical values are for a sample size of T = 400. Although the data of the ten pairs IMC provided consist of 520 observations, these critical values will be used when testing the ten pairs for cointegration. I assume that the critical values are not that different for a sample size of 520. For case 1 this is very likely, because Johansen showed that the asymptotic distribution of test statistic (4.15) is the same as that of the trace of the matrix

Q = [ ∫_0^1 W(r) dW(r)' ]' [ ∫_0^1 W(r)W(r)' dr ]⁻¹ [ ∫_0^1 W(r) dW(r)' ],

where W(r) denotes g-dimensional standard Brownian motion with g = n − h.
And fortunately, case 1 is the case we will use, because we do not want an intercept in the cointegrating relations, as was explained in the previous section, and we assume there is no deterministic time trend in the price processes. The Johansen case 1 test can be compared with the Dickey-Fuller case 1 test: there is no constant and we do not estimate one. There is not really a Johansen case which can be compared to Dickey-Fuller case 2; with the Johansen case 2 test, the constant c is not necessarily equal to zero.
The critical values for case 1 and T = 400 for both test statistics (4.15) and
(4.16) are shown in tables 4.1 and 4.2 respectively.
Note that if g = 1, then n = h + 1. In this case the two tests are iden-
tical. For this reason the first rows of the tables are the same.
For the first test, we use the second row of table 4.1. We basically test the
null of no cointegration between the two stocks against the stocks themselves
being stationary. Although the alternative hypothesis does not imply ’real’
cointegration, because every linear combination of yt is stationary since yt is
already stationary, rejection of the null is taken as evidence of cointegration.
For the second test, we use the second row of table 4.2. We test the null of no cointegration between the two stocks against the alternative of a single cointegration relation.
For the third test, we use the first row of either table. We test the null
of one cointegrating relation against the stock prices being stationary al-
ready. Basically we test if the relation is a ’real’ cointegrating relation.
If the third null hypothesis is rejected, the test indicates there are two coin-
tegrating relations which means the stock prices themselves are stationary.
As we saw in section 4.2 we do not think that stock prices are stationary,
but if they are we can trade them as a pair like any other pair. We could
even trade each stock as a spread process. That means, we apply the trading
strategy on the price process instead of the spread process. But this would
not be cash and market neutral anymore, and is seen as far more risky. So
with two stocks in a pair, we would like there to be one or two cointegrating
relations, but we expect there is only one. In chapter 7 the results of the
different tests for the 10 pairs are given. They are compared with the results
from the Engle-Granger method.
In the previous section it was stated that the Johansen method has an advantage compared to the Engle-Granger method when there are more than two stocks in a pair, n > 2. With Johansen we do not impose the first element of the cointegrating relation to be unity; we normalize the estimated cointegrating relation such that the first element is unity or, as Johansen proposed, such that âi'Σ̂vv âi = 1. With three stocks in a pair, we would like there to be one, two or three cointegrating relations, but we expect that there are no more than two. With our pairs trading strategy it does not matter how many relations there are as long as the stocks are cointegrated, because we only trade one relation. This relation will be the eigenvector corresponding to the largest eigenvalue of the matrix in (4.12) because, according to Hamilton [6], this results in the most stationary spread process.
4.5 Alternative method
In this section a start is made with an alternative method. Assume, like the
Engle-Granger and Johansen method, price processes xt and yt are integrated
of order one: xt, yt ∼ I(1). Denote by zt the vector of the differences of these price processes,

zt = (xt − xt−1, yt − yt−1)'.

Then each component of zt is I(0), i.e. stationary. Notice that

(xt − x0, yt − y0)' = Σ_{i=1}^{t} zi.
Engle and Granger showed that a cointegrated system can never be repre-
sented by a finite-order vector autoregression in the differenced data ∆yt =
zt . The outline of the deduction is that if zt is causal, i.e. zt can be written
as a linear combination of past innovations, and (xt , yt ) are cointegrated then
zt is non-invertible. This implies that if (xt , yt ) are cointegrated zt cannot
be represented by a VAR(p).
Assume for the moment that zt is an MA(2) process, zt = Θ0 wt + Θ1 wt−1 + Θ2 wt−2, where wt is a vector white noise sequence and Θ0, Θ1, Θ2 are (2 × 2) coefficient matrices. Then

Σ_{i=1}^{t} zi = Θ2 w−1 + (Θ2 + Θ1) w0 + (Θ2 + Θ1 + Θ0) Σ_{i=1}^{t−2} wi + (Θ1 + Θ0) wt−1 + Θ0 wt.
If v is a cointegrating vector, i.e. a vector such that v' Σ_{i=1}^{t} zi is stationary, then every multiple of v is also a cointegrating vector. We can make some kind of normalization so that we can write v = [−α 1]. For t > 2,

(yt − αxt) − (y0 − αx0) = [−α 1] Σ_{i=1}^{t} zi.   (4.17)

The mean of (4.17) is constant for every Θ1, Θ2 and α. The variance, however, is not: the number of terms at the beginning (Θ2 w−1 + (Θ2 + Θ1)w0) and at the end ((Θ1 + Θ0)wt−1 + Θ0 wt) of the sum is the same for every t, so only the variance of the middle part, (Θ2 + Θ1 + Θ0) Σ_{i=1}^{t−2} wi, depends on t. For this variance to stay bounded, Θ0, Θ1, Θ2 and α have to satisfy

[−α 1] (Θ2 + Θ1 + Θ0) = 0.

The same argument goes for q > 2. So if the difference process zt is assumed to be an MA(q), then for (xt, yt) to be cointegrated the parameters have to satisfy

[−α 1] (Θ0 + Θ1 + · · · + Θq) = 0.
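To see the condition at work, here is a small simulation sketch. The matrices Θ0, Θ1, Θ2 below are one arbitrary choice satisfying [−α 1](Θ2 + Θ1 + Θ0) = 0 for α = 2; the spread then has bounded variance while each price alone wanders off.

import numpy as np

alpha = 2.0
v = np.array([-alpha, 1.0])                      # normalized cointegrating vector [-alpha 1]
Theta0 = np.eye(2)
Theta1 = np.array([[0.3, 0.1], [0.2, 0.4]])
S = np.array([[1.0, 0.5], [2.0, 1.0]])           # chosen so that v @ S = 0
Theta2 = S - Theta0 - Theta1                     # hence v @ (Theta0 + Theta1 + Theta2) = 0

rng = np.random.default_rng(6)
T = 5000
w = rng.standard_normal((T + 2, 2))              # w_{-1}, w_0, w_1, ..., w_T
z = np.array([Theta0 @ w[i + 2] + Theta1 @ w[i + 1] + Theta2 @ w[i] for i in range(T)])  # MA(2) differences
level = np.cumsum(z, axis=0)                     # (x_t - x_0, y_t - y_0)

spread = level @ v                               # (y_t - alpha x_t) - (y_0 - alpha x_0)
print("v @ (Theta0+Theta1+Theta2):", v @ (Theta0 + Theta1 + Theta2))
print("std of spread, first vs last quarter:", spread[:T // 4].std(), spread[-T // 4:].std())
print("std of x_t - x_0, last quarter      :", level[-T // 4:, 0].std())   # grows: each price alone is I(1)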
It should be possible to construct a new method for testing for cointegration. With real data we can obviously compute the difference process zt. It is, however, pretty difficult to estimate the parameters of the MA(q) with only 500 observations, especially when q becomes large. But if we could, then we could base a hypothesis test on the eigenvalue closest to zero of the estimated matrix Θ̂0 + Θ̂1 + · · · + Θ̂q. We do not proceed with this in this report.
Chapter 5
Dickey-Fuller tests
5.1 Notions/ facts from probability theory
First we need some definitions and theorems. For the following three definitions we assume that {XT} is a sequence of random variables and X is a random variable, all defined on the same probability space (Ω, F, P).

Almost sure convergence:
The sequence of random variables {XT}, T = 1, 2, . . ., converges almost surely towards the random variable X if

P( lim_{T→∞} XT = X ) = 1.

Notation: XT → X a.s.

Convergence in probability:
The sequence of random variables {XT}, T = 1, 2, . . ., converges in probability towards the random variable X if for every ε > 0

P( |XT − X| > ε ) → 0 as T → ∞.

Notation: XT →P X.
Convergence in distribution:
The sequence of random variables {XT}, T = 1, 2, . . ., converges in distribution towards the random variable X if for all bounded continuous functions g it holds that E g(XT) → E g(X).

Notation: XT →D X.
Law of large numbers:
Let X1, X2, . . . be a sequence of i.i.d. random variables such that E|X1| < ∞, then

(1/T) Σ_{t=1}^{T} Xt → E X1  a.s. for T → ∞.

Standard Brownian motion:
Standard Brownian motion W(·) is a continuous-time stochastic process {W(t), t ∈ [0, 1]} such that
(i) W(0) = 0,
(ii) for any time points 0 ≤ t1 ≤ t2 ≤ . . . ≤ tk ≤ 1, the increments [W(t2) − W(t1)], [W(t3) − W(t2)], . . . , [W(tk) − W(tk−1)] are independent Gaussian with [W(s) − W(t)] ∼ N(0, s − t),
(iii) W(t) is continuous in t with probability 1.
Now we would like to derive something that is known as the functional central limit theorem. Let ut be i.i.d. variables with mean zero and finite variance σ². Given a sample size T, we can construct a variable XT(r) from the sample mean of the first rth fraction of observations, r ∈ [0, 1], defined by

XT(r) = (1/T) Σ_{t=1}^{⌊Tr⌋} ut,

where ⌊Tr⌋ denotes the largest integer that is less than or equal to T times r. For any given realization, XT(r) is a step function in r, with

XT(r) = 0                       for 0 ≤ r < 1/T,
XT(r) = u1/T                    for 1/T ≤ r < 2/T,
XT(r) = (u1 + u2)/T             for 2/T ≤ r < 3/T,
  ...
XT(r) = (u1 + · · · + uT)/T     for r = 1.

Then

√T XT(r) = (1/√T) Σ_{t=1}^{⌊Tr⌋} ut = (√⌊Tr⌋/√T) · (1/√⌊Tr⌋) Σ_{t=1}^{⌊Tr⌋} ut.

By the central limit theorem

(1/√⌊Tr⌋) Σ_{t=1}^{⌊Tr⌋} ut →D N(0, σ²),

while √⌊Tr⌋/√T → √r. Hence the asymptotic distribution of √T XT(r) is that of √r times a N(0, σ²) random variable, or

√T [XT(r)/σ] →D N(0, r).

Consider the behavior of a sample mean based on observations ⌊Tr1⌋ through ⌊Tr2⌋ for r2 > r1; then we can conclude that this too is asymptotically normal:

√T [XT(r2) − XT(r1)]/σ →D N(0, r2 − r1).

More generally, the sequence of stochastic functions {√T XT(·)/σ}, T = 1, 2, . . ., has an asymptotic probability law that is described by standard Brownian motion W(·):

√T XT(·)/σ →D W(·).   (5.1)
There is a difference between the expressions XT(·) and XT(r): the first denotes a random function, while the second denotes the value that function assumes at time r, which is a random variable. Result (5.1) is known as the functional central limit theorem. The derivation here assumed that ut is i.i.d.
Proposition 1:
Suppose that zt follows a random walk without drift,

zt = zt−1 + ut,

where z0 = 0 and ut is i.i.d. with mean zero and finite variance σ². Then

(i)   T^{−1/2} Σ_{t=1}^{T} ut →D σ W(1),
(ii)  T^{−3/2} Σ_{t=1}^{T} zt−1 →D σ ∫_0^1 W(r) dr,
(iii) T^{−2} Σ_{t=1}^{T} zt−1² →D σ² ∫_0^1 W(r)² dr,
(iv)  T^{−1} Σ_{t=1}^{T} zt−1 ut →D σ² (W(1)² − 1)/2.

Proof of proposition 1:
(i) follows from the central limit theorem. W(1) denotes a random variable with a N(0, 1) distribution, so σW(1) denotes a random variable with a N(0, σ²) distribution.
The area under this step function is the sum of T rectangles, each with width 1/T:

∫_0^1 XT(r) dr = z1/T² + · · · + zT−1/T².

Multiplying both sides by √T gives

√T ∫_0^1 XT(r) dr = T^{−3/2} Σ_{t=1}^{T} zt−1.
Statement (ii) follows by the functional central limit theorem and the con-
tinuous mapping theorem.
But ST(1) →D σ²W(1)² and by the law of large numbers T⁻¹ Σ_{t=1}^{T} ut² →P σ², which proves (iv).

5.2 Dickey-Fuller case 1 test

Consider the AR(1) process

zt = ρ zt−1 + ut, for t = 1, . . . , T,   (5.2)

where ut is i.i.d. with mean zero and finite variance σ². The t statistic S, used for testing the null hypothesis that ρ is equal to some particular value ρ0, is given by

S = (ρ̂ − ρ0)/σ̂ρ̂,
where σ̂ρ̂ is the standard error of the OLS estimate of ρ,

σ̂ρ̂ = ( rT² / Σ_{t=1}^{T} zt−1² )^{1/2},

with

rT² = (1/(T − 1)) Σ_{t=1}^{T} (zt − ρ̂ zt−1)².
When (5.2) is stationary, i.e. |ρ| < 1, S has a limiting Gaussian distribution:

S →D N(0, 1).

But Dickey-Fuller tests the null hypothesis that ρ = 1, so we would like to know the limiting distribution of S when ρ = 1. Then we can write S as

S = (ρ̂ − 1)/σ̂ρ̂ = (ρ̂ − 1) / ( rT² / Σ_{t=1}^{T} zt−1² )^{1/2}.   (5.3)
The numerator of (5.3) can be written as

ρ̂ − 1 = Σ_{t=1}^{T} zt−1 ut / Σ_{t=1}^{T} zt−1².   (5.4)

Apart from the initial term z0, which does not affect the asymptotic distributions (unfortunately it could affect the finite sample distributions, as we will see later on), the variable zt is the same as in proposition 1. So it follows from proposition 1 (iii) and (iv), together with rT² →P σ², that as T → ∞

S →D [ σ² (W(1)² − 1)/2 ] / [ σ² ( σ² ∫_0^1 W(r)² dr ) ]^{1/2} = ½ (W(1)² − 1) / ( ∫_0^1 W(r)² dr )^{1/2}.   (5.5)
We can simulate the asymptotic distribution (5.5) by simulating a large number of approximate Brownian paths on a grid of n points:

- Simulate u1, . . . , un i.i.d. N(0, 1/n).
- Set W(0) = 0.
- Build the path W by W(i/n) = W((i−1)/n) + ui for i = 1, 2, . . . , n.

For each path the fraction in the right-hand side of (5.5) can be calculated, approximating the integrals with Riemann sums. Then the density of S can be estimated by applying a Gaussian kernel estimator to all these values. Figure 5.1 shows the estimated density for 5,000 paths and n = 500.
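The path simulation just described fits in a few lines. The sketch below approximates the 1%, 5% and 10% critical values of (5.5) directly from the simulated values; the density in figure 5.1 would additionally require a kernel estimator.

import numpy as np

rng = np.random.default_rng(7)
n, n_paths = 500, 5000

stats = np.empty(n_paths)
for k in range(n_paths):
    u = rng.normal(0.0, np.sqrt(1.0 / n), n)     # increments u_i ~ N(0, 1/n)
    W = np.concatenate(([0.0], np.cumsum(u)))    # W(0) = 0, W(i/n) = W((i-1)/n) + u_i
    integral = np.sum(W[:-1] ** 2) / n           # Riemann sum for the integral of W(r)^2
    stats[k] = 0.5 * (W[-1] ** 2 - 1.0) / np.sqrt(integral)

print("1%, 5%, 10% quantiles:", np.quantile(stats, [0.01, 0.05, 0.10]))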
Figure 5.1: Asymptotic density of DF case 1 test statistic.
We can approximate the 1%, 5% and 10% critical values by calculating the
corresponding quantiles of all the calculated fractions. Table 5.1 shows the
critical values according to this simulation and the values according to Hamil-
ton [6].
These critical values belong to the asymptotic distribution (5.5), which de-
scribes the distribution of the DF case 1 test statistic if the sample size T
goes to infinity.
We approximate the critical values for finite sample sizes T by simulating in a different way:

- Set z0 = 0.
- Simulate a path zt = zt−1 + ut, t = 1, . . . , T, with ut i.i.d. N(0, σ²).
- Calculate ρ̂.
- Calculate σ̂ρ̂.
- Calculate the test statistic S = (ρ̂ − 1)/σ̂ρ̂.
We can approximate the 1%, 5% and 10% critical values by calculating the
corresponding quantiles of the simulated test statistics. For finite T , the
critical values are exact only under the assumption of Gaussian innovations.
As T becomes large, these values also describe the asymptotic distribution
for non-Gaussian innovations. Table 5.2 shows the critical values according
to this simulation for different values of T and σ 2 . Table 5.3 shows the critical
values according Hamilton [6]. The critical values should be independent of
σ. Table 5.2 shows roughly the same values for different σ 2 , but as σ 2 becomes
large there is more dispersion. Figure 5.2 shows the estimated density of the
simulated test statistics for different values of σ 2 and T = 500, the graph of
figure 5.1 is also displayed.
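The finite-sample simulation described above, written out for case 1 (Gaussian innovations, regression through the origin):

import numpy as np

def df_case1_stat(z):
    """Dickey-Fuller case 1 statistic for a path z with z_0 prepended."""
    z_lag, z_now = z[:-1], z[1:]
    rho_hat = np.sum(z_lag * z_now) / np.sum(z_lag ** 2)      # OLS through the origin
    r2 = np.sum((z_now - rho_hat * z_lag) ** 2) / (len(z_now) - 1)
    se = np.sqrt(r2 / np.sum(z_lag ** 2))
    return (rho_hat - 1.0) / se

rng = np.random.default_rng(8)
T, sigma, n_paths = 500, 1.0, 5000
stats = []
for _ in range(n_paths):
    z = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, sigma, T))))   # z_0 = 0, z_t = z_{t-1} + u_t
    stats.append(df_case1_stat(z))
print("1%, 5%, 10% critical values:", np.quantile(stats, [0.01, 0.05, 0.10]))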
Table 5.3: Hamilton’s critical values DF case 1.
T 1% 5% 10%
100 -2.60 -1.95 -1.61
250 -2.58 -1.95 -1.62
500 -2.58 -1.95 -1.62
Figure 5.2: Estimated density of DF case 1 for different σ 2 and T = 500.
The initial term z0 does not affect the asymptotic distribution. Unfortunately it does affect the distribution when the sample size is finite. With Dickey-Fuller case 1 we basically fit a line that goes through the origin. If the initial term is large, the slope of this line, ρ̂, is closer to one than when the initial term is small. The standard error of ρ̂, σ̂ρ̂, is also a lot smaller for a large initial value than for a small initial value. That is why the test statistic for a large initial value is likely to be larger than the test statistic for a small initial value. The estimated densities for initial values z0 = 0, 1, 10, 50, 100, 500 are shown in figure 5.3. The solid lines correspond to z0 = 0, 1, 10; the dashed lines correspond to z0 = 50, 100, 500. We see a shift to the right as the initial value increases. The density found by simulating Brownian motion is not displayed; it lies among the three solid lines.
Figure 5.3: Estimated density of DF case 1 for different z0 and T = 500.
5.3 Dickey-Fuller case 2 test

In this section we consider the AR(1) process with a constant,

zt = c + ρ zt−1 + ut, for t = 1, . . . , T,

where ut is i.i.d. with mean zero and finite variance σ². We are interested in the properties of the test statistic S = (ρ̂ − 1)/σ̂ρ̂ under the null hypothesis that c = 0 and ρ = 1. The OLS estimates are given by

(ĉ, ρ̂)' = [ T, Σ zt−1 ; Σ zt−1, Σ zt−1² ]⁻¹ (Σ zt, Σ zt−1 zt)'.

Under the null hypothesis the deviations of the estimates from the true values c = 0 and ρ = 1 can be written as

v = A⁻¹ w,

where v = (ĉ, ρ̂ − 1)', A = [ T, Σ zt−1 ; Σ zt−1, Σ zt−1² ] and w = (Σ ut, Σ zt−1 ut)'.
With the scaling matrix Y = [ T^{1/2}, 0 ; 0, T ] we then have

Y v = Y A⁻¹ w = Y A⁻¹ Y Y⁻¹ w = (Y⁻¹ A Y⁻¹)⁻¹ Y⁻¹ w,   (5.7)

or written out,

Y v = [ 1, T^{−3/2} Σ zt−1 ; T^{−3/2} Σ zt−1, T^{−2} Σ zt−1² ]⁻¹ [ T^{−1/2} Σ ut ; T^{−1} Σ zt−1 ut ].   (5.8)

From the proposition in paragraph 5.1 it follows that the matrix appearing in the first term on the right side of (5.8) converges:

[ 1, T^{−3/2} Σ zt−1 ; T^{−3/2} Σ zt−1, T^{−2} Σ zt−1² ]
   →D  [ 1, σ ∫ W(r)dr ; σ ∫ W(r)dr, σ² ∫ W(r)² dr ]
    =  [ 1, 0 ; 0, σ ] [ 1, ∫ W(r)dr ; ∫ W(r)dr, ∫ W(r)² dr ] [ 1, 0 ; 0, σ ].   (5.9)
The second element of the vector in (5.11) states that

T(ρ̂ − 1) →D [ ½(W(1)² − 1) − W(1) ∫ W(r)dr ] / [ ∫ W(r)² dr − ( ∫ W(r)dr )² ].   (5.12)

The test statistic is S = (ρ̂ − 1)/σ̂ρ̂, where

σ̂ρ̂² = rT² [0 1] [ T, Σ zt−1 ; Σ zt−1, Σ zt−1² ]⁻¹ [0 1]',   (5.13)

with

rT² = (1/(T − 2)) Σ_{t=1}^{T} (zt − ĉ − ρ̂ zt−1)².

From equation (5.14) and rT² →P σ² it follows that

T² σ̂ρ̂² →D [0 1] [ 1, ∫ W(r)dr ; ∫ W(r)dr, ∫ W(r)² dr ]⁻¹ [0 1]' = 1 / [ ∫ W(r)² dr − ( ∫ W(r)dr )² ].
Finally, the asymptotic distribution of test statistic S is

S = T(ρ̂ − 1) / [ T² σ̂ρ̂² ]^{1/2} →D [ ½(W(1)² − 1) − W(1) ∫ W(r)dr ] / [ ∫ W(r)² dr − ( ∫ W(r)dr )² ]^{1/2}.   (5.15)
We can find this asymptotic distribution and the corresponding critical val-
ues by simulating a lot of paths W in the same way as in the preceding
paragraph. The results are shown in figure 5.4 and table 5.4. Figure 5.4
shows that the distribution of the DF case 2 statistic is shifted more to the
left than the DF case 1 statistic.
Figure 5.4: Asymptotic density of DF case 2 test statistic.
Table 5.4: Critical values for DF case 2.
1% 5% 10%
Hamilton -3.43 -2.86 -2.57
simulation -3.43 -2.85 -2.59
These critical values belong to the asymptotic distribution (5.15), which de-
scribes the distribution of the DF case 2 test statistic if the sample size T
goes to infinity. We find the critical values for finite sample sizes T , by simu-
lating paths zt as in the preceding paragraph. The only difference is the way
we calculate ρ̂. Again we simulate for different values of σ 2 . The results are
shown in table 5.5, table 5.6 shows the critical values for DF case 2 according
to Hamilton [6]. Figure 5.5 shows the estimated density of the simulated test
statistics for different values of σ 2 and T = 500.
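As noted, the only change relative to the case 1 simulation is that ρ̂ is now estimated in a regression that includes a constant; a sketch of the statistic:

import numpy as np

def df_case2_stat(z):
    """Dickey-Fuller case 2 statistic: regression of z_t on a constant and z_{t-1}."""
    z_lag, z_now = z[:-1], z[1:]
    X = np.column_stack([np.ones_like(z_lag), z_lag])
    (c_hat, rho_hat), *_ = np.linalg.lstsq(X, z_now, rcond=None)
    resid = z_now - c_hat - rho_hat * z_lag
    r2 = np.sum(resid ** 2) / (len(z_now) - 2)                 # r_T^2 with T - 2 degrees of freedom
    se_rho = np.sqrt(r2 * np.linalg.inv(X.T @ X)[1, 1])
    return (rho_hat - 1.0) / se_rho

rng = np.random.default_rng(9)
stats = [df_case2_stat(np.concatenate(([0.0], np.cumsum(rng.standard_normal(500)))))
         for _ in range(5000)]
print("1%, 5%, 10% critical values:", np.quantile(stats, [0.01, 0.05, 0.10]))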
Figure 5.5: Estimated density of DF case 2 for different σ 2 and T = 500.
As in the preceding section, the initial term z0 does not affect the asymptotic distribution. Fortunately, with case 2 it does not affect the finite-sample distribution either. With case 2 we estimate a constant even though it is not present in the true model; we basically fit a line which does not have to go through the origin. The slope of the line, ρ̂, is then not pulled closer to one when the initial value is large, as it is with case 1. That is why the test statistic for a large initial value is likely to be about the same as the test statistic for a small initial value. The estimated densities for initial values z0 = 0, 1, 10, 50, 100, 500 are shown in figure 5.6.
Figure 5.6: Estimated density of DF case 2 for different z0 and T = 500.
5.4 Dickey-Fuller case 3 test
In this section we consider again the AR(1) process with a constant,

zt = c + ρ zt−1 + ut, for t = 1, . . . , T,   (5.16)

where ut is i.i.d. with mean zero and finite variance σ². We are interested in the properties of the test statistic S = (ρ̂ − 1)/σ̂ρ̂ under the null hypothesis that ρ = 1 and c ≠ 0.
We examine the four different sum terms in the right side of (5.17) sepa-
rately. First notice that (5.16) can be written as:
zt = z0 + c t + (u1 + u2 + . . . + ut ) = z0 + c t + vt ,
where
v t = u1 + . . . + ut , for t = 1, . . . , T, with v0 = 0 .
Consider the behavior of the sum

Σ_{t=1}^{T} zt−1 = Σ_{t=1}^{T} [ z0 + c(t − 1) + vt−1 ].   (5.18)
The third term in (5.18) converges when divided by T^{3/2}, according to proposition 1 (ii):

T^{−3/2} Σ_{t=1}^{T} vt−1 →D σ ∫_0^1 W(r) dr.

The time trend c(t − 1) asymptotically dominates the other two components:

T^{−2} Σ_{t=1}^{T} zt−1 →P c/2.
Next consider

Σ_{t=1}^{T} zt−1² = Σ_{t=1}^{T} [ z0 + c(t − 1) + vt−1 ]²
   = Σ_{t=1}^{T} z0² + Σ_{t=1}^{T} c²(t − 1)² + Σ_{t=1}^{T} vt−1²
     + 2 z0 Σ_{t=1}^{T} c(t − 1) + 2 z0 Σ_{t=1}^{T} vt−1 + 2 Σ_{t=1}^{T} c(t − 1) vt−1,

where the six terms are of order O(T), O(T³), Op(T²), O(T²), Op(T^{3/2}) and Op(T^{5/2}) respectively. The time trend c²(t − 1)² is the only term that does not vanish asymptotically if we divide by T³:

T^{−3} Σ_{t=1}^{T} zt−1² →P c²/3.
From the central limit theorem it follows that Σ_{t=1}^{T} ut is of order Op(T^{1/2}). And finally

Σ_{t=1}^{T} zt−1 ut = Σ_{t=1}^{T} [ z0 + c(t − 1) + vt−1 ] ut
   = z0 Σ_{t=1}^{T} ut + Σ_{t=1}^{T} c(t − 1) ut + Σ_{t=1}^{T} vt−1 ut,

where the three terms are of order Op(T^{1/2}), Op(T^{3/2}) and Op(T) respectively, from which

T^{−3/2} Σ_{t=1}^{T} zt−1 ut − T^{−3/2} Σ_{t=1}^{T} c(t − 1) ut →P 0.
This results in the deviations of the OLS estimates from their true values satisfying

[ ĉ − c ; ρ̂ − 1 ] = [ Op(T), Op(T²) ; Op(T²), Op(T³) ]⁻¹ [ Op(T^{1/2}) ; Op(T^{3/2}) ].

In this case the scaling matrix is

Y = [ T^{1/2}, 0 ; 0, T^{3/2} ].
Therefore

[ T^{−1/2} Σ ut ; T^{−3/2} Σ zt−1 ut ] →D N( 0, σ² [ 1, c/2 ; c/2, c²/3 ] ) = N(0, σ² A).   (5.21)

The test statistic can be written as

S = T^{3/2}(ρ̂ − 1) / ( T^{3/2} σ̂ρ̂ ),

where

σ̂ρ̂² = rT² [0 1] [ T, Σ zt−1 ; Σ zt−1, Σ zt−1² ]⁻¹ [0 1]',   (5.23)

with

rT² = (1/(T − 2)) Σ_{t=1}^{T} (zt − ĉ − ρ̂ zt−1)².
We have already shown that
Y⁻¹ [ T, Σ zt−1 ; Σ zt−1, Σ zt−1² ] Y⁻¹ = [ 1, T^{−2} Σ zt−1 ; T^{−2} Σ zt−1, T^{−3} Σ zt−1² ]
converges in probability towards A, so that
Y [ T, Σ zt−1 ; Σ zt−1, Σ zt−1² ]⁻¹ Y = ( Y⁻¹ [ T, Σ zt−1 ; Σ zt−1, Σ zt−1² ] Y⁻¹ )⁻¹ →_P A⁻¹ .
Combining this with (5.21) gives Y (ĉ − c, ρ̂ − 1)′ →_D A⁻¹ N(0, σ²A) = N(0, σ²A⁻¹), so in particular T^{3/2}(ρ̂ − 1) →_D N(0, σ² [A⁻¹]_{22}).
Because rT² →_P σ², the denominator converges as
T^{3/2} σ̂ρ̂ →_P σ ([A⁻¹]_{22})^{1/2} = 2√3 σ/|c| ,
which is exactly the standard deviation of the limiting distribution of the numerator T^{3/2}(ρ̂ − 1).
Thus, the test statistic S is asymptotically Gaussian. The regressor zt−1 is asymptotically dominated by the time trend c(t − 1); in large samples it is as if the explanatory variable zt−1 were replaced by the time trend c(t − 1). That is why the asymptotic properties of ĉ and ρ̂ are the same as those for the deterministic time trend regression, and why for finite T the test statistic S approximately has a t distribution.
In conclusion, when the true model is a random walk with a constant term (ρ = 1, c ≠ 0) and we estimate both ρ and c, the t test statistic S has an asymptotic distribution equal to the standard Gaussian distribution:
S →_D N(0, 1) .
This test statistic is referred to as the Dickey-Fuller case 3 test statistic. The
critical values for T → ∞ are given in table 5.7.
For finite T the Dickey-Fuller case 3 test statistic is approximately t distributed, but the degrees of freedom are large, so it is almost standard normal. We can also find the critical values for finite T by simulating paths zt as in the preceding paragraphs. Again we simulate for different values of σ². The results are shown in table 5.8. Figure 5.7 shows the estimated density of the simulated test statistics for different values of σ² with T = 500 and c = 2.5; the standard normal density is also displayed.
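This simulation is straightforward to reproduce. The sketch below is a minimal illustration, not the code used for the tables in this report: it generates random walks with drift, runs the case 3 regression (constant plus lagged level) by OLS and collects the t statistic of ρ̂. The function and parameter names are ours.

```python
import numpy as np

def df_case3_stat(z):
    """t statistic of rho-hat in the regression z_t = c + rho*z_{t-1} + u_t."""
    y, X = z[1:], np.column_stack([np.ones(len(z) - 1), z[:-1]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = resid @ resid / (len(y) - 2)            # r_T^2
    cov = r2 * np.linalg.inv(X.T @ X)            # OLS covariance estimate
    return (beta[1] - 1.0) / np.sqrt(cov[1, 1])  # S = (rho-hat - 1)/se

def simulate_case3(T=500, c=2.5, sigma2=1.0, z0=0.0, n_paths=5000, seed=0):
    rng = np.random.default_rng(seed)
    stats = np.empty(n_paths)
    for i in range(n_paths):
        u = rng.normal(0.0, np.sqrt(sigma2), T)
        z = np.concatenate(([z0], z0 + np.cumsum(c + u)))   # random walk with drift, rho = 1
        stats[i] = df_case3_stat(z)
    return stats

S = simulate_case3()
print(np.quantile(S, [0.01, 0.05, 0.10]))        # simulated critical values
```

The quantiles of the simulated statistics then play the role of the entries in table 5.8, and a kernel estimate of their density gives a curve like those in figure 5.7.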
Table 5.8: Simulated critical values for DF case 3.
σ² = 1 σ² = 5 σ² = 10
T 1% 5% 10% 1% 5% 10% 1% 5% 10%
100 -2.36 -1.76 -1.37 -2.51 -1.82 -1.46 -2.48 -1.84 -1.49
250 -2.37 -1.68 -1.33 -2.41 -1.74 -1.35 -2.40 -1.79 -1.40
500 -2.38 -1.75 -1.33 -2.41 -1.69 -1.35 -2.45 -1.76 -1.38
Figure 5.7: Estimated density of DF case 3 for different σ 2 and T = 500.
To see whether the value of the constant c has an impact on the finite sample distribution of the Dickey-Fuller case 3 statistic, we simulate paths zt for c = 0.1, 0.5, 1, 2.5, 10. The results are shown in figure 5.8; the standard normal density is also displayed. The leftmost graph corresponds to c = 0.1.
Figure 5.8: Estimated density of DF case 3 for different c and T = 500.
It looks like a small value of c causes a shift to the left. In figure 5.9 the estimated distribution for c = 0.01, 0.05, 0.1 is plotted with solid lines. The dashed line is the distribution of case 1 and the dotted line is the distribution of case 2. For c = 0.01 the estimated density is almost the same as the density found for case 2. This makes sense because the steps taken in the case 2 and case 3 tests are exactly the same; the only difference is that the true model has no constant in case 2 and does have one in case 3. So for a decreasing constant the case 3 test statistic converges to the case 2 statistic. The other way around is also valid: for a constant that increases in absolute value, the case 2 test statistic converges to the case 3 statistic, because the tests are the same but now there is a constant in the true model.
Figure 5.9: Estimated density of DF case 3 for small c and T = 500.
To be consistent, we also simulate for several different initial values z0 . Fig-
ure 5.10 shows the results for z0 = 0, 1, 10, 50, 100, 500 while T = 500 and
c = 2.5. The figure suggests that the initial value z0 does not affect the
density of the test statistic for finite sample sizes.
Figure 5.10: Estimated density of DF case 3 for different z0
There also exists a case 4 of the Dickey-Fuller test, which includes a deterministic time trend in the true model. We are not interested in spread processes with deterministic trends, so we do not discuss this case.
5.5 Power of the Dickey-Fuller tests
First we summarize the previous sections.
Case 1: The true model is zt = zt−1 + ut, where ut ∼ i.i.d with mean zero and finite variance σ². We estimate the model zt = ρzt−1 + ut. The critical values for the test statistic S = (ρ̂ − 1)/σ̂ρ̂ are
1% 5% 10%
-2.58 -1.95 -1.62
Case 2: The true model of case 2 is zt = zt−1 + ut, where ut ∼ i.i.d with mean zero and finite variance σ². We estimate the model zt = c + ρzt−1 + ut. The critical values for the test statistic S = (ρ̂ − 1)/σ̂ρ̂ when T = 500 are
1% 5% 10%
-3.44 -2.87 -2.57
Case 3: The true model is zt = c + zt−1 + ut with c ≠ 0, where ut ∼ i.i.d with mean zero and finite variance σ². We estimate the model zt = c + ρzt−1 + ut. The critical values for the test statistic S = (ρ̂ − 1)/σ̂ρ̂ are
1% 5% 10%
-2.33 -1.64 -1.28
We have seen that the initial value z0 does affect the finite sample distribu-
tion of Dickey-Fuller case 1 but does not affect case 2 and case 3. The value
of c does affect the distribution of case 3: as c becomes smaller the distribu-
tion converges to the distribution of case 2. IMC has provided 10 pairs, the
range of ĉ of these 10 pairs is (−0.01, 0.1). The absolute initial value z0 of
the 10 pairs is less than 1.5 for 9 of the 10 pairs. With one pair z0 is 106. So
we are interested in the power of the three tests for small values of c and z0 ,
but we will also look at large values of z0 .
We start with generating paths with c = 0 and z0 = 0. In all following tables T = 500, σ = 1 and the number of generated paths is 1,000. Table 5.9 shows the number of rejections for the different tests and different values of ρ. For ρ = 1 we have simulated paths under the null hypothesis of case 1 and case 2, and the numbers of rejections are in line with what we expected. The case 3 test does not perform very well: with ρ = 1 it rejects the null hypothesis that ρ = 1 in 632 out of 1,000 paths at the 10% level. For ρ just under 1, the case 1 test performs better than the case 2 test.
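A rejection-counting loop of the kind used for these tables can be sketched as follows. This is an illustrative reimplementation under our own naming, in the same spirit as the previous sketch, and not the original simulation code; the 5% critical values are taken from the summary above, and the case 2 and case 3 tests share the same regression and differ only in the critical value used.

```python
import numpy as np

def df_stat(z, include_const):
    """DF t statistic of rho-hat, with (cases 2/3) or without (case 1) a constant."""
    y, x = z[1:], z[:-1]
    X = np.column_stack([np.ones_like(x), x]) if include_const else x[:, None]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return (beta[-1] - 1.0) / np.sqrt(cov[-1, -1])

# 5% critical values for case 1, case 2 and case 3 from the summary above
CRIT_5PCT = {"case1": -1.95, "case2": -2.87, "case3": -1.64}

def count_rejections(rho, c=0.0, z0=0.0, T=500, n_paths=1000, seed=1):
    rng = np.random.default_rng(seed)
    counts = {k: 0 for k in CRIT_5PCT}
    for _ in range(n_paths):
        u = rng.standard_normal(T)
        z = np.empty(T + 1)
        z[0] = z0
        for t in range(1, T + 1):                  # z_t = c + rho*z_{t-1} + u_t
            z[t] = c + rho * z[t - 1] + u[t - 1]
        s1 = df_stat(z, include_const=False)       # case 1
        s2 = df_stat(z, include_const=True)        # cases 2 and 3 share this regression
        counts["case1"] += s1 < CRIT_5PCT["case1"]
        counts["case2"] += s2 < CRIT_5PCT["case2"]
        counts["case3"] += s2 < CRIT_5PCT["case3"]
    return counts

print(count_rejections(rho=0.99))
```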
Table 5.10 shows the number of rejections for generated paths with c = 0 and z0 = 100. For ρ = 1 we simulated under the null hypothesis of case 1 and case 2; the number of rejections for case 1 is small, which was expected because of figure 5.3. Again, the case 3 test does not perform very well when ρ = 1. The case 1 and case 2 tests do perform well: with ρ slightly less than 1 they reject the null of a unit root.
Table 5.10: Number of rejections, c = 0, z0 = 100.
Case 1 Case 2 Case 3
ρ 1% 5% 10% 1% 5% 10% 1% 5% 10%
0.99 1000 1000 1000 1000 1000 1000 1000 1000 1000
0.995 1000 1000 1000 406 683 805 874 974 995
1 8 26 47 10 52 106 157 461 631
1.01 0 0 0 0 0 0 0 0 0
Table 5.11 shows the number of rejections for c = 0.1 and z0 = 0. For ρ = 1 we have simulated under the null hypothesis of case 3, but the case 3 test still rejects too many times. Because the null is fulfilled we expect the number of rejections to be around 10, 50 and 100 for the 1%, 5% and 10% levels respectively. In figure 5.8 we already saw that the case 3 test is dependent on the value of c: when c = 0.1 the distribution of the case 3 test statistic is shifted to the left compared to its asymptotic distribution. We see that with this setting the case 2 test performs more or less the same as with c = 0 and z0 = 0, except when ρ = 1, in which case it rejects less often. The null is not satisfied for the case 2 test, so this is not a bad outcome: the fewer rejections for ρ = 1 the better. The case 1 test performs worse than the case 2 test, and also worse than it did in the setting c = 0 and z0 = 0.
Table 5.12 shows the number of rejections for c = 0.1 and z0 = 100. It is remarkable how well the case 1 test performs: it rejects almost every time when ρ is slightly below 1 and does not reject when ρ ≥ 1, even though the null hypothesis is not satisfied. In section 5.2 it was explained that this test basically fits a line through the origin, and because the scatterplot starts around (100, 100) it estimates ρ very accurately, which makes the standard error relatively small. With this setting we know there is an intercept of 0.1, but this is so small compared to the starting point of 100 that the test does not overestimate ρ too much. So when we generate paths with ρ < 1, ρ̂ − 1 is negative and, divided by the small standard error, the test statistic is a large negative value, so the null is rejected. When generating paths with ρ ≥ 1, ρ̂ is always slightly above 1, so the test statistic is a large positive value and the null is not rejected.
For illustration purposes, table 5.13 shows the number of rejections for c = 1 and z0 = 0. The value of c is now much larger than the values of ĉ for the 10 pairs. We see that the case 1 test has lost all its power, the case 2 test performs well, and the case 3 test is finally performing as it should when ρ = 1 and is very powerful.
Table 5.13: Number of rejections, c = 1, z0 = 0.
Case 1 Case 2 Case 3
ρ 1% 5% 10% 1% 5% 10% 1% 5% 10%
0.9 0 0 0 1000 1000 1000 1000 1000 1000
0.95 0 0 0 1000 1000 1000 1000 1000 1000
0.975 0 0 0 998 1000 1000 1000 1000 1000
0.99 0 0 0 1000 1000 1000 1000 1000 1000
0.995 0 0 0 997 1000 1000 1000 1000 1000
1 0 0 0 1 7 12 13 59 119
1.01 0 0 0 0 0 0 0 0 0
This section clearly indicates that the Dickey-Fuller case 3 test is not the one we should use when testing pairs for cointegration. Unfortunately, it does not give a clear choice between case 1 and case 2. Case 1 performs better for c = 0, z0 = 0 and c = 0, z0 = 100 and c = 0.1, z0 = 100, but case 2 performs better for c = 0.1, z0 = 0, which is the setting seen most often in the 10 pairs. In the remainder of this report we will focus on the case 2 test, because of Hamilton's view given in section 4.3 and because this section does not clearly indicate that we should do otherwise. Another possible reason to use case 2 instead of case 1 could be that the first step of the Engle-Granger method, which is a linear regression to estimate α, influences the power of the two tests. This will be considered in chapter 6.
5.6 Augmented Dickey-Fuller test
In this section we consider the AR(p) model written in augmented form,
zt = c + ρ zt−1 + β1 Δzt−1 + · · · + βp−1 Δzt−p+1 + ut ,
where ut ∼ i.i.d(0, σ²). We are interested in the properties of the test statistic S = (ρ̂ − 1)/σ̂ρ̂ in the three cases introduced in the preceding sections: no constant estimated (case 1), a constant estimated while c = 0 in the true model (case 2), and a constant estimated while c ≠ 0 in the true model (case 3).
We can derive the asymptotic properties in a similar manner to the preceding sections. To keep this section from becoming too tedious, we only derive the properties for case 2. We state the outcomes for case 1 and case 3 at the end of this section; the derivations can be found in Hamilton [6].
Proposition 2:
Let vt = Σ_{j=0}^∞ θj ut−j, where Σ_{j=0}^∞ j · |θj| < ∞ and {ut} is an i.i.d sequence with mean zero, variance σ², and finite fourth moment. Define
γj = E(vt vt−j) = σ² Σ_{s=0}^∞ θs θs+j , for j = 0, 1, . . . , (5.28)
λ = σ Σ_{j=0}^∞ θj , (5.29)
zt = v1 + v2 + · · · + vt , for t = 1, 2, . . . , T , (5.30)
with z0 = 0. Then
(i) T^{−1} Σ_{t=1}^T vt vt−j →_P γj for j = 0, 1, . . .
(ii) T^{−1} Σ_{t=1}^T zt−1 vt−j →_D (1/2)(λ² W(1)² − γ0) for j = 0,
and T^{−1} Σ_{t=1}^T zt−1 vt−j →_D (1/2)(λ² W(1)² − γ0) + γ0 + · · · + γj−1 for j = 1, 2, . . .
(iii) T^{−3/2} Σ_{t=1}^T zt−1 →_D λ ∫_0^1 W(r) dr .
(iv) T^{−2} Σ_{t=1}^T zt−1² →_D λ² ∫_0^1 W(r)² dr .
(v) T^{−1/2} Σ_{t=1}^T vt →_D λ W(1) .
(vi) T^{−1} Σ_{t=1}^T zt−1 ut →_D (1/2) σλ (W(1)² − 1) .
Asymptotic distribution ADF case 2
We assume that the sample is of size T + p, (z−p+1, z−p+2, . . . , zT), and the model is
zt = c + ρ zt−1 + β1 Δzt−1 + · · · + βp−1 Δzt−p+1 + ut .
Under the null hypothesis of exactly one unit root and the assumption that zt follows the above AR(p) model with c = 0 and ρ = 1, we show that zt behaves like the variable zt in proposition 2. Because zt is integrated of order one and vt = Δzt, the differences satisfy
Φ(L) vt = ut , with Φ(x) = 1 − β1 x − · · · − βp−1 x^{p−1} ,
and all roots of Φ(x) = 0 are outside the unit circle because vt is stationary and we assume it is causal, like all other autoregressive models in this report. Then vt has an MA(∞) representation
vt = Σ_{j=0}^∞ θj ut−j ,
whose polynomial is
Θ(x) = 1 + θ1 x + θ2 x² + · · · = 1/Φ(x) .
All p − 1 roots of Φ(x), a finite number, lie outside the unit circle, so there exists an ε > 0 such that the modulus of every root is larger than 1 + ε, and hence Φ(x) ≠ 0 for |x| < 1 + ε. Within the radius of convergence 1 + ε, the analytic function Θ(x) is differentiable:
Θ'(x) = Σ_{j=1}^∞ j θj x^{j−1} .
In particular Σ_{j=0}^∞ j · |θj| < ∞, so we can use proposition 2 without making any further assumptions.
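As a small numerical illustration of this inversion (not something used in the report itself), the coefficients θj can be computed recursively from Θ(x)Φ(x) = 1. The sketch below, with hypothetical names and an assumed coefficient β1 = −0.1, does exactly this and prints a truncated sum of j·|θj|, which stays small as more terms are added.

```python
import numpy as np

def ma_weights(beta, n_terms=200):
    """theta_j from (1 - beta_1*x - ... - beta_{p-1}*x^{p-1}) * Theta(x) = 1."""
    theta = np.zeros(n_terms)
    theta[0] = 1.0
    for j in range(1, n_terms):
        # theta_j = sum_{k=1}^{min(j, p-1)} beta_k * theta_{j-k}
        theta[j] = sum(b * theta[j - k] for k, b in enumerate(beta, start=1) if k <= j)
    return theta

beta = [-0.1]                       # assumed AR coefficient of v_t, here p = 2
theta = ma_weights(beta)
print(theta[:5])                    # 1, -0.1, 0.01, -0.001, ...
print(np.sum(np.arange(len(theta)) * np.abs(theta)))   # truncated sum of j*|theta_j|
```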
The deviation of the OLS estimate β̂ from the true value β is given by
β̂ − β = [ Σ_{t=1}^T xt xt′ ]⁻¹ [ Σ_{t=1}^T xt ut ] . (5.31)
As in the derivation of DF case 2 we need a scaling matrix; in this section we use the following (p + 1) × (p + 1) scaling matrix:
Y = diag( √T, √T, . . . , √T, T ) ,
that is, a diagonal matrix with √T on the first p diagonal entries and T on the last one.
Multiplying (5.31) by the scaling matrix Y and using (5.7) we get
Y(β̂ − β) = { Y⁻¹ [ Σ xt xt′ ] Y⁻¹ }⁻¹ { Y⁻¹ [ Σ xt ut ] } . (5.32)
Consider the matrix Y⁻¹ (Σ xt xt′) Y⁻¹. Elements in the upper left (p × p) block of Σ xt xt′ are divided by T, the first p elements of the (p + 1)th row and of the (p + 1)th column are divided by T^{3/2}, and the element at the lower right corner is divided by T². Moreover,
T^{−1} Σ vt−i vt−j →_P γ_{|i−j|} from proposition 2(i) ,
T^{−1} Σ vt−j →_P E(vt−j) = 0 from the law of large numbers ,
T^{−3/2} Σ zt−1 vt−j →_P 0 from proposition 2(ii) ,
T^{−3/2} Σ zt−1 →_D λ ∫ W(r) dr from proposition 2(iii) ,
T^{−2} Σ zt−1² →_D λ² ∫ W(r)² dr from proposition 2(iv) ,
where
γj = E(Δzt Δzt−j) ,
λ = σ/(1 − β1 − · · · − βp−1) ,
σ² = E(ut²) ,
and the integral sign denotes integration over r from 0 to 1. Thus,
Y⁻¹ [ Σ_{t=1}^T xt xt′ ] Y⁻¹ →_D [ V, 0 ; 0, Q ] ,
with
V = [ γ0, γ1, · · ·, γp−2 ; γ1, γ0, · · ·, γp−3 ; · · · ; γp−2, γp−3, · · ·, γ0 ] ,
Q = [ 1, λ ∫ W(r) dr ; λ ∫ W(r) dr, λ² ∫ W(r)² dr ] . (5.33)
The distribution of the last two elements of the vector Y⁻¹ Σ xt ut in (5.34), whose first p − 1 elements we denote h1 and whose last two elements we denote h2, can be obtained from statements (v) and (vi) of proposition 2:
( T^{−1/2} Σ ut , T^{−1} Σ zt−1 ut )′ →_D h2 ∼ ( σ W(1) , (1/2) σλ (W(1)² − 1) )′ . (5.35)
This gives that the deviation of the OLS estimate from its true value is
Y(β̂ − β) →_D [ V, 0 ; 0, Q ]⁻¹ (h1′, h2′)′ = ( (V⁻¹ h1)′, (Q⁻¹ h2)′ )′ . (5.36)
The last two elements of β are c and ρ, which are the constant term and the coefficient on the I(1) regressor zt−1. From (5.33), (5.35) and (5.36), their limiting distribution is given by
( T^{1/2} ĉ , T (ρ̂ − 1) )′ →_D [ σ, 0 ; 0, σ/λ ] [ 1, ∫ W(r) dr ; ∫ W(r) dr, ∫ W(r)² dr ]⁻¹ ( W(1) , (1/2)(W(1)² − 1) )′ . (5.37)
Let e denote a (p + 1)-vector with unity in the last position and zeros elsewhere, so that σ̂ρ̂² = rT² e′ (Σ xt xt′)⁻¹ e. Multiplying the numerator and the denominator of S by T results in
S = T (ρ̂ − 1) / { rT² e′ Y (Σ xt xt′)⁻¹ Y e }^{1/2} ,
since Y e = T e.
But
e′ Y (Σ xt xt′)⁻¹ Y e = e′ { Y⁻¹ (Σ xt xt′) Y⁻¹ }⁻¹ e →_D e′ [ V⁻¹, 0 ; 0, Q⁻¹ ] e (5.38)
= 1 / { λ² [ ∫ W(r)² dr − ( ∫ W(r) dr )² ] } .
By (5.37) we have
T (ρ̂ − 1) →_D (σ/λ) · [ (1/2)(W(1)² − 1) − W(1) ∫ W(r) dr ] / [ ∫ W(r)² dr − ( ∫ W(r) dr )² ] . (5.39)
Using (5.38) and (5.39) together with rT² →_P σ², we finally get
S →_D [ (1/2)(W(1)² − 1) − W(1) ∫ W(r) dr ] / { ∫ W(r)² dr − [ ∫ W(r) dr ]² }^{1/2} , (5.40)
which is exactly the same as the asymptotic distribution of the Dickey-Fuller case 2 test statistic. So the critical values are the same as in table 5.4 in section 5.3, without making any corrections for the fact that lagged values of Δzt are included in the regression. This is also true for the other cases: the Augmented Dickey-Fuller case 1 test statistic has the same asymptotic distribution as Dickey-Fuller case 1, and ADF case 3 the same as DF case 3.
Like in the preceding sections we can simulate the density of the test statistic for finite sample sizes; we show the results for the case 2 test when p = 2. We simulate for different values of σ² with T = 500 and β1 = −0.1, and naturally ρ = 1, c = 0. We took this value for β1 because it is seen a few times in the 10 pairs IMC provided. The estimated densities of 5,000 simulated test statistics for σ² = 1, 5, 10 are shown in figure 5.11. The asymptotic density we found for the case 2 test, figure 5.4, is plotted with a dashed line. The different graphs coincide nicely. With this setting, the 'original' AR model with lagged terms instead of differenced terms is
zt = 0.9 zt−1 + 0.1 zt−2 + ut .
Its autoregressive polynomial,
1 − 0.9x − 0.1x² = 0 ,
has roots 1 and −10, so the assumption of exactly one unit root is fulfilled.
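Such a path and the corresponding ADF case 2 statistic can be generated along the following lines. This is a sketch under our own naming conventions, not the report's original code; it assumes the same OLS-based t statistic as in the earlier sketches.

```python
import numpy as np

def simulate_adf2_path(T=500, beta1=-0.1, sigma=1.0, rng=None):
    """Generate a path with one unit root in ADF form: rho = 1, c = 0, lag coefficient beta1."""
    rng = rng or np.random.default_rng()
    z = np.zeros(T + 2)
    u = rng.normal(0.0, sigma, T)
    for t in range(2, T + 2):
        # z_t = z_{t-1} + beta1*(z_{t-1} - z_{t-2}) + u_t
        z[t] = z[t - 1] + beta1 * (z[t - 1] - z[t - 2]) + u[t - 2]
    return z

def adf2_stat(z):
    """t statistic of rho-hat in z_t = c + rho*z_{t-1} + beta1*dz_{t-1} + u_t."""
    dz = np.diff(z)
    y = z[2:]
    X = np.column_stack([np.ones(len(y)), z[1:-1], dz[:-1]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    cov = (resid @ resid / (len(y) - X.shape[1])) * np.linalg.inv(X.T @ X)
    return (beta[1] - 1.0) / np.sqrt(cov[1, 1])

stats = [adf2_stat(simulate_adf2_path()) for _ in range(5000)]
print(np.quantile(stats, [0.01, 0.05, 0.10]))
```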
Figure 5.11: Estimated density of ADF case 2 for different σ 2 , T = 500 and
β1 = −0.1.
To see what influence β1 has, we also vary its value while keeping σ² fixed at 1. The results for β1 = −0.9, −0.5, 0, 0.5, 0.9, 1 are shown in figure 5.12. The awkward graph corresponds to β1 = 1; the 'original' AR model is then
zt = 2 zt−1 − zt−2 + ut ,
whose autoregressive polynomial
1 − 2x + x² = 0
has a double root at 1, so there are two unit roots. That is probably why the graph for β1 = 1 does not look like the other ones. For the other values of β1 the assumption of exactly one unit root is fulfilled. The values of β1 for the 10 pairs, when an AR(2) model is fit on the spread process, lie in the range (−0.25, 0.1).
Figure 5.12: Estimated density of ADF case 2 for different β1 , T = 500 and
σ 2 = 1.
We also show the results for two higher order models. First p = 3: figure 5.13 shows the estimated densities for three different settings of β1 and β2. We take β2 equal to −0.1 and β1 equal to −0.2, −0.1 and 0.1 successively; these are also values seen with the 10 pairs. For these values the autoregressive polynomial has exactly one unit root and the other roots are outside the unit circle, so the null hypothesis is satisfied. The graph of figure 5.4 is also displayed, and again they coincide nicely.
Figure 5.13: Estimated density of ADF case 2 for different β1 and β2 , T = 500
and σ 2 = 1.
These three settings also represent most of the 10 pairs, and that the null hypothesis is satisfied was checked with Maple. For the second higher order model, p = 5, figure 5.14 shows the estimated densities for three parameter settings. We see that these densities show more dispersion and do not coincide with the asymptotic density as nicely as for the lower order models above.
Figure 5.14: Estimated density of ADF case 2 for different β1 , T = 500 and
σ 2 = 1.
In the next section we look again at all these models to see if the power of
the Augmented Dickey-Fuller case 2 test is influenced by the value of p.
5.7 Power of the Augmented Dickey-Fuller case 2 test
We start with the AR(2) model
zt = ρ zt−1 + β1 Δzt−1 + ut ,
where for ut we take i.i.d standard Gaussian random variables, sample size T = 500 and z0 = 0. Table 5.14 shows the number of rejections for several values of ρ and β1. We use values of β1 which are seen with the 10 pairs IMC provided. When ρ = 1 the null hypothesis is satisfied; the other root of the autoregressive polynomial lies outside the unit circle, being −4, −10 and 10 respectively for β1 = −0.25, −0.1, 0.1. We see that under the null hypothesis the test behaves as expected. The power is quite similar to that of the Dickey-Fuller case 2 test in table 5.9. The power is better for the positive value of β1.
Table 5.14: Number of rejections, p = 2.
β1 = −0.25 β1 = −0.1 β1 = 0.1
ρ 1% 5% 10% 1% 5% 10% 1% 5% 10%
0.9 986 1000 1000 998 1000 1000 1000 1000 1000
0.95 447 845 949 582 919 976 792 980 997
0.975 68 297 491 104 395 589 179 502 708
0.99 18 75 162 13 89 199 28 117 232
0.995 15 70 138 13 66 137 10 74 144
1 9 49 97 7 45 88 9 42 89
1.01 0 1 4 0 1 2 0 1 2
Table 5.15 shows the number of rejections for the AR(3) model
zt = ρ zt−1 + β1 Δzt−1 + β2 Δzt−2 + ut .
We use the same values of β1 and β2 as in the previous section: β2 = −0.1 and β1 = −0.2, −0.1, 0.1. For these values of β1 and β2, and when ρ = 1, the null hypothesis of exactly one unit root is satisfied. The table does not indicate that the power for p = 3 is much less than the power of the test for p = 2.
Lastly, table 5.16 shows the number of rejections for the AR(5) model
zt = ρ zt−1 + β1 Δzt−1 + · · · + β4 Δzt−4 + ut ,
where we used the three parameter settings from the previous section. This table indicates that the power of the test with p = 5 is less than the power of the test for smaller values of p. Especially the first setting of parameters shows that the power of the test is lower for p = 5.
Table 5.15: Number of rejections, p = 3 and β2 = −0.1.
β1 = −0.2 β1 = −0.1 β1 = 0.1
ρ 1% 5% 10% 1% 5% 10% 1% 5% 10%
0.9 978 998 1000 986 1000 1000 1000 1000 1000
0.95 406 765 900 487 851 944 684 941 984
0.975 63 283 482 86 287 473 129 399 615
0.99 17 83 163 22 89 177 27 113 219
0.995 13 74 134 13 71 136 19 67 128
1 12 51 114 13 62 104 14 58 104
1.01 0 0 2 0 2 3 1 5 6
Chapter 6
Engle-Granger method
6.1 Engle-Granger simulation with random
walks
The Engle-Granger method assumes we have two price processes {xt, yt}, t = 0, . . . , T, each of which is individually integrated of order one, I(1). Then xt and yt are cointegrated if there exists a linear combination of them that is stationary. In this section we generate xt as a random walk,
xt = xt−1 + ut , (6.1)
with ut i.i.d N(0, σx²) variables and x0 an initial value. Then the difference xt − xt−1 is white noise, so xt ∼ I(1). Now that we have xt, we would like to generate yt such that yt − αxt is AR(p) for some α and some p.
In this section we look at a few different settings, but only for p = 1, and find out whether the distribution and power of the Engle-Granger test statistic differ from the earlier derived distribution and power of the Dickey-Fuller case 2 test statistic. First, this is done under the null hypothesis of the DF case 2 test, which means that there is no constant in the spread process but a constant is estimated. Second, we consider the case where a small constant is present in the spread process. Last, for p = 1, we generate yt with a constant α0 in the cointegrating relation but do not regress on a constant, to find out whether the pair is still cointegrated according to the Engle-Granger method.
AR(1) under the null hypothesis of DF case 2
We want the spread process to be an AR(1) process:
yt − αxt = εt = β0 + β εt−1 + ηt ,
where for {ηt} we take i.i.d N(0, ση²) variables. Then we can generate yt as
yt = αxt + β0 + β (yt−1 − αxt−1) + ηt .
Under the null hypothesis of the Dickey-Fuller case 2 test there is a unit root and no constant in the spread process, so β = 1 and β0 = 0. The processes xt and yt are cointegrated if we take β < 1. Figure 6.1 shows a sample path for x and y when β = 1 and figure 6.2 for β = 0.5; in both graphs α = 0.8, β0 = 0, x0 = 25, σx² = ση² = 1 and T = 500.
To see whether the critical values of the Engle-Granger method are more or less the same as for Dickey-Fuller case 2, i.e. to see if estimating α has an effect on the critical values, we simulate a lot of paths xt and yt under the null hypothesis and calculate the test statistic S. The procedure is:
Simulate xt.
Simulate yt with β = 1 and β0 = 0.
Calculate α̂ by regressing yt on xt.
Calculate the spread et = yt − α̂xt.
Estimate the DF case 2 regression on et and calculate the test statistic S.
Then we estimate the density of the simulated test statistics, again with a Gaussian kernel estimator. This we can compare to the density we found for the Dickey-Fuller case 2 test in chapter 5.
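A compact version of this simulation, with a simple Gaussian kernel density estimate at the end, could look as follows. It is an illustrative sketch rather than the code behind figures 6.3 to 6.5; the function names are ours, the first-step regression of yt on xt is taken without an intercept, and the kernel bandwidth uses a standard rule of thumb.

```python
import numpy as np

def eg_case2_stat(x, y):
    """Engle-Granger statistic: regress y on x, then DF case 2 test on the spread."""
    alpha = (x @ y) / (x @ x)                      # OLS without intercept
    e = y - alpha * x                              # spread process
    Y, X = e[1:], np.column_stack([np.ones(len(e) - 1), e[:-1]])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    cov = (resid @ resid / (len(Y) - 2)) * np.linalg.inv(X.T @ X)
    return (beta[1] - 1.0) / np.sqrt(cov[1, 1])

def simulate_eg(alpha=1.0, beta=1.0, beta0=0.0, T=500, x0=25.0,
                sx=1.0, se=1.0, n_paths=5000, seed=2):
    rng = np.random.default_rng(seed)
    stats = np.empty(n_paths)
    for i in range(n_paths):
        x = np.empty(T + 1)
        x[0] = x0
        x[1:] = x0 + np.cumsum(rng.normal(0.0, sx, T))   # random walk
        eps = np.zeros(T + 1)
        for t in range(1, T + 1):                        # AR(1) spread
            eps[t] = beta0 + beta * eps[t - 1] + rng.normal(0.0, se)
        stats[i] = eg_case2_stat(x, alpha * x + eps)
    return stats

def gaussian_kde(samples, grid, h=None):
    """Simple Gaussian kernel density estimate on a grid."""
    h = h or 1.06 * samples.std() * len(samples) ** (-1 / 5)   # rule-of-thumb bandwidth
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

S = simulate_eg()
grid = np.linspace(-6, 2, 200)
density = gaussian_kde(S, grid)
```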
Figure 6.3 shows the estimated densities for different values of α, for T = 500, x0 = 25, σx² = ση² = 1 and β0 = 0. The figure also shows the Dickey-Fuller case 2 density from figure 5.4. Figures 6.4 and 6.5 show estimated densities for different values of σx² and ση² respectively, for the same parameters as above and α = 1. When the null hypothesis is completely satisfied, that is β = 1 and β0 = 0, these densities look a lot like the Dickey-Fuller case 2 density. So it looks like the step preceding the DF test, namely estimating α, does not really affect the critical values. To see whether the power of the test is affected by this preceding step, table 6.1 shows the number of rejections for different values of β and α. It is clear from the table that the power of the Engle-Granger method does not depend on the value of α. This table should be compared with the case 2 columns of table 5.9, because there is no constant and the initial value of the spread process is 0.
Figure 6.3: AR(1) Estimated density for EG test statistic, α = 0.1, 0.5, 1.
Figure 6.4: AR(1) Estimated density of EG test statistic, σx² = 0.1, 0.5, 1, 5.
Figure 6.5: AR(1) Estimated density of EG test statistic, ση² = 0.1, 0.5, 1, 5.
We see that there is practically no difference between these columns and table 6.1, which indicates that the power of the Engle-Granger method is as good as that of the Dickey-Fuller test. The estimation of α does not have a negative influence on the power of the test, which is a nice property.
Table 6.1: Number of rejections, AR(1) and β0 = 0
α=1 α = 0.5 α = 0.1
β 1% 5% 10% 1% 5% 10% 1% 5% 10%
0.9 1000 1000 1000 1000 1000 1000 1000 1000 1000
0.95 722 975 1000 724 964 993 717 961 993
0.975 169 485 686 164 466 683 145 469 691
0.99 29 133 248 38 123 242 36 143 254
0.995 14 67 149 17 75 161 18 87 163
1 14 52 105 8 51 109 10 52 115
1.01 0 0 3 1 1 2 1 4 6
power of the Dickey-Fuller case 2 test with a small constant, as seen in table
5.11. So it looks like the Engle-Granger test statistic has the same properties
as the Dickey-Fuller case 2 test statistic, for small constants.
Figure 6.6: Estimated density for EG test statistic, β0 = 0.1.
β 1% 5% 10%
0.9 1000 1000 1000
0.95 733 969 993
0.975 144 444 650
0.99 33 120 240
0.995 25 86 170
1 6 31 62
1.01 0 0 0
processes with a trend in their spread process do not form a good pair for our trading strategy. But for a small value of α0 there is not a big trend, and the price processes form a good pair, as seen in figure 4.18. It is interesting to see how the Engle-Granger method performs if there is a small α0 but it is neglected. We can generate cointegrated data where the cointegrating relation has a constant, by generating yt with
yt = α0 + αxt + εt ,
where εt is generated as before.
From this equation we can see that with this generating scheme including α0 is essentially redundant: including α0 is the same as including a larger value of β0. We have already seen what happens for larger values of β0 in figure 6.6. But at last we have found a reason to use Dickey-Fuller case 2 instead of case 1! The power of case 1 is practically zero when there is a constant, see table 5.13. Table 6.3 shows this is also true when we perform the preceding step of estimating α. The table shows the number of rejections when we use case 1 in the Engle-Granger method and for the 'normal' Engle-Granger method, which uses the case 2 test. When we use the DF case 1 test in the Engle-Granger method instead of the case 2 test, the power is almost zero. The paths were generated with x0 = 25, T = 500, α = 1, σx = 1, ση = 1 and β0 = 0. For the value of α0 used to make table 6.3 and β < 1, we do see xt and yt as a good pair, so we would like the Engle-Granger method to classify them as cointegrated.
When performing the Engle-Granger test on real data we do not know if there is a small constant in the cointegrating relation, so from now on we only look at the DF case 2 test. Because we use the DF case 2 test within the Engle-Granger method, this method makes the following assumptions: the price processes xt and yt are each integrated of order one, the spread process yt − αxt follows an AR(p) process, and there is no constant in the cointegrating relation or in the spread process.
So far, we have seen that when all assumptions of the Engle-Granger method are fulfilled, the Engle-Granger test statistic has the same distribution and power properties as the DF case 2 test statistic. In other words, the first step of estimating α does not have an influence. We have seen that when there is a constant in the spread process, so that not all assumptions are fulfilled, the distribution makes a limited shift to the right. By limited we mean that the Engle-Granger statistic does not converge to the DF case 3 statistic, like the DF case 2 statistic does when there is an increasing constant. Last, we have seen that when there is a constant in the cointegrating relation it is better to use the DF case 2 test within the Engle-Granger method instead of the DF case 1 test. In the next section we examine what happens when the price
processes xt and yt are not strictly integrated of order 1.
6.2 Engle-Granger simulation with the stock price model
The approach for simulating price processes xt and yt is the same as in the preceding section, only the paths for xt are simulated with the stock price model instead of random walks:
xt = xt−1 + µ δt xt−1 + σ √δt ut xt−1 , (6.4)
where ut are i.i.d N(0, 1). Then xt is not exactly integrated of order 1: there is an upward drift µ, so the expectation of the differences is not constant. We look at small values of µ and at a finite sample size T = 500, so xt is almost integrated of order 1. By simulating a lot of paths for xt and corresponding yt we are going to see whether this affects the Engle-Granger method.
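A path of (6.4) can be simulated directly. The short sketch below uses our own function name and assumed values for µ, σ and the time step δt (here δt = 1/250), and is only meant to illustrate the generating scheme.

```python
import numpy as np

def stock_price_path(T=500, x0=25.0, mu=0.05, sigma=0.2, dt=1.0 / 250, seed=3):
    """Simulate x_t = x_{t-1} + mu*dt*x_{t-1} + sigma*sqrt(dt)*u_t*x_{t-1}, u_t ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1)
    x[0] = x0
    for t in range(1, T + 1):
        u = rng.standard_normal()
        x[t] = x[t - 1] * (1.0 + mu * dt + sigma * np.sqrt(dt) * u)
    return x

x = stock_price_path()
print(x[:5])
```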
We again simulate yt such that the spread process is AR(p), and to fulfill the remaining assumption of the method we do not include a constant β0 in the spread process. For p = 1 the results of the simulations are the same as in figures 6.3 through 6.5 and table 6.1, which is why they are not displayed. It looks like the Engle-Granger method is not sensitive to xt not being exactly integrated of order 1. In this section we therefore consider the situation where the spread process is an AR(2) process,
yt − αxt = εt = ρ εt−1 + β1 Δεt−1 + ηt ,
where for ηt we take i.i.d N(0, ση²) variables. Then we can generate yt as yt = αxt + εt, with εt generated recursively from the AR(2) above.
Figure 6.7: AR(2) Estimated density for EG test statistic, α = 0.1, 0.5, 1.
Table 6.4 shows the number of rejections for three different values of β1. Compared to table 5.14, which shows the corresponding power of the Augmented Dickey-Fuller case 2 test, the power of the Engle-Granger method has not declined. It seems that the Engle-Granger method performs the same for data that is not exactly integrated of order one as for data that is.
6.3 Engle-Granger with bootstrapping from
real data
So far we have simulated paths xt and yt from scratch to find the critical
values of the Engle-Granger method. In this section we build paths xt and
yt by bootstrapping from real data. The data are the ten pairs of stocks that
IMC provided. First we describe the bootstrap procedure and then we look
at some results of the ten pairs.
Bootstrap procedure
Assume we have a pair that consists of two stock price processes xt and yt, for t = 0, . . . , T, which are integrated of order one. Let us assume further that there exists an α such that yt − αxt follows an AR(p) process,
yt − αxt = εt = β0 + β εt−1 + β1 Δεt−1 + · · · + βp−1 Δεt−p+1 + ηt ,
and we test
H0 : β = 1 against H1 : β < 1 .
The first step in the bootstrap procedure is to estimate α with OLS, which results in α̂. Then we can calculate the spread process
et = yt − α̂xt , t = 0, . . . , T ;
this resembles the true spread process εt, which is assumed to follow an AR(p) process.
In the preceding sections we knew the value of p, but now we do not, since we are working with real data. The second step is therefore to estimate p with the information criteria described in chapter 3, which results in p̂.
The third step is to estimate the coefficients of the AR(p̂) model with linear regression, which results in β̂, β̂0, β̂1, . . . , β̂p̂−1. Then we can calculate the residuals
nt = et − β̂0 − β̂ et−1 − β̂1 Δet−1 − · · · − β̂p̂−1 Δet−p̂+1 , t = p̂, . . . , T ;
this resembles the true residuals ηt which are assumed to be white noise.
The fourth step is to calculate the test statistic for the real data,
S = (β̂ − 1) / σ̂β̂ .
Now we are ready to build a new path yt∗ that belongs to the original xt. This is done in the following way:
ε∗t = β̂0 + β̂ ε∗t−1 + β̂1 Δε∗t−1 + · · · + β̂p̂−1 Δε∗t−p̂+1 + η∗t , yt∗ = α̂xt + ε∗t ,
where η∗t is drawn uniformly, with replacement, from the residuals nt. We initialize the new path by
ε∗i = yi − α̂xi , i = 0, . . . , p̂ − 1 .
We treat the new pair {xt, yt∗} the same way as the original pair {xt, yt}. That is, we calculate α̂∗ and the spread process e∗t = yt∗ − α̂∗xt, which should follow an AR(p̂) process. Then we estimate the coefficients of this AR(p̂) process and calculate the test statistic
S∗ = (β̂∗ − 1) / σ̂β̂∗ .
By building a lot of new paths yt∗ and calculating the corresponding test
statistic S ∗ , we can calculate the density of these bootstrapped test statis-
tics. Then we can see if the test statistic of the real pair is exceptional. The
estimated density should also give an indication for the critical values of the
Engle-Granger method.
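The whole loop can be sketched as below for the simplest case p̂ = 1, so without lagged differences. Names such as eg_stat and bootstrap_eg are ours, and the code is a simplified illustration of the procedure rather than the implementation used for the figures.

```python
import numpy as np

def eg_stat(x, y):
    """First-step OLS of y on x, then DF-type t statistic on the AR(1) spread."""
    alpha = (x @ y) / (x @ x)
    e = y - alpha * x
    Y, X = e[1:], np.column_stack([np.ones(len(e) - 1), e[:-1]])
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b
    cov = (resid @ resid / (len(Y) - 2)) * np.linalg.inv(X.T @ X)
    return (b[1] - 1.0) / np.sqrt(cov[1, 1]), alpha, b, resid

def bootstrap_eg(x, y, n_boot=1000, seed=4):
    rng = np.random.default_rng(seed)
    s_real, alpha, (b0, b1), resid = eg_stat(x, y)
    e0 = y[0] - alpha * x[0]                      # initialize the new spread path
    stats = np.empty(n_boot)
    for i in range(n_boot):
        eta = rng.choice(resid, size=len(x) - 1, replace=True)
        e_star = np.empty(len(x))
        e_star[0] = e0
        for t in range(1, len(x)):                # AR(1) recursion with estimated coefficients
            e_star[t] = b0 + b1 * e_star[t - 1] + eta[t - 1]
        y_star = alpha * x + e_star
        stats[i], *_ = eg_stat(x, y_star)
    return s_real, stats

# usage: s, boot = bootstrap_eg(x_prices, y_prices); compare s with np.quantile(boot, 0.05)
```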
Results
The ten provided pairs are named pair I, pair II,..., pair X. We start with a
pair for which all three information criteria indicate that the spread process
is AR(1), pair II. By spread process we mean the residuals from the first
regression, et. The two stocks used are the same stock, but listed on different exchanges. The spread process is shown in figure 6.8. This is not necessarily the spread we trade: in chapter 2 we discussed the adjustment parameter κ, which can result in a different spread. With pair II we will find κ = 0, so the two spreads for this pair look the same. In pairs trading this is as good as it gets: we have a large number of trades, we never hold a position for a long time, and the risk of the two stocks walking away from each other is minimal because they are in fact the same stock.
The spread series looks stationary, and according to the Engle-Granger method the two stocks are cointegrated. The test statistic is -17.5; compared to the 1% critical value of -3.44, we see that the null hypothesis of no cointegration is rejected. Applying the bootstrap procedure to this data set, we get figure 6.9. The dashed line is the density of the Dickey-Fuller case 2 test statistic. This figure does not give an indication that the density of the Engle-Granger test statistic differs from that of the Dickey-Fuller case 2 statistic.
Figure 6.9: Estimated density for EG test statistic by bootstrapping from
pair II.
Let us now consider a pair for which all information criteria say that the spread process is AR(2), pair VII. The spread process is shown in figure 6.10. It does not look as good as figure 6.8, but this is still a good pair. According to the Engle-Granger method the stocks in this pair are cointegrated; the test statistic is -4.65. The bootstrap procedure results in figure 6.11. The estimated density coincides with the density of the Dickey-Fuller case 2 statistic.
Figure 6.11: Estimated density for EG test statistic by bootstrapping from
pair VII.
Next, consider a pair for which all information criteria indicate that the spread process is AR(3), pair VI. The spread process is shown in figure 6.12. This looks a lot less interesting than the previous figure: initially the spread is below zero for a long time, and at the end the spread is above zero for a long time. This shows that trading the spread would have resulted in only a few trades, and we would have had the same position for a long time. But, as stated before, this is not necessarily the spread we trade; in the next chapter we will see the spread we would have really traded. According to the Engle-Granger method the stocks in this pair are not cointegrated: the test statistic is -2.23, so the null hypothesis of no cointegration is not even rejected at the 10% level. The bootstrap procedure results in figure 6.13. The estimated density is a bit bumpy but still coincides with the density of the Dickey-Fuller case 2 statistic. Even when the real data is not cointegrated according to the Engle-Granger method, the bootstrap procedure finds nearly the same density as the density of the Dickey-Fuller test statistic.
Figure 6.13: Estimated density for EG test statistic by bootstrapping from pair VI.
So far we have seen pairs for which all information criteria find the same small value of p. IMC also provided a pair for which the information criteria find p to be very large, pair V. As described in chapter 3, we fit an AR(k) model, for k = 1, . . . , K, on the data and see for which k the criteria have the lowest values. For this pair, even if we set K = 100 the criteria have the lowest value for p = K. This indicates that the spread process does not follow an AR(p) model. The spread process is shown in figure 6.14. It is obvious that this 'pair' is not suitable for pairs trading. The Engle-Granger method does
not reject the null hypothesis of no cointegration because the test statistic is
-1.04 when p = 10 and 0.63 when p = 100. To apply the bootstrap procedure, we
set p = 10. The result is shown in figure 6.15, which coincides surprisingly
well with the density of the Dickey-Fuller case 2 test statistic.
Figure 6.15: Estimated density for EG test statistic by bootstrapping from pair V.
We will generate data such that the difference process zt follows an MA(2) model:
\[
\begin{bmatrix} x_t - x_{t-1} \\ y_t - y_{t-1} \end{bmatrix} = z_t = \Theta_2 w_{t-2} + \Theta_1 w_{t-1} + \Theta_0 w_t ,
\]
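A small simulation sketch for this data-generating process, assuming Python with numpy (the report itself uses S-PLUS); the Θ matrices and Σ passed in are placeholders to be replaced by the parameter settings discussed in this section:

# Simulate (x_t, y_t) whose differences follow the bivariate MA(2) model above,
# z_t = Θ2 w_{t-2} + Θ1 w_{t-1} + Θ0 w_t, with Gaussian innovations w_t ~ N(0, Σ).
import numpy as np

def simulate_ma2_prices(Theta2, Theta1, Theta0, Sigma, T=500, start=(100.0, 100.0), seed=0):
    rng = np.random.default_rng(seed)
    w = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=T + 2)
    z = np.array([Theta2 @ w[t] + Theta1 @ w[t + 1] + Theta0 @ w[t + 2] for t in range(T)])
    prices = np.cumsum(z, axis=0) + np.asarray(start)   # integrate the differences
    return prices[:, 0], prices[:, 1]                   # x_t and y_t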
Table 6.5: Number of rejections, Σ = cI.
c 1% 5% 10% p̄
2 47 89 147 9.9
1 25 77 125 9.6
0.5 24 61 116 9.1
0.1 11 49 102 7.6
Consider the situation where the innovations are correlated; we take Σ of the form
\[
\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} .
\]
Table 6.6 shows the number of rejections of the Engle-Granger test for differ-
ent values of ρ. Even for ρ = 1 the Engle-Granger method does not perform
well.
To see what happens, figure 6.16 shows the spread process for one realization of xt and yt. This does not look stationary; there seems to be a trend in the spread process. This could mean that with this setting there is a constant in the cointegrating relation, α0. Figure 6.17 shows the spread process if we regress the same realization of yt on the same xt and a constant.
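The two regressions can be sketched as follows (Python/numpy for illustration only; x and y are assumed to be one simulated realization as numpy arrays):

# First-stage regressions of y_t on x_t, without and with a constant,
# and the corresponding spread (residual) processes.
import numpy as np

def spread_without_constant(x, y):
    alpha = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]   # y_t ≈ alpha * x_t
    return y - alpha * x

def spread_with_constant(x, y):
    X = np.column_stack([np.ones_like(x), x])                  # y_t ≈ alpha0 + alpha * x_t
    alpha0, alpha = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - alpha0 - alpha * x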
[Figure 6.16: spread process when yt is regressed on xt only.]
[Figure 6.17: spread process when yt is regressed on xt and a constant.]
So far we have generated cointegrated data, but not cointegrated in the way we want it, that is, with a small or no constant α0. We look at a different setting of parameters:
\[
\Theta_2 = \begin{bmatrix} 1 & -1 \\ -1 & 0 \end{bmatrix}, \quad
\Theta_1 = \begin{bmatrix} -1 & 2 \\ 2 & 0 \end{bmatrix}, \quad
\Theta_0 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]
The matrix (Θ2 + Θ1 + Θ0) has eigenvalue zero with eigenvector [−1 1]′. Figure 6.18 shows a realization of the spread process when yt is only regressed on xt and not on a constant. In other words, we neglect a possible α0.
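This can be verified directly (a quick numpy check, for illustration only):

# Θ2 + Θ1 + Θ0 = [[1, 1], [1, 1]] has eigenvalues 0 and 2; the eigenvector
# belonging to 0 is proportional to [-1, 1]'.
import numpy as np

Theta2 = np.array([[1.0, -1.0], [-1.0, 0.0]])
Theta1 = np.array([[-1.0, 2.0], [2.0, 0.0]])
Theta0 = np.eye(2)

vals, vecs = np.linalg.eig(Theta2 + Theta1 + Theta0)
print(vals)                                   # 2 and 0 (order may differ)
print(vecs[:, np.argmin(np.abs(vals))])       # proportional to [-1, 1]'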
Figure 6.18: Realization spread setting 2, ρ = 0.5.
The Engle-Granger method performs very well: even when the spread process does not follow an AR(p) model, the test behaves exactly as we want it to. If there is a large constant α0 in the cointegrating relation, it does not reject the null hypothesis of no cointegration; although the data are cointegrated, they are not cointegrated in the way we want, that is, with a small or no α0. If there is a small or no constant in the cointegrating relation, the test rejected the null hypothesis almost every time.
Table 6.8: Number of rejections, setting 2, Σ = cI.
c 1% 5% 10% p̄
2 946 980 992 4.83
1 974 988 997 4.75
0.5 981 992 995 4.60
0.1 995 998 999 3.95
Chapter 7
Results
In this chapter the results for the ten pairs IMC provided are discussed. To be clear, IMC provided two years of historical closing prices for each stock of the ten pairs. According to IMC, among these ten are some very good pairs, meaning they make high profits; some are losing money and some are mediocre. In the first section we apply the trading strategy to the historical data to see which pairs would have been profitable and put the pairs in order of profitability. We would like to see whether the stocks in a profitable pair are cointegrated, and whether the stocks in a pair that loses money are not cointegrated; in other words, whether profitability and cointegration coincide. We apply two different cointegration tests, the Engle-Granger and the Johansen method, but first we examine in the second section whether the assumption that the price processes are integrated of order 1 is fulfilled. In the third and fourth sections the results for the Engle-Granger and the Johansen method, respectively, are stated; the pairs are put in order of the levels of rejection of the cointegration tests.
7.1 Results trading strategy
The trading strategy from chapter 2 is applied to the historical closing prices of each pair, with threshold Γ as explained in section 2.3. The results/profits are shown in table 7.1. The traded spread processes of the 10 pairs are shown in figure 7.1; these are the spreads with the adjustment ratio, if present. The upper left corner is the spread for pair I, the upper right corner for pair II, and so on. To be clear, the spread of the second half of the observations is displayed, and this is the spread which is traded. The dashed lines are the corresponding thresholds Γ.
Even the highest profit may look a bit small, but recall that we do not have to invest a lot of money. On the other hand, to lose the same amount as the highest profit, the two stocks have to walk 50% away from each other in the wrong direction, which has little chance of occurring. Profits above € 1,000 are considered good enough to trade; profits below € 1,000 are considered not to be worthwhile. But profit is not the only criterion, the number of trades is also important. Obviously, the more trades, the higher the profit. But this is not the only reason: in chapter 2 it was explained that traders do not want to hold a position for a long time because that involves risk, and the number of trades is an indication of this. According to IMC, pair IV is still a good one. We get exactly the same selection of good and bad pairs as IMC if we set the minimal number of trades equal to 7. IMC had already decided, based on trading experience and before providing the data, which of the 10 pairs are good and which are not. A pair is considered good enough to trade if the profit is above € 1,000 and the number of trades is no less than 7; otherwise the pair is considered not to be worthwhile.
[Figure 7.1: traded spread processes of the 10 pairs; the dashed lines are the thresholds Γ.]
The ordering of the 10 pairs based purely on the results from the trading strategy, where the first five are considered good or good enough and the remaining five are not, is:
1 pair II
2 pair X
3 pair VII
4 pair VIII
5 pair IV
6 pair I
7 pair III
8 pair VI
9 pair IX
10 pair V
We briefly discuss the spreads from figure 7.1. The spread for pair I rarely hits its threshold Γ, although the adjustment parameter κ is large. The spread for pair II looks good, but it could have been better if we had used κ = 1: after t = 425 the spread stays below +Γ for a relatively long time, and with κ = 1 we would have made a profit of € 6,721 in 36 trades. The spreads for pairs III, VI and IX rarely hit their thresholds Γ; the adjustment parameter κ is small, but increasing it does not have a positive effect. For pair III, increasing κ to 5 results in a loss of € 2,108. For pair VI the profit gets smaller while the number of trades increases. The spread for pair IV shows the reason why we use an adjustment parameter: without it this pair would have traded twice, with a total profit of € 385. For pair V the threshold Γ is not displayed, because it is 19.68. Lowering the threshold results in a loss when we keep κ = 8; only when we also reduce κ to 1 or zero do we get a small profit. The spreads for pairs VII, VIII and X look good: they hit their thresholds Γ regularly and produce a nice profit. Changing the parameters slightly does not affect the number of trades and only changes the profits slightly.
7.2 Results testing price process I(1)
Both cointegration tests require that the stock price processes xt and yt are integrated of order one. In section 4.2 it was derived that it is reasonable to assume that stock price processes fulfill this requirement, but in this section we perform a unit root test on the stocks of the 10 pairs to see whether the requirement is fulfilled. The unit root test we use is again the (Augmented) Dickey-Fuller case 2 test, and we perform the test twice. The first test is:
H0 : xt ∼ I(1) against H1 : xt ∼ I(0) .
The outcome should be not to reject H0 . The second test is:
H0 : xt ∼ I(2) against H1 : xt ∼ I(1) ,
which is equivalent to:
H0 : ∆xt ∼ I(1) against H1 : ∆xt ∼ I(0) .
The outcome of this second test should be to reject H0 , which makes it likely
that the price processes are I(1).
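A sketch of these two tests in Python (the report itself uses S-PLUS; here statsmodels' ADF test with a constant, i.e. case 2, is used for illustration, with x a single price series as a numpy array):

# Test 1: H0: x ~ I(1) vs H1: x ~ I(0)   -> we hope NOT to reject H0.
# Test 2: H0: Δx ~ I(1) vs H1: Δx ~ I(0) -> we hope to reject H0.
import numpy as np
from statsmodels.tsa.stattools import adfuller

def test_i1(x, maxlag=10):
    stat_levels = adfuller(x, maxlag=maxlag, regression="c", autolag="AIC")[0]
    stat_diffs = adfuller(np.diff(x), maxlag=maxlag, regression="c", autolag="AIC")[0]
    return stat_levels, stat_diffs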
The Dickey-Fuller test fits an AR(p) model to the stock price process; we estimate p with the information criteria from chapter 3 and set the maximum value of p, which is K, equal to 10. Table 7.2 shows the outcomes of both tests, where we used the following critical values
1% 5% 10%
-3.44 -2.87 -2.57
since we have roughly 520 observations, T = 520. The stocks in a pair are denoted by x and y. The test statistic of the first test is stated along with whether the null hypothesis is rejected; the outcome ’not rejected’ is denoted by the symbol ¬, otherwise the level of rejection is stated. The average value of the estimated p is also stated, and the results of the second test are stated in the same way.
The table shows that it is likely that all stocks from the 10 pairs are inte-
grated of order one.
Table 7.2: Results I(1).
Test 1 Test 2
stock statistic outcome p̄ statistic outcome p̄
I -x -1.7 ¬ 4 -11 1% 4
I -y -2.1 ¬ 4 -11 1% 4
II -x -1.4 ¬ 1 -12 1% 3
II -y -1.3 ¬ 2 -25 1% 1
III -x -2.1 ¬ 1 -22 1% 1
III -y -2.2 ¬ 1 -26 1% 1
IV -x -1.4 ¬ 1 -24 1% 1
IV -y -1.5 ¬ 1 -22 1% 1
V -x -1.4 ¬ 8 -8 1% 8
V -y -0.3 ¬ 10 -8 1% 10
VI -x -0.6 ¬ 4 -12 1% 4
VI -y -0.6 ¬ 4 -12 1% 4
VII -x -1.1 ¬ 1 -25 1% 1
VII -y -0.9 ¬ 1 -25 1% 1
VIII -x -1.4 ¬ 1 -23 1% 1
VIII -y -1.5 ¬ 1 -23 1% 1
IX -x -1.4 ¬ 1 -25 1% 1
IX -y -0.9 ¬ 1 -25 1% 1
X -x -0.5 ¬ 2 -23 1% 1
X -y -0.7 ¬ 1 -16 1% 2
1% 5% 10%
-3.44 -2.87 -2.57
7.3 Results Engle-Granger cointegration test
We perform the cointegration test on the whole data set, so we have 520 observations per stock. Recall that the profits were determined for the second half of the observations. As stated in section 4.3, the Engle-Granger method is not symmetric: the results can be different for regressing xt on yt and the other way around. That is why we perform the Engle-Granger test twice. The results are stated in table 7.3.
We see that there is only one pair, pair X, for which the outcomes of the two tests differ. The estimated cointegrating relations are practically the same for all pairs:
α̂1 ≈ 1/α̂2 .
So the disadvantage of the Engle-Granger method of not being symmetric does not seem to be very harmful when testing pairs for cointegration. The pairs are put in order of the test statistic; the idea is that the lower the test statistic, the lower the level of rejection, which is more evidence for cointegration. For example, the Engle-Granger method rejects the null hypothesis for pair II even at the 0.1% level, while for pair VIII it is only rejected at 5%. So there is more evidence that pair II is cointegrated than pair VIII, which is why we prefer pair II.
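Running the test in both directions can be sketched as follows (Python/statsmodels for illustration; note that coint includes a constant in its first-stage regression, unlike the case-2-without-constant setup preferred in this report, so the numbers will not match table 7.3 exactly):

# Engle-Granger cointegration statistic for both regression directions.
from statsmodels.tsa.stattools import coint

def engle_granger_both_ways(x, y, maxlag=10):
    stat_y_on_x = coint(y, x, trend="c", maxlag=maxlag, autolag="aic")[0]
    stat_x_on_y = coint(x, y, trend="c", maxlag=maxlag, autolag="aic")[0]
    return stat_y_on_x, stat_x_on_y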
The ordering of the 10 pairs based on the Engle-Granger method is
1 pair II
2 pair VII
3 pair X
4 pair VIII
5 pair IV
6 pair VI
7 pair III
8 pair I
9 pair V
10 pair IX
where there is evidence for cointegration for the first five pairs and no evidence for the remaining five. This ordering is not exactly the same as the ordering found with the trading strategy, but they coincide on what is good and what is not: the five pairs which are considered worthwhile to trade are cointegrated, and the five pairs that are not worthwhile to trade are not cointegrated according to the Engle-Granger method. The first pair in both orderings is the same; this is the pair that consists of the same stock listed on two different exchanges. In the first half of the ordering, the good ones, only places 2 and 3 are switched; the others are in the same places. The second halves of the two orderings differ a lot.
7.4 Results Johansen cointegration test
The critical values for each test are in table 7.4; these are for a sample size of T = 400. Although the data of the 10 pairs IMC provided consist of 520 observations, these critical values will be used when testing the 10 pairs for cointegration.
Table 7.4: Critical values for Johansen test.
Test 1% 5% 10%
1 16.31 12.53 10.47
2 15.69 11.44 9.52
3 6.51 3.84 2.86
One issue that was not addressed in section 4.4 is how to find p. The Johansen method assumes that the vector process yt = (xt , yt ) follows a VAR(p) model. S-PLUS, the program used for all simulations and calculations in this report, has a built-in function called ’ar’ which fits a VAR model using the Yule-Walker equations and determines the order of the VAR with the Akaike information criterion. This function is used for estimating p. We set the maximum value of p equal to 10 and the minimum value equal to 2, because the first step of the Johansen method is to fit a VAR(p − 1) on the differences ∆yt . The results of the Johansen test are in table 7.5.
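The analogous steps can be sketched in Python (the report itself relies on S-PLUS's ’ar’ function; the deterministic-term choice det_order=0 below is an illustrative assumption):

# Select the VAR order p by AIC (with the minimum of 2 enforced), then run the
# Johansen test on the pair; lr1 holds the trace statistics, cvt their critical values.
import numpy as np
from statsmodels.tsa.api import VAR
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def johansen_pair(x, y, max_p=10):
    data = np.column_stack([x, y])
    p = max(2, VAR(data).select_order(maxlags=max_p).aic)
    result = coint_johansen(data, det_order=0, k_ar_diff=p - 1)
    return result.lr1, result.cvt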
The Johansen method is symmetric: there is no difference between setting yt = (xt , yt ) or yt = (yt , xt ); the test statistics and the estimated cointegrating relations are exactly the same. We consider the stocks of a pair to be cointegrated if the null hypotheses of the first and the second test are rejected and the null hypothesis of the third test is not rejected.
The Johansen method finds the same pairs cointegrated as the Engle-Granger method: pairs II, IV, VII, VIII and X. The levels at which the null hypothesis of no cointegration is rejected are also the same. Only for pair X do the results differ a bit, but this is because the Engle-Granger method had two different outcomes: the first test rejected at 1% and the second test at 5%, while the Johansen method rejects pair X at 5%. There are no real differences for the cointegrated pairs; the estimated cointegrating relations are also practically the same. The biggest difference is for pair VIII, where the Engle-Granger method estimates α equal to 5.338 and the Johansen method 5.231. The two methods differ more for pairs that are not cointegrated, where the differences between the estimates of α are larger. But according to these methods those pairs are not cointegrated, so there does not exist an α such that yt − αxt is stationary.
The ordering of the 10 pairs based on the Johansen method is
1 pair II
2 pair VII
3 pair VIII
4 pair X
5 pair IV
6 pair III
7 pair I
8 pair VI
9 pair V
10 pair IX
where there is evidence for cointegration for the first five pairs and no evidence for the remaining five. This ordering differs slightly from the Engle-Granger ordering, but most important is that the two methods coincide on which pairs are cointegrated and which are not. And this in turn coincides with the results from the trading strategy.
Chapter 8
Conclusion
The goal of this project was to apply statistical techniques to find relation-
ships between stocks. The closing prices of these stocks, dating back two
years, are the only data that have been used in this analysis.
In chapter 6 we examined the properties of the Engle-Granger method, which consists of a linear regression followed by the Dickey-Fuller test on the residuals of this regression. The main questions were which Dickey-Fuller case to use and whether the critical values of the Engle-Granger method are the same as those of this Dickey-Fuller test. We saw that case 2 was the most appropriate one for the way we want to test for cointegration, that is, without a constant in the cointegrating relation. There was no indication, based on simulations, that the critical values of the Engle-Granger test differ from those of the Dickey-Fuller case 2 test. Also the power of the two tests was found to be similar when the assumptions of the method were fulfilled. The Engle-Granger test appeared to perform well even when some assumptions were not fulfilled: the test assumes that the residuals follow an autoregressive model, but when we generated cointegrated data with residuals that are not likely to be autoregressive, the method still rejected the null hypothesis of no cointegration often.
IMC has provided a selection of ten pairs that differ in quality. In chapter 7 we applied the trading strategy from chapter 2 to the historical closing prices. Based on profitability and the number of trades, we find a distinction between good and bad pairs which coincides with the distinction made by IMC. In that chapter we also tested the ten pairs for cointegration, using both the Engle-Granger and the Johansen method. The two methods coincide on which pairs are cointegrated and which are not, and the estimated cointegrating relations are almost the same. All the good pairs according to the trading strategy are seen as cointegrated by both tests; furthermore, all bad pairs are seen as not cointegrated by both tests.
Chapter 9
Alternatives &
recommendations
In section 4.3 it was stated that we neglect a possible constant α0 in the cointegrating relation yt − αxt . In this section we will look at a trading strategy that does not neglect the constant. We also look at what can happen if we have cointegration between the logarithms of the stock prices.
Trading strategy with constant
Consider two stock price processes, xt and yt , which have the relation
yt − αxt − α0 = εt , (9.1)
where εt is some stationary process. In other words, the two stocks are cointegrated with a constant in their relation. We could trade the pair y, x with ratio 1 : α and give up the cash neutral property, but another possibility is to determine the trading instances with (9.1) and to trade a quantity of x such that the whole trade is cash neutral. More clearly, with (9.1) we can determine whether xt is over- or underpriced compared to yt at time t, but we do not trade this relation: we trade one stock of y and yt /xt stocks of x if there was a mispricing larger than Γ at time t.
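A minimal sketch of this rule (illustrative Python; Γ is denoted gamma and eps_t is the spread from (9.1)):

# Signal from the spread with constant, position sized so that both legs have the
# same cash value when the position is put on: 1 stock of y against y_t/x_t stocks of x.
def cash_neutral_position(x_t, y_t, eps_t, gamma):
    ratio = y_t / x_t                  # number of x-stocks per y-stock for cash neutrality
    if eps_t > gamma:
        return -1.0, ratio             # y overpriced relative to x: sell y, buy x
    if eps_t < -gamma:
        return 1.0, -ratio             # y underpriced relative to x: buy y, sell x
    return 0.0, 0.0                    # no mispricing larger than gamma: no new position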
Let us consider an example: let x and y be a pair with relation (9.1), where α = 2 and α0 = 20, such that the spread εt looks like figure 9.1. The corresponding processes xt and yt are shown in figure 9.2.
[Figure 9.1: the spread process εt .]
[Figure 9.2: the corresponding price processes xt and yt .]
The trades are shown in table 9.1. With the first trade we put on a position for the first time, so we have made no profit yet. The second trade consists of two parts: we flatten the position from the first trade, which results in a profit, and we put on a new position. We always trade one stock of y against yt /xt stocks of x, so the trade is exactly cash neutral. The actual traded spread is not shown because it is basically the same as the right half of figure 9.1. Figure 9.3 shows the spread if we do not include a constant, i.e., if we neglected α0 .
Although the profit for each trade is not at least 2Γ, as it was for the trading strategy from chapter 2 with a constant ratio, it is still quite profitable to trade this pair this way. Especially because the trading strategy from chapter 2 would not make any money, even if we had used a large adjustment parameter κ.
[Figure 9.3: the traded spread when the constant α0 is neglected.]
Although this strategy can be applied for every α0 , we still do not want α0 to be large because of the market neutral property of pairs trading. If the overall market is up 50%, so that x increases by 50%, then we expect that y also increases by 50%. With an α0 that is large compared to the stock prices, this does not hold: if yt = αxt + α0 and xt increases by 50%, the relation implies that yt moves to 1.5αxt + α0 , which is less than a 50% increase of yt whenever α0 > 0. Actually it does not hold for any α0 ≠ 0, but the effect is small when α0 is small. The value of α0 used in the example is actually too large; it is equal to the first observation of x. Which values of α0 can still be used with this strategy should be examined further.
In this report we have only discussed trading strategies that trade one line: we put on a position when the spread reaches ±Γ and wait till the spread reaches Γ in the other direction. But it is very interesting to trade more lines. For example, if the spread reaches +Γ1 , we put on a short position in y and a long position in x. If the spread increases further and reaches Γ2 , we enlarge our short position in y and our long position in x. A trading strategy could be to trade the same amounts at each threshold, with the thresholds equally spaced, Γ2 = 2Γ1 . Figure 9.5 illustrates this idea. To make this more clear, table 9.2 shows the trading instances for this strategy with two lines when we trade x and y in the ratio 1:1.
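One way to formalise this two-line rule, chosen so that it reproduces the trading instances in table 9.2 (an illustrative Python sketch, not the author's implementation; units counts how many 1:1 blocks are short y):

# Add one unit each time the spread crosses a further threshold k*Γ1 away from zero,
# and unwind one unit each time it falls back through the previous threshold (k-1)*Γ1.
def next_units(units, s, g1, max_units=2):
    while units > 0 and s <= (units - 1) * g1:                 # unwind short-y units
        units -= 1
    while units < 0 and s >= (units + 1) * g1:                 # unwind long-y units
        units += 1
    while 0 <= units < max_units and s >= (units + 1) * g1:    # add short-y units
        units += 1
    while -max_units < units <= 0 and s <= (units - 1) * g1:   # add long-y units
        units -= 1
    return units                                               # position is (-units, +units) in (y, x)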
[Figure 9.5: the spread with the two thresholds Γ1 and Γ2 = 2Γ1 .]
This strategy can easily be extended to more lines; it can even be extended to three or more stocks in a pair. How to choose the number of lines, the thresholds and the corresponding amounts of stocks is very interesting to examine further.
We could also cut the cointegration test into several pieces. Suppose we have data sets containing four years of closing prices; then we could perform three tests on two years of data with an overlap of one year. More clearly, the first test is on the first and second year, the second test is on the second and third year, and the third test is on the third and fourth year.
Table 9.2: Trading instances.
trade t st position (y,x)
1 26 2.12 (-1,+1)
2 56 4.22 (-2,+2)
3 97 1.94 (-1,+1)
4 152 -0.01 flat
5 158 -2.11 (+1,-1)
6 199 -4.20 (+2,-2)
7 206 -1.89 (+1,-1)
8 221 0.13 flat
9 284 2.18 (-1,+1)
10 289 4.06 (-2,+2)
11 297 1.92 (-1,+1)
12 306 -0.13 flat
Then we can see whether the stocks are cointegrated on each time interval and whether the cointegrating relation changes. This could be very helpful for determining a good adjustment parameter κ.
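A sketch of this rolling check (illustrative Python; the number of observations per year and the use of statsmodels' coint, which includes a constant in its first-stage regression, are assumptions):

# Three overlapping two-year windows on four years of data: report the EG test
# statistic and the estimated alpha (regression of y on x without constant) per window.
import numpy as np
from statsmodels.tsa.stattools import coint

def rolling_cointegration(x, y, obs_per_year=260):
    results = []
    for start_year in range(3):                    # windows: years 1-2, 2-3, 3-4
        a = start_year * obs_per_year
        b = a + 2 * obs_per_year
        xs, ys = x[a:b], y[a:b]
        stat = coint(ys, xs, trend="c")[0]
        alpha = np.linalg.lstsq(xs[:, None], ys, rcond=None)[0][0]
        results.append((start_year + 1, stat, alpha))
    return results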