Everybody is familiar with the concept of correlation between two random variables:
correlation is high when they co-move
correlation is zero when they move independently
So what is cointegration?
cointegration is high when two quantities move together or remain close to each other
cointegration is absent when the two quantities do not stay together
Clear? You can see why this concept may be difficult to grasp at first, but the truth is
that it’s easy.1
In the financial context:
Cointegration of (log-)prices yt refers to long-term co-movements.
Correlation of (log-)returns ∆yt = yt − yt−1 characterizes short-term co-movements in
(log-)prices yt .
1
Y. Feng and D. P. Palomar, A Signal Processing Perspective on Financial Engineering. Foundations and
Trends in Signal Processing, Now Publishers, 2016.
D. Palomar (HKUST) Pairs Trading 5 / 63
Correlation vs. cointegration
Example of high correlation with no cointegration:
[Figure: log-prices ỹ1t and y2t and their difference ỹ1t − y2t over time, with a scatter plot of the log-returns of stock 2 against the log-returns of stock 1.]
[Figure: log-prices y1t and y2t and their difference y1t − y2t over time, with a scatter plot of the log-returns of stock 2 against the log-returns of stock 1.]
A time series is called integrated of order p, denoted as I(p), if the time series obtained
by differencing it p times is weakly stationary,
while the one obtained by differencing it p − 1 times is not weakly stationary.
Example: stock log-prices yt are integrated of order 1, i.e., I(1), because
log-prices are not stationary
but log-returns yt − yt−1 are stationary (at least for some period of time).
A multivariate time series is said to be cointegrated if at least one linear
combination of its components is integrated of a lower order, e.g., yt is not stationary but wᵀyt is
stationary for some weights w.
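The contrast between the two notions can be illustrated with simulated data. The following is a minimal sketch, assuming synthetic log-prices with made-up noise levels (none of the numbers come from the slides): one pair shares a common random-walk trend and is cointegrated, the other has highly correlated returns but drifts apart.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000

# Common stochastic trend (random walk): makes both log-prices I(1).
trend = np.cumsum(rng.normal(0.0, 0.01, T))

# Cointegrated pair: same trend plus independent stationary noise.
y1 = trend + rng.normal(0.0, 0.02, T)
y2 = trend + rng.normal(0.0, 0.02, T)

# Pair with highly correlated returns that is NOT cointegrated:
# the returns share a common shock, but y4 has an extra drift.
common = rng.normal(0.0, 0.01, T)
y3 = np.cumsum(common + rng.normal(0.0, 0.002, T))
y4 = np.cumsum(common + rng.normal(0.0, 0.002, T) + 0.001)

def return_corr(a, b):
    """Sample correlation of the log-returns (first differences)."""
    return np.corrcoef(np.diff(a), np.diff(b))[0, 1]

print("cointegrated pair: return corr =", round(return_corr(y1, y2), 2),
      "| std of y1 - y2 =", round(np.std(y1 - y2), 3))
print("correlated pair:   return corr =", round(return_corr(y3, y4), 2),
      "| final y3 - y4 =", round((y3 - y4)[-1], 2))
```

In this setup the difference y1 − y2 stays in a narrow band even though the return correlation is low, whereas y3 − y4 keeps drifting despite the high return correlation.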
[Figure: log-prices y1t and y2t and their difference y1t − y2t over time.]
Recall that if two time series are cointegrated, then in the long term they remain close to
each other.
In other words, the spread zt = y1t − γy2t is mean reverting.
This mean-reverting property of the spread can be exploited for trading and it is
commonly referred to as “pairs trading” or “statistical arbitrage”.
The idea behind pairs trading is to
short-sell the relatively overvalued stocks and buy the relatively undervalued stocks,
unwind the position when they are relatively fairly valued.
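The trading rule above can be sketched as a small state machine on the spread. This is only an illustration under assumptions: the function name `threshold_positions`, the convention −1/0/+1 for short/flat/long in the spread, and the toy spread values are all hypothetical.

```python
import numpy as np

def threshold_positions(z, s0, mu=0.0):
    """Position in the spread at each time: -1 short, +1 long, 0 flat."""
    pos = np.zeros(len(z), dtype=int)
    current = 0
    for t, zt in enumerate(z):
        if current == 0:
            if zt >= mu + s0:
                current = -1          # spread overvalued: sell the spread
            elif zt <= mu - s0:
                current = +1          # spread undervalued: buy the spread
        elif current == -1 and zt <= mu:
            current = 0               # reverted to the mean: unwind the short
        elif current == +1 and zt >= mu:
            current = 0               # reverted to the mean: unwind the long
        pos[t] = current
    return pos

z = np.array([0.0, 0.6, 1.2, 0.7, 0.1, -0.2, -1.1, -0.4, 0.3])
positions = threshold_positions(z, s0=1.0)
print(positions.tolist())  # [0, 0, -1, -1, -1, 0, 1, 1, 0]
```

Because the position at time t is decided after observing zt, a backtest would apply it to the next spread change, i.e., use pos[t−1] · (z[t] − z[t−1]).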
[Figure: spread zt over time with thresholds s0 and −s0, annotated "Sell" at s0, "Buy" at −s0, and "Sell to unwind".]
2
G. Vidyamurthy, Pairs Trading: Quantitative Methods and Analysis. John Wiley & Sons, 2004.
Pairs trading or statistical arbitrage
Statistical arbitrage can be profitable in practice:3
[Figure: (a) spread, (b) position, and (c) P&L, each plotted over time.]
3
M. Avellaneda and J.-H. Lee, “Statistical arbitrage in the US equities market,” Quantitative Finance,
vol. 10, no. 7, pp. 761–782, 2010.
But how to discover cointegrated pairs and γ?
One interesting approach is based on a VECM modeling of the universe of stocks: From
the parameter β contained in the low-rank matrix Π = αβᵀ one can extract a
cointegration subspace. After that, one can design some portfolio within that
cointegration subspace.4
A simpler approach to discover pairs is by brute force, i.e., try exhaustively different
combinations of pairs of stocks and see if they are cointegrated.
But, given a potential pair, how do we obtain the “secret” γ?
Easy! Just a simple LS regression!
Recall that
γ is needed to form the spread to be traded (i.e., portfolio)
the spread mean µ is needed to determine the thresholds for entering a trade and later
unwinding the position.
4
Z. Zhao and D. P. Palomar, “Mean-reverting portfolio with budget constraint,” IEEE Trans. Signal
Process., vol. 66, no. 9, pp. 2342–2357, 2018.
Outline
1 Cointegration
2 Basic Idea of Pairs Trading
3 Design of Pairs Trading
Pairs selection
Cointegration test
Optimum threshold
4 LS Regression and Kalman for Pairs Trading
5 From Pairs Trading to Statistical Arbitrage (StatArb)∗
VECM
Optimization of mean-reverting portfolio (MRP)
6 Summary
Design of a pairs trading strategy
We first focus on pairs trading (i.e., statistical arbitrage between two stocks) as the
example to introduce the main steps of statistical arbitrage.
In practice, pairs trading contains three main steps5 :
Pairs selection: identify stock pairs that could potentially be cointegrated.
Cointegration test: test whether the identified stock pairs are indeed cointegrated or not.
Trading strategy design: study the spread dynamics and design proper trading rules.
5
G. Vidyamurthy, Pairs Trading: Quantitative Methods and Analysis. John Wiley & Sons, 2004.
Pairs selection: normalized price distance
$$\mathrm{NPD} \triangleq \sum_{t=1}^{T} \left(\tilde{p}_{1t} - \tilde{p}_{2t}\right)^2$$
where the normalized price p̃1t of stock 1 is given by p̃1t = p1t /p10 . The normalized price
of stock 2 is defined similarly.
One can easily (i.e., cheaply) compute the NPD for all possible pairs
and select the pairs with the smallest NPD as the potentially cointegrated pairs.
Later one can use a more refined measure of cointegration (more computationally
demanding).
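A minimal sketch of this screening step, assuming the prices are stored in an array with one column per stock and with row 0 holding the reference prices p10, p20, etc. (the function name `npd_ranking` and the toy prices are hypothetical):

```python
import numpy as np
from itertools import combinations

def npd_ranking(prices):
    """prices: array of shape (T+1, N); row 0 is the reference price p_{i0}.
    Returns all pairs sorted by increasing normalized price distance."""
    normalized = prices / prices[0]                      # p~_it = p_it / p_i0
    scores = {}
    for i, j in combinations(range(prices.shape[1]), 2):
        diff = normalized[1:, i] - normalized[1:, j]     # t = 1, ..., T
        scores[(i, j)] = float(np.sum(diff ** 2))
    return sorted(scores.items(), key=lambda kv: kv[1])

# Toy data: stocks 0 and 1 move proportionally, stock 2 does not.
prices = np.array([[10.0, 20.0, 5.0],
                   [11.0, 22.0, 5.0],
                   [12.0, 24.0, 4.0]])
ranking = npd_ranking(prices)
print(ranking[0])   # the pair (0, 1) has the smallest NPD
```

The pairs at the top of the ranking would then be passed on to a proper cointegration test.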
6
E. Gatev, W. N. Goetzmann, and K. G. Rouwenhorst, “Pairs trading: Performance of a relative-value
arbitrage rule,” Review of Financial Studies, vol. 19, no. 3, pp. 797–827, 2006.
Least Squares (LS) regression
If the spread zt is stationary, it can be written as7
zt = y1t − γy2t = µ + ϵt
where
µ represents the equilibrium value and
ϵt is a zero-mean residual.
Equivalently, it can be written as
y1t = µ + γy2t + ϵt
LS regression is used to estimate the parameters µ and γ, obtaining the estimates µ̂ and
γ̂.
If y1t and y2t are I(1) and are cointegrated, then the estimates converge to the true values
as the number of observations goes to infinity8 .
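As an illustration, the regression y1t = µ + γy2t + ϵt can be fitted with ordinary least squares in a few lines. This is a sketch on synthetic data: the helper name `fit_spread` and the "true" values 0.5 and 0.8 are assumptions chosen only for the example.

```python
import numpy as np

def fit_spread(y1, y2):
    """LS fit of y1 = mu + gamma * y2 + eps; returns (mu_hat, gamma_hat, residuals)."""
    X = np.column_stack([np.ones_like(y2), y2])          # regressors: constant and y2
    (mu_hat, gamma_hat), *_ = np.linalg.lstsq(X, y1, rcond=None)
    resid = y1 - mu_hat - gamma_hat * y2
    return mu_hat, gamma_hat, resid

# Synthetic cointegrated log-prices (made-up parameters).
rng = np.random.default_rng(1)
y2 = 3.0 + np.cumsum(rng.normal(0.0, 0.02, 1000))        # I(1) log-price
y1 = 0.5 + 0.8 * y2 + rng.normal(0.0, 0.02, 1000)        # cointegrated with y2

mu_hat, gamma_hat, resid = fit_spread(y1, y2)
print(round(mu_hat, 2), round(gamma_hat, 2))
```

The residuals returned here are the estimated spread deviations that a stationarity test would examine.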
Using the estimated parameters µ̂ and γ̂, we can compute the residuals
ϵ̂t = y1t − µ̂ − γ̂y2t .
Then, one has to decide whether the spread is stationary, i.e., whether ϵt is stationary. In practice,
the estimated residuals ϵ̂t are used.
There are many well-defined mathematical tests for the stationarity of ϵ̂t , e.g., the augmented
Dickey-Fuller (ADF) test, the Johansen test, etc.
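To give a feel for what such a test computes, here is a sketch of the simplest Dickey-Fuller statistic (no lagged differences, no constant) applied to a stationary series and to a random walk. It is a simplification of the ADF test, not a replacement for it, and the function name `df_tstat` is hypothetical.

```python
import numpy as np

def df_tstat(e):
    """t-statistic of rho in the regression  de_t = rho * e_{t-1} + u_t."""
    de, lag = np.diff(e), e[:-1]
    rho = (lag @ de) / (lag @ lag)                 # LS estimate of rho
    u = de - rho * lag                             # regression residuals
    sigma2 = (u @ u) / (len(de) - 1)               # residual variance
    return rho / np.sqrt(sigma2 / (lag @ lag))     # rho divided by its standard error

rng = np.random.default_rng(2)
stationary = rng.normal(size=500)                  # white noise: strongly mean-reverting
random_walk = np.cumsum(rng.normal(size=500))      # unit root: no mean reversion

print("stationary :", round(df_tstat(stationary), 1))
print("random walk:", round(df_tstat(random_walk), 1))
```

A strongly negative statistic is evidence against a unit root. Because ϵ̂t comes from an estimated regression, the statistic should be compared with residual-based (Engle-Granger) critical values rather than the standard Dickey-Fuller tables; in practice a library routine such as `adfuller` or `coint` from `statsmodels.tsa.stattools` is the safer choice.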
8
R. F. Engle and C. W. J. Granger, “Co-integration and error correction: Representation, estimation, and
testing,” Econometrica: Journal of the Econometric Society, pp. 251–276, 1987.
Optimum threshold
Once some identified pairs have passed the cointegration test, one still needs to decide
the entry and exit thresholds to open and unwind the positions, respectively.
For the sake of concreteness, we focus on studying the entry threshold:
open positions when the spread diverges from its long-term mean by s0
unwind the position when it reverts to its mean
Thus, the key problem now is how to design the value of s0 such that the total profit is
maximized.
Total profit:
profit of each trade × number of trades
profit of each trade is s0
number of trades is related to the zero crossings, which can be analyzed theoretically as well
as empirically.
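We focus now on estimating the number of trades.
If the (standardized) spread follows a standard normal distribution with cumulative distribution
function Φ, the probability that it deviates from its mean by s0 or more (in one direction) is
1 − Φ(s0 ), so over T periods the expected number of tradable events is T(1 − Φ(s0 )).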
For each trade, the profit is s0 and then the total profit is s0 T(1 − Φ(s0 )).
Then the optimal threshold is s⋆0 = arg maxs0 {s0 T(1 − Φ(s0 ))}.
In practice, one cannot know the true distribution but can estimate the distribution
parameters.
Then one can compute the total profit based on estimated distribution.
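For the standard normal case the maximization can be done numerically with a simple grid search. The sketch below assumes a hypothetical horizon of T = 250 periods (T only scales the profit and does not change the maximizer) and a grid of s0 values between 0 and 3.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal c.d.f. Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

T = 250                                    # hypothetical number of periods
grid = np.linspace(0.0, 3.0, 3001)         # candidate thresholds s0
profit = np.array([s * T * (1.0 - normal_cdf(s)) for s in grid])

s_star = grid[np.argmax(profit)]
print("optimal threshold:", round(s_star, 3))
print("profit per period:", round(profit.max() / T, 3))
```

The maximizer is close to 0.75 standard deviations: a smaller threshold gives more trades with less profit each, while a larger one gives more profit per trade but too few trades.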
Optimum threshold s0 : Parametric approach∗
Optimal threshold s⋆0 maximizes the total profit:
[Figure: panels (a) probability of trades, (b), and (c) total profit, each plotted against s0 for the theoretical and parametric curves.]
The empirical values f̄j may not be smooth enough, and the resulting profit function
may not be accurate enough.
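Smooth the trading frequency function by regularization:
$$\underset{\mathbf{f}}{\text{minimize}} \quad \sum_{j=1}^{J} (\bar{f}_j - f_j)^2 + \lambda \sum_{j=1}^{J-1} (f_j - f_{j+1})^2$$
or, in vector form,
$$\underset{\mathbf{f}}{\text{minimize}} \quad \|\bar{\mathbf{f}} - \mathbf{f}\|_2^2 + \lambda \|\mathbf{D}\mathbf{f}\|_2^2$$
where
$$\mathbf{D} = \begin{bmatrix} 1 & -1 & & \\ & 1 & -1 & \\ & & \ddots & \ddots \\ & & 1 & -1 \end{bmatrix} \in \mathbb{R}^{(J-1)\times J}.$$
Setting the derivative of the objective w.r.t. f to zero yields the optimal solution
$$\mathbf{f}^\star = (\mathbf{I} + \lambda \mathbf{D}^T \mathbf{D})^{-1} \bar{\mathbf{f}}.$$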
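The closed-form solution is a single linear solve. The following sketch builds the first-difference matrix D and applies the formula to a made-up vector of empirical frequencies; the function name `smooth_frequencies`, the data, and the value λ = 2 are assumptions for illustration.

```python
import numpy as np

def smooth_frequencies(f_bar, lam):
    """Return f* = (I + lam * D^T D)^{-1} f_bar, with D the first-difference matrix."""
    J = len(f_bar)
    D = np.eye(J - 1, J) - np.eye(J - 1, J, k=1)     # each row is (..., 1, -1, ...)
    return np.linalg.solve(np.eye(J) + lam * D.T @ D, f_bar)

def roughness(f):
    """Sum of squared differences between neighboring values."""
    return float(np.sum(np.diff(f) ** 2))

# Hypothetical noisy empirical trading frequencies for J = 8 thresholds.
f_bar = np.array([0.50, 0.42, 0.45, 0.30, 0.33, 0.20, 0.22, 0.10])
f_star = smooth_frequencies(f_bar, lam=2.0)

print(np.round(f_star, 3))
print("roughness before/after:", round(roughness(f_bar), 4), round(roughness(f_star), 4))
```

Larger λ gives a smoother curve; λ = 0 returns the empirical values unchanged. Using `np.linalg.solve` avoids forming the matrix inverse explicitly.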
The optimal threshold is the one that maximizes the total profit:
[Figure: panels (a), (b), and (c) total profit, each plotted against s0 ; curves: Theoretical, NonParam: empirical, and NonParam: regularized.]
Using the estimated parameters µ̂ and γ̂, we can compute the residuals
ϵ̂t = y1t − µ̂ − γ̂y2t .
Then, one has to decide whether the cointegration is acceptable or not before moving on to the
trading part.
There are many well-defined mathematical tests for the stationarity of ϵ̂t , e.g., the augmented
Dickey-Fuller (ADF) test, the Johansen test, etc.
Total profit:
profit of each trade × number of trades
profit of each trade is s0
number of trades is related to the zero crossings, which can be analyzed theoretically as well
as empirically.
Ideally, we want residuals with large amplitude (variance) as well as a strong mean
reversion because they directly affect the profit.
[Figure: Z-score and trading signal, Aug 2000 to Jan 2002, with a second panel over the same dates.]
[Figure: Z-score and trading signal, Jul 2000 to Dec 2001, with a second panel over the same dates.]
The problem with the LS regression is that it assumes that µ and γ are constant.
In practice, they can change with time, so the spread may drift away from equilibrium
and never revert, with potentially huge losses.
Thus, in practice, µ and γ have to be treated as time-varying and tracked.
How to track time-varying parameters?
Of course… Kalman!!!
Well, you can also try a rolling regression or exponential smoothing, but Kalman works
better.
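For reference, the rolling regression alternative can be sketched as follows: at each time t, the LS fit uses only the most recent observations. The function name `rolling_ls` and the window length are hypothetical choices.

```python
import numpy as np

def rolling_ls(y1, y2, window):
    """Rolling LS estimates of mu_t and gamma_t from the last `window` samples.
    Entries before the first full window are NaN."""
    T = len(y1)
    mu = np.full(T, np.nan)
    gamma = np.full(T, np.nan)
    for t in range(window - 1, T):
        x = y2[t - window + 1 : t + 1]
        y = y1[t - window + 1 : t + 1]
        gamma[t], mu[t] = np.polyfit(x, y, 1)     # returns (slope, intercept)
    return mu, gamma

# Sanity check on an exact linear relationship y1 = 1 + 2 * y2.
y2 = np.sin(np.arange(60) / 5.0)
y1 = 1.0 + 2.0 * y2
mu, gamma = rolling_ls(y1, y2, window=20)
print(round(mu[-1], 6), round(gamma[-1], 6))
```

The window length trades off responsiveness against noise, and every observation inside the window gets the same weight, which is one reason a state-space approach can be preferable.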
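Recall the previous static relationship for cointegrated series y1t and y2t :
y1t = µ + γy2t + ϵt
With time-varying parameters it becomes
y1t = µt + γt y2t + ϵt
where µt and γt evolve slowly over time as random walks:
µt+1 = µt + η1t
γt+1 = γt + η2t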
The Kalman filter is based on a state-space model consisting of two equations that describe the
time-varying hidden state xt and the observations yt :
xt+1 = Tt xt + η t
yt = Zt xt + ϵt
The observation equation yt = Zt xt + ϵt relates the observation yt to the hidden state xt
as a linear relationship, where Zt is the time-varying observation matrix and ϵt is a
zero-mean Gaussian error ϵt ∼ N (0, R) with covariance matrix R.
The state transition equation xt+1 = Tt xt + η t expresses the transition of the hidden
state from xt to xt+1 as a linear relationship, where Tt is the time-varying transition
matrix and η t is a zero-mean Gaussian error η t ∼ N (0, Q) with covariance matrix Q.
The Kalman filter is extremely versatile in modeling a variety of real-life processes.9
9
J. Durbin and S. J. Koopman, Time Series Analysis by State Space Methods, 2nd Ed. Oxford University
Press, 2012.
Kalman for pairs trading
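Kalman filter (state transition equation and observation equation):
$$\mathbf{x}_{t+1} = \mathbf{T}\mathbf{x}_t + \boldsymbol{\eta}_t$$
$$y_{1t} = \mathbf{Z}_t \mathbf{x}_t + \epsilon_t$$
where
$\mathbf{x}_t \triangleq \begin{bmatrix} \mu_t \\ \gamma_t \end{bmatrix}$ is the hidden state
$\mathbf{T} \triangleq \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is the state transition matrix
$\boldsymbol{\eta}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$ is the i.i.d. state transition noise with $\mathbf{Q} = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$
$\mathbf{Z}_t \triangleq \begin{bmatrix} 1 & y_{2t} \end{bmatrix}$ is the observation coefficient matrix
$\epsilon_t \sim \mathcal{N}(0, \sigma_\epsilon^2)$ is the i.i.d. observation noise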
Note that this is a time-varying Kalman filter since Zt is time-varying.
The parameters σ1², σ2², and σϵ² can be estimated with the EM algorithm, using historical data for
calibration.
The hidden state path xt gives the sought time-varying coefficients.
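The filtering recursion for this model is short enough to write out directly. The sketch below is an assumption-laden illustration: the noise variances are fixed by hand rather than calibrated with EM, the data are synthetic with γ drifting from 0.5 to 0.8, and the function name `kalman_pairs` and its arguments are hypothetical.

```python
import numpy as np

def kalman_pairs(y1, y2, q_mu, q_gamma, r, x0, P0):
    """Kalman filter for y1_t = mu_t + gamma_t * y2_t + eps_t with random-walk states.
    Returns the filtered states, one row (mu_t, gamma_t) per time step."""
    x = np.array(x0, dtype=float)           # state estimate (mu, gamma)
    P = np.array(P0, dtype=float)           # state covariance
    Q = np.diag([q_mu, q_gamma])            # state noise covariance
    states = np.empty((len(y1), 2))
    for t in range(len(y1)):
        P = P + Q                           # predict: T = I, so x is unchanged
        Z = np.array([1.0, y2[t]])          # observation row Z_t = [1, y2_t]
        v = y1[t] - Z @ x                   # innovation
        F = Z @ P @ Z + r                   # innovation variance
        K = P @ Z / F                       # Kalman gain
        x = x + K * v                       # update state
        P = P - np.outer(K, Z @ P)          # update covariance
        states[t] = x
    return states

# Synthetic data: mu fixed at 0.2, gamma drifting from 0.5 to 0.8 (made-up values).
rng = np.random.default_rng(3)
T = 1000
y2 = 1.0 + np.sin(np.arange(T) / 20.0)
gamma_true = np.linspace(0.5, 0.8, T)
y1 = 0.2 + gamma_true * y2 + rng.normal(0.0, 0.01, T)

states = kalman_pairs(y1, y2, q_mu=1e-5, q_gamma=1e-5, r=1e-4,
                      x0=[0.0, 0.0], P0=np.eye(2))
mu_T, gamma_T = states[-1]
print(round(mu_T, 2), round(gamma_T, 2))
```

A static LS fit on the same data would return a single γ somewhere between 0.5 and 0.8, whereas the filtered path follows the drift. The ratio of the state variances to the observation variance controls how quickly the estimates adapt.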
Log-prices of ETFs EWH and EWZ:
[Figure: log-prices, 2000-08-01 to 2003-12-31.]
[Figure: estimates of µ over time: mu.LS, mu.rolling.LS, and mu.Kalman.]
[Figure: estimates of γ over time: gamma.LS, gamma.rolling.LS, and gamma.Kalman.]
[Figure: series obtained with LS, rolling.LS, and Kalman, Aug 2000 to Dec 2003.]
[Figures: Z-score and trading signal over time (three instances), each with a second panel over the same dates.]
[Figure: comparison of LS, rolling.LS, and Kalman, Aug 2000 to Feb 2003.]
As one could expect, the Kalman filter can be, and has been, used in many aspects of financial
time-series modeling.10
Examples of univariate time series: rate of inflation, national income, level of
unemployment, etc.
Typical models include: local model, trend-cycle decompositions, seasonality, etc.
Examples of multivariate time series: inflation and national income.
Multiple time series allow for more sophisticated models including common factors,
cointegration, etc.
Also data irregularities can be easily handled, e.g., missing observations, outliers, mixed
frequencies.
Plenty of applications for nonlinear and non-Gaussian models as well, e.g., GARCH
modeling and stochastic volatility modeling.
10
A. Harvey and S. J. Koopman, “Unobserved components models in economics and finance: The role of the
Kalman filter in time series econometrics,” IEEE Control Systems Magazine, vol. 29, no. 6, pp. 71–81, 2009.
From pairs trading to statistical arbitrage
Denote the log-prices of multiple stocks as yt and the log-returns as rt = ∆yt = yt − yt−1 .
Most of the multivariate time-series models attempt to model the log-returns rt (because
the log-prices are nonstationary whereas the log-returns are weakly stationary, at least
over some time horizon).
However, it turns out that differencing the log-prices may destroy part of the structure.
The VECM11 tries to fix that issue by including an additional term in the model:
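$$\mathbf{r}_t = \boldsymbol{\phi}_0 + \boldsymbol{\Pi}\mathbf{y}_{t-1} + \sum_{i=1}^{p-1} \tilde{\boldsymbol{\Phi}}_i \mathbf{r}_{t-i} + \mathbf{w}_t.$$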
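To connect this model with the pairs trading spread, consider the simplest case as a worked example: two stocks, p = 1 (so the sum over lagged returns is empty), and a rank-one matrix Π = αβᵀ with β normalized as (1, −γ). These choices are assumptions made only for illustration.

```latex
\begin{aligned}
\mathbf{r}_t &= \boldsymbol{\phi}_0 + \boldsymbol{\alpha}\boldsymbol{\beta}^T \mathbf{y}_{t-1} + \mathbf{w}_t,
\qquad \boldsymbol{\beta} = \begin{bmatrix} 1 \\ -\gamma \end{bmatrix}, \\
\boldsymbol{\beta}^T \mathbf{y}_{t-1} &= y_{1,t-1} - \gamma\, y_{2,t-1} = z_{t-1}, \\
\mathbf{r}_t &= \boldsymbol{\phi}_0 + \boldsymbol{\alpha}\, z_{t-1} + \mathbf{w}_t .
\end{aligned}
```

Under these assumptions the additional term is the previous value of the spread zt = y1t − γy2t, scaled by α: the returns are pulled back whenever the log-prices move away from their long-run relationship, which is exactly the information that is lost when only differenced data are modeled.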
11
R. F. Engle and C. W. J. Granger, “Co-integration and error correction: Representation, estimation, and
testing,” Econometrica: Journal of the Econometric Society, pp. 251–276, 1987.
VECM - Matrix Π
12
Z. Zhao and D. P. Palomar, “Mean-reverting portfolio with budget constraint,” IEEE Trans. Signal
Process., vol. 66, no. 9, pp. 2342–2357, 2018.
13
Z. Zhao, R. Zhou, and D. P. Palomar, “Optimal mean-reverting portfolio with leverage constraint for
statistical arbitrage in finance,” IEEE Trans. Signal Process., vol. 67, no. 7, pp. 1681–1695, 2019.
Mean-reverting portfolio (MRP)
For example, if we use the Portmanteau statistics as a proxy for the mean reversion, the
problem formulation becomes:
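$$\begin{array}{ll}
\underset{\mathbf{w}}{\text{minimize}} & \displaystyle\sum_{i=1}^{p} \left( \frac{\mathbf{w}^T \mathbf{M}_i \mathbf{w}}{\mathbf{w}^T \mathbf{M}_0 \mathbf{w}} \right)^2 \\
\text{subject to} & \mathbf{w}^T \mathbf{M}_0 \mathbf{w} = \nu \\
& \mathbf{w} \in \mathcal{W}.
\end{array}$$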
Using other proxies, the formulation can be expressed more generally as14
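$$\begin{array}{ll}
\underset{\mathbf{w}}{\text{minimize}} & \mathbf{w}^T \mathbf{H} \mathbf{w} + \lambda \displaystyle\sum_{i=1}^{p} \left( \mathbf{w}^T \mathbf{M}_i \mathbf{w} \right)^2 \\
\text{subject to} & \mathbf{w}^T \mathbf{M}_0 \mathbf{w} = \nu \\
& \mathbf{w} \in \mathcal{W}.
\end{array}$$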
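The portmanteau objective can be evaluated numerically for a given w. The sketch below assumes that Mi denotes the lag-i sample autocovariance matrix of the spreads (the slides do not define it here), so that wᵀMi w / wᵀM0 w is the lag-i autocorrelation of the portfolio wᵀst. It only evaluates the proxy and does not solve the constrained optimization problem; the function names and synthetic spreads are hypothetical.

```python
import numpy as np

def autocov_matrices(S, p):
    """Sample lag-i autocovariance matrices M_0, ..., M_p of the rows of S (T x N)."""
    S = S - S.mean(axis=0)
    T = S.shape[0]
    return [S[: T - i].T @ S[i:] / T for i in range(p + 1)]

def portmanteau_proxy(w, M):
    """Sum over lags 1..p of the squared autocorrelation of the portfolio w^T s_t."""
    denom = w @ M[0] @ w
    return sum((w @ Mi @ w / denom) ** 2 for Mi in M[1:])

# Two synthetic spreads: white noise (fast mean reversion) and a persistent AR(1).
rng = np.random.default_rng(4)
T = 2000
white = rng.normal(size=T)
ar = np.zeros(T)
for t in range(1, T):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()
S = np.column_stack([white, ar])

M = autocov_matrices(S, p=5)
fast = portmanteau_proxy(np.array([1.0, 0.0]), M)
slow = portmanteau_proxy(np.array([0.0, 1.0]), M)
print(round(fast, 3), round(slow, 3))
```

A smaller value means the portfolio is closer to white noise, i.e., it reverts to its mean faster. The proxy is invariant to the scale of w, which is why the formulation fixes the variance through wᵀM0 w = ν.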
14
Z. Zhao and D. P. Palomar, “Mean-reverting portfolio with budget constraint,” IEEE Trans. Signal
Process., vol. 66, no. 9, pp. 2342–2357, 2018.
MRP in practice
Observe several stock log-prices and the spreads obtained from β:
[Figure: log-prices of APA, AXP, CAT, COF, FCX, IBM, and MMM, together with the spreads s1, s2, and s3, 2012 to 2014.]
[Figure: spreads, ROI, and cumulative P&L of MRP-cro (prop.) (SR = 3.2471) and spread s1 (SR = 2.8579), 2012 to 2014.]
https://www.danielppalomar.com