Machine Learning & Portfolio Optimization: Gah-Yi Ban
Portfolio Optimization
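\[
\begin{aligned}
\min_{w \in \mathbb{R}^p} \quad & w^\top \Sigma w \\
\text{s.t.} \quad & w^\top \mu = R \qquad \text{(MV)} \\
& w^\top \mathbf{1} = 1
\end{aligned}
\]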
where
- $X$: $p \times 1$ random vector of relative returns
- $\mu = E(X)$: mean returns
- $\Sigma = \mathrm{Cov}(X)$: $p \times p$ covariance matrix for the relative returns
- Solution: $w_0(R)$
- Same if the return constraint is relaxed to $w^\top \mu \ge R$
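For reference, the equality-constrained problem admits a closed form via Lagrange multipliers. The sketch below assumes $\Sigma$ is positive definite and $\mu$ is not proportional to $\mathbf{1}$; neither assumption is stated on the slide, and the scalars $A$, $B$, $C$, $D$ are shorthand introduced here.

\[
\begin{aligned}
A &= \mathbf{1}^\top \Sigma^{-1} \mathbf{1}, \quad
B = \mathbf{1}^\top \Sigma^{-1} \mu, \quad
C = \mu^\top \Sigma^{-1} \mu, \quad
D = AC - B^2, \\
w_0(R) &= \frac{C - BR}{D}\, \Sigma^{-1} \mathbf{1} + \frac{AR - B}{D}\, \Sigma^{-1} \mu .
\end{aligned}
\]

Both constraints can be checked directly: $\mathbf{1}^\top w_0(R) = \frac{(C - BR)A + (AR - B)B}{D} = 1$ and $\mu^\top w_0(R) = \frac{(C - BR)B + (AR - B)C}{D} = R$.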
Sample Average Approximation
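\[
\begin{aligned}
\min_{w \in \mathbb{R}^p} \quad & w^\top \hat{\Sigma}_{1:n} w \\
\text{s.t.} \quad & w^\top \hat{\mu}_n = R \\
& w^\top \mathbf{1} = 1
\end{aligned}
\]
where
- $\hat{\Sigma}_{1:n}$ is the sample covariance matrix of $[x_1, \ldots, x_n]$
- $\hat{\mu}_n$ is the sample average of the returns
- Solution: $\hat{w}_{\mathrm{SAA}}(R)$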
In-sample vs. Out-of-sample performance
In-sample vs. Out-of-sample return
$\hat{w}_{\mathrm{SAA}}^\top \hat{\mu}_n$
In-sample vs. Out-of-sample risk
In-sample risk:
$\hat{w}_{\mathrm{SAA}}^\top \hat{\Sigma}_{1:n} \hat{w}_{\mathrm{SAA}}$
Out-of-sample risk:
Performance of SAA: Simulated Data
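- Generate $X_{b,n} = [x_{b,1}, \ldots, x_{b,n}]$, where $x_{b,i} \overset{\text{iid}}{\sim} N(\nu, Q)$ for all $i = 1, \ldots, n$
- Solve the SAA problem for $\hat{w}_{b,\mathrm{SAA}}$
- Compute its out-of-sample return and risk: $\hat{w}_{b,\mathrm{SAA}}^\top \nu$ and $\hat{w}_{b,\mathrm{SAA}}^\top Q\, \hat{w}_{b,\mathrm{SAA}}$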
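The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the code behind the slides: the values of `nu`, `q`, `target`, the sample size and the number of replications are made up, and the helper names `solve_mv` and `simulate_saa` are hypothetical. The mean-variance problem is solved through its KKT linear system, which assumes the covariance matrix is positive definite.

```python
import numpy as np


def solve_mv(mu, sigma, target):
    """Minimum-variance weights subject to w @ mu == target and sum(w) == 1.

    Solves the KKT system of the equality-constrained quadratic program.
    """
    p = len(mu)
    ones = np.ones(p)
    kkt = np.zeros((p + 2, p + 2))
    kkt[:p, :p] = 2.0 * sigma      # gradient of w' Sigma w
    kkt[:p, p] = mu                # multiplier column for the return constraint
    kkt[:p, p + 1] = ones          # multiplier column for the budget constraint
    kkt[p, :p] = mu
    kkt[p + 1, :p] = ones
    rhs = np.concatenate([np.zeros(p), [target, 1.0]])
    return np.linalg.solve(kkt, rhs)[:p]


def simulate_saa(nu, q, target, n, n_rep, seed=0):
    """Return in-sample and out-of-sample (return, risk) pairs per replication."""
    rng = np.random.default_rng(seed)
    in_sample, out_sample = [], []
    for _ in range(n_rep):
        x = rng.multivariate_normal(nu, q, size=n)   # n x p matrix of returns
        mu_hat = x.mean(axis=0)
        sigma_hat = np.cov(x, rowvar=False)          # sample covariance
        w = solve_mv(mu_hat, sigma_hat, target)
        in_sample.append((w @ mu_hat, w @ sigma_hat @ w))
        out_sample.append((w @ nu, w @ q @ w))       # evaluated under the truth
    return np.array(in_sample), np.array(out_sample)


# Illustrative parameters only (assumptions, not taken from the slides).
nu = np.array([0.04, 0.06, 0.08])
q = np.array([[0.010, 0.002, 0.001],
              [0.002, 0.020, 0.003],
              [0.001, 0.003, 0.030]])
target = 0.06

ins, oos = simulate_saa(nu, q, target, n=50, n_rep=200)
w_true = solve_mv(nu, q, target)
print("risk with true parameters:", w_true @ q @ w_true)
print("mean in-sample risk      :", ins[:, 1].mean())
print("mean out-of-sample risk  :", oos[:, 1].mean())
print("mean out-of-sample return:", oos[:, 0].mean())
```

The in-sample return equals the target in every replication by construction, so any gap between the in-sample and out-of-sample columns comes from estimation error in the sample mean and covariance.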
Performance of SAA
Return vs. Risk
SAA is an error-maximizing algorithm
Regularization
- Regularization: perturbing a linear operator problem for improved stability of the solution [Ivanov (1962), Phillips (1962), Tikhonov (1963)]
- E.g. least-squares regression with regularization:
Schematic for PBR
PBR for Mean-Variance problem
where
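- $\hat{Q}_{ijkl} = \frac{1}{n}\left(\hat{\mu}_{4,ijkl} - \hat{\sigma}^2_{ij}\hat{\sigma}^2_{kl}\right) + \frac{1}{n(n-1)}\left(\hat{\sigma}^2_{ik}\hat{\sigma}^2_{jl} + \hat{\sigma}^2_{il}\hat{\sigma}^2_{jk}\right)$,
- $\hat{\mu}_{4,ijkl}$ is the sample average estimator for $\mu_{4,ijkl}$, the fourth central moment of the elements of $X$
- $\hat{\sigma}^2_{ij}$ is the sample average estimator for $\sigma^2_{ij}$, the covariance of the elements of $X$.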
PBR constraint for Markowitz is thus a quartic polynomial. However,
determining whether a quartic function is convex or not is an NP-hard
problem [Ahmadi et al. (2013)]
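To make the quartic concrete, the sketch below builds the tensor $\hat{Q}_{ijkl}$ from a data matrix and evaluates $\sum_{ijkl} \hat{Q}_{ijkl} w_i w_j w_k w_l$. Two points are assumptions rather than statements from the slides: that the quartic is this full contraction of $\hat{Q}$ with $w$ (the displayed constraint did not survive extraction), and that "sample average estimator" means normalization by $1/n$. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def pbr_quartic_tensor(x):
    """Build Q_hat[i, j, k, l] from an n x p matrix of returns."""
    n, _ = x.shape
    xc = x - x.mean(axis=0)                       # centred returns
    s = xc.T @ xc / n                             # covariances, 1/n normalisation
    m4 = np.einsum("ti,tj,tk,tl->ijkl", xc, xc, xc, xc) / n   # 4th central moments
    q = (m4 - np.einsum("ij,kl->ijkl", s, s)) / n
    q += (np.einsum("ik,jl->ijkl", s, s)
          + np.einsum("il,jk->ijkl", s, s)) / (n * (n - 1))
    return q


def pbr_quartic(q, w):
    """Evaluate sum_{ijkl} Q_hat[i, j, k, l] * w_i * w_j * w_k * w_l."""
    return np.einsum("ijkl,i,j,k,l->", q, w, w, w, w)


rng = np.random.default_rng(0)
x = rng.normal(size=(60, 4))      # synthetic returns, illustrative only
w = np.full(4, 0.25)              # equally weighted portfolio
q_hat = pbr_quartic_tensor(x)
value = pbr_quartic(q_hat, w)

# Cross-check: the same quantity written via the centred portfolio return r.
r = (x - x.mean(axis=0)) @ w
n = len(r)
s2 = np.mean(r ** 2)
shortcut = (np.mean(r ** 4) - s2 ** 2) / n + 2 * s2 ** 2 / (n * (n - 1))
print(value, shortcut)
```

The tensor has $p^4$ entries, so forming it explicitly is only practical for small $p$; the scalar cross-check shows that evaluating the quartic at a given $w$ needs only the portfolio return series.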
PBR for Mean-Variance problem
Convex approximation I
- Rank-1 approximation:
PBR for Mean-Variance problem
Convex approximation II
- Best convex quadratic approximation:
- Approximate PBR constraint: $w^\top A^* w \le \sqrt{U}$
Cross-Validation (CV)
Cross-validation: if there is enough data, set some aside for tuning free parameters (the “validation data set”), e.g. 50% for training, 25% for validation and 25% for testing.
In k-fold cross-validation, the larger k is, the better the estimate of the expected test error, but the greater the computational burden and variance. k = 5 and k = 10 are known to balance these trade-offs well; k = n is leave-one-out CV.
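A minimal sketch of how k-fold index splits can be generated, using only the standard library. The function name `kfold_splits` is hypothetical, and shuffling the observations before splitting is an assumption made here; for time-ordered return data a practitioner might prefer contiguous blocks instead, a choice the slide does not address.

```python
import random


def kfold_splits(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Striding gives k folds whose sizes differ by at most one observation.
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


# Each observation appears in exactly one validation fold.
for train, val in kfold_splits(n=10, k=5):
    print(sorted(val), len(train))
```

Setting `k` equal to `n` yields leave-one-out CV: every validation fold contains a single observation and the model is refit `n` times.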
Performance-based CV
Performance-based Cross-Validation
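\[
\begin{aligned}
\min_{w} \quad & \mathrm{CVaR}(w; X, \beta) \\
\text{s.t.} \quad & w^\top \mu = R \qquad (1) \\
& w^\top \mathbf{1} = 1
\end{aligned}
\]
where
- $\mathrm{CVaR}(w; X, \beta) = \min_{\alpha} \; \alpha + \frac{1}{1-\beta} E(-\alpha - w^\top X_i)^+$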
Conditional Value-at-Risk
- $\mathrm{CVaR}(w; X, \beta) = \min_{\alpha} \; \alpha + \frac{1}{1-\beta} E(-\alpha - w^\top X_i)^+$
- $\beta$ = cutoff level, e.g. 95%, 99%
- Pros: tells you how thick the loss tail is; also a coherent risk measure [Acerbi & Tasche (2001)]
SAA for mean-CVaR problem
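\[
\begin{aligned}
\min_{w} \quad & \widehat{\mathrm{CVaR}}_n(w; X_n, \beta) \\
\text{s.t.} \quad & w^\top \hat{\mu}_n = R \\
& w^\top \mathbf{1} = 1
\end{aligned}
\]
where
- $\hat{\mu}_n$ is the sample average return;
- $\widehat{\mathrm{CVaR}}_n(w; X_n, \beta) = \min_{\alpha} \left\{ \alpha + \frac{1}{n(1-\beta)} \sum_{i=1}^{n} (-\alpha - w^\top X_i)^+ \right\}$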
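The sample CVaR above can be evaluated for a fixed portfolio without an optimization library: the objective is convex and piecewise linear in $\alpha$, so its minimum is attained at one of the observed losses $-w^\top X_i$. The sketch below uses that fact; the function name `sample_cvar` and the return data are hypothetical, and the quadratic-time search is chosen for clarity rather than speed.

```python
def sample_cvar(returns, w, beta):
    """Return (CVaR_hat, alpha_star) for the losses -w'X_i at level beta.

    Minimises alpha + sum_i max(-alpha - w'X_i, 0) / (n * (1 - beta)) by
    checking every breakpoint of the piecewise-linear objective.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    losses = [-sum(wj * xj for wj, xj in zip(w, x)) for x in returns]
    n = len(losses)

    def objective(alpha):
        excess = sum(max(loss - alpha, 0.0) for loss in losses)
        return alpha + excess / (n * (1.0 - beta))

    alpha_star = min(losses, key=objective)   # a minimiser among the breakpoints
    return objective(alpha_star), alpha_star


# Illustrative data: five observations of two asset returns (made up).
returns = [[0.02, 0.01], [-0.03, 0.00], [0.01, -0.02],
           [-0.05, -0.04], [0.04, 0.03]]
w = [0.5, 0.5]
cvar_hat, alpha_star = sample_cvar(returns, w, beta=0.8)
print(cvar_hat, alpha_star)
```

When $n(1-\beta)$ is an integer, the value coincides with the average of the $n(1-\beta)$ largest losses, which gives a quick sanity check on any implementation.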
PBR for mean-CVaR problem
Proposition
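Suppose $X_n = [X_1, \ldots, X_n] \overset{\text{iid}}{\sim} F$, where $F$ is absolutely continuous with twice continuously differentiable pdf. Then
\[
\mathrm{Var}\left[\widehat{\mathrm{CVaR}}_n(w; X_n, \beta)\right] = \frac{1}{n(1-\beta)^2} \mathrm{Var}\left[(-w^\top X_n - \alpha_\beta(w))^+\right] + O(n^{-2}),
\]
where
\[
\alpha_\beta(w) = \inf\{\alpha : P(-w^\top X \ge \alpha) \le 1 - \beta\},
\]
the Value-at-Risk (VaR) of the portfolio $w$ at level $\beta$.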
PBR for mean-CVaR problem
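\[
\begin{aligned}
\min_{w} \quad & \widehat{\mathrm{CVaR}}_n(w; X_n, \beta) \\
\text{s.t.} \quad & w^\top \hat{\mu}_n = R \\
& w^\top \mathbf{1} = 1 \\
& \frac{1}{n(1-\beta)^2}\, z^\top \Omega_n z \le U_1 \\
& \frac{1}{n}\, w^\top \hat{\Sigma}_n w \le U_2 \\
& z_i = \max(0, -w^\top X_i - \alpha), \quad i = 1, \ldots, n.
\end{aligned}
\]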
Empirical Results: mean-CVaR
OOS Average Sharpe Ratio (Return/CVaR)
Mean-CVaR, R = 0.04

                   FF 5 Industry (p=5)     FF 10 Industry (p=10)
SAA                1.2137                  1.0321
                   2 bins      3 bins      2 bins      3 bins
PBR (CVaR only)    1.2113      1.1733      1.0506      1.1381
                   (0.0554)    (0.0674)    (0.0638)    (0.0312)
PBR (mean only)    1.2089      1.1802      1.0994      1.0519
                   (0.0746)    (0.0790)    (0.1051)    (0.1338)
PBR (both)         1.2439      1.2073      1.1112      1.1422
                   (0.0513)    (0.0601)    (0.0691)    (0.0648)
L1                 1.0112      1.0754      0.9254      0.9741
                   (0.1497)    (0.1366)    (0.2293)    (0.1880)
L2                 0.9650      1.0636      1.0031      0.9835
                   (0.1780)    (0.1287)    (0.1512)    (0.1598)

Parentheses: p-values of tests of differences from the SAA method.
Empirical Results: mean-CVaR
OOS Average Sharpe Ratio (Return/CVaR)
Mean-CVaR, R = 0.08

                   FF 5 Industry (p=5)     FF 10 Industry (p=10)
SAA                1.2487                  1.0346
                   2 bins      3 bins      2 bins      3 bins
PBR (CVaR only)    1.2493      1.2098      1.0551      1.1433
                   (0.0434)    (0.0462)    (0.0579)    (0.0323)
PBR (mean only)    1.2480      1.2088      1.0987      1.0470
                   (0.0591)    (0.0693)    (0.1053)    (0.1384)
PBR (both)         1.2715      1.2198      1.1122      1.1449
                   (0.0453)    (0.0544)    (0.0664)    (0.0639)
L1                 0.8921      0.9836      0.9416      1.0087
                   (0.1964)    (0.1572)    (0.2122)    (0.1645)
L2                 0.9367      1.0801      1.0278      0.9947
                   (0.1989)    (0.1179)    (0.1323)    (0.1530)

Parentheses: p-values of tests of differences from the SAA method.
Summary
References