
Machine Learning & Portfolio Optimization

Gah-Yi Ban

NUS-USPC Workshop on Machine Learning and FinTech


Nov 2017

Portfolio Optimization

Consider the portfolio optimization problem (Markowitz, 1952):

    min_{w ∈ ℝᵖ}  wᵀ Σ w
    s.t.  wᵀ µ = R        (MV)
          wᵀ 1 = 1

where
- X: p × 1 random vector of relative returns
- µ = E(X): mean returns
- Σ = Cov(X): p × p covariance matrix of the relative returns
- Solution: w_0(R)
- The solution is the same if the return constraint is relaxed to wᵀ µ ≥ R
Sample Average Approximation

- In practice, we don't know the distribution P of X, but we have data.
- Suppose we have n iid observations of asset returns from P: X_n = [x_1, . . . , x_n].
- Then solve

      min_{w ∈ ℝᵖ}  wᵀ Σ̂_{1:n} w
      s.t.  wᵀ µ̂_n = R        (SAA)
            wᵀ 1 = 1

  where
  - Σ̂_{1:n} is the sample covariance matrix of [x_1, . . . , x_n]
  - µ̂_n is the sample average of the returns
- Solution: ŵ_SAA(R)
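Since (SAA) is an equality-constrained quadratic program, its solution is available in closed form from the KKT conditions. Below is a minimal numpy sketch (not from the slides): the synthetic `returns` array and target `R` are placeholders, and Σ̂ is assumed invertible.

```python
import numpy as np

def solve_mv(sigma, mu, R):
    """Closed-form solution of min w' sigma w  s.t. w'mu = R, w'1 = 1,
    derived from the KKT conditions of the equality-constrained QP."""
    p = len(mu)
    A = np.vstack([mu, np.ones(p)])   # stack the two equality constraints
    b = np.array([R, 1.0])
    # Lagrange multipliers solve (A sigma^{-1} A') lam = b
    lam = np.linalg.solve(A @ np.linalg.solve(sigma, A.T), b)
    return np.linalg.solve(sigma, A.T @ lam)

# SAA: plug in sample estimates computed from (placeholder) return data
rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.02, size=(250, 5))  # n = 250 days, p = 5 assets
w_saa = solve_mv(np.cov(returns, rowvar=False), returns.mean(axis=0), R=0.001)
```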

In-sample vs. Out-of-sample performance

Three types of performance measures:

- In-sample performance: the performance of the learned action in the (training) sample, i.e. the data you used to learn.
- Out-of-sample, or test, or generalization performance: the average performance of the learned action over all possible new observations.
- Expected test, or true, performance: the average performance of the learned action over all possible training sets and over all possible new observations.

Note 1: For typical ML prediction problems, think error rather than performance, e.g. in-sample error, out-of-sample error, prediction error.
Note 2: Training performance always overestimates (w.p. 1) both the out-of-sample and expected performances (why?).

In-sample vs. Out-of-sample return

In-sample (aka "training") return:

    ŵᵀ_SAA µ̂_n

Out-of-sample (aka "test" or "generalization") return:

    E_{X_{n+1}}[ŵᵀ_SAA X_{n+1} | X_n] = ŵᵀ_SAA µ

Expected test (aka "true") return:

    E_{X_n}[ E_{X_{n+1}}[ŵᵀ_SAA X_{n+1} | X_n] ]

In-sample vs. Out-of-sample risk

In-sample risk:

    ŵᵀ_SAA Σ̂_{1:n} ŵ_SAA

Out-of-sample risk:

    Var_{X_{n+1}}[ŵᵀ_SAA X_{n+1} | X_n] = ŵᵀ_SAA Σ ŵ_SAA

Expected test risk:

    E_{X_n}[ Var_{X_{n+1}}[ŵᵀ_SAA X_{n+1} | X_n] ]

Performance of SAA: Simulated Data

Fix (ν, Q) and target return level R. Then for b = 1, . . . , B:

- Generate X_{b,n} = [x_{b,1}, . . . , x_{b,n}], where x_{b,i} ~ N(ν, Q) iid for i = 1, . . . , n
- Solve the SAA problem for ŵ_{b,SAA}
- Compute its out-of-sample return and risk: ŵᵀ_{b,SAA} ν and ŵᵀ_{b,SAA} Q ŵ_{b,SAA} (see the sketch below)
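A numpy sketch of this simulation, reusing `solve_mv` from the earlier snippet; the particular (ν, Q), n, B, and R below are illustrative choices, not the ones behind the figures that follow.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, B, R = 5, 250, 200, 0.001
nu = rng.normal(0.001, 0.0005, size=p)     # true mean vector nu
L = rng.normal(size=(p, p))
Q = 0.02**2 * (L @ L.T / p + np.eye(p))    # true covariance Q (PSD by construction)

oos_return = np.empty(B)
oos_risk = np.empty(B)
for b in range(B):
    X = rng.multivariate_normal(nu, Q, size=n)                # x_{b,i} ~ N(nu, Q), iid
    w = solve_mv(np.cov(X, rowvar=False), X.mean(axis=0), R)  # SAA portfolio
    oos_return[b] = w @ nu                                    # out-of-sample return
    oos_risk[b] = w @ Q @ w                                   # out-of-sample risk
```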

Performance of SAA

Return vs. Risk

[Figure: out-of-sample return vs. risk of the simulated SAA portfolios (two slides of plots).]
SAA is an error-maximizing algorithm

- Although SAA makes intuitive sense, it is highly unreliable for portfolio optimization with real stock return data.
- This is well documented across finance, statistics, and OR:
  - Markowitz: Frankfurter et al. (1971), Frost & Savarino (1986, 1988b), Michaud (1989), Best & Grauer (1991), Chopra & Ziemba (1993), Broadie (1993), Lim et al. (2011)
- Michaud (1989): the (in-sample) portfolio optimization solution is an "error-maximizing" solution.

Regularization

- Regularization: perturbing a linear operator problem for improved stability of the solution [Ivanov (1962), Phillips (1962), Tikhonov (1963)].
- E.g. least-squares regression with regularization:

      min_{β ∈ ℝᵖ}  ||y − Xβ||² + λ_n ||β||_k,

  where λ_n is the degree of regularization; k = 1 (LASSO) and k = 2 (ridge regression) yield popular penalty functions.
- Intuition: perturbing the in-sample problem reduces overfitting; it adds bias but can reduce variance, which is good for generalization.
- In general, the L1-norm penalty yields a sparse solution vector (many elements exactly zero) and the L2-norm penalty yields a dense one (many small but nonzero elements); a small illustration follows below.
- While these have justifications in regression problems, it is not clear why one would want sparse or dense portfolio solutions.
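A small scikit-learn illustration of the sparse-vs-dense contrast, on synthetic data with arbitrary penalty levels:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta = np.array([3.0, -2.0] + [0.0] * 8)        # only two active features
y = X @ beta + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # alpha plays the role of lambda_n, k = 1
ridge = Ridge(alpha=10.0).fit(X, y)  # k = 2
print(np.round(lasso.coef_, 2))      # sparse: most coefficients exactly zero
print(np.round(ridge.coef_, 2))      # dense: many small but nonzero entries
```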
Performance-based regularization (PBR)

- Performance-based regularization: perturb the portfolio problem for improved performance of the solution:

      min_w  wᵀ Σ̂_n w
      s.t.  wᵀ µ̂_n = R
            wᵀ 1 = 1
            SVar(wᵀ Σ̂_n w) ≤ U

- Intuition: penalize solutions w associated with greater estimation error in the objective.

Schematic for PBR

[Figure: schematic of the performance-based regularization approach.]
PBR for Mean-Variance problem

The sample variance of the portfolio-variance estimator wᵀ Σ̂_n w, SVar(wᵀ Σ̂_n w), is given by:

    SVar(wᵀ Σ̂_n w) = Σ_{i=1}^p Σ_{j=1}^p Σ_{k=1}^p Σ_{l=1}^p w_i w_j w_k w_l Q̂_{ijkl},

where
- Q̂_{ijkl} = (1/n)(µ̂_{4,ijkl} − σ̂²_{ij} σ̂²_{kl}) + (1/(n(n−1)))(σ̂²_{ik} σ̂²_{jl} + σ̂²_{il} σ̂²_{jk}),
- µ̂_{4,ijkl} is the sample average estimator of µ_{4,ijkl}, the fourth central moment of the elements of X,
- σ̂²_{ij} is the sample average estimator of σ²_{ij}, the covariance of the elements of X.

The PBR constraint for Markowitz is thus a quartic polynomial in w. However, determining whether a quartic function is convex is an NP-hard problem [Ahmadi et al. (2013)].
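A numpy sketch of estimating Q̂ and evaluating the quartic penalty, as a plug-in implementation of the formulas above (divisor conventions may differ slightly from the paper's):

```python
import numpy as np

def q_hat(X):
    """Estimate the fourth-order tensor Q_hat from an n x p array of returns,
    using plug-in moment estimators."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                                    # centered returns
    mu4 = np.einsum('ti,tj,tk,tl->ijkl', Xc, Xc, Xc, Xc) / n  # 4th central moments
    sig = Xc.T @ Xc / n                                        # covariance sigma^2_{ij}
    return ((mu4 - np.einsum('ij,kl->ijkl', sig, sig)) / n
            + (np.einsum('ik,jl->ijkl', sig, sig)
               + np.einsum('il,jk->ijkl', sig, sig)) / (n * (n - 1)))

def svar(w, Q):
    """Quartic PBR penalty: sum_{ijkl} w_i w_j w_k w_l Q_hat_{ijkl}."""
    return np.einsum('i,j,k,l,ijkl->', w, w, w, w, Q)
```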

PBR for Mean-Variance problem

Convex approximation I
- Rank-1 approximation:

      (wᵀ α̂)⁴ ≈ Σ_{i=1}^p Σ_{j=1}^p Σ_{k=1}^p Σ_{l=1}^p w_i w_j w_k w_l Q̂_{ijkl},

  where α̂_i = (Q̂_{iiii})^{1/4}.
- Approximate PBR constraint: wᵀ α̂ ≤ U^{1/4}
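The rank-1 surrogate is cheap to form; continuing the sketch above, the constraint becomes linear in w:

```python
import numpy as np

def rank1_alpha(Q):
    """alpha_i = (Q_hat_{iiii})^(1/4); the rank-1 surrogate turns the PBR
    constraint into the linear constraint w @ alpha <= U ** 0.25."""
    return np.einsum('iiii->i', Q) ** 0.25   # generalized diagonal Q_{iiii}
```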

PBR for Mean-Variance problem

Convex approximation II
- Best convex quadratic approximation:

      (wᵀ A w)² ≈ Σ_{i=1}^p Σ_{j=1}^p Σ_{k=1}^p Σ_{l=1}^p w_i w_j w_k w_l Q̂_{ijkl},

  such that the elements of A are as close as possible to the pairwise terms of Q̂, i.e. A²_{ij} ≈ Q̂_{ijij}.
- Solve the semidefinite program A* = argmin_{A ⪰ 0} ||A − Q₂||_F, where Q₂ is the matrix with ij-th element Q̂_{ijij} and ||·||_F denotes the Frobenius norm:

      ||A||_F = ( Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|² )^{1/2}

- Approximate PBR constraint: wᵀ A* w ≤ U
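A cvxpy sketch of this projection, assuming the Q̂ tensor from the earlier snippet (solver choice left to cvxpy's defaults):

```python
import numpy as np
import cvxpy as cp

def best_psd_quadratic(Q):
    """Project the pairwise terms Q2[i,j] = Q_hat[i,j,i,j] onto the PSD cone
    in Frobenius norm: A* = argmin_{A >= 0} ||A - Q2||_F (a small SDP)."""
    Q2 = np.einsum('ijij->ij', Q)        # extract pairwise terms Q_{ijij}
    Q2 = 0.5 * (Q2 + Q2.T)               # symmetrize for numerical safety
    A = cp.Variable(Q2.shape, PSD=True)  # decision variable constrained PSD
    cp.Problem(cp.Minimize(cp.norm(A - Q2, 'fro'))).solve()
    return A.value                       # then impose w' A* w <= U
```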

Cross-Validation (CV)

Cross-Validation: if there is enough data, set some aside for tuning free parameters (the "validation data set"), e.g. 50% for training, 25% for validation and 25% for testing.

k-fold Cross-Validation: divide the training data into 2 ≤ k ≤ n folds to maximize the use of scarce training data.

The larger k, the better the estimate of the expected test error, but the greater the computational burden and the variance of the estimate. k = 5 or 10 are known to balance these trade-offs well; k = n is leave-one-out CV.
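A generic k-fold CV loop, sketched with scikit-learn on synthetic data; the model and metric are placeholders for whatever is being tuned:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=100)

for lam in [0.01, 0.1, 1.0]:                        # candidate tuning values
    scores = []
    for tr, va in KFold(n_splits=5).split(X):       # k = 5 folds
        model = Lasso(alpha=lam).fit(X[tr], y[tr])  # train on k - 1 folds
        scores.append(model.score(X[va], y[va]))    # validate on held-out fold
    print(lam, round(np.mean(scores), 3))           # keep the best-scoring lam
```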
Performance-based CV

- CV: a common technique in machine learning for tuning free parameters.
- k-fold CV: split the training data into k equally-sized bins, train the statistical model on every possible combination of k − 1 bins, then validate the tuning parameter on the remaining bin.
- Performance-based k-fold CV: (1) the search boundary for U1 needs to be set carefully, to avoid infeasibility (U1 too small) at one end and having no effect (U1 too large) at the other; (2) tune parameters by the Sharpe ratio, not by the mean squared error. A sketch follows below.
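A schematic of the idea, with a hypothetical `solve_pbr(X, U)` standing in for the PBR solver; this illustrates the Sharpe-ratio scoring logic, not the paper's OOS-PBCV algorithm, which is described on the next slide:

```python
import numpy as np
from sklearn.model_selection import KFold

def tune_U(X, solve_pbr, U_grid, k=3):
    """Pick the PBR level U maximizing the cross-validated Sharpe ratio.
    `solve_pbr(X_train, U)` is a placeholder returning a portfolio vector;
    U_grid must be chosen to avoid infeasible / vacuous constraint levels."""
    best_U, best_score = None, -np.inf
    for U in U_grid:
        sharpes = []
        for tr, va in KFold(n_splits=k).split(X):
            w = solve_pbr(X[tr], U)              # fit on k - 1 bins
            r = X[va] @ w                        # realized returns on held-out bin
            sharpes.append(r.mean() / r.std())   # score by Sharpe ratio, not MSE
        if np.mean(sharpes) > best_score:
            best_U, best_score = U, np.mean(sharpes)
    return best_U
```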

Performance-based Cross-Validation

Figure: A schematic explaining the out-of-sample performance-based k-fold cross-validation (OOS-PBCV) algorithm used to calibrate the constraint right-hand side, U, for the case k = 3. The training data set is split into k bins, and the optimal U for the entire training data set is found by averaging the best U found for each subset of the training data.
Empirical Results: Fama-French data sets

OOS Average Sharpe Ratio (Return/Std), Mean-Variance, R = 0.04

                   FF 5 Industry (p = 5)     FF 10 Industry (p = 10)
                   2 bins      3 bins        2 bins      3 bins
SAA                      1.1459                    1.1332
PBR (rank-1)       1.2603      1.3254        1.1868      1.2098
                   (0.0411)    (0.0286)      (0.0643)    (0.0509)
PBR (PSD)          1.1836      1.1831        1.1543      1.1678
                   (0.0743)    (0.071)       (0.0891)    (0.0816)
NS                       1.0023                    0.9968
                         (0.1404)                  (0.1437)
L1                 1.0136      1.0386        1.1185      1.1175
                   (0.1568)    (0.1396)      (0.1008)    (0.1017)
L2                 0.9711      1.0268        1.0579      1.0699
                   (0.1781)    (0.1452)      (0.1482)    (0.1280)

Parentheses: p-values of tests of differences from the SAA method.
Empirical Results: Fama-French data sets

OOS Average Sharpe Ratio (Return/Std), Markowitz, R = 0.08

                   FF 5 Industry (p = 5)     FF 10 Industry (p = 10)
                   2 bins      3 bins        2 bins      3 bins
SAA                      1.1573                    1.1225
PBR (rank-1)       1.3286      1.3551        1.1743      1.2018
                   (0.0223)    (0.0208)      (0.0668)    (0.0510)
PBR (PSD)          1.1813      1.1952        1.1467      1.1575
                   (0.0648)    (0.0614)      (0.0893)    (0.0844)
NS                       0.9664                    0.9405
                         (0.1514)                  (0.1577)
L1                 0.9225      0.9965        1.0318      1.0779
                   (0.1857)    (0.1403)      (0.1332)    (0.1181)
L2                 0.9703      1.0284        1.0671      1.0776
                   (0.1649)    (0.1398)      (0.1398)    (0.1209)

Parentheses: p-values of tests of differences from the SAA method.
Mean-CVaR Portfolio Optimization

Consider the mean-Conditional Value-at-Risk portfolio optimization problem:

    min_w  CVaR(w; X, β)
    s.t.  wᵀ µ = R        (1)
          wᵀ 1 = 1

where
- CVaR(w; X, β) = min_α { α + (1/(1−β)) E[(−wᵀ X − α)⁺] }

Conditional Value-at-Risk

- CVaR(w; X, β) = min_α { α + (1/(1−β)) E[(−wᵀ X − α)⁺] }
- β = cutoff level, e.g. 95%, 99%
- Pros: tells you how thick the loss tail is; also a coherent risk measure [Acerbi & Tasche (2001)]
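For intuition, the empirical CVaR of a sample of losses can be computed directly; this is one common estimator, and conventions at the quantile boundary differ:

```python
import numpy as np

def empirical_cvar(losses, beta=0.95):
    """Empirical CVaR at level beta: average loss at or beyond the
    empirical beta-quantile of the loss sample."""
    var = np.quantile(losses, beta)       # empirical VaR at level beta
    return losses[losses >= var].mean()   # mean of the tail losses
```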

SAA for mean-CVaR problem

- Data: n iid observations of asset returns X_n = [X_1, . . . , X_n] ~ P

      min_w  ĈVaR_n(w; X_n, β)
      s.t.  wᵀ µ̂_n = R
            wᵀ 1 = 1

  where
  - µ̂_n is the sample average return;
  - ĈVaR_n(w; X_n, β) = min_α { α + (1/(n(1−β))) Σ_{i=1}^n (−wᵀ X_i − α)⁺ }
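Since the positive-part terms make this problem convex in (w, α) (the Rockafellar-Uryasev form), it can be solved directly; a cvxpy sketch with illustrative inputs, not the paper's code:

```python
import numpy as np
import cvxpy as cp

def solve_saa_cvar(X, R, beta=0.95):
    """Sample mean-CVaR problem: X is an n x p array of returns,
    R the target return level, beta the CVaR cutoff."""
    n, p = X.shape
    w, alpha = cp.Variable(p), cp.Variable()
    # CVaR_hat = alpha + (1/(n(1-beta))) * sum_i max(0, -w'X_i - alpha)
    cvar = alpha + cp.sum(cp.pos(-X @ w - alpha)) / (n * (1 - beta))
    cons = [X.mean(axis=0) @ w == R,   # w' mu_hat = R
            cp.sum(w) == 1]            # w' 1 = 1
    cp.Problem(cp.Minimize(cvar), cons).solve()
    return w.value
```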

PBR for mean-CVaR problem

Proposition. Suppose X_n = [X_1, . . . , X_n] ~ F iid, where F is absolutely continuous with twice continuously differentiable pdf. Then

    Var[ĈVaR_n(w; X_n, β)] = (1/(n(1−β)²)) Var[(−wᵀ X − α_β(w))⁺] + O(n⁻²),

where

    α_β(w) = inf{α : P(−wᵀ X ≥ α) ≤ 1 − β},

the Value-at-Risk (VaR) of the portfolio w at level β.

PBR for mean-CVaR problem

    min_w  ĈVaR_n(w; X_n, β)
    s.t.  wᵀ µ̂_n = R
          wᵀ 1 = 1
          (1/(n(1−β)²)) zᵀ Ω_n z ≤ U1
          (1/n) wᵀ Σ̂_n w ≤ U2
          z_i = max(0, −wᵀ X_i − α),  i = 1, . . . , n

- Not convex: a combinatorial optimization problem
- Theorem: the convex relaxation, a QCQP, is tight
- Tune U1 and U2 via performance-based k-fold CV

Empirical Results: mean-CVaR

OOS Average Sharpe Ratio (Return/CVaR), Mean-CVaR, R = 0.04

                     FF 5 Industry (p = 5)     FF 10 Industry (p = 10)
                     2 bins      3 bins        2 bins      3 bins
SAA                        1.2137                    1.0321
PBR (CVaR only)      1.2113      1.1733        1.0506      1.1381
                     (0.0554)    (0.0674)      (0.0638)    (0.0312)
PBR (mean only)      1.2089      1.1802        1.0994      1.0519
                     (0.0746)    (0.0790)      (0.1051)    (0.1338)
PBR (both)           1.2439      1.2073        1.1112      1.1422
                     (0.0513)    (0.0601)      (0.0691)    (0.0648)
L1                   1.0112      1.0754        0.9254      0.9741
                     (0.1497)    (0.1366)      (0.2293)    (0.1880)
L2                   0.9650      1.0636        1.0031      0.9835
                     (0.1780)    (0.1287)      (0.1512)    (0.1598)

Parentheses: p-values of tests of differences from the SAA method.
Empirical Results: mean-CVaR

OOS Average Sharpe Ratio (Return/CVaR), Mean-CVaR, R = 0.08

                     FF 5 Industry (p = 5)     FF 10 Industry (p = 10)
                     2 bins      3 bins        2 bins      3 bins
SAA                        1.2487                    1.0346
PBR (CVaR only)      1.2493      1.2098        1.0551      1.1433
                     (0.0434)    (0.0462)      (0.0579)    (0.0323)
PBR (mean only)      1.2480      1.2088        1.0987      1.0470
                     (0.0591)    (0.0693)      (0.1053)    (0.1384)
PBR (both)           1.2715      1.2198        1.1122      1.1449
                     (0.0453)    (0.0544)      (0.0664)    (0.0639)
L1                   0.8921      0.9836        0.9416      1.0087
                     (0.1964)    (0.1572)      (0.2122)    (0.1645)
L2                   0.9367      1.0801        1.0278      0.9947
                     (0.1989)    (0.1179)      (0.1323)    (0.1530)

Parentheses: p-values of tests of differences from the SAA method.
Summary

- In general, in-sample optimal actions (predictions/decisions) do not generalize well out-of-sample. For the portfolio selection problem, solutions overweight idiosyncratic observations in the training data.
- Regularization: L1 and L2 norm penalties are standard; we explored more complex ones (PBR) that focus on the performance of a decision rather than on prediction error.
- Performance-based Cross-Validation: a data-driven method to tune regularization parameters.
- One can expect better out-of-sample performance with the optimal amount of regularization, which balances bias and variance.
- PBR solutions outperform SAA and L1-, L2-regularized solutions (and other benchmarks) on well-known, publicly available data sets.

References

- Ban, Gah-Yi, Noureddine El Karoui, and Andrew E. B. Lim. "Machine Learning and Portfolio Optimization." Management Science, Articles in Advance, 21 Nov 2016.

