With its emphasis on examples, exercises and calculations, this book suits advanced
undergraduates as well as postgraduates and practitioners. It provides a clear treatment
of the scope and limitations of mean-variance portfolio theory and introduces popular
modern risk measures. Proofs are given in detail, assuming only modest mathematical
background, but with attention to clarity and rigour. The discussion of VaR and its
more robust generalizations, such as AVaR, brings recent developments in risk measures
within range of some undergraduate courses and includes a novel discussion of reducing
VaR and AVaR by means of hedging techniques.
A moderate pace, careful motivation and more than 70 exercises give students confi-
dence in handling risk assessments in modern finance. Solutions and additional materi-
als for instructors are available at www.cambridge.org/9781107003675.
Mastering Mathematical Finance is a series of short books that cover all core topics
and the most common electives offered in Master’s programmes in mathematical or
quantitative finance. The books are closely coordinated and largely self-contained, and
can be used efficiently in combination but also individually.
The MMF books start financially from scratch and mathematically assume only under-
graduate calculus, linear algebra and elementary probability theory. The necessary
mathematics is developed rigorously, with emphasis on a natural development of math-
ematical ideas and financial intuition, and the readers quickly see real-life financial
applications, both for motivation and as the ultimate end for the theory. All books are
written for both teaching and self-study, with worked examples, exercises and solutions.
Series editors Marek Capiński, AGH University of Science and Technology, Kraków;
Ekkehard Kopp, University of Hull; Tomasz Zastawniak, University of York
Portfolio Theory and Risk Management
EKKEHARD KOPP
University of Hull, Hull, UK
University Printing House, Cambridge CB2 8BS, United Kingdom
www.cambridge.org
Information on this title: www.cambridge.org/9781107003675
© Maciej J. Capiński and Ekkehard Kopp 2014
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2014
Printed in the United Kingdom by TJ International Ltd, Padstow Cornwall
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Capiński, Maciej J.
Portfolio theory and risk management / Maciej J. Capiński, AGH University of Science and
Technology, Kraków, Poland, Ekkehard Kopp, University of Hull, Hull, UK.
pages cm – (Mastering mathematical finance)
Includes bibliographical references and index.
ISBN 978-1-107-00367-5 (Hardback) – ISBN 978-0-521-17714-6 (Paperback)
1. Portfolio management. 2. Risk management. 3. Investment analysis.
I. Kopp, P. E., 1944– II. Title.
HG4529.5.C366 2014
332.6–dc23 2014006178
ISBN 978-1-107-00367-5 Hardback
ISBN 978-0-521-17714-6 Paperback
Additional resources for this publication at www.cambridge.org/9781107003675
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To Anna, Emily, Staś, Weronika and Helenka
Contents
Preface page ix
1 Risk and return 1
1.1 Expected return 2
1.2 Variance as a risk measure 5
1.3 Semi-variance 9
2 Portfolios consisting of two assets 11
2.1 Return 12
2.2 Attainable set 15
2.3 Special cases 20
2.4 Minimum variance portfolio 23
2.5 Adding a risk-free security 25
2.6 Indifference curves 28
2.7 Proofs 31
3 Lagrange multipliers 35
3.1 Motivating examples 35
3.2 Constrained extrema 40
3.3 Proofs 44
4 Portfolios of multiple assets 48
4.1 Risk and return 48
4.2 Three risky securities 52
4.3 Minimum variance portfolio 54
4.4 Minimum variance line 57
4.5 Market portfolio 62
5 The Capital Asset Pricing Model 67
5.1 Derivation of CAPM 68
5.2 Security market line 71
5.3 Characteristic line 73
6 Utility functions 76
6.1 Basic notions and axioms 76
6.2 Utility maximisation 80
6.3 Utilities and CAPM 92
6.4 Risk aversion 95
7 Value at Risk 98
7.1 Quantiles 99
7.2 Measuring downside risk 102
7.3 Computing VaR: examples 104
7.4 VaR in the Black–Scholes model 109
7.5 Proofs 120
8 Coherent measures of risk 124
8.1 Average Value at Risk 125
8.2 Quantiles and representations of AVaR 127
8.3 AVaR in the Black–Scholes model 136
8.4 Coherence 146
8.5 Proofs 154
Index 159
Preface
In this fifth volume of the series ‘Mastering Mathematical Finance’ we
present a self-contained rigorous account of mean-variance portfolio the-
ory, as well as a simple introduction to utility functions and modern risk
measures.
Portfolio theory, exploring the optimal allocation of wealth among dif-
ferent assets in an investment portfolio, based on the twin objectives of
maximising return while minimising risk, owes its mathematical formulation
to the work of Harry Markowitz¹ in 1952, for which he was awarded
the Nobel Prize in Economics in 1990. Mean-variance analysis has held
sway for more than half a century, and forms part of the core curriculum
in financial economics and business studies. In these settings mathematical
rigour may suffer at times, and our aim is to provide a carefully motivated
treatment of the mathematical background and content of the theory, as-
suming only basic calculus and linear algebra as prerequisites.
Chapter 1 provides a brief review of the key concepts of return and risk,
while noting some defects of variance as a risk measure. Considering a
portfolio with only two risky assets, we show in Chapter 2 how the mini-
mum variance portfolio, minimum variance line, market portfolio and cap-
ital market line may be found by elementary calculus methods. Chapter 3
contains a careful account of the method of Lagrange multipliers, includ-
ing a discussion of sufficient conditions for extrema in the special case of
quadratic forms. These techniques are applied in Chapter 4 to generalise
the formulae obtained for two-asset portfolios to the general case.
The derivation of the Capital Asset Pricing Model (CAPM) follows in
Chapter 5, including two proofs of the CAPM formula, based, respectively,
on the underlying geometry (to elucidate the role of beta) and linear alge-
bra (leading to the security market line), and introducing performance mea-
sures such as the Jensen index and Sharpe ratio. The security characteristic
line is shown to aid the least-squares estimation of beta using historical
portfolio returns and the market portfolio.
Chapter 6 contains a brief introduction to utility theory. To keep matters
simple we restrict to finite sample spaces to discuss preference relations.
¹ H. Markowitz, Portfolio selection, Journal of Finance 7 (1) (1952), 77–91.
We consider examples of von Neumann–Morgenstern utility functions, link
utility maximisation with the No Arbitrage Principle and explain the key
role of state price vectors. Finally, we explore the link between utility max-
imisation and the CAPM and illustrate the role of the certainty equivalent
for the risk averse investor.
In the final two chapters the emphasis shifts from variance to measures
of downside risk. Chapter 7 contains an account of Value at Risk (VaR),
which remains popular in practice despite its well-documented shortcom-
ings. Following a careful look at quantiles and the algebraic properties of
VaR, our emphasis is on computing VaR, especially for assets within the
Black–Scholes framework. A novel feature is an account of VaR-optimal
hedging with put options, which is shown to reduce to a linear program-
ming problem if the parameters are chosen with care.
In Chapter 8 we examine how the defects of VaR can be addressed using
coherent risk measures. The principal example discussed is Average Value
at Risk (AVaR), which is described in detail, including a careful proof of
sub-additivity. AVaR is placed in the context of coherent risk measures, and
generalised to yield spectral risk measures. The analysis of hedging with
put options in the Black–Scholes setting is revisited, with AVaR in place of
VaR, and the outcomes are compared in examples.
Throughout this volume the emphasis is on examples, applications and
computations. The underlying theory is presented rigorously, but as simply
as possible. Proofs are given in detail, with the more demanding ones left to
the end of each chapter to avoid disrupting the flow of ideas. Applications
presented in the final chapters make use of background material from the
earlier volumes [PF] and [BSM] in the current series. The exercises form
an integral part of the volume, and range from simple verification to more
challenging problems. Solutions and additional material can be found at
www.cambridge.org/9781107003675, which will be updated regularly.
1
Risk and return
Financial investors base their activity on the expectation that their invest-
ment will increase over time, leading to an increase in wealth. Over a fixed
time period, the investor seeks to maximise the return on the investment,
that is, the increase in asset value as a proportion of the initial investment.
The final values of most assets (other than loans at a fixed rate of interest)
are uncertain, so that the returns on these investments need to be expressed
in terms of random variables. To estimate the return on such an asset by a
single number it is natural to use the expected value of the return, which
averages the returns over all possible outcomes.
Our uncertainty about future market behaviour finds expression in the
second key concept in finance: risk. Assets such as stocks, forward con-
tracts and options are risky because we cannot predict their future values
with certainty. Assets whose possible final values are more ‘widely spread’
are naturally seen as entailing greater risk. Thus our initial attempt to mea-
sure the riskiness of a random variable will measure the spread of the re-
turn, which rational investors will seek to minimise while maximising their
return.
In brief, return reflects the efficiency of an investment, risk is concerned
with uncertainty. The balance between these two is at the heart of portfo-
lio theory, which seeks to find optimal allocations of the investor’s initial
wealth among the available assets: maximising return at a given level of
risk and minimising risk at a given level of expected return.
Example 1.1
Assume that $S(0) = 100$ and
\[
S(1) = \begin{cases} 120 & \text{with probability } \tfrac12, \\ 90 & \text{with probability } \tfrac12. \end{cases}
\]
Then $E(S(1)) = \tfrac12 \cdot 120 + \tfrac12 \cdot 90 = 105$ and
\[
\mathrm{Var}(S(1)) = (120 - 105)^2 \cdot \tfrac12 + (90 - 105)^2 \cdot \tfrac12 = 15^2.
\]
Observe also that the standard deviation, which is the square root of the
variance, is equal to $\sqrt{\mathrm{Var}(S(1))} = 15$.
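The computation in Example 1.1 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names `expectation` and `variance` are our own, and the distribution is stored as (value, probability) pairs.

```python
import math

# Two-point distribution of S(1) from Example 1.1: (value, probability) pairs.
outcomes = [(120.0, 0.5), (90.0, 0.5)]

def expectation(dist):
    """Probability-weighted average of the values."""
    return sum(p * x for x, p in dist)

def variance(dist):
    """Expected squared deviation from the mean."""
    mean = expectation(dist)
    return sum(p * (x - mean) ** 2 for x, p in dist)

mean_s1 = expectation(outcomes)
var_s1 = variance(outcomes)
std_s1 = math.sqrt(var_s1)
print(mean_s1, var_s1, std_s1)  # prints 105.0 225.0 15.0
```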
and
\[
\mathrm{Var}(S(1)) = \int_{-\infty}^{\infty} (x - E(S(1)))^2 f(x)\,dx.
\]
Example 1.2
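Assume that $S(1) = S(0)\exp(m + sZ)$, where $Z$ is a random variable with
standard normal distribution $N(0, 1)$. This means that $S(1)$ has lognormal
distribution. The density function of $S(1)$ is equal to
\[
f(x) = \frac{1}{x s \sqrt{2\pi}}\, e^{-\frac{\left(\ln\frac{x}{S(0)} - m\right)^2}{2s^2}} \quad \text{for } x > 0,
\]
and 0 for $x \le 0$. We can compute the expected price as
\begin{align*}
E(S(1)) &= \int_0^{\infty} x f(x)\,dx \\
&= \int_0^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{\left(\ln\frac{x}{S(0)} - m\right)^2}{2s^2}}\,dx \\
&= S(0) \int_{-\infty}^{\infty} e^{sy + m}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\,dy \quad \left(\text{taking } y = \frac{1}{s}\left(\ln\frac{x}{S(0)} - m\right)\right) \\
&= S(0)\, e^{m + \frac{s^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(y - s)^2}{2}}\,dy \\
&= S(0)\, e^{m + \frac{s^2}{2}}.
\end{align*}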
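As a sanity check on the closed formula, one can approximate the integral over $y$ numerically. The sketch below uses a plain trapezoidal rule on a truncated interval. The parameter values $S(0) = 100$, $m = 0.05$, $s = 0.2$ are assumptions chosen purely for illustration and do not come from the text.

```python
import math

def lognormal_mean_closed_form(s0, m, s):
    """E(S(1)) = S(0) exp(m + s^2 / 2), as derived in Example 1.2."""
    return s0 * math.exp(m + s * s / 2.0)

def lognormal_mean_numeric(s0, m, s, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal approximation of the integral of S(0) e^{sy+m} phi(y) dy,
    where phi is the standard normal density, over the interval [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)
        total += weight * s0 * math.exp(s * y + m) * density
    return total * h

# Hypothetical parameters, for illustration only.
closed = lognormal_mean_closed_form(100.0, 0.05, 0.2)
numeric = lognormal_mean_numeric(100.0, 0.05, 0.2)
print(closed, numeric)
```

The two numbers should agree to many decimal places, since the integrand is negligible outside the truncated interval.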
While we may allow any probability space, we must make sure that
negative values of the random variable S (1) are excluded since negative
prices make no sense from the point of view of economics. This means
that the distribution of S (1) has to be supported on [0, +∞) (meaning that
P(S (1) ≥ 0) = 1).
The return (also called the rate of return) on the investment S is a ran-
dom variable K : Ω → R, defined as
\[
K = \frac{S(1) - S(0)}{S(0)}.
\]
By the linearity of mathematical expectation, the expected (or mean) re-
turn is given by
\[
E(K) = \frac{E(S(1)) - S(0)}{S(0)}.
\]
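We introduce the convention of using the Greek letter µ for expectations of
various random returns
\[
\mu = E(K).
\]
Rearranging the definition of $K$ gives $S(1) = S(0)(1 + K)$, and hence
$E(S(1)) = S(0)(1 + \mu)$, which illustrates the possibility of reversing the
approach: given the returns we can find the prices.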
The requirement that S (1) is nonnegative implies that we must have
K ≥ −1. This in particular excludes the possibility of considering K with
Gaussian (normal) distribution.
At time 1 a dividend may be paid. In practice, after the dividend is paid,
the stock price drops by this amount, as one would expect. Thus we must
distinguish between the price that includes the right to receive the dividend
(the cum dividend price) and the price after the dividend is paid (the ex
dividend price). We assume that S(1) denotes the latter, hence the definition
of the return has to be modified to account for dividends:
\[
K = \frac{S(1) + \mathrm{Div}(1) - S(0)}{S(0)}.
\]
A bond is a special security that pays a certain sum of money, known
in advance, at maturity; this sum is the same in each state. The return on a
bond is not random (recall that we are dealing with a single time period).
Consider a bond paying a unit of home currency at time 1, that is B(1) = 1,
which is purchased for B(0) < 1. Then
\[
R = \frac{1 - B(0)}{B(0)}
\]
defines the risk-free return. The bond price can be expressed as
\[
B(0) = \frac{1}{1 + R},
\]
giving the present value of a unit at time 1.
Exercise 1.3 Compute the expected returns for the stocks described
in Example 1.1 and Example 1.2.
Exercise 1.4 Assume that $S(0) = 80$ and that the ex dividend price is
\[
S(1) = \begin{cases} 60 & \text{with probability } \tfrac16, \\ 80 & \text{with probability } \tfrac36, \\ 90 & \text{with probability } \tfrac26. \end{cases}
\]
The company will pay out a constant dividend (independent of the future
stock price). Compute the dividend for which the expected return
on the stock would be 20%.
Standard deviation alone does not fully capture the risk of an investment.
We illustrate this with a simple example.
Example 1.4
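Consider three assets with today’s prices $S_i(0) = 100$ for $i = 1, 2, 3$ and
time 1 prices with the following distributions:
\[
S_1(1) = \begin{cases} 120 & \text{with probability } \tfrac12, \\ 90 & \text{with probability } \tfrac12, \end{cases}
\]
\[
S_2(1) = \begin{cases} 140 & \text{with probability } \tfrac12, \\ 90 & \text{with probability } \tfrac12, \end{cases}
\]
\[
S_3(1) = \begin{cases} 130 & \text{with probability } \tfrac12, \\ 100 & \text{with probability } \tfrac12. \end{cases}
\]
We can see that
\[
\sigma_1 = \sqrt{\mathrm{Var}(K_1)} = 0.15, \quad
\sigma_2 = \sqrt{\mathrm{Var}(K_2)} = 0.25, \quad
\sigma_3 = \sqrt{\mathrm{Var}(K_3)} = 0.15.
\]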
Here σ2 > σ1 and σ3 = σ1 , but both the second and third assets are
preferable to the first, since at time 1 they bring in more cash. We shall
return to this example in the next section.
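The standard deviations quoted in Example 1.4, together with the expected returns, can be checked as follows. This is a sketch only; the function name `return_stats` and the dictionary layout are our own choices.

```python
import math

def return_stats(s0, outcomes):
    """Mean and standard deviation of K = (S(1) - S(0)) / S(0)
    for a list of (S(1) value, probability) pairs."""
    returns = [((s1 - s0) / s0, p) for s1, p in outcomes]
    mu = sum(p * k for k, p in returns)
    var = sum(p * (k - mu) ** 2 for k, p in returns)
    return mu, math.sqrt(var)

# Time 1 price distributions of the three assets in Example 1.4.
assets = {
    1: [(120.0, 0.5), (90.0, 0.5)],
    2: [(140.0, 0.5), (90.0, 0.5)],
    3: [(130.0, 0.5), (100.0, 0.5)],
}
stats = {i: return_stats(100.0, dist) for i, dist in assets.items()}
for i, (mu, sigma) in stats.items():
    print(f"asset {i}: mu = {mu:.2f}, sigma = {sigma:.2f}")
```

The expected returns come out as 5%, 15% and 15%, which makes precise the sense in which the second and third assets bring in more cash than the first.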
Definition 1.5
We say that a security with expected return µ1 and standard deviation σ1
dominates another security with expected return µ2 and standard devia-
tion σ2 whenever
µ1 ≥ µ2 and σ1 ≤ σ2 .
Exercise 1.6 Assume that we have three assets. The first has expected
return µ1 = 10% and standard deviation of return equal to
σ1 = 0.25. The second has expected return µ2 = 15% and standard
deviation of return equal to σ2 = 0.3. Assume that the future prices of
the third asset will have $E(S_3(1)) = 100$, $\sqrt{\mathrm{Var}(S_3(1))} = 20$. Find the
ranges of prices $S_3(0)$ so that the following conditions are satisfied:
(i) The third asset dominates the first asset.
(ii) The third asset dominates the second asset.
(iii) No asset is dominated by another asset.
1.3 Semi-variance
Consider the three assets described in Example 1.4. Although σ1 = σ3 ,
the third asset carries no ‘downside risk’, since neither outcome for S 3 (1)
involves a loss for the investor. Similarly, although σ2 > σ1 , the downside
risk for the second asset is the same as that for the first (a 50% chance of
incurring a loss of 10), but the expected return for the second asset is 15%,
making it the more attractive investment even though, as measured by vari-
ance, it is more risky. Since investors regard risk as concerned with failure
(i.e. downside risk), the following modification of variance is sometimes
used. It is called semi-variance and is computed by a formula that takes
into account only the unfavourable outcomes, where the return is below the
expected value
\[
E\big[(\min\{0, K - \mu\})^2\big]. \tag{1.1}
\]
The square root of semi-variance is denoted by semi-σ. However, this no-
tion still does not agree fully with the intuition.
Example 1.6
Assume that $\Omega = \{\omega_1, \omega_2\}$, $P(\{\omega_1\}) = P(\{\omega_2\}) = \tfrac12$ and
K(ω1 ) = 10%,
K(ω2 ) = 20%.
\[
E\big[(\min\{0, K - R\})^2\big],
\]
which eliminates the above unwanted feature. Instead of the risk-free rate,
one can also consider the return required by the investor.
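The two versions of semi-variance can be compared on the returns of Example 1.6. In the sketch below the function `semi_variance` is our own helper, and the risk-free rate of 5% is an assumed value used only for illustration.

```python
import math

def semi_variance(returns, reference):
    """E[(min{0, K - reference})^2] for (return, probability) pairs."""
    return sum(p * min(0.0, k - reference) ** 2 for k, p in returns)

# Returns from Example 1.6.
returns = [(0.10, 0.5), (0.20, 0.5)]
mu = sum(p * k for k, p in returns)

# Reference level equal to the expected return, as in (1.1).
semi_sigma_mu = math.sqrt(semi_variance(returns, mu))

# Reference level equal to a risk-free rate; 5% is a hypothetical value.
risk_free = 0.05
semi_sigma_rf = math.sqrt(semi_variance(returns, risk_free))

print(semi_sigma_mu, semi_sigma_rf)
```

With the expected return as the reference, the measure is positive although both outcomes are gains. With a reference rate below both returns it is zero.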
These versions are not very popular in the financial world, the variance
being the basic measure of risk. In our presentation of portfolio theory
we follow the historical tradition and take variance as the measure of risk.
It is possible to develop a version of the theory for alternative ways of
measuring risk. In most cases, however, such theories do not produce neat
analytic formulae as is the case for the mean and variance.
We will return to a more general discussion of risk measures in the final
chapters of this volume. An analysis of the popular concept of Value at
Risk (VaR), which has been used extensively in the banking and investment
sectors since the 1990s, will lead us to conclude that, despite its ubiquity,
this risk measure has serious shortcomings, especially when dealing with
mixed distributions. We will then examine an alternative which remedies
these defects but still remains mathematically tractable.
2
Portfolios consisting of two assets
2.1 Return
2.2 Attainable set
2.3 Special cases
2.4 Minimum variance portfolio
2.5 Adding a risk-free security
2.6 Indifference curves
2.7 Proofs
We begin our discussion of portfolio risk and expected return with portfo-
lios consisting of just two securities. This has the advantage that the key
concepts of mean-variance portfolio theory can be expressed in simple ge-
ometric terms.
For a given allocation of resources between the two assets comprising
the portfolio, the mean and variance of the return on the entire portfolio
are expressed in terms of the means and variances of, and (crucially) the
covariance between, the returns on the individual assets. This enables us
to examine the set of all feasible weightings of (in other words, allocations
of funds to) the different assets in the portfolio, and to find the unique
weighting with minimum variance. We also find the collection of efficient
portfolios – ones that are not dominated by any other. Finally, adding a
risk-free asset, we find the so-called market portfolio, which is the unique
portfolio providing an optimal combination with the risk-free asset.
We denote the prices of the securities as S 1 (t) and S 2 (t) for t = 0, 1. We
start with a motivating example.
Example 2.1
Let $\Omega = \{\omega_1, \omega_2\}$, $S_1(0) = 200$, $S_2(0) = 300$. Assume that
\[
P(\{\omega_1\}) = P(\{\omega_2\}) = \tfrac12,
\]
and that
\begin{align*}
S_1(1, \omega_1) &= 260, & S_2(1, \omega_1) &= 270, \\
S_1(1, \omega_2) &= 180, & S_2(1, \omega_2) &= 360.
\end{align*}
The expected returns and standard deviations for the two assets are
µ1 = 10%, µ2 = 5%,
σ1 = 20%, σ2 = 15%.
Assume that we spend V(0) = 500, buying a single share of stock S 1 and a
single share of stock S 2 . At time 1 we will have
V(1, ω1 ) = 260 + 270 = 530,
V(1, ω2 ) = 180 + 360 = 540.
The expected return on the investment is 7% and the standard deviation is
just 1%. We can see that by diversifying the investment into two stocks we
have considerably reduced the risk.
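The figures of Example 2.1 can be verified scenario by scenario. The following is a minimal sketch; the variable names are our own.

```python
import math

# Data of Example 2.1: two equally likely scenarios.
prob = [0.5, 0.5]
s1_0, s2_0 = 200.0, 300.0
s1_1 = [260.0, 180.0]
s2_1 = [270.0, 360.0]
x1, x2 = 1, 1  # one share of each stock

v0 = x1 * s1_0 + x2 * s2_0                          # initial value V(0)
v1 = [x1 * a + x2 * b for a, b in zip(s1_1, s2_1)]  # V(1) in each scenario
k = [(v - v0) / v0 for v in v1]                     # portfolio return

mu_v = sum(p * r for p, r in zip(prob, k))
sigma_v = math.sqrt(sum(p * (r - mu_v) ** 2 for p, r in zip(prob, k)))
print(v0, v1, mu_v, sigma_v)
```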
2.1 Return
From the above example we see that the risk can be reduced by diversifica-
tion. In this section we discuss how to minimise risk when investing in two
stocks.
Suppose that we buy x1 shares of stock S 1 and x2 shares of stock S 2 .
The initial value of this portfolio is
\[
V_{(x_1,x_2)}(0) = x_1 S_1(0) + x_2 S_2(0).
\]
When we design a portfolio, usually its initial value is the starting point of
our considerations and it is given. The decision on the number of shares
in each asset will follow from the decision on the division of our wealth,
which is our primary concern and is expressed by means of the weights
defined by
\[
w_1 = \frac{x_1 S_1(0)}{V_{(x_1,x_2)}(0)}, \quad w_2 = \frac{x_2 S_2(0)}{V_{(x_1,x_2)}(0)}. \tag{2.1}
\]
If the initial wealth V(0) and the weights w1, w2 with w1 + w2 = 1 are given, then
the funds allocated to a particular stock are w1 V(0), w2 V(0), respectively,
and the numbers of shares we buy are
\[
x_1 = \frac{w_1 V(0)}{S_1(0)}, \quad x_2 = \frac{w_2 V(0)}{S_2(0)}.
\]
At the end of the period the securities prices change, which gives the final
value of the portfolio as a random variable
\[
V_{(x_1,x_2)}(1) = x_1 S_1(1) + x_2 S_2(1).
\]
To express the return on a portfolio we employ the weights rather than the
numbers of shares since this is more convenient.
The return on the investment in two assets depends on the method of
allocation of the funds (the weights) and the corresponding returns. The
vector of weights will be denoted by w = (w1, w2), or in matrix notation
\[
w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix},
\]
and the return of the corresponding portfolio by Kw .
Proposition 2.2
The return Kw on a portfolio consisting of two securities is the weighted
average
Kw = w1 K1 + w2 K2 , (2.2)
where w1 and w2 are the weights and K1 and K2 the returns on the two
components.
Proof With the numbers of shares computed as above, we have the fol-
lowing formula for the value of the portfolio
Example 2.3
Consider the stocks S 1 and S 2 from Example 2.1. Suppose that at time 0
we have V(0) = 600. Suppose also that at time 0 we borrow three shares
of stock S 1 , meaning that we choose x1 = −3. We sell the three shares of
stock, which together with V(0) gives us 3 · 200 + 600 = 1200 to invest in
the second asset. We can thus take x2 = 4. Note that
\[
V_{(x_1,x_2)}(0) = x_1 S_1(0) + x_2 S_2(0) = 600 = V(0).
\]
At time 1 we have the proceeds from holding four shares of $S_2$, but we
need to buy back the three shares of $S_1$ at its market value. Since
\[
V_{(x_1,x_2)}(1) = x_1 S_1(1) + x_2 S_2(1),
\]
we see that
\begin{align*}
V_{(x_1,x_2)}(1, \omega_1) &= -3 \cdot 260 + 4 \cdot 270 = 300, \\
V_{(x_1,x_2)}(1, \omega_2) &= -3 \cdot 180 + 4 \cdot 360 = 900.
\end{align*}
We can compute the weights using (2.1)
\[
w_1 = \frac{-3 \cdot 200}{600} = -1, \quad w_2 = \frac{4 \cdot 300}{600} = 2.
\]
We see that, as expected, w1 + w2 = 1.
Exercise 2.1 Compute the expected return and the standard deviation
of the return for the investment from Example 2.3. Explain why
this portfolio is less desirable than investing in either of the two
securities.
Let us introduce the following notation for the covariance of the returns
on the stocks S 1 , S 2 :
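\[
\sigma_{ij} = \mathrm{Cov}(K_i, K_j),
\]
for $i, j = 1, 2$. In particular, $\sigma_{ii} = \mathrm{Var}(K_i) = \sigma_i^2$. The correlation
coefficient of the two returns is
\[
\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}. \tag{2.5}
\]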
For this to make sense we have to assume that the variances of both returns
are non-zero. The variance is zero in one case only, namely when the ran-
dom variable is constant (almost surely). So we assume that the returns on
stocks are genuine, non-constant, random variables, unlike bonds, where
the return is the same in each state (scenario).
By (2.4) the correlation coefficient satisfies
−1 ≤ ρi j ≤ 1.
Theorem 2.4
The expected return and the variance of the return on a portfolio are given
by
\begin{align}
\mu_w &= E(K_w) = w_1 \mu_1 + w_2 \mu_2, \tag{2.6} \\
\sigma_w^2 &= \mathrm{Var}(K_w) = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_{12}. \tag{2.7}
\end{align}
Proof Equality (2.6) follows directly from (2.2) and linearity of mathe-
matical expectation:
Corollary 2.5
Using (2.5) we can rewrite the formula for the variance of a portfolio as
\[
\sigma_w^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho_{12} \sigma_1 \sigma_2. \tag{2.8}
\]
Corollary 2.6
Using the following matrix notation
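\[
w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \quad
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad
C = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},
\]
equations (2.6)–(2.7) can be written as
\begin{align}
\mu_w &= w^T \mu, \tag{2.9} \\
\sigma_w^2 &= w^T C w, \tag{2.10}
\end{align}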
where we denote the transpose of the matrix A by AT .
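The matrix formulae (2.9) and (2.10) translate directly into code. The sketch below builds $\mu$ and $C$ from the scenario returns of Example 2.1 and evaluates both formulae for the weights $w = (0.4, 0.6)$, which correspond to holding one share of each stock. Plain Python lists stand in for vectors and matrices, and the helper names are our own.

```python
import math

prob = [0.5, 0.5]
k1 = [0.30, -0.10]  # returns on S1 in Example 2.1: 260/200 - 1, 180/200 - 1
k2 = [-0.10, 0.20]  # returns on S2 in Example 2.1: 270/300 - 1, 360/300 - 1

def mean(k):
    return sum(p * r for p, r in zip(prob, k))

def cov(ka, kb):
    ma, mb = mean(ka), mean(kb)
    return sum(p * (a - ma) * (b - mb) for p, a, b in zip(prob, ka, kb))

mu = [mean(k1), mean(k2)]
C = [[cov(k1, k1), cov(k1, k2)],
     [cov(k2, k1), cov(k2, k2)]]
w = [0.4, 0.6]  # weights 200/500 and 300/500

mu_w = sum(wi * mi for wi, mi in zip(w, mu))                    # w^T mu
Cw = [sum(C[i][j] * w[j] for j in range(2)) for i in range(2)]  # C w
var_w = sum(w[i] * Cw[i] for i in range(2))                     # w^T C w
sigma_w = math.sqrt(var_w)
print(mu_w, sigma_w)
```

The result agrees with the expected return of 7% and standard deviation of 1% found in Example 2.1.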
Figure 2.3 Portfolio line with one asset dominating the other.
From (2.11) we see that µw is affine, and σ2w is a quadratic function with
respect to w. Since a graph of the root of a quadratic function is a hyperbola,
one can guess that the attainable set consisting of all points (µw , σw ) should
be a hyperbola.
Theorem 2.7
If µ1 ≠ µ2 and ρ12 ∈ (−1, 1), then the attainable set is a hyperbola with its
centre on the vertical axis.
Proof See page 31.
We shall return to the above discussion when working with n assets later
on. It may come as a surprise that from the point of view of technical
difficulties, the general case will be as simple as the particular situation
just worked out, where only two assets are involved. It will also turn out
that the case of many assets reduces to the case of just two and we will be
able to draw valuable conclusions, that remain valid in the general case, from
the discussion of the present chapter.
In practice we can reject some of the portfolios drawing on the basic
preference property, namely, given two portfolios with the same risk, the
one with higher expected return is preferable. So we may discard the lower
part of the curve restricting our attention to the upper, called the efficient set
or frontier, as shown in Figure 2.4. More precisely, a portfolio is called ef-
ficient if there is no other portfolio, except itself, that dominates it. The set
of efficient portfolios among all attainable portfolios is called the efficient
frontier.
Exercise 2.5 Assuming that ρ12 = −1, derive the formulae for the
half lines that form the attainable set.
Figure 2.6 Portfolio line for one risky and one risk-free security.
Exercise 2.8 Investigate what happens when illegal data with |ρ12 | >
1 are considered.
\[
\mu_w = w_1 R + w_2 \mu_2, \qquad \sigma_w^2 = w_2^2 \sigma_2^2,
\]
giving
\[
\sigma_w = |w_2|\, \sigma_2,
\]
and so the set in the (σ, µ)-plane is as shown in Figure 2.6 (with redundant
lower part according to the preference relation).
The segment between the risk-free asset and the asset characterised by
(σ2 , µ2 ) corresponds to positive weights. The line above (σ2 , µ2 ) requires
taking a short position in the risk-free asset, in other words, borrowing at
the risk-free rate (which we assume here to be possible). The rejected lower
segment shows portfolios with a short position in the risky asset.
In Corollary 2.6 the return and variance of a given portfolio were stated
in terms of the covariance matrix
\[
C = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}
\]
for the two assets. We now do the same for the weights of the minimum
variance portfolio.
Since S 1 and S 2 are risky assets, the matrix C is invertible. By Cramer’s
rule
\[
C^{-1} = \frac{1}{\det C} \begin{bmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{bmatrix}.
\]
So we have, writing 1 = (1, 1),
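\[
C^{-1}\mathbf{1} = \frac{1}{\det C} \begin{bmatrix} \sigma_2^2 - \sigma_{12} \\ \sigma_1^2 - \sigma_{12} \end{bmatrix} = \frac{1}{\det C} \begin{bmatrix} a \\ b \end{bmatrix},
\]
\[
\mathbf{1}^T C^{-1} \mathbf{1} = \frac{1}{\det C} (\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}) = \frac{1}{\det C} (a + b),
\]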
since σ12 = ρ12 σ1 σ2 . We have proved the following:
Corollary 2.9
The vector wmin = (w1 , w2 ) of weights of the minimum variance portfolio
found in Theorem 2.8 has the form
\[
w_{\min} = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1} \mathbf{1}}.
\]
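The formula of Corollary 2.9 can be sketched in code for a 2×2 covariance matrix. The matrix used below is hypothetical (it does not come from any example in the text), and the function names are our own. The data of Example 2.1 would not do here: its returns are perfectly negatively correlated, so that $C$ is singular.

```python
def inverse_2x2(c):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[c[1][1] / det, -c[0][1] / det],
            [-c[1][0] / det, c[0][0] / det]]

def min_variance_weights(c):
    """w_min = C^{-1} 1 / (1^T C^{-1} 1)."""
    inv = inverse_2x2(c)
    inv_one = [inv[0][0] + inv[0][1], inv[1][0] + inv[1][1]]  # C^{-1} 1
    total = sum(inv_one)                                      # 1^T C^{-1} 1
    return [v / total for v in inv_one]

def portfolio_variance(w, c):
    """w^T C w, as in (2.10)."""
    return sum(w[i] * c[i][j] * w[j] for i in range(2) for j in range(2))

# Hypothetical covariance matrix: variances 0.04 and 0.09, covariance 0.01.
C = [[0.04, 0.01],
     [0.01, 0.09]]
w_min = min_variance_weights(C)
print(w_min, portfolio_variance(w_min, C))
```

For these numbers $a = 0.08$ and $b = 0.03$, so the weights are $8/11$ and $3/11$.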
We now discuss what happens when short-selling is not allowed. We
need to find the minimum of
Figure 2.9 The minimum variance portfolio (MVP), the market portfolio (MP), and the capital market line (CML).
the expected return of the market portfolio by µm and its risk by σm , the
capital market line is given by
\[
\mu = R + \frac{\mu_m - R}{\sigma_m}\, \sigma. \tag{2.15}
\]
Theorem 2.10
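The weights of the market portfolio are $\mathbf{m} = (w, 1 - w)$, with
\[
w = \frac{c}{c + d}, \quad 1 - w = \frac{d}{c + d}, \tag{2.16}
\]
where
\begin{align*}
c &= \sigma_2^2 (\mu_1 - R) - \sigma_{12} (\mu_2 - R), \\
d &= \sigma_1^2 (\mu_2 - R) - \sigma_{12} (\mu_1 - R).
\end{align*}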
Proof See page 33.
Corollary 2.11
The formulae (2.16) for the weights of the market portfolio can be written
in matrix notation as
\[
\mathbf{m} = \frac{C^{-1}(\mu - R\mathbf{1})}{\mathbf{1}^T C^{-1} (\mu - R\mathbf{1})}, \tag{2.17}
\]
where C is the covariance matrix, µ = (µ1 , µ2 ), and 1 = (1, 1).
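A short sketch of formula (2.16) follows. All numerical inputs (expected returns, covariance matrix and risk-free rate) are hypothetical values chosen for illustration, and the function names are our own. The slope function is included only to check numerically that the computed weights give a steeper line through the risk-free asset than nearby weights do.

```python
import math

def market_portfolio(mu, c, r):
    """Weights (w, 1 - w) from (2.16) for two risky assets."""
    excess = [mu[0] - r, mu[1] - r]
    cc = c[1][1] * excess[0] - c[0][1] * excess[1]  # the quantity c in (2.16)
    dd = c[0][0] * excess[1] - c[0][1] * excess[0]  # the quantity d in (2.16)
    return [cc / (cc + dd), dd / (cc + dd)]

def slope(w1, mu, c, r):
    """(mu_w - R) / sigma_w for the portfolio (w1, 1 - w1)."""
    w = [w1, 1.0 - w1]
    mu_w = w[0] * mu[0] + w[1] * mu[1]
    var_w = sum(w[i] * c[i][j] * w[j] for i in range(2) for j in range(2))
    return (mu_w - r) / math.sqrt(var_w)

# Hypothetical data, for illustration only.
mu = [0.10, 0.15]
C = [[0.04, 0.01],
     [0.01, 0.09]]
R = 0.05

m = market_portfolio(mu, C, R)
print(m, slope(m[0], mu, C, R))
```

For these inputs both $c$ and $d$ equal 0.0035, so the market portfolio splits the risky investment equally between the two assets.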
Example 2.12
Assume that the covariance matrix C, the vector of expected returns µ, and
the risk-free return R are given. Assume also that an investor wishes to
spend V and that the aim is to achieve an expected return equal to a given
rate m. The question is how much he should spend on the risky assets, and
how much he should invest risk-free.
First we compute m using (2.16). We can then compute the expected
return of the market portfolio using (2.9)
µm = mT µ.
Optimal investments lie on the capital market line. The investor needs to
hold a combination of the market portfolio and the risk-free security. We
assume that he spends λV on the market portfolio and invests (1 − λ) V
risk-free. The desired λ can be computed from the expected return of the
position
\[
\lambda \mu_m + (1 - \lambda) R = m,
\]
giving
\[
\lambda = \frac{m - R}{\mu_m - R}.
\]
Since the investor spends λV on the market portfolio, the vector
\[
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \lambda V \mathbf{m}
\]
gives us the amount v1 invested in the first asset, and v2 invested in the
second asset. As mentioned above, (1 − λ) V is invested risk-free.
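The procedure of Example 2.12 can be sketched as a small function. The inputs below are hypothetical: market portfolio weights $(0.5, 0.5)$, expected returns of 10% and 15%, a risk-free return of 5%, a budget of 1000 and a target expected return of 10%. The function name `allocate` is our own.

```python
def allocate(v, target, r, m, mu):
    """Split the budget v between the market portfolio with weights m
    and the risk-free asset so that the expected return equals target."""
    mu_m = sum(mi * ui for mi, ui in zip(m, mu))  # expected return of m, by (2.9)
    lam = (target - r) / (mu_m - r)               # fraction spent on the market portfolio
    risky = [lam * v * mi for mi in m]            # amounts v1, v2
    risk_free = (1.0 - lam) * v
    return lam, risky, risk_free

# Hypothetical inputs, for illustration only.
V, target, R = 1000.0, 0.10, 0.05
m = [0.5, 0.5]
mu = [0.10, 0.15]

lam, risky, risk_free = allocate(V, target, R, m, mu)
achieved = (risky[0] * mu[0] + risky[1] * mu[1] + risk_free * R) / V
print(lam, risky, risk_free, achieved)
```

Here $\lambda = 2/3$, so one third of the budget goes into each risky asset and one third is invested risk-free.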
risk will never be equally attractive; nor will be two assets with the same
risk but different expected returns. Thus the intersection of this set by any
line parallel to any of the axes can contain at most one element. So it is a
graph of an increasing function. We assume in addition that this function is
convex for each investor – in other words, to retain his peace of mind, the
investor demands that a unit increase of risk be offset by more than one unit
increase in return, as shown in Figure 2.10 – and we call it an indifference
curve.
We assume that indifference curves are level sets of a function
u : R2 → R.
We assume that a curve {u = c2 } lies above {u = c1 } for c1 < c2 . In other
words, the higher the value of u, the higher the investor’s satisfaction with
the investment. Given a set of attainable portfolios, an investor chooses the
one placed on the best indifference curve. It is geometrically obvious as a
result of convexity of the curves that the optimal portfolio is at the tangency
point with the capital market line, for some indifference curve, as shown in
Figure 2.11(a).
For another investor, who is less risk averse, that is, who has less steep
indifference curves, the optimal portfolio may be different, as in Figure
2.11(b). It lies further to the right, which agrees with our intuition regarding
the risk preferences of this investor.
Figure 2.11 Indifference curves and optimal investment for an investor with
high risk aversion (a), and lower risk aversion (b).
Example 2.13
Assume that the covariance matrix C, the vector of expected returns µ, and
the risk-free return R are given, and that an investor’s indifference curves
are the level sets of the function
\[
u(\sigma, \mu) = \mu - \frac{a}{2}\sigma^2. \tag{2.18}
\]
We show how the investor should spend V to maximise u. The indifference
curves are the level sets $u(\sigma, \mu) = c$, so that we obtain $\mu = c + \frac{a}{2}\sigma^2$, which
is convex and has slope $a\sigma$.
Using (2.17), (2.9) and (2.10) we can find the market portfolio $\mathbf{m}$, its
expected return $\mu_m$ and variance $\sigma_m^2$. Since the slope $a\sigma$ of the indifference
curve needs to match the slope of the capital market line, the tangency point
can be found by solving the system of two linear equations
\[
\mu = R + \frac{\mu_m - R}{\sigma_m}\, \sigma, \qquad
a\sigma = \frac{\mu_m - R}{\sigma_m}.
\]
This means that
\[
\mu = R + \frac{1}{a} \left( \frac{\mu_m - R}{\sigma_m} \right)^2.
\]
We can now determine how to divide V amongst the assets using the
same method as in Example 2.12.
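The tangency computation of Example 2.13 is easy to check numerically. In the sketch below the risk aversion parameter $a = 4$ and the market data ($R = 0.05$, $\mu_m = 0.125$, $\sigma_m^2 = 0.0375$) are assumed values used only for illustration, and the function names are our own.

```python
import math

def tangency_point(a, r, mu_m, sigma_m):
    """Point (sigma, mu) on the capital market line where the
    indifference curve of u = mu - (a/2) sigma^2 is tangent to it."""
    slope = (mu_m - r) / sigma_m
    sigma = slope / a            # from a * sigma = slope
    mu = r + slope ** 2 / a      # substitute sigma into the line equation
    return sigma, mu

def utility_on_cml(sigma, a, r, mu_m, sigma_m):
    """Value of u at the point of the capital market line with risk sigma."""
    mu = r + (mu_m - r) / sigma_m * sigma
    return mu - a / 2.0 * sigma ** 2

# Hypothetical inputs, for illustration only.
a, R, mu_m = 4.0, 0.05, 0.125
sigma_m = math.sqrt(0.0375)

sigma_opt, mu_opt = tangency_point(a, R, mu_m, sigma_m)
lam = (mu_opt - R) / (mu_m - R)  # fraction of V placed in the market portfolio
print(sigma_opt, mu_opt, lam)
```

For these values the optimal expected return is 8.75% and half of the budget is placed in the market portfolio.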
2.7 Proofs
Theorem 2.7
If µ1 ≠ µ2 and ρ12 ∈ (−1, 1), then the attainable set is a hyperbola with its
centre on the vertical axis.
Proof For a more familiar notation we introduce the letters x, y for the
coordinates so that we have the following description of the attainable set:
\begin{align}
y &= w\mu_1 + (1 - w)\mu_2, \tag{2.19} \\
x^2 &= w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2 + 2w(1 - w)\sigma_{12}. \tag{2.20}
\end{align}
The goal of further computations is to convert the above system of equa-
tions to the form
\[
\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1, \tag{2.21}
\]
from which we will be able to read off the properties of the hyperbola (see
Figure 2.12).
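Solving (2.19) for w
\[
w = \frac{y - \mu_2}{\mu_1 - \mu_2}
\]
(note the relevance of the assumption $\mu_1 \neq \mu_2$) and inserting into (2.20),
we get
\[
x^2 = \frac{1}{A} \left[ (y - \mu_2)^2 \sigma_1^2 + (\mu_1 - y)^2 \sigma_2^2 + 2(y - \mu_2)(\mu_1 - y)\sigma_{12} \right],
\]
where $A = (\mu_1 - \mu_2)^2$.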
Figure 2.12 The hyperbola $\frac{(x-h)^2}{a^2} - \frac{(y-k)^2}{b^2} = 1$.
We can see that we have obtained the desired hyperbola equation (2.21),
with h = 0, meaning that the center of the hyperbola lies on the vertical
axis (see Figure 2.12).
One loose end to tie up is to show that c ≠ 0, as otherwise we would be
dividing by zero in (2.23). A simple but tedious computation shows that
Theorem 2.10
The weights of the market portfolio are m = (w, 1 − w), with
\[ w = \frac{c}{c + d}, \qquad 1 - w = \frac{d}{c + d}, \]
where
Proof For a portfolio (w, 1 − w), we denote its expected return by µ(w),
and standard deviation by σ(w). Optimisation is based on maximising the
slope coefficient:
\[ s(w) = \frac{\mu(w) - R}{\sigma(w)}. \]
To this end it is necessary and sufficient to solve
\[ s'(w) = 0. \]
We have
\[ s'(w) = \frac{\mu'(w)\sigma(w) - (\mu(w) - R)\sigma'(w)}{\sigma^2(w)}. \]
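Since
\[ \sigma'(w) = \left(\sqrt{\sigma^2(w)}\right)' = \frac{1}{2\sqrt{\sigma^2(w)}}\,(\sigma^2(w))' = \frac{1}{2\sigma(w)}\,(\sigma^2(w))', \]
the equation s'(w) = 0 reduces to
\[ 2\mu'(w)\sigma^2(w) - (\mu(w) - R)(\sigma^2(w))' = 0, \]
that is
\[ (\mu_1 - \mu_2)\left(w^2\sigma_1^2 + (1 - w)^2\sigma_2^2 + 2w(1 - w)\sigma_{12}\right) - (w\mu_1 + (1 - w)\mu_2 - R)\left(w\sigma_1^2 - (1 - w)\sigma_2^2 + (1 - 2w)\sigma_{12}\right) = 0. \]
This is in fact a linear equation in w since all terms involving $w^2$ cancel
out. Elementary, but tedious computations give
\[ w = \frac{c}{c + d}, \qquad 1 - w = \frac{d}{c + d}, \]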
which concludes the proof.
3
Lagrange multipliers
The aim of this section is to provide the underlying geometric intuition for
the method.
We consider two functions
\[ f : \mathbb{R}^2 \to \mathbb{R}, \qquad g : \mathbb{R}^2 \to \mathbb{R}, \]
and show how to find solutions of the following problem:
Find
\[ \min f(x, y), \tag{3.1} \]
under the constraint: g(x, y) = 0.
We start with a simple example.
Example 3.1
Consider
\[ f(x, y) = x^2 + y^2, \]
\[ g(x, y) = \frac{1}{2}x + \frac{1}{2}y - \frac{1}{2}. \]
Basic arguments (say, by substituting y = 1 − x into f (x, y) and computing
a derivative with respect to x) lead to the solution
\[ x^* = y^* = \frac{1}{2}. \tag{3.2} \]
We now present an alternative approach. We first observe that one of the
level curves $\{(x, y) : f(x, y) = r^2\}$ (which are circles of radius r, as shown
in Figure 3.1) is tangent at the point (x∗ , y∗ ) to the line {(x, y) : g(x, y) = 0}.
Since the gradients
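\[ \nabla f(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x}(x, y) \\ \frac{\partial f}{\partial y}(x, y) \end{bmatrix} = \begin{bmatrix} 2x \\ 2y \end{bmatrix}, \]
\[ \nabla g(x, y) = \begin{bmatrix} \frac{\partial g}{\partial x}(x, y) \\ \frac{\partial g}{\partial y}(x, y) \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}, \]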
are orthogonal to the level curves, the vectors ∇ f (x∗ , y∗ ) and ∇g(x∗ , y∗ )
should be collinear. This means that there should exist a number λ ∈ R
such that we have the following system of two equations:
∇ f (x, y) − λ∇g(x, y) = 0. (3.3)
The idea is to solve (3.3) instead of (3.1); in other words, we solve a
system of equations, instead of solving a minimisation problem.
3.1 Motivating examples 37
Figure 3.1 The level curves $\{f = r^2\}$ for r = 1 (outer circle), $r = \frac{1}{\sqrt{2}}$ (middle
circle) and $r = \frac{1}{2}$ (inner circle), together with the gradients ∇f and ∇g,
attached at (x∗, y∗).
Together with the constraint g(x, y) = 0, (3.3) leads to the linear system
\[ 2x - \frac{1}{2}\lambda = 0, \]
\[ 2y - \frac{1}{2}\lambda = 0, \tag{3.4} \]
\[ \frac{1}{2}x + \frac{1}{2}y - \frac{1}{2} = 0, \]
with the unique solution
\[ x^* = y^* = \frac{1}{2}, \qquad \lambda^* = 2. \]
The points x∗ and y∗ found by this method are the same as those found in
(3.2).
In Figure 3.2 we see that in this example the point (x∗ , y∗ ) is the only
point on {g(x, y) = 0}, at which ∇ f and ∇g are collinear, hence the only
point where (3.3) can hold.
Example 3.1 suggests that instead of solving the problem (3.1) we can
look for a solution of the system of equations
∇ f (x, y) − λ∇g(x, y) = 0, (3.5)
g(x, y) = 0.
Solving a system of equations can turn out to be easier than minimising a
function under constraints.
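To make the last remark concrete, the linear system (3.4) of Example 3.1 can be solved mechanically. The following is a minimal sketch using exact rational arithmetic from the Python standard library; the helper `solve_linear` is an illustrative Gauss-Jordan routine written for this example (it assumes an invertible coefficient matrix) and is not part of the text.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A z = b by Gauss-Jordan elimination (A is assumed invertible)."""
    n = len(A)
    # Augmented matrix [A | b] with exact Fraction entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Bring a row with a non-zero pivot into position.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Clear the current column in every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


half = Fraction(1, 2)
# Unknowns ordered as (x, y, lambda); the rows are the three equations of (3.4).
A = [[2, 0, -half],
     [0, 2, -half],
     [half, half, 0]]
b = [0, 0, half]
x, y, lam = solve_linear(A, b)
print(x, y, lam)  # 1/2 1/2 2
```

The output agrees with the solution x∗ = y∗ = 1/2, λ∗ = 2 stated after (3.4).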
We now test how this works on an example from portfolio theory that
was discussed in Chapter 2.
Example 3.2
We consider the problem of finding the minimum variance portfolio when
given two risky assets, as in Chapter 2. To use the same notation as in (3.5),
we write x and y instead of w1 and w2 , respectively, and take
\[ f(x, y) = x^2\sigma_1^2 + y^2\sigma_2^2 + 2xy\sigma_{12}, \]
\[ g(x, y) = x + y - 1. \]
The constraint g(x, y) = 0 ensures that x and y add up to one, making the
pair (x, y) a well defined portfolio. The function f gives its variance.
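For this f and g the system (3.5) reads $2x\sigma_1^2 + 2y\sigma_{12} - \lambda = 0$, $2y\sigma_2^2 + 2x\sigma_{12} - \lambda = 0$, $x + y - 1 = 0$. Subtracting the first two equations and using y = 1 − x gives $x = (\sigma_2^2 - \sigma_{12})/(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12})$. The sketch below evaluates this; the function name and the numerical inputs are hypothetical and serve only as an illustration.

```python
def mvp_two_assets(var1, var2, cov12):
    """Solve grad f - lambda * grad g = 0 together with x + y = 1."""
    denom = var1 + var2 - 2.0 * cov12
    if denom == 0:
        raise ValueError("degenerate case: var1 + var2 - 2*cov12 = 0")
    x = (var2 - cov12) / denom
    y = 1.0 - x
    lam = 2.0 * (x * var1 + y * cov12)  # common value of both partial derivatives
    return x, y, lam


# Hypothetical variances and covariance, for illustration only.
var1, var2, cov12 = 0.04, 0.09, 0.01
x, y, lam = mvp_two_assets(var1, var2, cov12)
print(x, y, lam)
```

Both partial derivatives of f equal λ at the computed point, which is exactly the collinearity condition (3.3).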
The examples from the previous section have been considered on the plane.
It turns out that a similar approach can be used in higher dimensions, and
that we can consider more complicated constraints.
Our objective in this section is to show how to solve the following gen-
eral constrained minimisation problem:
Find
\[ \min f(v), \tag{3.8} \]
under the constraints: g(v) = 0,
where
\[ f : \mathbb{R}^n \to \mathbb{R}, \qquad g : \mathbb{R}^n \to \mathbb{R}^k. \]
We will provide necessary and, in the special case of quadratic forms, suf-
ficient conditions for a solution to this problem.
To keep better track of dimensions, we use a bold font whenever we are
dealing with vectors, and the normal font when dealing with numbers. Note
that in stating the problem above we used f for a function taking values in
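$\mathbb{R}$ and g for a function taking values in $\mathbb{R}^k$.
For the reader’s convenience we review some notations from multi-variable
calculus. We use the notation g'(v) to denote the k × n Jacobian matrix
\[ g'(v) = \begin{bmatrix} \frac{\partial g_1}{\partial x_1}(v) & \frac{\partial g_1}{\partial x_2}(v) & \cdots & \frac{\partial g_1}{\partial x_n}(v) \\ \frac{\partial g_2}{\partial x_1}(v) & \frac{\partial g_2}{\partial x_2}(v) & \cdots & \frac{\partial g_2}{\partial x_n}(v) \\ \vdots & \vdots & & \vdots \\ \frac{\partial g_k}{\partial x_1}(v) & \frac{\partial g_k}{\partial x_2}(v) & \cdots & \frac{\partial g_k}{\partial x_n}(v) \end{bmatrix}. \]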
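\[ H(f, v) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(v) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(v) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(v) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(v) & \frac{\partial^2 f}{\partial x_2 \partial x_2}(v) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(v) \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(v) & \frac{\partial^2 f}{\partial x_n \partial x_2}(v) & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n}(v) \end{bmatrix} \]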
Theorem 3.4
Assume that f : Rn → R is twice continuously differentiable, and that for
any v ∈ Rn the Hessian H( f, v) is a positive semidefinite matrix, meaning
that
\[ w^T H(f, v)\, w \ge 0, \tag{3.10} \]
g(v) = Av − c,
3.3 Proofs
Our proof of Theorem 3.3 depends on the implicit function theorem, which
is a classical result in analysis. We state this theorem without proof, after
introducing some notation.
For
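\[ g = (g_1, \dots, g_k) : \mathbb{R}^l \times \mathbb{R}^m \to \mathbb{R}^k \]
and $(x, y) \in \mathbb{R}^l \times \mathbb{R}^m$, $x = (x_1, \dots, x_l)$ and $y = (y_1, \dots, y_m)$ we write $\frac{\partial g}{\partial x}$
and $\frac{\partial g}{\partial y}$
for the k × l (resp. k × m) matrices
\[ \frac{\partial g}{\partial x}(x, y) = \begin{bmatrix} \frac{\partial g_1}{\partial x_1}(x, y) & \frac{\partial g_1}{\partial x_2}(x, y) & \cdots & \frac{\partial g_1}{\partial x_l}(x, y) \\ \vdots & \vdots & & \vdots \\ \frac{\partial g_k}{\partial x_1}(x, y) & \frac{\partial g_k}{\partial x_2}(x, y) & \cdots & \frac{\partial g_k}{\partial x_l}(x, y) \end{bmatrix}, \]
\[ \frac{\partial g}{\partial y}(x, y) = \begin{bmatrix} \frac{\partial g_1}{\partial y_1}(x, y) & \frac{\partial g_1}{\partial y_2}(x, y) & \cdots & \frac{\partial g_1}{\partial y_m}(x, y) \\ \vdots & \vdots & & \vdots \\ \frac{\partial g_k}{\partial y_1}(x, y) & \frac{\partial g_k}{\partial y_2}(x, y) & \cdots & \frac{\partial g_k}{\partial y_m}(x, y) \end{bmatrix}. \]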
Having developed the required mathematical tools, the tasks of finding the
minimum variance portfolio, minimum variance line and market portfo-
lio for portfolios of n risky assets can be cast as constrained minimisa-
tion problems whose solutions are provided by applying the method of La-
grange multipliers. Using simple linear algebra, the formulae for the min-
imum variance and market portfolios and the capital market line can be
shown to mirror those found for portfolios of two assets. The derivations
of these formulae will be preceded by an examination of the portfolios of
three assets in order to provide geometric intuition.
1 = (1, . . . , 1) ,
the constraint can conveniently be written as
wT 1 = 1. (4.1)
4.1 Risk and return 49
The attainable set is the set of all weight vectors w that satisfy this con-
straint.
If short-selling is not possible, the condition w j ≥ 0 is added to the
constraint, so in that case the attainable set becomes
{w : wT 1 = 1, w j ≥ 0 for all j ≤ n}.
Unless stated otherwise, we shall assume availability of short sales.
Alternatively a portfolio is described by the vector of positions taken in
particular components (numbers of units of assets)
x = (x1 , . . . , xn ).
We have the following relations between the weights, prices and the num-
bers of shares:
\[ w_j = \frac{x_j S_j(0)}{V(0)}, \qquad j = 1, \dots, n, \]
where x j is the number of shares of security j in the portfolio, Sj (0) is the
initial price of security j, and V(0) is the total money invested.
Denote the random returns on the securities by K1 , . . . , Kn , and the vec-
tor of expected returns by
µ = (µ1 , . . . , µn ),
with
µ j = E(K j ), for j = 1, . . . , n.
The covariances between returns will be denoted by σ jk = Cov(K j , Kk ), in
particular σ j j = σ2j = Var(K j ). These are the entries of the n×n covariance
matrix
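\[ C = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}. \]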
We write as before
\[ K_w = \sum_{j=1}^{n} w_j K_j. \]
50 Portfolios of multiple assets
Theorem 2.4 can easily be generalised.
Theorem 4.1
The expected return $\mu_w = E(K_w)$ and variance $\sigma_w^2 = \mathrm{Var}(K_w)$ of a portfolio
with weights w are given by
\[ \mu_w = w^T\mu, \]
\[ \sigma_w^2 = w^T C w. \]
Proof The formula for µw follows from the linearity of mathematical ex-
pectation:
\[ \mu_w = E(K_w) = E\Big(\sum_{j=1}^{n} w_j K_j\Big) = \sum_{j=1}^{n} w_j E(K_j) = \sum_{j=1}^{n} w_j \mu_j = w^T\mu. \]
= wTCw.
Exercise 4.2 Show that the covariance matrix is symmetric and pos-
itive semidefinite. (Recall that C is positive semidefinite if for any
x ∈ Rn , xTCx ≥ 0.) Does C have to be invertible?
Proposition 4.2
For any two portfolios
\[ w_A = (w_{A,1}, \dots, w_{A,n}), \]
\[ w_B = (w_{B,1}, \dots, w_{B,n}), \]
= wTACwB ,
as required.
Lemma 4.3
We have the following formulae for the gradients computed with respect to
w:
\[ \nabla(w^T\mu) = \mu, \tag{4.3} \]
\[ \nabla(w^T\mathbf{1}) = \mathbf{1}, \tag{4.4} \]
\[ \nabla(w^T C w) = 2Cw. \tag{4.5} \]
Proof Since
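\[ \frac{\partial}{\partial w_i}(w^T\mu) = \frac{\partial}{\partial w_i}(w_1\mu_1 + \cdots + w_n\mu_n) = \mu_i \]
we see that
\[ \nabla(w^T\mu) = \begin{bmatrix} \frac{\partial}{\partial w_1}(w^T\mu) \\ \vdots \\ \frac{\partial}{\partial w_n}(w^T\mu) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \mu, \]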
which proves (4.3).
The proof of (4.4) follows from an identical argument, using 1 instead
of µ.
To prove (4.5) we observe that in
\[ \frac{\partial}{\partial w_i}(w^T C w) = \frac{\partial}{\partial w_i}\sum_{j=1}^{n}\sum_{k=1}^{n} w_j w_k \sigma_{jk} \]
Corollary 4.5
For any portfolio w
\[ \mathrm{Cov}(K_w, K_{w_{\min}}) = \sigma^2_{w_{\min}}. \]
\[ \min\; w^T C w, \quad \text{subject to: } w^T\mu = m, \quad w^T\mathbf{1} = 1. \tag{4.12} \]
Theorem 4.6
Let M be a 2 × 2 matrix of the form
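\[ M = \begin{bmatrix} \mu^T C^{-1}\mu & \mu^T C^{-1}\mathbf{1} \\ \mu^T C^{-1}\mathbf{1} & \mathbf{1}^T C^{-1}\mathbf{1} \end{bmatrix}. \]
If C and M are invertible, then the solution of problem (4.12) is given by
\[ w = \frac{1}{\det(M)}\, C^{-1}\left(\det(M_1)\,\mu + \det(M_2)\,\mathbf{1}\right), \tag{4.13} \]
where
\[ M_1 = \begin{bmatrix} m & \mu^T C^{-1}\mathbf{1} \\ 1 & \mathbf{1}^T C^{-1}\mathbf{1} \end{bmatrix}, \qquad M_2 = \begin{bmatrix} \mu^T C^{-1}\mu & m \\ \mu^T C^{-1}\mathbf{1} & 1 \end{bmatrix}. \]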
Proof We introduce the Lagrange multiplier λ = (λ1 , λ2 ), and the La-
grangian
\[ L(w) = \nabla(w^T C w) - \lambda_1 \nabla(w^T\mu - m) - \lambda_2 \nabla(w^T\mathbf{1} - 1) = 0. \]
Using Lemma 4.3 we can compute
L(w) = 2Cw − λ1 µ − λ2 1 = 0.
We solve this system for w:
\[ w = \frac{1}{2}\lambda_1 C^{-1}\mu + \frac{1}{2}\lambda_2 C^{-1}\mathbf{1}. \tag{4.14} \]
Since wT µ = µT w and wT 1 = 1T w, substituting (4.14) into the constraints
from (4.12), we obtain a system of linear equations
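\[ \frac{1}{2}\lambda_1\, \mu^T C^{-1}\mu + \frac{1}{2}\lambda_2\, \mu^T C^{-1}\mathbf{1} = m, \]
\[ \frac{1}{2}\lambda_1\, \mathbf{1}^T C^{-1}\mu + \frac{1}{2}\lambda_2\, \mathbf{1}^T C^{-1}\mathbf{1} = 1. \]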
4.4 Minimum variance line 59
We can solve the above system for λ1 and λ2 to obtain (note the relevance
of the assumption that M is invertible, which ensures that det(M) ≠ 0)
\[ \frac{1}{2}\lambda_1 = \frac{\det(M_1)}{\det(M)}, \qquad \frac{1}{2}\lambda_2 = \frac{\det(M_2)}{\det(M)}. \]
Substituting the above back into (4.14) gives (4.13).
We have found a candidate for the solution of (4.12). By Lemma 4.3 we
know that the Hessian of $w^T C w$ is equal to 2C, which is a positive semidefi-
nite matrix. By Theorem 3.4 this ensures that we have found a global min-
imum.
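Formula (4.13) can be checked on a small example. The sketch below is an illustration only: the three-asset data are hypothetical, and the helpers `solve_linear`, `dot` and `min_variance_line_weights` are names introduced here. Instead of inverting C, it solves $Cu = \mu$ and $Cv = \mathbf{1}$ exactly, then assembles the entries and determinants of M, M1 and M2.

```python
from fractions import Fraction as F


def solve_linear(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_variance_line_weights(C, mu, m):
    """Weights from formula (4.13) for the target expected return m."""
    ones = [F(1)] * len(mu)
    Ci_mu = solve_linear(C, mu)      # C^{-1} mu
    Ci_one = solve_linear(C, ones)   # C^{-1} 1
    a, b, c = dot(mu, Ci_mu), dot(mu, Ci_one), dot(ones, Ci_one)
    det_M = a * c - b * b
    det_M1 = m * c - b               # first column of M replaced by (m, 1)
    det_M2 = a - m * b               # second column of M replaced by (m, 1)
    return [(det_M1 * u + det_M2 * v) / det_M for u, v in zip(Ci_mu, Ci_one)]


# Hypothetical data for three assets (illustration only).
mu = [F(8, 100), F(10, 100), F(12, 100)]
C = [[F(4, 100), F(1, 100), F(0)],
     [F(1, 100), F(9, 100), F(2, 100)],
     [F(0), F(2, 100), F(16, 100)]]
target = F(11, 100)
w = min_variance_line_weights(C, mu, target)
print([str(x) for x in w], sum(w), dot(w, mu))
```

With exact arithmetic the two constraints of (4.12) hold exactly: the weights sum to 1 and the expected return equals the target m.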
Corollary 4.7
There exist two vectors a and b, which depend only on C and µ, such that
for any real m the solution of the problem (4.12) is
w = ma + b.
Proof Since
Figure 4.7 Efficient frontier, together with the minimum variance portfolio
(MVP).
The efficient frontier, which is the set of all portfolios not dominated by
any other portfolios, consists of w = am + b for m ≥ µwmin (see Figure 4.7).
We now show that the whole minimum variance line can be found from
just two portfolios. This result is often referred to as the two-fund the-
orem, since it means that two efficient portfolios (with unequal returns)
suffice to establish an efficient investment policy.
Corollary 4.8
Suppose that w1 and w2 are two portfolios on the minimum variance line
with different expected returns: $\mu_{w_1} \ne \mu_{w_2}$. Then any portfolio w on the
minimum variance line can be obtained from these two, that is, there is a
real number α such that w = αw1 + (1 − α)w2 .
Proof We first find α so that
µw = αµw1 + (1 − α)µw2 .
This is possible since the returns are different:
\[ \alpha = \frac{\mu_w - \mu_{w_2}}{\mu_{w_1} - \mu_{w_2}}. \]
Since the two portfolios lie on the minimum variance line, they satisfy
w1 = µw1 a + b,
w2 = µw2 a + b.
From these relations we have
αw1 + (1 − α)w2 = (αµw1 + (1 − α)µw2 )a + b = µw a + b,
but w is also on the minimum variance line so w = µw a + b, hence the
result.
The minimum variance portfolio wmin lies on the minimum variance line.
We therefore already have a simple formula (4.7) for one of the two port-
folios needed to obtain the minimum variance line. The second portfolio
is the market portfolio, whose formula will be derived in the next section.
The resulting parameterisation of the minimum variance line will then be
written out in equation (4.18).
From Corollary 4.8 we obtain the following important observation.
Theorem 4.9
Suppose that there exist two portfolios w1 and w2 on the minimum variance
line with different expected returns: $\mu_{w_1} \ne \mu_{w_2}$. Then the minimum variance
line is a hyperbola centred on the vertical axis.
Proof Let Kw1 and Kw2 be the returns on portfolios w1 and w2 , respec-
tively. From Corollary 4.8 we know that any portfolio on the minimum
variance line can be expressed as
w = αw1 + (1 − α)w2 ,
hence its return is equal to
Kw = αKw1 + (1 − α)Kw2 .
We can treat each of the two portfolios as if it were a single security. Ap-
plying the results from Chapter 2 for portfolios consisting of two securities,
we know that
\[ \mu_w = \alpha\mu_{w_1} + (1 - \alpha)\mu_{w_2}, \]
\[ \sigma_w^2 = \alpha^2\sigma_{w_1}^2 + (1 - \alpha)^2\sigma_{w_2}^2 + 2\alpha(1 - \alpha)\,\mathrm{Cov}(K_{w_1}, K_{w_2}). \]
Figure 4.8 Minimum variance portfolio (MVP), the market portfolio (MP),
and the capital market line (CML).
Exercise 4.7 Consider the data from Exercise 4.6. Plot the mini-
mum variance line in the (w1 , w2 )-plane. Consider two portfolios cor-
responding to m = 10% and m = 20%. Find the variances of, as well
as the covariance between, their returns. Use these to plot the mini-
mum variance line in the (σ, µ) plane.
Exercise 4.8 Consider the data from Exercise 4.6. Find the weights
and the expected return of a portfolio on the minimum variance line
with σ2 = 0.007.
Theorem 4.10
If the risk-free return R is smaller than the expected return of the minimum
variance portfolio, then the market portfolio exists and is given by
\[ m = \frac{C^{-1}(\mu - R\mathbf{1})}{\mathbf{1}^T C^{-1}(\mu - R\mathbf{1})}. \tag{4.15} \]
Proof From Theorem 4.9 we know that the minimum variance line is
a hyperbola. Since its centre is on the vertical axis, there exists a single
tangency point for a half line emanating from (0, R), which maximises the
slope (see Figure 4.8). The slope in question is of the form
\[ \frac{\mu_w - R}{\sigma_w} = \frac{w^T\mu - R}{\sqrt{w^T C w}}, \]
where w are the weights of a portfolio and R is the risk-free rate of return.
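At the maximal slope the Lagrangian
\[ L(w) = \nabla\left(\frac{w^T\mu - R}{\sqrt{w^T C w}}\right) - \lambda\nabla(w^T\mathbf{1} - 1), \]
needs to be equal to zero. We can compute the gradients using Lemma 4.3
and equate them to zero:
\[ L(w) = \frac{\mu\sqrt{w^T C w} - (w^T\mu - R)\,\frac{1}{2\sqrt{w^T C w}}\,2Cw}{w^T C w} - \lambda\mathbf{1} = 0. \]
This yields
\[ \mu\sigma_w - (\mu_w - R)\frac{Cw}{\sigma_w} - \lambda\sigma_w^2\mathbf{1} = 0, \]
hence
\[ \frac{\mu_w - R}{\sigma_w^2}\, Cw = \mu - \lambda\sigma_w\mathbf{1}. \]
Multiplying by $w^T$ on the left and using the fact that $w^T\mathbf{1} = 1$ we get
\[ \frac{\mu_w - R}{\sigma_w^2}\, w^T C w = \mu_w - \lambda\sigma_w, \]
so
\[ \lambda = \frac{R}{\sigma_w}, \]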
therefore we have the equation
\[ \gamma Cw = \mu - R\mathbf{1}, \]
where $\gamma = \frac{\mu_w - R}{\sigma_w^2}$. Therefore
γw = C −1 (µ − R1). (4.16)
Even though we have w in the formula for γ, we show that γ turns out to
be a constant. This follows from multiplying the above equation by 1T on
both sides, which gives
γ = 1TC −1 (µ − R1).
By substituting γ into (4.16) we obtain our claim.
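For n = 2 the inverse of C can be written out by hand, so (4.15) is easy to evaluate and to compare with a direct search for the steepest line through (0, R). The sketch below does this under hypothetical inputs; the function names are illustrative, and the factor 1/det(C) is omitted because it cancels between the numerator and denominator of (4.15).

```python
import math


def market_portfolio_2(mu1, mu2, var1, var2, cov12, R):
    """Formula (4.15) for two assets, with C^{-1} written out explicitly."""
    e1, e2 = mu1 - R, mu2 - R          # components of mu - R*1
    # Components of C^{-1}(mu - R*1), up to the common factor 1/det(C).
    u1 = var2 * e1 - cov12 * e2
    u2 = var1 * e2 - cov12 * e1
    if u1 + u2 == 0:
        raise ValueError("denominator of (4.15) vanishes")
    return u1 / (u1 + u2), u2 / (u1 + u2)


def slope(w, mu1, mu2, var1, var2, cov12, R):
    """Slope (mu_w - R) / sigma_w for the portfolio (w, 1 - w)."""
    mu_w = w * mu1 + (1 - w) * mu2
    var_w = w * w * var1 + (1 - w) ** 2 * var2 + 2 * w * (1 - w) * cov12
    return (mu_w - R) / math.sqrt(var_w)


# Hypothetical parameters; R lies below the return of the minimum variance portfolio.
params = dict(mu1=0.10, mu2=0.15, var1=0.04, var2=0.09, cov12=0.01, R=0.05)
w1, w2 = market_portfolio_2(**params)
print(w1, w2, slope(w1, **params))
```

Moving the weight slightly away from the computed value lowers the slope, which is the maximality property used in the proof.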
Exercise 4.9 Prove that when R is equal to the expected return of the
minimum variance portfolio, then the formula for the market portfolio
results in a division by zero. Explain geometrically why this is so.
The line joining the risk-free security represented by (0, R) and the mar-
ket portfolio with coordinates (σm , µm ) is given by the equation
\[ \mu = R + \frac{\mu_m - R}{\sigma_m}\,\sigma. \tag{4.17} \]
It is called the capital market line, CML in brief. For a portfolio on CML
with risk σ the term $\frac{\mu_m - R}{\sigma_m}\sigma$ is called the risk premium, which is the
additional return above the risk-free level, representing a reward or
compensation for exposure to risk.
If all the investors agree on the values of the model parameters (the ex-
pected returns on the basic assets and the entries of the covariance matrix)
and if each investor chooses an optimal portfolio according to convex in-
difference curves on the basis of risk-return analysis, then all these optimal
portfolios are placed on the CML. Consequently, they should all invest in
just one risky portfolio, namely the market portfolio (combining it with
the risk-free asset in a preferred individual way). Consequently, the mar-
ket portfolio weights should represent the relative volumes of the values
of particular shares of stock with respect to the whole market (just as in
Chapter 2, where we discussed a simple market with just two ingredients).
Such a portfolio is represented in practice by the market index.
We now return to our discussion of the shape of the minimum variance
line. From Corollary 4.8 we know that this line can be constructed using
4.5 Market portfolio 65
Figure 4.9 Efficient frontier in the case of different rates for investing and
borrowing risk free.
The market portfolio exists when the return on the minimum variance
portfolio exceeds the risk-free return. The Capital Asset Pricing Model
(CAPM) provides a linear relationship between the expected return µm on
the market portfolio and that of any risky asset. The two are linked by
means of a parameter, commonly known as the beta (β), providing a
measure of undiversifiable risk of an asset. In this chapter we explore this
relationship and show how the CAPM formula can assist investment decisions
and introduce measures of portfolio performance.
Paradoxically, although we use variance to quantify risk, in assessing
portfolio risk the variances of the assets in the portfolio turn out to be less
relevant than their mutual covariances. To demonstrate this, let us consider
the following example.
Example 5.1
Suppose that the weights of a portfolio are of the form $w_j = \frac{1}{n}$, j ≤ n,
where n is the number of assets in the portfolio. We investigate the risk of
this portfolio in terms of its dependence on n. Assume that the variances of
all securities on the market are uniformly bounded, $\sigma_j^2 \le L$. Then
\[ \sigma_w^2 = \sum_{j,k=1}^{n} w_j w_k \sigma_{jk} = \sum_{j=1}^{n} w_j^2\sigma_j^2 + \sum_{j \ne k} w_j w_k \sigma_{jk} \le n\,\frac{1}{n^2}\,L + \frac{1}{n^2}\sum_{j \ne k}\sigma_{jk}. \]
68 The Capital Asset Pricing Model
Assume further that the off-diagonal elements of the covariance matrix are
uniformly bounded, |σ jk | ≤ c, for some c > 0. Then
\[ \sigma_w^2 \le \frac{L}{n} + \frac{1}{n^2}\,n(n - 1)\,c. \]
The upper bound converges to c as n → ∞. Hence the risk of a portfolio
containing many assets is determined by the covariances. The variances of
the ingredients become irrelevant for large n.
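The behaviour of the bound is easy to tabulate. The sketch below takes the extreme case in which every variance equals L and every covariance equals c, so that the bound of Example 5.1 is attained; the values of L and c are hypothetical.

```python
def equal_weight_variance(n, L, c):
    """Variance of the portfolio w_j = 1/n when all variances are L and all covariances are c."""
    diagonal = n * (1.0 / n**2) * L                 # contribution of the n variances
    off_diagonal = n * (n - 1) * (1.0 / n**2) * c   # contribution of the n(n-1) covariances
    return diagonal + off_diagonal


# Hypothetical bounds, for illustration only.
L, c = 0.09, 0.02
for n in (1, 10, 100, 1000, 10**6):
    print(n, equal_weight_variance(n, L, c))
```

For n = 1 the value is L, and as n grows it approaches c, in line with the limit stated in the example.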
Figure 5.1 Lack of tangency for portfolios built out of a security and the
market portfolio, leads to portfolios with higher slope than that of the market
portfolio.
minimal variance portfolio (so that the market portfolio m exists). Then,
for each i ≤ n, the expected return µi of the i-th asset in the portfolio is
given by the formula
µi = R + βi (µm − R). (5.1)
Proof As we know, the capital market line is tangent to the minimum vari-
ance line at the market portfolio point (σm , µm ) (see Figure 4.8). Consider
all portfolios built by means of the market portfolio and the i-th security.
They form a hyperbola which we claim to be tangent to the capital market
line at (σm , µm ). Suppose that, on the contrary, this hyperbola intersects the
CML. This clearly contradicts the fact that the slope of CML is maximal,
see Figure 5.1.
We compute the slope of the tangent line to the hyperbola at (σm , µm )
and then we will use the fact that the slope of CML is the same. Denote
the proportion of wealth invested in security i by x and that invested in the
market portfolio by 1 − x. We use x to denote the portfolio x = (x, 1 − x).
The risk and return are of the form
\[ \mu_x = x\mu_i + (1 - x)\mu_m, \]
\[ \sigma_x = \sqrt{x^2\sigma_i^2 + (1 - x)^2\sigma_m^2 + 2x(1 - x)\,\mathrm{Cov}(K_i, K_m)}, \]
Figure 5.2 Security market line (SML) and the capital market line (CML).
MP is the market portfolio.
Formula (5.5) can be used to estimate the beta factor of a security, based
on historical data.
6.1 Basic notions and axioms 77
{ω1 , . . . , ωN }, with
P({ωi }) = pi > 0.
The prices of securities are denoted by Sj (0), the initial prices, and
Sj (1, ωi ) = Sj (1)(ωi ),
the prices at the end of the period, which depend on the state. Portfolios will
be described by the numbers x j of securities held. A portfolio is represented
by a vector x = (x1 , . . . , xn ). We denote the initial wealth of the investor by
V, so the formation of a portfolio is subject to the bound
\[ \sum_{j=1}^{n} x_j S_j(0) = V. \]
Thus we assume that each individual can always decide which of two
given positions he prefers.
If Axioms 1 and 2 are satisfied, we call ⪯ a preference relation. In
practice, a preference relation may be difficult to specify. An alternative
approach is based on employing a so-called utility.
Definition 6.1
A function U : RN → R is called a utility if it is strictly increasing with
respect to each variable, differentiable and strictly concave.
Using a utility U we can define the relation
\[ X \preceq_U Y \quad \text{if and only if} \quad U(X) \le U(Y). \]
Definition 6.2
We say that u : R → R is a utility function if it is strictly increasing,
differentiable and strictly concave.
Proposition 6.3
If u : R → R is a utility function, then U defined by
U(X) = E(u(X))
is a utility.
= λU(X) + (1 − λ)U(Y),
which means that U is strictly concave.
Definition 6.4
We say that a utility U is a von Neumann–Morgenstern utility if there
exists a utility function u such that
U(X) = E(u(X)).
The crucial feature of a von Neumann–Morgenstern utility is that it is
determined by a single-variable function u.
Example 6.5
Typical examples of utility functions are as follows:
(i) Exponential: $u(x) = -e^{-ax}$;
(ii) Logarithmic: $u(x) = \ln x$;
(iii) Power: $u(x) = ax^a$ for a ≤ 1;
(iv) Quadratic: $u(x) = x - \frac{1}{2}bx^2$ (which is increasing only for $x < \frac{1}{b}$).
Exercise 6.3 Verify that the functions from Example 6.5 satisfy the
conditions of Definition 6.2.
This shows that we can only have $V_{x_k}(1) \to \infty$ when $\|x_k\| \to \infty$. The
sequence $z_k = \frac{x_k}{\|x_k\|}$ is bounded, hence has a subsequence convergent to
a limit z. We show that z is an arbitrage opportunity, which provides the
contradiction we seek. First,
\[ V_{z_k}(0) = \sum_{j=1}^{n} (z_k)_j S_j(0) = \frac{1}{\|x_k\|}\sum_{j=1}^{n} (x_k)_j S_j(0) = \frac{V}{\|x_k\|} \to 0, \]
by the definition of FCS , and this inequality is preserved in the limit, giv-
ing
Vz (1, ωi ) ≥ 0. (6.5)
Since S(1) is one-to-one, if we had S(1)z = 0, then z would need to be
equal to zero. This is not possible since $\|z\| = 1$, hence
\[ V_z(1) = S(1)z \ne 0. \]
Combined with (6.5), this means that Vz (1, ωi ) > 0 for some ωi ∈ Ω,
showing that z is an arbitrage opportunity.
We now turn to the question of the relation between the security prices
at time 0 and 1.
Definition 6.8
We say that π = (π1 , . . . , πN ) is a vector of state prices, if πi > 0 for
i = 1, . . . , N, and
\[ S_j(0) = \sum_{i=1}^{N} \pi_i S_j(1, \omega_i). \tag{6.6} \]
6.2 Utility maximisation 83
Condition (6.6) can be written in matrix notation as
S(0) = πT S(1). (6.7)
We have the following relation linking the value of a strategy with state
prices.
Lemma 6.9
For any x ∈ Rn
\[ V_x(0) = \sum_{i=1}^{N} \pi_i V_x(1, \omega_i). \]
Let (Vx (1))i denote the i-th coordinate of the N-dimensional vector Vx (1).
Using the chain rule we obtain
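\[ \begin{aligned} \frac{\partial f}{\partial x_j}(x) &= \frac{\partial}{\partial x_j} U(V_x(1)) \\ &= \sum_{i=1}^{N} \frac{\partial U}{\partial X_i}(V_x(1))\, \frac{\partial}{\partial x_j}(V_x(1))_i \\ &= \sum_{i=1}^{N} \frac{\partial U}{\partial X_i}(V_x(1))\, \frac{\partial}{\partial x_j}\Big(\sum_{k=1}^{n} x_k S_k(1, \omega_i)\Big) \\ &= \sum_{i=1}^{N} \frac{\partial U}{\partial X_i}(V_x(1))\, S_j(1, \omega_i), \end{aligned} \]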
Taking $\lambda = \frac{1}{\alpha}$ and looking at the j-th coordinate of (6.11) gives
\[ S_j(0) = \lambda \sum_{i=1}^{N} \frac{\partial U}{\partial X_i}(X^*)\, S_j(1, \omega_i). \]
so
\[ \frac{\partial U}{\partial X_i}(X_1, \dots, X_N) = u'(X_i)\, p_i, \]
hence
\[ \frac{\partial U}{\partial X_i}(X^*) = u'(X^*(\omega_i))\, p_i, \]
and combined with (6.10) this implies the claim.
Theorem 6.12 can be used to find the solution of the optimisation prob-
lem. We focus on the particular case of expected utility U(X) = E(u(X)).
Theorem 6.14
Assume that U(X) = E(u(X)). If $X^* = (X_1^*, \dots, X_N^*)$ is a solution of the
problem (6.3), then, with $(u')^{-1}$ denoting the inverse function of u', we obtain
\[ X_i^* = (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right), \tag{6.12} \]
where λ is determined by the condition
\[ V = \sum_{i=1}^{N} \pi_i\, (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right). \tag{6.13} \]
Example 6.15
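In this example we consider the case of a logarithmic utility function u(x) =
ln(x). Then $u'(x) = \frac{1}{x}$ and $(u')^{-1}(y) = \frac{1}{y}$. By (6.12) this gives
\[ X^*(\omega_i) = (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right) = \frac{\lambda p_i}{\pi_i}, \tag{6.14} \]
and this λ is determined by (6.13) so that
\[ V = \sum_{i=1}^{N} \pi_i\, (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right) = \sum_{i=1}^{N} \pi_i\, \frac{\lambda p_i}{\pi_i} = \lambda. \tag{6.15} \]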
We consider a trinomial model with a single risky security with today’s
price S(0) = 100 and future prices
\[ S(1) = \begin{cases} S^u = S(0)(1 + u) & \text{with probability } \frac{1}{4}, \\ S^m = S(0)(1 + m) & \text{with probability } \frac{1}{2}, \\ S^d = S(0)(1 + d) & \text{with probability } \frac{1}{4}, \end{cases} \]
It appears that, out of the above, the X∗ for x = 0.1 and x = 0.4 have the
highest expected utility. However, the X∗ associated with x = 0.1 and with
x = 0.4 are not attainable by means of a portfolio. Only X∗ for x = 0.25 is
attainable, by investing V risk free.
We see therefore that not all solutions of (6.12)–(6.13) need to be solu-
tions of the optimisation problem. In fact, only the solution with the small-
est expected utility turns out to be feasible (see Figure 6.1).
Lemma 6.16
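If $X_\pi^* = (X_{\pi,1}^*, \dots, X_{\pi,N}^*) \in X(\pi)$ is a solution of
\[ \max\{E(u(X)) : X \in X(\pi)\} \]
then there exists a λ such that
\[ X_{\pi,i}^* = (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right), \]
\[ V = \sum_{i=1}^{N} \pi_i\, (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right). \]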
Proof The claim follows from the method of Lagrange multipliers (The-
orem 3.3), taking
\[ f(X_1, \dots, X_N) = \sum_{i=1}^{N} p_i\, u(X_i) \]
and
\[ g(X_1, \dots, X_N) = \sum_{i=1}^{N} \pi_i X_i - V, \]
Theorem 6.17
Assume that U(X) = E(u(X)). Let Π denote the set of all state price vec-
tors. If the model admits a strictly positive solution X ∗ of the optimisation
problem (6.3), then
\[ E(u(X^*)) = \min_{\pi \in \Pi} E(u(X_\pi^*)). \]
and therefore
\[ E(u(X^*)) \le \min_{\pi \in \Pi} E(u(X_\pi^*)). \]
Theorem 6.17 gives the following recipe for finding the optimal solution:
• find the family of state price vectors Π;
• using (6.12)–(6.13) for each π ∈ Π compute Xπ∗ ;
• the Xπ∗ with the smallest expected utility is the candidate for the solution.
In an arbitrage free and complete model, state prices are unique, in which
case finding the optimal solution turns out to be straightforward. In our
setting the model is complete if the matrix S(1) defined in (6.1) is square
(i.e. n = N) and invertible. Then, from (6.7), we obtain the formula for the
state price vector
πT = S(0) (S(1))−1 . (6.17)
By Theorem 6.7 we know that the solution to the optimisation problem ex-
ists. The state price vector π is uniquely determined, meaning that (6.12)–
(6.13) admits a unique solution X ∗ , which is the solution of the optimisation
problem. Let us denote by x∗ the strategy which gives the optimal utility,
X ∗ = Vx∗ (1).
Using (6.2) we can compute
x∗ = (S(1))−1 X ∗ . (6.18)
Example 6.18
As in Example 6.15, let us consider the problem of maximising the ex-
pected logarithmic utility. In addition to the risk-free investment and the
risky asset from Example 6.15, let us also consider a second risky asset.
We assume that
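\[ S(0) = \begin{bmatrix} 1 & 100 & 200 \end{bmatrix}, \]
\[ S(1) = \begin{bmatrix} 1 & 110 & 200 \\ 1 & 100 & 220 \\ 1 & 90 & 180 \end{bmatrix}. \]
The state prices can be computed as
\[ \pi^T = S(0)\,(S(1))^{-1} = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}. \]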
Let us assume that we invest V = 100. From the state prices we can compute
the optimal consumption X∗ using (6.14)–(6.15), and then the optimal
strategy x∗ using (6.18). We obtain
\[ X^* = \begin{bmatrix} 75 \\ 150 \\ 75 \end{bmatrix}, \qquad x^* = \begin{bmatrix} -150 \\ -2.5 \\ 2.5 \end{bmatrix}. \]
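The numbers in Example 6.18 can be reproduced step by step. The sketch below is a minimal illustration in exact arithmetic; it assumes the state probabilities 1/4, 1/2, 1/4 of the trinomial model of Example 6.15, and the helpers `solve_linear` and `transpose` are names introduced here, not taken from the text.

```python
from fractions import Fraction as F


def solve_linear(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


def transpose(A):
    return [list(col) for col in zip(*A)]


S0 = [1, 100, 200]                 # time-0 prices: risk-free asset and two risky assets
S1 = [[1, 110, 200],               # time-1 prices, one row per state
      [1, 100, 220],
      [1, 90, 180]]
p = [F(1, 4), F(1, 2), F(1, 4)]    # assumed state probabilities (Example 6.15)
V = F(100)

# State prices: pi^T S(1) = S(0) is equivalent to S(1)^T pi = S(0)^T, see (6.7).
pi = solve_linear(transpose(S1), S0)
# Logarithmic utility: lambda = V by (6.15), hence X*_i = V p_i / pi_i by (6.14).
X_star = [V * p_i / pi_i for p_i, pi_i in zip(p, pi)]
# Optimal strategy: S(1) x* = X*, see (6.18).
x_star = solve_linear(S1, X_star)

print([str(v) for v in pi])        # ['1/3', '1/3', '1/3']
print([str(v) for v in X_star])    # ['75', '150', '75']
print([str(v) for v in x_star])    # ['-150', '-5/2', '5/2']
```

The strategy costs $-150 + (-2.5)\cdot 100 + 2.5\cdot 200 = 100 = V$ at time 0, as required by the budget constraint.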
This is the total wealth of the investors in the market at times 0 and 1. We
denote the market return by
\[ K_m = \frac{M(1) - M(0)}{M(0)}, \tag{6.20} \]
and the risk-free return by R.
Theorem 6.19
Assume that M(0) ≠ 0 and $\mathrm{Var}(K_m) \ne 0$. Then the expected return on each
asset satisfies
E(K j ) = R + β j (E(Km ) − R) ,
for j = 1, . . . , n, where
\[ \beta_j = \frac{\mathrm{Cov}(K_j, K_m)}{\mathrm{Var}(K_m)}. \]
Proof Let the risk-free asset be designated by index j = 1, so that K1 = R.
For an investor with initial wealth V and portfolio x we have
\[ \begin{aligned} V_x(1) &= V(1 + K_w) \\ &= V\Big(1 + \sum_{j=1}^{n} w_j K_j\Big) \\ &= V\Big(1 + \Big(1 - \sum_{j=2}^{n} w_j\Big)R + \sum_{j=2}^{n} w_j K_j\Big). \end{aligned} \tag{6.21} \]
If $x_l^*$ is the optimal portfolio for investor l, and the initial wealth of this
investor is $V_l = V_{x_l^*}(0)$, then by (6.21), for j = 2, . . . , n, the first-order
conditions for a maximum give
\[ 0 = \frac{\partial}{\partial w_j} E\left[u_l\left(V_{x_l^*}(1)\right)\right] = V_l\, E\left[u_l'\left(V_{x_l^*}(1)\right)(K_j - R)\right]. \tag{6.22} \]
We use the relation Cov(X, Y) = E [XY] − E [X] E [Y], which holds for any
random variables X, Y, as Ω is finite:
\[ \mathrm{Cov}\left(u_l'(V_{x_l^*}(1)),\, K_j - R\right) = E\left[u_l'(V_{x_l^*}(1))(K_j - R)\right] - E\left[u_l'(V_{x_l^*}(1))\right] E\left[K_j - R\right]. \]
94 Utility functions
Comparing with (6.22), it follows that
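\[ E\left[u_l'(V_{x_l^*}(1))\right] E\left[K_j - R\right] = -\mathrm{Cov}\left(u_l'(V_{x_l^*}(1)),\, K_j - R\right). \]
hence
\[ \left(\frac{a_l}{b_l} - E\left[V_{x_l^*}(1)\right]\right) E\left[K_j - R\right] = \mathrm{Cov}\left(V_{x_l^*}(1),\, K_j\right). \]
Taking
\[ c = \sum_{l=1}^{L}\left(\frac{a_l}{b_l} - E\left[V_{x_l^*}(1)\right]\right), \]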
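\[ \begin{aligned} c\,(E[K_m] - R) &= c\,(E[m_1K_1 + \cdots + m_nK_n] - R) \\ &= \sum_{j=1}^{n} c\,m_j\, E[K_j - R] \\ &= \sum_{j=1}^{n} m_j M(0)\,\mathrm{Cov}(K_m, K_j) \quad \text{(by (6.23))} \\ &= M(0)\,\mathrm{Cov}(K_m, K_m) \\ &= M(0)\,\mathrm{Var}(K_m). \end{aligned} \]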
Let us observe that since M(0) ≠ 0 and $\mathrm{Var}(K_m) \ne 0$, the above equality
implies that c ≠ 0. As a result, combining the above with (6.23),
\[ \frac{E[K_j - R]}{E[K_m] - R} = \frac{\mathrm{Cov}(K_m, K_j)}{\mathrm{Var}(K_m)} = \beta_j, \]
which completes the proof.
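On a finite probability space the quantity $\beta_j$ of Theorem 6.19 is a finite sum and can be evaluated directly. The sketch below illustrates only the definition of beta, not the equilibrium argument of the proof; the state probabilities and returns are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction as F


def expectation(values, probs):
    return sum(p * v for v, p in zip(values, probs))


def covariance(xs, ys, probs):
    ex, ey = expectation(xs, probs), expectation(ys, probs)
    return sum(p * (x - ex) * (y - ey) for x, y, p in zip(xs, ys, probs))


def beta(asset_returns, market_returns, probs):
    """beta_j = Cov(K_j, K_m) / Var(K_m) on a finite probability space."""
    var_m = covariance(market_returns, market_returns, probs)
    if var_m == 0:
        raise ValueError("Var(K_m) = 0, beta is undefined")
    return covariance(asset_returns, market_returns, probs) / var_m


# Hypothetical three-state model (illustration only).
probs = [F(1, 4), F(1, 2), F(1, 4)]
K_m = [F(10, 100), F(2, 100), F(-6, 100)]
K_j = [2 * k + F(1, 100) for k in K_m]   # an asset moving twice as much as the market

print(beta(K_m, K_m, probs))  # 1
print(beta(K_j, K_m, probs))  # 2
```

The market itself has beta 1, and an asset whose return is an affine function of $K_m$ with slope 2 has beta 2.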
6.4 Risk aversion 95
Above we have shown that we can connect the mean-variance criterion
for optimality of portfolios with the optimal expected utility if we assume
that investors use quadratic utility functions. However, an arbitrary utility
function can be approximated by a quadratic utility, if we consider its first
three Taylor terms. Thus the CAPM theorem can be considered as an ap-
proximation for the optimal portfolio choice for arbitrary utility functions.
P(X = Xi ) = pi .
Taking the second-order Taylor expansion at Xi of u around m = E(X) we
obtain
\[ u(X_i) \approx u(m) + u'(m)(X_i - m) + \frac{1}{2}u''(m)(X_i - m)^2. \]
Multiplying by $p_i$ and summing we get
\[ \begin{aligned} E(u(X)) &\approx u(m) + u'(m)E(X - m) + \frac{1}{2}u''(m)E(X - m)^2 \\ &= u(m) + \frac{1}{2}u''(m)\mathrm{Var}(X). \end{aligned} \tag{6.24} \]
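The quality of the approximation (6.24) can be examined numerically. The sketch below compares the exact expected utility with the second-order approximation for a logarithmic utility; the three-point distribution is hypothetical and the function names are illustrative.

```python
import math


def expected_utility(u, outcomes, probs):
    return sum(p * u(x) for x, p in zip(outcomes, probs))


def quadratic_approximation(u, d2u, outcomes, probs):
    """Right-hand side of (6.24): u(m) + u''(m) Var(X) / 2, with m = E(X)."""
    m = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - m) ** 2 for x, p in zip(outcomes, probs))
    return u(m) + 0.5 * d2u(m) * var


# Hypothetical distribution of X (illustration only).
outcomes = [95.0, 100.0, 105.0]
probs = [0.25, 0.5, 0.25]

exact = expected_utility(math.log, outcomes, probs)
approx = quadratic_approximation(math.log, lambda x: -1.0 / x**2, outcomes, probs)
print(exact, approx, abs(exact - approx))
```

For this narrow distribution the two values agree closely; the error grows when the outcomes are spread further from m, since higher-order Taylor terms are neglected.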
Taking the first-order Taylor expansion of u at m − γ(X) around m gives
Example 6.20
Assume that an investor has an exponential utility $u(x) = -e^{-ax}$. Then
\[ u'(x) = ae^{-ax}, \qquad u''(x) = -a^2e^{-ax}, \]
which means that the absolute risk aversion coefficient is constant
\[ \mathrm{ARA} = -\frac{u''(E(X))}{u'(E(X))} = a. \]
The certainty equivalent of X is then
\[ E(X) - \gamma(X) = V\left(\mu - \frac{aV}{2}\sigma^2\right) + V. \tag{6.28} \]
2
This yields the same type of indifference curve as considered in Example
2.13.
7.1 Quantiles
7.2 Measuring downside risk
7.3 Computing VaR: examples
7.4 VaR in the Black–Scholes model
7.5 Proofs
about the potential impact of extreme (i.e. highly unlikely) events. In this
chapter we explore this popular risk measure. Our focus is on its compu-
tation, for discrete, continuous and mixed distributions, and this will high-
light a further defect, showing that VaR for a diversified position can be
higher than for investment in a single asset.
In the final section we give a detailed analysis, in a Black–Scholes con-
text, of hedging to minimise VaR with the judicious use of European put
options.
7.1 Quantiles
An investor holding an asset whose future value is uncertain may wish to
determine whether his discounted gain X on an investment has at least 95%
probability of remaining above a certain (usually negative) level. Value at
Risk at 5% answers this question by specifying the minimum loss incurred
in the worst 5% of possible outcomes. Its calculation is therefore closely
tied to the values of the distribution function F X of X. This leads us to
examine the so-called quantiles of F X more closely.
We begin with a simple example.
Example 7.1
Consider a two step binomial model with stock prices
                  121
                /
          110
        /       \
  100             99
        \       /
           90
                \
                   81
Assume that the probability p of the price going up in a single step is
p = 0.8. In this example we neglect the time value of money and compute
the gain after the second step of buying a single share of stock as
X = S (2) − S (0),
100 Value at Risk
Figure 7.1 The upper and lower quantiles for various distribution functions.
which gives
\[ X = \begin{cases} 21 & \text{with probability } p^2 = 0.64, \\ -1 & \text{with probability } 2p(1 - p) = 0.32, \\ -19 & \text{with probability } (1 - p)^2 = 0.04. \end{cases} \]
We can see that the probability that our investment will lead to a loss
L = −X < 19 is
P(L < 19) = P(X > −19) = 0.96.
This means that with probability 96% we will lose no more than 1. If
we agree, for instance, to ignore the worst 5% of potential outcomes, our
‘worst-case scenario’ would be a loss of 1. However, if we are only willing
to exclude the worst 2.5%, for example, the loss of 19 should be taken into
account.
Figure 7.2 The plot of the distribution function from Example 7.1.
Definition 7.2
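For α ∈ (0, 1) the number
$$q^\alpha(X) = \inf\{x : \alpha < F_X(x)\} \tag{7.1}$$
is called the upper α-quantile of X, the number
$$q_\alpha(X) = \inf\{x : \alpha \le F_X(x)\}$$
is called the lower α-quantile of X, and any $q \in [q_\alpha(X), q^\alpha(X)]$ is called an α-quantile of X.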
The definition is best understood when looking at the graph of the cu-
mulative distribution function. In Figure 7.1 we can see that the upper and
the lower quantiles differ when the plot of F X (x) becomes flat at the value
F X (x) = α, otherwise they are equal.
Example 7.3
For X from Example 7.1 we can compute the upper and the lower α-
quantiles, for α ∈ {0.025, 0.04, 0.1}, as (see Figure 7.2)
$$q^{0.025}(X) = -19, \qquad q_{0.025}(X) = -19,$$
$$q^{0.04}(X) = -1, \qquad q_{0.04}(X) = -19,$$
$$q^{0.1}(X) = -1, \qquad q_{0.1}(X) = -1.$$
We list some basic properties of quantiles. The proofs are all elementary,
but we defer the more technical parts to the end of the chapter to avoid
disturbing the flow of development.
Proposition 7.4
Let X, Y be random variables.
(i) $X \ge Y$ implies $q^\alpha(X) \ge q^\alpha(Y)$.
(ii) For any $b \in \mathbb{R}$, $q^\alpha(X + b) = q^\alpha(X) + b$.
(iii) For $b > 0$, $q^\alpha(bX) = bq^\alpha(X)$.
(iv) $q^\alpha(-X) = -q_{1-\alpha}(X)$.
Proof See page 120.
Lemma 7.5
If $F_X(x)$ is continuous and strictly increasing then
$$q^\alpha(X) = F_X^{-1}(\alpha).$$
Proof The given conditions on $F_X$ ensure that it is invertible, the inverse function $\alpha \mapsto F_X^{-1}(\alpha)$ is continuous, and $\alpha < F_X(x)$ is equivalent to $F_X^{-1}(\alpha) < x$. This gives
$$q^\alpha(X) = \inf\{x : \alpha < F_X(x)\} = \inf\{x : F_X^{-1}(\alpha) < x\} = F_X^{-1}(\alpha),$$
which concludes our proof.
Lemma 7.6
Let X be a random variable. If f : R → R is right-continuous and non-
decreasing then
qα ( f (X)) = f (qα (X)).
Proof See page 122.
Definition 7.7
For α in (0, 1), we define the Value at Risk (VaR) of X, at confidence level
1 − α, as (see Figure 7.3)
$$\mathrm{VaR}^\alpha(X) = -q^\alpha(X).$$
Example 7.8
Let X be as in Example 7.1. By looking at the distribution function F X (x)
(see Figure 7.2) we can see that
VaR0.04 (X) = 1,
VaR0.025 (X) = 19.
In loose terms, this means that the probability of the loss exceeding VaRα
is no greater than α. In other words, at confidence level 1 − α, our loss is
no worse than VaRα .
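The quantiles of Example 7.3 and the VaR values of Example 7.8 can be checked mechanically. The following is a minimal sketch, not part of the original text: the function names and the dictionary representation of a discrete distribution are illustrative assumptions, and exact fractions are used so that the boundary case α = 0.04 = F_X(−19) is not disturbed by rounding.

```python
from fractions import Fraction as Fr

def upper_quantile(dist, alpha):
    """inf{x : alpha < F_X(x)} for a discrete law given as {value: probability}."""
    cum = Fr(0)
    for x in sorted(dist):
        cum += dist[x]
        if alpha < cum:
            return x
    raise ValueError("alpha must lie in (0, 1)")

def lower_quantile(dist, alpha):
    """inf{x : alpha <= F_X(x)} for the same representation."""
    cum = Fr(0)
    for x in sorted(dist):
        cum += dist[x]
        if alpha <= cum:
            return x
    raise ValueError("alpha must lie in (0, 1)")

def value_at_risk(dist, alpha):
    """VaR at level alpha, taken as minus the upper alpha-quantile."""
    return -upper_quantile(dist, alpha)

# Gain X from Example 7.1 (two-step binomial model, p = 0.8).
p = Fr(8, 10)
X = {21: p**2, -1: 2 * p * (1 - p), -19: (1 - p) ** 2}

for a in (Fr(25, 1000), Fr(4, 100), Fr(1, 10)):
    print(float(a), upper_quantile(X, a), lower_quantile(X, a), value_at_risk(X, a))
```

The two quantiles differ only at α = 0.04, which is exactly the level at which the distribution function is flat.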
Simple algebraic properties of VaR follow from those we proved for the
upper quantile:
Proposition 7.9
Let X, Y be random variables.
(i) X ≥ Y implies VaRα (X) ≤ VaRα (Y),
(ii) For any a ∈ R, VaRα (X + a) = VaRα (X) − a,
(iii) For any a ≥ 0, VaRα (aX) = aVaRα (X).
Proof The proof follows from the properties of quantiles proved in Propo-
sition 7.4, and is left as an exercise.
Example 7.10
Suppose that we invest V(0) risk-free. Then $V(T) = e^{rT}V(0)$, giving
$$X = e^{-rT}V(T) - V(0) = 0.$$
The distribution function of X is then
$$F_X(x) = \begin{cases} 1 & \text{for } x \ge 0,\\ 0 & \text{for } x < 0. \end{cases}$$
For any α ∈ (0, 1), $q^\alpha(X) = 0$, which gives
$$\mathrm{VaR}^\alpha(X) = -q^\alpha(X) = 0.$$
Example 7.11
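Consider
$$X = \begin{cases} -20 & \text{with probability } 0.025,\\ -10 & \text{with probability } 0.025, \end{cases} \tag{7.3}$$
and P(X > 0) = 0.95. For x < 0
$$F_X(x) = \begin{cases} 0 & x \in (-\infty, -20),\\ 0.025 & x \in [-20, -10),\\ 0.05 & x \in [-10, 0). \end{cases}$$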
Taking α = 0.05 we have
VaR0.05 (X) = −q0.05 (X) = 10.
For any α < 0.05,
VaRα (X) = −qα (X) = 20,
which demonstrates that VaRα can be sensitive to the choice of α.
Let us now change the value −20 in (7.3) to −2000. The VaR0.05 still
remains equal to 10! This illustrates that VaR does not take into consider-
ation unlikely events (i.e. with probability below the chosen threshold α),
whatever the severity of their outcome. This is an undesirable feature in a
risk measure.
Example 7.12
Consider two independent investments $X_1$, $X_2$ with gains
$$X_i = \begin{cases} 0 & \text{with probability } p,\\ 1 & \text{with probability } 1 - p, \end{cases}$$
for i = 1, 2. We can think of these as corporate bonds with the same price
and maturity date, of two independent companies that each have a probability
of default with zero recovery equal to p.
If p < α then
VaRα (X1 ) = VaRα (X2 ) = 0.
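If, instead, we buy half a unit of each of the two bonds, then our gain will
be equal to
$$\tfrac12 X_1 + \tfrac12 X_2 = \begin{cases} 0 & \text{with probability } p^2,\\ \tfrac12 & \text{with probability } 2p(1-p),\\ 1 & \text{with probability } (1-p)^2. \end{cases}$$
If we choose $\alpha \in (p, p^2 + 2p(1-p))$ then
$$F_{\frac12 X_1 + \frac12 X_2}\left(\tfrac12\right) = p^2 + 2p(1-p) > \alpha,$$
hence
$$\mathrm{VaR}^\alpha\left(\tfrac12 X_1 + \tfrac12 X_2\right) = \tfrac12.$$
We can see that
$$\mathrm{VaR}^\alpha\left(\tfrac12 X_1 + \tfrac12 X_2\right) > \max\{\mathrm{VaR}^\alpha(X_1), \mathrm{VaR}^\alpha(X_2)\},$$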
which means that the risk of a diversified position, as measured by VaR,
is greater than the risk of investing all our funds in a single bond. This
runs counter to the principle that diversification should reduce risk, and
therefore illustrates a second serious drawback in using VaR to measure
risk. In the next chapter we will consider risk measures designed to remedy
these defects.
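The failure of diversification under VaR can be reproduced numerically. The sketch below is illustrative only and rests on two assumptions that are not in the text: the values p = 0.03 and α = 0.05, and the encoding of each bond's gain as −1 on default and 0 otherwise, chosen so that losses are negative gains and VaR can be computed directly as minus the upper quantile.

```python
from fractions import Fraction as Fr
from itertools import product

def upper_quantile(dist, alpha):
    """inf{x : alpha < F(x)} for a discrete law given as {value: probability}."""
    cum = Fr(0)
    for x in sorted(dist):
        cum += dist[x]
        if alpha < cum:
            return x
    raise ValueError("alpha must lie in (0, 1)")

def value_at_risk(dist, alpha):
    return -upper_quantile(dist, alpha)

p, alpha = Fr(3, 100), Fr(5, 100)          # hypothetical default probability and level

# Assumed encoding: gain -1 if the issuer defaults, 0 otherwise.
bond = {Fr(-1): p, Fr(0): 1 - p}

# Law of (X1 + X2) / 2 for two independent copies of the bond.
half_half = {}
for (x1, p1), (x2, p2) in product(bond.items(), repeat=2):
    gain = (x1 + x2) / 2
    half_half[gain] = half_half.get(gain, Fr(0)) + p1 * p2

var_single = value_at_risk(bond, alpha)
var_mixed = value_at_risk(half_half, alpha)
print(var_single, var_mixed)
```

With these inputs p < α < p² + 2p(1 − p), so a single default is ignored at level α for one bond but not for the half-and-half position.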
From examples explored so far we see that finding VaR in the case of
discrete distributions is an easy task. This is summarised in the following
lemma.
Lemma 7.13
Assume that X is a discrete random variable with $P(X = x_i) = p_i$, $\sum_{i=1}^N p_i = 1$, and $x_1 < x_2 < \cdots < x_N$. Then
This gives
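$$
\begin{aligned}
q^\alpha(X) &= \inf\{x : \alpha < P(X \le x)\} && \text{(by (7.1))}\\
&= \min\{x_k : \alpha < P(X \le x_k)\} && \text{(since } X \in \{x_1, \ldots, x_N\})\\
&= \min\Big\{x_k : \alpha < \sum_{i=1}^{k} p_i\Big\} && \text{(by (7.4))}\\
&= \max\Big\{x_k : \sum_{i=1}^{k-1} p_i \le \alpha\Big\} && \text{(by (7.5))}\\
&= x_{k_\alpha} && \text{(by definition of } k_\alpha).
\end{aligned}
$$
This concludes our proof, since $\mathrm{VaR}^\alpha(X) = -q^\alpha(X)$.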
We now turn to the computation of VaR for random variables with continuous
distributions. For a standard normal random variable Z, with distribution
function $N(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{z^2}{2}}\,dz$, Lemma 7.5 yields
Example 7.14
Suppose that today’s price of the stock is equal to S(0). Assume also that
the price of the stock at time T is equal to $S(T) = S(0)e^{m+\sigma Z}$, with Z having
standard normal distribution N(0, 1). We shall compute $\mathrm{VaR}^\alpha(X)$ for
$$X = e^{-rT}S(T) - S(0).$$
By Lemma 7.5, $q^\alpha(Z) = N^{-1}(\alpha)$, where N is the standard normal cumulative
distribution function. Observing that
$$X = f(Z),$$
where
$$f(\zeta) = e^{-rT}S(0)e^{m+\sigma\zeta} - S(0)$$
is an increasing function,
$$
\begin{aligned}
\mathrm{VaR}^\alpha(X) &= -q^\alpha(f(Z))\\
&= -f(q^\alpha(Z)) && \text{(by Lemma 7.6)}\\
&= -f(N^{-1}(\alpha)) && \text{(by Lemma 7.5)}\\
&= S(0)\left(1 - e^{m - rT + \sigma N^{-1}(\alpha)}\right).
\end{aligned}
\tag{7.6}
$$
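Formula (7.6) is straightforward to evaluate. The sketch below is only an illustration: the parameter values are hypothetical, the function name is ours, and the inverse normal distribution function is taken from Python's standard `statistics.NormalDist`. The last lines check the result by mapping the loss level −VaR back to a probability.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def var_lognormal(S0, m, sigma, r, T, alpha):
    """VaR of X = e^{-rT} S(T) - S(0) with S(T) = S(0) e^{m + sigma Z}, as in (7.6)."""
    z_alpha = STD_NORMAL.inv_cdf(alpha)          # q^alpha(Z) = N^{-1}(alpha)
    return S0 * (1.0 - exp(m - r * T + sigma * z_alpha))

# Hypothetical inputs, chosen only for illustration.
S0, m, sigma, r, T, alpha = 100.0, 0.08, 0.2, 0.03, 1.0, 0.05
var_closed = var_lognormal(S0, m, sigma, r, T, alpha)

# Consistency check: P(X <= -VaR) should equal alpha.
z_back = (log((S0 - var_closed) * exp(r * T) / S0) - m) / sigma
prob_back = STD_NORMAL.cdf(z_back)
print(round(var_closed, 4), round(prob_back, 6))
```

Because f is increasing and continuous, the event {X ≤ −VaR} coincides with {Z ≤ N⁻¹(α)}, which is why the round trip recovers α.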
Lemma 7.15
Let f : R → R be a non-decreasing right-continuous function. Then
We now show that VaR can be computed using Monte Carlo simulations.
First we need some auxiliary results.
For a sequence of random variables $\{Y_i\}_{i=1}^{\infty}$ we write $Y_i \xrightarrow{P} Y$ to denote
that $Y_i$ converges to Y in probability. (See [PF] for details of the standard
results and terminology from probability we use here.)
Lemma 7.16
Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables, $X_i : \Omega \to \mathbb{R}$, with
the same distribution as X. Let $x \in \mathbb{R}$ be fixed. If we take a sequence of
random variables $F_N(x) : \Omega \to \mathbb{R}$ defined as
$$F_N(x) = \frac{1}{N} \sum_{i=1}^{N} 1_{\{X_i \le x\}},$$
then $F_N(x) \xrightarrow{P} F_X(x)$.
Proof Let us introduce the following notation: $Y_i = 1_{\{X_i \le x\}}$ and $Y = 1_{\{X \le x\}}$.
By the weak law of large numbers (see [PF]), $\frac{1}{N}\sum_{i=1}^{N} Y_i \xrightarrow{P} E(Y)$, hence
$$F_N(x) = \frac{1}{N} \sum_{i=1}^{N} Y_i \xrightarrow{P} E(Y) = E\left(1_{\{X \le x\}}\right) = P(X \le x) = F_X(x),$$
as required.
Suppose now that $\hat{X}_1, \ldots, \hat{X}_N$ are results of simulations following the
same distribution as X and let
$$\hat{F}_N(x) = \frac{1}{N} \sum_{i=1}^{N} 1_{\{\hat{X}_i \le x\}}.$$
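The empirical distribution function translates directly into a Monte Carlo estimate of VaR. The following sketch makes one assumption that is not visible in the surviving text: the estimate is taken to be minus the upper α-quantile of the empirical distribution function. The simulated variable is a standard normal, chosen only because its exact quantile is available for comparison; the seed and sample size are arbitrary.

```python
import random
from math import floor
from statistics import NormalDist

def empirical_cdf(sample, x):
    """Fraction of simulated values not exceeding x."""
    return sum(1 for s in sample if s <= x) / len(sample)

def empirical_upper_quantile(sample, alpha):
    """inf{x : alpha < F_hat(x)}: the order statistic with 0-based index floor(alpha * N)."""
    ordered = sorted(sample)
    return ordered[floor(alpha * len(ordered))]

rng = random.Random(0)                      # fixed seed for reproducibility
N = 100_000
sample = [rng.gauss(0.0, 1.0) for _ in range(N)]

alpha = 0.05
var_estimate = -empirical_upper_quantile(sample, alpha)
var_exact = -NormalDist().inv_cdf(alpha)    # exact VaR of a standard normal gain
print(round(var_estimate, 3), round(var_exact, 3))
```

The index floor(αN) works because the empirical distribution function first exceeds α at the point where more than αN sample values lie at or below it.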
Exercise 7.5 Derive the formula for $E(X_{(x,y)})$. Taking the values S(0),
µ, σ, r and T as in Exercise 7.4, plot the set
$$\left\{\left(\mathrm{VaR}^\alpha(X_{(x,y)}), E(X_{(x,y)})\right) : x \in [0, 1],\ y = (1 - x)S(0)\right\}.$$
and compare with the plot obtained in Exercise 7.5. Which is more
efficient, reducing VaR with bonds or with forward contracts?
Lemma 7.19
If 0 < z ≤ x then
By Lemma 7.15
$$\mathrm{VaR}^\alpha\left(X_{(x,z)}\right) = -f(q^\alpha(S(T)))$$
Example 7.20
Assume that we want to invest $V_0$ at time zero and buy x shares of stock.
In order to have $V_{(x,z)}(0) = V_0$ we need to buy
$$z = z(K) = \frac{V_0 - xS(0)}{P(r, T, K, S(0), \sigma)}$$
put options. Depending on the choice of the strike price K we obtain different
values of
$$\mathrm{VaR}^\alpha\left(X_{(x,z(K))}\right) = V_0 - e^{-rT}\left[x\,q^\alpha(S(T)) + z(K)\left(K - q^\alpha(S(T))\right)^+\right]$$
z(K) is small. On the other hand, if we choose a low strike price, then
we can buy a larger number z(K) of options, but each offers lower payoff
(K − qα (S (T )))+ . An optimal choice of the strike price K lies somewhere
between these extremes (see Figure 7.4).
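The trade-off between the number of puts and their strike can be explored with a few lines of code. This is a sketch under stated assumptions rather than the book's own computation: the put price P(r, T, K, S(0), σ) is taken to be the standard Black–Scholes formula, the stock quantile is the lognormal expression with drift µ, the model parameters are those quoted in Example 7.22 with T = 1, and the budget V0 = 1000 with x = 9.8 shares is a hypothetical choice that keeps z(K) ≤ x for the strikes shown.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD = NormalDist()

def bs_put(r, T, K, S0, sigma):
    """Black-Scholes price of a European put (assumed meaning of P(r, T, K, S(0), sigma))."""
    d_plus = (log(S0 / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
    d_minus = d_plus - sigma * sqrt(T)
    return K * exp(-r * T) * STD.cdf(-d_minus) - S0 * STD.cdf(-d_plus)

def stock_quantile(S0, mu, sigma, T, alpha):
    """Upper alpha-quantile of S(T) = S0 exp((mu - sigma^2/2) T + sigma sqrt(T) Z)."""
    return S0 * exp((mu - sigma**2 / 2) * T + sigma * sqrt(T) * STD.inv_cdf(alpha))

def var_hedged(V0, x, K, S0, mu, sigma, r, T, alpha):
    """VaR of x shares plus z(K) puts, where the puts absorb the rest of the budget V0."""
    z = (V0 - x * S0) / bs_put(r, T, K, S0, sigma)
    if not 0.0 <= z <= x:
        raise ValueError("the formula is only used here for 0 <= z(K) <= x")
    q = stock_quantile(S0, mu, sigma, T, alpha)
    return V0 - exp(-r * T) * (x * q + z * max(K - q, 0.0))

S0, mu, sigma, r, T, alpha = 100.0, 0.10, 0.2, 0.03, 1.0, 0.05
V0, x = 1000.0, 9.8                       # hypothetical budget and share count

for K in (90.0, 95.0, 100.0, 105.0, 110.0):
    print(K, round(var_hedged(V0, x, K, S0, mu, sigma, r, T, alpha), 2))
```

Scanning a finer grid of strikes in the same loop gives a curve of the kind the example describes, with the minimum between the two extremes.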
Usually we do not have full freedom of choice for the strike price of a
put option and need to choose between options which are available on the
market. Let us assume that we can invest in n put options with strike prices
K1 , . . . , Kn and maturity T. We denote by Hi (t) the payoff of a put option
with strike price Ki ; in particular
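Assume that we buy x shares of stock and $z_i$ put options with strike
prices $K_i$, for i = 1, . . . , n. Let z, 1 and H(t) for t = 0, T be vectors in $\mathbb{R}^n$
defined as
$$z = \begin{pmatrix} z_1\\ \vdots\\ z_n \end{pmatrix}, \qquad \mathbf{1} = \begin{pmatrix} 1\\ \vdots\\ 1 \end{pmatrix}, \qquad H(t) = \begin{pmatrix} H_1(t)\\ \vdots\\ H_n(t) \end{pmatrix}.$$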
Proposition 7.21
If $z_i \ge 0$, for i = 1, . . . , n, and $\sum_{i=1}^{n} z_i = z^{\mathsf{T}}\mathbf{1} \le x$, then
Example 7.22
Consider the Black–Scholes model with parameters S (0) = 100, µ = 10%,
σ = 0.2 and r = 3%. Assume that we want to invest V0 = 1000 in stock
and put options with strike prices K1 = 75, K2 = 90, K3 = 110 with
c x z1 z2 z3 VaRα
Evidently it does not make sense to buy put options with strike prices below
qα (S (T )). Looking at the table we can see that when c is small, then we buy
options which are cheaper. When c is large, we can afford to spend money
on options with higher strike price, which offer better protection. A full
picture is obtained when we look not only at VaR, but at the distribution of
X in Figure 7.5.
Figure 7.5 The discounted gain X(x,z) from Example 7.22 for various levels
of c (left), and its distribution function (right).
expensive and provide good protection, can be financed by taking short po-
sitions in puts whose strike price is below qα (S (T )). Such short positions in
puts are ignored in the computation of VaR since their exercise is unlikely.
Thus, we can obtain a position with a very small (even negative) VaR.
Example 7.23
Consider the data from Example 7.22. Suppose that we want to invest V0 =
1000 and decide to buy x = 20 shares of stock and hedge them with z2 = 0
and z3 = 20 put options with strike prices K2 and K3 , respectively. Clearly
V(0) does not provide enough funds to enter such a position. We decide
to finance our strategy by taking a short position in put options with strike
price K1
$$z_1 = \frac{1}{H_1(0)}\left(V_0 - xS(0) - z_3H_3(0)\right) = -3056.$$
Clearly our strategy is not a good idea. Common sense dictates that the
short position in unhedged puts will be catastrophic if S (T ) < K1 . For
instance, if the future price of stock should fall to say 70, then the value of
the strategy would be
20 · 70 − 3056 · (75 − 70) + 20 · (110 − 70) = −13 080,
leading to a loss exceeding thirteen thousand. Since the probability of this
is small,
P(S (T ) < K1 ) < P(S (T ) ≤ qα (S (T ))) = α,
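The arithmetic of Example 7.23 can be replayed in a few lines. In this sketch the initial put prices H1(0) = 0.406 and H3(0) = 12.042 are taken from the vector H(0) quoted in Example 8.20, which uses the same model parameters; the function names are ours.

```python
V0, S0 = 1000.0, 100.0
x, z3 = 20, 20                      # shares of stock and puts with strike K3
K1, K3 = 75.0, 110.0
H1_0, H3_0 = 0.406, 12.042          # initial put prices, as quoted in Example 8.20

# Number of K1-puts needed to balance the budget (negative means a short position).
z1 = (V0 - x * S0 - z3 * H3_0) / H1_0

def put_payoff(K, s):
    return max(K - s, 0.0)

def terminal_value(s, z1_units):
    """Value at time T of x shares, z1_units K1-puts and z3 K3-puts if S(T) = s."""
    return x * s + z1_units * put_payoff(K1, s) + z3 * put_payoff(K3, s)

value_at_70 = terminal_value(70.0, round(z1))
print(round(z1), value_at_70)
```

Evaluating `terminal_value` on a grid of prices below K1 shows how quickly the short put position dominates the outcome, even though such prices carry probability less than α.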
Example 7.24
Consider n stocks $S_1, \ldots, S_n$, whose prices at time T evolve according to
$$S_j(T) = S_j(0) \exp\left(\left(\mu_j - \frac{\sigma_j^2}{2}\right)T + \sum_{l=1}^{n} c_{jl}\sqrt{T}\,Z_l\right),$$
Figure 7.6 Monte Carlo simulation for VaR in the Black–Scholes market
from Example 7.24.
to obtain a sequence of simulated gains that can be used to estimate VaRα (X)
using (7.8).
In Figure 7.6 we have a plot of FYN obtained from N = 30 000 simula-
tions, for the following parameters:
S 1 (0) = 100, S 2 (0) = 200, S 3 (0) = 300,
7.5 Proofs
Proposition 7.4
Let X, Y be random variables.
(i) $X \ge Y$ implies $q^\alpha(X) \ge q^\alpha(Y)$.
(ii) For any $b \in \mathbb{R}$, $q^\alpha(X + b) = q^\alpha(X) + b$.
(iii) For $b > 0$, $q^\alpha(bX) = bq^\alpha(X)$.
(iv) $q^\alpha(-X) = -q_{1-\alpha}(X)$.
Proof If X ≥ Y then
F X (x) = P(X ≤ x) ≤ P(Y ≤ x) = FY (x),
hence α < F X (x) implies that α < FY (x). This means that
{x : α < F X (x)} ⊂ {x : α < FY (x)}
which gives
qα (X) = inf{x : α < F X (x)} ≥ inf{x : α < FY (x)} = qα (Y).
The second property follows since with Y = X + b we have
FY (x + b) = P(X + b ≤ x + b) = F X (x),
so that
qα (X + b) = inf{x + b : α < FY (x + b)}
= inf{x : α < FY (x + b)} + b
= inf{x : α < F X (x)} + b
= qα (X) + b.
Since P(bX ≤ x) = P(X ≤ x/b) we see similarly that
FbX (x) = F X (x/b),
hence for b > 0
qα (bX) = inf{x : α < FbX (x)}
= inf{x : α < F X (x/b)}
= inf {by : α < F X (y)}
= b inf{y : α < F X (y)}
= bqα (X).
To prove (iv) we first need to show that for any b ∈ R
inf{x : b ≤ P (X ≤ x)} = inf{x : b ≤ P (X < x)}. (7.22)
Since P (X < x) ≤ P (X ≤ x) , if b ≤ P (X < x) then b ≤ P (X ≤ x) , which
means that
hence
We shall now rule out the possibility that the above inequality is strict.
Suppose that
for which
b ≤ P (X ≤ x̂) ,
In the previous chapter Value at Risk was shown to have two potentially
undesirable features:
• VaR provides no information on the size of potential losses in scenarios
with probability less than α.
• VaR recorded for a diversified position may exceed that recorded for a
position with all funds held in one security.
On the other hand, VaR has the advantage of simplicity: it produces a single
number to quantify the risk of holding a given risky position. However, it
does this by taking account only of the α-quantile, rather than of the whole
distribution.
While VaR has retained much of its popularity with practitioners, many
observers have commented that the 2007/8 banking crisis revealed that fi-
nancial markets can be unduly optimistic in their evaluations of risk. This
chapter takes its title from a seminal paper by Artzner, Delbaen, Eber and
Heath in 1999,1 which highlighted the defects of VaR and proceeded to set
out, as axioms, four algebraic properties for risk measures to be coherent,
as well as describing a wide class of such measures. This approach has since
won many adherents and spawned a very considerable research literature,
including further generalisations.
We introduce particular examples of coherent measures, beginning with
the most natural adaptation of VaR, widely known as AVaR. We will derive
equivalent expressions for this risk measure, show that it is sub-additive,
compare it with other risk measures proposed as alternatives to VaR, and
outline its generalisation to spectral measures. We will also examine AVaR
in the Black–Scholes model by revisiting, with AVaR replacing VaR, the
hedging techniques with European puts described in Section 7.4.
1 P. Artzner, F. Delbaen, J.-M. Eber, D. Heath, Coherent measures of risk, Mathematical
Finance 9, (1999), 203–228.
8.1 Average Value at Risk
Figure 8.1 α times AVaRα (X) is the area for the loss corresponding to the
tail of the distribution.
Proposition 8.2
For X ≤ Y and any real number m we have:
(i) AVaRα (X) ≥ AVaRα (Y);
(ii) AVaRα (X + m) = AVaRα (X) − m;
(iii) for λ ≥ 0, AVaRα (λX) = λAVaRα (X).
Exercise 8.2 Prove that Lemma 8.4 holds also for $Y(\omega) = q_{U(\omega)}(X)$.
Hence Lemma 8.4 implies that for any integrable random variable X we
have
$$\int_0^1 q^s(X)\,ds = E(Y) = E(X), \tag{8.1}$$
Exercise 8.3 Show that (8.1) holds also when we replace $q^s(X)$ with
$q_s(X)$.
We now apply (8.1) to obtain an alternative description of AVaR.
Proposition 8.5
For any α ∈ (0, 1)
$$\mathrm{AVaR}^\alpha(X) = -\frac{1}{\alpha}\left[E\left(X 1_{\{X < q^\alpha(X)\}}\right) + q^\alpha(X)\left(\alpha - P(X < q^\alpha(X))\right)\right]. \tag{8.2}$$
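Proof Let $x^-$ denote the negative part of x, i.e. $x^- = -\min\{x, 0\}$. Since
$f(x) = -x^-$ is a non-decreasing function, by Lemma 7.6 for any random
variable Y and any β ∈ (0, 1),
$$q^\beta(-Y^-) = q^\beta(f(Y)) = f(q^\beta(Y)) = -(q^\beta(Y))^-. \tag{8.3}$$
Let us write $q^\alpha(X) = q^\alpha$ for ease of notation. The claim now follows by
computing
$$
\begin{aligned}
\mathrm{AVaR}^\alpha(X) &= -\frac{1}{\alpha}\int_0^\alpha q^\beta(X)\,d\beta\\
&= -\frac{1}{\alpha}\int_0^\alpha \left(q^\beta(X) - q^\alpha\right)d\beta - q^\alpha\\
&= -\frac{1}{\alpha}\int_0^1 -\left(q^\beta(X) - q^\alpha\right)^- d\beta - q^\alpha && \text{(for } \beta \le \alpha,\ q^\beta(X) \le q^\alpha)\\
&= -\frac{1}{\alpha}\int_0^1 -\left(q^\beta(X - q^\alpha)\right)^- d\beta - q^\alpha && \text{(by Proposition 7.4)}\\
&= -\frac{1}{\alpha}\int_0^1 q^\beta\left(-(X - q^\alpha)^-\right)d\beta - q^\alpha && \text{(using (8.3))}\\
&= -\frac{1}{\alpha}E\left(-(X - q^\alpha)^-\right) - q^\alpha && \text{(using (8.1))}\\
&= -\frac{1}{\alpha}\int_{\{X < q^\alpha\}} (X - q^\alpha)\,dP - q^\alpha\\
&= -\frac{1}{\alpha}\left[\int_{\{X < q^\alpha\}} X\,dP - q^\alpha\int_{\{X < q^\alpha\}} dP + \alpha q^\alpha\right]\\
&= -\frac{1}{\alpha}\left[E\left(X 1_{\{X < q^\alpha\}}\right) + q^\alpha\left(\alpha - P(X < q^\alpha)\right)\right].
\end{aligned}
$$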
We can now formulate a corollary that allows us to compute AVaR for
discretely distributed random variables.
Corollary 8.6
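Assume that X is a discrete random variable with $P(X = x_i) = p_i$, $p_1 + \cdots + p_N = 1$, and $x_1 < x_2 < \cdots < x_N$. Then
$$\mathrm{AVaR}^\alpha(X) = -\frac{1}{\alpha}\left[\sum_{i=1}^{k_\alpha - 1} p_i x_i + x_{k_\alpha}\left(\alpha - \sum_{i=1}^{k_\alpha - 1} p_i\right)\right],$$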
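The discrete formula of Corollary 8.6 is easy to implement. The sketch below is illustrative: it assumes that $k_\alpha$ is the first index at which the cumulative probability exceeds α, so that $x_{k_\alpha} = q^\alpha(X)$ as in the proof of Lemma 7.13, and it evaluates the formula for the gain X of Example 7.1 at the hypothetical level α = 0.05 using exact fractions.

```python
from fractions import Fraction as Fr

def avar_discrete(dist, alpha):
    """AVaR of a discrete law {value: probability} via the sum formula of Corollary 8.6."""
    cum = Fr(0)        # sum of p_i for i < k
    partial = Fr(0)    # sum of p_i * x_i for i < k
    for x in sorted(dist):
        if alpha < cum + dist[x]:          # x is x_{k_alpha}, the upper alpha-quantile
            return -(partial + x * (alpha - cum)) / alpha
        cum += dist[x]
        partial += dist[x] * x
    raise ValueError("alpha must lie in (0, 1)")

# Gain X from Example 7.1.
p = Fr(8, 10)
X = {21: p**2, -1: 2 * p * (1 - p), -19: (1 - p) ** 2}

alpha = Fr(5, 100)
avar_X = avar_discrete(X, alpha)
print(avar_X, float(avar_X))
```

At this level the worst 4% of outcomes contribute a loss of 19 and the remaining 1% of the tail contributes a loss of 1, so the average exceeds the VaR of 1 at the same level.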
we compute
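$$
\begin{aligned}
E(X 1^\alpha_X) &= E\left(X 1_{\{X < q^\alpha\}} + X\,\frac{\alpha - P(X < q^\alpha)}{P(X = q^\alpha)}\,1_{\{X = q^\alpha\}}\right)\\
&= E\left(X 1_{\{X < q^\alpha\}}\right) + \int_{\{X = q^\alpha\}} X\,\frac{\alpha - P(X < q^\alpha)}{P(X = q^\alpha)}\,dP\\
&= E\left(X 1_{\{X < q^\alpha\}}\right) + \frac{\alpha - P(X < q^\alpha)}{P(X = q^\alpha)}\int_{\{X = q^\alpha\}} X\,dP\\
&= E\left(X 1_{\{X < q^\alpha\}}\right) + q^\alpha\left(\alpha - P(X < q^\alpha)\right) && \text{(using (8.9))}\\
&= -\alpha\,\mathrm{AVaR}^\alpha(X), && \text{(using (8.2))}
\end{aligned}
$$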
as required.
Let us observe that the random variable $Z(\omega) = \frac{1}{\alpha}1^\alpha_X(\omega)$ is integrable,
bounded above by $\frac{1}{\alpha}$ and has expectation 1, as shown in Lemma 8.8. We
can therefore define a new probability measure, which we denote by $Q^\alpha_X$,
as
$$Q^\alpha_X(A) = \int_A Z\,dP.$$
Theorem 8.10
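For α ∈ (0, 1) let
$$\mathcal{P}^\alpha = \left\{Q : Q \text{ is a probability measure},\ Q \ll P,\ \frac{dQ}{dP} \le \frac{1}{\alpha}\right\}.$$
Then
$$\sup\{-E_Q(X) : Q \in \mathcal{P}^\alpha\} = \mathrm{AVaR}^\alpha(X).$$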
$$+ \int_{\{X > q^\alpha\}} X\,\frac{dQ}{dP}\,dP + \int_\Omega X\,\frac{dQ^\alpha_X}{dP}\,dP.$$
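We now examine one by one the four integrals in the above expression.
By definition, $\frac{dQ}{dP} \le \frac{1}{\alpha}$, hence on $\{X < q^\alpha\}$
$$(X - q^\alpha)\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right) \ge 0,$$
giving
$$\int_{\{X < q^\alpha\}} X\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right)dP \ge q^\alpha\int_{\{X < q^\alpha\}}\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right)dP. \tag{8.13}$$
Evidently,
$$\int_{\{X = q^\alpha\}} X\left(\frac{dQ}{dP} - \frac{1}{\alpha}\kappa\right)dP = q^\alpha\int_{\{X = q^\alpha\}}\left(\frac{dQ}{dP} - \frac{1}{\alpha}\kappa\right)dP. \tag{8.14}$$
Since $\frac{dQ}{dP} \ge 0$,
$$\int_{\{X > q^\alpha\}} X\,\frac{dQ}{dP}\,dP \ge q^\alpha\int_{\{X > q^\alpha\}}\frac{dQ}{dP}\,dP. \tag{8.15}$$
Finally, for the last of the four integrals we see that
$$\int_\Omega X\,\frac{dQ^\alpha_X}{dP}\,dP = E_{Q^\alpha_X}(X). \tag{8.16}$$
Substituting (8.13)–(8.16) into our formula for $E_Q(X)$ we obtain
$$
\begin{aligned}
E_Q(X) &\ge q^\alpha\int_{\{X < q^\alpha\}}\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right)dP + q^\alpha\int_{\{X = q^\alpha\}}\left(\frac{dQ}{dP} - \frac{1}{\alpha}\kappa\right)dP\\
&\qquad + q^\alpha\int_{\{X > q^\alpha\}}\frac{dQ}{dP}\,dP + E_{Q^\alpha_X}(X)\\
&= -q^\alpha\int_{\{X < q^\alpha\}}\frac{1}{\alpha}\,dP - q^\alpha\int_{\{X = q^\alpha\}}\frac{1}{\alpha}\kappa\,dP + q^\alpha\int_\Omega\frac{dQ}{dP}\,dP + E_{Q^\alpha_X}(X)\\
&= -q^\alpha\frac{1}{\alpha}P(X < q^\alpha) - q^\alpha\frac{1}{\alpha}\kappa P(X = q^\alpha) + q^\alpha + E_{Q^\alpha_X}(X)\\
&= E_{Q^\alpha_X}(X). && \text{(using (8.6))}
\end{aligned}
$$
We have shown that $-E_Q(X) \le -E_{Q^\alpha_X}(X)$. Since $Q^\alpha_X \in \mathcal{P}^\alpha$, this implies that
$$\sup\{-E_Q(X) : Q \in \mathcal{P}^\alpha\} = -E_{Q^\alpha_X}(X) = \mathrm{AVaR}^\alpha(X),$$
as required.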
We are finally ready to prove Theorem 8.3. The result follows from The-
orem 8.10 and we formulate it as a corollary.
Corollary 8.11
AVaR is sub-additive:
AVaRα (X + Y) ≤ AVaRα (X) + AVaRα (Y).
Proof We use the fact that for two functions f, g : U → R, where U is an
arbitrary set,
$$\sup_{x \in U}\{f(x) + g(x)\} \le \sup_{x \in U} f(x) + \sup_{x \in U} g(x). \tag{8.17}$$
Let us fix X and Y. We can apply (8.17) taking $U = \mathcal{P}^\alpha$, $f(Q) = -E_Q(X)$,
and $g(Q) = -E_Q(Y)$ to obtain
$$
\begin{aligned}
\mathrm{AVaR}^\alpha(X + Y) &= \sup\{-E_Q(X + Y) : Q \in \mathcal{P}^\alpha\}\\
&= \sup\{E_Q(-X) + E_Q(-Y) : Q \in \mathcal{P}^\alpha\}\\
&\le \sup\{E_Q(-X) : Q \in \mathcal{P}^\alpha\} + \sup\{E_Q(-Y) : Q \in \mathcal{P}^\alpha\} && \text{(using (8.17))}\\
&= \mathrm{AVaR}^\alpha(X) + \mathrm{AVaR}^\alpha(Y),
\end{aligned}
$$
as required.
The next exercise provides an alternative direct proof of sub-additivity.
The idea is the same as in the proof of Theorem 8.10.
Exercise 8.5 Show that for X ≤ Y and any real number m we have:
(i) TCEα (X) ≥ TCEα (Y);
(ii) TCEα (X + m) = TCEα (X) − m;
(iii) for λ ≥ 0, TCEα (λX) = λTCEα (X).
Example 8.13
Let Ω = {ω1 , ω2 , ω3 } and
P({ω1 }) = P({ω2 }) = 0.03,
P({ω3 }) = 0.94.
Let α = 0.05 and define random variables X, Y by setting
X(ω1 ) = −100, X(ω2 ) = 0, X(ω3 ) = 0,
Y(ω1 ) = 0, Y(ω2 ) = −100, Y(ω3 ) = 0.
We claim that
TCEα (X + Y) > TCEα (X) + TCEα (Y).
Since
qα (X) = inf{x : F X (x) > 0.05} = 0,
and {X ≤ 0} = Ω, we see that
TCEα (X) = −E (X|X ≤ qα (X)) = −E (X|Ω) = −E (X)
= − [0.03 × (−100) + 0.97 × 0] = 3.
By an identical computation, also
TCEα (Y) = 3.
On the other hand, Z = X + Y has
qα (Z) = inf{x : FZ (x) > 0.05} = −100,
and {Z ≤ qα (Z)} = {ω1 , ω2 }, hence
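$$
\begin{aligned}
\mathrm{TCE}^\alpha(Z) &= -E(Z \mid Z \le q^\alpha(Z))\\
&= -\frac{1}{P(Z \le q^\alpha(Z))}\left(Z(\omega_1)P(\{\omega_1\}) + Z(\omega_2)P(\{\omega_2\})\right)\\
&= -\frac{1}{0.06}\left(-100 \times 0.03 - 100 \times 0.03\right)\\
&= 100.
\end{aligned}
$$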
This demonstrates a serious shortcoming of the tail-conditional expectation
as a risk measure. Since
$$\mathrm{TCE}^\alpha\left(\tfrac12 X + \tfrac12 Y\right) = \tfrac12\,\mathrm{TCE}^\alpha(X + Y) \ge \tfrac12\left[\mathrm{TCE}^\alpha(X) + \mathrm{TCE}^\alpha(Y)\right],$$
The example shows that TCE shares the same defect as VaR. Fortunately
AVaR, even though its computation is slightly more involved, has much
more desirable properties.
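The numbers in Example 8.13 can be reproduced directly on the three-point probability space. The sketch below is illustrative: the function names are ours, TCE is computed as the conditional expectation used in the example, and AVaR is computed from formula (8.2), which lets the two measures be compared on the same data.

```python
from fractions import Fraction as Fr

P = {"w1": Fr(3, 100), "w2": Fr(3, 100), "w3": Fr(94, 100)}
X = {"w1": -100, "w2": 0, "w3": 0}
Y = {"w1": 0, "w2": -100, "w3": 0}
Z = {w: X[w] + Y[w] for w in P}

def upper_quantile(V, alpha):
    """inf{x : alpha < F_V(x)} for a random variable V on the finite space P."""
    cum = Fr(0)
    for v in sorted(set(V.values())):
        cum += sum(P[w] for w in P if V[w] == v)
        if alpha < cum:
            return v
    raise ValueError("alpha must lie in (0, 1)")

def tce(V, alpha):
    """-E(V | V <= q^alpha(V))."""
    q = upper_quantile(V, alpha)
    tail = [w for w in P if V[w] <= q]
    return -sum(V[w] * P[w] for w in tail) / sum(P[w] for w in tail)

def avar(V, alpha):
    """Formula (8.2): -(1/alpha) [E(V 1_{V<q}) + q (alpha - P(V<q))]."""
    q = upper_quantile(V, alpha)
    below = [w for w in P if V[w] < q]
    mass = sum(P[w] for w in below)
    return -(sum(V[w] * P[w] for w in below) + q * (alpha - mass)) / alpha

alpha = Fr(5, 100)
print(tce(X, alpha), tce(Y, alpha), tce(Z, alpha))
print(avar(X, alpha), avar(Y, alpha), avar(Z, alpha))
```

On this data TCE of the sum exceeds the sum of the individual TCEs, while AVaR of the sum stays below the sum of the individual AVaRs.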
Lemma 8.14
For any q ∈ R
$$E(S(T) \mid Z \le q) = \frac{1}{N(q)}\,S(0)e^{\mu T}N\left(q - \sigma\sqrt{T}\right),$$
Lemma 8.15
For the discounted gain
$$X = e^{-rT}S(T) - S(0)$$
we have
$$\mathrm{AVaR}^\alpha(X) = S(0) - \frac{1}{\alpha}S(0)e^{(\mu - r)T}N\left(q^\alpha(Z) - \sigma\sqrt{T}\right).$$
α
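Proof By Lemma 7.17 we know that
$$q^\alpha(S(T)) = S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,q^\alpha(Z)}, \tag{8.21}$$
therefore
and
$$P^\alpha(K) = Ke^{-\mu T}N\left(-d_-^{\mu,\alpha}\right) - S(0)N\left(-d_+^{\mu,\alpha}\right). \tag{8.23}$$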
Proposition 8.16
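If z ∈ [0, x], then
$$\mathrm{AVaR}^\alpha\left(X_{(x,z)}\right) = V_{(x,z)}(0) - \frac{1}{\alpha}e^{(\mu - r)T}\left[xS(0)N\left(q^\alpha(Z) - \sigma\sqrt{T}\right) + zP^\alpha(K)\right].$$
Proof We first observe that
$$
\begin{aligned}
X_{(x,z)} &= e^{-rT}V_{(x,z)}(T) - V_{(x,z)}(0)\\
&= e^{-rT}\left(xS(T) + z(K - S(T))^+\right) - V_{(x,z)}(0).
\end{aligned}
\tag{8.24}
$$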
Figure 8.2 F X(x,z) for various z. The dotted line represents X(x,z) for S (T ) = K.
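$$
\begin{aligned}
&= -E\left(X_{(x,z)} \mid Z \le q^\alpha(Z)\right) && \text{(by (8.26))}\\
&= V_{(x,z)}(0) - e^{-rT}xE\left(S(T) \mid Z \le q^\alpha(Z)\right) && \text{(see (8.24))}\\
&\qquad - e^{-rT}zE\left((K - S(T))^+ \mid Z \le q^\alpha(Z)\right).
\end{aligned}
\tag{8.27}
$$
We now compute the last term in (8.27). By (8.20),
$$\{S(T) \le K\} = \left\{Z \le -d_-^{\mu}\right\},$$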
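hence,
$$
\begin{aligned}
&E\left((K - S(T))^+ \mid Z \le q^\alpha(Z)\right)\\
&\quad= E\left((K - S(T))1_{\{Z \le -d_-^{\mu}\}} \mid Z \le q^\alpha(Z)\right)\\
&\quad= \frac{1}{\alpha}\int_{-\infty}^{\min(q^\alpha(Z),\,-d_-^{\mu})}\left(K - S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}x}\right)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx\\
&\quad= \frac{1}{\alpha}K\int_{-\infty}^{-d_-^{\mu,\alpha}}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx && (\min(a, b) = -\max(-a, -b))\\
&\qquad - \frac{1}{\alpha}\int_{-\infty}^{-d_-^{\mu,\alpha}}S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}x}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx\\
&\quad= \frac{1}{\alpha}KN\left(-d_-^{\mu,\alpha}\right) - \frac{1}{\alpha}P\left(Z \le -d_-^{\mu,\alpha}\right)E\left(S(T) \mid Z \le -d_-^{\mu,\alpha}\right)\\
&\quad= \frac{1}{\alpha}KN\left(-d_-^{\mu,\alpha}\right) - \frac{1}{\alpha}S(0)e^{\mu T}N\left(-d_-^{\mu,\alpha} - \sigma\sqrt{T}\right) && \text{(by Lemma 8.14)}\\
&\quad= \frac{1}{\alpha}e^{\mu T}\left[Ke^{-\mu T}N\left(-d_-^{\mu,\alpha}\right) - S(0)N\left(-d_+^{\mu,\alpha}\right)\right].
\end{aligned}
$$
Substituting the above into (8.27) and applying Lemma 8.14 gives the
claim.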
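We now need to consider the case when z = x. Since for any β ∈ (0, 1)
(see Figure 8.2)
$$\lim_{z \nearrow x} q^\beta(X_{(x,z)}) = q^\beta(X_{(x,x)}),$$
we obtain
$$
\begin{aligned}
\lim_{z \nearrow x}\mathrm{AVaR}^\alpha\left(X_{(x,z)}\right) &= \lim_{z \nearrow x}\,-\frac{1}{\alpha}\int_0^\alpha q^\beta(X_{(x,z)})\,d\beta\\
&= -\frac{1}{\alpha}\int_0^\alpha q^\beta(X_{(x,x)})\,d\beta\\
&= \mathrm{AVaR}^\alpha\left(X_{(x,x)}\right).
\end{aligned}
$$
Hence the result follows from the fact that the formula for $\mathrm{AVaR}^\alpha(X_{(x,z)})$ in
the claim is continuous with respect to z.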
Figure 8.3 AVaR of a fixed position in x stocks, hedged with puts (parame-
ters of the model are as in Exercise 8.11).
Example 8.17
Suppose that we spend V0 to buy a fixed number x of stocks, together with
z put options. The number of options we can buy depends on the choice of
the strike price K,
$$z = z(K) = \frac{V_0 - xS(0)}{P(r, T, K, S(0), \sigma)}.$$
We consider $\mathrm{AVaR}^\alpha(X_{(x,z(K))})$ for K such that z(K) ≤ x.
In Figure 8.3 we see that the smallest AVaR is attained for the smallest
considered strike price, for which z(K) = x. On the plot we also see that
AVaR dominates VaR, and that the two are equal when z(K) = x.
Example 8.18
From Example 8.17 we see that AVaR is minimised when we buy the same
number of shares of stock and European put options. Suppose therefore
that we invest V0 to buy x shares of stock and x puts. Here x depends on the
choice of the strike price K (since the higher the strike, the more expensive
Figure 8.4 AVaR of a position in the same number of stocks and puts, for
data from Exercise 8.11.
and
$$\left\{\left(\mathrm{AVaR}^\alpha\left(X_{(x(K),x(K))}\right), E\left(X_{(x(K),x(K))}\right)\right) \,\middle|\, K \ge 0\right\},$$
Example 8.20
Consider the Black–Scholes model with parameters S (0) = 100, µ = 10%,
σ = 0.2 and r = 3%. Assume that we spend V0 = 1000, investing in
stock and put options with strike prices K1 = 75, K2 = 90, K3 = 110 and
expiry T = 1. We shall solve the problem (8.29) for α = 0.05, considering
c = 0, 10, 30, 50 and 80.
The choice of x depends on c, since
xS (0) + c = V0 .
We compute the vectors H(0) and $P^\alpha$ using (8.22) and (8.23), respectively,
$$H(0) = \begin{pmatrix} 0.406\\ 2.769\\ 12.042 \end{pmatrix}, \qquad P^\alpha = \begin{pmatrix} 0.140\\ 0.819\\ 1.724 \end{pmatrix}.$$
The solutions to the problem (8.29) are shown in the table:
c x z1 z2 z3 AVaRα
From the table we can see that for larger c we can afford to buy options
with higher strike prices, which provide better protection, but are at the
same time more expensive.
Example 8.21
Consider the n-dimensional Black–Scholes market from Example 7.24.
Using the same Monte Carlo simulation that was used to compute VaR in
Example 7.24, we can compute the AVaR for the position using (8.4) and
Corollary 8.6. We thus obtain AVaRα (YN ) = 61.75 from the simulation.
8.4 Coherence
In this section we provide an axiomatic description of a certain class of
measures of risk. It will be apparent that this class contains AVaR, but not
VaR.
By a risk measure we mean a number ρ(X) ∈ R that is assigned to a
random variable X to represent its risk. The following axioms are seen as
natural requirements for a satisfactory risk measure.
Definition 8.22
A risk measure ρ is coherent if it is:
(i) monotone: X ≤ Y implies ρ(X) ≥ ρ(Y);
(ii) cash-invariant: ρ(X + m) = ρ(X) − m;
(iii) positively homogeneous: for all λ ≥ 0, ρ(λX) = λρ(X);
(iv) sub-additive: for any X, Y,
ρ(X + Y) ≤ ρ(X) + ρ(Y).
Note that, by (ii), ρ(X+ρ(X)) = 0, so that ρ(X) is the minimum amount of
additional investment we need to add to X to ensure that the final position
eliminates risk, as measured by ρ. In other words,
ρ(X) = inf{m ∈ R : ρ(X + m) ≤ 0}.
More generally, a position X is said to be acceptable if ρ(X) ≤ 0.
Exercise 8.16 Show that any coherent risk measure ρ is convex: for
λ ∈ [0, 1]
ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y).
Show conversely that if a risk measure ρ is convex and positively ho-
mogeneous, then it is coherent.
Definition 8.24
Suppose that R is a family of probability measures satisfying R ⊂ {Q :
Q ≪ P}. We define a risk measure $\rho_R$ by setting
Proposition 8.25
For any family R of probability measures absolutely continuous with re-
spect to P,
ρR (X) = sup{−EQ (X) : Q ∈ R}
ρR (X + m) = sup{−EQ (X + m) : Q ∈ R}
= sup{−EQ (X) : Q ∈ R} − m
= ρR (X) − m.
We have −EQ (λX) = −λEQ (X), so for λ ≥ 0, taking the supremum over
Q in R gives ρR (λX) = λρR (X).
Finally, to prove sub-additivity, we use the fact that for two functions
f, g : U → R, where U is an arbitrary set,
Let us fix X and Y. We apply (8.30) taking U = R, $f(Q) = -E_Q(X)$, and
$g(Q) = -E_Q(Y)$. Thus
$$\rho_R(X + Y) = \sup_{Q \in R}\,-E_Q(X + Y)$$
as required.
AVaR was our first example of such a coherent risk measure: taking
$R = \mathcal{P}^\alpha = \left\{Q : Q \ll P,\ \frac{dQ}{dP} \le \frac{1}{\alpha}\right\}$ gives $\mathrm{AVaR}^\alpha$, as we saw in Theorem 8.10.
We now consider some further examples.
Example 8.26
Take $R_{\min} = \{P\}$, which gives $\rho_{\min}(X) = -E_P(X)$. This is a coherent risk
measure by Proposition 8.25, but is not very useful. We see that if $E_P(X) \ge 0$
then $\rho_{\min}(X) \le 0$, indicating that any random variable with non-negative
expectation is acceptable.
Example 8.27
At the other extreme, we obtain a risk measure that is too stringent for
practical use if we define
ρmax (X) = −ess inf X.
The right-hand side means that we can have X(ω) < ess inf X only on
a P-null set. The requirement ρmax (X) ≤ 0 therefore means that this risk
measure allows negative positions X(ω) only for a P-null set of ω in Ω.
Hence ρmax (X) = inf{m ∈ R : X + m ≥ 0 P-a.s.}.
Exercise 8.19 Consider the probability space (Ω, F , P) and the ran-
dom variables X, Y defined in Example 8.13. Verify that WCEα (X) =
WCEα (Y) = 50, and AVaRα (X) = AVaRα (Y) = 60, when α = 0.05.
Verify that in this example, WCE is additive. Compare the risk mea-
sures VaR, TCE, WCE and AVaR for X.
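we see that
$$\frac{dQ_A}{dP} = \frac{1_A}{P(A)}.$$
Taking any A satisfying P(A) > α, we see that $\frac{dQ_A}{dP} \le \frac{1}{\alpha}$, so
$$Q_A \in \mathcal{P}^\alpha = \left\{Q : Q \ll P,\ \frac{dQ}{dP} \le \frac{1}{\alpha}\right\},$$
hence
$$\mathrm{AVaR}^\alpha(X) = \sup_{Q \in \mathcal{P}^\alpha}\,-E_Q(X) \ge \sup_{Q_A,\,P(A) > \alpha}\,-E_{Q_A}(X) = \mathrm{WCE}^\alpha(X).$$
$$
\begin{aligned}
&\ge -\frac{1}{P(B)}\int_B -\mathrm{VaR}^\alpha(X)\,dP && \text{(on } B,\ X \le -\mathrm{VaR}^\alpha(X))\\
&= \mathrm{VaR}^\alpha(X). && \text{(since } \mathrm{VaR}^\alpha(X) \text{ is a constant)}
\end{aligned}
$$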
In (8.19) we have shown that when F X is continuous, AVaRα (X) =
TCEα (X), hence both equal WCEα (X).
One potential difficulty with AVaR is that it restricts attention to the α-
tail of the distribution function F X rather than taking the whole distribution
of X into account. Moreover, in taking averages it assigns the same weight
to any qβ (X) for β < α. A natural route to more general risk measures is to
assign different weights to different β.
Definition 8.30
Let ϕ : (0, 1) → R be a non-negative, non-increasing function satisfying
$$\int_0^1 \varphi(x)\,dx = 1.$$
We define
$$\rho^\varphi(X) = -\int_0^1 q^\beta(X)\varphi(\beta)\,d\beta$$
Example 8.31
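For α ∈ (0, 1) we recover $\mathrm{AVaR}^\alpha(X)$ by choosing $\varphi(\beta) = \frac{1}{\alpha}1_{[0,\alpha]}(\beta)$, since
$$-\int_0^1 q^\beta(X)\varphi(\beta)\,d\beta = -\frac{1}{\alpha}\int_0^\alpha q^\beta(X)\,d\beta = \mathrm{AVaR}^\alpha(X).$$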
Theorem 8.32
A spectral risk measure ρϕ is coherent.
and, for 0 < a < b < 1, setting ν((a, b]) = ϕ(a) − ϕ(b). This defines ν as an
additive set function on intervals (a, b] ⊂ (0, 1), which extends to a unique
measure ν on all Borel sets A in (0, 1). Now set
$$\mu(A) = \int_A x\,d\nu(x).$$
For pairs (x, y), read the inequalities 0 < y < x < 1 from left to right and
right to left respectively, to obtain
$$
\begin{aligned}
&= -\int_{(0,1)} q^\beta(X)\varphi(\beta)\,d\beta && \text{(by (8.31))}\\
&= \rho^\varphi(X),
\end{aligned}
$$
hence the theorem is proved.
The flexibility inherent in the choice of ϕ means that individuals' subjective
risk profiles can be mapped onto spectral risk measures to obtain
different assessments of risk. We content ourselves with just one example.
Example 8.33
Recall the exponential utility function $u(x) = -e^{-ax}$ introduced in Chapter
6, where a is the investor’s absolute risk aversion coefficient. We obtain
the corresponding weighting function in the form $\varphi(x) = ke^{-ax}$, since with
k > 0 we have ϕ ≥ 0 and ϕ is (strictly) decreasing on [0, 1]. To ensure
that it is an admissible risk spectrum, we simply need to choose k such that
$\int_0^1 \varphi(t)\,dt = 1$, which forces $k = \frac{a}{1 - e^{-a}}$. The spectral risk measure
$$\rho^\varphi(X) = \frac{a}{1 - e^{-a}}\int_{(0,1)}\left(-q^\beta(X)\right)e^{-a\beta}\,d\beta$$
thus takes account of the investor’s risk aversion by giving most weight to
the worst outcomes.
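The exponential risk spectrum can be evaluated numerically. The sketch below is illustrative only: the risk aversion a = 5 is a hypothetical value, the integrals over (0, 1) are approximated by a midpoint rule with an arbitrary grid size, and the gain is taken to be a standard normal variable so that its quantile function is available from `statistics.NormalDist`.

```python
from math import exp
from statistics import NormalDist

def exponential_spectrum(a):
    """phi(beta) = k e^{-a beta} with k = a / (1 - e^{-a}), so that phi integrates to 1."""
    k = a / (1.0 - exp(-a))
    return lambda beta: k * exp(-a * beta)

def integrate_01(f, n=20_000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

def spectral_risk(quantile, phi):
    """rho^phi(X) = - integral over (0, 1) of q^beta(X) phi(beta) d(beta)."""
    return -integrate_01(lambda beta: quantile(beta) * phi(beta))

a = 5.0                                   # hypothetical absolute risk aversion
phi = exponential_spectrum(a)

total_weight = integrate_01(phi)
risk_of_constant = spectral_risk(lambda beta: 2.0, phi)      # sure gain of 2
risk_of_normal = spectral_risk(NormalDist().inv_cdf, phi)    # standard normal gain
print(round(total_weight, 6), round(risk_of_constant, 6), round(risk_of_normal, 4))
```

A sure gain of 2 receives risk −2, as cash-invariance requires, while the zero-mean normal gain receives a positive risk because the low quantiles carry the largest weights.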
8.5 Proofs
Lemma 8.4
Let X : Ω → R be a random variable. Assume that U is a uniformly
distributed random variable on (0, 1). Then the random variable Y, defined
by Y(x) = qU(x) (X), has the same distribution as X.
g(α) = qα (X).
Then Y = g(U).
Since U is a uniformly distributed random variable on (0, 1), for any
Borel set A ⊂ (0, 1) the probability that U is in A is
Prob(U ∈ A) = m(A),
g(α) = qα (X) = y.
(There is a possibility that such α does not exist. This is when y lies below
the flat part of the distribution function F X (y); see Figure 7.1 on page 100.)
This means that the pre-image g−1 (y) consists of at most a single point,
hence
Prob(g(U) = y) = Prob(U ∈ g−1 (y)) = m(g−1 (y)) = 0. (8.33)
By the definition of the upper quantile, i.e.
qα (X) = inf{x : α < F X (x)}, (8.34)
we see that if α < F X (x) then qα (X) ≤ x. This means that
{α : α < F X (y)} ⊂ {α : qα (X) ≤ y} = {α : g(α) ≤ y} , (8.35)
hence
FY (y) = Prob (Y ≤ y)
= Prob(g(U) ≤ y)
≥ Prob(U < F X (y)) (by (8.35))
= F X (y).
Again, by the definition of qα (X) (see (8.34)), we see that if qα (X) < x
then α < F X (x), hence
{α : g(α) < y} = {α : qα (X) < y} ⊂ {α : α < F X (y)} . (8.36)
This gives
FY (y) = Prob (Y ≤ y)
= Prob(g(U) ≤ y)
= Prob(g(U) < y) + Prob(g(U) = y)
= Prob(g(U) < y) (by (8.33))
≤ Prob(U < F X (y)) (by (8.36))
= F X (y).
We have shown that FY (y) = F X (y), which concludes our proof.
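Lemma 8.4 is the basis of inverse transform sampling: feeding uniform random numbers through the upper quantile function reproduces the law of X. The sketch below illustrates this for the gain X of Example 7.1; the seed, the sample size and the function name are arbitrary choices made for the illustration.

```python
import random
from collections import Counter

# Law of the gain X from Example 7.1.
dist = {-19: 0.04, -1: 0.32, 21: 0.64}

def upper_quantile(dist, alpha):
    """g(alpha) = inf{x : alpha < F_X(x)} for a discrete law {value: probability}."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if alpha < cum:
            return x
    return max(dist)   # guard against rounding in the last cumulative sum

rng = random.Random(1)                     # fixed seed for reproducibility
N = 200_000
counts = Counter(upper_quantile(dist, rng.random()) for _ in range(N))
freq = {x: counts[x] / N for x in dist}    # empirical law of Y = g(U)
print(freq)
```

The empirical frequencies of Y = g(U) settle near 0.04, 0.32 and 0.64, in line with the claim that Y and X have the same distribution.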
Corollary 8.7
If X is a random variable whose distribution function F X is strictly increas-
ing and continuous, then
AVaRα (X) = −E(X|X ≤ qα (X)).
Proof Since F X is continuous, for any q ∈ R,
P(X = q) = 0. (8.37)
By Lemma 7.5
qα (X) = F X−1 (α), (8.38)
hence
P (X < qα (X)) = P(X ≤ qα (X)) − P(X = qα (X))
= P(X ≤ qα (X)) (using (8.37)) (8.39)
= F X (qα (X))
= α. (using (8.38))
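Substituting into (8.2) gives
$$
\begin{aligned}
\mathrm{AVaR}^\alpha(X) &= -\frac{1}{\alpha}\left[E\left(X 1_{\{X < q^\alpha(X)\}}\right) + q^\alpha(X)\left(\alpha - P(X < q^\alpha(X))\right)\right]\\
&= -\frac{1}{\alpha}E\left(X 1_{\{X < q^\alpha(X)\}}\right)\\
&= -\frac{1}{\alpha}E\left(X 1_{\{X \le q^\alpha(X)\}}\right) && \text{(since by (8.37), } P(X = q^\alpha(X)) = 0)\\
&= -\frac{1}{P(X \le q^\alpha(X))}E\left(X 1_{\{X \le q^\alpha(X)\}}\right) && \text{(from (8.38))}\\
&= -E(X \mid X \le q^\alpha(X)),
\end{aligned}
$$
as required.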
Lemma 8.8
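For α ∈ (0, 1), let $q^\alpha = q^\alpha(X)$ and set
$$1^\alpha_X = \begin{cases} 1_{\{X < q^\alpha\}} & \text{if } P(X = q^\alpha) = 0,\\ 1_{\{X < q^\alpha\}} + \kappa 1_{\{X = q^\alpha\}} & \text{if } P(X = q^\alpha) > 0, \end{cases}$$
where $\kappa = \frac{\alpha - P(X < q^\alpha)}{P(X = q^\alpha)}$. Then
$$E(1^\alpha_X) = \alpha, \tag{8.40}$$
and for all ω ∈ Ω,
$$1^\alpha_X(\omega) \in [0, 1].$$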
Proof For (8.40) observe that if $P(X = q^\alpha) = 0$ then
$$E(1^\alpha_X) = P(X < q^\alpha) = P(X \le q^\alpha) = \alpha,$$
while if $P(X = q^\alpha) > 0$ we have
$$E(1^\alpha_X) = P(X < q^\alpha) + \alpha - P(X < q^\alpha) = \alpha.$$
To prove the second claim we start by observing that when $P(X = q^\alpha) = 0$,
then $1^\alpha_X = 1_{\{X < q^\alpha\}} \in \{0, 1\}$. If $P(X = q^\alpha) > 0$ and $\omega \notin \{X = q^\alpha\}$ then
$$1^\alpha_X(\omega) = 1_{\{X < q^\alpha\}}(\omega) \in \{0, 1\}.$$
The only non-trivial case is when $P(X = q^\alpha) > 0$ and $\omega \in \{X = q^\alpha\}$.
In such a case (using the standard notation $F_X(x-) = \lim_{y \nearrow x} F_X(y)$),
$$P(X = q^\alpha) = F_X(q^\alpha) - F_X(q^\alpha-),$$
so that for $\omega \in \{X = q^\alpha\}$,
$$1^\alpha_X(\omega) = \kappa = \frac{\alpha - F_X(q^\alpha-)}{F_X(q^\alpha) - F_X(q^\alpha-)}. \tag{8.41}$$
By definition
$$q^\alpha = \inf\{x : \alpha < F_X(x)\},$$
and we see that for any $q < q^\alpha$ we have $\alpha \ge F_X(q)$, hence
$$\alpha \ge F_X(q^\alpha-).$$
For any $q > q^\alpha$, $\alpha < F_X(q)$, and by right continuity of $F_X$ we have
$$\alpha \le F_X(q^\alpha).$$
We have shown that $\alpha \in [F_X(q^\alpha-), F_X(q^\alpha)]$, hence the quotient from (8.41)
lies in [0, 1].