Portfolio Theory and Risk Management

With its emphasis on examples, exercises and calculations, this book suits advanced
undergraduates as well as postgraduates and practitioners. It provides a clear treatment
of the scope and limitations of mean-variance portfolio theory and introduces popular
modern risk measures. Proofs are given in detail, assuming only modest mathematical
background, but with attention to clarity and rigour. The discussion of VaR and its
more robust generalizations, such as AVaR, brings recent developments in risk measures
within range of some undergraduate courses and includes a novel discussion of reducing
VaR and AVaR by means of hedging techniques.
A moderate pace, careful motivation and more than 70 exercises give students confidence in handling risk assessments in modern finance. Solutions and additional materials for instructors are available at www.cambridge.org/9781107003675.

Maciej J. Capiński is an Associate Professor in the Faculty of Applied Mathematics at AGH University of Science and Technology in Kraków, Poland. His interests include mathematical finance, financial modelling, computer-assisted proofs in dynamical systems and celestial mechanics. He has authored 10 research publications, one book, and supervised over 30 MSc dissertations, mostly in mathematical finance.

Ekkehard Kopp is Emeritus Professor of Mathematics at the University of Hull, where he taught courses at all levels in analysis, measure and probability, stochastic processes and mathematical finance between 1970 and 2007. His editorial experience includes service as founding member of the Springer Finance series (1998–2008) and the Cambridge University Press AIMS Library Series. He has taught in the UK, Canada and South Africa and he has authored more than 50 research publications and five books.
Mastering Mathematical Finance

Mastering Mathematical Finance is a series of short books that cover all core topics
and the most common electives offered in Master’s programmes in mathematical or
quantitative finance. The books are closely coordinated and largely self-contained, and
can be used efficiently in combination but also individually.

The MMF books start financially from scratch and mathematically assume only undergraduate calculus, linear algebra and elementary probability theory. The necessary mathematics is developed rigorously, with emphasis on a natural development of mathematical ideas and financial intuition, and the readers quickly see real-life financial applications, both for motivation and as the ultimate end for the theory. All books are written for both teaching and self-study, with worked examples, exercises and solutions.

[DMFM] Discrete Models of Financial Markets,
Marek Capiński, Ekkehard Kopp
[PF] Probability for Finance,
Ekkehard Kopp, Jan Malczak, Tomasz Zastawniak
[SCF] Stochastic Calculus for Finance,
Marek Capiński, Ekkehard Kopp, Janusz Traple
[BSM] The Black–Scholes Model,
Marek Capiński, Ekkehard Kopp
[PTRM] Portfolio Theory and Risk Management,
Maciej J. Capiński, Ekkehard Kopp
[NMFC] Numerical Methods in Finance with C++,
Maciej J. Capiński, Tomasz Zastawniak
[SIR] Stochastic Interest Rates,
Daragh McInerney, Tomasz Zastawniak
[CR] Credit Risk,
Marek Capiński, Tomasz Zastawniak
[FE] Financial Econometrics,
Marek Capiński
[SCAF] Stochastic Control Applied to Finance,
Szymon Peszat, Tomasz Zastawniak

Series editors Marek Capiński, AGH University of Science and Technology, Kraków;
Ekkehard Kopp, University of Hull; Tomasz Zastawniak, University of York
Portfolio Theory and Risk Management

MACIEJ J. CAPIŃSKI
AGH University of Science and Technology, Kraków, Poland

EKKEHARD KOPP
University of Hull, Hull, UK
University Printing House, Cambridge CB2 8BS, United Kingdom

Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107003675
© Maciej J. Capiński and Ekkehard Kopp 2014
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2014
Printed in the United Kingdom by TJ International Ltd, Padstow Cornwall
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Capiński, Maciej J.
Portfolio theory and risk management / Maciej J. Capiński, AGH University of Science and
Technology, Kraków, Poland, Ekkehard Kopp, University of Hull, Hull, UK.
pages cm – (Mastering mathematical finance)
Includes bibliographical references and index.
ISBN 978-1-107-00367-5 (Hardback) – ISBN 978-0-521-17714-6 (Paperback)
1. Portfolio management. 2. Risk management. 3. Investment analysis.
I. Kopp, P. E., 1944– II. Title.
HG4529.5.C366 2014
332.6–dc23 2014006178
ISBN 978-1-107-00367-5 Hardback
ISBN 978-0-521-17714-6 Paperback
Additional resources for this publication at www.cambridge.org/9781107003675
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To Anna, Emily, Staś, Weronika and Helenka
Contents
Preface
1 Risk and return
1.1 Expected return
1.2 Variance as a risk measure
1.3 Semi-variance
2 Portfolios consisting of two assets
2.1 Return
2.2 Attainable set
2.3 Special cases
2.4 Minimum variance portfolio
2.5 Adding a risk-free security
2.6 Indifference curves
2.7 Proofs
3 Lagrange multipliers
3.1 Motivating examples
3.2 Constrained extrema
3.3 Proofs
4 Portfolios of multiple assets
4.1 Risk and return
4.2 Three risky securities
4.3 Minimum variance portfolio
4.4 Minimum variance line
4.5 Market portfolio
5 The Capital Asset Pricing Model
5.1 Derivation of CAPM
5.2 Security market line
5.3 Characteristic line
6 Utility functions
6.1 Basic notions and axioms
6.2 Utility maximisation
6.3 Utilities and CAPM
6.4 Risk aversion
7 Value at Risk
7.1 Quantiles
7.2 Measuring downside risk
7.3 Computing VaR: examples
7.4 VaR in the Black–Scholes model
7.5 Proofs
8 Coherent measures of risk
8.1 Average Value at Risk
8.2 Quantiles and representations of AVaR
8.3 AVaR in the Black–Scholes model
8.4 Coherence
8.5 Proofs
Index
Preface
In this fifth volume of the series ‘Mastering Mathematical Finance’ we present a self-contained rigorous account of mean-variance portfolio theory, as well as a simple introduction to utility functions and modern risk measures.
Portfolio theory, exploring the optimal allocation of wealth among different assets in an investment portfolio, based on the twin objectives of maximising return while minimising risk, owes its mathematical formulation to the work of Harry Markowitz¹ in 1952, for which he was awarded the Nobel Prize in Economics in 1990. Mean-variance analysis has held sway for more than half a century, and forms part of the core curriculum in financial economics and business studies. In these settings mathematical rigour may suffer at times, and our aim is to provide a carefully motivated treatment of the mathematical background and content of the theory, assuming only basic calculus and linear algebra as prerequisites.
Chapter 1 provides a brief review of the key concepts of return and risk, while noting some defects of variance as a risk measure. Considering a portfolio with only two risky assets, we show in Chapter 2 how the minimum variance portfolio, minimum variance line, market portfolio and capital market line may be found by elementary calculus methods. Chapter 3 contains a careful account of the method of Lagrange multipliers, including a discussion of sufficient conditions for extrema in the special case of quadratic forms. These techniques are applied in Chapter 4 to generalise the formulae obtained for two-asset portfolios to the general case.
The derivation of the Capital Asset Pricing Model (CAPM) follows in Chapter 5, including two proofs of the CAPM formula, based, respectively, on the underlying geometry (to elucidate the role of beta) and linear algebra (leading to the security market line), and introducing performance measures such as the Jensen index and Sharpe ratio. The security characteristic line is shown to aid the least-squares estimation of beta using historical portfolio returns and the market portfolio.
Chapter 6 contains a brief introduction to utility theory. To keep matters simple we restrict ourselves to finite sample spaces when discussing preference relations. We consider examples of von Neumann–Morgenstern utility functions, link utility maximisation with the No Arbitrage Principle and explain the key role of state price vectors. Finally, we explore the link between utility maximisation and the CAPM and illustrate the role of the certainty equivalent for the risk averse investor.

¹ H. Markowitz, Portfolio selection, Journal of Finance 7 (1), (1952), 77–91.
In the final two chapters the emphasis shifts from variance to measures of downside risk. Chapter 7 contains an account of Value at Risk (VaR), which remains popular in practice despite its well-documented shortcomings. Following a careful look at quantiles and the algebraic properties of VaR, our emphasis is on computing VaR, especially for assets within the Black–Scholes framework. A novel feature is an account of VaR-optimal hedging with put options, which is shown to reduce to a linear programming problem if the parameters are chosen with care.
In Chapter 8 we examine how the defects of VaR can be addressed using
coherent risk measures. The principal example discussed is Average Value
at Risk (AVaR), which is described in detail, including a careful proof of
sub-additivity. AVaR is placed in the context of coherent risk measures, and
generalised to yield spectral risk measures. The analysis of hedging with
put options in the Black–Scholes setting is revisited, with AVaR in place of
VaR, and the outcomes are compared in examples.
Throughout this volume the emphasis is on examples, applications and
computations. The underlying theory is presented rigorously, but as simply
as possible. Proofs are given in detail, with the more demanding ones left to
the end of each chapter to avoid disrupting the flow of ideas. Applications
presented in the final chapters make use of background material from the
earlier volumes [PF] and [BSM] in the current series. The exercises form
an integral part of the volume, and range from simple verification to more
challenging problems. Solutions and additional material can be found at
www.cambridge.org/9781107003675, which will be updated regularly.
1
Risk and return

1.1 Expected return
1.2 Variance as a risk measure
1.3 Semi-variance

Financial investors base their activity on the expectation that their investment will increase over time, leading to an increase in wealth. Over a fixed time period, the investor seeks to maximise the return on the investment, that is, the increase in asset value as a proportion of the initial investment. The final values of most assets (other than loans at a fixed rate of interest) are uncertain, so that the returns on these investments need to be expressed in terms of random variables. To estimate the return on such an asset by a single number it is natural to use the expected value of the return, which averages the returns over all possible outcomes, weighted by their probabilities.
Our uncertainty about future market behaviour finds expression in the second key concept in finance: risk. Assets such as stocks, forward contracts and options are risky because we cannot predict their future values with certainty. Assets whose possible final values are more ‘widely spread’ are naturally seen as entailing greater risk. Our first attempt to measure the riskiness of a random return will therefore quantify its spread, which rational investors seek to minimise while maximising their return.
In brief, return reflects the efficiency of an investment, while risk is concerned with uncertainty. The balance between these two is at the heart of portfolio theory, which seeks to find optimal allocations of the investor’s initial wealth among the available assets: maximising return at a given level of risk and minimising risk at a given level of expected return.


1.1 Expected return


We are concerned with just two time instants: the present time, denoted by 0, and the future time 1, where 1 may stand for any unit of time. Suppose we make a single-period investment in some stock with the current price S(0) known, and the future price S(1) unknown, hence assumed to be represented by a random variable
$$S(1):\Omega\to[0,+\infty),$$
where Ω is the sample space of some probability space $(\Omega,\mathcal{F},P)$. The members of Ω are often called states or scenarios. (See [PF] for basic definitions.)
When Ω is finite, $\Omega=\{\omega_1,\dots,\omega_N\}$, we shall adopt the notation
$$S(1,\omega_i)=S(1)(\omega_i)\quad\text{for } i=1,\dots,N,$$
for the possible values of S(1). In this setting it is natural to equip Ω with the σ-field $\mathcal{F}=2^{\Omega}$ of all its subsets. To define a probability measure $P:\mathcal{F}\to[0,1]$ it is sufficient to give its values on single-element sets, $P(\{\omega_i\})=p_i$, by choosing $p_i\in(0,1]$ such that $\sum_{i=1}^{N}p_i=1$. We can then compute the expected price at the end of the period
$$E(S(1))=\sum_{i=1}^{N}S(1,\omega_i)\,p_i,$$
and the variance of the price
$$\operatorname{Var}(S(1))=\sum_{i=1}^{N}\big(S(1,\omega_i)-E(S(1))\big)^2p_i.$$

Example 1.1
Assume that S(0) = 100 and
$$S(1)=\begin{cases}120&\text{with probability }\tfrac12,\\ 90&\text{with probability }\tfrac12.\end{cases}$$
Then $E(S(1))=\tfrac12\cdot120+\tfrac12\cdot90=105$ and $\operatorname{Var}(S(1))=(120-105)^2\cdot\tfrac12+(90-105)^2\cdot\tfrac12=15^2$. Observe also that the standard deviation, which is the square root of the variance, is equal to $\sqrt{\operatorname{Var}(S(1))}=15$.
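As a quick numerical check of the finite-Ω formulas, here is a minimal Python sketch; the helper name `mean_var` is our own illustration, not notation from the text. It reproduces the figures of Example 1.1.

```python
from math import sqrt


def mean_var(values, probs):
    """Expectation and variance of a random variable on a finite sample space.

    values[i] is the value in state omega_i and probs[i] = p_i; the p_i are
    assumed positive, and only their sum (which must be one) is checked here.
    """
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to one")
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var


# Example 1.1: S(1) = 120 or 90, each with probability 1/2.
mean, var = mean_var([120, 90], [0.5, 0.5])
print(mean, var, sqrt(var))  # 105.0 225.0 15.0
```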

Exercise 1.1 Assume that $U,D\in\mathbb{R}$ are such that $-1<D<U$. Assume also that S has a binomial distribution, that is
$$P\big(S(1)=S(0)(1+U)^k(1+D)^{N-k}\big)=\binom{N}{k}p^k(1-p)^{N-k},$$
for $k\in\{0,1,\dots,N\}$. Compute E(S(1)) and Var(S(1)).
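The same finite-sum approach can be used to check an answer to Exercise 1.1 numerically. The sketch below is an illustration only: the function `binomial_price_moments` is hypothetical, and it simply enumerates the N + 1 outcomes rather than giving the closed form the exercise asks for.

```python
from math import comb


def binomial_price_moments(s0, u, d, p, n):
    """Mean and variance of S(1) = S(0)(1+U)^k (1+D)^(N-k), k ~ Binomial(N, p).

    Computed by summing over the N + 1 possible outcomes, which is handy for
    checking a hand-derived closed form numerically.
    """
    outcomes = [s0 * (1 + u) ** k * (1 + d) ** (n - k) for k in range(n + 1)]
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(x * q for x, q in zip(outcomes, probs))
    var = sum((x - mean) ** 2 * q for x, q in zip(outcomes, probs))
    return mean, var


# N = 1, U = 0.2, D = -0.1, p = 1/2 recovers Example 1.1 (mean 105, variance 225).
print(binomial_price_moments(100, 0.2, -0.1, 0.5, 1))
```

Changing `n` to a larger value lets you compare the enumerated moments with your formula for several N at once.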

When S(1) is continuously distributed, with density function $f:\mathbb{R}\to\mathbb{R}$, then
$$E(S(1))=\int_{-\infty}^{\infty}x f(x)\,dx,$$
and
$$\operatorname{Var}(S(1))=\int_{-\infty}^{\infty}\big(x-E(S(1))\big)^2f(x)\,dx.$$

Example 1.2
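Assume that $S(1)=S(0)\exp(m+sZ)$, where Z is a random variable with standard normal distribution N(0, 1). This means that S(1) has lognormal distribution. The density function of S(1) is equal to
$$f(x)=\frac{1}{xs\sqrt{2\pi}}\exp\Big(-\frac{\big(\ln\frac{x}{S(0)}-m\big)^2}{2s^2}\Big)\quad\text{for } x>0,$$
and 0 for x ≤ 0. We can compute the expected price as
$$\begin{aligned}
E(S(1))&=\int_0^\infty x f(x)\,dx\\
&=\int_0^\infty\frac{1}{s\sqrt{2\pi}}\exp\Big(-\frac{\big(\ln\frac{x}{S(0)}-m\big)^2}{2s^2}\Big)\,dx\\
&=\int_{-\infty}^{\infty}S(0)e^{sy+m}\,\frac{1}{\sqrt{2\pi}}e^{-\frac{y^2}{2}}\,dy\qquad\Big(\text{taking } y=\frac{1}{s}\Big(\ln\frac{x}{S(0)}-m\Big)\Big)\\
&=S(0)e^{m+\frac{s^2}{2}}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{(y-s)^2}{2}}\,dy\\
&=S(0)e^{m+\frac{s^2}{2}}.
\end{aligned}$$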

Exercise 1.2 Consider S(1) from Example 1.2. Show that
$$\operatorname{Var}(S(1))=S(0)^2\big(e^{s^2}-1\big)e^{2m+s^2}.$$
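The change of variables used in Example 1.2 also suggests a simple numerical check of the formula for E(S(1)) and of the variance in Exercise 1.2. The sketch below is illustrative: the helper `lognormal_moments_numeric` is hypothetical, and the parameter values m = 0.05, s = 0.2 are our own choice, not taken from the text. It approximates both moments as integrals against the standard normal density using the trapezoidal rule.

```python
from math import exp, pi, sqrt


def lognormal_moments_numeric(s0, m, s, half_width=12.0, steps=20_000):
    """Approximate E(S(1)) and Var(S(1)) for S(1) = S(0) exp(m + s Z), Z ~ N(0, 1).

    Uses the substitution x = S(0) exp(m + s y) from Example 1.2, so that each
    moment becomes an integral against the standard normal density; these are
    approximated by the trapezoidal rule on [-half_width, half_width].
    """
    h = 2 * half_width / steps
    first = second = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        phi = exp(-y * y / 2) / sqrt(2 * pi)      # standard normal density
        x = s0 * exp(m + s * y)                   # price corresponding to y
        first += weight * x * phi
        second += weight * x * x * phi
    mean = first * h
    return mean, second * h - mean ** 2


s0, m, s = 100.0, 0.05, 0.2  # illustrative parameters, not taken from the text
mean, var = lognormal_moments_numeric(s0, m, s)
print(mean, s0 * exp(m + s ** 2 / 2))                          # Example 1.2
print(var, s0 ** 2 * (exp(s ** 2) - 1) * exp(2 * m + s ** 2))  # Exercise 1.2
```

Working in the variable y turns each moment into a smooth integral against the normal density, for which the trapezoidal rule is very accurate; truncating at |y| = 12 discards only a negligible tail.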

While we may allow any probability space, we must make sure that
negative values of the random variable S (1) are excluded since negative
prices make no sense from the point of view of economics. This means
that the distribution of S (1) has to be supported on [0, +∞) (meaning that
P(S (1) ≥ 0) = 1).
The return (also called the rate of return) on the investment S is a random variable $K:\Omega\to\mathbb{R}$, defined as
$$K=\frac{S(1)-S(0)}{S(0)}.$$
By the linearity of mathematical expectation, the expected (or mean) return is given by
$$E(K)=\frac{E(S(1))-S(0)}{S(0)}.$$
We introduce the convention of using the Greek letter µ for expectations of various random returns
$$\mu=E(K),$$
with various subscripts indicating the context, if necessary.
The relationships between the prices and returns can be written as
$$S(1)=S(0)(1+K),\qquad E(S(1))=S(0)(1+\mu),$$
which illustrates the possibility of reversing the approach: given the returns we can find the prices.
The requirement that S (1) is nonnegative implies that we must have
K ≥ −1. This in particular excludes the possibility of considering K with
Gaussian (normal) distribution.
At time 1 a dividend may be paid. In practice, after the dividend is paid, the stock price drops by this amount, as one would expect, since the share no longer carries the right to that payment. Thus we must distinguish between the price that includes the right to receive the dividend (the cum dividend price) and the price after the dividend is paid (the ex dividend price). We assume that S(1) denotes the latter, hence the definition of the return has to be modified to account for dividends:
$$K=\frac{S(1)+\operatorname{Div}(1)-S(0)}{S(0)}.$$
A bond is a special security that pays a certain sum of money, known in advance, at maturity; this sum is the same in each state. The return on a bond is not random (recall that we are dealing with a single time period). Consider a bond paying a unit of home currency at time 1, that is B(1) = 1, which is purchased for B(0) < 1. Then
$$R=\frac{1-B(0)}{B(0)}$$
defines the risk-free return. The bond price can be expressed as
$$B(0)=\frac{1}{1+R},$$
giving the present value of a unit at time 1.

Exercise 1.3 Compute the expected returns for the stocks described
in Exercise 1.1 and Example 1.2.

Exercise 1.4 Assume that S(0) = 80 and that the ex dividend price is
$$S(1)=\begin{cases}60&\text{with probability }\tfrac16,\\ 80&\text{with probability }\tfrac36,\\ 90&\text{with probability }\tfrac26.\end{cases}$$
The company will pay out a constant dividend (independent of the future stock price). Compute the dividend for which the expected return on stock would be 20%.

1.2 Variance as a risk measure


The concept of risk in finance is captured in many ways. The most basic and widely used one views risk as the uncertainty of the unknown future value of the quantity in question (here, the return). This uncertainty is understood as the scatter around some reference point. A natural candidate for the reference value is the mathematical expectation (though other benchmarks are sometimes considered). The extent of scatter is conveniently measured by the variance. This notion takes care of two aspects of risk:
(i) The distances between possible values and the expectation.
(ii) The probabilities of attaining the various possible values.
Definition 1.3
By (the measure of) risk we mean the variance of the return
$$\operatorname{Var}(K)=E(K-\mu)^2=E(K^2)-\mu^2,$$
or the standard deviation $\sqrt{\operatorname{Var}(K)}$.
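The variance of the return can be computed from the variance of S(1):
$$\begin{aligned}
\operatorname{Var}(K)&=\operatorname{Var}\Big(\frac{S(1)-S(0)}{S(0)}\Big)\\
&=\frac{1}{S(0)^2}\operatorname{Var}\big(S(1)-S(0)\big)\\
&=\frac{1}{S(0)^2}\operatorname{Var}(S(1)).
\end{aligned}$$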
We use the Greek letter σ for standard deviations of various random returns
$$\sigma=\sqrt{\operatorname{Var}(K)},$$
qualified by subscripts, as required.
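In code, the returns can be formed state by state and then averaged. The sketch below is illustrative: the function `return_stats` is our own, and it assumes, as in Exercise 1.4, that the dividend is the same in every state. For Example 1.1 it gives µ = 5% and σ = 15% = √Var(S(1))/S(0).

```python
from math import sqrt


def return_stats(s0, prices, probs, dividend=0.0):
    """Mean mu and standard deviation sigma of K = (S(1) + Div(1) - S(0)) / S(0).

    prices are the ex dividend values S(1, omega_i) with probabilities probs;
    the dividend is assumed to be the same in every state.
    """
    returns = [(x + dividend - s0) / s0 for x in prices]
    mu = sum(k * p for k, p in zip(returns, probs))
    var = sum((k - mu) ** 2 * p for k, p in zip(returns, probs))
    return mu, sqrt(var)


# Example 1.1 without a dividend: mu = 0.05 and sigma = 15 / 100 = 0.15.
print(return_stats(100, [120, 90], [0.5, 0.5]))
```

Since a constant dividend shifts every return by the same amount Div(1)/S(0), it changes µ but not σ.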

Exercise 1.5 In a market with risk-free return R > 0, we buy a ‘leveraged’ stock S at time 0 with a mixture of cash and a loan at rate R. To buy the stock for S(0) we use wS(0) of our own cash and borrow (1 − w)S(0), for some w ∈ (0, 1). Denote the returns at time 1 on the stock and leveraged position by $K_S$ and $K_{\mathrm{lev}}$ respectively. Derive the relation
$$K_{\mathrm{lev}}=R+\frac{1}{w}(K_S-R),$$
and find the relationship between the standard deviations of the stock and the leveraged position.
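The relation in Exercise 1.5 can be checked numerically by valuing the leveraged position directly: at time 1 we hold the stock, worth S(1), and owe the loan with interest, (1 − w)S(0)(1 + R). The sketch below uses illustrative numbers (S(0) = 100, R = 5%, w = 0.4) that are not from the text; `leveraged_return` is a hypothetical helper.

```python
def leveraged_return(s0, s1, r, w):
    """Return on the leveraged position of Exercise 1.5.

    We invest w * S(0) of our own cash and borrow (1 - w) * S(0) at rate r,
    so at time 1 we hold the stock and owe the loan plus interest.
    """
    own_cash = w * s0
    repayment = (1 - w) * s0 * (1 + r)
    return (s1 - repayment - own_cash) / own_cash


# Illustrative numbers (not from the text); each line prints two values
# that agree up to rounding: the direct return and R + (K_S - R) / w.
s0, r, w = 100.0, 0.05, 0.4
for s1 in (90.0, 120.0):
    k_s = (s1 - s0) / s0
    print(leveraged_return(s0, s1, r, w), r + (k_s - r) / w)
```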

Standard deviation alone does not fully capture the risk of an investment.
We illustrate this with a simple example.

Example 1.4
Consider three assets with today’s prices $S_i(0)=100$ for i = 1, 2, 3 and time 1 prices with the following distributions:
$$S_1(1)=\begin{cases}120&\text{with probability }\tfrac12,\\ 90&\text{with probability }\tfrac12,\end{cases}$$
$$S_2(1)=\begin{cases}140&\text{with probability }\tfrac12,\\ 90&\text{with probability }\tfrac12,\end{cases}$$
$$S_3(1)=\begin{cases}130&\text{with probability }\tfrac12,\\ 100&\text{with probability }\tfrac12.\end{cases}$$
We can see that
$$\sigma_1=\sqrt{\operatorname{Var}(K_1)}=0.15,\qquad\sigma_2=\sqrt{\operatorname{Var}(K_2)}=0.25,\qquad\sigma_3=\sqrt{\operatorname{Var}(K_3)}=0.15.$$
Here $\sigma_2>\sigma_1$ and $\sigma_3=\sigma_1$, but both the second and third assets are preferable to the first, since at time 1 they bring in more cash. We shall return to this example in the next section.
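The standard deviations quoted in Example 1.4, together with the corresponding expected returns, can be recomputed in a few lines. This is a minimal sketch; it uses only the data of the example, with each outcome having probability 1/2.

```python
from math import sqrt

s0 = 100
# Time-1 prices of the three assets; each outcome has probability 1/2.
outcomes = {1: (120, 90), 2: (140, 90), 3: (130, 100)}

stats = {}
for i, (a, b) in outcomes.items():
    k_a, k_b = (a - s0) / s0, (b - s0) / s0
    mu = (k_a + k_b) / 2
    sigma = sqrt(((k_a - mu) ** 2 + (k_b - mu) ** 2) / 2)
    stats[i] = (mu, sigma)
    print(f"asset {i}: mu = {mu:.2f}, sigma = {sigma:.2f}")
```

The output shows µ1 = 0.05 while µ2 = µ3 = 0.15, which is why the second and third assets are preferable even though σ2 > σ1.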

When considering the risk of an investment we should take into account both the expectation and the standard deviation of the return. Given the choice between two securities, a rational investor will, if possible, choose the one with the higher expected return and lower standard deviation, that is, lower risk. This motivates the following definition.
[Figure 1.1: Efficient subset. Securities plotted in the (σ, µ)-plane.]

Definition 1.5
We say that a security with expected return $\mu_1$ and standard deviation $\sigma_1$ dominates another security with expected return $\mu_2$ and standard deviation $\sigma_2$ whenever
$$\mu_1\ge\mu_2\quad\text{and}\quad\sigma_1\le\sigma_2.$$

The use of the word ‘dominates’ reflects our assumption that investors are risk averse. One can imagine an investor whose personal goal is just the excitement of playing the market. Such a person may pay no attention to return, or may even prefer higher risk. Our theory is not intended to cover such individuals.
The playground for portfolio theory will be the (σ, µ)-plane, in fact the right half-plane, since the standard deviation is non-negative. Each security is represented by a point in this plane. This involves a simplification: we assume that the expectation and variance are all that matter when investment decisions are made.
We assume that the dominating securities are preferred, which geometrically (geographically) means that for any two securities, the one lying further north-west in the (σ, µ)-plane is preferable. This ordering does not allow us to compare all pairs: in Figure 1.1 we see for instance that the points $(\sigma_1,\mu_1)$ and $(\sigma_3,\mu_3)$ are not comparable.
Given a set A of securities in the (σ, µ)-plane, we consider the subset of all maximal elements with respect to the dominance relation and call it the efficient subset. If the set A is finite, finding the efficient subset reduces to eliminating the dominated securities. Figure 1.1 shows a set of five securities with efficient subset consisting of just three, numbered 1, 3 and 4.
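For a finite set of securities the efficient subset can be found exactly as described: discard every security that is dominated by another. The sketch below is our own illustration, not the authors’ procedure. Since Definition 1.5 uses non-strict inequalities, it assumes (our choice) that a security is eliminated only when another one is at least as good in both coordinates and strictly better in one, so that two securities at the same point do not eliminate each other.

```python
def efficient_subset(securities):
    """Names of the securities not dominated by any other.

    securities maps a name to its (sigma, mu) pair. A security is treated as
    dominated when another has mu at least as large and sigma at most as large,
    with at least one of the two inequalities strict.
    """
    def dominates(a, b):
        (sa, ma), (sb, mb) = a, b
        return ma >= mb and sa <= sb and (ma > mb or sa < sb)

    return sorted(
        name for name, point in securities.items()
        if not any(dominates(other, point) for other in securities.values())
    )


# Example 1.4 in the (sigma, mu)-plane: asset 3 dominates both others.
print(efficient_subset({"S1": (0.15, 0.05), "S2": (0.25, 0.15), "S3": (0.15, 0.15)}))  # ['S3']
```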

Exercise 1.6 Assume that we have three assets. The first has expected return $\mu_1=10\%$ and standard deviation of return equal to $\sigma_1=0.25$. The second has expected return $\mu_2=15\%$ and standard deviation of return equal to $\sigma_2=0.3$. Assume that the future prices of the third asset will have $E(S_3(1))=100$, $\sqrt{\operatorname{Var}(S_3(1))}=20$. Find the ranges of prices $S_3(0)$ so that the following conditions are satisfied:
(i) The third asset dominates the first asset.
(ii) The third asset dominates the second asset.
(iii) No asset is dominated by another asset.

1.3 Semi-variance
Consider the three assets described in Example 1.4. Although $\sigma_1=\sigma_3$, the third asset carries no ‘downside risk’, since neither outcome for $S_3(1)$ involves a loss for the investor. Similarly, although $\sigma_2>\sigma_1$, the downside risk for the second asset is the same as that for the first (a 50% chance of incurring a loss of 10), but the expected return for the second asset is 15%, making it the more attractive investment even though, as measured by variance, it is more risky. Since investors regard risk as concerned with failure (i.e. downside risk), the following modification of variance is sometimes used. It is called semi-variance and is computed by a formula that takes into account only the unfavourable outcomes, where the return is below the expected value:
$$E(\min\{0,K-\mu\})^2.\qquad(1.1)$$
The square root of semi-variance is denoted by semi-σ. However, this notion still does not agree fully with the intuition.

Example 1.6
Assume that $\Omega=\{\omega_1,\omega_2\}$, $P(\{\omega_1\})=P(\{\omega_2\})=\tfrac12$ and
$$K(\omega_1)=10\%,\qquad K(\omega_2)=20\%.$$
Consider a modification K′ with
$$K'(\omega_1)=10\%,\qquad K'(\omega_2)=30\%.$$
Then K′ is definitely better than K but the semi-variance and the variance for K′ are both higher than for K.

If variance or semi-variance are to represent risk, it is illogical that a better version should be regarded as more risky. This defect can be rectified by replacing the expectation by some other reference point, for instance the risk-free return, with the following modification of (1.1):
$$E(\min\{0,K-R\})^2,$$
which eliminates the above unwanted feature. Instead of the risk-free rate, one can also consider the return required by the investor.
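The quantities compared in Example 1.6, and the effect of changing the reference point, are easy to compute. In the sketch below (a hypothetical helper, not from the text) the semi-variance uses the mean µ by default, as in (1.1), or an arbitrary reference point such as R. The value R = 12% is purely illustrative.

```python
def variance_and_semivariance(values, probs, ref=None):
    """Variance E(K - mu)^2 and semi-variance E(min{0, K - ref})^2.

    By default the reference point is the mean mu, as in (1.1); passing ref
    uses another benchmark instead, e.g. a risk-free return R.
    """
    mu = sum(k * p for k, p in zip(values, probs))
    ref = mu if ref is None else ref
    var = sum((k - mu) ** 2 * p for k, p in zip(values, probs))
    semi = sum(min(0.0, k - ref) ** 2 * p for k, p in zip(values, probs))
    return var, semi


half = [0.5, 0.5]
print("K :", variance_and_semivariance([0.10, 0.20], half))
print("K':", variance_and_semivariance([0.10, 0.30], half))
R = 0.12  # illustrative reference return, not from the text
print("K  with R:", variance_and_semivariance([0.10, 0.20], half, ref=R)[1])
print("K' with R:", variance_and_semivariance([0.10, 0.30], half, ref=R)[1])
```

With reference point µ both numbers are larger for K′ (0.01 and 0.005 against 0.0025 and 0.00125), whereas with the illustrative R = 12% the two semi-variances coincide, since only the common outcome 10% falls below R.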
These versions are not very popular in the financial world, where variance is the basic measure of risk. In our presentation of portfolio theory we follow the historical tradition and take variance as the measure of risk. It is possible to develop a version of the theory for alternative ways of measuring risk. In most cases, however, such theories do not produce the neat analytic formulae available for the mean and variance.
We will return to a more general discussion of risk measures in the final
chapters of this volume. An analysis of the popular concept of Value at
Risk (VaR), which has been used extensively in the banking and investment
sectors since the 1990s, will lead us to conclude that, despite its ubiquity,
this risk measure has serious shortcomings, especially when dealing with
mixed distributions. We will then examine an alternative which remedies
these defects but still remains mathematically tractable.
2
Portfolios consisting of two assets

2.1 Return
2.2 Attainable set
2.3 Special cases
2.4 Minimum variance portfolio
2.5 Adding a risk-free security
2.6 Indifference curves
2.7 Proofs

We begin our discussion of portfolio risk and expected return with portfolios consisting of just two securities. This has the advantage that the key concepts of mean-variance portfolio theory can be expressed in simple geometric terms.
For a given allocation of resources between the two assets comprising
the portfolio, the mean and variance of the return on the entire portfolio
are expressed in terms of the means and variances of, and (crucially) the
covariance between, the returns on the individual assets. This enables us
to examine the set of all feasible weightings of (in other words, allocations
of funds to) the different assets in the portfolio, and to find the unique
weighting with minimum variance. We also find the collection of efficient
portfolios – ones that are not dominated by any other. Finally, adding a
risk-free asset, we find the so-called market portfolio, which is the unique
portfolio providing an optimal combination with the risk-free asset.
We denote the prices of the securities as S 1 (t) and S 2 (t) for t = 0, 1. We
start with a motivating example.


Example 2.1
Let $\Omega=\{\omega_1,\omega_2\}$, $S_1(0)=200$, $S_2(0)=300$. Assume that
$$P(\{\omega_1\})=P(\{\omega_2\})=\tfrac12,$$
and that
$$\begin{aligned}S_1(1,\omega_1)&=260, & S_2(1,\omega_1)&=270,\\ S_1(1,\omega_2)&=180, & S_2(1,\omega_2)&=360.\end{aligned}$$
The expected returns and standard deviations for the two assets are
$$\mu_1=10\%,\quad\mu_2=5\%,\qquad\sigma_1=20\%,\quad\sigma_2=15\%.$$
Assume that we spend V(0) = 500, buying a single share of stock $S_1$ and a single share of stock $S_2$. At time 1 we will have
$$V(1,\omega_1)=260+270=530,\qquad V(1,\omega_2)=180+360=540.$$
The expected return on the investment is 7% and the standard deviation is
just 1%. We can see that by diversifying the investment into two stocks we
have considerably reduced the risk.
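The figures in Example 2.1 can be reproduced as follows. The sketch is illustrative (the helper `stats` is our own) and treats the two states as equally likely, as in the example.

```python
from math import sqrt


def stats(values, probs):
    """Mean and standard deviation on a finite sample space."""
    mu = sum(v * p for v, p in zip(values, probs))
    return mu, sqrt(sum((v - mu) ** 2 * p for v, p in zip(values, probs)))


probs = [0.5, 0.5]                   # P({omega_1}) = P({omega_2}) = 1/2
s1_0, s2_0 = 200, 300                # S_1(0), S_2(0)
s1_1, s2_1 = [260, 180], [270, 360]  # S_1(1), S_2(1) in omega_1, omega_2
k1 = [x / s1_0 - 1 for x in s1_1]
k2 = [x / s2_0 - 1 for x in s2_1]

v0 = s1_0 + s2_0                     # one share of each stock: V(0) = 500
v1 = [a + b for a, b in zip(s1_1, s2_1)]
kv = [x / v0 - 1 for x in v1]

for name, k in (("S1", k1), ("S2", k2), ("portfolio", kv)):
    mu, sigma = stats(k, probs)
    print(f"{name}: mu = {mu:.2%}, sigma = {sigma:.2%}")
```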

2.1 Return
From the above example we see that the risk can be reduced by diversifica-
tion. In this section we discuss how to minimise risk when investing in two
stocks.
Suppose that we buy $x_1$ shares of stock $S_1$ and $x_2$ shares of stock $S_2$. The initial value of this portfolio is
$$V_{(x_1,x_2)}(0)=x_1S_1(0)+x_2S_2(0).$$
When we design a portfolio, its initial value is usually given and is the starting point of our considerations. The number of shares of each asset follows from the decision on how to divide our wealth, which is our primary concern and is expressed by means of the weights defined by
$$w_1=\frac{x_1S_1(0)}{V_{(x_1,x_2)}(0)},\qquad w_2=\frac{x_2S_2(0)}{V_{(x_1,x_2)}(0)}.\qquad(2.1)$$
If the initial wealth V(0) and the weights $w_1$, $w_2$, with $w_1+w_2=1$, are given, then the funds allocated to the two stocks are $w_1V(0)$ and $w_2V(0)$, respectively, and the numbers of shares we buy are
$$x_1=\frac{w_1V(0)}{S_1(0)},\qquad x_2=\frac{w_2V(0)}{S_2(0)}.$$
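The passage from weights to numbers of shares is a one-line computation. The sketch below (hypothetical helper `shares_from_weights`) recovers one share of each stock from V(0) = 500 and the weights 0.4 and 0.6 implied by Example 2.1; it accepts negative weights as well, which becomes relevant once short-selling is allowed.

```python
def shares_from_weights(v0, weights, prices0):
    """Numbers of shares x_i = w_i V(0) / S_i(0) for given wealth and weights."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to one")
    return [w * v0 / s for w, s in zip(weights, prices0)]


# V(0) = 500 with weights 0.4 and 0.6 buys one share of each stock of Example 2.1.
print(shares_from_weights(500, [0.4, 0.6], [200, 300]))
```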
At the end of the period the security prices change, which gives the final value of the portfolio as a random variable
$$V_{(x_1,x_2)}(1)=x_1S_1(1)+x_2S_2(1).$$
To express the return on a portfolio we employ the weights rather than the numbers of shares, since this is more convenient.
The return on the investment in two assets depends on the method of allocation of the funds (the weights) and the corresponding returns. The vector of weights will be denoted by $w=(w_1,w_2)$, or in matrix notation
$$w=\begin{bmatrix}w_1\\ w_2\end{bmatrix},$$
and the return of the corresponding portfolio by $K_w$.

Proposition 2.2
The return $K_w$ on a portfolio consisting of two securities is the weighted average
$$K_w=w_1K_1+w_2K_2,\qquad(2.2)$$
where $w_1$ and $w_2$ are the weights and $K_1$ and $K_2$ the returns on the two components.

Proof With the numbers of shares computed as above, we have the following formula for the value of the portfolio
$$\begin{aligned}
V_{(x_1,x_2)}(1)&=x_1S_1(1)+x_2S_2(1)\\
&=\frac{w_1V_{(x_1,x_2)}(0)}{S_1(0)}S_1(0)(1+K_1)+\frac{w_2V_{(x_1,x_2)}(0)}{S_2(0)}S_2(0)(1+K_2)\\
&=V_{(x_1,x_2)}(0)\big(w_1(1+K_1)+w_2(1+K_2)\big)\\
&=V_{(x_1,x_2)}(0)(1+w_1K_1+w_2K_2)\qquad(\text{since } w_1+w_2=1),
\end{aligned}$$
hence
$$K_w=\frac{V_{(x_1,x_2)}(1)-V_{(x_1,x_2)}(0)}{V_{(x_1,x_2)}(0)}=w_1K_1+w_2K_2.$$
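Proposition 2.2 can also be checked numerically. The sketch below is our own illustration: it computes the portfolio return once directly from the values V(0) and V(1), and once from the weights (2.1) via (2.2), for one share of each stock from Example 2.1 in the state ω1.

```python
def portfolio_return_two_ways(x, prices0, prices1):
    """Return on a portfolio holding x[i] shares of stock i, computed two ways."""
    v0 = sum(xi * p for xi, p in zip(x, prices0))                # V(0)
    v1 = sum(xi * p for xi, p in zip(x, prices1))                # V(1) in one state
    weights = [xi * p / v0 for xi, p in zip(x, prices0)]         # weights (2.1)
    returns = [p1 / p0 - 1 for p0, p1 in zip(prices0, prices1)]  # K_1, K_2
    direct = (v1 - v0) / v0
    via_weights = sum(w * k for w, k in zip(weights, returns))   # formula (2.2)
    return direct, via_weights, weights


# One share of each stock from Example 2.1, state omega_1.
direct, via_weights, weights = portfolio_return_two_ways([1, 1], [200, 300], [260, 270])
print(direct, via_weights, weights)
```

Both numbers equal 6% up to rounding; the same function works with negative share numbers, as in the short-selling discussion that follows.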


In reality, the numbers of shares have to be integers. This puts a constraint on possible weights, since not all percentage splits of our wealth can be realised. To simplify matters we assume that our stock position, that is, the number of shares, can be any real number.
When the number of shares of a given stock is positive, we say that we have a long position in the stock. We shall assume that we can also hold a negative number of shares of stock. This is known as short-selling. Short-selling is a mechanism by which we borrow stock at time 0 and sell it immediately; we then need to buy it back at time 1 to return it to the lender. This mechanism gives us additional money at time 0 that can be invested in a different security.

Example 2.3
Consider the stocks $S_1$ and $S_2$ from Example 2.1. Suppose that at time 0 we have V(0) = 600. Suppose also that at time 0 we borrow three shares of stock $S_1$, meaning that we choose $x_1=-3$. We sell the three shares of stock, which together with V(0) gives us 3 · 200 + 600 = 1200 to invest in the second asset. We can thus take $x_2=4$. Note that
$$V_{(x_1,x_2)}(0)=x_1S_1(0)+x_2S_2(0)=600=V(0).$$
At time 1 we have the proceeds from holding four shares of $S_2$, but we need to buy back the three shares of $S_1$ at its market value. Since
$$V_{(x_1,x_2)}(1)=x_1S_1(1)+x_2S_2(1),$$
we see that
$$\begin{aligned}V_{(x_1,x_2)}(1,\omega_1)&=-3\cdot260+4\cdot270=300,\\ V_{(x_1,x_2)}(1,\omega_2)&=-3\cdot180+4\cdot360=900.\end{aligned}$$
We can compute the weights using (2.1)
$$w_1=\frac{-3\cdot200}{600}=-1,\qquad w_2=\frac{4\cdot300}{600}=2.$$
We see that, as expected, $w_1+w_2=1$.

Exercise 2.1 Compute the expected return and the standard deviation of the return for the investment from Example 2.3. Explain why this portfolio is less desirable than investing in either of the two securities.

When short-selling is allowed, we assume that the weights can be any real numbers whose sum is one. For example, if at time 0 we take a short position in stock $S_1$, then $x_1$, and hence the weight $w_1$, is negative, so $w_2$ must be larger than 1 for $w_1 + w_2 = 1$ to hold.
In real markets short-selling comes with restrictions. To take a short po-
sition a trader usually needs to pay a lending fee or to make a deposit.
Throughout the discussion we make the simplifying assumption that short-
selling is free of such charges. Since not all real markets allow short-
selling, we shall sometimes distinguish special cases where all the weights
are non-negative.

2.2 Attainable set


Finding the risk of a portfolio requires, apart from the risks of the compo-
nents and the weights, some knowledge about their statistical relationship.
Recall from [PF] the notion of covariance of two random variables, X, Y:

$$\mathrm{Cov}(X, Y) = E\bigl[(X - E(X))(Y - E(Y))\bigr] = E(XY) - E(X)E(Y), \qquad (2.3)$$

with, in particular, $\mathrm{Cov}(X, X) = \mathrm{Var}(X) = \sigma_X^2$. Applying the Schwarz inequality ([PF, Lemma 3.49]) to $X - E(X)$ and $Y - E(Y)$ we obtain

$$|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y. \qquad (2.4)$$

This leads immediately to an inequality that we leave as an exercise.

Exercise 2.2 Suppose that random variables $X, Y$ have finite variances. Show that $\sigma_{X+Y} \le \sigma_X + \sigma_Y$.

Let us introduce the following notation for the covariance of the returns
on the stocks S 1 , S 2 :

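$$\sigma_{ij} = \mathrm{Cov}(K_i, K_j),$$
for $i, j = 1, 2$. In particular,
$$\sigma_{11} = \mathrm{Cov}(K_1, K_1) = \mathrm{Var}(K_1) = \sigma_1^2, \qquad \sigma_{22} = \mathrm{Cov}(K_2, K_2) = \mathrm{Var}(K_2) = \sigma_2^2.$$
From (2.3) we see that
$$\sigma_{12} = \sigma_{21}.$$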

If the returns are independent, then we have σ12 = 0.


For convenience, we also introduce the so-called correlation coefficient
$$\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}. \qquad (2.5)$$
For this to make sense we have to assume that the variances of both returns
are non-zero. The variance is zero in one case only, namely when the ran-
dom variable is constant (almost surely). So we assume that the returns on
stocks are genuine, non-constant, random variables, unlike bonds, where
the return is the same in each state (scenario).
By (2.4) the correlation coefficient satisfies

−1 ≤ ρi j ≤ 1.

This makes correlation a convenient measure of dependence. If the correlation coefficient is close to 1 or $-1$, then there is a strong linear relationship between the two variables. It is more difficult to make such assertions by looking at covariance alone, since its size also depends on the scales of the two variables.

Theorem 2.4
The expected return and the variance of the return on a portfolio are given by
$$\mu_w = E(K_w) = w_1\mu_1 + w_2\mu_2, \qquad (2.6)$$
$$\sigma_w^2 = \mathrm{Var}(K_w) = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\sigma_{12}. \qquad (2.7)$$

Proof Equality (2.6) follows directly from (2.2) and linearity of mathe-
matical expectation:

$$\mu_w = E(K_w) = E(w_1K_1 + w_2K_2) = w_1E(K_1) + w_2E(K_2).$$


Figure 2.1 Attainable set (vertical axis: µ).

We now compute the variance of the return on a portfolio of two stocks:
$$\sigma_w^2 = E(K_w^2) - \mu_w^2.$$
Substituting (2.2) and (2.6), and using (2.3) in the last equality, gives
$$\begin{aligned} \sigma_w^2 &= E(w_1^2K_1^2 + w_2^2K_2^2 + 2w_1w_2K_1K_2) - w_1^2\mu_1^2 - w_2^2\mu_2^2 - 2w_1w_2\mu_1\mu_2 \\ &= w_1^2[E(K_1^2) - \mu_1^2] + w_2^2[E(K_2^2) - \mu_2^2] + 2w_1w_2[E(K_1K_2) - \mu_1\mu_2] \\ &= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\sigma_{12}, \end{aligned}$$
which concludes the proof.

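Corollary 2.5
Using (2.5) we can rewrite the formula for the variance of a portfolio as
$$\sigma_w^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\rho_{12}\sigma_1\sigma_2. \qquad (2.8)$$
Corollary 2.6
Using the matrix notation
$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \qquad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad C = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},$$
equations (2.6) and (2.7) can be written as
$$\mu_w = \mathbf{w}^T\boldsymbol{\mu}, \qquad (2.9)$$
$$\sigma_w^2 = \mathbf{w}^T C \mathbf{w}, \qquad (2.10)$$
where $A^T$ denotes the transpose of a matrix $A$.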
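As a quick consistency check of Corollary 2.6, the sketch below evaluates the scalar formulas (2.6) and (2.7) side by side with the matrix formulas (2.9) and (2.10) using NumPy. The inputs (µ1 = 0.08, µ2 = 0.12, σ1 = 0.1, σ2 = 0.2, ρ12 = −0.5 and weights w = (0.3, 0.7)) are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical inputs (illustrative only).
mu = np.array([0.08, 0.12])              # expected returns mu_1, mu_2
s1, s2, rho12 = 0.10, 0.20, -0.5         # standard deviations and correlation
s12 = rho12 * s1 * s2                    # covariance sigma_12
C = np.array([[s1**2, s12],
              [s12, s2**2]])             # covariance matrix
w = np.array([0.3, 0.7])                 # weights with w1 + w2 = 1

# Scalar formulas (2.6) and (2.7)
mu_w_scalar = w[0] * mu[0] + w[1] * mu[1]
var_w_scalar = w[0]**2 * s1**2 + w[1]**2 * s2**2 + 2 * w[0] * w[1] * s12

# Matrix formulas (2.9) and (2.10)
mu_w_matrix = w @ mu
var_w_matrix = w @ C @ w

print(mu_w_scalar, mu_w_matrix)
print(var_w_scalar, var_w_matrix)
```

The matrix form is the one that generalises to more than two assets without changing the code.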
Figure 2.2 Portfolio lines for various values of ρ12 (vertical axis: µ).

The collection of all portfolios that can be manufactured by means of two given assets (in other words, the attainable set, also known as the feasible set) can conveniently be depicted in the (σ, µ)-plane. Assume that $\mu_1 \ne \mu_2$ (let $\mu_1 < \mu_2$, for instance). Take the first weight as a parameter, writing $w = w_1$. Hence $w_2 = 1 - w$, $\mathbf{w} = (w, 1 - w)$, and the expected return and the variance of the portfolio as functions of $w$ have the form
$$\mu_w = w\mu_1 + (1 - w)\mu_2, \qquad \sigma_w^2 = w^2\sigma_1^2 + (1 - w)^2\sigma_2^2 + 2w(1 - w)\rho_{12}\sigma_1\sigma_2. \qquad (2.11)$$
The attainable set is therefore a curve parameterised by $w$. An example of such a set is depicted in Figure 2.1. If short-selling is not allowed, we restrict our attention to the segment corresponding to $w \in [0, 1]$. This is the thicker part of the curve in Figure 2.1.
The shape of the curve depends on the correlation coefficient $\rho_{12}$, as shown in Figure 2.2. We see that for negative $\rho_{12}$ we can reduce the risk of the portfolio while achieving an expected return between the expected returns of the two risky assets.
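The effect of ρ12 can also be explored numerically. The sketch below uses hypothetical inputs µ1 = 0.08, µ2 = 0.12, σ1 = 0.1, σ2 = 0.2, evaluates (2.11) on a grid of weights w ∈ [0, 1] (no short-selling) and reports the smallest standard deviation found for several correlations. A plain grid search is used only for transparency; the exact minimiser is derived in Section 2.4.

```python
import numpy as np

# Hypothetical two-asset inputs (illustrative only).
mu1, mu2 = 0.08, 0.12
s1, s2 = 0.10, 0.20

w = np.linspace(0.0, 1.0, 1001)      # w = w1 in [0, 1]: no short-selling
mu_w = w * mu1 + (1 - w) * mu2       # expected return from (2.11), independent of rho

def sigma_w(rho):
    """Portfolio standard deviation along the curve (2.11) for a given rho."""
    var = w**2 * s1**2 + (1 - w)**2 * s2**2 + 2 * w * (1 - w) * rho * s1 * s2
    return np.sqrt(np.maximum(var, 0.0))   # clip tiny negative rounding errors

min_sigma = {}
for rho in (1.0, 0.0, -0.5, -1.0):
    sig = sigma_w(rho)
    i = np.argmin(sig)
    min_sigma[rho] = sig[i]
    print(f"rho = {rho:+.1f}: min sigma = {sig[i]:.4f} at w = {w[i]:.3f}, mu = {mu_w[i]:.4f}")
```

With these inputs the smallest attainable risk shrinks as ρ12 decreases, and for negative ρ12 it falls below the risk of either asset.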
Suppose that the two underlying securities are positioned as in Figure 2.3, where one dominates the other. The portfolios manufactured from these securities may give the investor extra choice. For instance, we may obtain portfolios whose risk is lower than that of either individual asset, or portfolios with expected return higher than that of either component. This shows that rejecting the dominated security would be a bad decision.
Figure 2.3 Portfolio line with one asset dominating the other (vertical axis: µ).

Exercise 2.3 Assume that $\mu_1 = 10\%$, $\mu_2 = 20\%$, $\sigma_1 = 0.1$, $\sigma_2 = 0.3$ and $\rho_{12} = 0.7$. Find a portfolio for which $\sigma_w < \sigma_1$. Is it possible to construct a portfolio with expected return equal to 30%?

From (2.11) we see that $\mu_w$ is an affine function and $\sigma_w^2$ a quadratic function of $w$. Since the graph of the square root of a quadratic function that is positive everywhere is a branch of a hyperbola, one can guess that the attainable set, consisting of all points $(\sigma_w, \mu_w)$, should be a hyperbola.
Theorem 2.7
If $\mu_1 \ne \mu_2$ and $\rho_{12} \in (-1, 1)$, then the attainable set is a hyperbola with its centre on the vertical axis.
Proof See Section 2.7.

Exercise 2.4 What is the shape of the attainable set when µ1 = µ2 ?

We shall return to the above discussion when working with $n$ assets later on. It may come as a surprise that, from the point of view of technical difficulty, the general case will be as simple as the particular situation just worked out, where only two assets are involved. It will also turn out that the case of many assets reduces to the case of just two, and we will be able to draw from the present chapter valuable conclusions that remain valid in the general case.
In practice we can reject some of the portfolios by drawing on the basic preference property: given two portfolios with the same risk, the one with higher expected return is preferable. So we may discard the lower part of the curve and restrict our attention to the upper part, called the efficient set or efficient frontier, as shown in Figure 2.4. More precisely, a portfolio is called efficient if no other attainable portfolio dominates it. The set of efficient portfolios among all attainable portfolios is called the efficient frontier.

Figure 2.4 Efficient frontier.
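The definition of efficiency can be applied directly to a sampled version of the attainable set. The sketch below assumes the usual mean-variance dominance relation (a portfolio dominates another if it has no more risk and no less expected return, and is strictly better in at least one of the two); the inputs are hypothetical and short-selling is allowed (w ∈ [−1, 2]). With µ1 < µ2, the surviving points should be exactly those on the upper branch, that is, those with w no larger than the weight of the minimum-risk point on the grid.

```python
import numpy as np

# Hypothetical inputs (illustrative only), with mu1 < mu2.
mu1, mu2, s1, s2, rho = 0.08, 0.12, 0.10, 0.20, -0.5

w = np.linspace(-1.0, 2.0, 301)          # short-selling allowed
mu = w * mu1 + (1 - w) * mu2
sigma = np.sqrt(w**2 * s1**2 + (1 - w)**2 * s2**2 + 2 * w * (1 - w) * rho * s1 * s2)

def dominated(i):
    """True if some sampled portfolio has no more risk and no less return than
    portfolio i, and is strictly better in at least one of the two."""
    weakly_better = (sigma <= sigma[i]) & (mu >= mu[i])
    strictly_better = (sigma < sigma[i]) | (mu > mu[i])
    return bool(np.any(weakly_better & strictly_better))

efficient = np.array([not dominated(i) for i in range(len(w))])
print("efficient portfolios have w in", (w[efficient].min(), w[efficient].max()))
```

The pairwise comparison is quadratic in the number of samples, which is harmless for a few hundred points.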

2.3 Special cases


Our first special case is when $\rho_{12} = -1$. From (2.8),
$$\sigma_w^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 - 2w_1w_2\sigma_1\sigma_2 = (w_1\sigma_1 - w_2\sigma_2)^2,$$
hence
$$\sigma_w = |w_1\sigma_1 - w_2\sigma_2|.$$
Since $\sigma_w$ is non-negative, the smallest value it could take is $\sigma_w = 0$. Taking $w_1 = w$ and $w_2 = 1 - w$ gives
$$\sigma_w = |w\sigma_1 - (1 - w)\sigma_2|, \qquad (2.12)$$
and we can solve for $\sigma_w = 0$, obtaining
$$w = \frac{\sigma_2}{\sigma_1 + \sigma_2}, \qquad 1 - w = \frac{\sigma_1}{\sigma_1 + \sigma_2}. \qquad (2.13)$$
Since $\sigma_1, \sigma_2 > 0$, we can see that $w \in [0, 1]$, hence we can reduce our risk to zero without short-selling.
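A short numerical check of (2.13), assuming hypothetical standard deviations σ1 = 0.1 and σ2 = 0.2: the weight lies in [0, 1], and the variance from (2.8) with ρ12 = −1 vanishes up to rounding.

```python
# Hypothetical standard deviations (illustrative only).
s1, s2 = 0.10, 0.20

w = s2 / (s1 + s2)                       # weight of S1 from (2.13)
var = w**2 * s1**2 + (1 - w)**2 * s2**2 - 2 * w * (1 - w) * s1 * s2   # (2.8) with rho = -1
print(w, 1 - w, var)
```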
From (2.12) and (2.11) one can show that the attainable set consists of
two half lines, emanating from the vertical axis (see Figure 2.5).
2.3 Special cases 21
Figure 2.5 Attainable set for ρ12 = ±1 (vertical axis: µ).

Exercise 2.5 Assuming that ρ12 = −1, derive the formulae for the
half lines that form the attainable set.

Our second case is $\rho_{12} = 1$. Then
$$\sigma_w^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\sigma_1\sigma_2 = (w_1\sigma_1 + w_2\sigma_2)^2,$$
and
$$\sigma_w = |w_1\sigma_1 + w_2\sigma_2|.$$
Similarly to the previous case, we obtain $\sigma_w = 0$ for
$$w_1 = \frac{-\sigma_2}{\sigma_1 - \sigma_2}, \qquad w_2 = \frac{\sigma_1}{\sigma_1 - \sigma_2}. \qquad (2.14)$$
This requires that $\sigma_1 \ne \sigma_2$, and we exclude the trivial case $\sigma_1 = \sigma_2$. Since $\sigma_1, \sigma_2 > 0$, either $w_1$ or $w_2$ has to be negative, hence we cannot reduce risk to zero without short-selling. Without short-selling the smallest risk is attained either at $w = 0$ or at $w = 1$.
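The corresponding check for ρ12 = 1 uses (2.14), again with hypothetical σ1 = 0.1 and σ2 = 0.2. One of the two weights comes out negative, confirming that zero risk requires short-selling, while a grid search restricted to w ∈ [0, 1] finds its minimum at an endpoint.

```python
import numpy as np

# Hypothetical standard deviations with s1 != s2 (illustrative only).
s1, s2 = 0.10, 0.20

w1 = -s2 / (s1 - s2)                     # (2.14)
w2 = s1 / (s1 - s2)
var = (w1 * s1 + w2 * s2)**2             # (2.8) with rho = 1
print(w1, w2, var)

# Without short-selling: sigma_w = |w s1 + (1 - w) s2| for w in [0, 1]
grid = np.linspace(0.0, 1.0, 101)
sig = np.abs(grid * s1 + (1 - grid) * s2)
w_best = grid[np.argmin(sig)]
print("best w without short-selling:", w_best)
```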

Exercise 2.6 Assuming that $\rho_{12} = 1$ and $\sigma_1 \ne \sigma_2$, derive the formulae for the half lines that form the feasible set.

Exercise 2.7 Investigate what happens when ρ12 = 1 and σ1 = σ2 .


Figure 2.6 Portfolio line for one risky and one risk-free security (vertical axis: µ).

Exercise 2.8 Investigate what happens when illegal data with |ρ12 | >
1 are considered.

Finally, consider the particular case where one of the assets is risk-free, say $\sigma_1 = 0$. The return on this asset is certain, $\mu_1 = R$, and a reasonable assumption is that $R < \mu_2$: otherwise risk-averse investors would never invest in the risky asset, so its price would fall and its expected return would rise above the risk-free level. (The preferences of investors will be discussed in more detail later.) The return and risk of portfolios take the simplified form
$$\mu_w = w_1R + w_2\mu_2, \qquad \sigma_w^2 = w_2^2\sigma_2^2,$$
giving
$$\sigma_w = |w_2|\sigma_2,$$
and so the set in the (σ, µ)-plane is as shown in Figure 2.6 (the lower part being redundant according to the preference relation).
The segment between the risk-free asset and the asset characterised by
(σ2 , µ2 ) corresponds to positive weights. The line above (σ2 , µ2 ) requires
taking a short position in the risk-free asset, in other words, borrowing at
the risk-free rate (which we assume here to be possible). The rejected lower
segment shows portfolios with a short position in the risky asset.
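The geometry of Figure 2.6 can be confirmed on a few sample weights. In this sketch R = 0.04, µ2 = 0.12 and σ2 = 0.2 are hypothetical values with R < µ2; points with w2 ≥ 0 lie on the upper half-line µ = R + ((µ2 − R)/σ2)σ, and points with w2 < 0 lie on the rejected lower half-line.

```python
import numpy as np

# Hypothetical inputs with R < mu2 (illustrative only).
R, mu2, s2 = 0.04, 0.12, 0.20
w2 = np.array([-0.5, 0.0, 0.5, 1.0, 1.5])   # weight of the risky asset
w1 = 1 - w2                                  # weight of the risk-free asset

mu_w = w1 * R + w2 * mu2
sigma_w = np.abs(w2) * s2

slope = (mu2 - R) / s2
upper = w2 >= 0      # w2 in [0, 1]: the segment; w2 > 1: borrowing at rate R
for wi, si, mi in zip(w2, sigma_w, mu_w):
    print(f"w2 = {wi:+.1f}: sigma = {si:.3f}, mu = {mi:.3f}")
```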
2.4 Minimum variance portfolio 23

2.4 Minimum variance portfolio


We return to the case of two risky securities, S 1 and S 2 . We wish to min-
imise the variance σ2w – or, equivalently, the standard deviation σw . We start
with a theorem where the problem is solved when there are no restrictions
on short-selling.
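Theorem 2.8
If short-selling is allowed, then the portfolio with minimum variance has the weights $\mathbf{w}_{\min} = (w_1, w_2)$ with
$$w_1 = \frac{a}{a + b}, \qquad w_2 = \frac{b}{a + b},$$
where
$$a = \sigma_2^2 - \rho_{12}\sigma_1\sigma_2, \qquad b = \sigma_1^2 - \rho_{12}\sigma_1\sigma_2,$$
unless both $\rho_{12} = 1$ and $\sigma_1 = \sigma_2$.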
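Proof When $\rho_{12} = -1$, then from (2.13)
$$w_1 = \frac{\sigma_2}{\sigma_1 + \sigma_2} = \frac{\sigma_2(\sigma_1 + \sigma_2)}{(\sigma_1 + \sigma_2)^2} = \frac{a}{a + b}.$$
Similarly, for $\rho_{12} = 1$, using (2.14)
$$w_1 = \frac{-\sigma_2}{\sigma_1 - \sigma_2} = \frac{-\sigma_2(\sigma_1 - \sigma_2)}{(\sigma_1 - \sigma_2)^2} = \frac{a}{a + b}.$$
When $\rho_{12} \in (-1, 1)$,
$$\sigma_w^2 = w^2\sigma_1^2 + (1 - w)^2\sigma_2^2 + 2w(1 - w)\rho_{12}\sigma_1\sigma_2$$
is a quadratic function of $w$. We compute the derivative of $\sigma_w^2$ with respect to $w$ and equate it to 0:
$$2w\sigma_1^2 - 2(1 - w)\sigma_2^2 + 2(1 - w)\rho_{12}\sigma_1\sigma_2 - 2w\rho_{12}\sigma_1\sigma_2 = 0.$$
Collecting terms gives $w(\sigma_1^2 + \sigma_2^2 - 2\rho_{12}\sigma_1\sigma_2) = \sigma_2^2 - \rho_{12}\sigma_1\sigma_2$, that is, $w(a + b) = a$, which is the above result. The second derivative is positive,
$$2\sigma_1^2 + 2\sigma_2^2 - 4\rho_{12}\sigma_1\sigma_2 > 2\sigma_1^2 + 2\sigma_2^2 - 4\sigma_1\sigma_2 = 2(\sigma_1 - \sigma_2)^2 \ge 0,$$
which shows that we have a global minimum.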
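Theorem 2.8 can be cross-checked against a brute-force search. The sketch below, with hypothetical inputs σ1 = 0.1, σ2 = 0.2 and ρ12 = 0.3, compares the closed-form weight a/(a + b) with the minimiser of (2.8) over a fine grid of weights, short-selling allowed.

```python
import numpy as np

def min_variance_weights(s1, s2, rho):
    """Weights (w1, w2) from Theorem 2.8 (short-selling allowed)."""
    a = s2**2 - rho * s1 * s2
    b = s1**2 - rho * s1 * s2
    return a / (a + b), b / (a + b)

def portfolio_var(w, s1, s2, rho):
    """Variance (2.8) of the portfolio (w, 1 - w)."""
    return w**2 * s1**2 + (1 - w)**2 * s2**2 + 2 * w * (1 - w) * rho * s1 * s2

s1, s2, rho = 0.10, 0.20, 0.3            # hypothetical inputs
w1, w2 = min_variance_weights(s1, s2, rho)

grid = np.linspace(-2.0, 3.0, 500001)    # step 1e-5 over a wide range of weights
w_grid = grid[np.argmin(portfolio_var(grid, s1, s2, rho))]
print(w1, w2, w_grid)
```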

Exercise 2.9 For which ρ12 will wmin require short-selling?



Figure 2.7 Smallest variance with short-selling restrictions.

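In Corollary 2.6 the return and variance of a given portfolio were stated in terms of the covariance matrix
$$C = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$$
for the two assets. We now do the same for the weights of the minimum variance portfolio.
Since $S_1$ and $S_2$ are risky assets and $\rho_{12} \in (-1, 1)$, the matrix $C$ is invertible, with $\det C = \sigma_1^2\sigma_2^2(1 - \rho_{12}^2) > 0$. By Cramer's rule
$$C^{-1} = \frac{1}{\det C}\begin{bmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{bmatrix}.$$
So we have, writing $\mathbf{1} = (1, 1)$,
$$C^{-1}\mathbf{1} = \frac{1}{\det C}\begin{bmatrix} \sigma_2^2 - \sigma_{12} \\ \sigma_1^2 - \sigma_{12} \end{bmatrix} = \frac{1}{\det C}\begin{bmatrix} a \\ b \end{bmatrix}, \qquad \mathbf{1}^T C^{-1}\mathbf{1} = \frac{1}{\det C}(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}) = \frac{1}{\det C}(a + b),$$
since $\sigma_{12} = \rho_{12}\sigma_1\sigma_2$. We have proved the following: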

Corollary 2.9
The vector $\mathbf{w}_{\min} = (w_1, w_2)$ of weights of the minimum variance portfolio found in Theorem 2.8 has the form
$$\mathbf{w}_{\min} = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}}.$$
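In code, Corollary 2.9 is usually evaluated with a linear solve rather than an explicit inverse. The sketch below (hypothetical inputs σ1 = 0.1, σ2 = 0.2, ρ12 = 0.3, so that C is invertible) computes C⁻¹1 via `numpy.linalg.solve` and checks the result against the scalar weights a/(a + b) and b/(a + b) of Theorem 2.8.

```python
import numpy as np

s1, s2, rho = 0.10, 0.20, 0.3            # hypothetical inputs with |rho| < 1
s12 = rho * s1 * s2
C = np.array([[s1**2, s12],
              [s12, s2**2]])
ones = np.ones(2)

Cinv_1 = np.linalg.solve(C, ones)        # C^{-1} 1 without forming the inverse
w_min = Cinv_1 / (ones @ Cinv_1)         # Corollary 2.9

a = s2**2 - s12                          # Theorem 2.8
b = s1**2 - s12
print(w_min, np.array([a, b]) / (a + b))
```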
We now discuss what happens when short-selling is not allowed. We need to find the minimum of
$$\sigma_w^2 = w^2\sigma_1^2 + (1 - w)^2\sigma_2^2 + 2w(1 - w)\rho_{12}\sigma_1\sigma_2$$

2.5 Adding a risk-free security 25
Figure 2.8 Feasible set after adding a risk-free security (vertical axis: µ).

for restricted values of the weight $0 \le w \le 1$. Let $w_1$ be the weight from Theorem 2.8. Since $\sigma_w^2$ is a convex function of $w$ (its second derivative $2\sigma_1^2 + 2\sigma_2^2 - 4\rho_{12}\sigma_1\sigma_2 \ge 2(\sigma_1 - \sigma_2)^2$ is non-negative), the smallest variance is attained at $\mathbf{w}_{\min} = (w, 1 - w)$ with
$$w = \begin{cases} 0 & \text{if } w_1 < 0, \\ w_1 & \text{if } w_1 \in [0, 1], \\ 1 & \text{if } w_1 > 1. \end{cases}$$
This is illustrated in Figure 2.7, where the bold parts correspond to portfolios with no short-selling. Hence, if the global minimum lies outside $[0, 1]$, an embargo on short-selling means that an investor wishing to minimise his/her risk should put all his/her funds into one of the two assets.
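Because the variance is a convex quadratic in w, the constrained minimiser is simply the unconstrained weight of Theorem 2.8 clipped to [0, 1]. A minimal sketch, with a hypothetical helper name and hypothetical inputs (the case ρ12 = 1, σ1 = σ2 is excluded, as in Theorem 2.8):

```python
def min_variance_no_short(s1, s2, rho):
    """Minimum-variance weight w of S1 when 0 <= w <= 1 (no short-selling).

    Clips the unconstrained weight w1 = a / (a + b) of Theorem 2.8 to [0, 1];
    this is valid because the variance is convex in w.
    """
    a = s2**2 - rho * s1 * s2
    b = s1**2 - rho * s1 * s2
    w1 = a / (a + b)
    return min(max(w1, 0.0), 1.0)

print(min_variance_no_short(0.10, 0.20, -0.5))   # interior minimum
print(min_variance_no_short(0.10, 0.30, 0.9))    # w1 > 1: everything in S1
print(min_variance_no_short(0.30, 0.10, 0.9))    # w1 < 0: everything in S2
```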

2.5 Adding a risk-free security


All portfolios built of the risk-free asset (with rate of return $R$) and any other asset are represented by a straight half-line starting from $(0, R)$ and passing through the corresponding point on the (σ, µ)-plane (see Figure 2.6). The new feasible region is thus obtained by taking any point on the attainable set and linking it with the risk-free asset, as shown in Figure 2.8. To find the new efficient frontier we seek the line through $(0, R)$ with the highest slope, according to the preference relation. Note that it is reasonable to make the following assumption: the risk-free return is smaller than the expected return of the minimum variance portfolio. Under this assumption there is a unique portfolio on the efficient frontier, called the market portfolio, such that the line with the highest slope passes through it (see Figure 2.9). This optimal line, called the capital market line, is tangent to the efficient frontier (as follows from the elementary geometric properties of hyperbolas). Denoting
Figure 2.9 The minimum variance portfolio (MVP), the market portfolio (MP), and the capital market line (CML).

the expected return of the market portfolio by $\mu_m$ and its risk by $\sigma_m$, the capital market line is given by
$$\mu = R + \frac{\mu_m - R}{\sigma_m}\sigma. \qquad (2.15)$$
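Theorem 2.10
The weights of the market portfolio are $\mathbf{m} = (w, 1 - w)$, with
$$w = \frac{c}{c + d}, \qquad 1 - w = \frac{d}{c + d}, \qquad (2.16)$$
where
$$c = \sigma_2^2(\mu_1 - R) - \sigma_{12}(\mu_2 - R), \qquad d = \sigma_1^2(\mu_2 - R) - \sigma_{12}(\mu_1 - R).$$
Proof See Section 2.7.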
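Corollary 2.11
The formulae (2.16) for the weights of the market portfolio can be written in matrix notation as
$$\mathbf{m} = \frac{C^{-1}(\boldsymbol{\mu} - R\mathbf{1})}{\mathbf{1}^T C^{-1}(\boldsymbol{\mu} - R\mathbf{1})}, \qquad (2.17)$$
where $C$ is the covariance matrix, $\boldsymbol{\mu} = (\mu_1, \mu_2)$, and $\mathbf{1} = (1, 1)$.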
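The sketch below computes the market portfolio both from (2.16) and from the matrix form (2.17), then checks by brute force that no portfolio (w, 1 − w) on a grid yields a steeper line through (0, R). All inputs are hypothetical (µ1 = 0.08, µ2 = 0.12, σ1 = 0.1, σ2 = 0.2, ρ12 = −0.5, R = 0.04); R is below the expected return of the minimum variance portfolio (about 0.091 here), as the text assumes.

```python
import numpy as np

# Hypothetical inputs (illustrative only).
mu = np.array([0.08, 0.12])
s1, s2, rho, R = 0.10, 0.20, -0.5, 0.04
s12 = rho * s1 * s2
C = np.array([[s1**2, s12], [s12, s2**2]])
ones = np.ones(2)

# Theorem 2.10
c = s2**2 * (mu[0] - R) - s12 * (mu[1] - R)
d = s1**2 * (mu[1] - R) - s12 * (mu[0] - R)
m_scalar = np.array([c, d]) / (c + d)

# Corollary 2.11, using a linear solve instead of forming C^{-1}
z = np.linalg.solve(C, mu - R * ones)
m_matrix = z / (ones @ z)

mu_m = m_matrix @ mu
sigma_m = np.sqrt(m_matrix @ C @ m_matrix)
slope = (mu_m - R) / sigma_m             # slope of the capital market line (2.15)

# Brute-force check: slopes of lines from (0, R) to portfolios on a grid
w_grid = np.linspace(-1.0, 2.0, 3001)
mu_grid = w_grid * mu[0] + (1 - w_grid) * mu[1]
var_grid = w_grid**2 * s1**2 + (1 - w_grid)**2 * s2**2 + 2 * w_grid * (1 - w_grid) * s12
slopes = (mu_grid - R) / np.sqrt(var_grid)

print(m_scalar, m_matrix, slope, slopes.max())
```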

Exercise 2.10 Verify that (2.16) and (2.17) are equivalent.

The following argument illustrates the possible practical relevance of the market portfolio.
Suppose that the market consists of two securities and suppose that the
investors make their decisions on the basis of the expected returns and the
covariance matrix, assuming in addition that they all use the same numer-
ical values (returns, variances and covariance for the assets). If they all
behave rationally, they perform the above computations and all arrive at
the same market portfolio. They may choose different portfolios on the
capital market line, but they all invest in the two given components in the
same proportions. We conclude that, for each asset, its weight in the mar-
ket portfolio represents its value as a proportion of the total value of the
market.
To see this, consider an example. Asset A is represented by 1000 shares
at 20 dollars each, asset B by 500 shares at 40 dollars each, so each asset
represents 50% of the market. If the investors have these assets in any other
proportion, this leads to a contradiction with the fact that they all should
have the same portfolio. Should any have above 50% of asset A, say, this
would leave some other investors unsatisfied, since they wish to get more
A than is available, and to sell some unwanted B. This would result in ex-
cess supply of B and excess demand of A, which would alter the prices,
the expected returns and consequently the weights on the market portfo-
lio. For this argument to be valid we have to assume that the market is in
equilibrium.

Example 2.12
Assume that the covariance matrix $C$, the vector of expected returns $\boldsymbol{\mu}$, and the risk-free return $R$ are given. Assume also that an investor wishes to spend $V$ and that the aim is to achieve an expected return equal to a given rate $m$. The question is how much he should spend on the risky assets, and how much he should invest risk-free.
First we compute $\mathbf{m}$ using (2.16). We can then compute the expected return of the market portfolio using (2.9):
$$\mu_m = \mathbf{m}^T\boldsymbol{\mu}.$$
Optimal investments lie on the capital market line. The investor needs to hold a combination of the market portfolio and the risk-free security. We assume that he spends $\lambda V$ on the market portfolio and invests $(1 - \lambda)V$ risk-free. The desired $\lambda$ can be computed from the expected return of the position,
$$\lambda\mu_m + (1 - \lambda)R = m,$$
giving
$$\lambda = \frac{m - R}{\mu_m - R}.$$
Since the investor spends $\lambda V$ on the market portfolio, the vector
$$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \lambda V\mathbf{m}$$
gives us the amount $v_1$ invested in the first asset and the amount $v_2$ invested in the second asset. As mentioned above, $(1 - \lambda)V$ is invested risk-free.
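The steps of Example 2.12 translate directly into code. This is a minimal sketch with hypothetical data (the same two-asset inputs as above, V = 10 000 and a target expected return of 10%); the variable `target` plays the role of the rate m, while `m` holds the market-portfolio weights. A negative risk-free amount means borrowing at rate R.

```python
import numpy as np

# Hypothetical market data (illustrative only).
mu = np.array([0.08, 0.12])
s1, s2, rho, R = 0.10, 0.20, -0.5, 0.04
s12 = rho * s1 * s2
C = np.array([[s1**2, s12], [s12, s2**2]])
ones = np.ones(2)

V = 10_000.0          # amount to invest (hypothetical)
target = 0.10         # desired expected return, the rate m in Example 2.12

# Market portfolio via (2.17) and its expected return via (2.9)
z = np.linalg.solve(C, mu - R * ones)
m = z / (ones @ z)
mu_m = m @ mu

lam = (target - R) / (mu_m - R)   # fraction of V placed in the market portfolio
v = lam * V * m                   # amounts v1, v2 in the two risky assets
risk_free = (1 - lam) * V         # negative value means borrowing at rate R
print(m, lam, v, risk_free)
```

Here λ > 1, so the investor borrows at the risk-free rate to reach the target, which corresponds to the part of the capital market line beyond the market portfolio.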

Exercise 2.11 Perform an analogous argument to the one in Example 2.12, for an investor who wishes to have the investment risk equal to a given σ (instead of requiring that the expected return is m).

2.6 Indifference curves


The dominance relation, where we prefer portfolios lying towards the upper left of the (σ, µ)-plane, does not help us choose between two assets where one has higher expected return and higher risk, and the other is less risky but has lower return. It seems impossible to extend the relation to solve
this decision problem so that this extension would be accepted by all in-
vestors. The relation is based on risk aversion, but the investors who, as
assumed, share this attitude, may differ in the intensity of their aversion.
An investor who is sensitive to risk may require much higher returns as a
compensation for increased exposure. Another investor may be cornered,
forced to accept risk to earn the return needed to fulfil the requirements
created by his circumstances, or may be just less sensitive to risk. It is in-
evitable that we have to allow for the modelling of individual preferences.
Let us fix our attention on one particular investor, and fix one particular
asset (or portfolio of assets). We assume that this investor can answer the
following question: which assets are equally as attractive as the fixed one?
The answer provides us with a certain set of assets. Since the preference
relation is valid, two assets with the same expected returns and different
2.6 Indifference curves 29

Figure 2.10 An indifference curve for (σ1 , µ1 ).

risk will never be equally attractive; nor will two assets with the same risk but different expected returns. Thus the intersection of this set with any line parallel to either of the axes contains at most one element. So the set is the graph of an increasing function. We assume in addition that this function is convex for each investor (in other words, to retain his peace of mind, the investor demands that each additional unit of risk be offset by a larger increase in return than the previous one, as shown in Figure 2.10), and we call its graph an indifference curve.
We assume that indifference curves are level sets of a function
u : R2 → R.
We assume that a curve {u = c2 } lies above {u = c1 } for c1 < c2 . In other
words, the higher the value of u, the higher the investor’s satisfaction with
the investment. Given a set of attainable portfolios, an investor chooses the
one placed on the best indifference curve. By the convexity of the curves, it is geometrically clear that the optimal portfolio is the point at which some indifference curve is tangent to the capital market line, as shown in Figure 2.11(a).
For another investor, who is less risk averse, that is, who has less steep
indifference curves, the optimal portfolio may be different, as in Figure
2.11(b). It lies further to the right, which agrees with our intuition regarding
the risk preferences of this investor.

Figure 2.11 Indifference curves and optimal investment for an investor with
high risk aversion (a), and lower risk aversion (b).

Example 2.13
Assume that the covariance matrix $C$, the vector of expected returns $\boldsymbol{\mu}$, and the risk-free return $R$ are given, and that an investor's indifference curves are the level sets of the function
$$u(\sigma, \mu) = \mu - \frac{a}{2}\sigma^2. \qquad (2.18)$$
We show how the investor should spend $V$ to maximise $u$. The indifference curves are the level sets $u(\sigma, \mu) = c$, so that we obtain $\mu = c + \frac{a}{2}\sigma^2$, which is convex and has slope $a\sigma$.
Using (2.17), (2.9) and (2.10) we can find the market portfolio $\mathbf{m}$, its expected return $\mu_m$ and variance $\sigma_m^2$. Since the slope $a\sigma$ of the indifference curve needs to match the slope of the capital market line, the tangency point can be found by solving the system of two linear equations
$$\mu = R + \frac{\mu_m - R}{\sigma_m}\sigma, \qquad a\sigma = \frac{\mu_m - R}{\sigma_m}.$$
This means that
$$\sigma = \frac{1}{a}\,\frac{\mu_m - R}{\sigma_m}, \qquad \mu = R + \frac{1}{a}\left(\frac{\mu_m - R}{\sigma_m}\right)^2.$$
We can now determine how to divide $V$ amongst the assets using the same method as in Example 2.12.
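A minimal sketch of Example 2.13, assuming hypothetical data (the same two-asset inputs as above, risk-aversion parameter a = 10 and V = 10 000; these are not the values of Exercise 2.12). It finds the tangency point of the indifference curve with the capital market line, splits V as in Example 2.12, and checks on a grid that u(σ, µ) along the capital market line peaks at the computed σ.

```python
import numpy as np

# Hypothetical inputs (illustrative only).
mu = np.array([0.08, 0.12])
s1, s2, rho, R = 0.10, 0.20, -0.5, 0.04
a = 10.0                          # risk-aversion parameter in (2.18)
V = 10_000.0
s12 = rho * s1 * s2
C = np.array([[s1**2, s12], [s12, s2**2]])
ones = np.ones(2)

# Market portfolio (2.17), its mean (2.9) and standard deviation (2.10)
z = np.linalg.solve(C, mu - R * ones)
m = z / (ones @ z)
mu_m = m @ mu
sigma_m = np.sqrt(m @ C @ m)
slope = (mu_m - R) / sigma_m      # slope of the capital market line (2.15)

# Tangency of mu = c + (a/2) sigma^2 with the capital market line
sigma_star = slope / a
mu_star = R + slope**2 / a

# Split V as in Example 2.12
lam = (mu_star - R) / (mu_m - R)
v = lam * V * m
risk_free = (1 - lam) * V
print(sigma_star, mu_star, lam, v, risk_free)

# Sanity check: u(sigma, mu) evaluated along the capital market line
sigma_grid = np.linspace(0.0, 0.3, 3001)
u_on_cml = R + slope * sigma_grid - 0.5 * a * sigma_grid**2
sigma_best = sigma_grid[np.argmax(u_on_cml)]
```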
2.7 Proofs 31

Exercise 2.12 Consider two risky securities and a risk-free asset with the following parameters:
$$\mu_1 = 10\%, \quad \sigma_1 = 0.1, \quad \rho_{12} = -0.5,$$
$$\mu_2 = 20\%, \quad \sigma_2 = 0.3, \quad R = 5\%.$$
Assume that the investor's indifference curves are given by (2.18) with $a = 5$. How should the investor divide $V = 3000$ amongst the assets?

We shall return to indifference curves in Chapter 6, where we will discuss their relation to utility functions.

2.7 Proofs
Theorem 2.7
If $\mu_1 \ne \mu_2$ and $\rho_{12} \in (-1, 1)$, then the attainable set is a hyperbola with its centre on the vertical axis.
Proof For a more familiar notation we introduce the letters $x, y$ for the coordinates, so that we have the following description of the attainable set:
$$y = w\mu_1 + (1 - w)\mu_2, \qquad (2.19)$$
$$x^2 = w^2\sigma_1^2 + (1 - w)^2\sigma_2^2 + 2w(1 - w)\sigma_{12}. \qquad (2.20)$$
The goal of further computations is to convert the above system of equations to the form
$$\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1, \qquad (2.21)$$
from which we will be able to read off the properties of the hyperbola (see Figure 2.12).
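Solving (2.19) for $w$,
$$w = \frac{y - \mu_2}{\mu_1 - \mu_2}$$
(note the relevance of the assumption $\mu_1 \ne \mu_2$), so that $1 - w = \frac{\mu_1 - y}{\mu_1 - \mu_2}$, and inserting into (2.20), we get
$$x^2 = \frac{1}{A}\left[(y - \mu_2)^2\sigma_1^2 + (\mu_1 - y)^2\sigma_2^2 + 2(y - \mu_2)(\mu_1 - y)\sigma_{12}\right],$$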

Figure 2.12 The hyperbola $\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1$.

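where $A = (\mu_1 - \mu_2)^2 > 0$. Simple computation gives
$$x^2 = \frac{1}{A}\left[By^2 - 2Cy + D\right], \qquad (2.22)$$
where
$$B = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}, \quad C = \sigma_1^2\mu_2 + \sigma_2^2\mu_1 - \sigma_{12}(\mu_1 + \mu_2), \quad D = \sigma_1^2\mu_2^2 + \sigma_2^2\mu_1^2 - 2\sigma_{12}\mu_1\mu_2.$$
Observe that $B > 0$ if $\rho_{12} < 1$, since $\sigma_1^2 + \sigma_2^2 - 2\sigma_{12} > \sigma_1^2 + \sigma_2^2 - 2\sigma_1\sigma_2 \ge 0$.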
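We can write
$$\begin{aligned} By^2 - 2Cy + D &= B\left(y^2 - 2\frac{C}{B}y + \frac{D}{B}\right) \\ &= B\left[\left(y - \frac{C}{B}\right)^2 - \frac{C^2}{B^2} + \frac{D}{B}\right] \\ &= B(y - k)^2 + c, \end{aligned}$$
with $k = \frac{C}{B}$ and $c = \frac{1}{B}(BD - C^2)$. Substituting into (2.22) gives
$$x^2 = \frac{1}{A}\left[B(y - k)^2 + c\right],$$
hence
$$\frac{x^2}{c/A} - \frac{(y - k)^2}{c/B} = 1. \qquad (2.23)$$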

We can see that we have obtained the desired hyperbola equation (2.21), with $h = 0$, $a^2 = c/A$ and $b^2 = c/B$, meaning that the centre of the hyperbola lies on the vertical axis (see Figure 2.12).
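One loose end to tie up is to show that $c \ne 0$, as otherwise we would be dividing by zero in (2.23). A simple but tedious computation shows that
$$BD - C^2 = A\sigma_1^2\sigma_2^2(1 - \rho_{12}^2).$$
Since $\rho_{12} \in (-1, 1)$, $B > 0$ and $A > 0$,
$$c = \frac{1}{B}(BD - C^2) = \frac{A}{B}\sigma_1^2\sigma_2^2(1 - \rho_{12}^2) > 0.$$
In particular, both $c/A$ and $c/B$ are positive, so (2.23) indeed describes a hyperbola.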
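The "simple but tedious" identity and the final equation (2.23) can be sanity-checked numerically before (or instead of) expanding by hand. This sketch uses hypothetical inputs with µ1 ≠ µ2 and |ρ12| < 1; the variable `C` here is the scalar C of the proof, not the covariance matrix.

```python
import numpy as np

# Hypothetical inputs (illustrative only): mu1 != mu2 and |rho| < 1.
mu1, mu2, s1, s2, rho = 0.08, 0.12, 0.10, 0.20, 0.3
s12 = rho * s1 * s2

A = (mu1 - mu2)**2
B = s1**2 + s2**2 - 2 * s12
C = s1**2 * mu2 + s2**2 * mu1 - s12 * (mu1 + mu2)   # the scalar C of the proof
D = s1**2 * mu2**2 + s2**2 * mu1**2 - 2 * s12 * mu1 * mu2

k = C / B
c = (B * D - C**2) / B

# Identity BD - C^2 = A s1^2 s2^2 (1 - rho^2)
lhs = B * D - C**2
rhs = A * s1**2 * s2**2 * (1 - rho**2)

# Points (x, y) = (sigma_w, mu_w) of the attainable set should satisfy (2.23)
w = np.linspace(-2.0, 3.0, 11)
y = w * mu1 + (1 - w) * mu2
x = np.sqrt(w**2 * s1**2 + (1 - w)**2 * s2**2 + 2 * w * (1 - w) * s12)
residual = x**2 / (c / A) - (y - k)**2 / (c / B) - 1
print(lhs, rhs, np.abs(residual).max())
```

A numerical check like this only tests particular parameter values; it does not replace the algebraic verification.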


Exercise 2.13 Show that the asymptotes of the hyperbola
$$\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1$$
are
$$\frac{x - h}{a} \pm \frac{y - k}{b} = 0.$$

Theorem 2.10
The weights of the market portfolio are $\mathbf{m} = (w, 1 - w)$, with
$$w = \frac{c}{c + d}, \qquad 1 - w = \frac{d}{c + d},$$
where
$$c = \sigma_2^2(\mu_1 - R) - \sigma_{12}(\mu_2 - R), \qquad d = \sigma_1^2(\mu_2 - R) - \sigma_{12}(\mu_1 - R).$$

Proof For a portfolio $(w, 1 - w)$, we denote its expected return by $\mu(w)$ and its standard deviation by $\sigma(w)$. Optimisation is based on maximising the slope coefficient
$$s(w) = \frac{\mu(w) - R}{\sigma(w)}.$$
To this end it is necessary and sufficient to solve
$$s'(w) = 0.$$
We have
$$s'(w) = \frac{\mu'(w)\sigma(w) - (\mu(w) - R)\sigma'(w)}{\sigma^2(w)}.$$
Since
$$\sigma'(w) = \left(\sqrt{\sigma^2(w)}\right)' = \frac{1}{2\sqrt{\sigma^2(w)}}(\sigma^2(w))' = \frac{1}{2\sigma(w)}(\sigma^2(w))',$$
the equation $s'(w) = 0$ reduces to
$$2\mu'(w)\sigma^2(w) - (\mu(w) - R)(\sigma^2(w))' = 0,$$
that is, after dividing by 2 and using $\mu'(w) = \mu_1 - \mu_2$,
$$(\mu_1 - \mu_2)\bigl(w^2\sigma_1^2 + (1 - w)^2\sigma_2^2 + 2w(1 - w)\sigma_{12}\bigr) - \bigl(w\mu_1 + (1 - w)\mu_2 - R\bigr)\bigl(w\sigma_1^2 - (1 - w)\sigma_2^2 + (1 - 2w)\sigma_{12}\bigr) = 0.$$
This is in fact a linear equation in $w$, since all terms involving $w^2$ cancel out. Elementary but tedious computations give
$$w = \frac{c}{c + d}, \qquad 1 - w = \frac{d}{c + d},$$
which concludes the proof.
3 Lagrange multipliers

3.1 Motivating examples
3.2 Constrained extrema
3.3 Proofs

The mean-variance analysis of asset portfolios carried out in the previous chapter was greatly simplified by considering portfolios of only two assets. This meant that the portfolio weights involved only a single variable, making basic calculus techniques available for finding the portfolio of minimum variance. For portfolios of more than two assets this no longer applies. We will need a method that allows us to find minima of functions of many variables under constraints. (In portfolio theory the first natural constraint is that all weights need to add up to one.)
In this chapter we digress a little from portfolio theory. We present a general method that locates potential extreme points of functions under constraints and, in a special case that suffices for our intended applications, enables us to classify them as maxima or minima. The method replaces the minimisation problem by a system of equations whose solutions are candidates for the minimum. The 'method of Lagrange multipliers' is a standard tool in advanced calculus, but the proofs we provide are frequently only sketched in standard textbooks.

3.1 Motivating examples

The aim of this section is to provide the underlying geometric intuition for
the method.

36 Lagrange multipliers
We consider two functions
$$f : \mathbb{R}^2 \to \mathbb{R}, \qquad g : \mathbb{R}^2 \to \mathbb{R},$$
and show how to find solutions of the following problem:
$$\text{Find } \min f(x, y) \quad \text{under the constraint } g(x, y) = 0. \qquad (3.1)$$
We start with a simple example.

Example 3.1
Consider
$$f(x, y) = x^2 + y^2, \qquad g(x, y) = \frac{1}{2}x + \frac{1}{2}y - \frac{1}{2}.$$
Basic arguments (say, by substituting $y = 1 - x$ into $f(x, y)$ and computing a derivative with respect to $x$) lead to the solution
$$x^* = y^* = \frac{1}{2}. \qquad (3.2)$$
We now present an alternative approach. We first observe that one of the level curves $\{(x, y) : f(x, y) = r^2\}$ (which are circles of radius $r$, as shown in Figure 3.1) is tangent at the point $(x^*, y^*)$ to the line $\{(x, y) : g(x, y) = 0\}$. Since the gradients
$$\nabla f(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x}(x, y) \\ \frac{\partial f}{\partial y}(x, y) \end{bmatrix} = \begin{bmatrix} 2x \\ 2y \end{bmatrix}, \qquad \nabla g(x, y) = \begin{bmatrix} \frac{\partial g}{\partial x}(x, y) \\ \frac{\partial g}{\partial y}(x, y) \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}$$
are orthogonal to the level curves, the vectors $\nabla f(x^*, y^*)$ and $\nabla g(x^*, y^*)$ should be collinear. This means that there should exist a number $\lambda \in \mathbb{R}$ such that we have the following system of two equations:
$$\nabla f(x, y) - \lambda\nabla g(x, y) = 0. \qquad (3.3)$$
The idea is to solve (3.3) instead of (3.1); in other words, we solve a
system of equations, instead of solving a minimisation problem.
3.1 Motivating examples 37

Figure 3.1 The level curves $\{f = r^2\}$ for $r = 1$ (outer circle), $r = \frac{1}{\sqrt{2}}$ (middle circle) and $r = \frac{1}{2}$ (inner circle), together with the gradients $\nabla f$ and $\nabla g$, attached at $(x^*, y^*)$.

Together with the constraint $g(x, y) = 0$, (3.3) leads to the linear system
$$\begin{aligned} 2x - \tfrac{1}{2}\lambda &= 0, \\ 2y - \tfrac{1}{2}\lambda &= 0, \\ \tfrac{1}{2}x + \tfrac{1}{2}y - \tfrac{1}{2} &= 0, \end{aligned} \qquad (3.4)$$
with the unique solution
$$x^* = y^* = \frac{1}{2}, \qquad \lambda^* = 2.$$
The values $x^*$ and $y^*$ found by this method are the same as those found in (3.2).
In Figure 3.2 we see that in this example the point $(x^*, y^*)$ is the only point on $\{g(x, y) = 0\}$ at which $\nabla f$ and $\nabla g$ are collinear, hence the only point where (3.3) can hold.
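Since (3.4) is linear in the unknowns (x, y, λ), it can be handed to any linear solver. The sketch below uses `numpy.linalg.solve` (Gaussian elimination, not Cramer's rule) purely to confirm the solution quoted above.

```python
import numpy as np

# Unknowns ordered as (x, y, lambda); each row is one equation of (3.4).
M = np.array([[2.0, 0.0, -0.5],
              [0.0, 2.0, -0.5],
              [0.5, 0.5,  0.0]])
rhs = np.array([0.0, 0.0, 0.5])

x_star, y_star, lam_star = np.linalg.solve(M, rhs)
print(x_star, y_star, lam_star)
```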

Exercise 3.1 Solve (3.4) using Cramer’s rule.



Figure 3.2 Gradients $\nabla f$ (longer arrows) and $\nabla g$ (shorter arrows), attached at $(x^*, y^*)$ and at four other points on $g(x, y) = 0$.

Example 3.1 suggests that instead of solving the problem (3.1) we can look for a solution of the system of equations
$$\nabla f(x, y) - \lambda\nabla g(x, y) = 0, \qquad g(x, y) = 0. \qquad (3.5)$$
Solving a system of equations can turn out to be easier than minimising a function under constraints.
We now test how this works on an example from portfolio theory that
was discussed in Chapter 2.

Example 3.2
We consider the problem of finding the minimum variance portfolio when given two risky assets, as in Chapter 2. To use the same notation as in (3.5), we write $x$ and $y$ instead of $w_1$ and $w_2$, respectively, and take
$$f(x, y) = x^2\sigma_1^2 + y^2\sigma_2^2 + 2xy\sigma_{12}, \qquad g(x, y) = x + y - 1.$$
The constraint $g(x, y) = 0$ ensures that $x$ and $y$ add up to one, making the pair $(x, y)$ a well-defined portfolio. The function $f$ gives its variance.

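The gradients are
$$\nabla f(x, y) = \begin{bmatrix} 2\sigma_1^2x + 2\sigma_{12}y \\ 2\sigma_{12}x + 2\sigma_2^2y \end{bmatrix}, \qquad \nabla g(x, y) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Equation (3.5) leads to
$$\begin{aligned} 2\sigma_1^2x + 2\sigma_{12}y - \lambda &= 0, \\ 2\sigma_{12}x + 2\sigma_2^2y - \lambda &= 0, \\ x + y - 1 &= 0. \end{aligned} \qquad (3.6)$$
This system can be solved (using Cramer's rule, for example) to obtain
$$x^* = \frac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}, \qquad y^* = \frac{\sigma_1^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}, \qquad \lambda^* = 2\,\frac{\sigma_1^2\sigma_2^2 - \sigma_{12}^2}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}. \qquad (3.7)$$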
We see that x∗ and y∗ are identical to the weights w1 and w2 obtained in
Theorem 2.8.
Figure 3.3 contains a numerically obtained plot of the point (x∗ , y∗ ), the
level curve { f (x, y) = σ2wmin } and the line {g(x, y) = 0}. We see that, as
expected, we have a point of tangency at (x∗ , y∗ ), which is the minimum
variance portfolio.
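The sketch below solves the linear system (3.6) numerically for the parameters quoted for Figure 3.3 (σ1 = 0.1, σ2 = 0.2, ρ12 = −0.5) and compares the result with the closed form (3.7). It is a numerical spot check for one parameter set, not a substitute for the algebraic verification asked for in the next exercise.

```python
import numpy as np

s1, s2, rho = 0.1, 0.2, -0.5             # parameters quoted for Figure 3.3
s12 = rho * s1 * s2

# System (3.6) with unknowns ordered as (x, y, lambda)
M = np.array([[2 * s1**2, 2 * s12, -1.0],
              [2 * s12, 2 * s2**2, -1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(M, rhs)

# Closed form (3.7)
den = s1**2 + s2**2 - 2 * s12
x_cf = (s2**2 - s12) / den
y_cf = (s1**2 - s12) / den
lam_cf = 2 * (s1**2 * s2**2 - s12**2) / den
print((x, y, lam), (x_cf, y_cf, lam_cf))
```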

Exercise 3.2 Verify that (3.7) is a solution of (3.6).

Exercise 3.3 Recreate the plot from Figure 3.3.



Figure 3.3 The tangency of $\{f(x, y) = \sigma^2_{w_{\min}}\}$ and $\{x + y = 1\}$ at the minimum variance portfolio (MVP), computed for $\sigma_1 = 0.1$, $\sigma_2 = 0.2$ and $\rho_{12} = -0.5$.

3.2 Constrained extrema

The examples from the previous section have been considered on the plane.
It turns out that a similar approach can be used in higher dimensions, and
that we can consider more complicated constraints.
Our objective in this section is to show how to solve the following general constrained minimisation problem:
$$\text{Find } \min f(\mathbf{v}) \quad \text{under the constraints } \mathbf{g}(\mathbf{v}) = 0, \qquad (3.8)$$
where
$$f : \mathbb{R}^n \to \mathbb{R}, \qquad \mathbf{g} : \mathbb{R}^n \to \mathbb{R}^k.$$
We will provide necessary and, in the special case of quadratic forms, sufficient conditions for a solution to this problem.
To keep better track of dimensions, we use a bold font whenever we are dealing with vectors, and the normal font when dealing with numbers. Note that in stating the problem above we used $f$ for a function taking values in $\mathbb{R}$ and $\mathbf{g}$ for a function
$$\mathbf{g}(\mathbf{v}) = (g_1(\mathbf{v}), \ldots, g_k(\mathbf{v}))$$
taking values in $\mathbb{R}^k$.
For the reader's convenience we review some notation from multivariable
3.2 Constrained extrema 41
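calculus. We use the notation $\mathbf{g}'(\mathbf{v})$ to denote the $k \times n$ Jacobian matrix
$$\mathbf{g}'(\mathbf{v}) = \begin{bmatrix} \frac{\partial g_1}{\partial x_1}(\mathbf{v}) & \frac{\partial g_1}{\partial x_2}(\mathbf{v}) & \cdots & \frac{\partial g_1}{\partial x_n}(\mathbf{v}) \\ \frac{\partial g_2}{\partial x_1}(\mathbf{v}) & \frac{\partial g_2}{\partial x_2}(\mathbf{v}) & \cdots & \frac{\partial g_2}{\partial x_n}(\mathbf{v}) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g_k}{\partial x_1}(\mathbf{v}) & \frac{\partial g_k}{\partial x_2}(\mathbf{v}) & \cdots & \frac{\partial g_k}{\partial x_n}(\mathbf{v}) \end{bmatrix}.$$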

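We say that $\mathbf{g} : \mathbb{R}^n \to \mathbb{R}^k$ is continuously differentiable if all the entries in its Jacobian matrix are continuous functions.
For a function $f : \mathbb{R}^n \to \mathbb{R}$, the Jacobian matrix is
$$f'(\mathbf{v}) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{v}) & \frac{\partial f}{\partial x_2}(\mathbf{v}) & \cdots & \frac{\partial f}{\partial x_n}(\mathbf{v}) \end{bmatrix}.$$
At times it will be more convenient to use a vector instead of a $1 \times n$ matrix. We therefore introduce the notation $\nabla f(\mathbf{v})$ for the gradient
$$\nabla f(\mathbf{v}) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{v}) \\ \vdots \\ \frac{\partial f}{\partial x_n}(\mathbf{v}) \end{bmatrix}.$$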
The necessary condition for a continuously differentiable f : Rn → R
to have a minimum at v∗ , under the constraint that g(v∗ ) = 0, for some
continuously differentiable function g : Rn → Rk , can now be stated as
follows.
Theorem 3.3
If v∗ is a solution of the problem (3.8), and g0 (v∗ ) is a matrix of rank k, then
there exists a sequence of numbers λ1 , . . . , λk ∈ R such that
∇ f (v∗ ) − (λ1 ∇g1 (v∗ ) + · · · + λk ∇gk (v∗ )) = 0. (3.9)
Proof Following a brief review of standard auxiliary results, the proof is
given on page 45. 
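The numbers $\lambda_1, \dots, \lambda_k$ from Theorem 3.3 are referred to as Lagrange multipliers, and the function
$$L(\mathbf{v}) = \nabla f(\mathbf{v}) - \big(\lambda_1 \nabla g_1(\mathbf{v}) + \cdots + \lambda_k \nabla g_k(\mathbf{v})\big)$$
is the Lagrangian of the constrained optimisation problem (3.8). Condition (3.9) states precisely that $L(\mathbf{v}^*) = \mathbf{0}$.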
We emphasise that Theorem 3.3 only provides necessary conditions for
a minimum of f . Even if (3.9) holds for some v∗ , it does not necessarily
imply that v∗ is a minimum. This is similar in spirit to searching for a local
minimum of a function f : R → R, where we first find points x∗ satisfying
42 Lagrange multipliers
$f'(x^*) = 0$, but to confirm that $f$ has a minimum at such a point, additional
conditions need to be checked. Similarly, Theorem 3.3 is a handy tool for
finding candidates for a solution of problem (3.8). To prove that such a
candidate is indeed a solution one usually needs additional information.
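As a small illustration of Theorem 3.3 producing candidates rather than solutions, consider a hypothetical problem (not taken from the text): minimise $f(x, y) = y$ on the unit circle $g(x, y) = x^2 + y^2 - 1 = 0$. Solving (3.9) by hand gives $x = 0$, $y = \pm 1$ and $\lambda = 1/(2y)$. The sketch below only checks the Lagrange condition at both candidates and then compares the values of $f$; the comparison is conclusive here only because of the additional information $f = y \ge -1$ on the circle.

```python
# Hypothetical example: minimise f(x, y) = y subject to g(x, y) = x**2 + y**2 - 1 = 0.

def f(x, y):
    return y

def grad_f(x, y):
    return (0.0, 1.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def lagrange_residual(point, lam):
    """Left-hand side of (3.9) with k = 1: grad f(v) - lam * grad g(v)."""
    return tuple(a - lam * b for a, b in zip(grad_f(*point), grad_g(*point)))

# Candidates from 0 = 2*lam*x, 1 = 2*lam*y and the constraint x**2 + y**2 = 1.
candidates = [((0.0, 1.0), 0.5), ((0.0, -1.0), -0.5)]

for point, lam in candidates:
    print(point, "lambda =", lam, "residual =", lagrange_residual(point, lam), "f =", f(*point))

# Both candidates satisfy (3.9); only comparing f (using f >= -1 on the circle)
# identifies the minimum, while (0, 1) is in fact the maximum.
best_point, best_lam = min(candidates, key=lambda c: f(*c[0]))
print("minimum at", best_point)
```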

Exercise 3.4 Show that for
$$f(x, y, z) = z, \qquad g(x, y, z) = x^2 - y^2 + z^2 - 1,$$
the method of Lagrange multipliers does not establish a solution of (3.8).

Exercise 3.5 Show that for
$$f(x, y, z) = x + y + z, \qquad g(x, y, z) = x^2 + y^2 + z^2 - 1,$$
the system of equations (3.9) has two solutions, of which only one is the solution of (3.8).

An analogous result to Theorem 3.3 can be formulated for a problem in which we seek a maximum instead of a minimum. The method also works for local minima and maxima. The resulting necessary condition in these cases remains the same as (3.9).

Exercise 3.6 Find the maximal volume of a rectangular box, whose edges are parallel to the axes, that fits entirely inside the ellipsoid
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$

In special cases the necessary condition (3.9) turns out to be sufficient for $\mathbf{v}^*$ to be a solution of the problem. Before stating this result we need to review some further concepts.
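For a function $f : \mathbb{R}^n \to \mathbb{R}$, we call the $n \times n$ matrix
$$H(f, \mathbf{v}) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1 \partial x_1}(\mathbf{v}) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(\mathbf{v}) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(\mathbf{v}) \\
\frac{\partial^2 f}{\partial x_2 \partial x_1}(\mathbf{v}) & \frac{\partial^2 f}{\partial x_2 \partial x_2}(\mathbf{v}) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(\mathbf{v}) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1}(\mathbf{v}) & \frac{\partial^2 f}{\partial x_n \partial x_2}(\mathbf{v}) & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n}(\mathbf{v})
\end{bmatrix}$$
the Hessian matrix of $f$ at $\mathbf{v}$. A function is said to be twice continuously differentiable if all the entries in its Hessian matrix are continuous functions with respect to $\mathbf{v}$.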

Theorem 3.4
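Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable, and that for any $\mathbf{v} \in \mathbb{R}^n$ the Hessian $H(f, \mathbf{v})$ is a positive semidefinite matrix, meaning that
$$\mathbf{w}^T H(f, \mathbf{v}) \mathbf{w} \ge 0 \tag{3.10}$$
for any $\mathbf{w} \in \mathbb{R}^n$. Assume also that
$$\mathbf{g}(\mathbf{v}) = A\mathbf{v} - \mathbf{c},$$
where $A$ is a $k \times n$ matrix and $\mathbf{c} \in \mathbb{R}^k$.
If we can find a sequence of numbers $\lambda_1, \dots, \lambda_k \in \mathbb{R}$ and a point $\mathbf{v}^* \in \mathbb{R}^n$ satisfying the constraint $\mathbf{g}(\mathbf{v}^*) = \mathbf{0}$ such that (3.9) is satisfied, then $\mathbf{v}^*$ is a solution of the problem (3.8).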

Proof See page 46. 
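A common special case of Theorem 3.4 is a quadratic form $f(\mathbf{v}) = \mathbf{v}^T Q \mathbf{v}$ with $Q$ symmetric positive semidefinite, so that $H(f, \mathbf{v}) = 2Q$ for every $\mathbf{v}$. Then (3.9) together with the constraint $A\mathbf{v} = \mathbf{c}$ is a linear system in $(\mathbf{v}, \boldsymbol{\lambda})$: $2Q\mathbf{v} - A^T\boldsymbol{\lambda} = \mathbf{0}$ and $A\mathbf{v} = \mathbf{c}$. The following is a minimal pure-Python sketch of this case; the function name `minimise_quadratic` and the test problem (minimise $x^2 + y^2 + z^2$ subject to $x + y + z = 1$) are our own illustrative choices, and we assume the bordered matrix of the system is invertible. By Theorem 3.4, the point it returns is then a global minimum.

```python
def solve(A, b):
    """Solve A x = b (A square, invertible) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def minimise_quadratic(Q, A, c):
    """Sketch: minimise v^T Q v subject to A v = c using (3.9) and Theorem 3.4.

    Assumes Q is symmetric positive semidefinite and that the bordered matrix
    [[2Q, -A^T], [A, 0]] is invertible. Returns (v, lam).
    """
    n, k = len(Q), len(A)
    rows = []
    for i in range(n):  # i-th row of 2 Q v - A^T lam = 0
        rows.append([2.0 * Q[i][j] for j in range(n)] + [-float(A[r][i]) for r in range(k)])
    for r in range(k):  # r-th row of the constraint A v = c
        rows.append([float(A[r][j]) for j in range(n)] + [0.0] * k)
    sol = solve(rows, [0.0] * n + [float(ci) for ci in c])
    return sol[:n], sol[n:]


Q = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 1, 1]]
c = [1]
v, lam = minimise_quadratic(Q, A, c)
print(v, lam)  # approximately [1/3, 1/3, 1/3] and [2/3]
```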

Exercise 3.7 Show that if the inequality in (3.10) is reversed, then condition (3.9) implies that $\mathbf{v}^*$ is a solution of the following constrained maximisation problem:
$$\max f(\mathbf{v}) \quad \text{under the constraints } \mathbf{g}(\mathbf{v}) = \mathbf{0}.$$

3.3 Proofs
Our proof of Theorem 3.3 depends on the implicit function theorem, which
is a classical result in analysis. We state this theorem without proof,¹ after
introducing some notation.
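For
$$\mathbf{g} = (g_1, \dots, g_k) : \mathbb{R}^l \times \mathbb{R}^m \to \mathbb{R}^k$$
and $(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^l \times \mathbb{R}^m$, $\mathbf{x} = (x_1, \dots, x_l)$ and $\mathbf{y} = (y_1, \dots, y_m)$, we write $\frac{\partial \mathbf{g}}{\partial \mathbf{x}}$ and $\frac{\partial \mathbf{g}}{\partial \mathbf{y}}$ for the $k \times l$ (resp. $k \times m$) matrices
$$\frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix}
\frac{\partial g_1}{\partial x_1}(\mathbf{x}, \mathbf{y}) & \frac{\partial g_1}{\partial x_2}(\mathbf{x}, \mathbf{y}) & \cdots & \frac{\partial g_1}{\partial x_l}(\mathbf{x}, \mathbf{y}) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_k}{\partial x_1}(\mathbf{x}, \mathbf{y}) & \frac{\partial g_k}{\partial x_2}(\mathbf{x}, \mathbf{y}) & \cdots & \frac{\partial g_k}{\partial x_l}(\mathbf{x}, \mathbf{y})
\end{bmatrix},$$
$$\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}, \mathbf{y}) = \begin{bmatrix}
\frac{\partial g_1}{\partial y_1}(\mathbf{x}, \mathbf{y}) & \frac{\partial g_1}{\partial y_2}(\mathbf{x}, \mathbf{y}) & \cdots & \frac{\partial g_1}{\partial y_m}(\mathbf{x}, \mathbf{y}) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_k}{\partial y_1}(\mathbf{x}, \mathbf{y}) & \frac{\partial g_k}{\partial y_2}(\mathbf{x}, \mathbf{y}) & \cdots & \frac{\partial g_k}{\partial y_m}(\mathbf{x}, \mathbf{y})
\end{bmatrix}.$$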

Theorem 3.5 (Implicit function theorem)


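Consider $n > k$ and a continuously differentiable function
$$\mathbf{g} = (g_1, \dots, g_k) : \mathbb{R}^{n-k} \times \mathbb{R}^k \to \mathbb{R}^k.$$
Assume that at a point $(\mathbf{x}^*, \mathbf{y}^*) \in \mathbb{R}^{n-k} \times \mathbb{R}^k$ we have
$$\mathbf{g}(\mathbf{x}^*, \mathbf{y}^*) = \mathbf{0},$$
and that the matrix $\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}^*, \mathbf{y}^*)$ is invertible. Then there exists a neighbourhood $U \times V \subset \mathbb{R}^{n-k} \times \mathbb{R}^k$ of $(\mathbf{x}^*, \mathbf{y}^*)$ and a continuously differentiable function
$$\mathbf{h} : U \to V,$$
such that
$$\mathbf{g}(\mathbf{x}, \mathbf{h}(\mathbf{x})) = \mathbf{0} \quad \text{for all } \mathbf{x} \in U.$$
Moreover, for any $\mathbf{v} \in U \times V$, if $\mathbf{g}(\mathbf{v}) = \mathbf{0}$ then $\mathbf{v} = (\mathbf{x}, \mathbf{h}(\mathbf{x}))$ for some $\mathbf{x} \in U$.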
Corollary 3.6
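For the function $\mathbf{h}$ from Theorem 3.5,
$$\mathbf{h}'(\mathbf{x}) = -\left(\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}, \mathbf{h}(\mathbf{x}))\right)^{-1} \frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{x}, \mathbf{h}(\mathbf{x})).$$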
¹ For proofs of the standard multi-variable calculus results used below, see, e.g., T. M. Apostol, Mathematical Analysis, 2nd edition, Addison-Wesley, 1974.
3.3 Proofs 45
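Proof Since $\mathbf{g}(\mathbf{x}, \mathbf{h}(\mathbf{x})) = \mathbf{0}$, by computing the derivative with respect to $\mathbf{x}$ we obtain from the chain rule that
$$\frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{x}, \mathbf{h}(\mathbf{x})) + \frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}, \mathbf{h}(\mathbf{x}))\, \mathbf{h}'(\mathbf{x}) = \mathbf{0}.$$
In this identity, $\frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{x}, \mathbf{h}(\mathbf{x}))$, $\mathbf{h}'(\mathbf{x})$ and $\mathbf{0}$ denote $k \times (n-k)$ matrices, while $\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{x}, \mathbf{h}(\mathbf{x}))$ is a $k \times k$ matrix. This matrix is invertible at $\mathbf{x}^*$ by assumption and hence, by continuity of the determinant, for all $\mathbf{x}$ in $U$ once the neighbourhood is taken small enough. The claim now follows by multiplying on the left by its inverse and rearranging so that $\mathbf{h}'(\mathbf{x})$ is on the left-hand side. 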
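Corollary 3.6 can be sanity-checked on a hypothetical example (not from the text) with $n = 2$ and $k = 1$: $g(x, y) = x^2 + y^2 - 1$ near a point with $y > 0$, where the implicit function is known explicitly, $h(x) = \sqrt{1 - x^2}$. Since $\partial g/\partial x = 2x$ and $\partial g/\partial y = 2y$, the corollary predicts $h'(x) = -x/h(x)$. The sketch compares this with a central finite-difference approximation.

```python
import math

# Hypothetical example: g(x, y) = x**2 + y**2 - 1, solved for y > 0 by h(x) = sqrt(1 - x**2).

def g(x, y):
    return x * x + y * y - 1.0

def h(x):
    return math.sqrt(1.0 - x * x)

def h_prime_corollary(x):
    """Corollary 3.6 with k = 1: h'(x) = -(dg/dy)^(-1) * dg/dx, evaluated at (x, h(x))."""
    dg_dx = 2.0 * x
    dg_dy = 2.0 * h(x)  # non-zero (hence invertible) because h(x) > 0 for |x| < 1
    return -dg_dx / dg_dy

def h_prime_numeric(x, eps=1e-6):
    """Central finite-difference approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2.0 * eps)

points = (-0.5, 0.0, 0.3, 0.8)
for x in points:
    print(x, h_prime_corollary(x), h_prime_numeric(x))
```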
We are now ready to prove Theorem 3.3.
Theorem 3.3
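If $\mathbf{v}^*$ is a solution of the problem (3.8), and $\mathbf{g}'(\mathbf{v}^*)$ is a matrix of rank $k$, then there exists a sequence of numbers $\lambda_1, \dots, \lambda_k \in \mathbb{R}$ such that
$$\nabla f(\mathbf{v}^*) - \big(\lambda_1 \nabla g_1(\mathbf{v}^*) + \cdots + \lambda_k \nabla g_k(\mathbf{v}^*)\big) = \mathbf{0}. \tag{3.9}$$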
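Proof Since $\mathbf{g}'(\mathbf{v}^*)$ has rank $k$, it has $k$ linearly independent columns. Renumbering the coordinates if necessary, we can write $\mathbf{v} = (\mathbf{x}, \mathbf{y})$ with $\mathbf{x} \in \mathbb{R}^{n-k}$ and $\mathbf{y} \in \mathbb{R}^k$, where $\mathbf{y}$ collects the coordinates corresponding to these columns, so that $\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{v}^*)$ is invertible.
By the implicit function theorem, we know that there exists a function $\mathbf{h}$ such that
$$\mathbf{g}(\mathbf{x}, \mathbf{h}(\mathbf{x})) = \mathbf{0}$$
for $\mathbf{x}$ near $\mathbf{x}^*$, with $\mathbf{h}(\mathbf{x}^*) = \mathbf{y}^*$. Since $\mathbf{v}^* = (\mathbf{x}^*, \mathbf{y}^*)$ is a solution of problem (3.8), $\mathbf{x}^*$ is a minimum of $f(\mathbf{x}, \mathbf{h}(\mathbf{x}))$, meaning that the derivative of $f(\mathbf{x}, \mathbf{h}(\mathbf{x}))$ with respect to $\mathbf{x}$ is zero at $\mathbf{x}^*$. Applying the chain rule and Corollary 3.6, this gives
$$\begin{aligned}
\mathbf{0} &= \frac{\partial f}{\partial \mathbf{x}}(\mathbf{v}^*) + \frac{\partial f}{\partial \mathbf{y}}(\mathbf{v}^*)\, \mathbf{h}'(\mathbf{x}^*) \\
&= \frac{\partial f}{\partial \mathbf{x}}(\mathbf{v}^*) - \frac{\partial f}{\partial \mathbf{y}}(\mathbf{v}^*) \left(\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{v}^*)\right)^{-1} \frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{v}^*).
\end{aligned} \tag{3.11}$$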
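We define a $1 \times k$ matrix $\Lambda$ by
$$\Lambda = \begin{bmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_k \end{bmatrix} = \frac{\partial f}{\partial \mathbf{y}}(\mathbf{v}^*) \left(\frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{v}^*)\right)^{-1}.$$
From (3.11) it follows that
$$\frac{\partial f}{\partial \mathbf{x}}(\mathbf{v}^*) = \Lambda \frac{\partial \mathbf{g}}{\partial \mathbf{x}}(\mathbf{v}^*). \tag{3.12}$$
From the definition of $\Lambda$,
$$\frac{\partial f}{\partial \mathbf{y}}(\mathbf{v}^*) = \Lambda \frac{\partial \mathbf{g}}{\partial \mathbf{y}}(\mathbf{v}^*). \tag{3.13}$$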
Conditions (3.12) and (3.13) combined give $f'(\mathbf{v}^*) = \Lambda\, \mathbf{g}'(\mathbf{v}^*)$; transposing this identity gives (3.9). 
The proof of Theorem 3.4 is based on a particular case of Taylor’s theo-
rem, which we state (without proof) in the following form:
Theorem 3.7 (Taylor formula)
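Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function. Then for any $\mathbf{v}, \mathbf{w} \in \mathbb{R}^n$ there exists a point $\boldsymbol{\xi}$ contained in the line segment joining $\mathbf{v}$ and $\mathbf{v} + \mathbf{w}$,
$$\boldsymbol{\xi} \in \{\mathbf{v} + \alpha\mathbf{w} : \alpha \in [0, 1]\},$$
such that
$$f(\mathbf{v} + \mathbf{w}) = f(\mathbf{v}) + \nabla f(\mathbf{v}) \cdot \mathbf{w} + \frac{1}{2}\mathbf{w}^T H(f, \boldsymbol{\xi})\mathbf{w},$$
where the dot stands for the scalar product.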
We are now ready to prove Theorem 3.4.
Theorem 3.4
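Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable, and that for any $\mathbf{v} \in \mathbb{R}^n$ the Hessian $H(f, \mathbf{v})$ is a positive semidefinite matrix, meaning that
$$\mathbf{w}^T H(f, \mathbf{v}) \mathbf{w} \ge 0 \tag{3.10}$$
for any $\mathbf{w} \in \mathbb{R}^n$. Assume also that
$$\mathbf{g}(\mathbf{v}) = A\mathbf{v} - \mathbf{c},$$
where $A$ is a $k \times n$ matrix and $\mathbf{c} \in \mathbb{R}^k$.
If we can find a sequence of numbers $\lambda_1, \dots, \lambda_k \in \mathbb{R}$ and a point $\mathbf{v}^* \in \mathbb{R}^n$ satisfying the constraint $\mathbf{g}(\mathbf{v}^*) = \mathbf{0}$ such that (3.9) is satisfied, then $\mathbf{v}^*$ is a solution of the problem (3.8).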
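Proof Let us take any $\mathbf{v}$ satisfying $\mathbf{g}(\mathbf{v}) = \mathbf{0}$. We need to show that
$$f(\mathbf{v}) \ge f(\mathbf{v}^*).$$
Since $\mathbf{g}(\mathbf{v}) = A\mathbf{v} - \mathbf{c}$, the gradient $\nabla g_i(\mathbf{v}^*)$ is the transpose of the $i$-th row of $A$. Hence, using the notation $\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_k)$, we can write
$$\lambda_1 \nabla g_1(\mathbf{v}^*) + \cdots + \lambda_k \nabla g_k(\mathbf{v}^*) = A^T\boldsymbol{\lambda}. \tag{3.14}$$
Let $\mathbf{w} = \mathbf{v} - \mathbf{v}^*$. Since $\mathbf{g}(\mathbf{v}) = \mathbf{0}$ and $\mathbf{g}(\mathbf{v}^*) = \mathbf{0}$, we use the linearity of $A$ to obtain
$$\mathbf{0} = \mathbf{g}(\mathbf{v}) = \mathbf{g}(\mathbf{v}^* + \mathbf{w}) = A\mathbf{v}^* + A\mathbf{w} - \mathbf{c} = \mathbf{g}(\mathbf{v}^*) + A\mathbf{w} = A\mathbf{w}. \tag{3.15}$$
By the Taylor formula recalled in Theorem 3.7,
$$f(\mathbf{v}^* + \mathbf{w}) = f(\mathbf{v}^*) + \nabla f(\mathbf{v}^*) \cdot \mathbf{w} + \frac{1}{2}\mathbf{w}^T H(f, \boldsymbol{\xi})\mathbf{w}, \tag{3.16}$$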
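for some point $\boldsymbol{\xi}$ on the line segment in $\mathbb{R}^n$ between $\mathbf{v}^*$ and $\mathbf{v}^* + \mathbf{w}$.
We can now compute
$$\begin{aligned}
f(\mathbf{v}) &= f(\mathbf{v}^* + \mathbf{w}) \\
&= f(\mathbf{v}^*) + \nabla f(\mathbf{v}^*) \cdot \mathbf{w} + \tfrac{1}{2}\mathbf{w}^T H(f, \boldsymbol{\xi})\mathbf{w} && \text{(from (3.16))} \\
&= f(\mathbf{v}^*) + A^T\boldsymbol{\lambda} \cdot \mathbf{w} + \tfrac{1}{2}\mathbf{w}^T H(f, \boldsymbol{\xi})\mathbf{w} && \text{(from (3.9) and (3.14))} \\
&= f(\mathbf{v}^*) + \big(A^T\boldsymbol{\lambda}\big)^T\mathbf{w} + \tfrac{1}{2}\mathbf{w}^T H(f, \boldsymbol{\xi})\mathbf{w} \\
&= f(\mathbf{v}^*) + \boldsymbol{\lambda}^T A\mathbf{w} + \tfrac{1}{2}\mathbf{w}^T H(f, \boldsymbol{\xi})\mathbf{w} \\
&= f(\mathbf{v}^*) + \tfrac{1}{2}\mathbf{w}^T H(f, \boldsymbol{\xi})\mathbf{w} && \text{(from (3.15))} \\
&\ge f(\mathbf{v}^*). && \text{(from (3.10))}
\end{aligned}$$
We have proved that $\mathbf{v}^*$ is a (non-strict) global minimum point, as required.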

4
Portfolios of multiple assets

4.1 Risk and return


4.2 Three risky securities
4.3 Minimum variance portfolio
4.4 Minimum variance line
4.5 Market portfolio

Having developed the required mathematical tools, we can cast the tasks of finding the minimum variance portfolio, the minimum variance line and the market portfolio for portfolios of n risky assets as constrained optimisation problems, whose solutions are provided by the method of Lagrange multipliers. Using simple linear algebra, the formulae for the minimum variance and market portfolios and the capital market line can be shown to mirror those found for portfolios of two assets. The derivations of these formulae will be preceded by an examination of portfolios of three assets in order to provide geometric intuition.

4.1 Risk and return


A portfolio constructed from $n$ different securities can be described by means of the vector of weights
$$\mathbf{w} = (w_1, \dots, w_n),$$
with the constraint $\sum_{j=1}^n w_j = 1$. Denoting by $\mathbf{1}$ the $n$-dimensional vector
$$\mathbf{1} = (1, \dots, 1),$$
the constraint can conveniently be written as
$$\mathbf{w}^T\mathbf{1} = 1. \tag{4.1}$$

4.1 Risk and return 49
The attainable set is the set of all weight vectors $\mathbf{w}$ that satisfy this constraint.
If short-selling is not possible, the condition $w_j \ge 0$ is added to the constraint, so in that case the attainable set becomes
$$\{\mathbf{w} : \mathbf{w}^T\mathbf{1} = 1,\ w_j \ge 0 \text{ for all } j \le n\}.$$
Unless stated otherwise, we shall assume availability of short sales.
Alternatively, a portfolio is described by the vector of positions taken in particular components (numbers of units of assets)
$$\mathbf{x} = (x_1, \dots, x_n).$$
We have the following relations between the weights, prices and the numbers of shares:
$$w_j = \frac{x_j S_j(0)}{V(0)}, \qquad j = 1, \dots, n,$$
where $x_j$ is the number of shares of security $j$ in the portfolio, $S_j(0)$ is the initial price of security $j$, and $V(0)$ is the total money invested.
Denote the random returns on the securities by $K_1, \dots, K_n$, and the vector of expected returns by
$$\boldsymbol{\mu} = (\mu_1, \dots, \mu_n),$$
with
$$\mu_j = E(K_j), \qquad j = 1, \dots, n.$$
The covariances between returns will be denoted by $\sigma_{jk} = \mathrm{Cov}(K_j, K_k)$, in particular $\sigma_{jj} = \sigma_j^2 = \mathrm{Var}(K_j)$. These are the entries of the $n \times n$ covariance matrix
$$C = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}.$$

Exercise 4.1 Assume that $C$ is invertible. Show that $C^{-1}$ is symmetric.

We write as before
$$K_w = \sum_{j=1}^n w_j K_j.$$
50 Portfolios of multiple assets
Theorem 2.4 can easily be generalised.
Theorem 4.1
The expected return $\mu_w = E(K_w)$ and variance $\sigma_w^2 = \mathrm{Var}(K_w)$ of a portfolio with weights $\mathbf{w}$ are given by
$$\mu_w = \mathbf{w}^T\boldsymbol{\mu}, \qquad \sigma_w^2 = \mathbf{w}^T C\mathbf{w}.$$
Proof The formula for $\mu_w$ follows from the linearity of mathematical expectation:
$$\mu_w = E(K_w) = E\Big(\sum_{j=1}^n w_j K_j\Big) = \sum_{j=1}^n w_j E(K_j) = \sum_{j=1}^n w_j\mu_j = \mathbf{w}^T\boldsymbol{\mu}.$$
For $\sigma_w^2$ we use the bilinearity of covariance:
$$\begin{aligned}
\sigma_w^2 &= \mathrm{Var}(K_w) = \mathrm{Cov}(K_w, K_w) = \mathrm{Cov}\Big(\sum_{j=1}^n w_j K_j, \sum_{k=1}^n w_k K_k\Big) \\
&= \sum_{j,k=1}^n w_j w_k\sigma_{jk} \qquad (\text{since } \mathrm{Cov}(K_j, K_k) = \sigma_{jk}) \\
&= \mathbf{w}^T C\mathbf{w}.
\end{aligned}$$
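Theorem 4.1 can be checked numerically on a toy model in which the returns take finitely many equally likely values. The scenario table below is hypothetical, and the covariances are computed with the population normalisation (dividing by the number of scenarios), so that $\mu_w = \mathbf{w}^T\boldsymbol{\mu}$ and $\sigma_w^2 = \mathbf{w}^T C\mathbf{w}$ should hold up to rounding when $\mu_w$ and $\sigma_w^2$ are computed directly from the portfolio returns.

```python
# Hypothetical returns of n = 3 securities in 4 equally likely scenarios (not from the text).
scenarios = [
    (0.10, 0.05, 0.20),
    (0.02, 0.15, -0.05),
    (-0.04, 0.10, 0.30),
    (0.12, -0.02, 0.05),
]
n = 3
p = 1.0 / len(scenarios)  # probability of each scenario

def expectation(values):
    return sum(p * v for v in values)

def covariance(xs, ys):
    mx, my = expectation(xs), expectation(ys)
    return sum(p * (x - mx) * (y - my) for x, y in zip(xs, ys))

K = [[s[j] for s in scenarios] for j in range(n)]  # K[j] lists the values of K_{j+1}
mu = [expectation(K[j]) for j in range(n)]
C = [[covariance(K[j], K[k]) for k in range(n)] for j in range(n)]

w = [0.5, 0.3, 0.2]  # weights summing to 1
K_w = [sum(w[j] * s[j] for j in range(n)) for s in scenarios]  # portfolio return per scenario

mu_w_direct, var_w_direct = expectation(K_w), covariance(K_w, K_w)
mu_w_formula = sum(w[j] * mu[j] for j in range(n))                              # w^T mu
var_w_formula = sum(w[j] * w[k] * C[j][k] for j in range(n) for k in range(n))  # w^T C w

print(mu_w_direct, mu_w_formula)
print(var_w_direct, var_w_formula)
```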


Exercise 4.2 Show that the covariance matrix is symmetric and positive semidefinite. (Recall that $C$ is positive semidefinite if for any $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x}^T C\mathbf{x} \ge 0$.) Does $C$ have to be invertible?

Exercise 4.3 Show that any invertible covariance matrix $C$ is positive definite. (We say that $C$ is positive definite if for any $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x} \ne \mathbf{0}$, $\mathbf{x}^T C\mathbf{x} > 0$.)
4.2 Three risky securities 51

Figure 4.1 The plots of µw and σw with respect to w1, w2.

Exercise 4.4 Investigate the limit behaviour of the sequence $\sigma_w$ as $n \to \infty$, taking $w_j = \frac{1}{n}$. Formulate sufficient conditions for $\sigma_w$ to be convergent.

Proposition 4.2
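For any two portfolios
$$\mathbf{w}_A = (w_{A,1}, \dots, w_{A,n}), \qquad \mathbf{w}_B = (w_{B,1}, \dots, w_{B,n}),$$
the covariance between the returns is
$$\mathrm{Cov}(K_{w_A}, K_{w_B}) = \mathbf{w}_A^T C\mathbf{w}_B.$$
Proof Using the bilinearity of covariance we compute
$$\begin{aligned}
\mathrm{Cov}(K_{w_A}, K_{w_B}) &= \mathrm{Cov}\Big(\sum_{j=1}^n w_{A,j}K_j, \sum_{k=1}^n w_{B,k}K_k\Big) \\
&= \sum_{j,k=1}^n w_{A,j}w_{B,k}\sigma_{jk} \qquad (\text{since } \mathrm{Cov}(K_j, K_k) = \sigma_{jk}) \\
&= \mathbf{w}_A^T C\mathbf{w}_B,
\end{aligned}$$
as required. 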
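Proposition 4.2 can be checked in the same hypothetical finite-scenario model as above: we compute $\mathrm{Cov}(K_{w_A}, K_{w_B})$ directly from the two portfolio returns and compare it with $\mathbf{w}_A^T C\mathbf{w}_B$. The scenario table and the two weight vectors (one with a short position) are illustrative assumptions.

```python
# Hypothetical returns of n = 3 securities in 4 equally likely scenarios (not from the text).
scenarios = [
    (0.10, 0.05, 0.20),
    (0.02, 0.15, -0.05),
    (-0.04, 0.10, 0.30),
    (0.12, -0.02, 0.05),
]
n = 3
p = 1.0 / len(scenarios)

def expectation(values):
    return sum(p * v for v in values)

def covariance(xs, ys):
    mx, my = expectation(xs), expectation(ys)
    return sum(p * (x - mx) * (y - my) for x, y in zip(xs, ys))

K = [[s[j] for s in scenarios] for j in range(n)]
C = [[covariance(K[j], K[k]) for k in range(n)] for j in range(n)]

w_A = [0.5, 0.3, 0.2]
w_B = [1.2, -0.5, 0.3]  # short position in the second security
K_A = [sum(w_A[j] * s[j] for j in range(n)) for s in scenarios]
K_B = [sum(w_B[j] * s[j] for j in range(n)) for s in scenarios]

cov_direct = covariance(K_A, K_B)
cov_formula = sum(w_A[j] * w_B[k] * C[j][k] for j in range(n) for k in range(n))  # w_A^T C w_B
print(cov_direct, cov_formula)
```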

Figure 4.2 The lines µw = m (left) and the curves σw = c (right).

4.2 Three risky securities


The purpose of this section is to provide geometric intuition as to the shape
of the attainable set.
In the case when we have three risky assets, the third weight of a portfolio can be computed from the first two weights,
$$w_3 = 1 - w_1 - w_2,$$
meaning that the attainable set is parameterised by $w_1$ and $w_2$. We can write the formulae for $\mu_w$ and $\sigma_w$ with respect to these two parameters as
$$\mu_w = w_1\mu_1 + w_2\mu_2 + w_3\mu_3 = w_1\mu_1 + w_2\mu_2 + (1 - w_1 - w_2)\mu_3,$$
and
$$\begin{aligned}
\sigma_w^2 &= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + w_3^2\sigma_3^2 + 2w_1w_2\sigma_{12} + 2w_1w_3\sigma_{13} + 2w_2w_3\sigma_{23} \\
&= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + (1 - w_1 - w_2)^2\sigma_3^2 + 2w_1w_2\sigma_{12} \\
&\quad + 2w_1(1 - w_1 - w_2)\sigma_{13} + 2w_2(1 - w_1 - w_2)\sigma_{23}.
\end{aligned}$$
The plots of µw and σw are given in Figure 4.1. The lines on the graphs
represent the level sets {µw = m} and {σw = c} for several values of m and
c.
Since the third weight can be computed from the first two, the attainable
set is represented as the (w1 , w2 )-plane in Figure 4.2. The vertices of the
grey triangle represent investments in single assets. The point (1, 0) repre-
sents the first asset, (0, 1) the second asset, and since w3 = 1 − w1 − w2 ,
the point (0, 0) represents the third asset. The grey triangle consists of the
points
$$\{(w_1, w_2) \mid w_1, w_2 \ge 0,\ w_1 + w_2 \le 1\}, \tag{4.2}$$

Figure 4.3 The plot of σw together with µw = m.

and contains portfolios attainable without short-selling.


The level sets $\{\mu_w = m\}$ and $\{\sigma_w = c\}$ from Figure 4.1 can be projected onto the $(w_1, w_2)$-plane in Figure 4.2. These are the straight lines and ellipses in Figure 4.2, respectively. The common centre of the ellipses is the minimum variance portfolio. In this particular figure, since this point lies outside the triangle, we see that the minimum variance portfolio requires short-selling. In Figure 4.2 we also see that if short-selling is not allowed, then the smallest attainable $\sigma_w$ lies on the ellipse which is tangent to the grey triangle. The minimum variance portfolio without short-selling is the tangency point.
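The comparison in Figure 4.2 can be imitated numerically by evaluating $\sigma_w^2$ on a grid in the $(w_1, w_2)$-plane, with $w_3 = 1 - w_1 - w_2$, and comparing the smallest value over the whole grid with the smallest value over the grey triangle (4.2). The covariance matrix below is hypothetical, chosen so that assets 1 and 2 are strongly correlated; a grid search only approximates the minimisers, which Section 4.3 obtains exactly.

```python
# Hypothetical covariance matrix: assets 1 and 2 strongly correlated, asset 3 independent.
C = [[0.04, 0.054, 0.0],
     [0.054, 0.09, 0.0],
     [0.0, 0.0, 0.16]]

def variance(w1, w2):
    w = (w1, w2, 1.0 - w1 - w2)  # third weight from the constraint w^T 1 = 1
    return sum(w[j] * w[k] * C[j][k] for j in range(3) for k in range(3))

step = 0.02
grid = [round(-1.0 + i * step, 10) for i in range(151)]  # values from -1.0 to 2.0

# Tuples (variance, w1, w2); min compares the variance first.
best_all = min((variance(a, b), a, b) for a in grid for b in grid)
best_triangle = min((variance(a, b), a, b) for a in grid for b in grid
                    if a >= 0 and b >= 0 and a + b <= 1 + 1e-12)

print("smallest variance on the grid:", best_all)
print("smallest variance without short-selling:", best_triangle)
```

With these numbers the unconstrained grid minimum has a negative second weight, so (as in Figure 4.2) it lies outside the triangle and requires short-selling.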
We now discuss the shape that the set of attainable portfolios takes in
the (σ, µ)-plane. We start with Figure 4.3, where we see the plane corre-
sponding to portfolios with µw = m, together with the plot of σw . We see
that there is a single point that has smallest attainable variance under the constraint $\mu_w = m$. This is the lowest point of the curve along which the plane intersects the graph of $\sigma_w$; this curve is a branch of a hyperbola. From the plot we also see that for $\mu_w = m$ we can have portfolios with arbitrarily large $\sigma$. This leads to the conclusion that in the $(\sigma, \mu)$-plane, the set of portfolios with $\mu_w = m$ is a horizontal half line, which is depicted in Figure 4.4. Intuitively one can think of Figure 4.4 as the leftmost graph from Figure 4.3, rotated clockwise by ninety degrees, and projected onto the plane. Since the sections of the graph of $\sigma_w$ by such planes are hyperbolas, one is led to believe that the boundary of the attainable set on the $(\sigma, \mu)$-plane should also be a hyperbola. This is just a geometric intuition, and is by no means meant as a proof. We shall prove this fact later on.
When short-selling is not allowed, the attainable set is restricted to the
set from (4.2). In that case, in the (σ, µ)-plane the attainable set takes the
shape depicted in Figure 4.5. The three points represent the three assets. A
hyperbola passing through any two points represents portfolios involving

Figure 4.4 Attainable portfolios.

investments in the two securities corresponding to the points. The fragments of the hyperbolas between two points correspond to the edges of the triangle from Figure 4.2. The attainable set in Figure 4.5 can therefore be interpreted as a distorted and folded projection of the triangle from Figure 4.2.

4.3 Minimum variance portfolio


In this section we give the formula for the weights of the portfolio with
smallest variance. Before doing so, we need to consider a technical lemma.

Lemma 4.3
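We have the following formulae for the gradients computed with respect to $\mathbf{w}$:
$$\nabla\big(\mathbf{w}^T\boldsymbol{\mu}\big) = \boldsymbol{\mu}, \tag{4.3}$$
$$\nabla\big(\mathbf{w}^T\mathbf{1}\big) = \mathbf{1}, \tag{4.4}$$
$$\nabla\big(\mathbf{w}^T C\mathbf{w}\big) = 2C\mathbf{w}, \tag{4.5}$$
and the Hessian of $\mathbf{w}^T C\mathbf{w}$ is equal to $2C$.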

Proof Since
$$\frac{\partial}{\partial w_i}\big(\mathbf{w}^T\boldsymbol{\mu}\big) = \frac{\partial}{\partial w_i}(w_1\mu_1 + \cdots + w_n\mu_n) = \mu_i$$
4.3 Minimum variance portfolio 55

Figure 4.5 Attainable portfolios with short-selling constraints.

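we see that
$$\nabla\big(\mathbf{w}^T\boldsymbol{\mu}\big) = \begin{bmatrix} \frac{\partial}{\partial w_1}\big(\mathbf{w}^T\boldsymbol{\mu}\big) \\ \vdots \\ \frac{\partial}{\partial w_n}\big(\mathbf{w}^T\boldsymbol{\mu}\big) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \boldsymbol{\mu},$$
which proves (4.3).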
The proof of (4.4) follows from an identical argument, using 1 instead
of µ.
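To prove (4.5) we observe that in
$$\frac{\partial}{\partial w_i}\big(\mathbf{w}^T C\mathbf{w}\big) = \frac{\partial}{\partial w_i}\sum_{j=1}^n\sum_{k=1}^n w_j w_k\sigma_{jk}$$
the derivative of each term can be non-zero only when $j = i$ or $k = i$. This means that
$$\begin{aligned}
\frac{\partial}{\partial w_i}\sum_{j=1}^n\sum_{k=1}^n w_j w_k\sigma_{jk}
&= \frac{\partial}{\partial w_i}\Big(w_i w_i\sigma_{ii} + \sum_{k \ne i} w_i w_k\sigma_{ik} + \sum_{j \ne i} w_j w_i\sigma_{ji}\Big) \\
&= 2w_i\sigma_{ii} + \sum_{k \ne i} w_k\sigma_{ik} + \sum_{j \ne i} w_j\sigma_{ji} \\
&= 2\sum_{k=1}^n w_k\sigma_{ik} \qquad (\text{since } \sigma_{ji} = \sigma_{ij}) \qquad (4.6) \\
&= 2(C\mathbf{w})_i,
\end{aligned}$$
where $(C\mathbf{w})_i$ stands for the $i$-th coordinate of the vector $C\mathbf{w}$. Combining the partial derivatives on all coordinates gives (4.5).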
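Using (4.6) we can compute
$$\frac{\partial}{\partial w_l}\frac{\partial}{\partial w_i}\big(\mathbf{w}^T C\mathbf{w}\big) = \frac{\partial}{\partial w_l}\Big(2\sum_{k=1}^n w_k\sigma_{ik}\Big) = 2\sigma_{il} = 2\sigma_{li},$$
hence
$$\left(\frac{\partial^2}{\partial w_l\partial w_i}\big(\mathbf{w}^T C\mathbf{w}\big)\right)_{l,i \le n} = (2\sigma_{li})_{l,i \le n} = 2C,$$
which is the Hessian of $\mathbf{w}^T C\mathbf{w}$. 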
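The formulae of Lemma 4.3 for the quadratic form can be confirmed by finite differences. The sketch below uses a hypothetical symmetric matrix $C$ and an arbitrary weight vector; because $\mathbf{w}^T C\mathbf{w}$ is quadratic, the central difference reproduces $2C\mathbf{w}$ and the mixed second difference reproduces the entries of $2C$ up to rounding error.

```python
# Hypothetical symmetric covariance matrix and an arbitrary weight vector (not from the text).
C = [[0.04, 0.006, 0.002],
     [0.006, 0.09, 0.01],
     [0.002, 0.01, 0.16]]
w = [0.5, -0.2, 0.7]
n = len(w)

def quad(v):
    """The quadratic form v^T C v."""
    return sum(v[j] * v[k] * C[j][k] for j in range(n) for k in range(n))

def shifted(v, idx, h):
    """Copy of v with h added to each coordinate listed in idx (repeats allowed)."""
    u = list(v)
    for j in idx:
        u[j] += h
    return u

def numerical_gradient(v, eps=1e-6):
    """Central differences of quad in each coordinate direction."""
    return [(quad(shifted(v, [i], eps)) - quad(shifted(v, [i], -eps))) / (2 * eps)
            for i in range(n)]

def hessian_entry(v, i, l, h=1e-3):
    """Mixed second difference approximating d^2 quad / (dw_l dw_i)."""
    return (quad(shifted(v, [i, l], h)) - quad(shifted(v, [i], h))
            - quad(shifted(v, [l], h)) + quad(v)) / h ** 2

two_Cw = [2 * sum(C[i][k] * w[k] for k in range(n)) for i in range(n)]  # formula (4.5)
print(numerical_gradient(w))
print(two_Cw)
```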


We are ready to derive the formula for the weights of the minimum vari-
ance portfolio.
Theorem 4.4
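The portfolio with the smallest variance in the attainable set has weights
$$\mathbf{w}_{\min} = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}}. \tag{4.7}$$
Proof We need to find the minimum of $\mathbf{w}^T C\mathbf{w}$ subject to the constraint
$$\mathbf{w}^T\mathbf{1} = 1. \tag{4.8}$$
To this end we use the method of Lagrange multipliers, taking the Lagrangian
$$L(\mathbf{w}) = \nabla\big(\mathbf{w}^T C\mathbf{w}\big) - \nabla\big(\lambda(\mathbf{1}^T\mathbf{w} - 1)\big).$$
By (4.4) and (4.5) from Lemma 4.3, the condition $L(\mathbf{w}) = \mathbf{0}$ reads
$$2C\mathbf{w} - \lambda\mathbf{1} = \mathbf{0},$$
hence, since $C$ is invertible (as formula (4.7) presupposes),
$$\mathbf{w} = \frac{\lambda}{2}C^{-1}\mathbf{1}. \tag{4.9}$$
Substituting this into the constraint (4.8), we obtain
$$1 = \mathbf{w}^T\mathbf{1} = \mathbf{1}^T\mathbf{w} = \frac{\lambda}{2}\mathbf{1}^T C^{-1}\mathbf{1}.$$
Solving this for $\lambda$ and substituting the result into (4.9) gives (4.7). We have shown that (4.7) is the only candidate for a local extremum. From Lemma 4.3 we know that the Hessian of $\mathbf{w}^T C\mathbf{w}$ is $2C$, which is positive semidefinite. By Theorem 3.4 this means that $\mathbf{w}_{\min}$ is a global minimum. 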
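Formula (4.7) only requires solving one linear system, $C\mathbf{x} = \mathbf{1}$, and normalising. The sketch below does this in pure Python for a hypothetical covariance matrix (in practice one would typically use a linear algebra library such as NumPy), and then checks the global minimality guaranteed by Theorem 4.4 by moving along directions $\mathbf{d}$ with $\mathbf{1}^T\mathbf{d} = 0$, which keep the weights summing to one.

```python
def solve(A, b):
    """Solve A x = b (A square, invertible) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Hypothetical covariance matrix of three securities (symmetric, positive definite).
C = [[0.04, 0.006, 0.002],
     [0.006, 0.09, 0.01],
     [0.002, 0.01, 0.16]]
ones = [1.0, 1.0, 1.0]

x = solve(C, ones)                       # x = C^{-1} 1
w_min = [xi / dot(ones, x) for xi in x]  # formula (4.7)
var_min = dot(w_min, matvec(C, w_min))   # variance of the minimum variance portfolio

# Variances of nearby attainable portfolios w_min + t d with 1^T d = 0.
perturbed = []
for d in ([1, -1, 0], [0, 1, -1], [1, 1, -2]):
    for t in (-0.1, -0.01, 0.01, 0.1):
        w = [wi + t * di for wi, di in zip(w_min, d)]
        perturbed.append(dot(w, matvec(C, w)))

print("w_min =", w_min, "variance =", var_min)
```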
4.4 Minimum variance line 57
The minimum variance portfolio has the surprising property that its co-
variance with any other portfolio is constant. This property will prove use-
ful later on, when discussing the shape of the attainable set in the (σ, µ)-
plane.

Corollary 4.5
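For any portfolio $\mathbf{w}$
$$\mathrm{Cov}(K_w, K_{w_{\min}}) = \sigma_{w_{\min}}^2.$$
Proof By Proposition 4.2
$$\mathrm{Cov}(K_w, K_{w_{\min}}) = \mathbf{w}^T C\mathbf{w}_{\min} = \mathbf{w}^T C\frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}} = \frac{\mathbf{w}^T\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}} = \frac{1}{\mathbf{1}^T C^{-1}\mathbf{1}}. \tag{4.10}$$
The above holds for any portfolio $\mathbf{w}$, hence also in particular for $\mathbf{w} = \mathbf{w}_{\min}$, giving
$$\sigma_{w_{\min}}^2 = \mathrm{Var}(K_{w_{\min}}) = \mathrm{Cov}(K_{w_{\min}}, K_{w_{\min}}) = \frac{1}{\mathbf{1}^T C^{-1}\mathbf{1}}. \tag{4.11}$$
Combining (4.10) with (4.11) we obtain our claim. 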
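Corollary 4.5 is easy to observe numerically: for any weights summing to one, $\mathbf{w}^T C\mathbf{w}_{\min}$ equals $1/(\mathbf{1}^T C^{-1}\mathbf{1})$. The covariance matrix and the list of test portfolios below are hypothetical.

```python
def solve(A, b):
    """Solve A x = b (A square, invertible) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

C = [[0.04, 0.006, 0.002],   # hypothetical covariance matrix
     [0.006, 0.09, 0.01],
     [0.002, 0.01, 0.16]]
ones = [1.0, 1.0, 1.0]

x = solve(C, ones)
w_min = [xi / dot(ones, x) for xi in x]  # (4.7)
sigma2_min = 1.0 / dot(ones, x)          # (4.11)

# Arbitrary portfolios whose weights sum to 1 (short positions allowed).
portfolios = [[1.0, 0.0, 0.0], [0.2, 0.3, 0.5], [1.5, -0.8, 0.3], [-0.4, 0.4, 1.0]]
for w in portfolios:
    print(w, dot(w, matvec(C, w_min)), sigma2_min)  # Cov(K_w, K_wmin) via Proposition 4.2
```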

4.4 Minimum variance line


To find the efficient frontier, we have to recognise and eliminate the dominated portfolios. To this end we fix a level of expected return, denote it by $m$, and consider all portfolios with $\mu_w = m$. All of these except the one with the smallest variance are dominated. The family of such portfolios, parameterised by $m$, is called the minimum variance line (see Figure 4.6). More precisely, portfolios on the minimum variance line are solutions of the following problem:
$$\begin{aligned}
&\min\ \mathbf{w}^T C\mathbf{w}, \\
&\text{subject to: } \mathbf{w}^T\boldsymbol{\mu} = m, \\
&\phantom{\text{subject to: }} \mathbf{w}^T\mathbf{1} = 1.
\end{aligned} \tag{4.12}$$

Figure 4.6 Minimum variance line (MVL).

Theorem 4.6
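Let $M$ be a $2 \times 2$ matrix of the form
$$M = \begin{bmatrix} \boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu} & \boldsymbol{\mu}^T C^{-1}\mathbf{1} \\ \boldsymbol{\mu}^T C^{-1}\mathbf{1} & \mathbf{1}^T C^{-1}\mathbf{1} \end{bmatrix}.$$
If $C$ and $M$ are invertible, then the solution of problem (4.12) is given by
$$\mathbf{w} = \frac{1}{\det(M)}C^{-1}\big(\det(M_1)\,\boldsymbol{\mu} + \det(M_2)\,\mathbf{1}\big), \tag{4.13}$$
where
$$M_1 = \begin{bmatrix} m & \boldsymbol{\mu}^T C^{-1}\mathbf{1} \\ 1 & \mathbf{1}^T C^{-1}\mathbf{1} \end{bmatrix}, \qquad M_2 = \begin{bmatrix} \boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu} & m \\ \boldsymbol{\mu}^T C^{-1}\mathbf{1} & 1 \end{bmatrix}.$$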
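Proof We introduce the Lagrange multiplier $\boldsymbol{\lambda} = (\lambda_1, \lambda_2)$ and the Lagrangian
$$L(\mathbf{w}) = \nabla\big(\mathbf{w}^T C\mathbf{w}\big) - \lambda_1\nabla\big(\mathbf{w}^T\boldsymbol{\mu} - m\big) - \lambda_2\nabla\big(\mathbf{w}^T\mathbf{1} - 1\big).$$
Using Lemma 4.3, the condition $L(\mathbf{w}) = \mathbf{0}$ becomes
$$2C\mathbf{w} - \lambda_1\boldsymbol{\mu} - \lambda_2\mathbf{1} = \mathbf{0}.$$
We solve this system for $\mathbf{w}$:
$$\mathbf{w} = \frac{1}{2}\lambda_1 C^{-1}\boldsymbol{\mu} + \frac{1}{2}\lambda_2 C^{-1}\mathbf{1}. \tag{4.14}$$
Since $\mathbf{w}^T\boldsymbol{\mu} = \boldsymbol{\mu}^T\mathbf{w}$ and $\mathbf{w}^T\mathbf{1} = \mathbf{1}^T\mathbf{w}$, substituting (4.14) into the constraints from (4.12), we obtain a system of linear equations
$$\begin{aligned}
\tfrac{1}{2}\lambda_1\,\boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu} + \tfrac{1}{2}\lambda_2\,\boldsymbol{\mu}^T C^{-1}\mathbf{1} &= m, \\
\tfrac{1}{2}\lambda_1\,\mathbf{1}^T C^{-1}\boldsymbol{\mu} + \tfrac{1}{2}\lambda_2\,\mathbf{1}^T C^{-1}\mathbf{1} &= 1.
\end{aligned}$$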
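Since $C^{-1}$ is symmetric (Exercise 4.1), $\mathbf{1}^T C^{-1}\boldsymbol{\mu} = \boldsymbol{\mu}^T C^{-1}\mathbf{1}$, so the matrix of this system is $M$. We can solve the system for $\lambda_1$ and $\lambda_2$ by Cramer's rule to obtain (note the relevance of the assumption that $M$ is invertible, which ensures that $\det(M) \ne 0$)
$$\frac{1}{2}\lambda_1 = \frac{\det(M_1)}{\det(M)}, \qquad \frac{1}{2}\lambda_2 = \frac{\det(M_2)}{\det(M)}.$$
Substituting the above back into (4.14) gives (4.13).
We have found a candidate for the solution of (4.12). By Lemma 4.3 we know that the Hessian of $\mathbf{w}^T C\mathbf{w}$ is equal to $2C$, which is a positive semidefinite matrix. By Theorem 3.4 this ensures that we have found a global minimum. 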
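Formula (4.13) can be implemented directly: two linear solves give $C^{-1}\boldsymbol{\mu}$ and $C^{-1}\mathbf{1}$, from which the entries of $M$, $M_1$ and $M_2$ follow. The sketch below uses hypothetical data (deliberately different from Exercise 4.5) and the illustrative name `mvl_portfolio`. As a check of Theorem 4.6, it perturbs the solution along $\mathbf{d} = \boldsymbol{\mu} \times \mathbf{1}$, a direction orthogonal to both $\boldsymbol{\mu}$ and $\mathbf{1}$ (this cross-product trick is specific to three assets), so both constraints of (4.12) remain satisfied.

```python
def solve(A, b):
    """Solve A x = b (A square, invertible) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Hypothetical data (not from the text).
C = [[0.04, 0.006, 0.002],
     [0.006, 0.09, 0.01],
     [0.002, 0.01, 0.16]]
mu = [0.08, 0.12, 0.16]
ones = [1.0, 1.0, 1.0]

def mvl_portfolio(m):
    """Solution of (4.12) for target return m via (4.13); assumes C and M invertible."""
    Cinv_mu, Cinv_1 = solve(C, mu), solve(C, ones)
    A = dot(mu, Cinv_mu)     # mu^T C^{-1} mu
    B = dot(mu, Cinv_1)      # mu^T C^{-1} 1
    D = dot(ones, Cinv_1)    # 1^T C^{-1} 1
    det_M = A * D - B * B
    det_M1 = m * D - B       # det [[m, B], [1, D]]
    det_M2 = A - m * B       # det [[A, m], [B, 1]]
    return [(det_M1 * u + det_M2 * v) / det_M for u, v in zip(Cinv_mu, Cinv_1)]

target = 0.13
w = mvl_portfolio(target)
var_w = dot(w, matvec(C, w))

d = [mu[1] - mu[2], mu[2] - mu[0], mu[0] - mu[1]]  # mu x 1: orthogonal to mu and to 1
perturbed = []
for t in (-0.1, -0.01, 0.01, 0.1):
    v = [wi + t * di for wi, di in zip(w, d)]
    perturbed.append(dot(v, matvec(C, v)))

print("w =", w, "variance =", var_w)
```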

Exercise 4.5 Consider three uncorrelated assets with
$$\sigma_1^2 = 0.01, \quad \sigma_2^2 = 0.02, \quad \sigma_3^2 = 0.04, \qquad \mu_1 = 10\%, \quad \mu_2 = 20\%, \quad \mu_3 = 30\%.$$
Using (4.13) compute the portfolio which solves the problem (4.12) for $m = 25\%$.

The formula (4.13) is long and somewhat cumbersome to apply. Our aim will be to simplify it. The first step towards this end is to notice that all portfolios on the minimum variance line can be expressed by means of an affine function of $m$ involving two fixed vectors.

Corollary 4.7
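Under the assumptions of Theorem 4.6, there exist two vectors $\mathbf{a}$ and $\mathbf{b}$, which depend only on $C$ and $\boldsymbol{\mu}$, such that for any real $m$ the solution of the problem (4.12) is
$$\mathbf{w} = m\mathbf{a} + \mathbf{b}.$$
Proof Since
$$\det(M_1) = m\,\mathbf{1}^T C^{-1}\mathbf{1} - \boldsymbol{\mu}^T C^{-1}\mathbf{1}, \qquad \det(M_2) = \boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu} - m\,\boldsymbol{\mu}^T C^{-1}\mathbf{1},$$
from (4.13) we see that $\mathbf{w} = m\mathbf{a} + \mathbf{b}$ for
$$\begin{aligned}
\mathbf{a} &= \frac{1}{\det(M)}C^{-1}\Big(\big(\mathbf{1}^T C^{-1}\mathbf{1}\big)\boldsymbol{\mu} - \big(\boldsymbol{\mu}^T C^{-1}\mathbf{1}\big)\mathbf{1}\Big), \\
\mathbf{b} &= \frac{1}{\det(M)}C^{-1}\Big(\big(\boldsymbol{\mu}^T C^{-1}\boldsymbol{\mu}\big)\mathbf{1} - \big(\boldsymbol{\mu}^T C^{-1}\mathbf{1}\big)\boldsymbol{\mu}\Big).
\end{aligned}$$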
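The vectors $\mathbf{a}$ and $\mathbf{b}$ of Corollary 4.7 can be computed once, after which every portfolio on the minimum variance line is $m\mathbf{a} + \mathbf{b}$. A quick consistency check, which follows from the formulae above and the symmetry of $C^{-1}$, is that $\mathbf{1}^T\mathbf{a} = 0$, $\boldsymbol{\mu}^T\mathbf{a} = 1$, $\mathbf{1}^T\mathbf{b} = 1$ and $\boldsymbol{\mu}^T\mathbf{b} = 0$, so that $m\mathbf{a} + \mathbf{b}$ satisfies both constraints of (4.12) for every $m$. The data are hypothetical (not those of Exercise 4.6).

```python
def solve(A, b):
    """Solve A x = b (A square, invertible) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Hypothetical data (not from the text).
C = [[0.04, 0.006, 0.002],
     [0.006, 0.09, 0.01],
     [0.002, 0.01, 0.16]]
mu = [0.08, 0.12, 0.16]
ones = [1.0, 1.0, 1.0]

Cinv_mu, Cinv_1 = solve(C, mu), solve(C, ones)
A = dot(mu, Cinv_mu)    # mu^T C^{-1} mu
B = dot(mu, Cinv_1)     # mu^T C^{-1} 1
D = dot(ones, Cinv_1)   # 1^T C^{-1} 1
det_M = A * D - B * B

a = [(D * u - B * v) / det_M for u, v in zip(Cinv_mu, Cinv_1)]  # C^{-1}(D mu - B 1) / det M
b = [(A * v - B * u) / det_M for u, v in zip(Cinv_mu, Cinv_1)]  # C^{-1}(A 1 - B mu) / det M

for m in (0.05, 0.10, 0.20):
    w = [m * ai + bi for ai, bi in zip(a, b)]
    print(f"m = {m:.2f}: w = {[round(wi, 4) for wi in w]}")
```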


Figure 4.7 Efficient frontier, together with the minimum variance portfolio (MVP).

The efficient frontier, which is the set of all portfolios not dominated by any other portfolios, consists of $\mathbf{w} = m\mathbf{a} + \mathbf{b}$ for $m \ge \mu_{w_{\min}}$ (see Figure 4.7).
We now show that the whole minimum variance line can be found from
just two portfolios. This result is often referred to as the two-fund the-
orem, since it means that two efficient portfolios (with unequal returns)
suffice to establish an efficient investment policy.
Corollary 4.8
Suppose that $\mathbf{w}_1$ and $\mathbf{w}_2$ are two portfolios on the minimum variance line with different expected returns: $\mu_{w_1} \ne \mu_{w_2}$. Then any portfolio $\mathbf{w}$ on the minimum variance line can be obtained from these two, that is, there is a real number $\alpha$ such that $\mathbf{w} = \alpha\mathbf{w}_1 + (1 - \alpha)\mathbf{w}_2$.
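Proof We first find $\alpha$ so that
$$\mu_w = \alpha\mu_{w_1} + (1 - \alpha)\mu_{w_2}.$$
This is possible since the returns are different:
$$\alpha = \frac{\mu_w - \mu_{w_2}}{\mu_{w_1} - \mu_{w_2}}.$$
Since the two portfolios lie on the minimum variance line, they satisfy
$$\mathbf{w}_1 = \mu_{w_1}\mathbf{a} + \mathbf{b}, \qquad \mathbf{w}_2 = \mu_{w_2}\mathbf{a} + \mathbf{b}.$$
From these relations we have
$$\alpha\mathbf{w}_1 + (1 - \alpha)\mathbf{w}_2 = \big(\alpha\mu_{w_1} + (1 - \alpha)\mu_{w_2}\big)\mathbf{a} + \mathbf{b} = \mu_w\mathbf{a} + \mathbf{b},$$
but $\mathbf{w}$ is also on the minimum variance line, so $\mathbf{w} = \mu_w\mathbf{a} + \mathbf{b}$, hence the result. 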
The minimum variance portfolio wmin lies on the minimum variance line.
We therefore already have a simple formula (4.7) for one of the two port-
folios needed to obtain the minimum variance line. The second portfolio
is the market portfolio, whose formula will be derived in the next section.
The resulting parameterisation of the minimum variance line will then be
written out in equation (4.18).
From Corollary 4.8 we obtain the following important observation.
Theorem 4.9
Suppose that there exist two portfolios $\mathbf{w}_1$ and $\mathbf{w}_2$ on the minimum variance line with different expected returns: $\mu_{w_1} \ne \mu_{w_2}$. Then the minimum variance line is a hyperbola centred on the vertical axis.
Proof Let $K_{w_1}$ and $K_{w_2}$ be the returns on portfolios $\mathbf{w}_1$ and $\mathbf{w}_2$, respectively. From Corollary 4.8 we know that any portfolio on the minimum variance line can be expressed as
$$\mathbf{w} = \alpha\mathbf{w}_1 + (1 - \alpha)\mathbf{w}_2,$$
hence its return is equal to
$$K_w = \alpha K_{w_1} + (1 - \alpha)K_{w_2}.$$
We can treat each of the two portfolios as if it were a single security. Applying the results from Chapter 2 for portfolios consisting of two securities, we know that
$$\begin{aligned}
\mu_w &= \alpha\mu_{w_1} + (1 - \alpha)\mu_{w_2}, \\
\sigma_w^2 &= \alpha^2\sigma_{w_1}^2 + (1 - \alpha)^2\sigma_{w_2}^2 + 2\alpha(1 - \alpha)\,\mathrm{Cov}(K_{w_1}, K_{w_2}).
\end{aligned}$$
Since $\mu_{w_1} \ne \mu_{w_2}$, by Theorem 2.7 the curve $(\sigma_w, \mu_w)$ is a hyperbola. 

Exercise 4.6 Consider three securities with the following parameters:
$$C = \begin{bmatrix} 0.01 & 0 & 0 \\ 0 & 0.02 & 0.02 \\ 0 & 0.02 & 0.04 \end{bmatrix}, \qquad \boldsymbol{\mu} = \begin{bmatrix} 0.1 \\ 0.2 \\ 0.3 \end{bmatrix}.$$
Find the vectors $\mathbf{a}$, $\mathbf{b}$ described in Corollary 4.7. Using $\mathbf{a}$ and $\mathbf{b}$ compute the vector on the minimum variance line corresponding to $m = 20\%$.

Figure 4.8 Minimum variance portfolio (MVP), the market portfolio (MP), and the capital market line (CML).

Exercise 4.7 Consider the data from Exercise 4.6. Plot the mini-
mum variance line in the (w1 , w2 )-plane. Consider two portfolios cor-
responding to m = 10% and m = 20%. Find the variances of, as well
as the covariance between, their returns. Use these to plot the minimum variance line in the $(\sigma, \mu)$-plane.

Exercise 4.8 Consider the data from Exercise 4.6. Find the weights
and the expected return of a portfolio on the minimum variance line
with $\sigma^2 = 0.007$.

4.5 Market portfolio


Recall that the market portfolio is the optimal portfolio on the efficient
frontier taking into account the existence of a risk-free asset. The line con-
necting the market portfolio with the risk-free asset is tangent to the mini-
mum variance line and has maximal slope among the lines determined by
all portfolios (see Figure 4.8).
In Chapter 2 we found the formula for the market portfolio obtained in
the case of two risky securities determining the efficient set. This result
is of course applicable to the general situation in view of Corollary 4.8.
4.5 Market portfolio 63
However, we derive the formula again; this time the parameters of all n
securities will be used.

Theorem 4.10
If the risk-free return $R$ is smaller than the expected return of the minimum variance portfolio, then the market portfolio exists and is given by
$$\mathbf{m} = \frac{C^{-1}(\boldsymbol{\mu} - R\mathbf{1})}{\mathbf{1}^T C^{-1}(\boldsymbol{\mu} - R\mathbf{1})}. \tag{4.15}$$
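Proof From Theorem 4.9 we know that the minimum variance line is a hyperbola. Since its centre is on the vertical axis, there exists a single tangency point for a half line emanating from $(0, R)$, which maximises the slope (see Figure 4.8). The slope in question is of the form
$$\frac{\mu_w - R}{\sigma_w} = \frac{\mathbf{w}^T\boldsymbol{\mu} - R}{\sqrt{\mathbf{w}^T C\mathbf{w}}},$$
where $\mathbf{w}$ are the weights of a portfolio and $R$ is the risk-free rate of return. At the maximal slope the Lagrangian
$$L(\mathbf{w}) = \nabla\left(\frac{\mathbf{w}^T\boldsymbol{\mu} - R}{\sqrt{\mathbf{w}^T C\mathbf{w}}}\right) - \lambda\nabla\big(\mathbf{w}^T\mathbf{1} - 1\big)$$
needs to be equal to zero. We can compute the gradients using Lemma 4.3 and equate them to zero:
$$L(\mathbf{w}) = \frac{\boldsymbol{\mu}\sqrt{\mathbf{w}^T C\mathbf{w}} - (\mathbf{w}^T\boldsymbol{\mu} - R)\dfrac{1}{2\sqrt{\mathbf{w}^T C\mathbf{w}}}\,2C\mathbf{w}}{\mathbf{w}^T C\mathbf{w}} - \lambda\mathbf{1} = \mathbf{0}.$$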
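Multiplying by $\mathbf{w}^T C\mathbf{w} = \sigma_w^2$, this yields
$$\boldsymbol{\mu}\sigma_w - (\mu_w - R)\frac{C\mathbf{w}}{\sigma_w} - \lambda\sigma_w^2\mathbf{1} = \mathbf{0},$$
hence
$$\frac{\mu_w - R}{\sigma_w^2}C\mathbf{w} = \boldsymbol{\mu} - \lambda\sigma_w\mathbf{1}.$$
Multiplying by $\mathbf{w}^T$ on the left and using the fact that $\mathbf{w}^T\mathbf{1} = 1$ we get
$$\frac{\mu_w - R}{\sigma_w^2}\mathbf{w}^T C\mathbf{w} = \mu_w - \lambda\sigma_w,$$
so, since $\mathbf{w}^T C\mathbf{w} = \sigma_w^2$,
$$\lambda = \frac{R}{\sigma_w},$$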
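therefore we have the equation
$$\gamma C\mathbf{w} = \boldsymbol{\mu} - R\mathbf{1},$$
where $\gamma = \frac{\mu_w - R}{\sigma_w^2}$. Therefore
$$\gamma\mathbf{w} = C^{-1}(\boldsymbol{\mu} - R\mathbf{1}). \tag{4.16}$$
Even though $\mathbf{w}$ appears in the formula for $\gamma$, we show that $\gamma$ turns out to be a constant. This follows from multiplying the above equation by $\mathbf{1}^T$ on both sides which, since $\mathbf{1}^T\mathbf{w} = 1$, gives
$$\gamma = \mathbf{1}^T C^{-1}(\boldsymbol{\mu} - R\mathbf{1}).$$
By substituting $\gamma$ into (4.16) we obtain our claim. 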
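Formula (4.15) can be checked against its defining property: no attainable portfolio should give a steeper line from $(0, R)$ than $\mathbf{m}$. The sketch below uses hypothetical data and a hypothetical $R$ (below $\mu_{w_{\min}}$, as Theorem 4.10 requires), and compares the slope of $\mathbf{m}$ with that of randomly drawn portfolios and of small perturbations of $\mathbf{m}$ that keep the weights summing to one. Random sampling is only a spot check, not a proof.

```python
import math
import random

def solve(A, b):
    """Solve A x = b (A square, invertible) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Hypothetical data (not from the text).
C = [[0.04, 0.006, 0.002],
     [0.006, 0.09, 0.01],
     [0.002, 0.01, 0.16]]
mu = [0.08, 0.12, 0.16]
ones = [1.0, 1.0, 1.0]
R = 0.03

x = solve(C, ones)
mu_wmin = dot(mu, x) / sum(x)        # expected return of w_min; R must lie below it
y = solve(C, [mi - R for mi in mu])  # C^{-1}(mu - R 1)
m = [yi / sum(y) for yi in y]        # formula (4.15)

def slope(w):
    """Slope (mu_w - R) / sigma_w of the line from (0, R) through the portfolio w."""
    return (dot(w, mu) - R) / math.sqrt(dot(w, matvec(C, w)))

rng = random.Random(0)
others = []
for _ in range(200):
    raw = [rng.uniform(0.01, 1.0) for _ in range(3)]
    others.append([r / sum(raw) for r in raw])  # random portfolios without short positions
for t in (-0.2, -0.05, 0.05, 0.2):
    others.append([mi + t * di for mi, di in zip(m, [1.0, -2.0, 1.0])])  # keeps 1^T w = 1

best_other = max(slope(w) for w in others)
print("market portfolio:", m)
print("its slope:", slope(m), "best slope among the other portfolios:", best_other)
```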

Exercise 4.9 Prove that when R is equal to the expected return of the
minimum variance portfolio, then the formula for the market portfolio
results in a division by zero. Explain geometrically why this is so.

The line joining the risk-free security represented by $(0, R)$ and the market portfolio with coordinates $(\sigma_m, \mu_m)$ is given by the equation
$$\mu = R + \frac{\mu_m - R}{\sigma_m}\sigma. \tag{4.17}$$
It is called the capital market line, CML in brief. For a portfolio on the CML with risk $\sigma$ the term $\frac{\mu_m - R}{\sigma_m}\sigma$ is called the risk premium, which is the additional return above the risk-free level, representing a reward or compensation for exposure to risk.
If all the investors agree on the values of the model parameters (the ex-
pected returns on the basic assets and the entries of the covariance matrix)
and if each investor chooses an optimal portfolio according to convex in-
difference curves on the basis of risk-return analysis, then all these optimal
portfolios are placed on the CML. Consequently, they should all invest in just one risky portfolio, namely the market portfolio (combining it with the risk-free asset in a preferred individual way). It follows that the market portfolio weights should represent the market values of particular stocks relative to the value of the whole market (just as in Chapter 2, where we discussed a simple market with just two ingredients). Such a portfolio is represented in practice by the market index.
We now return to our discussion of the shape of the minimum variance
line. From Corollary 4.8 we know that this line can be constructed using

Figure 4.9 Efficient frontier in the case of different rates for investing and borrowing risk free.

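$\mathbf{w}_{\min}$ and $\mathbf{m}$. By Corollary 4.5, $\mathrm{Cov}(K_{w_{\min}}, K_m) = \sigma_{w_{\min}}^2$, which gives the following parameterisation of all $(\sigma_w, \mu_w)$ on the minimum variance line:
$$\begin{aligned}
\mu_w &= \alpha\mu_{w_{\min}} + (1 - \alpha)\mu_m, \\
\sigma_w^2 &= \alpha^2\sigma_{w_{\min}}^2 + (1 - \alpha)^2\sigma_m^2 + 2\alpha(1 - \alpha)\sigma_{w_{\min}}^2.
\end{aligned} \tag{4.18}$$
The quantities $\mu_{w_{\min}}$, $\sigma_{w_{\min}}$, $\mu_m$ and $\sigma_m$ are easy to compute, due to the simplicity of the expressions for $\mathbf{w}_{\min}$ and $\mathbf{m}$ (see (4.7) and (4.15)). This makes (4.18) a handy tool for making plots of the minimum variance line.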
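The parameterisation (4.18) can be checked against a direct computation: for several values of $\alpha$ we form $\mathbf{w} = \alpha\mathbf{w}_{\min} + (1 - \alpha)\mathbf{m}$ and compare $\mathbf{w}^T C\mathbf{w}$ with the right-hand side of (4.18). The data and $R$ are hypothetical; the list of $(\sigma, \mu)$ pairs produced is what one would pass to a plotting routine.

```python
import math

def solve(A, b):
    """Solve A x = b (A square, invertible) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Hypothetical data (not from the text).
C = [[0.04, 0.006, 0.002],
     [0.006, 0.09, 0.01],
     [0.002, 0.01, 0.16]]
mu = [0.08, 0.12, 0.16]
ones = [1.0, 1.0, 1.0]
R = 0.03

def var(w):
    return dot(w, matvec(C, w))

x = solve(C, ones)
w_min = [xi / sum(x) for xi in x]    # (4.7)
y = solve(C, [mi - R for mi in mu])
m = [yi / sum(y) for yi in y]        # (4.15)
mu_min, s2_min = dot(w_min, mu), var(w_min)
mu_m, s2_m = dot(m, mu), var(m)

points = []
for alpha in (-2.0, -1.0, 0.0, 0.5, 1.0, 1.5):
    w = [alpha * p + (1 - alpha) * q for p, q in zip(w_min, m)]
    mu_w = alpha * mu_min + (1 - alpha) * mu_m                                          # (4.18)
    s2_w = alpha ** 2 * s2_min + (1 - alpha) ** 2 * s2_m + 2 * alpha * (1 - alpha) * s2_min  # (4.18)
    points.append((alpha, w, math.sqrt(s2_w), mu_w))

for alpha, w, sigma, mu_w in points:
    print(f"alpha = {alpha:5.2f}: sigma = {sigma:.4f}, mu = {mu_w:.4f}")
```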
We conclude this chapter by considering a situation where we have dif-
ferent rates for risk-free borrowing and investing. This is a more realistic
setting than assuming that we have a single risk-free rate of return R.
Assume that we can invest risk-free at a rate of return $R_1$ and borrow at $R_2$. We assume that $R_1 < R_2$, since the opposite inequality would allow investors to make risk-free profits. Any portfolio $\mathbf{w}$ invested in the risky securities can be combined with a risk-free investment at the rate of return $R_1$. This gives the following portfolios on the $(\sigma, \mu)$-plane:
$$\mu_\alpha = \alpha R_1 + (1 - \alpha)\mu_w, \qquad \sigma_\alpha = |1 - \alpha|\,\sigma_w, \qquad \text{for } \alpha \ge 0.$$
Note that we cannot take $\alpha < 0$, since this implies a short position at $R_1$, which would mean borrowing at $R_1$.
We can also combine any portfolio $\mathbf{w}$ with borrowing at $R_2$, giving
$$\mu_\alpha = \alpha R_2 + (1 - \alpha)\mu_w, \qquad \sigma_\alpha = (1 - \alpha)\sigma_w, \qquad \text{for } \alpha \le 0.$$
We cannot take $\alpha > 0$ here since this would mean investing at $R_2$, which is not allowed. We can only borrow at this rate.
To find the efficient frontier we first establish two tangency portfolios $\mathbf{m}_1$ and $\mathbf{m}_2$, for the half lines starting from $(0, R_1)$ and $(0, R_2)$, respectively. The portfolios $\mathbf{m}_1$ and $\mathbf{m}_2$ can be computed using (4.15), taking $R_1$ and $R_2$ instead of $R$, respectively. The frontier is depicted in Figure 4.9 and consists of the segment from $(0, R_1)$ to $(\sigma_{m_1}, \mu_{m_1})$, the fragment of the minimum variance line between $(\sigma_{m_1}, \mu_{m_1})$ and $(\sigma_{m_2}, \mu_{m_2})$, together with the half line starting from $(\sigma_{m_2}, \mu_{m_2})$ that lies on the line through $(0, R_2)$ and $(\sigma_{m_2}, \mu_{m_2})$.
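The three pieces of the frontier in Figure 4.9 can be assembled into a single function of $\sigma$. The sketch below is a minimal illustration under hypothetical data and hypothetical rates $R_1 < R_2 < \mu_{w_{\min}}$. For the middle piece it uses Corollary 4.7: along the minimum variance line $\sigma^2 = (m\mathbf{a} + \mathbf{b})^T C(m\mathbf{a} + \mathbf{b})$ is a quadratic in $m$, and the efficient (upper) branch is the larger root. The function names `tangency`, `mvl_upper` and `frontier` are our own.

```python
import math

def solve(A, b):
    """Solve A x = b (A square, invertible) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Hypothetical data (not from the text).
C = [[0.04, 0.006, 0.002],
     [0.006, 0.09, 0.01],
     [0.002, 0.01, 0.16]]
mu = [0.08, 0.12, 0.16]
ones = [1.0, 1.0, 1.0]
R1, R2 = 0.03, 0.06  # hypothetical lending and borrowing rates

Cinv_mu, Cinv_1 = solve(C, mu), solve(C, ones)
A, B, D = dot(mu, Cinv_mu), dot(mu, Cinv_1), dot(ones, Cinv_1)
mu_wmin = B / D  # expected return of w_min; both rates must lie below it

def tangency(R):
    """Tangency portfolio (4.15) for the half line starting at (0, R)."""
    y = solve(C, [mi - R for mi in mu])
    return [yi / sum(y) for yi in y]

def mean_and_sd(w):
    return dot(w, mu), math.sqrt(dot(w, matvec(C, w)))

mu1, s1 = mean_and_sd(tangency(R1))
mu2, s2 = mean_and_sd(tangency(R2))

# Minimum variance line w = m a + b (Corollary 4.7), so sigma^2 = q2 m^2 + q1 m + q0.
det_M = A * D - B * B
a = [(D * u - B * v) / det_M for u, v in zip(Cinv_mu, Cinv_1)]
b = [(A * v - B * u) / det_M for u, v in zip(Cinv_mu, Cinv_1)]
q2 = dot(a, matvec(C, a))
q1 = 2.0 * dot(a, matvec(C, b))
q0 = dot(b, matvec(C, b))

def mvl_upper(sigma):
    """Expected return on the upper (efficient) branch of the minimum variance line."""
    disc = q1 * q1 - 4.0 * q2 * (q0 - sigma * sigma)
    return (-q1 + math.sqrt(disc)) / (2.0 * q2)

def frontier(sigma):
    """Efficient expected return at risk sigma, lending at R1 and borrowing at R2."""
    if sigma <= s1:
        return R1 + (mu1 - R1) / s1 * sigma  # lending at R1 combined with m1
    if sigma >= s2:
        return R2 + (mu2 - R2) / s2 * sigma  # m2 financed partly by borrowing at R2
    return mvl_upper(sigma)                  # fully invested in risky securities

for sigma in (0.0, s1, 0.5 * (s1 + s2), s2, 1.5 * s2):
    print(f"sigma = {sigma:.4f}, mu = {frontier(sigma):.4f}")
```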

Exercise 4.10 Consider the data from Exercise 4.6. Let $R_1 = 5\%$ and $R_2 = 10\%$. Assume that we invest $V = 1000$. Determine how we should divide $V$ amongst the securities to obtain an efficient portfolio with:
(i) $\sigma^2 = 0.003$;
(ii) $\sigma^2 = 0.023$;
(iii) $\sigma^2 = 0.16$.
5
The Capital Asset Pricing Model

5.1 Derivation of CAPM


5.2 Security market line
5.3 Characteristic line

The market portfolio exists when the expected return on the minimum variance portfolio exceeds the risk-free return. The Capital Asset Pricing Model (CAPM) provides a linear relationship between the expected return $\mu_m$ on the market portfolio and that of any risky asset. The two are linked by means of a parameter, commonly known as the beta ($\beta$), providing a measure of the undiversifiable risk of an asset. In this chapter we explore this relationship, show how the CAPM formula can assist investment decisions, and introduce measures of portfolio performance.
Paradoxically, although we use variance to quantify risk, in assessing
portfolio risk the variances of the assets in the portfolio turn out to be less
relevant than their mutual covariances. To demonstrate this, let us consider
the following example.

Example 5.1
Suppose that the weights of a portfolio are of the form $w_j = \frac{1}{n}$, $j \le n$, where $n$ is the number of assets in the portfolio. We investigate the risk of this portfolio in terms of its dependence on $n$. Assume that the variances of all securities on the market are uniformly bounded, $\sigma_j^2 \le L$. Then
$$\sigma_w^2 = \sum_{j,k=1}^n w_j w_k\sigma_{jk} = \sum_{j=1}^n w_j^2\sigma_j^2 + \sum_{j \ne k} w_j w_k\sigma_{jk} \le n\frac{1}{n^2}L + \frac{1}{n^2}\sum_{j \ne k}\sigma_{jk}.$$

67
68 The Capital Asset Pricing Model

Assume further that the off-diagonal elements of the covariance matrix are uniformly bounded, $|\sigma_{jk}| \le c$, for some $c > 0$. Since there are $n(n-1)$ off-diagonal terms,
$$\sigma_w^2 \le \frac{L}{n} + \frac{1}{n^2}\,n(n - 1)c.$$
The upper bound converges to $c$ as $n \to \infty$. Hence the risk of a portfolio containing many assets is determined by the covariances. The variances of the ingredients become irrelevant for large $n$.
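The bound can be checked numerically. The sketch below assumes a hypothetical market in which every variance equals $L = 0.04$ and every covariance equals $c = 0.01$; in that case the bound is attained, $\sigma_w^2 = L/n + (n-1)c/n$, and the variance of the equally weighted portfolio visibly settles at $c$.

```python
import numpy as np

def constant_cov_matrix(n, var, cov):
    """Hypothetical n x n covariance matrix with all variances `var` and all covariances `cov`."""
    C = np.full((n, n), cov, dtype=float)
    np.fill_diagonal(C, var)
    return C

def equal_weight_variance(C):
    """sigma_w^2 = w^T C w for the weights w_j = 1/n."""
    n = C.shape[0]
    w = np.full(n, 1.0 / n)
    return float(w @ C @ w)

L, c = 0.04, 0.01
for n in (1, 10, 100, 1000):
    # numerical variance next to the closed form L/n + (n - 1)c/n
    print(n, equal_weight_variance(constant_cov_matrix(n, L, c)), L / n + (n - 1) * c / n)
```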

This example motivates the following distinction between two kinds of risk: diversifiable, or specific risk, which can be reduced to zero by expanding the portfolio, and undiversifiable, systematic, or market risk, which cannot be avoided because the securities are linked to the market.
From the above example we see that the variances of returns on individual securities are not the leading factors in determining the risk of a portfolio. What matters for an asset is rather its undiversifiable risk, which in turn depends on the asset's covariances with the remaining assets.
The aim of the Capital Asset Pricing Model (CAPM) is to quantify the
systematic risk of an asset and to link it with its expected return.

5.1 Derivation of CAPM


In this section we derive the Capital Asset Pricing Model formula for the
expected return of a risky security. Before doing so we need the following
definition.
Definition 5.2
We call
$$\beta_i = \frac{\mathrm{Cov}(K_i, K_m)}{\sigma_m^2}$$
the beta factor of the $i$-th security.
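In terms of a covariance matrix $C$ of the security returns and the market weights $m$, the numerator is $\mathrm{Cov}(K_i, K_m) = (Cm)_i$ and the denominator is $\sigma_m^2 = m^T C m$ (the identity $\mathrm{Cov}(K_w, K_v) = w^T C v$ used in the proof of Theorem 5.4). A minimal Python sketch with a hypothetical covariance matrix and hypothetical market weights:

```python
import numpy as np

def beta_factors(C, m):
    """beta_i = Cov(K_i, K_m) / sigma_m^2, with Cov(K_i, K_m) = (C m)_i and sigma_m^2 = m^T C m."""
    C = np.asarray(C, dtype=float)
    m = np.asarray(m, dtype=float)
    return (C @ m) / (m @ C @ m)

# Hypothetical covariance matrix and market weights
C = np.array([[0.040, 0.006, 0.010],
              [0.006, 0.090, 0.020],
              [0.010, 0.020, 0.160]])
m = np.array([0.5, 0.3, 0.2])

beta = beta_factors(C, m)
print(beta.round(4))
print(m @ beta)   # market-weighted average of the betas: 1 up to rounding
```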
It will turn out that the beta factor is directly related to the systematic
risk of a security. We discuss this later on. First we state the famous CAPM
formula.
Theorem 5.3 (CAPM)
Suppose that the risk-free return $R$ is lower than the expected return of the minimal variance portfolio (so that the market portfolio $m$ exists). Then, for each $i \le n$, the expected return $\mu_i$ of the $i$-th asset in the portfolio is given by the formula
$$\mu_i = R + \beta_i(\mu_m - R). \tag{5.1}$$

Figure 5.1 Lack of tangency for portfolios built out of a security and the market portfolio leads to portfolios with higher slope than that of the market portfolio.

Proof As we know, the capital market line is tangent to the minimum variance line at the market portfolio point $(\sigma_m, \mu_m)$ (see Figure 4.8). Consider all portfolios built by means of the market portfolio and the $i$-th security. They form a hyperbola which we claim to be tangent to the capital market line at $(\sigma_m, \mu_m)$. Suppose, on the contrary, that this hyperbola crosses the CML at $(\sigma_m, \mu_m)$. Then some portfolio built from $m$ and the $i$-th security lies above the CML, so the line through $(0, R)$ and this portfolio has a larger slope than the CML. This contradicts the fact that the slope of the CML is maximal; see Figure 5.1.
We compute the slope of the tangent line to the hyperbola at $(\sigma_m, \mu_m)$ and then use the fact that the slope of the CML is the same. Denote the proportion of wealth invested in security $i$ by $x$ and that invested in the market portfolio by $1 - x$. We use $x$ to denote the portfolio $x = (x, 1 - x)$. The risk and return are of the form
$$\mu_x = x\mu_i + (1 - x)\mu_m,$$
$$\sigma_x = \sqrt{x^2\sigma_i^2 + (1 - x)^2\sigma_m^2 + 2x(1 - x)\,\mathrm{Cov}(K_i, K_m)},$$
and we compute their derivatives with respect to $x$ at $x = 0$ to obtain
$$\left.\frac{\partial \mu_x}{\partial x}\right|_{x=0} = \mu_i - \mu_m, \qquad \left.\frac{\partial \sigma_x}{\partial x}\right|_{x=0} = \frac{\mathrm{Cov}(K_i, K_m) - \sigma_m^2}{\sigma_m}.$$
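The slope of the tangent is the ratio of these derivatives, and we equate it to the slope of the CML:
$$\frac{\mu_i - \mu_m}{\dfrac{\mathrm{Cov}(K_i, K_m) - \sigma_m^2}{\sigma_m}} = \frac{\mu_m - R}{\sigma_m}.$$
Solving for $\mu_i$ we get
$$\mu_i = R + \frac{\mathrm{Cov}(K_i, K_m)}{\sigma_m^2}(\mu_m - R) = R + \beta_i(\mu_m - R),$$
as required. □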
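The key step of the proof, that the curve of portfolios mixing security $i$ with $m$ touches the CML at $(\sigma_m, \mu_m)$, can be checked with finite differences. This Python sketch uses hypothetical data, computes $m$ from the formula $m = C^{-1}(\mu - R\mathbf{1})/\gamma$ quoted from Theorem 4.10 in the proof of Theorem 5.4, and compares both the slope and the derivative of $\sigma_x$ with the expressions above; the step size `h` is an arbitrary choice.

```python
import numpy as np

# Hypothetical data
mu = np.array([0.08, 0.10, 0.14])
C = np.array([[0.040, 0.006, 0.010],
              [0.006, 0.090, 0.020],
              [0.010, 0.020, 0.160]])
R = 0.03

y = np.linalg.solve(C, mu - R)
m = y / y.sum()                       # market (tangency) portfolio
mu_m, var_m = m @ mu, m @ C @ m
sigma_m = np.sqrt(var_m)
cml_slope = (mu_m - R) / sigma_m

def mix(x, i):
    """(sigma_x, mu_x) for a fraction x in security i and 1 - x in the market portfolio."""
    w = (1.0 - x) * m
    w[i] += x
    return np.sqrt(w @ C @ w), w @ mu

h = 1e-6
results = []
for i in range(len(mu)):
    (s_up, e_up), (s_dn, e_dn) = mix(h, i), mix(-h, i)
    slope = (e_up - e_dn) / (s_up - s_dn)            # d(mu_x)/d(sigma_x) at x = 0
    dsigma_numeric = (s_up - s_dn) / (2 * h)
    dsigma_formula = ((C @ m)[i] - var_m) / sigma_m  # (Cov(K_i, K_m) - sigma_m^2) / sigma_m
    results.append((slope, dsigma_numeric, dsigma_formula))
    print(i, slope, cml_slope, dsigma_numeric, dsigma_formula)
```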
The term $\beta_i(\mu_m - R)$ in the CAPM formula (5.1) is called the risk premium. It represents the additional return required by an investor who faces the risk arising from the link of the security to the whole market.
We see that the beta factor determines the expected return on a security. Since beta depends on the security only through its covariance with the market, this means that beta quantifies the undiversifiable risk.
For a portfolio $w$ we define
$$\beta_w = \frac{\mathrm{Cov}(K_w, K_m)}{\sigma_m^2}.$$
Observe that for the market portfolio, since $\mathrm{Cov}(K_m, K_m) = \sigma_m^2$,
$$\beta_m = 1.$$

Exercise 5.1 Derive the CAPM formula
$$\mu_w = R + \beta_w(\mu_m - R)$$
for a portfolio from the CAPM formula (5.1) for a single security.

Exercise 5.2 Assume that we can invest risk-free at a rate of return $R_1$ and borrow at $R_2$. Let $m_1$ and $m_2$ be the weights of the two tangency portfolios, corresponding to $R_1$ and $R_2$, respectively. Prove that
$$\mu_i = R_1 + \frac{\mathrm{Cov}(K_i, K_{m_1})}{\sigma_{m_1}^2}\left(\mu_{m_1} - R_1\right),$$
$$\mu_i = R_2 + \frac{\mathrm{Cov}(K_i, K_{m_2})}{\sigma_{m_2}^2}\left(\mu_{m_2} - R_2\right).$$
Figure 5.2 Security market line (SML) and the capital market line (CML). MP is the market portfolio.

5.2 Security market line


We start by presenting an alternative proof of Theorem 5.3. We do this in a
slightly more general context, formulating the result for a portfolio instead
of a single security.
Theorem 5.4
Suppose that the risk-free return R is lower than the expected return of the
minimal variance portfolio (so that the market portfolio m exists). Then,
for any portfolio $w$
$$\mu_w = R + \beta_w(\mu_m - R). \tag{5.2}$$
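Proof From Theorem 4.10 we know that
$$m = \frac{1}{\gamma}C^{-1}(\mu - R\mathbf{1}),$$
for $\gamma = \mathbf{1}^T C^{-1}(\mu - R\mathbf{1})$. Applying Proposition 4.2,
$$\beta_w = \frac{\mathrm{Cov}(K_w, K_m)}{\sigma_m^2} = \frac{w^T C m}{m^T C m} = \frac{\frac{1}{\gamma}w^T(\mu - R\mathbf{1})}{\frac{1}{\gamma}m^T(\mu - R\mathbf{1})}.$$
Since $w^T\mu = \mu_w$, $m^T\mu = \mu_m$ and $w^T\mathbf{1} = m^T\mathbf{1} = 1$, this gives
$$\beta_w = \frac{\mu_w - R}{\mu_m - R}.$$
Rearranging we obtain (5.2). □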
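Theorem 5.4 lends itself to a quick numerical check. The Python sketch below uses hypothetical data and hypothetical portfolios (one of them with a short position), builds $m$ from the formula of Theorem 4.10 quoted above, and compares $\mu_w$ with $R + \beta_w(\mu_m - R)$; the two agree up to rounding.

```python
import numpy as np

def market_portfolio(mu, C, R):
    """m = C^{-1}(mu - R 1) / gamma with gamma = 1^T C^{-1}(mu - R 1) (needs gamma > 0)."""
    y = np.linalg.solve(C, mu - R)
    gamma = y.sum()
    if gamma <= 0:
        raise ValueError("R must be below the expected return of the minimum variance portfolio")
    return y / gamma

# Hypothetical data
mu = np.array([0.08, 0.10, 0.14])
C = np.array([[0.040, 0.006, 0.010],
              [0.006, 0.090, 0.020],
              [0.010, 0.020, 0.160]])
R = 0.03
m = market_portfolio(mu, C, R)
mu_m, var_m = m @ mu, m @ C @ m

portfolios = [np.array([1 / 3, 1 / 3, 1 / 3]),
              np.array([0.7, 0.2, 0.1]),
              np.array([1.5, -0.8, 0.3])]   # short sale of the second security
checks = []
for w in portfolios:
    beta_w = (w @ C @ m) / var_m             # Cov(K_w, K_m) / sigma_m^2
    lhs, rhs = w @ mu, R + beta_w * (mu_m - R)
    checks.append((lhs, rhs))
    print(w, round(lhs, 6), round(rhs, 6))
```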
The above proof is shorter than our first proof of Theorem 5.3. The first
proof, however, is more intuitive, showing that the beta factor arises from
purely geometric considerations.
From Theorem 5.4 we see that in the $(\beta, \mu)$-plane all portfolios lie on the straight line
$$\mu = R + \beta(\mu_m - R).$$
The graph of this function in the (β, µ)-plane is called the security mar-
ket line. This is shown in Figure 5.2 where the CML is also plotted for
comparison.
In Figure 5.2 we see that we can have securities that remain attractive to investors despite having small expected returns and large variances. The reason for this is that these securities have negative betas, which implies that the covariance of the return on such an asset with the market is negative, meaning that the prices of such securities tend to move in the opposite direction to the market. Such assets are useful for hedging against negative trends on the market. A standard example of an asset with negative beta is gold, which can act as insurance in a financial crisis.
The CAPM formula can be used to make investment decisions. Let us
refer to the return from the CAPM formula as the required return. We
can think of the required return as how the market perceives the expected
return on a given security. Each individual investor, however, has his own
beliefs. If for a given security an investor thinks, due to some additional
information he has, that the true expected return is higher than the required
return,
µi > R + βi (µm − R),
then this means that the security is underpriced. He should then invest in the
security. If more investors share this belief, they will do the same, and as a
result of the demand created the price goes up, which pushes the expected
return down. On the other hand, if
µi < R + βi (µm − R),
investors will want to sell or even short-sell the security; the price then falls because of the excess supply, and the expected return increases. In both cases we should therefore observe price adjustments that restore the equilibrium described by the CAPM formula.
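The decision rule just described can be written as a small helper. This is a sketch with hypothetical numbers: `mu_i` stands for the investor's own estimate of the expected return, and the tolerance `tol` is an added assumption that avoids acting on rounding noise.

```python
def required_return(beta, R, mu_m):
    """Required return from the CAPM formula: R + beta (mu_m - R)."""
    return R + beta * (mu_m - R)

def assess(mu_i, beta_i, R, mu_m, tol=1e-12):
    """Compare an investor's estimate mu_i with the CAPM required return."""
    req = required_return(beta_i, R, mu_m)
    if mu_i > req + tol:
        return "underpriced: buy"
    if mu_i < req - tol:
        return "overpriced: sell or short-sell"
    return "fairly priced"

# Hypothetical numbers: R = 3%, mu_m = 10%, beta = 1.2, so the required return is 11.4%
print(assess(0.13, 1.2, 0.03, 0.10))
print(assess(0.09, 1.2, 0.03, 0.10))
```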
Apart from illustrating the market equilibrium, CAPM has applications
in analysing the performance of various investments. The right-hand side
of CAPM gives the target return, and this is compared with the realised return. The difference, realised return minus target return, is called the Jensen index. A possible goal is to achieve a positive value of this index: the higher, the better.
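A minimal sketch of the Jensen index. Which market return to plug into the target return (the expected return $\mu_m$ or the market return realised over the same period) is a modelling choice the text leaves open; the function below simply takes it as the argument `market_return`, and the numbers are hypothetical.

```python
def jensen_index(realised_return, beta, R, market_return):
    """Realised return minus the CAPM target return R + beta (market_return - R)."""
    target = R + beta * (market_return - R)
    return realised_return - target

# Hypothetical fund: beta 0.8, realised return 12%, R = 3%, market return 10%
print(round(jensen_index(0.12, 0.8, 0.03, 0.10), 4))   # positive: the fund beat its target
```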
Another approach to the evaluation of performance comes from comparing a portfolio's market price of risk with an agreed benchmark. For a given portfolio $w$ the market price of risk is defined as the excess return per unit of risk:
$$\mathrm{MPR}_w = \frac{\mu_w - R}{\sigma_w}.$$
This quantity is referred to as the Sharpe index or Sharpe ratio. The benchmark is the market price of risk for the market portfolio, in other words the slope of the CML:
$$\mathrm{MPR}_m = \frac{\mu_m - R}{\sigma_m}.$$
The investor will clearly seek to maximise the Sharpe index of his portfolio.
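A short sketch comparing a portfolio's Sharpe index with the benchmark $\mathrm{MPR}_m$; all inputs are hypothetical, with expected returns and standard deviations given directly rather than estimated.

```python
def sharpe_index(mu, sigma, R):
    """Market price of risk (mu - R) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (mu - R) / sigma

R = 0.03
mpr_w = sharpe_index(0.09, 0.15, R)   # hypothetical portfolio
mpr_m = sharpe_index(0.10, 0.16, R)   # hypothetical market portfolio: slope of the CML
print(mpr_w, mpr_m, mpr_w > mpr_m)
```

Within the model, no portfolio can exceed $\mathrm{MPR}_m$, because the CML has maximal slope.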

5.3 Characteristic line


The CAPM formula is concerned with expectations. Our next step is to
consider the returns themselves, that is the random variables
$$K_w = R + \beta_w(K_m - R) + e_w, \tag{5.3}$$
where the error $e_w$ is a random variable defined as
$$e_w = K_w - [R + \beta_w(K_m - R)].$$
From the CAPM formula (5.2) we have
$$E(e_w) = \mu_w - [R + \beta_w(\mu_m - R)] = 0.$$
It is interesting to observe that minimising the variance of the error leads to the same beta coefficient:
Proposition 5.5
Given a portfolio $w$, let $e_w = K_w - R - \beta(K_m - R)$ for some number $\beta$. The variance of $e_w$ is minimal for
$$\beta = \frac{\mathrm{Cov}(K_w, K_m)}{\mathrm{Var}(K_m)}.$$
Proof We can compute the variance of ew as
Var(ew ) = Var(Kw − R − β(Km − R))
= Var(Kw − βKm ) (Var(X + a) = Var(X) for constant a)
= Var(Kw ) + Var(−βKm ) + 2Cov(Kw , −βKm )
= Var(Kw ) + β2 Var(Km ) − 2βCov(Kw , Km ).
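This is a quadratic function of $\beta$ with a positive coefficient $\mathrm{Var}(K_m)$ for $\beta^2$. Setting its derivative with respect to $\beta$ to zero, the minimum is found when
$$0 = 2\beta\,\mathrm{Var}(K_m) - 2\,\mathrm{Cov}(K_w, K_m),$$
hence
$$\beta = \frac{\mathrm{Cov}(K_w, K_m)}{\mathrm{Var}(K_m)},$$
which concludes the proof. □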
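Proposition 5.5 can also be seen numerically: scanning candidate values of $\beta$ over a grid, the sample variance of the error is smallest at the grid point nearest to the sample version of $\mathrm{Cov}(K_w, K_m)/\mathrm{Var}(K_m)$. The returns below are simulated from a hypothetical model, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
R = 0.03
K_m = R + 0.07 + 0.15 * rng.standard_normal(500)              # simulated market returns
K_w = R + 1.3 * (K_m - R) + 0.05 * rng.standard_normal(500)   # simulated portfolio returns

def error_variance(beta):
    """Sample variance of e_w = K_w - R - beta (K_m - R)."""
    return np.var(K_w - R - beta * (K_m - R))

grid = np.linspace(0.0, 2.5, 2501)   # step 0.001
best = grid[np.argmin([error_variance(b) for b in grid])]
beta_formula = np.cov(K_w, K_m)[0, 1] / np.var(K_m, ddof=1)   # sample Cov / sample Var
print(best, beta_formula)
```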
The relation (5.3) between the returns, together with the fact that this beta minimises the variance of the error, provides a method of finding the beta from historical data. Plotting the realised past returns on a security against the realised returns on the market portfolio enables one to find the line of best fit, also known as the security characteristic line. For an asset with
return Ki we have
Ki − R = αi + βi (Km − R) + ei
where ei is the error, with E(ei ) = 0, and αi is called the alpha, or abnormal
return of the asset. By CAPM theory, the coefficient αi should be zero. In
practice though, markets do not strictly follow the theory and non-zero
abnormal returns can be observed from historical data.
If $\hat K_i^1, \ldots, \hat K_i^d$ and $\hat K_m^1, \ldots, \hat K_m^d$ are the historical realised returns, then we can find the parameters of the characteristic line using the least squares method. It will be convenient to use the notation
$$x_j = \hat K_m^j - R, \qquad y_j = \hat K_i^j - R, \qquad \text{for } j = 1, \ldots, d,$$
to stand for historical excess returns. We define a function
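$$f(\alpha, \beta) = \sum_{j=1}^{d}\left(\hat K_i^j - R - \alpha - \beta\left(\hat K_m^j - R\right)\right)^2 = \sum_{j=1}^{d}\left(y_j - \alpha - \beta x_j\right)^2,$$
and find its minimum by solving the system of equations
$$\frac{\partial f}{\partial \alpha} = 0, \qquad \frac{\partial f}{\partial \beta} = 0. \tag{5.4}$$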
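This leads to
$$\beta = \frac{\bar x \bar y - \overline{xy}}{\bar x \bar x - \overline{xx}}, \qquad \alpha = \bar y - \beta \bar x, \tag{5.5}$$
where
$$\bar x = \frac{1}{d}\sum_{j=1}^{d} x_j, \quad \bar y = \frac{1}{d}\sum_{j=1}^{d} y_j, \quad \overline{xy} = \frac{1}{d}\sum_{j=1}^{d} x_j y_j, \quad \overline{xx} = \frac{1}{d}\sum_{j=1}^{d} x_j^2.$$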

Formula (5.5) can be used to estimate the beta factor of a security, based
on historical data.
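Formula (5.5) translates directly into code. A minimal Python sketch follows; the return series are hypothetical, and `numpy.polyfit`, which fits a straight line by least squares, is used only as an independent cross-check.

```python
import numpy as np

def characteristic_line(K_i, K_m, R):
    """alpha and beta of the security characteristic line, following (5.5)."""
    x = np.asarray(K_m, dtype=float) - R   # x_j: market excess returns
    y = np.asarray(K_i, dtype=float) - R   # y_j: security excess returns
    x_bar, y_bar = x.mean(), y.mean()
    xy_bar, xx_bar = (x * y).mean(), (x * x).mean()
    beta = (x_bar * y_bar - xy_bar) / (x_bar * x_bar - xx_bar)
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical historical returns (d = 6 periods)
R = 0.002
K_m = np.array([0.020, -0.010, 0.030, 0.015, -0.020, 0.010])
K_i = np.array([0.025, -0.020, 0.040, 0.020, -0.030, 0.012])

alpha, beta = characteristic_line(K_i, K_m, R)
slope, intercept = np.polyfit(K_m - R, K_i - R, 1)   # independent least-squares fit
print(alpha, beta)
print(intercept, slope)
```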

Exercise 5.3 Derive (5.5) from (5.4).

We conclude this chapter by returning to (5.3) in order to compute the variance of $K_w$. This will highlight from yet another angle the fact that the beta factor quantifies the undiversifiable risk.
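Proposition 5.6
The variance of the return on a portfolio can be expressed as
$$\sigma_w^2 = \beta_w^2\sigma_m^2 + \mathrm{Var}(e_w). \tag{5.6}$$
Proof First we find the covariance between $e_w$ and $K_m$:
$$\mathrm{Cov}(K_m, e_w) = \mathrm{Cov}(K_m, K_w - R - \beta_w(K_m - R)) = \mathrm{Cov}(K_m, K_w) - \beta_w\,\mathrm{Cov}(K_m, K_m) = 0,$$
since $\beta_w\,\mathrm{Cov}(K_m, K_m) = \beta_w\sigma_m^2 = \mathrm{Cov}(K_w, K_m)$. Next
$$\begin{aligned}
\mathrm{Var}(K_w) &= \mathrm{Var}(R + \beta_w(K_m - R) + e_w)\\
&= \mathrm{Var}(\beta_w K_m + e_w) \quad (\text{since } \mathrm{Var}(X + a) = \mathrm{Var}(X))\\
&= \beta_w^2\,\mathrm{Var}(K_m) + \mathrm{Var}(e_w) + 2\beta_w\,\mathrm{Cov}(K_m, e_w)\\
&= \beta_w^2\,\mathrm{Var}(K_m) + \mathrm{Var}(e_w),
\end{aligned}$$
which concludes the proof. □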
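The decomposition (5.6) also holds exactly for sample moments when $\beta_w$ is estimated as the sample covariance divided by the sample variance, which makes it easy to check. The returns below are simulated from a hypothetical model.

```python
import numpy as np

rng = np.random.default_rng(2)
R = 0.03
K_m = R + 0.07 + 0.15 * rng.standard_normal(1000)              # simulated market returns
K_w = R + 0.8 * (K_m - R) + 0.04 * rng.standard_normal(1000)   # simulated portfolio returns

beta_w = np.cov(K_w, K_m, ddof=0)[0, 1] / np.var(K_m)   # sample Cov / sample Var
e_w = K_w - R - beta_w * (K_m - R)                       # error in (5.3)

total = np.var(K_w)
systematic = beta_w**2 * np.var(K_m)   # undiversifiable part
specific = np.var(e_w)                 # diversifiable part
print(total, systematic + specific, np.cov(K_m, e_w, ddof=0)[0, 1])
```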
The formula (5.6) sheds more light on the distinction between the two
kinds of risk. The first term represents the systematic risk that cannot be
avoided by adding more securities to the portfolio and it is measured by
the beta coefficient. The second term is the diversifiable part of the risk.
Taking w = m, since βm = 1,
em = Km − R − βm (Km − R) = 0,
hence the term Var(ew ) can be discarded if we invest in the market portfolio
or in a portfolio sufficiently diversified to serve in practice as its substitute.
6 Utility functions

6.1 Basic notions and axioms
6.2 Utility maximisation
6.3 Utilities and CAPM
6.4 Risk aversion
Making the fundamental assumption that rational investors prefer more wealth to less, we impose preference relations on the set of possible final (time 1) positions of an investor who, at time 0, invests a fixed sum
in a range of risky securities. In this chapter we simplify the analysis by
restricting to a finite sample space, so that there are N possible outcomes.
We state axioms for preference relations among the N-dimensional vectors
representing the possible outcomes for his final wealth. Each such relation
is expressed in terms of a real-valued function called a utility.
We focus on utilities arising as expectations, and show that utility max-
imisation is closely related to the No Arbitrage Principle (NAP), which
is discussed in detail in [DMFM]. This leads to the introduction of state
prices (equivalently, risk-neutral probabilities). We solve the utility max-
imisation problem in terms of minimising expectations with respect to the
set of possible state price vectors. We also explore the relationship between
quadratic utility functions and the CAPM and conclude with a brief study
of risk aversion measures.

6.1 Basic notions and axioms


We begin by recalling some basic probability notation. In this chapter we restrict our attention to the case of a discrete probability space, $\Omega = \{\omega_1, \ldots, \omega_N\}$, with
$$P(\{\omega_i\}) = p_i > 0.$$
The prices of securities are denoted by $S_j(0)$, the initial prices, and
$$S_j(1, \omega_i) = S_j(1)(\omega_i),$$
the prices at the end of the period, which depend on the state. Portfolios will be described by the numbers $x_j$ of securities held. A portfolio is represented by a vector $x = (x_1, \ldots, x_n)$. We denote the initial wealth of the investor by $V$, so the formation of a portfolio is subject to the budget constraint
$$\sum_{j=1}^{n} x_j S_j(0) = V.$$

The final wealth is a random variable determined by the portfolio chosen, and we denote it by $V_x(1)$. In the state $\omega_i$ it takes the value
$$V_x(1, \omega_i) = \sum_{j=1}^{n} x_j S_j(1, \omega_i).$$

We will find it convenient to use the following matrix notation:
$$S(0) = \begin{bmatrix} S_1(0) & \cdots & S_n(0) \end{bmatrix}, \qquad S(1) = \begin{bmatrix} s_{11} & \cdots & s_{1n} \\ \vdots & & \vdots \\ s_{N1} & \cdots & s_{Nn} \end{bmatrix}, \tag{6.1}$$
where
$$s_{ij} = S_j(1, \omega_i).$$
We can then write
$$V_x(0) = S(0)x, \qquad V_x(1) = S(1)x. \tag{6.2}$$
The matrix S(1) represents a linear map, which we assume to be one-to-
one. This means in particular that the number of rows (N) is not less than
the number of columns (n) and that the matrix has maximal rank, namely
n. In other words, the number N of scenarios (members of Ω) is at least as
great as the number of assets (n).
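The matrix notation (6.1)–(6.2) and the one-to-one assumption on $S(1)$ are easy to check in code. This sketch borrows the numbers of Example 6.18 (three states, a risk-free asset and two risky assets) and the portfolio found there; the rank test uses NumPy's `matrix_rank`.

```python
import numpy as np

# Numbers of Example 6.18: rows of S1 are states, columns are securities
S0 = np.array([1.0, 100.0, 200.0])
S1 = np.array([[1.0, 110.0, 200.0],
               [1.0, 100.0, 220.0],
               [1.0,  90.0, 180.0]])

N, n = S1.shape
one_to_one = N >= n and np.linalg.matrix_rank(S1) == n   # S(1) has maximal rank n

x = np.array([-150.0, -2.5, 2.5])   # numbers of units held of each security
V0 = S0 @ x                          # V_x(0) = S(0) x
V1 = S1 @ x                          # V_x(1) = S(1) x, one entry per state
print(one_to_one, V0, V1)
```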
At times we will find it convenient to identify a random variable X :
Ω → R with a vector X = (X1 , . . . , XN ) ∈ RN , by which we mean that
Xi = X(ωi ).
The amount $V_x(1)$ can be consumed by the investor. This motivates the name feasible consumption set for the set
$$FCS = \left\{X \in \mathbb{R}^N \mid X_i \ge 0,\ X = V_x(1) \text{ where } V_x(0) = V\right\}.$$
We assume that the investor can decide between any two possible final consumptions from the $FCS$. So we assume that a binary relation $\preceq$ on $FCS$ is given: for $X, Y \in FCS$ we write $X \preceq Y$ to mean that the investor prefers $Y$ to $X$.

Axiom 1 (transitivity) If $X \preceq Y$ and $Y \preceq Z$ then $X \preceq Z$.

This axiom is sometimes called the consistency axiom since it excludes irrational preferences.

Axiom 2 (completeness) For all $X, Y$ either $X \preceq Y$ or $Y \preceq X$.

Thus we assume that each individual can always decide which of two
given positions he prefers.
If Axioms 1 and 2 are satisfied, we call $\preceq$ a preference relation. In practice, a preference relation may be difficult to specify. An alternative approach is based on employing a so-called utility.
Definition 6.1
A function U : RN → R is called a utility if it is strictly increasing with
respect to each variable, differentiable and strictly concave.
Using a utility $U$ we can define the relation
$$X \preceq_U Y \quad \text{if and only if} \quad U(X) \le U(Y).$$

Exercise 6.1 Show that when $U$ is a utility, $\preceq_U$ is a preference relation.

Not every preference relation can be represented by a utility. We give an example of this in the form of an exercise.

Exercise 6.2 The lexicographic order $\preceq_{lex}$ on $\mathbb{R}^2$ is defined as follows: for $p = (p_1, p_2)$ and $q = (q_1, q_2)$,
$$p \preceq_{lex} q$$
if and only if
$$p_1 < q_1 \quad \text{or} \quad (p_1 = q_1 \text{ and } p_2 \le q_2).$$
Show that $\preceq_{lex}$ is a preference relation that cannot be represented by a utility.

A particular case of utility is the expected utility, determined by means of a utility function.

Definition 6.2
We say that u : R → R is a utility function if it is strictly increasing,
differentiable and strictly concave.

Proposition 6.3
If u : R → R is a utility function, then U defined by

U(X) = E(u(X))

is a utility.

Proof The function $U$ can be written as
$$U(X) = E(u(X)) = \sum_{i=1}^{N} p_i u(X_i).$$
We need to show that $U$ is strictly increasing with respect to each variable, differentiable and strictly concave.
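The function $U$ is differentiable since $u$ is differentiable; in particular
$$U'(X) = \begin{bmatrix} \frac{\partial U}{\partial X_1}(X) & \frac{\partial U}{\partial X_2}(X) & \cdots & \frac{\partial U}{\partial X_N}(X) \end{bmatrix} = \begin{bmatrix} p_1 u'(X_1) & p_2 u'(X_2) & \cdots & p_N u'(X_N) \end{bmatrix}.$$
The function $u$ is strictly increasing, hence $u'(x) \ge 0$ for all $x \in \mathbb{R}$; since $u$ is also strictly concave, $u'$ is strictly decreasing, so in fact $u'(x) > 0$ for all $x$ (if $u'(x_0) = 0$, then $u'$ would be negative to the right of $x_0$). We also have $p_i > 0$ for $i = 1, \ldots, N$, hence
$$\frac{\partial U}{\partial X_i}(X) = p_i u'(X_i) > 0.$$
This means that $U$ is strictly increasing with respect to each variable.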
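Since $u$ is strictly concave, for any $x_1 \ne x_2$ and any $\lambda \in (0, 1)$
$$u(\lambda x_1 + (1 - \lambda)x_2) > \lambda u(x_1) + (1 - \lambda)u(x_2).$$
For any $X \ne Y$ in $\mathbb{R}^N$ this gives
$$\begin{aligned}
U(\lambda X + (1 - \lambda)Y) &= U(\lambda X_1 + (1 - \lambda)Y_1, \ldots, \lambda X_N + (1 - \lambda)Y_N)\\
&= \sum_{i=1}^{N} p_i u(\lambda X_i + (1 - \lambda)Y_i)\\
&> \sum_{i=1}^{N} p_i\left[\lambda u(X_i) + (1 - \lambda)u(Y_i)\right]\\
&= \lambda U(X) + (1 - \lambda)U(Y),
\end{aligned}$$
where the inequality is strict because $X_i \ne Y_i$ for at least one $i$ and $p_i > 0$ (for indices with $X_i = Y_i$ the corresponding terms are equal). This means that $U$ is strictly concave. □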
Definition 6.4
We say that a utility U is a von Neumann–Morgenstern utility if there
exists a utility function u such that
U(X) = E(u(X)).
The crucial feature of a von Neumann–Morgenstern utility is that it is
determined by a single-variable function u.

Example 6.5
Typical examples of utility functions are as follows:
(i) Exponential: $u(x) = -e^{-ax}$, with $a > 0$;
(ii) Logarithmic: $u(x) = \ln x$ (for $x > 0$);
(iii) Power: $u(x) = a x^a$ for $a < 1$, $a \ne 0$ (for $x > 0$);
(iv) Quadratic: $u(x) = x - \frac{1}{2}bx^2$, with $b > 0$ (which is increasing only for $x < \frac{1}{b}$).
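The conditions of Definition 6.2 can be sanity-checked numerically on a grid. This is an illustration, not a proof, and it does not replace Exercise 6.3. The parameter values `a` and `b` are hypothetical, the power utility is taken as $u(x) = a x^a$, and the grid is restricted to positive $x$ below $1/b$, where all four functions are defined and increasing.

```python
import numpy as np

a, b = 0.5, 0.01   # hypothetical parameters
utilities = {
    "exponential": lambda x: -np.exp(-a * x),      # a > 0
    "logarithmic": np.log,                         # x > 0
    "power":       lambda x: a * x**a,             # a < 1, a != 0, x > 0
    "quadratic":   lambda x: x - 0.5 * b * x**2,   # increasing for x < 1/b
}

x = np.linspace(1.0, 90.0, 500)   # inside (0, 1/b) = (0, 100)
checks = {}
for name, u in utilities.items():
    values = u(x)
    increasing = bool(np.all(np.diff(values) > 0))
    concave = bool(np.all(np.diff(values, n=2) < 0))   # negative second differences
    checks[name] = (increasing, concave)
    print(f"{name:12s} increasing={increasing} concave={concave}")
```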

Exercise 6.3 Verify that the functions from Example 6.5 satisfy the
conditions of Definition 6.2.

6.2 Utility maximisation


An investor wishes to maximise his utility, meaning that he seeks a solution
to the problem
max{U(X) : X ∈ FCS }. (6.3)
6.2 Utility maximisation 81
The existence of a solution to this problem is related to the notion of arbi-
trage.
Definition 6.6
We say that a portfolio x = (x1 , . . . , xn ) is an arbitrage opportunity if
Vx (0) = 0 and Vx (1) ≥ 0 with Vx (1, ωi ) > 0 for at least one ωi ∈ Ω.
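Definition 6.6 can be checked mechanically for a given portfolio. A minimal sketch, assuming a small tolerance `tol` to absorb floating-point rounding; the two-state market is a hypothetical example in which the second security is mispriced.

```python
import numpy as np

def is_arbitrage(x, S0, S1, tol=1e-12):
    """Definition 6.6: V_x(0) = 0, V_x(1) >= 0, and V_x(1, omega_i) > 0 for some i."""
    x = np.asarray(x, dtype=float)
    V0 = np.asarray(S0, dtype=float) @ x
    V1 = np.asarray(S1, dtype=float) @ x
    return abs(V0) <= tol and bool(np.all(V1 >= -tol)) and bool(np.any(V1 > tol))

# Hypothetical two-state market: both securities cost 1, the second never pays less
S0 = [1.0, 1.0]
S1 = [[1.0, 1.2],
      [1.0, 1.0]]
print(is_arbitrage([-1.0, 1.0], S0, S1))   # short the first, buy the second
print(is_arbitrage([1.0, 0.0], S0, S1))    # costs 1 today, so not an arbitrage
```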
A fundamental assumption of mathematical finance is that arbitrage op-
portunities do not exist (this is known as the No Arbitrage Principle; see
[DMFM] and [BSM] for extensive discussions). The next result explains
how this principle relates to utility maximisation.
Theorem 6.7
If there is a solution to problem (6.3), then there is no arbitrage. Con-
versely, if U is continuous and there is no arbitrage, then problem (6.3)
has a solution.
Proof Suppose there is an x∗ ∈ Rn such that Vx∗ (1) ∈ FCS is a solution
of (6.3), meaning that
U(X) ≤ U(Vx∗ (1)), (6.4)
for any feasible consumption X. Suppose that there exists an arbitrage op-
portunity y. Take z = x∗ + y. Since Vy (0) = 0, and Vy (1, ωi ) ≥ 0 for any
ωi ∈ Ω,
Vz (0) = Vy (0) + Vx∗ (0) = Vx∗ (0) = V,
Vz (1, ωi ) = Vy (1, ωi ) + Vx∗ (1, ωi ) ≥ Vx∗ (1, ωi ) ≥ 0,
so z is feasible. We know that Vy (1, ωk ) > 0 for some ωk ∈ Ω, which
implies that
Vz (1, ωk ) > Vx∗ (1, ωk ).
Since $U$ is strictly increasing in each variable, this means that
$$U(V_z(1)) > U(V_{x^*}(1)),$$
which contradicts (6.4). We have thus proved that there is no arbitrage.
We now show that no arbitrage implies existence of a solution of (6.3).
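We shall use the fact that a continuous function on a closed bounded subset of $\mathbb{R}^N$ admits a maximum.
The set $FCS$, which is a subset of $\mathbb{R}^N$, is closed, since it is the intersection of the affine subspace $\{S(1)x : S(0)x = V\}$ with the closed set $\{X \in \mathbb{R}^N : X_i \ge 0 \text{ for all } i\}$. So, since $U$ is continuous, to obtain a maximum it is sufficient to show that $FCS$ is bounded. Suppose that, on the contrary, there is a sequence of portfolios $x_k$ with $V_{x_k}(1) \in FCS$ such that $\|V_{x_k}(1)\| \to \infty$ as $k \to \infty$. (Here $\|Z\| = \max_{i \le N}|z_i|$ for any $Z = (z_1, \ldots, z_N)$ in $\mathbb{R}^N$, and we use the analogous norm on $\mathbb{R}^n$.) Let
$$C = \max_{\substack{j = 1, \ldots, n\\ i = 1, \ldots, N}} |S_j(1, \omega_i)|.$$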
Observe that for any $y = (y_1, \ldots, y_n)$ and any $i \le N$,
$$|V_y(1, \omega_i)| = \Big|\sum_{j=1}^{n} y_j S_j(1, \omega_i)\Big| \le nC \max_{j = 1, \ldots, n} |y_j|.$$
This shows that we can only have $\|V_{x_k}(1)\| \to \infty$ when $\|x_k\| \to \infty$. The sequence $z_k = \frac{x_k}{\|x_k\|}$ is bounded, hence has a subsequence convergent to a limit $z$. We show that $z$ is an arbitrage opportunity, which provides the contradiction we seek. First,
$$V_{z_k}(0) = \sum_{j=1}^{n} (z_k)_j S_j(0) = \frac{1}{\|x_k\|}\sum_{j=1}^{n} (x_k)_j S_j(0) = \frac{V}{\|x_k\|} \to 0,$$
so $V_z(0) = 0$. Second, for any $\omega_i \in \Omega$
$$V_{z_k}(1, \omega_i) = \frac{1}{\|x_k\|}\sum_{j=1}^{n} (x_k)_j S_j(1, \omega_i) = \frac{1}{\|x_k\|} V_{x_k}(1, \omega_i) \ge 0,$$
by the definition of $FCS$, and this inequality is preserved in the limit, giving
$$V_z(1, \omega_i) \ge 0. \tag{6.5}$$
Since $S(1)$ is one-to-one, if we had $S(1)z = 0$, then $z$ would need to be equal to zero. This is not possible since $\|z\| = 1$, hence
$$V_z(1) = S(1)z \ne 0.$$
Combined with (6.5), this means that $V_z(1, \omega_i) > 0$ for some $\omega_i \in \Omega$, showing that $z$ is an arbitrage opportunity. □
We now turn to the question of the relation between the security prices
at time 0 and 1.
Definition 6.8
We say that $\pi = (\pi_1, \ldots, \pi_N)$ is a vector of state prices if $\pi_i > 0$ for $i = 1, \ldots, N$, and
$$S_j(0) = \sum_{i=1}^{N} \pi_i S_j(1, \omega_i). \tag{6.6}$$
Condition (6.6) can be written in matrix notation as
$$S(0) = \pi^T S(1). \tag{6.7}$$
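Checking whether a given vector is a vector of state prices amounts to testing positivity and the matrix identity (6.7). The sketch below uses the market of Example 6.15 (a risk-free asset with $R = 0$ and one risky asset worth 100 today and 110, 100 or 90 at time 1); the candidate vectors are hypothetical, and the tolerance is an assumption.

```python
import numpy as np

def is_state_price_vector(pi, S0, S1, tol=1e-10):
    """Definition 6.8: all pi_i > 0 and S(0) = pi^T S(1), i.e. (6.7)."""
    pi = np.asarray(pi, dtype=float)
    positive = bool(np.all(pi > 0))
    prices_match = np.allclose(pi @ np.asarray(S1, dtype=float), S0, atol=tol)
    return positive and bool(prices_match)

# Market of Example 6.15: columns are the risk-free asset and the risky asset
S0 = np.array([1.0, 100.0])
S1 = np.array([[1.0, 110.0],
               [1.0, 100.0],
               [1.0,  90.0]])

for pi in ([0.1, 0.8, 0.1], [0.25, 0.5, 0.25], [0.5, 0.0, 0.5], [0.2, 0.6, 0.3]):
    print(pi, is_state_price_vector(pi, S0, S1))
```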
We have the following relation linking the value of a strategy with state
prices.
Lemma 6.9
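For any $x \in \mathbb{R}^n$
$$V_x(0) = \sum_{i=1}^{N} \pi_i V_x(1, \omega_i).$$
Proof The claim follows from computing
$$\begin{aligned}
V_x(0) &= \sum_{j=1}^{n} x_j S_j(0)\\
&= \sum_{j=1}^{n} x_j \sum_{i=1}^{N} \pi_i S_j(1, \omega_i) \quad (\text{from } (6.6))\\
&= \sum_{i=1}^{N} \pi_i \sum_{j=1}^{n} x_j S_j(1, \omega_i)\\
&= \sum_{i=1}^{N} \pi_i V_x(1, \omega_i).
\end{aligned}$$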

Suppose that one of the securities, say the first, is risk-free with $S_1(1, \omega_i) = 1$ for all $i$. Then
$$S_1(0) = \sum_{i=1}^{N} \pi_i,$$
which is the price of a sure unit of currency (say euro) to be received at time 1, that is, it is the discount factor. We then have the relation with the risk-free return
$$\sum_{i=1}^{N} \pi_i = \frac{1}{1 + R}. \tag{6.8}$$
State prices are related to risk-neutral probabilities.
Definition 6.10
We say that a probability $Q$,
$$Q(\{\omega_i\}) = q_i \quad \text{for } i = 1, \ldots, N,$$
is a risk-neutral probability if for any $j \in \{1, \ldots, n\}$
$$S_j(0) = \frac{1}{1 + R}E_Q(S_j(1)) = \frac{1}{1 + R}\sum_{i=1}^{N} q_i S_j(1, \omega_i). \tag{6.9}$$
One of the fundamental results in mathematical finance, referred to in
the literature as the first fundamental theorem of asset pricing, states that
lack of arbitrage is equivalent to the existence of a risk-neutral probability.
(For details the reader is directed to [DMFM].) Comparing (6.6) with (6.9), we see that
$$\pi_i = \frac{q_i}{1 + R},$$
thus existence of state prices is equivalent to the No Arbitrage Principle.
However, the No Arbitrage Principle does not guarantee that a risk-neutral
probability is unique. For this we need the notion of completeness of the
market model.
Definition 6.11
A market model is complete if for any H : Ω → R, there exists an x ∈ Rn
such that
Vx (1) = H.
When the market model is arbitrage-free and complete, the risk-neutral
probability exists and is unique. This result is referred to as the second
fundamental theorem of asset pricing. Details and a proof can be found
in [DMFM]. Existence and uniqueness of the risk-neutral probability is
therefore equivalent to existence and uniqueness of state prices.
We now show how state prices are related to the optimal solution of the
utility maximisation problem.
Theorem 6.12
Assume that $X^*$ is a strictly positive solution (meaning that $X^*(\omega_i) > 0$ for all $\omega_i \in \Omega$) of the maximisation problem (6.3). Then there is a number $\lambda$ such that
$$\pi_i = \lambda\frac{\partial U}{\partial X_i}(X^*) \tag{6.10}$$
are state prices.
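Proof Let us consider two functions $f, g : \mathbb{R}^n \to \mathbb{R}$, defined by
$$f(x) = U(V_x(1)), \qquad g(x) = V_x(0) - V.$$
Since $X^*$ is strictly positive, the constraints $X_i \ge 0$ defining $FCS$ are not binding near the optimum, so the problem (6.3) is locally equivalent to solving
$$\max f(x) \quad \text{subject to: } g(x) = 0.$$
Let $x^*$ be the solution of the problem, implying that $X^* = V_{x^*}(1)$. By the method of Lagrange multipliers, there exists an $\alpha \in \mathbb{R}$ such that
$$\nabla f(x^*) - \alpha\nabla g(x^*) = 0. \tag{6.11}$$
The $j$-th coordinate of $\nabla g$ is equal to
$$\frac{\partial g}{\partial x_j} = \frac{\partial}{\partial x_j}\sum_{k=1}^{n} x_k S_k(0) = S_j(0).$$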

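Let $(V_x(1))_i$ denote the $i$-th coordinate of the $N$-dimensional vector $V_x(1)$. Using the chain rule we obtain
$$\begin{aligned}
\frac{\partial f}{\partial x_j}(x) &= \frac{\partial}{\partial x_j}U(V_x(1))\\
&= \sum_{i=1}^{N}\frac{\partial U}{\partial X_i}(V_x(1))\,\frac{\partial}{\partial x_j}(V_x(1))_i\\
&= \sum_{i=1}^{N}\frac{\partial U}{\partial X_i}(V_x(1))\,\frac{\partial}{\partial x_j}\left(\sum_{k=1}^{n} x_k S_k(1, \omega_i)\right)\\
&= \sum_{i=1}^{N}\frac{\partial U}{\partial X_i}(V_x(1))\,S_j(1, \omega_i),
\end{aligned}$$
hence, since $V_{x^*}(1) = X^*$,
$$\frac{\partial f}{\partial x_j}(x^*) = \sum_{i=1}^{N}\frac{\partial U}{\partial X_i}(X^*)\,S_j(1, \omega_i).$$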

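Looking at the $j$-th coordinate of (6.11) gives
$$\alpha S_j(0) = \sum_{i=1}^{N}\frac{\partial U}{\partial X_i}(X^*)\,S_j(1, \omega_i).$$
Multiplying by $x_j^*$ and summing over $j$ yields $\alpha V = \sum_{i=1}^{N}\frac{\partial U}{\partial X_i}(X^*)X_i^* > 0$, so, as the initial wealth $V$ is positive, $\alpha > 0$. Taking $\lambda = \frac{1}{\alpha} > 0$ we get
$$S_j(0) = \lambda\sum_{i=1}^{N}\frac{\partial U}{\partial X_i}(X^*)\,S_j(1, \omega_i).$$
Comparing with (6.6) we see that for each $i \le N$,
$$\pi_i = \lambda\frac{\partial U}{\partial X_i}(X^*) > 0$$
satisfies the condition required to be a state price. □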
Corollary 6.13
For the particular case of expected utility, where $U(X) = E(u(X))$, the state prices take the form
$$\pi_i = \lambda u'(X^*(\omega_i))p_i.$$
Proof Since we are dealing with expected utility,
$$U(X_1, \ldots, X_N) = \sum_{k=1}^{N} p_k u(X_k),$$
so
$$\frac{\partial U}{\partial X_i}(X_1, \ldots, X_N) = u'(X_i)p_i,$$
hence
$$\frac{\partial U}{\partial X_i}(X^*) = u'(X^*(\omega_i))p_i,$$
and combined with (6.10) this implies the claim. □
Theorem 6.12 can be used to find the solution of the optimisation prob-
lem. We focus on the particular case of expected utility U(X) = E(u(X)).
Theorem 6.14
Assume that $U(X) = E(u(X))$. If $X^* = (X_1^*, \ldots, X_N^*)$ is a solution of the problem (6.3), then, with $(u')^{-1}$ denoting the inverse function of $u'$, we obtain
$$X_i^* = (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right), \tag{6.12}$$
where $\lambda$ is determined by the condition
$$V = \sum_{i=1}^{N}\pi_i (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right). \tag{6.13}$$
Proof The assertion (6.12) follows directly from Corollary 6.13.
Since $X^* = V_{x^*}(1)$, by Lemma 6.9
$$V = V_{x^*}(0) = \sum_{i=1}^{N}\pi_i V_{x^*}(1, \omega_i) = \sum_{i=1}^{N}\pi_i X^*(\omega_i).$$
Substituting (6.12) into the above equation gives (6.13). □


We observe that, for given state prices, (6.12) and (6.13) combined constitute $N + 1$ equations with $N + 1$ unknowns, namely $X_1^*, \ldots, X_N^*$ and $\lambda$. Thus Theorem 6.14 provides a tool for finding candidates for the solution of the optimisation problem, by way of solving a system of equations. The system of equations provides a necessary condition for the solution of (6.3). Each solution depends on the choice of the state prices. In cases where the state prices are not uniquely determined, we can have solutions of (6.12)–(6.13) that are not solutions of the optimisation problem.

Example 6.15
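In this example we consider the case of a logarithmic utility function $u(x) = \ln(x)$. Then $u'(x) = \frac{1}{x}$ and $(u')^{-1}(y) = \frac{1}{y}$. By (6.12) this gives
$$X^*(\omega_i) = (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right) = \frac{\lambda p_i}{\pi_i}, \tag{6.14}$$
and this $\lambda$ is determined by (6.13) so that
$$V = \sum_{i=1}^{N}\pi_i (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right) = \sum_{i=1}^{N}\pi_i\frac{\lambda p_i}{\pi_i} = \lambda. \tag{6.15}$$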
We consider a trinomial model with a single risky security with today’s price $S(0) = 100$ and future prices
$$S(1) = \begin{cases} S^u = S(0)(1 + u) & \text{with probability } \frac{1}{4},\\ S^m = S(0)(1 + m) & \text{with probability } \frac{1}{2},\\ S^d = S(0)(1 + d) & \text{with probability } \frac{1}{4}, \end{cases}$$
with $u = 0.1$, $m = 0$ and $d = -0.1$. We consider $V = 100$ and for simplicity assume that we can invest risk-free at $R = 0$.
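From (6.6) and (6.8), state prices satisfy
$$\begin{aligned}
S(0) &= \pi_1 S(0)(1 + u) + \pi_2 S(0)(1 + m) + \pi_3 S(0)(1 + d),\\
1 &= \pi_1 + \pi_2 + \pi_3.
\end{aligned}$$
This system of equations admits infinitely many solutions:
$$\pi_1(x) = x, \qquad \pi_2(x) = \frac{x(d - u) - d}{m - d}, \qquad \pi_3(x) = \frac{x(u - m) + m}{m - d}.$$
With the numbers above this reads $\pi(x) = (x, 1 - 2x, x)$, and all three state prices are positive precisely when $0 < x < \frac{1}{2}$.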
Figure 6.1 Optimal expected utility from Example 6.15.

For each solution we can use (6.14) to compute $X^*$. Below we see results for a selection of choices of $x$:

x       X*(ω1)   X*(ω2)   X*(ω3)   E(u(X*))
0.1     250      250/4    250      4.83
0.2     125      250/3    125      4.63
0.25    100      100      100      4.6
0.3     250/3    125      250/3    4.63
0.4     250/4    250      250/4    4.83

It appears that, out of the above, the $X^*$ for $x = 0.1$ and $x = 0.4$ have the highest expected utility. However, neither of these $X^*$ is attainable by means of a portfolio: a portfolio holding $x_1$ in the risk-free asset and $x_2$ units of the risky asset yields $x_1 + x_2 S(1)$, so its three values satisfy $X(\omega_1) - X(\omega_2) = X(\omega_2) - X(\omega_3)$, which fails for both. Only $X^*$ for $x = 0.25$ is attainable, by investing $V$ risk-free.
We see therefore that not all solutions of (6.12)–(6.13) need to be solutions of the optimisation problem. In fact, only the solution with the smallest expected utility turns out to be feasible (see Figure 6.1).
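The table and the attainability claim of Example 6.15 can be reproduced in a few lines of Python. The sketch writes the state prices as $(x, 1 - 2x, x)$, which is what the general formulas give for $u = 0.1$, $m = 0$, $d = -0.1$, and tests attainability by asking whether $X^*$ lies in the column space of $S(1)$ via a least-squares fit.

```python
import numpy as np

p = np.array([0.25, 0.5, 0.25])
V = 100.0
S1 = np.array([[1.0, 110.0],     # columns: risk-free asset (R = 0), risky asset
               [1.0, 100.0],
               [1.0,  90.0]])

def state_prices(x):
    """State price family of Example 6.15 with u = 0.1, m = 0, d = -0.1."""
    return np.array([x, 1.0 - 2.0 * x, x])

def candidate(x):
    """X* from (6.14)-(6.15) for logarithmic utility: X*(omega_i) = V p_i / pi_i."""
    return V * p / state_prices(x)

def attainable(X):
    """True if X = S(1) y for some portfolio y (zero least-squares residual)."""
    y, *_ = np.linalg.lstsq(S1, X, rcond=None)
    return np.allclose(S1 @ y, X)

for x in (0.1, 0.2, 0.25, 0.3, 0.4):
    X = candidate(x)
    print(x, X.round(2), round(float(p @ np.log(X)), 2), attainable(X))
```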

Exercise 6.4 Prove that $X^* = 100$ is the solution to the problem posed in Example 6.15.

In Example 6.15 the solution of the optimisation problem turned out to have the smallest utility amongst the solutions of (6.12)–(6.13). We now show that this should not be a surprise. First we introduce some notation and an auxiliary lemma.
For a fixed state price vector $\pi = (\pi_1, \ldots, \pi_N)$ we use the following notation:
$$X(\pi) = \{X \in \mathbb{R}^N \mid X > 0,\ \pi^T X = V\}.$$
Lemma 6.16
If $X_\pi^* = (X_{\pi,1}^*, \ldots, X_{\pi,N}^*) \in X(\pi)$ is a solution of
$$\max\{E(u(X)) : X \in X(\pi)\}$$
then there exists a $\lambda$ such that
$$X_{\pi,i}^* = (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right), \qquad V = \sum_{i=1}^{N}\pi_i (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right).$$

Proof The claim follows from the method of Lagrange multipliers (Theorem 3.3), taking
$$f(X_1, \ldots, X_N) = \sum_{i=1}^{N} p_i u(X_i)$$
and
$$g(X_1, \ldots, X_N) = \sum_{i=1}^{N}\pi_i X_i - V,$$
and is left as an exercise. □

Exercise 6.5 Prove Lemma 6.16.

Theorem 6.17
Assume that $U(X) = E(u(X))$. Let $\Pi$ denote the set of all state price vectors. If the model admits a strictly positive solution $X^*$ of the optimisation problem (6.3), then
$$E(u(X^*)) = \min_{\pi \in \Pi} E(u(X_\pi^*)).$$

Proof By Lemma 6.9, for any $\pi \in \Pi$,
$$\pi^T X^* = \sum_{i=1}^{N}\pi_i X_i^* = V,$$
which, since $X^*$ is strictly positive, shows that $X^* \in X(\pi)$, hence
$$E(u(X^*)) \le \max_{X \in X(\pi)} E(u(X)) = E(u(X_\pi^*)),$$
and therefore
$$E(u(X^*)) \le \min_{\pi \in \Pi} E(u(X_\pi^*)).$$

To obtain the inequality in the opposite direction, let $v = (v_1, \ldots, v_N)$ be the state price vector from Theorem 6.12, i.e.
$$v_i = \lambda\frac{\partial U}{\partial X_i}(X^*).$$
By Corollary 6.13 and Theorem 6.14
$$v_i = \lambda u'(X_i^*)p_i, \tag{6.16}$$
where $\lambda$ is chosen to satisfy
$$V = \sum_{i=1}^{N} v_i (u')^{-1}\left(\frac{v_i}{\lambda p_i}\right).$$
By Lemma 6.16 we know that
$$X_{v,i}^* = (u')^{-1}\left(\frac{v_i}{\lambda p_i}\right).$$
Substituting (6.16) into the above we see that $X_{v,i}^* = X_i^*$, hence
$$E(u(X^*)) = E(u(X_v^*)) \ge \min_{\pi \in \Pi} E(u(X_\pi^*)),$$
which concludes our proof. □

Theorem 6.17 gives the following recipe for finding the optimal solution:
• find the family of state price vectors Π;
• using (6.12)–(6.13) for each π ∈ Π compute Xπ∗ ;
• the Xπ∗ with the smallest expected utility is the candidate for the solution.
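For Example 6.15 the recipe can be carried out by brute force. In this sketch the family $\Pi$ is discretised on a grid of $x$ values in $(0, \frac{1}{2})$ (the grid is an arbitrary choice), $X_\pi^*$ comes from (6.14)–(6.15), and the vector with the smallest expected logarithmic utility is selected. It is the choice $x = 0.25$, giving $X^* = 100$ in every state, in line with Exercise 6.4.

```python
import numpy as np

p = np.array([0.25, 0.5, 0.25])
V = 100.0

def X_pi(pi):
    """Candidate from (6.14)-(6.15) for u = ln: X_i = V p_i / pi_i."""
    return V * p / pi

def expected_log_utility(X):
    return float(p @ np.log(X))

# Step 1: discretised family of state price vectors pi(x) = (x, 1 - 2x, x), 0 < x < 1/2
grid = np.linspace(0.001, 0.499, 499)
family = [np.array([x, 1.0 - 2.0 * x, x]) for x in grid]
# Steps 2-3: compute X_pi for every pi and keep the one with the smallest expected utility
values = [expected_log_utility(X_pi(pi)) for pi in family]
k = int(np.argmin(values))
print(grid[k], X_pi(family[k]))
```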
In an arbitrage-free and complete model, state prices are unique, in which case finding the optimal solution turns out to be straightforward. In our setting the model is complete if the matrix $S(1)$ defined in (6.1) is square (i.e. $n = N$) and invertible. Then, from (6.7), we obtain the formula for the state price vector
$$\pi^T = S(0)(S(1))^{-1}. \tag{6.17}$$
By Theorem 6.7 we know that the solution to the optimisation problem ex-
ists. The state price vector π is uniquely determined, meaning that (6.12)–
(6.13) admits a unique solution X ∗ , which is the solution of the optimisation
problem. Let us denote by x∗ the strategy which gives the optimal utility,
X ∗ = Vx∗ (1).
Using (6.2) we can compute
$$x^* = (S(1))^{-1}X^*. \tag{6.18}$$
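In a complete model the whole procedure fits in a short function. The sketch below is one possible implementation, not the book's. It computes $\pi$ from (6.17), solves (6.13) for $\lambda$ by bisection on $\ln\lambda$ (the left-hand side increases with $\lambda$ because $(u')^{-1}$ is decreasing), reads off $X^*$ from (6.12) and the strategy from (6.18). The caller supplies $(u')^{-1}$ as `inv_marginal`; the bracket for $\ln\lambda$ is an assumption that may need widening for other data. Applied to the data of Example 6.18 with $u = \ln$, it reproduces $X^* = (75, 150, 75)$ and $x^* = (-150, -2.5, 2.5)$.

```python
import numpy as np

def optimal_complete_market(S0, S1, p, V, inv_marginal, lo=-50.0, hi=50.0, tol=1e-12):
    """Optimal consumption X* and strategy x* in a complete model (S1 square, invertible).

    inv_marginal is the inverse (u')^{-1} of the marginal utility.
    """
    S0 = np.asarray(S0, dtype=float)
    S1 = np.asarray(S1, dtype=float)
    p = np.asarray(p, dtype=float)
    pi = S0 @ np.linalg.inv(S1)                      # (6.17): pi^T = S(0) S(1)^{-1}
    if np.any(pi <= 0):
        raise ValueError("state prices are not strictly positive: arbitrage")

    def consumption(log_lam):                        # (6.12)
        return inv_marginal(pi / (np.exp(log_lam) * p))

    def budget_gap(log_lam):                         # (6.13): zero at the right lambda
        return pi @ consumption(log_lam) - V

    if budget_gap(lo) > 0 or budget_gap(hi) < 0:
        raise ValueError("bracket [lo, hi] for log(lambda) does not contain the root")
    while hi - lo > tol:                             # bisection; budget_gap is increasing
        mid = 0.5 * (lo + hi)
        if budget_gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    X_star = consumption(0.5 * (lo + hi))
    x_star = np.linalg.solve(S1, X_star)             # (6.18): x* = S(1)^{-1} X*
    return pi, X_star, x_star

# Data of Example 6.18 with the probabilities of Example 6.15 and u = ln, so (u')^{-1}(y) = 1/y
pi, X_star, x_star = optimal_complete_market(
    S0=[1, 100, 200],
    S1=[[1, 110, 200], [1, 100, 220], [1, 90, 180]],
    p=[0.25, 0.5, 0.25],
    V=100.0,
    inv_marginal=lambda y: 1.0 / y,
)
print(pi.round(6), X_star.round(6), x_star.round(6))
```

Other utilities only require a different `inv_marginal`; for instance, $u(x) = -e^{-ax}$ has $(u')^{-1}(y) = -\ln(y/a)/a$.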

Example 6.18
As in Example 6.15, let us consider the problem of maximising the expected logarithmic utility. In addition to the risk-free investment and the risky asset from Example 6.15, let us also consider a second risky asset. We assume that
$$S(0) = \begin{bmatrix} 1 & 100 & 200 \end{bmatrix}, \qquad S(1) = \begin{bmatrix} 1 & 110 & 200 \\ 1 & 100 & 220 \\ 1 & 90 & 180 \end{bmatrix}.$$
The state prices can be computed as
$$\pi^T = S(0)\,(S(1))^{-1} = \begin{bmatrix} \tfrac13 & \tfrac13 & \tfrac13 \end{bmatrix}.$$
Let us assume that we invest V = 100. From the state prices we can compute the optimal consumption $X^*$ using (6.14)–(6.15), and then the optimal strategy $x^*$ using (6.18). We obtain
$$X^* = \begin{bmatrix} 75 \\ 150 \\ 75 \end{bmatrix}, \qquad x^* = \begin{bmatrix} -150 \\ -2.5 \\ 2.5 \end{bmatrix}.$$
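The computation in Example 6.18 can be checked with exact rational arithmetic, as in the minimal Python sketch below. It solves $S(1)^T \pi = S(0)^T$ for the state prices (6.17) and then $S(1)\,x^* = X^*$ for the strategy (6.18). Two assumptions are made here: the probabilities $p = (1/4, 1/2, 1/4)$ are taken to be those of Example 6.15 (they are not restated above, but they reproduce $X^*$), and $X_i^* = V p_i / \pi_i$ is used as our stand-in for (6.14)–(6.15), following from $(u')^{-1}(y) = 1/y$ and the budget constraint. The helper `solve` is a plain Gauss–Jordan elimination written for this example.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]


S0 = [1, 100, 200]
S1 = [[1, 110, 200],
      [1, 100, 220],
      [1, 90, 180]]
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # assumed (Example 6.15)
V = 100

# (6.17): pi^T S(1) = S(0), i.e. S(1)^T pi = S(0)^T
S1_T = [list(col) for col in zip(*S1)]
pi = solve(S1_T, S0)

# Log utility: (u')^{-1}(y) = 1/y plus the budget constraint give X_i = V p_i / pi_i
X_star = [V * pk / qk for pk, qk in zip(p, pi)]

# (6.18): x* = S(1)^{-1} X*
x_star = solve(S1, X_star)
print(pi, X_star, x_star)
```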

Exercise 6.6 Consider a trinomial model Ω = {ω1, ω2, ω3}, where
$$P(\{\omega_1\}) = \tfrac14, \qquad P(\{\omega_2\}) = \tfrac12, \qquad P(\{\omega_3\}) = \tfrac14,$$
92 Utility functions
with a risk-free security and a single risky asset:
$$S(0) = \begin{bmatrix} 1 & 100 \end{bmatrix}, \qquad S(1) = \begin{bmatrix} 1.02 & 120 \\ 1.02 & 110 \\ 1.02 & 90 \end{bmatrix}.$$
Find the optimal strategy, assuming that the aim of the investor is to maximise the expected utility, for the utility function $u(x) = -e^{-ax}$ with a = 0.01.

Exercise 6.7 Consider the trinomial model Ω = {ω1, ω2, ω3}, with the same probabilities as in Exercise 6.6. Consider a risk-free security and two risky assets:
$$S(0) = \begin{bmatrix} 1 & 100 & 200 \end{bmatrix}, \qquad S(1) = \begin{bmatrix} 1.02 & 120 & 180 \\ 1.02 & 110 & 220 \\ 1.02 & 90 & 200 \end{bmatrix}.$$
Find the optimal strategy, assuming that the investor uses the same utility as in Exercise 6.6.

6.3 Utilities and CAPM


Our next step is to explore the relationship between utility maximisation
and the Capital Asset Pricing Model.
Suppose we have L investors, each aiming to maximise their own expected utility, with utility functions of the form
$$u_l(x) = a_l x - \tfrac12 b_l x^2,$$
where $a_l > 0$, $b_l > 0$ for l = 1, …, L. Since the parameters may differ between investors, the utility function does not have to be the same for all of them, which reflects their different investment preferences.
We denote by $x_l^*$ the optimal portfolio that will be chosen by investor l.
The present and future total values of the market are
$$M(0) = \sum_{l=1}^{L} V_{x_l^*}(0), \qquad M(1) = \sum_{l=1}^{L} V_{x_l^*}(1). \qquad (6.19)$$
This is the total wealth of the investors in the market at times 0 and 1. We denote the market return by
$$K_m = \frac{M(1) - M(0)}{M(0)}, \qquad (6.20)$$
and the risk-free return by R.
Theorem 6.19
Assume that M(0) ≠ 0 and $\operatorname{Var}(K_m) \neq 0$. Then the expected return on each asset satisfies
$$E(K_j) = R + \beta_j\,(E(K_m) - R),$$
for j = 1, …, n, where
$$\beta_j = \frac{\operatorname{Cov}(K_j, K_m)}{\operatorname{Var}(K_m)}.$$
Proof Let the risk-free asset be designated by index j = 1, so that $K_1 = R$. For an investor with initial wealth V and portfolio x we have
$$\begin{aligned} V_x(1) &= V(1 + K_w) \\ &= V\Big(1 + \sum_{j=1}^{n} w_j K_j\Big) \\ &= V\Big(1 + \Big(1 - \sum_{j=2}^{n} w_j\Big) R + \sum_{j=2}^{n} w_j K_j\Big). \end{aligned} \qquad (6.21)$$
If $x_l^*$ is the optimal portfolio for investor l, and the initial wealth of this investor is $V_l = V_{x_l^*}(0)$, then by (6.21), for j = 2, …, n, the first-order conditions for a maximum give
$$0 = \frac{\partial}{\partial w_j} E\big[u_l\big(V_{x_l^*}(1)\big)\big] = V_l\, E\big[u_l'\big(V_{x_l^*}(1)\big)\,(K_j - R)\big]. \qquad (6.22)$$
We use the relation Cov(X, Y) = E[XY] − E[X]E[Y], which holds for any random variables X, Y, as Ω is finite:
$$\operatorname{Cov}\big(u_l'(V_{x_l^*}(1)),\, K_j - R\big) = E\big[u_l'(V_{x_l^*}(1))\,(K_j - R)\big] - E\big[u_l'(V_{x_l^*}(1))\big]\, E\big[K_j - R\big].$$
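Comparing with (6.22), it follows that
$$E\big[u_l'(V_{x_l^*}(1))\big]\, E\big[K_j - R\big] = -\operatorname{Cov}\big(u_l'(V_{x_l^*}(1)),\, K_j - R\big).$$
Since $u_l'(x) = a_l - b_l x$, the above can be written as
$$\begin{aligned} \big(a_l - b_l\, E[V_{x_l^*}(1)]\big)\, E\big[K_j - R\big] &= -\operatorname{Cov}\big(a_l - b_l V_{x_l^*}(1),\, K_j - R\big) \\ &= b_l \operatorname{Cov}\big(V_{x_l^*}(1),\, K_j\big), \end{aligned}$$
hence
$$\Big(\frac{a_l}{b_l} - E[V_{x_l^*}(1)]\Big)\, E\big[K_j - R\big] = \operatorname{Cov}\big(V_{x_l^*}(1),\, K_j\big).$$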
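Taking
$$c = \sum_{l=1}^{L} \Big(\frac{a_l}{b_l} - E[V_{x_l^*}(1)]\Big),$$
summation over l gives
$$\begin{aligned} c\, E\big[K_j - R\big] &= \sum_{l=1}^{L} \operatorname{Cov}\big(V_{x_l^*}(1),\, K_j\big) \\ &= \operatorname{Cov}\big(M(1),\, K_j\big) && \text{(by (6.19))} \\ &= M(0) \operatorname{Cov}\big(K_m,\, K_j\big). && \text{(by (6.20))} \end{aligned} \qquad (6.23)$$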

Let $m = (m_1, \ldots, m_n)$ denote the weights of the market portfolio; then
$$\begin{aligned} c\,(E[K_m] - R) &= c\,(E[m_1 K_1 + \cdots + m_n K_n] - R) \\ &= \sum_{j=1}^{n} c\, m_j\, E\big[K_j - R\big] \\ &= \sum_{j=1}^{n} m_j\, M(0) \operatorname{Cov}\big(K_m, K_j\big) && \text{(by (6.23))} \\ &= M(0) \operatorname{Cov}(K_m, K_m) \\ &= M(0) \operatorname{Var}(K_m). \end{aligned}$$
Since M(0) ≠ 0 and $\operatorname{Var}(K_m) \neq 0$, the above equality implies that c ≠ 0 and $E[K_m] \neq R$. As a result, dividing (6.23) by this equality,
$$\frac{E[K_j] - R}{E[K_m] - R} = \frac{\operatorname{Cov}(K_m, K_j)}{\operatorname{Var}(K_m)} = \beta_j,$$
which completes the proof. □
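On a finite Ω the quantities in Theorem 6.19 are plain probability-weighted sums. The minimal Python sketch below computes $\beta_j$ and the right-hand side $R + \beta_j(E(K_m) - R)$. The scenario returns are hypothetical numbers chosen so that $K_j = 2K_m - 0.02$, which makes $\beta_j = 2$ and, with R = 0.02, satisfies the CAPM relation exactly; they are not produced by an equilibrium computation.

```python
def expectation(p, k):
    """E(K) on a finite Omega with state probabilities p."""
    return sum(pi * ki for pi, ki in zip(p, k))


def covariance(p, a, b):
    """Cov(A, B) = E[(A - EA)(B - EB)] on a finite Omega."""
    ea, eb = expectation(p, a), expectation(p, b)
    return sum(pi * (ai - ea) * (bi - eb) for pi, ai, bi in zip(p, a, b))


def beta(p, k_j, k_m):
    """beta_j = Cov(K_j, K_m) / Var(K_m)."""
    return covariance(p, k_j, k_m) / covariance(p, k_m, k_m)


def capm_expected_return(R, beta_j, e_km):
    """Right-hand side of Theorem 6.19: R + beta_j (E(K_m) - R)."""
    return R + beta_j * (e_km - R)


# Hypothetical three-state data with K_j = 2 K_m - 0.02
p = [0.25, 0.5, 0.25]
k_m = [0.20, 0.05, -0.10]
k_j = [0.38, 0.08, -0.22]
R = 0.02

b = beta(p, k_j, k_m)
print(b, expectation(p, k_j), capm_expected_return(R, b, expectation(p, k_m)))
```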
Above we have shown that we can connect the mean-variance criterion for optimality of portfolios with the optimal expected utility if we assume that investors use quadratic utility functions. Moreover, an arbitrary (sufficiently smooth) utility function can be approximated locally by a quadratic one, by keeping the first three terms of its Taylor expansion. Thus the CAPM theorem can be regarded as an approximate description of optimal portfolio choice for arbitrary utility functions.

6.4 Risk aversion


An investor is said to be risk averse if

u(E(X)) ≥ E(u(X)) for all X ∈ FCS.

An intuitive interpretation of this inequality is that both sides represent an expected utility. On the left we have sure consumption available at the level E(X); on the right we are faced with an uncertain wealth X. The inequality says that the risk-averse investor will always choose the 'sure thing'. We say similarly that the investor is risk neutral if

u(E(X)) = E(u(X)) for all X ∈ FCS.

Exercise 6.8 Show that risk aversion is equivalent to u being concave and illustrate the condition graphically.

If the investor is risk averse, we define the risk premium as a function γ : FCS → ℝ such that

u(E(X) − γ(X)) = E(u(X)).

The number E(X) − γ(X) is called the certainty equivalent of X. We see that an investor is indifferent between two investments X, Y that have the same certainty equivalent:

E(u(X)) = u(E(X) − γ(X)) = u(E(Y) − γ(Y)) = E(u(Y)).

We shall now find an approximate formula for γ. Assume that X takes values $X_1, \ldots, X_n$ (note that n ≤ N) and that
$$P(X = X_i) = p_i.$$
Taking the second-order Taylor expansion of u around m = E(X), evaluated at $X_i$, we obtain
$$u(X_i) \approx u(m) + u'(m)(X_i - m) + \tfrac12 u''(m)(X_i - m)^2.$$
Multiplying by $p_i$, summing, and using E(X − m) = 0, we get
$$E(u(X)) \approx u(m) + u'(m)E(X - m) + \tfrac12 u''(m)E(X - m)^2 = u(m) + \tfrac12 u''(m)\operatorname{Var}(X). \qquad (6.24)$$
Taking the first-order Taylor expansion of u around m, evaluated at m − γ(X), gives
$$u(m - \gamma(X)) \approx u(m) - u'(m)\gamma(X),$$
so (by the definition of the risk premium)
$$E(u(X)) = u(m - \gamma(X)) \approx u(m) - u'(m)\gamma(X). \qquad (6.25)$$
Comparing the right-hand sides of (6.24) and (6.25) we get
$$u(m) + \tfrac12 u''(m)\operatorname{Var}(X) \approx u(m) - u'(m)\gamma(X),$$
which yields
$$\gamma(X) \approx -\frac12\, \frac{u''(E(X))}{u'(E(X))}\, \operatorname{Var}(X).$$
The number
$$\mathrm{ARA} = -\frac{u''(E(X))}{u'(E(X))}$$
is called the absolute risk aversion coefficient.
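To see how good the approximation $\gamma(X) \approx \tfrac12\,\mathrm{ARA}\,\operatorname{Var}(X)$ can be, the minimal Python sketch below compares it with the exact risk premium for the exponential utility $u(x) = -e^{-ax}$, for which ARA = a (see Example 6.20) and the certainty equivalent can be inverted in closed form. The three-point wealth distribution and the value a = 0.01 are hypothetical choices for illustration.

```python
import math


def certainty_equivalent_exp(values, probs, a):
    """Exact certainty equivalent u^{-1}(E u(X)) for u(x) = -exp(-a x)."""
    eu = sum(p * -math.exp(-a * x) for x, p in zip(values, probs))
    return -math.log(-eu) / a


def risk_premium_exact(values, probs, a):
    """gamma(X) = E(X) - certainty equivalent."""
    mean = sum(p * x for x, p in zip(values, probs))
    return mean - certainty_equivalent_exp(values, probs, a)


def risk_premium_approx(values, probs, a):
    """Second-order approximation gamma(X) ~ (1/2) ARA Var(X), with ARA = a."""
    mean = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    return 0.5 * a * var


# Hypothetical wealth X: 90, 100, 110 with probabilities 1/4, 1/2, 1/4
values, probs, a = [90.0, 100.0, 110.0], [0.25, 0.5, 0.25], 0.01
print(risk_premium_exact(values, probs, a), risk_premium_approx(values, probs, a))
```

Here Var(X) = 50, so the approximation gives 0.25, while the exact premium differs only in the fourth decimal place.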
The above discussion was formulated in terms of wealth. We can reformulate the result in terms of returns. Let X = V(1 + K), where V is the initial investment and K is the return with expectation µ and variance σ². Using the fact that
$$E(X) = E(V(1 + K)) = V(1 + \mu), \qquad \operatorname{Var}(X) = \operatorname{Var}(V(1 + K)) = V^2\sigma^2, \qquad (6.26)$$
the risk premium is approximated using
$$\gamma(X) \approx -\frac{V^2}{2}\, \frac{u''(V(1 + \mu))}{u'(V(1 + \mu))}\, \sigma^2. \qquad (6.27)$$
An investor is indifferent to the choice between securities with the same certainty equivalent. Looking at the (σ, µ)-plane, by (6.26)–(6.27), the certainty equivalent can be approximated as
$$E(X) - \gamma(X) \approx V(1 + \mu) + \frac{V^2}{2}\, \frac{u''(V(1 + \mu))}{u'(V(1 + \mu))}\, \sigma^2,$$
and the curves along which this expression is constant are the (approximate) indifference curves.

Example 6.20
Assume that an investor has an exponential utility $u(x) = -e^{-ax}$. Then
$$u'(x) = a e^{-ax}, \qquad u''(x) = -a^2 e^{-ax},$$
which means that the absolute risk aversion coefficient is constant:
$$\mathrm{ARA} = -\frac{u''(E(X))}{u'(E(X))} = a.$$
The certainty equivalent of X is then
$$E(X) - \gamma(X) \approx V\Big(\mu - \frac{aV}{2}\sigma^2\Big) + V. \qquad (6.28)$$
This yields the same type of indifference curve as considered in Example 2.13.

Exercise 6.9 Based on the data from Exercise 6.7 compute µ1, µ2, σ1, σ2 and ρ12. Find the expected return and standard deviation of the market portfolio. Consider indifference curves given by (6.28). Following the method from Example 2.13, find the point on the (σ, µ)-plane which has the highest certainty equivalent.

Exercise 6.10 Find the weights of the portfolio computed in Exercise 6.9. Based on these compute the strategy which has the highest certainty equivalent. Compare the result with the solution of Exercise 6.7, where we have found the optimal strategy which maximises the expected utility. Explain why the two are not the same.
7
Value at Risk

7.1 Quantiles
7.2 Measuring downside risk
7.3 Computing VaR: examples
7.4 VaR in the Black–Scholes model
7.5 Proofs

Until now we have focused our attention on variance, or equivalently, standard deviation of the return, as a tool for measuring risk. The standard deviation measures the spread of the random future return from its mean. In portfolio selection we seek to minimise the variance while maximising the return. However, an investor, seeking to measure the risk inherent in an asset he holds, is naturally more concerned to place a bound on his potential losses, while remaining relaxed about possible high levels of profit. Thus one looks for risk measures which focus on the downside risk, that is, measures concerned with the lower tail of the distribution of the return. Variance and standard deviation are symmetric, so they are not good candidates in this search.
In looking for quantitative measures of the overall risk in a portfolio, we
seek a statistic which can be applied universally, enabling us to compare
the risks of different types of risky portfolio. Ideally, we look for a number
(or set of numbers) that expresses the potential loss with a given level of
confidence, enabling the risk manager to adjudge the risk as acceptable or
not.
In the wake of spectacular financial collapses in the mid-1990s at Barings Bank and Orange County, Value at Risk (henceforth abbreviated as VaR) became a standard benchmark for measuring financial risk. It has the advantage of relative simplicity and ease of use when sufficient data are available. Its principal drawback is that it does not provide information about the potential impact of extreme (i.e. highly unlikely) events. In this chapter we explore this popular risk measure. Our focus is on its computation, for discrete, continuous and mixed distributions, and this will highlight a further defect, showing that VaR for a diversified position can be higher than for investment in a single asset.
In the final section we give a detailed analysis, in a Black–Scholes con-
text, of hedging to minimise VaR with the judicious use of European put
options.

7.1 Quantiles
An investor holding an asset whose future value is uncertain may wish to determine whether his discounted gain X on an investment has at least 95% probability of remaining above a certain (usually negative) level. Value at Risk at 5% answers this question by specifying the minimum loss incurred in the worst 5% of possible outcomes. Its calculation is therefore closely tied to the values of the distribution function $F_X$ of X. This leads us to examine the so-called quantiles of $F_X$ more closely.
We begin with a simple example.

Example 7.1
Consider a two-step binomial model with stock prices

S(0) = 100, S(1) ∈ {110, 90}, S(2) ∈ {121, 99, 81},

where the price moves from 110 to 121 or 99, and from 90 to 99 or 81. Assume that the probability p of the price going up in a single step is p = 0.8. In this example we neglect the time value of money and compute the gain after the second step of buying a single share of stock as
$$X = S(2) - S(0),$$
which gives
$$X = \begin{cases} 21 & \text{with probability } p^2 = 0.64, \\ -1 & \text{with probability } 2p(1-p) = 0.32, \\ -19 & \text{with probability } (1-p)^2 = 0.04. \end{cases}$$
We can see that the probability that our investment will lead to a loss L = −X < 19 is
$$P(L < 19) = P(X > -19) = 0.96.$$
This means that with probability 96% we will lose no more than 1. If we agree, for instance, to ignore the worst 5% of potential outcomes, our 'worst-case scenario' would be a loss of 1. However, if we are only willing to exclude the worst 2.5%, for example, the loss of 19 should be taken into account.

Figure 7.1 The upper and lower quantiles for various distribution functions.

Such worst-case outcomes at a given probability level can be expressed using quantiles.
Let (Ω, F, P) be a probability space and let X : Ω → ℝ be a random variable. The cumulative distribution function $F_X : \mathbb{R} \to [0, 1]$, defined by $F_X(x) = P(X \le x)$, is right-continuous and non-decreasing (see [PF] for details).
Figure 7.2 The plot of the distribution function from Example 7.1 (levels α = 0.025, 0.04 and 0.1 marked; jumps at −19, −1 and 21).

Definition 7.2
For α ∈ (0, 1) the number
$$q^\alpha(X) = \inf\{x : \alpha < F_X(x)\} \qquad (7.1)$$
is called the upper α-quantile of X. The number
$$q_\alpha(X) = \inf\{x : \alpha \le F_X(x)\} \qquad (7.2)$$
is called the lower α-quantile of X. Any
$$q \in [q_\alpha(X),\, q^\alpha(X)]$$
is called an α-quantile of X.

The definition is best understood when looking at the graph of the cumulative distribution function. In Figure 7.1 we can see that the upper and the lower quantiles differ exactly when the graph of $F_X$ is flat at the level α, i.e. when $F_X(x) = \alpha$ on an interval of positive length; otherwise they are equal.

Example 7.3
For X from Example 7.1 we can compute the upper and the lower α-quantiles, for α ∈ {0.025, 0.04, 0.1}, as (see Figure 7.2)
$$\begin{aligned} q^{0.025}(X) &= -19, & q_{0.025}(X) &= -19, \\ q^{0.04}(X) &= -1, & q_{0.04}(X) &= -19, \\ q^{0.1}(X) &= -1, & q_{0.1}(X) &= -1. \end{aligned}$$
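For a discrete random variable the infima in (7.1) and (7.2) are attained at one of the values of X, because $F_X$ is a right-continuous step function. The minimal Python sketch below scans the values in increasing order and reproduces Example 7.3. Exact fractions are used deliberately, so that ties such as $F_X(-19) = \alpha = 0.04$ are not disturbed by floating-point rounding; the function names are our own.

```python
from fractions import Fraction


def upper_quantile(values, probs, alpha):
    """q^alpha(X) = inf{x : alpha < F_X(x)} for discrete X, values sorted ascending."""
    cdf = 0
    for x, p in zip(values, probs):
        cdf += p                      # cdf = F_X(x)
        if alpha < cdf:
            return x
    raise ValueError("alpha must lie in (0, 1)")


def lower_quantile(values, probs, alpha):
    """q_alpha(X) = inf{x : alpha <= F_X(x)} for discrete X, values sorted ascending."""
    cdf = 0
    for x, p in zip(values, probs):
        cdf += p
        if alpha <= cdf:
            return x
    raise ValueError("alpha must lie in (0, 1)")


# Example 7.1 with p = 0.8: probabilities 1/25, 8/25, 16/25
values = [-19, -1, 21]
probs = [Fraction(1, 25), Fraction(8, 25), Fraction(16, 25)]
for alpha in (Fraction(1, 40), Fraction(1, 25), Fraction(1, 10)):
    print(float(alpha), upper_quantile(values, probs, alpha),
          lower_quantile(values, probs, alpha))
```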

We list some basic properties of quantiles. The proofs are all elementary, but we defer the more technical parts to the end of the chapter to avoid disturbing the flow of development.
Proposition 7.4
Let X, Y be random variables.
(i) X ≥ Y implies $q^\alpha(X) \ge q^\alpha(Y)$.
(ii) For any b ∈ ℝ, $q^\alpha(X + b) = q^\alpha(X) + b$.
(iii) For b > 0, $q^\alpha(bX) = b\, q^\alpha(X)$.
(iv) $q^\alpha(-X) = -q_{1-\alpha}(X)$.
Proof See page 120. □
Lemma 7.5
If $F_X$ is continuous and strictly increasing then
$$q^\alpha(X) = F_X^{-1}(\alpha).$$
Proof The given conditions on $F_X$ ensure that it is invertible, the inverse function $\alpha \mapsto F_X^{-1}(\alpha)$ is continuous, and $\alpha < F_X(x)$ is equivalent to $F_X^{-1}(\alpha) < x$. This gives
$$q^\alpha(X) = \inf\{x : \alpha < F_X(x)\} = \inf\{x : F_X^{-1}(\alpha) < x\} = F_X^{-1}(\alpha),$$
which concludes our proof. □
Lemma 7.6
Let X be a random variable. If f : ℝ → ℝ is right-continuous and non-decreasing then
$$q^\alpha(f(X)) = f(q^\alpha(X)).$$
Proof See page 122. □

Exercise 7.1 Formulate and prove mirror results to Proposition 7.4 and Lemmas 7.5 and 7.6 for lower α-quantiles.

7.2 Measuring downside risk


We work in a single-step financial market model in which we invest at time t = 0 and terminate our investment at t = T. We denote by X the discounted value of the investor's position at time T.

Figure 7.3 $-\mathrm{VaR}^\alpha(X)$ is the upper α-quantile of X.

Definition 7.7
For α ∈ (0, 1), we define the Value at Risk (VaR) of X at confidence level 1 − α as (see Figure 7.3)
$$\mathrm{VaR}^\alpha(X) = -q^\alpha(X) = -\inf\{x : \alpha < F_X(x)\}.$$

To gain some intuition, let us consider the following example.

Example 7.8
Let X be as in Example 7.1. By looking at the distribution function $F_X$ (see Figure 7.2) we can see that
$$\mathrm{VaR}^{0.04}(X) = 1, \qquad \mathrm{VaR}^{0.025}(X) = 19.$$

Let us observe that since X denotes the gain from an investment, −X denotes the loss. We can express VaR in terms of the loss as follows:
$$\begin{aligned} \mathrm{VaR}^\alpha(X) &= -q^\alpha(X) \\ &= q_{1-\alpha}(-X) && \text{(by (iv) from Proposition 7.4)} \\ &= \inf\{x : 1 - \alpha \le P(-X \le x)\} \\ &= \inf\{x : P(x < -X) \le \alpha\}. \end{aligned}$$
In loose terms, this means that the probability of the loss exceeding $\mathrm{VaR}^\alpha$ is no greater than α. In other words, at confidence level 1 − α, our loss is no worse than $\mathrm{VaR}^\alpha$.
Simple algebraic properties of VaR follow from those we proved for the upper quantile:
Proposition 7.9
Let X, Y be random variables.
(i) X ≥ Y implies $\mathrm{VaR}^\alpha(X) \le \mathrm{VaR}^\alpha(Y)$.
(ii) For any a ∈ ℝ, $\mathrm{VaR}^\alpha(X + a) = \mathrm{VaR}^\alpha(X) - a$.
(iii) For any a ≥ 0, $\mathrm{VaR}^\alpha(aX) = a\,\mathrm{VaR}^\alpha(X)$.
Proof The proof follows from the properties of quantiles proved in Proposition 7.4, and is left as an exercise. □

Exercise 7.2 Prove Proposition 7.9.

7.3 Computing VaR: examples


To familiarise ourselves with the definition of VaR let us consider a few
simple examples.
We shall assume that at time zero we invest V(0) to receive V(T) at time T. We use X to denote the discounted gain at time T,
$$X = e^{-rT}V(T) - V(0),$$
where r is the risk-free rate for continuous compounding.

Example 7.10
Suppose that we invest V(0) risk-free. Then $V(T) = e^{rT}V(0)$, giving
$$X = e^{-rT}V(T) - V(0) = 0.$$
The distribution function of X is then
$$F_X(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ 0 & \text{for } x < 0. \end{cases}$$
For any α ∈ (0, 1), $q^\alpha(X) = 0$, which gives
$$\mathrm{VaR}^\alpha(X) = -q^\alpha(X) = 0.$$

Exercise 7.3 For the leveraged stockholding described in Exercise 1.5, compare the VaR of the discounted gain for the leveraged position with that of the stock.

Example 7.11
Consider
$$X = \begin{cases} -20 & \text{with probability } 0.025, \\ -10 & \text{with probability } 0.025, \end{cases} \qquad (7.3)$$
and P(X > 0) = 0.95. For x < 0
$$F_X(x) = \begin{cases} 0, & x \in (-\infty, -20), \\ 0.025, & x \in [-20, -10), \\ 0.05, & x \in [-10, 0). \end{cases}$$
Taking α = 0.04 (or any α ∈ [0.025, 0.05)) we have
$$\mathrm{VaR}^{0.04}(X) = -q^{0.04}(X) = 10.$$
For any α < 0.025,
$$\mathrm{VaR}^\alpha(X) = -q^\alpha(X) = 20,$$
which demonstrates that $\mathrm{VaR}^\alpha$ can be sensitive to the choice of α.
Let us now change the value −20 in (7.3) to −2000. $\mathrm{VaR}^{0.04}$ still remains equal to 10! This illustrates that VaR does not take into consideration unlikely events (i.e. with probability below the chosen threshold α), whatever the severity of their outcome. This is an undesirable feature in a risk measure.

Example 7.12
Consider two independent investments $X_1, X_2$ with gains
$$X_i = \begin{cases} 0 & \text{with probability } p, \\ 1 & \text{with probability } 1 - p, \end{cases}$$
for i = 1, 2. We can think of these as corporate bonds with the same price and maturity date, issued by two independent companies that each have a probability p of default with zero recovery.
If p < α then $F_{X_i}(0) = p < \alpha$ and $F_{X_i}(1) = 1 > \alpha$, so
$$\mathrm{VaR}^\alpha(X_1) = \mathrm{VaR}^\alpha(X_2) = -1.$$
If, instead, we buy half a unit of each of the two bonds, then our gain will be equal to
$$\tfrac12 X_1 + \tfrac12 X_2 = \begin{cases} 0 & \text{with probability } p^2, \\ \tfrac12 & \text{with probability } 2p(1-p), \\ 1 & \text{with probability } (1-p)^2. \end{cases}$$
If we choose α ∈ (p, p² + 2p(1 − p)), then the distribution function of $\tfrac12 X_1 + \tfrac12 X_2$ equals $p^2 < \alpha$ at 0, while
$$F_{\frac12 X_1 + \frac12 X_2}\big(\tfrac12\big) = p^2 + 2p(1-p) > \alpha,$$
hence
$$\mathrm{VaR}^\alpha\big(\tfrac12 X_1 + \tfrac12 X_2\big) = -\tfrac12.$$
We can see that
$$\mathrm{VaR}^\alpha\big(\tfrac12 X_1 + \tfrac12 X_2\big) > \max\{\mathrm{VaR}^\alpha(X_1),\, \mathrm{VaR}^\alpha(X_2)\},$$
which means that the risk of a diversified position, as measured by VaR, is greater than the risk of investing all our funds in a single bond. This runs counter to the principle that diversification should reduce risk, and therefore illustrates a second serious drawback in using VaR to measure risk. In the next chapter we will consider risk measures designed to remedy these defects.

From examples explored so far we see that finding VaR in the case of
discrete distributions is an easy task. This is summarised in the following
lemma.

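Lemma 7.13
Assume that X is a discrete random variable with $P(X = x_i) = p_i$, $\sum_{i=1}^{N} p_i = 1$, and $x_1 < x_2 < \cdots < x_N$. Then
$$\mathrm{VaR}^\alpha(X) = -x_{k_\alpha},$$
where $k_\alpha \in \mathbb{N}$ is the largest number such that $\sum_{i=1}^{k_\alpha - 1} p_i \le \alpha$.
Proof Since X has a discrete distribution and $x_1 < x_2 < \cdots < x_N$ we can see that
$$P(X \le x_k) = \sum_{i=1}^{k} p_i. \qquad (7.4)$$
We shall also use the fact that
$$\min\Big\{k : \alpha < \sum_{i=1}^{k} p_i\Big\} = \max\Big\{k : \sum_{i=1}^{k-1} p_i \le \alpha\Big\}. \qquad (7.5)$$
This gives
$$\begin{aligned} q^\alpha(X) &= \inf\{x : \alpha < P(X \le x)\} && \text{(by (7.1))} \\ &= \min\{x_k : \alpha < P(X \le x_k)\} && \text{(since } X \in \{x_1, \ldots, x_N\}) \\ &= \min\Big\{x_k : \alpha < \sum_{i=1}^{k} p_i\Big\} && \text{(by (7.4))} \\ &= \max\Big\{x_k : \sum_{i=1}^{k-1} p_i \le \alpha\Big\} && \text{(by (7.5))} \\ &= x_{k_\alpha} && \text{(by definition of } k_\alpha). \end{aligned}$$
This concludes our proof, since $\mathrm{VaR}^\alpha(X) = -q^\alpha(X)$. □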
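Lemma 7.13 translates directly into a short loop over the sorted values: keep the last index k for which the partial sum $p_1 + \cdots + p_{k-1}$ does not exceed α. The minimal Python sketch below (function name our own) reproduces Example 7.8 and the risk-free case of Example 7.10; exact fractions again avoid rounding problems at ties.

```python
from fractions import Fraction


def var_discrete(values, probs, alpha):
    """VaR^alpha(X) = -x_{k_alpha} (Lemma 7.13).

    k_alpha is the largest k with p_1 + ... + p_{k-1} <= alpha, where the
    values are sorted increasingly.
    """
    pairs = sorted(zip(values, probs))
    k_alpha = 0
    partial = 0                      # p_1 + ... + p_{k-1}
    for k, (_, p) in enumerate(pairs, start=1):
        if partial <= alpha:         # holds for a prefix of k, so this keeps the largest
            k_alpha = k
        partial += p
    return -pairs[k_alpha - 1][0]


# Examples 7.1 and 7.8 (the values need not be given in order)
values = [21, -1, -19]
probs = [Fraction(16, 25), Fraction(8, 25), Fraction(1, 25)]
print(var_discrete(values, probs, Fraction(1, 25)),
      var_discrete(values, probs, Fraction(1, 40)))
```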
We now turn to the computation of VaR for random variables with continuous distributions. For a standard normal random variable Z, with distribution function $N(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-z^2/2}\,dz$, Lemma 7.5 yields
$$\mathrm{VaR}^\alpha(Z) = -N^{-1}(\alpha)$$
for any α ∈ (0, 1). We use this in the next example.

Example 7.14
Suppose that today's price of the stock is equal to S(0). Assume also that the price of the stock at time T is equal to $S(T) = S(0)e^{m + \sigma Z}$, with Z having standard normal distribution N(0, 1). We shall compute $\mathrm{VaR}^\alpha(X)$ for
$$X = e^{-rT}S(T) - S(0).$$
By Lemma 7.5, $q^\alpha(Z) = N^{-1}(\alpha)$, where N is the standard normal cumulative distribution function. Observing that X = f(Z), where
$$f(\zeta) = e^{-rT}S(0)e^{m + \sigma\zeta} - S(0)$$
is an increasing function, we obtain
$$\begin{aligned} \mathrm{VaR}^\alpha(X) &= -q^\alpha(f(Z)) \\ &= -f(q^\alpha(Z)) && \text{(by Lemma 7.6)} \\ &= -f(N^{-1}(\alpha)) && \text{(by Lemma 7.5)} \\ &= S(0)\big(1 - e^{m - rT + \sigma N^{-1}(\alpha)}\big). \end{aligned} \qquad (7.6)$$
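Formula (7.6) needs only the inverse of the standard normal distribution function, available in Python as `statistics.NormalDist().inv_cdf`. The minimal sketch below evaluates it with the Black–Scholes choice $m = (\mu - \sigma^2/2)T$ and volatility term $\sigma\sqrt{T}$ from (7.9), using the parameters S(0) = 100, µ = 10%, σ = 0.2, r = 3%, T = 1 and α = 5% of Example 7.22. Ten shares then give 243.44, which matches the c = 0 row of the table in that example.

```python
import math
from statistics import NormalDist


def var_lognormal(S0, m, sigma, r, T, alpha):
    """VaR^alpha of X = e^{-rT} S(T) - S(0) with S(T) = S(0) exp(m + sigma Z), formula (7.6)."""
    z_alpha = NormalDist().inv_cdf(alpha)          # N^{-1}(alpha)
    return S0 * (1.0 - math.exp(m - r * T + sigma * z_alpha))


# Black-Scholes parametrisation (7.9): m = (mu - sigma^2/2) T, volatility term sigma*sqrt(T)
mu, sig, r, T = 0.10, 0.2, 0.03, 1.0
one_share = var_lognormal(100.0, (mu - 0.5 * sig ** 2) * T, sig * math.sqrt(T), r, T, 0.05)
print(round(one_share, 2), round(10 * one_share, 2))
```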


In Example 7.14 we have exploited the fact that X was a non-decreasing function of a random variable with standard normal distribution, for which quantiles are easy to compute. This idea can be formulated in more general terms as follows.

Lemma 7.15
Let f : ℝ → ℝ be a non-decreasing right-continuous function. Then
$$\mathrm{VaR}^\alpha(f(X)) = -f(q^\alpha(X)).$$
Proof By Lemma 7.6
$$\mathrm{VaR}^\alpha(f(X)) = -q^\alpha(f(X)) = -f(q^\alpha(X)),$$
which concludes our proof. □

We now show that VaR can be approximated using Monte Carlo simulations. First we need some auxiliary results.
For a sequence of random variables $\{Y_i\}_{i=1}^{\infty}$ we write $Y_i \xrightarrow{P} Y$ to denote that $Y_i$ converges to Y in probability. (See [PF] for details of the standard results and terminology from probability we use here.)

Lemma 7.16
Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables, $X_i : \Omega \to \mathbb{R}$, with the same distribution as X. Let x ∈ ℝ be fixed. If we take a sequence of random variables $F_N(x) : \Omega \to \mathbb{R}$ defined as
$$F_N(x) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}_{\{X_i \le x\}},$$
then $F_N(x) \xrightarrow{P} F_X(x)$.
Proof Let us introduce the following notation: $Y_i = \mathbf{1}_{\{X_i \le x\}}$ and $Y = \mathbf{1}_{\{X \le x\}}$. By the weak law of large numbers (see [PF]), $\frac{1}{N}\sum_{i=1}^{N} Y_i \xrightarrow{P} E(Y)$, hence
$$F_N(x) = \frac{1}{N}\sum_{i=1}^{N} Y_i \xrightarrow{P} E(Y) = E\big(\mathbf{1}_{\{X \le x\}}\big) = P(X \le x) = F_X(x),$$
as required. □
Suppose now that $\hat X_1, \ldots, \hat X_N$ are results of simulations following the same distribution as X and let
$$\hat F_N(x) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}_{\{\hat X_i \le x\}}.$$
By Lemma 7.16, for any x ∈ ℝ,
$$F_X(x) = \lim_{N\to\infty} \hat F_N(x), \qquad (7.7)$$
where the limit is understood in probability.
Let $Y_N$ denote the discrete random variable with distribution
$$P(Y_N = \hat X_i) = \frac{1}{N} \quad \text{for } i = 1, \ldots, N.$$
The distribution function $F_{Y_N}$ is equal to $\hat F_N$. By (7.7), taking sufficiently large N, $\mathrm{VaR}^\alpha(X)$ can be approximated using $\mathrm{VaR}^\alpha(Y_N)$,
$$\mathrm{VaR}^\alpha(X) \approx \mathrm{VaR}^\alpha(Y_N). \qquad (7.8)$$
$\mathrm{VaR}^\alpha(Y_N)$ can easily be computed using Lemma 7.13. We shall implement this method in the following section, to compute VaR in the n-dimensional Black–Scholes market (see Example 7.24).
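For the empirical distribution all $p_i = 1/N$, so in Lemma 7.13 the index is $k_\alpha = \lfloor \alpha N \rfloor + 1$ and $\mathrm{VaR}^\alpha(Y_N)$ is minus the $k_\alpha$-th smallest simulated gain. The minimal Python sketch below applies (7.8) to a single Black–Scholes stock (anticipating the model (7.9) of the next section), where the exact value from (7.14) and (7.16) is available for comparison. The sample size and random seed are arbitrary choices, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def var_empirical(samples, alpha):
    """VaR^alpha(Y_N) for the empirical distribution of the samples.

    Lemma 7.13 with p_i = 1/N: k_alpha is the largest k with (k - 1)/N <= alpha,
    i.e. k_alpha = floor(alpha * N) + 1 (assumes distinct sample values).
    """
    xs = sorted(samples)
    k_alpha = math.floor(alpha * len(xs)) + 1
    return -xs[k_alpha - 1]


def simulate_gains(S0, mu, sigma, r, T, n, seed=0):
    """Simulated discounted gains e^{-rT} S(T) - S(0), with S(T) as in (7.9)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    disc = math.exp(-r * T)
    return [disc * S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) - S0
            for _ in range(n)]


def var_closed_form(S0, mu, sigma, r, T, alpha):
    """Exact value from (7.14) and (7.16), for comparison."""
    q = S0 * math.exp((mu - 0.5 * sigma ** 2) * T
                      + sigma * math.sqrt(T) * NormalDist().inv_cdf(alpha))
    return S0 - math.exp(-r * T) * q


params = dict(S0=100.0, mu=0.10, sigma=0.2, r=0.03, T=1.0)
mc = var_empirical(simulate_gains(n=200_000, seed=1, **params), alpha=0.05)
exact = var_closed_form(alpha=0.05, **params)
print(round(mc, 2), round(exact, 2))
```

With 200 000 samples the Monte Carlo estimate typically lies within a few tenths of the exact value; the error shrinks roughly like $1/\sqrt{N}$.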

7.4 VaR in the Black–Scholes model


In the Black–Scholes model we have a single stock and a risk-free asset. The time zero price of the stock is S(0) > 0. The stock price at time T is given by
$$S(T) = S(0)\, e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T} Z}, \qquad (7.9)$$
where µ and σ are positive real parameters, and Z is a random variable with standard normal distribution N(0, 1). The parameter µ represents the drift and the parameter σ represents the volatility of the stock. The risk-free rate is constant and equal to r > 0, with continuous compounding, meaning that the time T price of the risk-free asset is
$$A(T) = A(0)e^{rT}. \qquad (7.10)$$
For simplicity, we assume that A(0) = 1.
A European put option with strike price K and maturity T has payoff
$$(K - S(T))^+ = \max(K - S(T), 0),$$
and costs
$$P(r, T, K, S(0), \sigma) = Ke^{-rT}N(-d_-) - S(0)N(-d_+), \qquad (7.11)$$
where
$$d_+ = \frac{\ln\frac{S(0)}{K} + \left(r + \frac12\sigma^2\right)T}{\sigma\sqrt{T}}, \qquad d_- = \frac{\ln\frac{S(0)}{K} + \left(r - \frac12\sigma^2\right)T}{\sigma\sqrt{T}}, \qquad (7.12)$$
and N is the standard normal cumulative distribution function. For more details on the Black–Scholes model see [BSM].
Let H(t) denote the value of a put option at time t = 0, T:
$$H(0) = P(r, T, K, S(0), \sigma), \qquad H(T) = (K - S(T))^+. \qquad (7.13)$$
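The pricing formula (7.11)–(7.12) is easy to evaluate with the standard library, as in the minimal Python sketch below (the function name `put_price` is our own). It uses $d_- = d_+ - \sigma\sqrt{T}$, which follows from (7.12). With S(0) = 100, σ = 0.2, r = 3% and T = 1, it reproduces the put prices 0.406, 2.769 and 12.042 quoted in Example 7.22 for strikes 75, 90 and 110.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf  # standard normal cumulative distribution function


def put_price(r, T, K, S0, sigma):
    """P(r, T, K, S(0), sigma) from (7.11)-(7.12)."""
    d_plus = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d_minus = d_plus - sigma * math.sqrt(T)   # same as the formula for d_- in (7.12)
    return K * math.exp(-r * T) * N(-d_minus) - S0 * N(-d_plus)


prices = [put_price(0.03, 1.0, K, 100.0, 0.2) for K in (75.0, 90.0, 110.0)]
print([round(p, 3) for p in prices])
```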
We start with a simple lemma.
Lemma 7.17
For S(T) and H(T) given by (7.9) and (7.13), respectively,
$$q^\alpha(S(T)) = S(0)\, e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T} N^{-1}(\alpha)}, \qquad (7.14)$$
$$q^\alpha(-H(T)) = -\big(K - q^\alpha(S(T))\big)^+. \qquad (7.15)$$
Proof By Lemma 7.5, $q^\alpha(Z) = N^{-1}(\alpha)$. Since $z \mapsto S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T} z}$ is an increasing function, (7.14) follows from Lemma 7.6.
Similarly, since $\zeta \mapsto -(K - \zeta)^+$ is a non-decreasing function, (7.15) also follows from Lemma 7.6. □
Assume that we buy a single share of stock. The discounted gain from this investment is
$$X = e^{-rT}S(T) - S(0).$$
By Lemma 7.15 we can see that
$$\mathrm{VaR}^\alpha(X) = S(0) - e^{-rT} q^\alpha(S(T)). \qquad (7.16)$$

Exercise 7.4 Compute $\mathrm{VaR}^{5\%}(X)$ for an investment in a stock with parameters S(0) = 100, µ = 10%, σ = 0.2, r = 3% and T = 1.

We now consider an investment where at time zero we buy x shares of stock and y units of the risk-free asset. For t = 0, T we use $V_{(x,y)}(t)$ to denote the value of the portfolio at time t,
$$V_{(x,y)}(t) = xS(t) + yA(t).$$
We use $X_{(x,y)}$ to denote the discounted gain
$$X_{(x,y)} = e^{-rT}V_{(x,y)}(T) - V_{(x,y)}(0).$$
Lemma 7.18
If x ≥ 0 then
$$\mathrm{VaR}^\alpha\big(X_{(x,y)}\big) = V_{(x,y)}(0) - xe^{-rT}q^\alpha(S(T)) - y. \qquad (7.17)$$
Proof Since x ≥ 0, the discounted gain can be expressed as a non-decreasing function of S(T):
$$X_{(x,y)} = f(S(T)),$$
with
$$f(\zeta) = e^{-rT}(x\zeta + yA(T)) - V_{(x,y)}(0) = e^{-rT}x\zeta + y - V_{(x,y)}(0),$$
hence (7.17) follows from Lemma 7.15. □
Choosing any x ∈ (0, 1) and y = (1 − x)S(0) we can see that the initial value of the investment is
$$V_{(x,y)}(0) = S(0).$$
Let $\mathrm{VaR}^\alpha(X)$ be the Value at Risk for the investment in a single unit of stock, given in (7.16). Then
$$\begin{aligned} \mathrm{VaR}^\alpha\big(X_{(x,y)}\big) &= V_{(x,y)}(0) - xe^{-rT}q^\alpha(S(T)) - y && \text{(from (7.17))} \\ &= xS(0) - xe^{-rT}q^\alpha(S(T)) && (V_{(x,y)}(0) = xS(0) + y) \\ &= x\,\mathrm{VaR}^\alpha(X) && \text{(from (7.16))} \\ &< \mathrm{VaR}^\alpha(X), \end{aligned}$$
where the last inequality holds provided $\mathrm{VaR}^\alpha(X) > 0$, which is the case for small α. This means that diversifying an investment between the stock and the risk-free asset reduces VaR (which is hardly a surprise!).

Exercise 7.5 Derive the formula for $E(X_{(x,y)})$. Taking the values S(0), µ, σ, r and T as in Exercise 7.4, plot the set
$$\Big\{\big(\mathrm{VaR}^\alpha(X_{(x,y)}),\, E(X_{(x,y)})\big) : x \in [0, 1],\ y = (1 - x)S(0)\Big\}.$$

Exercise 7.6 Consider buying x > 0 shares of stock and entering into θ ∈ [0, x] forward contracts to sell the stock at time T for the forward price $F = S(0)e^{rT}$. Let
$$X_{(x,\theta)} = xe^{-rT}S(T) + \theta e^{-rT}(F - S(T)) - xS(0)$$
denote the discounted gain of such an investment. Derive formulae for $E(X_{(x,\theta)})$ and $\mathrm{VaR}^\alpha(X_{(x,\theta)})$. Taking the values S(0), µ, σ, r and T as in Exercise 7.4, plot the set
$$\Big\{\big(\mathrm{VaR}^\alpha(X_{(x,\theta)}),\, E(X_{(x,\theta)})\big) : x = 1,\ \theta \in [0, 1]\Big\},$$
and compare with the plot obtained in Exercise 7.5. Which is more efficient, reducing VaR with bonds or with forward contracts?

Another natural idea to reduce VaR is to buy European put options. By doing so one can protect against undesirable scenarios, while leaving oneself open to the positive outcomes. Assume that at time zero we buy x units of stock and z put options with strike price K. The value of such an investment is
$$V_{(x,z)}(t) = xS(t) + zH(t),$$
and the discounted gain is
$$X_{(x,z)} = e^{-rT}V_{(x,z)}(T) - V_{(x,z)}(0) = e^{-rT}\big(xS(T) + z(K - S(T))^+\big) - V_{(x,z)}(0).$$


Lemma 7.19
If 0 < z ≤ x then
$$\mathrm{VaR}^\alpha\big(X_{(x,z)}\big) = V_{(x,z)}(0) - e^{-rT}\big(xq^\alpha(S(T)) + z(K - q^\alpha(S(T)))^+\big). \qquad (7.18)$$

Figure 7.4 $\mathrm{VaR}^{5\%}\big(X_{(x,z(K))}\big)$ for different choices of K, for parameters $V_0 = S(0) = 100$, µ = 0.1, σ = 0.2, r = 0.03, T = 1 and x = 0.99. (Horizontal axis: K from 80 to 100; vertical axis: VaR from 20 to 23.)

Proof Since 0 < z ≤ x, we see that $X_{(x,z)}$ can be expressed as a non-decreasing function of S(T),
$$X_{(x,z)} = f(S(T)),$$
with
$$f(\zeta) = e^{-rT}\big(x\zeta + z(K - \zeta)^+\big) - V_{(x,z)}(0).$$
By Lemma 7.15
$$\mathrm{VaR}^\alpha\big(X_{(x,z)}\big) = -f(q^\alpha(S(T))) = e^{-rT}\big(-xq^\alpha(S(T)) - z(K - q^\alpha(S(T)))^+\big) + V_{(x,z)}(0),$$
which combined with (7.15) gives (7.18). □

Example 7.20
Assume that we want to invest $V_0$ at time zero and buy x shares of stock. In order to have $V_{(x,z)}(0) = V_0$ we need to buy
$$z = z(K) = \frac{V_0 - xS(0)}{P(r, T, K, S(0), \sigma)}$$
put options. Depending on the choice of the strike price K we obtain different values of
$$\mathrm{VaR}^\alpha\big(X_{(x,z(K))}\big) = V_0 - e^{-rT}\big(xq^\alpha(S(T)) + z(K)(K - q^\alpha(S(T)))^+\big)$$
(see Figure 7.4).


The choice of a high strike price makes the term $(K - q^\alpha(S(T)))^+$ large, but since options with high strike prices are expensive, their number z(K) is small. On the other hand, if we choose a low strike price, then we can buy a larger number z(K) of options, but each offers a lower payoff $(K - q^\alpha(S(T)))^+$. An optimal choice of the strike price K lies somewhere between these extremes (see Figure 7.4).
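The trade-off can be explored numerically with a minimal Python sketch of (7.18) with z = z(K), shown below. The default arguments are the parameters listed under Figure 7.4, with α = 5%. The function names are hypothetical, and the sketch deliberately refuses strikes for which z(K) > x, since Lemma 7.19 (and hence (7.18)) then no longer applies. For these parameters that happens at the lowest strikes in the figure: around K = 80 the put costs less than 1, so z(K) exceeds x = 0.99.

```python
import math
from statistics import NormalDist

ND = NormalDist()


def put_price(r, T, K, S0, sigma):
    """Black-Scholes put price (7.11)-(7.12)."""
    d_plus = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d_minus = d_plus - sigma * math.sqrt(T)
    return K * math.exp(-r * T) * ND.cdf(-d_minus) - S0 * ND.cdf(-d_plus)


def var_with_puts(K, V0=100.0, S0=100.0, mu=0.1, sigma=0.2, r=0.03, T=1.0,
                  x=0.99, alpha=0.05):
    """VaR^alpha(X_(x, z(K))) from Example 7.20; defaults follow Figure 7.4."""
    z = (V0 - x * S0) / put_price(r, T, K, S0, sigma)     # z(K)
    if not 0 < z <= x:
        raise ValueError("(7.18) requires 0 < z <= x")
    q = S0 * math.exp((mu - 0.5 * sigma ** 2) * T
                      + sigma * math.sqrt(T) * ND.inv_cdf(alpha))  # (7.14)
    return V0 - math.exp(-r * T) * (x * q + z * max(K - q, 0.0))


for K in (85.0, 90.0, 95.0, 100.0):
    print(K, round(var_with_puts(K), 3))
```

Evaluating `var_with_puts` on a finer grid of admissible strikes is one simple way to approach Exercise 7.7.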

Exercise 7.7 Let $V_0 = S(0) = 100$, µ = 10%, σ = 0.2, r = 3%, T = 1 and x = 0.99. Find K which minimises $\mathrm{VaR}^\alpha\big(X_{(x,z(K))}\big)$.


Usually we do not have full freedom of choice for the strike price of a put option and need to choose between options which are available on the market. Let us assume that we can invest in n put options with strike prices $K_1, \ldots, K_n$ and maturity T. We denote by $H_i(t)$ the value of a put option with strike price $K_i$; in particular
$$H_i(0) = P(r, T, K_i, S(0), \sigma), \qquad H_i(T) = (K_i - S(T))^+.$$
Assume that we buy x shares of stock and $z_i$ put options with strike prices $K_i$, for i = 1, …, n. Let z, $\mathbf{1}$ and H(t) for t = 0, T be vectors in $\mathbb{R}^n$ defined as
$$z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}, \qquad \mathbf{1} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \qquad H(t) = \begin{bmatrix} H_1(t) \\ \vdots \\ H_n(t) \end{bmatrix}.$$
The value of our investment at time t is
$$V_{(x,z)}(t) = xS(t) + z^T H(t).$$
We show how to compute VaR for
$$X_{(x,z)} = e^{-rT}V_{(x,z)}(T) - V_{(x,z)}(0).$$

Proposition 7.21
If $z_i \ge 0$ for i = 1, …, n, and $\sum_{i=1}^{n} z_i = z^T\mathbf{1} \le x$, then
$$\mathrm{VaR}^\alpha\big(X_{(x,z)}\big) = V_{(x,z)}(0) - e^{-rT}\big(xq^\alpha(S(T)) - z^T q^\alpha(-H(T))\big), \qquad (7.19)$$
where
$$q^\alpha(-H(T)) = -\begin{bmatrix} (K_1 - q^\alpha(S(T)))^+ \\ \vdots \\ (K_n - q^\alpha(S(T)))^+ \end{bmatrix}. \qquad (7.20)$$
Proof The formula (7.20) follows from Lemma 7.17.
Since $z^T\mathbf{1} \le x$, the function
$$\zeta \mapsto e^{-rT}\Big(x\zeta + \sum_{i=1}^{n} z_i (K_i - \zeta)^+\Big) - V_{(x,z)}(0)$$
is non-decreasing, which by Lemma 7.6 implies that
$$\mathrm{VaR}^\alpha\big(X_{(x,z)}\big) = V_{(x,z)}(0) - e^{-rT}\Big(xq^\alpha(S(T)) + \sum_{i=1}^{n} z_i \big(K_i - q^\alpha(S(T))\big)^+\Big),$$
and this is (7.19). □


From now on we shall assume that x is fixed and investigate how to minimise $\mathrm{VaR}^\alpha\big(X_{(x,z)}\big)$ by choosing z. We assume that we have $V_0$ at our disposal for investment and hedging purposes. This means that we spend
$$c = V_0 - xS(0)$$
on put options. We assume that we do not take short positions in stock or puts, and that the number of options does not exceed the number of shares of stock in our portfolio. These restrictions are imposed by common sense. (Later in this chapter we give an example of what might happen if these are violated.) Under such assumptions, by (7.19), minimising $\mathrm{VaR}^\alpha\big(X_{(x,z)}\big)$ is equivalent to the following problem:
$$\begin{aligned} \min\ & z^T q^\alpha(-H(T)), \\ \text{subject to: } & z^T H(0) = c, \\ & z^T\mathbf{1} \le x, \\ & z_1, \ldots, z_n \ge 0. \end{aligned} \qquad (7.21)$$
Since H(0) and $q^\alpha(-H(T))$ are fixed vectors in $\mathbb{R}^n$, (7.21) is a typical linear programming problem, which can be solved numerically.

Example 7.22
Consider the Black–Scholes model with parameters S(0) = 100, µ = 10%, σ = 0.2 and r = 3%. Assume that we want to invest $V_0 = 1000$ in stock and put options with strike prices $K_1 = 75$, $K_2 = 90$, $K_3 = 110$ with expiry T = 1. We shall solve the problem (7.21) for α = 0.05, considering c = 0, 10, 30, 50 and 80.
We compute the prices of the put options using (7.11):
$$H(0) = \begin{bmatrix} 0.406 \\ 2.769 \\ 12.042 \end{bmatrix}.$$
Using the fact that $N^{-1}(0.05) = -1.645$ we compute
$$q^\alpha(S(T)) = S(0)\, e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T} N^{-1}(\alpha)} = 77.96$$
and
$$q^\alpha(-H(T)) = \begin{bmatrix} 0 \\ -12.04 \\ -32.04 \end{bmatrix}.$$
The numerical solutions of (7.21) are given in the table below, where $x = (V_0 - c)/S(0)$.

c     x     z1     z2     z3     VaR^α
0     10    0.00   0.00   0.00   243.44
10    9.9   0.00   3.61   0.00   208.81
30    9.7   0.00   9.36   0.34   146.23
50    9.5   0.00   6.95   2.55   120.68
80    9.2   0.00   3.32   5.88   82.35

Evidently it does not make sense to buy put options with strike prices below $q^\alpha(S(T))$: by (7.20) their entry in $q^\alpha(-H(T))$ is zero, so they cost money without reducing VaR. Looking at the table we can see that when c is small, we buy options which are cheaper. When c is large, we can afford to spend money on options with a higher strike price, which offer better protection. A full picture is obtained when we look not only at VaR, but also at the distribution of $X_{(x,z)}$ in Figure 7.5.
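With one equality and one inequality constraint, every vertex of the feasible set of (7.21) has at most two non-zero $z_i$. For three strikes one can therefore simply enumerate the vertices instead of calling an LP solver. The minimal Python sketch below does this to reproduce the table of Example 7.22. Vertex enumeration is our substitute for whatever numerical solver was used in the text, the helper names are hypothetical, and for larger n a proper linear programming library would be the sensible choice.

```python
import math
from statistics import NormalDist

ND = NormalDist()


def put_price(r, T, K, S0, sigma):
    """Black-Scholes put price (7.11)-(7.12)."""
    d_plus = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d_minus = d_plus - sigma * math.sqrt(T)
    return K * math.exp(-r * T) * ND.cdf(-d_minus) - S0 * ND.cdf(-d_plus)


def solve_lp(H0, q, c, x, tol=1e-12):
    """min z.q  s.t.  z.H0 = c, sum(z) <= x, z >= 0, by vertex enumeration."""
    n = len(H0)
    candidates = []
    for i in range(n):                    # one option, count constraint slack
        z = [0.0] * n
        z[i] = c / H0[i]
        candidates.append(z)
    for i in range(n):                    # two options, both constraints binding
        for j in range(i + 1, n):
            if abs(H0[j] - H0[i]) < tol:
                continue
            zj = (c - H0[i] * x) / (H0[j] - H0[i])
            z = [0.0] * n
            z[i], z[j] = x - zj, zj
            candidates.append(z)
    feasible = [z for z in candidates if min(z) >= -tol and sum(z) <= x + tol]
    return min(feasible, key=lambda z: sum(zk * qk for zk, qk in zip(z, q)))


# Data of Example 7.22
S0, mu, sigma, r, T, alpha, V0 = 100.0, 0.10, 0.2, 0.03, 1.0, 0.05, 1000.0
strikes = [75.0, 90.0, 110.0]
H0 = [put_price(r, T, K, S0, sigma) for K in strikes]
qS = S0 * math.exp((mu - 0.5 * sigma ** 2) * T
                   + sigma * math.sqrt(T) * ND.inv_cdf(alpha))    # (7.14)
q_minus_H = [-max(K - qS, 0.0) for K in strikes]                 # (7.20)


def solve_for_budget(c):
    """Return x, the optimal z and the resulting VaR for budget c on puts."""
    x = (V0 - c) / S0                                           # c = V0 - x S(0)
    z = solve_lp(H0, q_minus_H, c, x)
    zq = sum(zk * qk for zk, qk in zip(z, q_minus_H))
    var = V0 - math.exp(-r * T) * (x * qS - zq)                  # (7.19)
    return x, z, var


for c in (0, 10, 30, 50, 80):
    x, z, var = solve_for_budget(c)
    print(c, round(x, 1), [round(v, 2) for v in z], round(var, 2))
```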

In the formulation of (7.21) we have added constraints that we do not take short positions in puts, and that we do not buy more puts than stocks. Exercising such common sense is often necessary when dealing with VaR. If we allow for an arbitrary number of put options, then blind reliance on VaR to assess risk may mislead the investor into using catastrophic hedging strategies. For instance, puts with a high strike price, which are more expensive and provide good protection, can be financed by taking short positions in puts whose strike price is below $q^\alpha(S(T))$. Such short positions in puts are ignored in the computation of VaR, since the probability that they are exercised is below α. Thus, we can obtain a position with a very small (even negative) VaR.

Figure 7.5 The discounted gain $X_{(x,z)}$ from Example 7.22 for various levels of c (left), and its distribution function (right). Curves are shown for c = 0, 10, 30, 50 and 80.

Example 7.23
Consider the data from Example 7.22. Suppose that we want to invest V0 = 1000 and decide to buy x = 20 shares of stock and hedge them with z2 = 0 and z3 = 20 put options with strike prices K2 and K3, respectively. Clearly V0 does not provide enough funds to enter such a position. We decide to finance our strategy by taking a short position in put options with strike price K1:
$$z_1 = \frac{1}{H_1(0)}\bigl(V_0 - xS(0) - z_3 H_3(0)\bigr) = -3056.$$
Clearly our strategy is not a good idea. Common sense dictates that the short position in unhedged puts will be catastrophic if $S(T) < K_1$. For instance, if the future price of stock should fall to, say, 70, then the value of the strategy would be
$$20 \cdot 70 - 3056 \cdot (75 - 70) + 20 \cdot (110 - 70) = -13\,080,$$
leading to a loss exceeding thirteen thousand. Since $K_1 = 75 < q^{\alpha}(S(T)) = 77.96$, this happens only with small probability,
$$P(S(T) < K_1) < P(S(T) \le q^{\alpha}(S(T))) = \alpha,$$
so such scenarios are ignored in the computation of VaR. For $K_1 \le S(T) \le K_3$ the terminal value equals $20 \cdot 110 = 2200$, and $e^{-rT} \cdot 2200 - 1000 \approx 1135$, so we obtain
$$\mathrm{VaR}^{\alpha}(X_{(x,z)}) = -1135,$$
indicating a gain of over a thousand at the considered confidence level.

This can lull us into a false sense of security, which becomes visible when we compare VaR with the size of potential losses for $S(T) < K_1$. This once again illustrates the most serious shortcoming of VaR as a risk measure: it ignores the size of the losses in the α-tail of the distribution.
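The figures in this example can be reproduced with a few lines of Python. This is a sketch under the assumption that the put prices are the rounded values $H_1(0) = 0.406$ and $H_3(0) = 12.042$ from Example 7.22; because it does not round $z_1$, the value at $S(T) = 70$ comes out at about $-13\,081$ rather than $-13\,080$.

```python
from math import exp, sqrt
from statistics import NormalDist

S0, mu, sigma, r, T, alpha, V0 = 100.0, 0.10, 0.2, 0.03, 1.0, 0.05, 1000.0
K1, K3 = 75.0, 110.0
H1, H3 = 0.406, 12.042        # put prices H_1(0), H_3(0) from Example 7.22
x, z3 = 20.0, 20.0

# short position in K1-puts that finances the purchase
z1 = (V0 - x * S0 - z3 * H3) / H1


def terminal_value(s):
    """Value at time T of the strategy (x, z1, 0, z3) when S(T) = s."""
    return x * s + z1 * max(K1 - s, 0.0) + z3 * max(K3 - s, 0.0)


# terminal_value is non-decreasing (slope x - z1 - z3 > 0 below K1, 0 on [K1, K3],
# x above K3), so Lemma 7.6 lets us evaluate it at the alpha-quantile of S(T)
q_S = S0 * exp((mu - 0.5 * sigma**2) * T + sigma * sqrt(T) * NormalDist().inv_cdf(alpha))
var_alpha = -(exp(-r * T) * terminal_value(q_S) - V0)

print(f"z1 = {z1:.0f}")
print(f"value if S(T) = 70: {terminal_value(70.0):.0f}")
print(f"VaR = {var_alpha:.0f}")
```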

We finish the section by showing how to compute VaR for investments in multiple assets. In such a case a simple analytic formula for VaR is not available, and we make use of the Monte Carlo method discussed in (7.8).

Example 7.24
Consider n stocks $S_1, \dots, S_n$, whose prices at time T evolve according to
$$S_j(T) = S_j(0)\exp\left(\left(\mu_j - \frac{\sigma_j^2}{2}\right)T + \sum_{l=1}^{n} c_{jl}\sqrt{T}\,Z_l\right),$$
where $Z_1, \dots, Z_n$ are independent identically distributed random variables (see [PF]) with standard normal distribution N(0, 1), $c_{jl} \in \mathbb{R}$ for $j, l = 1, \dots, n$ are fixed numbers, and
$$\sigma_j = \sqrt{c_{j1}^2 + \cdots + c_{jn}^2}.$$
Such distributions are used in the n-dimensional version of the Black–Scholes market (see also [BSM] for details).
Suppose that we split the investment V(0) amongst the securities, buying $x_1, \dots, x_n$ shares of assets $S_1, \dots, S_n$, respectively. For $i = 1, \dots, N$ and $l = 1, \dots, n$ we can simulate nN independent samples $\hat Z_l^i$ from the distribution N(0, 1), and define
$$\hat S_j^i(T) = S_j(0)\exp\left(\left(\mu_j - \frac{\sigma_j^2}{2}\right)T + \sum_{l=1}^{n} c_{jl}\sqrt{T}\,\hat Z_l^i\right).$$
(See [NMFC] for details on how to perform such simulations.) We define
$$\hat X_i = e^{-rT}\sum_{j=1}^{n} x_j \hat S_j^i(T) - V(0)$$
to obtain a sequence of simulated gains that can be used to estimate $\mathrm{VaR}^{\alpha}(X)$ using (7.8).

Figure 7.6 Monte Carlo simulation for VaR in the Black–Scholes market from Example 7.24.
In Figure 7.6 we have a plot of $F_{Y_N}$ obtained from N = 30 000 simulations, for the following parameters:
$$S_1(0) = 100, \quad S_2(0) = 200, \quad S_3(0) = 300,$$
$$\mu_1 = 10\%, \quad \mu_2 = 12\%, \quad \mu_3 = 14\%,$$
$$\begin{pmatrix} c_{11} & c_{12} & c_{13}\\ c_{21} & c_{22} & c_{23}\\ c_{31} & c_{32} & c_{33}\end{pmatrix} = \begin{pmatrix} 0.1 & 0.05 & 0\\ 0.05 & 0.2 & -0.1\\ 0 & -0.1 & 0.4\end{pmatrix},$$
taking V(0) = 1000, r = 5%, $x_1 = 3$, $x_2 = 2$, $x_3 = 1$ and $T = \frac{1}{12}$. The simulation gives $\mathrm{VaR}^{\alpha}(Y_N) = 47.5$, which is also marked on the plot.
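A minimal Monte Carlo sketch of this computation in Python, using only the standard library. It assumes α = 0.05 (the level marked in Figure 7.6) and that (7.8) takes $\mathrm{VaR}^{\alpha}(Y_N) = -q^{\alpha}(Y_N)$ for the empirical distribution $Y_N$ of the simulated gains. The function names and the seed are arbitrary, so the estimate varies slightly from run to run and need not equal 47.5 exactly.

```python
import math
import random


def simulate_gains(S0, mu, C, x, V0, r, T, n_sims, seed=0):
    """Simulated discounted gains X_i of the portfolio in Example 7.24."""
    rng = random.Random(seed)
    n = len(S0)
    sig2 = [sum(c * c for c in row) for row in C]  # sigma_j^2 = c_j1^2 + ... + c_jn^2
    gains = []
    for _ in range(n_sims):
        Z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent N(0, 1) samples
        total = 0.0
        for j in range(n):
            shock = sum(C[j][l] * Z[l] for l in range(n))
            S_T = S0[j] * math.exp((mu[j] - 0.5 * sig2[j]) * T + math.sqrt(T) * shock)
            total += x[j] * S_T
        gains.append(math.exp(-r * T) * total - V0)
    return gains


def empirical_var(samples, alpha):
    """VaR^alpha(Y_N) = -q^alpha(Y_N), where q^alpha(Y_N) = inf{x : F_{Y_N}(x) > alpha}."""
    ys = sorted(samples)
    k = math.floor(alpha * len(ys))  # smallest 0-based index with (k + 1) / N > alpha
    return -ys[k]


S0 = [100.0, 200.0, 300.0]
mu = [0.10, 0.12, 0.14]
C = [[0.1, 0.05, 0.0],
     [0.05, 0.2, -0.1],
     [0.0, -0.1, 0.4]]
x = [3, 2, 1]
gains = simulate_gains(S0, mu, C, x, V0=1000.0, r=0.05, T=1 / 12, n_sims=30_000, seed=7)
var_mc = empirical_var(gains, alpha=0.05)
print(f"VaR estimate: {var_mc:.1f}")
```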

Exercise 7.8 Recreate the numerical results from Example 7.24.



7.5 Proofs
Proposition 7.4
Let X, Y be random variables.
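(i) X ≥ Y implies $q^{\alpha}(X) \ge q^{\alpha}(Y)$.
(ii) For any b ∈ R, $q^{\alpha}(X + b) = q^{\alpha}(X) + b$.
(iii) For b > 0, $q^{\alpha}(bX) = b\,q^{\alpha}(X)$.
(iv) $q^{\alpha}(-X) = -q_{1-\alpha}(X)$, where $q_{\beta}(X) = \inf\{x : \beta \le F_X(x)\}$ denotes the lower β-quantile.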
Proof If X ≥ Y then
F X (x) = P(X ≤ x) ≤ P(Y ≤ x) = FY (x),
hence α < F X (x) implies that α < FY (x). This means that
{x : α < F X (x)} ⊂ {x : α < FY (x)}
which gives
qα (X) = inf{x : α < F X (x)} ≥ inf{x : α < FY (x)} = qα (Y).
The second property follows since with Y = X + b we have
FY (x + b) = P(X + b ≤ x + b) = F X (x),
so that
qα (X + b) = inf{x + b : α < FY (x + b)}
= inf{x : α < FY (x + b)} + b
= inf{x : α < F X (x)} + b
= qα (X) + b.
For b > 0 we have $P(bX \le x) = P(X \le x/b)$, so that
$$F_{bX}(x) = F_X(x/b),$$
hence
qα (bX) = inf{x : α < FbX (x)}
= inf{x : α < F X (x/b)}
= inf {by : α < F X (y)}
= b inf{y : α < F X (y)}
= bqα (X).
To prove (iv) we first need to show that for any b ∈ R
inf{x : b ≤ P (X ≤ x)} = inf{x : b ≤ P (X < x)}. (7.22)
Since P (X < x) ≤ P (X ≤ x) , if b ≤ P (X < x) then b ≤ P (X ≤ x) , which
means that

{x : b ≤ P (X < x)} ⊂ {x : b ≤ P (X ≤ x)},

hence

inf{x : b ≤ P (X < x)} ≥ inf{x : b ≤ P (X ≤ x)}.

We shall now rule out the possibility that the above inequality is strict. Suppose that
$$\inf\{x : b \le P(X \le x)\} < x^* < \inf\{x : b \le P(X < x)\}, \tag{7.23}$$
for some $x^* \in \mathbb{R}$. Then $P(X < x^*) < b$. Take any $\hat x \in \mathbb{R}$ with
$$\inf\{x : b \le P(X \le x)\} < \hat x < x^*.$$
Since $\hat x < x^*$, we have $\{X \le \hat x\} \subset \{X < x^*\}$, so that
$$P(X \le \hat x) \le P(X < x^*) < b. \tag{7.24}$$
On the other hand, $x \mapsto P(X \le x)$ is non-decreasing and $\hat x$ is greater than $\inf\{x : b \le P(X \le x)\}$, so we have
$$b \le P(X \le \hat x),$$
which contradicts (7.24). Hence no such $x^*$ exists, the inequality above is an equality, and (7.22) follows.
To prove (iv) we shall also use the fact that
$$F_{-X}(x) = P(-X \le x) = P(X \ge -x) = 1 - P(X < -x). \tag{7.25}$$
We can now compute
$$\begin{aligned}
q^{\alpha}(-X) &= \inf\{x : \alpha < F_{-X}(x)\}\\
&= -\sup\{-x : \alpha < F_{-X}(x)\}\\
&= -\sup\{-x : \alpha < 1 - P(X < -x)\} && (\text{using (7.25)})\\
&= -\sup\{y : \alpha < 1 - P(X < y)\} && (\text{taking } y = -x)\\
&= -\sup\{y : P(X < y) < 1 - \alpha\}\\
&= -\inf\{y : 1 - \alpha \le P(X < y)\} && (\text{since } y \mapsto P(X < y) \text{ is non-decreasing})\\
&= -\inf\{y : 1 - \alpha \le P(X \le y)\} && (\text{using (7.22)})\\
&= -\inf\{y : 1 - \alpha \le F_X(y)\}\\
&= -q_{1-\alpha}(X),
\end{aligned}$$
as required.
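Property (iv) mixes the upper quantile $q^{\alpha}$ and the lower quantile $q_{1-\alpha}$, and the distinction matters when $F_X$ has flat pieces. The following sketch checks (iv) with exact rational arithmetic on a small, arbitrarily chosen discrete distribution, and also shows that replacing $q_{1-\alpha}$ by $q^{1-\alpha}$ would break the identity. The helper functions are illustrative only.

```python
from fractions import Fraction


def cdf(dist, x):
    """F_X(x) = P(X <= x) for a finite distribution {value: probability}."""
    return sum(p for v, p in dist.items() if v <= x)


def upper_q(dist, beta):
    """q^beta(X) = inf{x : beta < F_X(x)}; for a finite distribution the inf is an atom."""
    return min(v for v in dist if beta < cdf(dist, v))


def lower_q(dist, beta):
    """q_beta(X) = inf{x : beta <= F_X(x)}."""
    return min(v for v in dist if beta <= cdf(dist, v))


X = {-1: Fraction(1, 4), 0: Fraction(1, 2), 2: Fraction(1, 4)}
minus_X = {-v: p for v, p in X.items()}

for alpha in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(7, 8)):
    # columns: q^alpha(-X), -q_{1-alpha}(X), -q^{1-alpha}(X)
    print(alpha, upper_q(minus_X, alpha), -lower_q(X, 1 - alpha), -upper_q(X, 1 - alpha))
```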
Lemma 7.6
Let X be a random variable. If f : R → R is right-continuous and non-
decreasing then
qα ( f (X)) = f (qα (X)).
Proof Since
F f (X) ( f (qα (X))) = P( f (X) ≤ f (qα (X)))
≥ P(X ≤ qα (X))
= F X (qα (X))
≥ α,
we see that
f (qα (X)) ≥ qα ( f (X)).
If we can show that y ≥ qα ( f (X)) whenever y > f (qα (X)) , then f (qα (X))
is the largest α-quantile for f (X).
Take any $y > f(q^{\alpha}(X))$. Since f is right-continuous and non-decreasing, the set $f^{-1}(-\infty, y)$ is an open interval of the form $(-\infty, a)$, for some $a \in (-\infty, +\infty]$. This gives
$$(-\infty, q^{\alpha}(X)] \subset \{x : f(x) \le f(q^{\alpha}(X))\} \subset \{x : f(x) < y\} = (-\infty, a),$$
which means that there exists an $x^*$ for which $q^{\alpha}(X) < x^* < a$. Since $q^{\alpha}(X) < x^*$, the definition of $q^{\alpha}(X)$ gives $\alpha < F_X(x^*)$; hence, with $Y = f(X)$,
$$F_Y(y) = P(Y \le y) \ge P(Y < y) = P(X < a) \ge P(X \le x^*) = F_X(x^*) > \alpha,$$
which implies that $y \ge q^{\alpha}(Y) = q^{\alpha}(f(X))$.
8
Coherent measures of risk

8.1 Average Value at Risk


8.2 Quantiles and representations of AVaR
8.3 AVaR in the Black–Scholes model
8.4 Coherence
8.5 Proofs

In the previous chapter Value at Risk was shown to have two potentially
undesirable features:
• VaR provides no information on the size of potential losses in scenarios
with probability less than α.
• VaR recorded for a diversified position may exceed that recorded for a
position with all funds held in one security.
On the other hand, VaR has the advantage of simplicity: it produces a single
number to quantify the risk of holding a given risky position. However, it
does this by taking account only of the α-quantile, rather than of the whole
distribution.
While VaR has retained much of its popularity with practitioners, many observers have commented that the 2007/8 banking crisis revealed that financial markets can be unduly optimistic in their evaluations of risk. This chapter takes its title from a seminal paper by Artzner, Delbaen, Eber and Heath in 1999,¹ which highlighted the defects of VaR and proceeded to set out, as axioms, four algebraic properties for risk measures to be coherent, as well as describing a wide class of such measures. This approach has since won many adherents and spawned a very considerable research literature, including further generalisations.
We introduce particular examples of coherent measures, beginning with the most natural adaptation of VaR, widely known as AVaR. We will derive equivalent expressions for this risk measure, show that it is sub-additive, compare it with other risk measures proposed as alternatives to VaR, and outline its generalisation to spectral measures. We will also examine AVaR in the Black–Scholes model by revisiting, with AVaR replacing VaR, the hedging techniques with European puts described in Section 7.4.

¹ P. Artzner, F. Delbaen, J.-M. Eber, D. Heath, Coherent measures of risk, Mathematical Finance 9 (1999), 203–228.

Figure 8.1 α times $\mathrm{AVaR}^{\alpha}(X)$ is the area for the loss corresponding to the tail of the distribution.

8.1 Average Value at Risk


We first examine how one might modify the definition of VaR to produce a measure of risk that retains simplicity without having the first shortcoming of VaR described above, by taking account of the entire α-tail of the distribution. This is most simply done by calculating $\mathrm{VaR}^{\beta}$ for all $\beta \le \alpha$ in (0, 1) and taking their average.
We assume that X denotes the (discounted) gain of some investment
project.
Definition 8.1
The Average Value at Risk of X is given by
$$\mathrm{AVaR}^{\alpha}(X) = \frac{1}{\alpha}\int_0^{\alpha}\mathrm{VaR}^{\beta}(X)\,d\beta = -\frac{1}{\alpha}\int_0^{\alpha} q^{\beta}(X)\,d\beta.$$
In Figure 8.1 the integral in the definition of AVaRα (X) is marked as the
shaded area for the loss corresponding to the tail of the distribution.
The properties of quantiles given in Proposition 7.4 from the previous chapter (property (iv) applied to −X) show that
$$\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}\int_0^{\alpha} q^{\beta}(X)\,d\beta = \frac{1}{\alpha}\int_0^{\alpha} q_{1-\beta}(-X)\,d\beta.$$
Unlike VaRα , this takes into account the impact of all the losses that oc-
cur with probability at most α: it provides an estimate of the losses implied
by events in the α-tail of the distribution of X. Informally, AVaRα provides
the ‘expected loss, conditioned on the worst 100α%’ of outcomes, whereas
VaRα provides the maximum loss in the ‘best 100(1 − α)%’ of outcomes.
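The upper and lower quantiles $q^{\beta}(X)$ and $q_{\beta}(X)$ differ only when $F_X$ is constant, equal to β, on the interval $[q_{\beta}(X), q^{\beta}(X))$. These intervals are disjoint for different β, so this happens for at most countably many β, a set of Lebesgue measure zero. This has the advantage that $\mathrm{AVaR}^{\alpha}(X)$ does not depend on the choice of the upper or lower β-quantiles in its defining integral, unlike the definition of $\mathrm{VaR}^{\alpha}(X)$.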
It seems natural to call AVaR the average value at risk, although the terms ‘conditional value at risk’ (CVaR) or ‘expected shortfall’ (ES) are also widely used in the literature for quantities that turn out to be equivalent to AVaR.
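Since $\beta \le \alpha$ implies $q^{\beta}(X) \le q^{\alpha}(X)$, it is clear that AVaR dominates VaR:
$$\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}\int_0^{\alpha} q^{\beta}(X)\,d\beta \ge -\frac{1}{\alpha}\int_0^{\alpha} q^{\alpha}(X)\,d\beta = -q^{\alpha}(X) = \mathrm{VaR}^{\alpha}(X).$$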
It is immediate from its definition that AVaR will share the properties of
VaR we recorded in Proposition 7.9.

Proposition 8.2
For X ≤ Y and any real number m we have:
(i) AVaRα (X) ≥ AVaRα (Y);
(ii) AVaRα (X + m) = AVaRα (X) − m;
(iii) for λ ≥ 0, AVaRα (λX) = λAVaRα (X).

Exercise 8.1 Verify properties (i)–(iii) in Proposition 8.2.

By its definition, AVaR provides a remedy for the first shortcoming of VaR noted earlier, since it takes into account the whole α-tail of the distribution. The second problem we noted was that VaR can suggest increased risk when portfolios are diversified. To show that AVaR does not share this defect we need to show that it is sub-additive; in other words, that AVaR has the following property:

Theorem 8.3 (Sub-additivity of AVaR)
For any portfolios X, Y
$$\mathrm{AVaR}^{\alpha}(X + Y) \le \mathrm{AVaR}^{\alpha}(X) + \mathrm{AVaR}^{\alpha}(Y).$$


This property is not evident directly from our definition of AVaR, and
the next section is devoted to proving this claim. The proof is given in
Corollary 8.11, which follows from Theorem 8.10.

8.2 Quantiles and representations of AVaR


In this section we derive an alternative formulation for AVaR, which will be
used for the proof of Theorem 8.3. It will also prove useful for calculations
in various examples.
We start with a technical lemma.
Lemma 8.4
Let X : Ω → R be a random variable. Assume that U is a uniformly distributed random variable on (0, 1). Then the random variable Y, defined by $Y(\omega) = q^{U(\omega)}(X)$, has the same distribution as X.
Proof See page 154.

Exercise 8.2 Prove that Lemma 8.4 holds also for $Y(\omega) = q_{U(\omega)}(X)$.

Now, with $f_U$ denoting the uniform density on (0, 1),
$$f_U(x) = \begin{cases} 1 & \text{if } x \in (0, 1),\\ 0 & \text{otherwise,}\end{cases}$$
we have
$$E(Y) = \int_{\mathbb{R}} q^{s}(X)\, f_U(s)\,ds = \int_0^1 q^{s}(X)\,ds.$$
Hence Lemma 8.4 implies that for any integrable random variable X we have
$$\int_0^1 q^{s}(X)\,ds = E(Y) = E(X), \tag{8.1}$$
since the distributions of X and Y are the same.
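Identity (8.1) can be illustrated numerically. The sketch below approximates $\int_0^1 q^s(X)\,ds$ with a midpoint rule for two continuous distributions whose quantile functions are known in closed form, chosen here purely for illustration: a normal distribution with mean 3 and an exponential distribution with mean 2. For such distributions $q^s(X) = F_X^{-1}(s)$.

```python
import math
from statistics import NormalDist


def integral_of_quantiles(quantile, n=50_000):
    """Midpoint-rule approximation of the integral of s -> q^s(X) over (0, 1)."""
    h = 1.0 / n
    return h * sum(quantile((i + 0.5) * h) for i in range(n))


normal = NormalDist(mu=3.0, sigma=2.0)
lam = 0.5  # exponential distribution with rate lam, mean 1 / lam = 2


def exp_quantile(s):
    """Inverse of F(x) = 1 - exp(-lam * x)."""
    return -math.log(1.0 - s) / lam


print(integral_of_quantiles(normal.inv_cdf))  # compare with E(X) = 3
print(integral_of_quantiles(exp_quantile))    # compare with E(X) = 2
```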

Exercise 8.3 Show that (8.1) holds also when we replace $q^{s}(X)$ with $q_{s}(X)$.
We now apply (8.1) to obtain an alternative description of AVaR.
Proposition 8.5
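For any α ∈ (0, 1)
$$\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}\Bigl[E\bigl(X\mathbf{1}_{\{X<q^{\alpha}(X)\}}\bigr) + q^{\alpha}(X)\bigl(\alpha - P(X<q^{\alpha}(X))\bigr)\Bigr]. \tag{8.2}$$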
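Proof Let $x^-$ denote the negative part of x, i.e. $x^- = -\min\{x, 0\}$. Since $f(x) = -x^-$ is a continuous non-decreasing function, by Lemma 7.6 for any random variable Y and any β ∈ (0, 1),
$$q^{\beta}(-Y^-) = q^{\beta}(f(Y)) = f(q^{\beta}(Y)) = -\bigl(q^{\beta}(Y)\bigr)^-. \tag{8.3}$$
Let us write $q^{\alpha}(X) = q^{\alpha}$ for ease of notation. The claim now follows by computing
$$\begin{aligned}
\mathrm{AVaR}^{\alpha}(X) &= -\frac{1}{\alpha}\int_0^{\alpha} q^{\beta}(X)\,d\beta\\
&= -\frac{1}{\alpha}\int_0^{\alpha}\bigl(q^{\beta}(X) - q^{\alpha}\bigr)\,d\beta - q^{\alpha}\\
&= -\frac{1}{\alpha}\int_0^{1} -\bigl(q^{\beta}(X) - q^{\alpha}\bigr)^-\,d\beta - q^{\alpha} && (\text{for } \beta \le \alpha,\ q^{\beta}(X) \le q^{\alpha})\\
&= -\frac{1}{\alpha}\int_0^{1} -\bigl(q^{\beta}(X - q^{\alpha})\bigr)^-\,d\beta - q^{\alpha} && (\text{by Proposition 7.4})\\
&= -\frac{1}{\alpha}\int_0^{1} q^{\beta}\bigl(-(X - q^{\alpha})^-\bigr)\,d\beta - q^{\alpha} && (\text{using (8.3)})\\
&= -\frac{1}{\alpha}E\bigl(-(X - q^{\alpha})^-\bigr) - q^{\alpha} && (\text{using (8.1)})\\
&= -\frac{1}{\alpha}\int_{\{X<q^{\alpha}\}}(X - q^{\alpha})\,dP - q^{\alpha}\\
&= -\frac{1}{\alpha}\left[\int_{\{X<q^{\alpha}\}} X\,dP - \int_{\{X<q^{\alpha}\}} q^{\alpha}\,dP + \alpha q^{\alpha}\right]\\
&= -\frac{1}{\alpha}\Bigl[E\bigl(X\mathbf{1}_{\{X<q^{\alpha}\}}\bigr) + q^{\alpha}\bigl(\alpha - P(X<q^{\alpha})\bigr)\Bigr].
\end{aligned}$$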

We can now formulate a corollary that allows us to compute AVaR for
discretely distributed random variables.
Corollary 8.6
Assume that X is a discrete random variable with $P(X = x_i) = p_i$, $p_1 + \cdots + p_N = 1$, and $x_1 < x_2 < \cdots < x_N$. Then
$$\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}\left(\sum_{i=1}^{k_\alpha - 1} p_i x_i + x_{k_\alpha}\Bigl(\alpha - \sum_{i=1}^{k_\alpha - 1} p_i\Bigr)\right),$$
where $k_\alpha \in \mathbb{N}$ is the largest number such that $\sum_{i=1}^{k_\alpha - 1} p_i \le \alpha$.

Proof By Lemma 7.13, $q^{\alpha}(X) = -\mathrm{VaR}^{\alpha}(X) = x_{k_\alpha}$, hence
$$P(X < q^{\alpha}(X)) = \sum_{i=1}^{k_\alpha - 1} p_i, \qquad E\bigl(X\mathbf{1}_{\{X<q^{\alpha}(X)\}}\bigr) = \sum_{i=1}^{k_\alpha - 1} p_i x_i,$$
and the claim follows from Proposition 8.5.
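Corollary 8.6 translates into a short loop. In this Python sketch (`avar_discrete` and `avar_from_definition` are hypothetical names) the distribution is an arbitrary example, and the result is cross-checked against Definition 8.1 by integrating $\mathrm{VaR}^{\beta}(X)$ over $\beta \in (0, \alpha)$ numerically.

```python
def avar_discrete(values, probs, alpha):
    """AVaR^alpha(X) from Corollary 8.6 for P(X = values[i]) = probs[i].

    `values` must be sorted increasingly; k_alpha is the largest index with
    p_1 + ... + p_{k_alpha - 1} <= alpha.
    """
    cum = 0.0      # p_1 + ... + p_{k-1}
    partial = 0.0  # p_1 x_1 + ... + p_{k-1} x_{k-1}
    for x_k, p_k in zip(values, probs):
        if cum + p_k > alpha:  # adding p_k would exceed alpha, so k = k_alpha
            return -(partial + x_k * (alpha - cum)) / alpha
        cum += p_k
        partial += p_k * x_k
    raise ValueError("probabilities must sum to 1")


def avar_from_definition(values, probs, alpha, n=100_000):
    """(1/alpha) * integral over (0, alpha) of VaR^beta(X) = -q^beta(X), midpoint rule."""
    def q_upper(beta):  # q^beta(X) = inf{x : beta < F_X(x)}
        cum = 0.0
        for v, p in zip(values, probs):
            cum += p
            if cum > beta:
                return v
        return values[-1]
    h = alpha / n
    return -sum(q_upper((i + 0.5) * h) for i in range(n)) * h / alpha


values, probs = [-10.0, -2.0, 5.0], [0.02, 0.08, 0.90]
print(avar_discrete(values, probs, 0.05), avar_from_definition(values, probs, 0.05))
```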


As for VaR, Corollary 8.6 can be used to estimate AVaR by Monte Carlo simulation. If $\hat X_1, \dots, \hat X_N$ are results of simulations following the same distribution as X, we define $Y_N$ as the discrete random variable with distribution
$$P(Y_N = \hat X_i) = \frac{1}{N} \quad \text{for } i = 1, \dots, N.$$
Since the distribution function $F_{Y_N}$ converges to $F_X$ as N tends to infinity, for sufficiently large N we can approximate $\mathrm{AVaR}^{\alpha}(X)$ by $\mathrm{AVaR}^{\alpha}(Y_N)$,
$$\mathrm{AVaR}^{\alpha}(X) \approx \mathrm{AVaR}^{\alpha}(Y_N). \tag{8.4}$$
The value $\mathrm{AVaR}^{\alpha}(Y_N)$ can easily be computed using Corollary 8.6. We shall implement this method in the following section, to compute AVaR in the n-dimensional Black–Scholes market (see Example 8.21).
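A sketch of the estimator (8.4) in Python: Corollary 8.6 with all N atoms of probability $1/N$ reduces to sorting the samples. To have something to compare with, the sketch uses Gaussian gains $X \sim N(m, s^2)$, for which the standard closed form $\mathrm{AVaR}^{\alpha}(X) = -m + s\,\varphi(N^{-1}(\alpha))/\alpha$ holds ($\varphi$ is the standard normal density). This formula, the chosen m and s, the seed and the name `empirical_avar` are assumptions of the illustration, not part of the text.

```python
import math
import random
from statistics import NormalDist


def empirical_avar(samples, alpha):
    """AVaR^alpha(Y_N) for the empirical distribution of the samples.

    This is Corollary 8.6 with every sample an atom of probability 1/N; repeated
    sample values give the same result as merging them into one atom.
    """
    ys = sorted(samples)
    n = len(ys)
    k = math.floor(alpha * n)  # 0-based index of q^alpha(Y_N)
    tail = sum(ys[:k]) / n     # the k smallest samples, each weighted 1/N
    return -(tail + ys[k] * (alpha - k / n)) / alpha


# Gaussian gain X ~ N(m, s^2), where AVaR^alpha(X) = -m + s * phi(N^{-1}(alpha)) / alpha
m, s, alpha = 5.0, 30.0, 0.05
rng = random.Random(3)
samples = [rng.gauss(m, s) for _ in range(200_000)]
std = NormalDist()
exact = -m + s * std.pdf(std.inv_cdf(alpha)) / alpha
estimate = empirical_avar(samples, alpha)
print(f"Monte Carlo: {estimate:.2f}, closed form: {exact:.2f}")
```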
From Proposition 8.5 we also have the following:
Corollary 8.7
If X is a random variable whose distribution function F X is strictly increas-
ing and continuous, then
AVaRα (X) = −E(X|X ≤ qα (X)).
Proof See page 155. 
For general distributions we need to allow for the possibility that F X has
a jump at α. The following lemma is helpful here.
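Lemma 8.8
For α ∈ (0, 1), let $q^{\alpha} = q^{\alpha}(X)$ and set
$$\mathbf{1}^{\alpha}_X = \begin{cases} \mathbf{1}_{\{X<q^{\alpha}\}} & \text{if } P(X = q^{\alpha}) = 0,\\ \mathbf{1}_{\{X<q^{\alpha}\}} + \kappa\,\mathbf{1}_{\{X=q^{\alpha}\}} & \text{if } P(X = q^{\alpha}) > 0,\end{cases} \tag{8.5}$$
where
$$\kappa = \frac{\alpha - P(X<q^{\alpha})}{P(X=q^{\alpha})}. \tag{8.6}$$
Then
$$E(\mathbf{1}^{\alpha}_X) = \alpha, \tag{8.7}$$
and for all ω ∈ Ω,
$$\mathbf{1}^{\alpha}_X(\omega) \in [0, 1]. \tag{8.8}$$
Proof See page 156.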
The reason for the definition of 1αX becomes clear in the next proposition,
which allows us to express AVaRα as an expectation.
Proposition 8.9
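For any α ∈ (0, 1),
$$\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}E\bigl(X\mathbf{1}^{\alpha}_X\bigr).$$
Proof As above, write $q^{\alpha}(X) = q^{\alpha}$. If $P(X = q^{\alpha}) = 0$, then $P(X<q^{\alpha}) = P(X \le q^{\alpha}) = \alpha$, so that the second term on the right in (8.2) vanishes and
$$\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}E\bigl(X\mathbf{1}_{\{X<q^{\alpha}\}}\bigr) = -\frac{1}{\alpha}E\bigl(X\mathbf{1}^{\alpha}_X\bigr).$$
If $P(X = q^{\alpha}) > 0$, then using the fact that
$$\int_{\{X=q^{\alpha}\}} X\,dP = \int_{\{X=q^{\alpha}\}} q^{\alpha}\,dP = q^{\alpha}P(X = q^{\alpha}), \tag{8.9}$$
we compute
$$\begin{aligned}
E\bigl(X\mathbf{1}^{\alpha}_X\bigr) &= E\left(X\mathbf{1}_{\{X<q^{\alpha}\}} + X\frac{\alpha - P(X<q^{\alpha})}{P(X=q^{\alpha})}\mathbf{1}_{\{X=q^{\alpha}\}}\right)\\
&= E\bigl(X\mathbf{1}_{\{X<q^{\alpha}\}}\bigr) + \int_{\{X=q^{\alpha}\}} X\,\frac{\alpha - P(X<q^{\alpha})}{P(X=q^{\alpha})}\,dP\\
&= E\bigl(X\mathbf{1}_{\{X<q^{\alpha}\}}\bigr) + \frac{\alpha - P(X<q^{\alpha})}{P(X=q^{\alpha})}\int_{\{X=q^{\alpha}\}} X\,dP\\
&= E\bigl(X\mathbf{1}_{\{X<q^{\alpha}\}}\bigr) + q^{\alpha}\bigl(\alpha - P(X<q^{\alpha})\bigr) && (\text{using (8.9)})\\
&= -\alpha\,\mathrm{AVaR}^{\alpha}(X), && (\text{using (8.2)})
\end{aligned}$$
as required.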
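For a finite distribution the objects in Lemma 8.8 and Proposition 8.9 can be computed exactly. The sketch below (hypothetical helper `weights_1alpha`, arbitrary example distribution) uses rational arithmetic to evaluate $\mathbf{1}^{\alpha}_X$ on each atom, confirm (8.7) and (8.8), and compute $\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}E(X\mathbf{1}^{\alpha}_X)$.

```python
from fractions import Fraction as Fr


def weights_1alpha(dist, alpha):
    """Values of 1_X^alpha from (8.5)-(8.6) on each atom of a finite distribution."""
    def cdf(x):
        return sum(p for v, p in dist.items() if v <= x)
    q = min(v for v in dist if alpha < cdf(v))        # q^alpha(X), always an atom here
    p_less = sum(p for v, p in dist.items() if v < q)  # P(X < q^alpha)
    p_eq = dist[q]                                     # P(X = q^alpha) > 0
    kappa = (alpha - p_less) / p_eq                    # (8.6)
    return {v: (Fr(1) if v < q else kappa if v == q else Fr(0)) for v in dist}


X = {-10: Fr(1, 50), -2: Fr(2, 25), 5: Fr(9, 10)}
alpha = Fr(1, 20)
w = weights_1alpha(X, alpha)
expectation = sum(w[v] * p for v, p in X.items())       # E(1_X^alpha), see (8.7)
avar = -sum(v * w[v] * p for v, p in X.items()) / alpha  # Proposition 8.9
print(w, expectation, avar)
```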
Let us observe that the random variable $Z(\omega) = \frac{1}{\alpha}\mathbf{1}^{\alpha}_X(\omega)$ is non-negative, integrable, bounded above by $\frac{1}{\alpha}$ and has expectation 1, as shown in Lemma 8.8. We can therefore define a new probability measure, which we denote by $Q^{\alpha}_X$, as
$$Q^{\alpha}_X(A) = \int_A Z\,dP.$$
In other words, Z is a Radon–Nikodym derivative, and the usual notation is to write
$$Z = \frac{dQ^{\alpha}_X}{dP}.$$
(See [PF] for the definition of the Radon–Nikodym derivative and for more details.)
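This shows that, using the measure $Q^{\alpha}_X$, the expression for $\mathrm{AVaR}^{\alpha}$ takes a surprisingly simple form
$$\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}E\bigl(X\mathbf{1}^{\alpha}_X\bigr) = -\frac{1}{\alpha}\int_{\Omega} X\mathbf{1}^{\alpha}_X\,dP = -\int_{\Omega} X\frac{dQ^{\alpha}_X}{dP}\,dP = -E_{Q^{\alpha}_X}(X).$$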
This will lead to a simple proof of its sub-additivity. First we need a repre-
sentation result.
Recall that a probability measure Q is absolutely continuous with respect to P, which we denote as $Q \ll P$, when P(A) = 0 implies Q(A) = 0. By the Radon–Nikodym theorem (see [PF]), for any Q absolutely continuous with respect to P there exists a Radon–Nikodym derivative $\frac{dQ}{dP}$, meaning that
$$Q(A) = \int_A \frac{dQ}{dP}\,dP.$$

Theorem 8.10
For α ∈ (0, 1) let
$$\mathcal{P}_{\alpha} = \left\{Q : Q \text{ is a probability measure},\ Q \ll P,\ \frac{dQ}{dP} \le \frac{1}{\alpha}\right\}.$$
Then
$$\sup\{-E_Q(X) : Q \in \mathcal{P}_{\alpha}\} = \mathrm{AVaR}^{\alpha}(X).$$

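Proof Let us write $q^{\alpha} = q^{\alpha}(X)$. Since
$$\frac{dQ^{\alpha}_X}{dP}(\omega) = \frac{1}{\alpha}\mathbf{1}^{\alpha}_X(\omega),$$
looking at the definition of $\mathbf{1}^{\alpha}_X$ in (8.5), we see that
$$\frac{dQ^{\alpha}_X}{dP}(\omega) = \frac{1}{\alpha} \quad \text{for } \omega \in \{X<q^{\alpha}\}, \tag{8.10}$$
$$\frac{dQ^{\alpha}_X}{dP}(\omega) = \frac{1}{\alpha}\kappa \quad \text{for } \omega \in \{X=q^{\alpha}\}, \tag{8.11}$$
$$\frac{dQ^{\alpha}_X}{dP}(\omega) = 0 \quad \text{for } \omega \in \{X>q^{\alpha}\}. \tag{8.12}$$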
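Let Q be an arbitrary measure in $\mathcal{P}_{\alpha}$. We compute
$$\begin{aligned}
E_Q(X) &= \int_{\Omega} X\frac{dQ}{dP}\,dP\\
&= \int_{\{X<q^{\alpha}\}} X\frac{dQ}{dP}\,dP + \int_{\{X=q^{\alpha}\}} X\frac{dQ}{dP}\,dP + \int_{\{X>q^{\alpha}\}} X\frac{dQ}{dP}\,dP\\
&= \int_{\{X<q^{\alpha}\}} X\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right)dP + \int_{\{X<q^{\alpha}\}} X\frac{dQ^{\alpha}_X}{dP}\,dP && (\text{see (8.10)})\\
&\quad + \int_{\{X=q^{\alpha}\}} X\left(\frac{dQ}{dP} - \frac{1}{\alpha}\kappa\right)dP + \int_{\{X=q^{\alpha}\}} X\frac{dQ^{\alpha}_X}{dP}\,dP && (\text{see (8.11)})\\
&\quad + \int_{\{X>q^{\alpha}\}} X\frac{dQ}{dP}\,dP + \int_{\{X>q^{\alpha}\}} X\frac{dQ^{\alpha}_X}{dP}\,dP && (\text{see (8.12)})\\
&= \int_{\{X<q^{\alpha}\}} X\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right)dP + \int_{\{X=q^{\alpha}\}} X\left(\frac{dQ}{dP} - \frac{1}{\alpha}\kappa\right)dP\\
&\quad + \int_{\{X>q^{\alpha}\}} X\frac{dQ}{dP}\,dP + \int_{\Omega} X\frac{dQ^{\alpha}_X}{dP}\,dP.
\end{aligned}$$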
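We now examine one by one the four integrals in the above expression. By definition, $\frac{dQ}{dP} \le \frac{1}{\alpha}$, hence on $\{X<q^{\alpha}\}$
$$(X - q^{\alpha})\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right) \ge 0,$$
giving
$$\int_{\{X<q^{\alpha}\}} X\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right)dP \ge \int_{\{X<q^{\alpha}\}} q^{\alpha}\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right)dP. \tag{8.13}$$
Evidently,
$$\int_{\{X=q^{\alpha}\}} X\left(\frac{dQ}{dP} - \frac{1}{\alpha}\kappa\right)dP = \int_{\{X=q^{\alpha}\}} q^{\alpha}\left(\frac{dQ}{dP} - \frac{1}{\alpha}\kappa\right)dP. \tag{8.14}$$
Since $\frac{dQ}{dP} \ge 0$,
$$\int_{\{X>q^{\alpha}\}} X\frac{dQ}{dP}\,dP \ge \int_{\{X>q^{\alpha}\}} q^{\alpha}\frac{dQ}{dP}\,dP. \tag{8.15}$$
Finally, for the last of the four integrals we see that
$$\int_{\Omega} X\frac{dQ^{\alpha}_X}{dP}\,dP = E_{Q^{\alpha}_X}(X). \tag{8.16}$$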
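Substituting (8.13)–(8.16) into our formula for $E_Q(X)$ we obtain
$$\begin{aligned}
E_Q(X) &\ge \int_{\{X<q^{\alpha}\}} q^{\alpha}\left(\frac{dQ}{dP} - \frac{1}{\alpha}\right)dP + \int_{\{X=q^{\alpha}\}} q^{\alpha}\left(\frac{dQ}{dP} - \frac{1}{\alpha}\kappa\right)dP\\
&\quad + \int_{\{X>q^{\alpha}\}} q^{\alpha}\frac{dQ}{dP}\,dP + E_{Q^{\alpha}_X}(X)\\
&= -\int_{\{X<q^{\alpha}\}} q^{\alpha}\frac{1}{\alpha}\,dP - \int_{\{X=q^{\alpha}\}} q^{\alpha}\frac{1}{\alpha}\kappa\,dP + \int_{\Omega} q^{\alpha}\frac{dQ}{dP}\,dP + E_{Q^{\alpha}_X}(X)\\
&= -q^{\alpha}\frac{1}{\alpha}P(X<q^{\alpha}) - q^{\alpha}\frac{1}{\alpha}\kappa P(X=q^{\alpha}) + q^{\alpha} + E_{Q^{\alpha}_X}(X)\\
&= E_{Q^{\alpha}_X}(X). && (\text{using (8.6)})
\end{aligned}$$
We have shown that $-E_Q(X) \le -E_{Q^{\alpha}_X}(X)$. Since $Q^{\alpha}_X \in \mathcal{P}_{\alpha}$ (by (8.8), $\frac{dQ^{\alpha}_X}{dP} = \frac{1}{\alpha}\mathbf{1}^{\alpha}_X \le \frac{1}{\alpha}$), this implies that
$$\sup\{-E_Q(X) : Q \in \mathcal{P}_{\alpha}\} = -E_{Q^{\alpha}_X}(X) = \mathrm{AVaR}^{\alpha}(X),$$
as required.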
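On a finite probability space the supremum in Theorem 8.10 can be found by a greedy rule: put the maximal density $1/\alpha$ on the worst outcomes until Q has total mass one (a fractional-knapsack argument). This Python sketch, with illustrative example data and hypothetical helper names, compares the greedy value with randomly generated densities in $\mathcal{P}_{\alpha}$; none of them should exceed it, and the greedy density coincides with $dQ^{\alpha}_X/dP$.

```python
import random


def sup_over_P_alpha(values, probs, alpha):
    """Maximise -E_Q(X) over densities 0 <= dQ/dP <= 1/alpha on a finite Omega."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # worst outcomes first
    density = [0.0] * len(values)
    mass_left = 1.0  # Q-mass still to be allocated
    for i in order:
        d = min(1.0 / alpha, mass_left / probs[i])
        density[i] = d
        mass_left -= d * probs[i]
        if mass_left <= 1e-15:
            break
    value = -sum(v * d * p for v, d, p in zip(values, density, probs))
    return value, density


def random_feasible_density(probs, alpha, rng):
    """A random density in P_alpha: rescale uniform weights, reject if above 1/alpha."""
    while True:
        u = [rng.random() for _ in probs]
        scale = 1.0 / sum(ui * p for ui, p in zip(u, probs))
        d = [scale * ui for ui in u]  # E_P(d) = 1, so d is a probability density
        if max(d) <= 1.0 / alpha:
            return d


values, probs, alpha = [-10.0, -2.0, 5.0], [0.02, 0.08, 0.90], 0.05
best, density = sup_over_P_alpha(values, probs, alpha)
rng = random.Random(0)
others = []
for _ in range(1000):
    d = random_feasible_density(probs, alpha, rng)
    others.append(-sum(v * di * p for v, di, p in zip(values, d, probs)))
print(best, density, max(others))
```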
We are finally ready to prove Theorem 8.3. The result follows from The-
orem 8.10 and we formulate it as a corollary.
Corollary 8.11
AVaR is sub-additive:
AVaRα (X + Y) ≤ AVaRα (X) + AVaRα (Y).
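Proof We use the fact that for two functions f, g : U → R, where U is an arbitrary set,
$$\sup_{x \in U}\{f(x) + g(x)\} \le \sup_{x \in U} f(x) + \sup_{x \in U} g(x). \tag{8.17}$$
Let us fix X and Y. We can apply (8.17) taking $U = \mathcal{P}_{\alpha}$, $f(Q) = -E_Q(X)$ and $g(Q) = -E_Q(Y)$ to obtain
$$\begin{aligned}
\mathrm{AVaR}^{\alpha}(X + Y) &= \sup\{-E_Q(X + Y) : Q \in \mathcal{P}_{\alpha}\}\\
&= \sup\{E_Q(-X) + E_Q(-Y) : Q \in \mathcal{P}_{\alpha}\}\\
&\le \sup\{E_Q(-X) : Q \in \mathcal{P}_{\alpha}\} + \sup\{E_Q(-Y) : Q \in \mathcal{P}_{\alpha}\} && (\text{using (8.17)})\\
&= \mathrm{AVaR}^{\alpha}(X) + \mathrm{AVaR}^{\alpha}(Y),
\end{aligned}$$
as required.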
The next exercise provides an alternative direct proof of sub-additivity. The idea is the same as in the proof of Theorem 8.10.

Exercise 8.4 Let AVaR be defined by (8.2). Given a probability space (Ω, F, P) and random variables X, Y : Ω → R with Z = X + Y, show that
$$\mathbf{1}^{\alpha}_Z - \mathbf{1}^{\alpha}_X \ge 0 \ \text{ if } X > q^{\alpha}(X), \qquad \mathbf{1}^{\alpha}_Z - \mathbf{1}^{\alpha}_X \le 0 \ \text{ if } X < q^{\alpha}(X),$$
and similarly with X replaced by Y. Exploit this fact to show that AVaR is sub-additive: $\mathrm{AVaR}^{\alpha}(Z) \le \mathrm{AVaR}^{\alpha}(X) + \mathrm{AVaR}^{\alpha}(Y)$.

We now consider a further risk measure whose definition is similar to the description of AVaR we found in Proposition 8.9.
Definition 8.12
We define the (upper) tail conditional expectation (TCE) of X as
$$\mathrm{TCE}^{\alpha}(X) = -E\bigl(X \mid X \le q^{\alpha}(X)\bigr) = -E\bigl(X \mid X \le -\mathrm{VaR}^{\alpha}(X)\bigr). \tag{8.18}$$
The next exercise shows that TCE shares the three properties already
verified for VaR and AVaR.

Exercise 8.5 Show that for X ≤ Y and any real number m we have:
(i) TCEα (X) ≥ TCEα (Y);
(ii) TCEα (X + m) = TCEα (X) − m;
(iii) for λ ≥ 0, TCEα (λX) = λTCEα (X).

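When $F_X$ is continuous, $\alpha = P(X \le q^{\alpha}(X)) = P(X < q^{\alpha}(X))$, so the second term in (8.2) vanishes and $\mathrm{AVaR}^{\alpha}(X) = -\frac{1}{\alpha}E\bigl(X\mathbf{1}_{\{X \le q^{\alpha}(X)\}}\bigr) = -E\bigl(X \mid X \le q^{\alpha}(X)\bigr)$. Hence for continuous $F_X$ we have
$$\mathrm{TCE}^{\alpha}(X) = \mathrm{AVaR}^{\alpha}(X). \tag{8.19}$$
Comparing (8.2) with (8.18), we see that $\mathrm{TCE}^{\alpha}$ has a simpler expression than $\mathrm{AVaR}^{\alpha}$. A natural question is therefore whether TCE is sub-additive in general. The next example shows that this is not true.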

Example 8.13
Let Ω = {ω1 , ω2 , ω3 } and
P({ω1 }) = P({ω2 }) = 0.03,
P({ω3 }) = 0.94.
Let α = 0.05 and define random variables X, Y by setting
X(ω1 ) = −100, X(ω2 ) = 0, X(ω3 ) = 0,
Y(ω1 ) = 0, Y(ω2 ) = −100, Y(ω3 ) = 0.
We claim that
TCEα (X + Y) > TCEα (X) + TCEα (Y).
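Since
$$q^{\alpha}(X) = \inf\{x : F_X(x) > 0.05\} = 0$$
and $\{X \le 0\} = \Omega$, we see that
$$\mathrm{TCE}^{\alpha}(X) = -E\bigl(X \mid X \le q^{\alpha}(X)\bigr) = -E(X \mid \Omega) = -E(X) = -\bigl[0.03 \times (-100) + 0.97 \times 0\bigr] = 3.$$
By an identical computation, also
$$\mathrm{TCE}^{\alpha}(Y) = 3.$$
On the other hand, Z = X + Y has
$$q^{\alpha}(Z) = \inf\{x : F_Z(x) > 0.05\} = -100,$$
and $\{Z \le q^{\alpha}(Z)\} = \{\omega_1, \omega_2\}$, hence
$$\begin{aligned}
\mathrm{TCE}^{\alpha}(Z) &= -E\bigl(Z \mid Z \le q^{\alpha}(Z)\bigr)\\
&= -\frac{1}{P(Z \le q^{\alpha}(Z))}\bigl(Z(\omega_1)P(\{\omega_1\}) + Z(\omega_2)P(\{\omega_2\})\bigr)\\
&= -\frac{1}{0.06}\bigl(-100 \times 0.03 - 100 \times 0.03\bigr)\\
&= 100.
\end{aligned}$$
This demonstrates a serious shortcoming of the tail conditional expectation as a risk measure. Since
$$\mathrm{TCE}^{\alpha}\left(\tfrac{1}{2}X + \tfrac{1}{2}Y\right) = \tfrac{1}{2}\mathrm{TCE}^{\alpha}(X + Y) = 50 > 3 = \tfrac{1}{2}\bigl[\mathrm{TCE}^{\alpha}(X) + \mathrm{TCE}^{\alpha}(Y)\bigr],$$
the diversified position consisting of investing one-half of our funds in each of X and Y is riskier than placing the whole fund in one or the other.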
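The computation in Example 8.13 can be replayed exactly with rational arithmetic. This small Python sketch evaluates (8.18) for X, Y, X + Y and the half-and-half position $\frac{1}{2}X + \frac{1}{2}Y$; the function names are illustrative.

```python
from fractions import Fraction as Fr

P = {"w1": Fr(3, 100), "w2": Fr(3, 100), "w3": Fr(94, 100)}
alpha = Fr(5, 100)


def upper_quantile(V):
    """q^alpha(V) = inf{x : F_V(x) > alpha}, attained at one of the values of V."""
    return min(x for x in V.values() if alpha < sum(P[w] for w in P if V[w] <= x))


def tce(V):
    """TCE^alpha(V) = -E(V | V <= q^alpha(V)), formula (8.18)."""
    q = upper_quantile(V)
    tail = [w for w in P if V[w] <= q]
    return -sum(V[w] * P[w] for w in tail) / sum(P[w] for w in tail)


X = {"w1": -100, "w2": 0, "w3": 0}
Y = {"w1": 0, "w2": -100, "w3": 0}
Z = {w: X[w] + Y[w] for w in P}
half = {w: Fr(X[w] + Y[w], 2) for w in P}
print(tce(X), tce(Y), tce(Z), tce(half))
```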

The example shows that TCE shares the same defect as VaR. Fortunately
AVaR, even though its computation is slightly more involved, has much
more desirable properties.

Exercise 8.6 Consider the same X and Y as in Example 8.13. Compute $\mathrm{AVaR}^{\alpha}(X)$, $\mathrm{AVaR}^{\alpha}(Y)$ and $\mathrm{AVaR}^{\alpha}(X + Y)$, and compare with Example 8.13.

8.3 AVaR in the Black–Scholes model


In this section we discuss how to compute AVaR in the setting of the Black–Scholes model. Let us recall that, under the assumptions of the model, the future stock price at time T is
$$S(T) = S(0)\,e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}Z}, \tag{8.20}$$
where S(0) > 0, µ ∈ R, σ > 0, and Z is a random variable with standard normal distribution N(0, 1).
Before computing AVaR, we start with a technical lemma.

Lemma 8.14
For any q ∈ R
$$E\bigl(S(T) \mid Z \le q\bigr) = \frac{1}{N(q)}\,S(0)e^{\mu T}N\bigl(q - \sigma\sqrt{T}\bigr),$$
where N(q) is the standard normal cumulative distribution function, i.e.
$$N(q) = \int_{-\infty}^{q}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx.$$
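Proof Since $P(Z \le q) = N(q) > 0$,
$$\begin{aligned}
E\bigl(S(T) \mid Z \le q\bigr) &= \frac{1}{P(Z \le q)}\int_{-\infty}^{q} S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}x}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx\\
&= \frac{1}{N(q)}S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T}\int_{-\infty}^{q}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2 - 2\sigma\sqrt{T}x}{2}}\,dx\\
&= \frac{1}{N(q)}S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \frac{\sigma^2 T}{2}}\int_{-\infty}^{q}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2 - 2\sigma\sqrt{T}x + \sigma^2 T}{2}}\,dx\\
&= \frac{1}{N(q)}S(0)e^{\mu T}\int_{-\infty}^{q}\frac{1}{\sqrt{2\pi}}e^{-\frac{(x - \sigma\sqrt{T})^2}{2}}\,dx\\
&= \frac{1}{N(q)}S(0)e^{\mu T}\int_{-\infty}^{q - \sigma\sqrt{T}}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx\\
&= \frac{1}{N(q)}S(0)e^{\mu T}N\bigl(q - \sigma\sqrt{T}\bigr),
\end{aligned}$$
as required.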
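Lemma 8.14 can be sanity-checked by simulation: average S(T) over the scenarios with $Z \le q$ and compare with the closed form. The parameter values, the level q and the seed below are arbitrary choices for the illustration.

```python
import math
import random
from statistics import NormalDist

S0, mu, sigma, T, q = 100.0, 0.10, 0.2, 1.0, -1.0  # illustrative values
N = NormalDist()

closed_form = S0 * math.exp(mu * T) * N.cdf(q - sigma * math.sqrt(T)) / N.cdf(q)

rng = random.Random(1)
total, count = 0.0, 0
for _ in range(400_000):
    z = rng.gauss(0.0, 1.0)
    if z <= q:  # keep only the scenarios in the event {Z <= q}
        total += S0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
        count += 1
monte_carlo = total / count
print(f"closed form {closed_form:.3f}, Monte Carlo {monte_carlo:.3f}")
```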

We are now ready to compute AVaR for an investment in stock.

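Lemma 8.15
For the discounted gain
$$X = e^{-rT}S(T) - S(0)$$
we have
$$\mathrm{AVaR}^{\alpha}(X) = S(0) - \frac{1}{\alpha}S(0)e^{(\mu - r)T}N\bigl(q^{\alpha}(Z) - \sigma\sqrt{T}\bigr).$$
Proof By Lemma 7.17 we know that
$$q^{\alpha}(S(T)) = S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}q^{\alpha}(Z)}, \tag{8.21}$$
therefore
$$\begin{aligned}
\{X \le q^{\alpha}(X)\} &= \bigl\{e^{-rT}S(T) - S(0) \le q^{\alpha}\bigl(e^{-rT}S(T) - S(0)\bigr)\bigr\}\\
&= \bigl\{e^{-rT}S(T) - S(0) \le e^{-rT}q^{\alpha}(S(T)) - S(0)\bigr\} && (\text{by Proposition 7.4})\\
&= \{S(T) \le q^{\alpha}(S(T))\}\\
&= \{Z \le q^{\alpha}(Z)\}. && (\text{compare (8.20) with (8.21)})
\end{aligned}$$
Since X has a continuous distribution, by (8.19) this gives
$$\begin{aligned}
\mathrm{AVaR}^{\alpha}(X) &= \mathrm{TCE}^{\alpha}(X)\\
&= -E\bigl(X \mid X \le q^{\alpha}(X)\bigr)\\
&= -E\bigl(e^{-rT}S(T) - S(0) \mid Z \le q^{\alpha}(Z)\bigr)\\
&= S(0) - e^{-rT}E\bigl(S(T) \mid Z \le q^{\alpha}(Z)\bigr)\\
&= S(0) - \frac{1}{\alpha}S(0)e^{(\mu - r)T}N\bigl(q^{\alpha}(Z) - \sigma\sqrt{T}\bigr), && (\text{by Lemma 8.14, since } N(q^{\alpha}(Z)) = \alpha)
\end{aligned}$$
as required.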
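Lemma 8.15 gives AVaR in closed form. The following Python sketch evaluates it per share, together with VaR computed from (8.21), using the parameters of Example 8.20 (S(0) = 100, µ = 10%, r = 3%, σ = 0.2, T = 1, α = 0.05); multiplied by 10 shares, the results can be compared with the row c = 0 of the tables in Examples 7.22 and 8.20. The function names are hypothetical.

```python
import math
from statistics import NormalDist

N = NormalDist()


def avar_stock(S0, mu, r, sigma, T, alpha):
    """AVaR^alpha of X = e^{-rT} S(T) - S(0) in the Black-Scholes model (Lemma 8.15)."""
    q_alpha_Z = N.inv_cdf(alpha)
    return S0 - S0 * math.exp((mu - r) * T) * N.cdf(q_alpha_Z - sigma * math.sqrt(T)) / alpha


def var_stock(S0, mu, r, sigma, T, alpha):
    """VaR^alpha of the same X, using q^alpha(S(T)) from (8.21)."""
    q_S = S0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * N.inv_cdf(alpha))
    return S0 - math.exp(-r * T) * q_S


params = dict(S0=100.0, mu=0.10, r=0.03, sigma=0.2, T=1.0, alpha=0.05)
a, v = avar_stock(**params), var_stock(**params)
print(f"per share: AVaR = {a:.3f}, VaR = {v:.3f}")  # AVaR dominates VaR
```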

Exercise 8.7 Consider holding x > 0 shares of stock S and investing a cash sum y risk-free at time 0. The values of this trading strategy (x, y) at times 0, T are
$$V_{(x,y)}(0) = xS(0) + y, \qquad V_{(x,y)}(T) = xS(T) + ye^{rT}.$$
Compute $\mathrm{AVaR}^{\alpha}(X_{(x,y)})$ for
$$X_{(x,y)} = e^{-rT}V_{(x,y)}(T) - V_{(x,y)}(0).$$
Show that if y > 0, then $\mathrm{AVaR}^{\alpha}(X_{(x,y)})$ is smaller than the AVaR of a position where $V_{(x,y)}(0)$ is invested only in stock.

Exercise 8.8 Consider buying x > 0 shares of stock S and entering into θ ∈ [0, x] forward contracts to sell the stock at time T (a short forward position), for the forward price $F = S(0)e^{rT}$. The value of the trading strategy (x, θ) is
$$V_{(x,\theta)}(0) = xS(0), \qquad V_{(x,\theta)}(T) = xS(T) + \theta\bigl(F - S(T)\bigr).$$
Compute $\mathrm{AVaR}^{\alpha}(X_{(x,\theta)})$ for
$$X_{(x,\theta)} = e^{-rT}V_{(x,\theta)}(T) - V_{(x,\theta)}(0).$$
Show that, for θ > 0, $\mathrm{AVaR}^{\alpha}(X_{(x,\theta)})$ is smaller than the AVaR of a position without the forward contract.
We now turn our attention to hedging AVaR with European put options. Assume that at time zero we buy x shares of stock and z European put options with strike price K and exercise date T. The value of the investment is given at t = 0, T by
$$V_{(x,z)}(t) = xS(t) + zH(t),$$
where H(T) is the put option payoff
$$H(T) = (K - S(T))^+,$$
and H(0) is the put option price
$$H(0) = P(r, T, K, S(0), \sigma) = Ke^{-rT}N(-d_-) - S(0)N(-d_+), \tag{8.22}$$
where
$$d_+ = d_+(r, T, K, S(0), \sigma) = \frac{\ln\frac{S(0)}{K} + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \qquad d_- = d_-(r, T, K, S(0), \sigma) = \frac{\ln\frac{S(0)}{K} + \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}.$$
The discounted gain of the investment is
$$X_{(x,z)} = e^{-rT}V_{(x,z)}(T) - V_{(x,z)}(0).$$
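Our aim will be to compute $\mathrm{AVaR}^{\alpha}(X_{(x,z)})$. First we need to introduce some notation. We write
$$d^{\mu}_- = d_-(\mu, T, K, S(0), \sigma), \qquad d^{\mu}_+ = d^{\mu}_- + \sigma\sqrt{T},$$
$$d^{\mu,\alpha}_- = \max\bigl(d^{\mu}_-, -q^{\alpha}(Z)\bigr), \qquad d^{\mu,\alpha}_+ = d^{\mu,\alpha}_- + \sigma\sqrt{T},$$
and
$$P^{\alpha}(K) = Ke^{-\mu T}N\bigl(-d^{\mu,\alpha}_-\bigr) - S(0)N\bigl(-d^{\mu,\alpha}_+\bigr). \tag{8.23}$$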
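The quantities (8.22) and (8.23) are straightforward to evaluate. Here is a Python sketch with hypothetical function names, using the parameters of Example 8.20, which produces the vectors $H(0)$ and $P^{\alpha}$ used there; note that $d^{\mu}_-$ is $d_-$ with µ in place of r.

```python
import math
from statistics import NormalDist

N = NormalDist()


def d_minus(r, T, K, S0, sigma):
    return (math.log(S0 / K) + (r - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))


def put_price(r, T, K, S0, sigma):
    """H(0) = P(r, T, K, S(0), sigma), formula (8.22)."""
    dm = d_minus(r, T, K, S0, sigma)
    dp = dm + sigma * math.sqrt(T)
    return K * math.exp(-r * T) * N.cdf(-dm) - S0 * N.cdf(-dp)


def p_alpha(K, S0, mu, sigma, T, alpha):
    """P^alpha(K), formula (8.23)."""
    dm = max(d_minus(mu, T, K, S0, sigma), -N.inv_cdf(alpha))  # d_-^{mu,alpha}
    dp = dm + sigma * math.sqrt(T)                              # d_+^{mu,alpha}
    return K * math.exp(-mu * T) * N.cdf(-dm) - S0 * N.cdf(-dp)


S0, mu, sigma, r, T, alpha = 100.0, 0.10, 0.2, 0.03, 1.0, 0.05
strikes = (75.0, 90.0, 110.0)
H0 = [put_price(r, T, K, S0, sigma) for K in strikes]
P_alpha = [p_alpha(K, S0, mu, sigma, T, alpha) for K in strikes]
print([round(h, 3) for h in H0], [round(p, 3) for p in P_alpha])
```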

Proposition 8.16
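If z ∈ [0, x], then
$$\mathrm{AVaR}^{\alpha}(X_{(x,z)}) = V_{(x,z)}(0) - \frac{1}{\alpha}e^{(\mu - r)T}\Bigl[xS(0)N\bigl(q^{\alpha}(Z) - \sigma\sqrt{T}\bigr) + zP^{\alpha}(K)\Bigr].$$
Proof We first observe that
$$X_{(x,z)} = e^{-rT}V_{(x,z)}(T) - V_{(x,z)}(0) = e^{-rT}\bigl(xS(T) + z(K - S(T))^+\bigr) - V_{(x,z)}(0). \tag{8.24}$$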
Figure 8.2 $F_{X_{(x,z)}}$ for various z. The dotted line represents $X_{(x,z)}$ for S(T) = K.

Since z ≤ x, we see that
$$s \mapsto e^{-rT}\bigl(xs + z(K - s)^+\bigr) - V_{(x,z)}(0) \tag{8.25}$$
is a non-decreasing function of s. Also
$$\xi \mapsto S(0)\exp\left(\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\xi\right)$$
is increasing. We first prove the claim for z < x. Then (8.25) is strictly increasing, so combining these two facts with Lemma 7.6 gives
$$\{X_{(x,z)} \le q^{\alpha}(X_{(x,z)})\} = \{S(T) \le q^{\alpha}(S(T))\} = \{Z \le q^{\alpha}(Z)\}, \tag{8.26}$$
and therefore
$$P\bigl(X_{(x,z)} < q^{\alpha}(X_{(x,z)})\bigr) = P\bigl(S(T) < q^{\alpha}(S(T))\bigr) = \alpha,$$
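and by Proposition 8.5,
$$\begin{aligned}
\mathrm{AVaR}^{\alpha}(X_{(x,z)}) &= -E\bigl(X_{(x,z)} \mid X_{(x,z)} \le q^{\alpha}(X_{(x,z)})\bigr)\\
&= -E\bigl(X_{(x,z)} \mid Z \le q^{\alpha}(Z)\bigr) && (\text{by (8.26)})\\
&= V_{(x,z)}(0) - e^{-rT}xE\bigl(S(T) \mid Z \le q^{\alpha}(Z)\bigr) && (\text{see (8.24)})\\
&\quad - e^{-rT}zE\bigl((K - S(T))^+ \mid Z \le q^{\alpha}(Z)\bigr).
\end{aligned} \tag{8.27}$$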
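We now compute the last term in (8.27). By (8.20),
$$\{S(T) \le K\} = \{Z \le -d^{\mu}_-\},$$
hence
$$\begin{aligned}
E\bigl((K - S(T))^+ \mid Z \le q^{\alpha}(Z)\bigr) &= E\bigl((K - S(T))\mathbf{1}_{\{Z \le -d^{\mu}_-\}} \mid Z \le q^{\alpha}(Z)\bigr)\\
&= \frac{1}{\alpha}\int_{-\infty}^{\min(q^{\alpha}(Z), -d^{\mu}_-)}\Bigl(K - S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}x}\Bigr)\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx\\
&= \frac{1}{\alpha}\int_{-\infty}^{-d^{\mu,\alpha}_-} K\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx && (\min(a,b) = -\max(-a,-b))\\
&\quad - \frac{1}{\alpha}\int_{-\infty}^{-d^{\mu,\alpha}_-} S(0)e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}x}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx\\
&= \frac{1}{\alpha}KN\bigl(-d^{\mu,\alpha}_-\bigr) - \frac{1}{\alpha}P\bigl(Z \le -d^{\mu,\alpha}_-\bigr)E\bigl(S(T) \mid Z \le -d^{\mu,\alpha}_-\bigr)\\
&= \frac{1}{\alpha}KN\bigl(-d^{\mu,\alpha}_-\bigr) - \frac{1}{\alpha}S(0)e^{\mu T}N\bigl(-d^{\mu,\alpha}_- - \sigma\sqrt{T}\bigr) && (\text{by Lemma 8.14})\\
&= \frac{1}{\alpha}e^{\mu T}\Bigl(Ke^{-\mu T}N\bigl(-d^{\mu,\alpha}_-\bigr) - S(0)N\bigl(-d^{\mu,\alpha}_+\bigr)\Bigr) = \frac{1}{\alpha}e^{\mu T}P^{\alpha}(K).
\end{aligned}$$
Substituting the above into (8.27) and applying Lemma 8.14 gives the claim.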
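We now need to consider the case when z = x. Since for any β ∈ (0, 1) (see Figure 8.2)
$$\lim_{z \nearrow x} q^{\beta}(X_{(x,z)}) = q^{\beta}(X_{(x,x)}),$$
we obtain
$$\lim_{z \nearrow x}\mathrm{AVaR}^{\alpha}(X_{(x,z)}) = \lim_{z \nearrow x}\frac{-1}{\alpha}\int_0^{\alpha} q^{\beta}(X_{(x,z)})\,d\beta = \frac{-1}{\alpha}\int_0^{\alpha} q^{\beta}(X_{(x,x)})\,d\beta = \mathrm{AVaR}^{\alpha}(X_{(x,x)}).$$
Hence the result follows from the fact that the formula for $\mathrm{AVaR}^{\alpha}(X_{(x,z)})$ in the claim is continuous with respect to z.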

Exercise 8.9 Show that if x = z and $K \ge q^{\alpha}(S(T))$, then $\mathrm{AVaR}^{\alpha}(X_{(x,z)}) = \mathrm{VaR}^{\alpha}(X_{(x,z)})$.

Figure 8.3 AVaR of a fixed position in x stocks, hedged with puts (parameters of the model are as in Exercise 8.11).

Example 8.17
Suppose that we spend V0 to buy a fixed number x of stocks, together with
z put options. The number of options we can buy depends on the choice of
the strike price K,
$$z = z(K) = \frac{V_0 - xS(0)}{P(r, T, K, S(0), \sigma)}.$$
We consider AVaRα (X(x,z(K)) ) for K such that z(K) ≤ x.
In Figure 8.3 we see that the smallest AVaR is attained for the smallest
considered strike price, for which z(K) = x. On the plot we also see that
AVaR dominates VaR, and that the two are equal when z(K) = x.

Exercise 8.10 Show that
$$E(X_{(x,z)}) = e^{(\mu - r)T}\bigl(xS(0) + zP(\mu, T, K, S(0), \sigma)\bigr) - V_{(x,z)}(0).$$

Example 8.18
From Example 8.17 we see that AVaR is minimised when we buy the same number of shares of stock and European put options. Suppose therefore that we invest V0 to buy x shares of stock and x puts. Here x depends on the choice of the strike price K (since the higher the strike, the more expensive the put), and follows from the constraint
$$xS(0) + xP(r, T, K, S(0), \sigma) = V_0,$$
which gives
$$x = x(K) = \frac{V_0}{S(0) + P(r, T, K, S(0), \sigma)}.$$
By making plots of
$$\bigl\{\bigl(K, \mathrm{AVaR}^{\alpha}(X_{(x(K),x(K))})\bigr) \mid K \ge 0\bigr\}$$
and
$$\bigl\{\bigl(\mathrm{AVaR}^{\alpha}(X_{(x(K),x(K))}), E(X_{(x(K),x(K))})\bigr) \mid K \ge 0\bigr\},$$
we obtain the graphs shown in Figure 8.4.

Figure 8.4 AVaR of a position in the same number of stocks and puts, for data from Exercise 8.11.

On the left-hand plot we can see that a high strike price reduces the AVaR to zero. From the right-hand plot we see, however, that this is done at the expense of also reducing the discounted expected gain to zero. For K = 0 the associated AVaR and expected gain are the same as those for an investment in stock (represented by the dot in the right-hand plot).

Exercise 8.11 Consider V0 = 100, S(0) = 100, µ = 10%, r = 3% and α = 0.05. As in Example 8.18, assume that we buy the same number of shares of stock and European put options. Recreate numerically the plot from Figure 8.4.
Add to the right-hand plot in Figure 8.4 the set of points
$$\bigl\{\bigl(\mathrm{AVaR}^{\alpha}(X_{(x,y)}), E(X_{(x,y)})\bigr) \mid y \ge 0,\ xS(0) + y = V_0\bigr\},$$
attainable by investing in stock and the risk-free asset.
Which is more efficient, hedging AVaR with puts or diversifying between the stock and the risk-free asset?

Exercise 8.12 In a similar fashion to Exercise 8.11, compare hedging with puts and hedging with forward contracts.

We now consider what happens when we do not have full freedom of choice of the strike price. Assume that we can invest in n European put options with maturity T and strike prices $K_1, \dots, K_n$. We denote the value at time t of the option with strike price $K_i$ as $H_i(t)$ and write
$$H(t) = (H_1(t), \dots, H_n(t)).$$
Assume that we buy x shares of stock and $z_i$ puts with strike prices $K_i$, for $i = 1, \dots, n$. The position in puts is determined by the vector
$$z = (z_1, \dots, z_n) \in \mathbb{R}^n.$$
The value of our investment at time t is
$$V_{(x,z)}(t) = xS(t) + z^{T}H(t),$$
and the discounted gain is
$$X_{(x,z)} = e^{-rT}V_{(x,z)}(T) - V_{(x,z)}(0).$$
Proposition 8.19
If $z_i \ge 0$ for $i = 1, \dots, n$ and $z_1 + \cdots + z_n \le x$, then
$$\mathrm{AVaR}^{\alpha}(X_{(x,z)}) = V_{(x,z)}(0) - \frac{1}{\alpha}e^{(\mu - r)T}\Bigl[xS(0)N\bigl(q^{\alpha}(Z) - \sigma\sqrt{T}\bigr) + z^{T}P^{\alpha}\Bigr], \tag{8.28}$$
where $P^{\alpha} = (P^{\alpha}(K_1), \dots, P^{\alpha}(K_n))$.
The proof of the proposition follows along the same lines as the proof of Proposition 8.16. We leave it as an exercise.

Exercise 8.13 Prove Proposition 8.19.

Let us now assume that x is fixed. We investigate how to minimise $\mathrm{AVaR}^{\alpha}(X_{(x,z)})$ by choosing z. We assume that we invest V0, which means that we can spend
$$c = V_0 - xS(0)$$
on put options. We assume that we do not take short positions in stock or puts, and that the total number of options does not exceed the number of shares of stock in our portfolio.
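Under such assumptions $V_{(x,z)}(0) = V_0$ is fixed, and by (8.28) the choice of $z$ affects $\mathrm{AVaR}_\alpha(X_{(x,z)})$ only through the term $-\frac{1}{\alpha}e^{(\mu-r)T} z^T P_\alpha$, whose coefficient is negative. Minimising $\mathrm{AVaR}_\alpha(X_{(x,z)})$ is therefore equivalent to the problem:
Find
$$\max_{z}\ z^T P_\alpha \quad \text{subject to:} \quad z^T H(0) = c, \quad z^T \mathbf{1} \le x, \quad z_1, \ldots, z_n \ge 0. \qquad (8.29)$$
This is a linear programming problem, which can be solved numerically.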
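For a handful of strikes, problem (8.29) can be solved without an LP library. Because minimising (8.28) over $z$ amounts to maximising $z^T P_\alpha$, and the feasible set has only two constraints besides $z \ge 0$, an optimum can be found among the basic feasible solutions, which have at most two non-zero $z_i$. The Python sketch below enumerates those candidates; the function name is hypothetical, and it assumes every $H_i(0) > 0$ so that the feasible set is bounded. For larger $n$ a general LP solver (for example `scipy.optimize.linprog`) would be the usual choice.

```python
from itertools import combinations


def solve_put_lp(c, x, H0, P_alpha, tol=1e-12):
    """Maximise z^T P_alpha subject to z^T H0 = c, sum(z) <= x, z >= 0.

    Enumerates basic feasible solutions (at most two non-zero z_i);
    assumes every H0[i] > 0, so the feasible set is bounded.
    """
    n = len(H0)
    candidates = []
    # Slack in the count constraint: one option type absorbs the whole budget c.
    for i in range(n):
        zi = c / H0[i]
        if zi <= x + tol:
            z = [0.0] * n
            z[i] = zi
            candidates.append(z)
    # Count constraint binding: two option types with z_i + z_j = x.
    for i, j in combinations(range(n), 2):
        if abs(H0[j] - H0[i]) < tol:
            continue
        zj = (c - H0[i] * x) / (H0[j] - H0[i])
        zi = x - zj
        if zi >= -tol and zj >= -tol:
            z = [0.0] * n
            z[i], z[j] = max(zi, 0.0), max(zj, 0.0)
            candidates.append(z)
    if not candidates:
        raise ValueError("problem (8.29) is infeasible for these inputs")
    return max(candidates, key=lambda z: sum(a * b for a, b in zip(z, P_alpha)))


H0 = [0.406, 2.769, 12.042]  # H(0) as given in Example 8.20
P_alpha = [0.140, 0.819, 1.724]  # P_alpha as given in Example 8.20
for c in (0, 10, 30, 50, 80):
    x = (1000 - c) / 100  # x S(0) + c = V_0 with S(0) = 100, V_0 = 1000
    print(c, x, [round(v, 2) for v in solve_put_lp(c, x, H0, P_alpha)])
```

With the data of Example 8.20 this should reproduce the $z$ columns of the table in that example to within about 0.01.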

Example 8.20
Consider the Black–Scholes model with parameters S (0) = 100, µ = 10%,
σ = 0.2 and r = 3%. Assume that we spend V0 = 1000, investing in
stock and put options with strike prices K1 = 75, K2 = 90, K3 = 110 and
expiry T = 1. We shall solve the problem (8.29) for α = 0.05, considering
c = 0, 10, 30, 50 and 80.
The choice of $x$ depends on $c$, since
$$xS(0) + c = V_0.$$
We compute the vectors $H(0)$ and $P_\alpha$ using (8.22) and (8.23), respectively:
$$H(0) = \begin{pmatrix} 0.406 \\ 2.769 \\ 12.042 \end{pmatrix}, \qquad P_\alpha = \begin{pmatrix} 0.140 \\ 0.819 \\ 1.724 \end{pmatrix}.$$
The solutions to the problem (8.29) are shown in the table:
$$\begin{array}{rrrrrr}
c & x & z_1 & z_2 & z_3 & \mathrm{AVaR}_\alpha \\ \hline
0 & 10 & 0.00 & 0.00 & 0.00 & 302.24 \\
10 & 9.9 & 7.37 & 2.53 & 0.00 & 242.61 \\
30 & 9.7 & 0.00 & 9.36 & 0.34 & 146.23 \\
50 & 9.5 & 0.00 & 6.95 & 2.55 & 120.68 \\
80 & 9.2 & 0.00 & 3.32 & 5.88 & 82.35
\end{array}$$

From the table we can see that for larger c we can afford to buy options
with higher strike prices, which provide better protection, but are at the
same time more expensive.

We finish the section by showing how to compute AVaR for investments in multiple assets. In such a case a simple analytic formula for AVaR is not available, and we make use of the Monte Carlo method based on (8.4).

Example 8.21
Consider the n-dimensional Black–Scholes market from Example 7.24.
Using the same Monte Carlo simulation that was used to compute VaR in
Example 7.24, we can compute the AVaR for the position using (8.4) and
Corollary 8.6. We thus obtain AVaRα (YN ) = 61.75 from the simulation.
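Formula (8.4) and Corollary 8.6 are not reproduced in this section, so the Python sketch below simply applies the representation (8.2), $\mathrm{AVaR}_\alpha(X) = -\frac{1}{\alpha}\left[E(X\mathbf{1}_{\{X<q^\alpha(X)\}}) + q^\alpha(X)(\alpha - P(X < q^\alpha(X)))\right]$, to the empirical distribution of $N$ simulated gains. As a sanity check it uses a single Black–Scholes stock (the $c = 0$ row of Example 8.20) rather than the $n$-dimensional market of Example 7.24, whose parameters are not shown here; the function names and the seed are illustrative assumptions.

```python
import random
from math import exp, sqrt


def empirical_avar(samples, alpha):
    """AVaR_alpha of the empirical distribution of `samples`, using (8.2).

    Each sample has probability 1/N. With m = floor(alpha N), the upper
    alpha-quantile is the (m+1)-th smallest sample; ties at the quantile
    do not change the value returned.
    """
    xs = sorted(samples)
    n = len(xs)
    m = int(alpha * n)
    q = xs[m]  # upper alpha-quantile q^alpha
    below = sum(xs[:m]) / n  # (1/N) * sum of the m smallest samples
    return -(below + q * (alpha - m / n)) / alpha


def simulate_stock_gains(n_paths, V0, S0, mu, r, sigma, T, seed=0):
    """Discounted gains e^{-rT} V(T) - V(0) for V0 invested in one Black-Scholes stock."""
    rng = random.Random(seed)
    shares = V0 / S0
    gains = []
    for _ in range(n_paths):
        Z = rng.gauss(0.0, 1.0)
        ST = S0 * exp((mu - 0.5 * sigma**2) * T + sigma * sqrt(T) * Z)
        gains.append(exp(-r * T) * shares * ST - V0)
    return gains


gains = simulate_stock_gains(100_000, 1000.0, 100.0, 0.10, 0.03, 0.2, 1.0, seed=1)
print(empirical_avar(gains, 0.05))
```

With these single-stock data the estimate should lie close to the exact value of about 302.24 given by (8.28) with $z = 0$; applying `empirical_avar` to gains simulated from the market of Example 7.24 would be the analogue of the computation in Example 8.21.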

Exercise 8.14 Recreate the numerical result from Example 8.21.

8.4 Coherence
In this section we provide an axiomatic description of a certain class of
measures of risk. It will be apparent that this class contains AVaR, but not
VaR.
By a risk measure we mean a number ρ(X) ∈ R that is assigned to a
random variable X to represent its risk. The following axioms are seen as
natural requirements for a satisfactory risk measure.
Definition 8.22
A risk measure ρ is coherent if it is:
(i) monotone: X ≤ Y implies ρ(X) ≥ ρ(Y);
(ii) cash-invariant: for all $m \in \mathbb{R}$, ρ(X + m) = ρ(X) − m;
(iii) positively homogeneous: for all λ ≥ 0, ρ(λX) = λρ(X);
(iv) sub-additive: for any X, Y,
ρ(X + Y) ≤ ρ(X) + ρ(Y).
Note that, by (ii), $\rho(X + \rho(X)) = \rho(X) - \rho(X) = 0$, and more generally $\rho(X + m) = \rho(X) - m \le 0$ exactly when $m \ge \rho(X)$. So $\rho(X)$ is the minimum amount of cash we need to add to $X$ to ensure that the resulting position carries no risk, as measured by $\rho$. In other words,
$$\rho(X) = \inf\{m \in \mathbb{R} : \rho(X + m) \le 0\}.$$
More generally, a position $X$ is said to be acceptable if $\rho(X) \le 0$.
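The remark that this class contains AVaR but not VaR can be seen on a small example. The Python sketch below uses a finite space of 625 equally likely states (a 25 × 25 grid) and two positions, each losing 100 with probability 0.04, independently; the data and helper names are illustrative. It assumes $\mathrm{VaR}_\alpha(X) = -q^\alpha(X)$ with the upper quantile, and evaluates AVaR through (8.2).

```python
def upper_quantile(values, alpha):
    """q^alpha = inf{x : F(x) > alpha} for equally likely outcomes."""
    xs = sorted(values)
    return xs[int(alpha * len(xs))]


def var(values, alpha):
    """VaR_alpha = -q^alpha (assumed convention, upper quantile)."""
    return 0.0 - upper_quantile(values, alpha)


def avar(values, alpha):
    """AVaR_alpha via (8.2) for equally likely outcomes."""
    xs = sorted(values)
    n = len(xs)
    m = int(alpha * n)
    return -(sum(xs[:m]) / n + xs[m] * (alpha - m / n)) / alpha


# Illustrative space: states (i, j) of a 25 x 25 grid, each with probability 1/625.
grid = [(i, j) for i in range(25) for j in range(25)]
X = [-100.0 if i == 0 else 0.0 for i, j in grid]  # loses 100 with probability 0.04
Y = [-100.0 if j == 0 else 0.0 for i, j in grid]  # independent copy of X
XY = [a + b for a, b in zip(X, Y)]

alpha = 0.05
print("VaR: ", var(X, alpha), var(Y, alpha), var(XY, alpha))
print("AVaR:", avar(X, alpha), avar(Y, alpha), avar(XY, alpha))
```

At $\alpha = 0.05$ each position alone has VaR equal to 0 and AVaR equal to 80, while their sum has VaR equal to 100 and AVaR equal to 103.2: sub-additivity fails for VaR but holds for AVaR.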

Exercise 8.15 Show that if a risk measure ρ satisfies (ii)–(iv) above, then it is monotone if and only if X ≥ 0 implies ρ(X) ≤ 0.

Exercise 8.16 Show that any coherent risk measure ρ is convex: for $\lambda \in [0, 1]$,
$$\rho(\lambda X + (1-\lambda)Y) \le \lambda\rho(X) + (1-\lambda)\rho(Y).$$
Show conversely that if a risk measure ρ is convex and positively homogeneous, then it is coherent.

The following proposition describes a method of creating new coherent risk measures from an existing family of such measures, including convex combinations as a special case. We leave the simple proof as an exercise.
Proposition 8.23
Let $\{\rho_\alpha : \alpha \in (0, 1)\}$ be a family of coherent risk measures and $\mu$ a Borel probability measure on $(0, 1)$. Then
$$\rho_\mu(X) = \int_{(0,1)} \rho_\alpha(X)\, d\mu(\alpha)$$
is a coherent risk measure.

Exercise 8.17 Prove Proposition 8.23.

Motivated by the representation we found for $\mathrm{AVaR}_\alpha$, we can immediately identify a large class of coherent risk measures by the following construction.

Definition 8.24
Suppose that $R$ is a family of probability measures satisfying $R \subset \{Q : Q \ll P\}$. We define a risk measure $\rho_R$ by setting
$$\rho_R(X) = \sup\{-E_Q(X) : Q \in R\}.$$
We show that $\rho_R$ is indeed a coherent risk measure.

Proposition 8.25
For any family $R$ of probability measures absolutely continuous with respect to $P$,
$$\rho_R(X) = \sup\{-E_Q(X) : Q \in R\}$$
defines a coherent risk measure.

Proof Given any probability measure $Q \ll P$, if $X \le Y$ then $-E_Q(X) \ge -E_Q(Y)$, hence
$$\rho_R(X) = \sup\{-E_Q(X) : Q \in R\} \ge \sup\{-E_Q(Y) : Q \in R\} = \rho_R(Y).$$
If $m \in \mathbb{R}$, then since $E_Q(X + m) = E_Q(X) + m$,
$$\begin{aligned}
\rho_R(X + m) &= \sup\{-E_Q(X + m) : Q \in R\} \\
&= \sup\{-E_Q(X) : Q \in R\} - m \\
&= \rho_R(X) - m.
\end{aligned}$$
We have $-E_Q(\lambda X) = -\lambda E_Q(X)$, so for $\lambda \ge 0$, taking the supremum over $Q$ in $R$ gives $\rho_R(\lambda X) = \lambda\rho_R(X)$.
Finally, to prove sub-additivity, we use the fact that for two functions $f, g : U \to \mathbb{R}$, where $U$ is an arbitrary set,
$$\sup_{x \in U}\{f(x) + g(x)\} \le \sup_{x \in U} f(x) + \sup_{x \in U} g(x). \qquad (8.30)$$
Let us fix $X$ and $Y$. We apply (8.30) taking $U = R$, $f(Q) = -E_Q(X)$, and $g(Q) = -E_Q(Y)$. Thus
$$\begin{aligned}
\rho_R(X + Y) &= \sup_{Q \in R}\{-E_Q(X + Y)\} \\
&= \sup_{Q \in R}\{-E_Q(X) - E_Q(Y)\} \\
&\le \sup_{Q \in R}\{-E_Q(X)\} + \sup_{Q \in R}\{-E_Q(Y)\} && \text{(from (8.30))} \\
&= \rho_R(X) + \rho_R(Y),
\end{aligned}$$
as required. □
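On a finite sample space, $\rho_R$ for a finite family $R$ of scenario measures is simply a maximum, and the four axioms of Definition 8.22 can be spot-checked numerically. The Python sketch below is an illustration under that finite-space assumption; the probabilities, scenario measures and random positions are made up, and a random check of this kind is a test, not a proof.

```python
import random


def rho_R(X, scenarios):
    """rho_R(X) = max over Q in R of -E_Q(X); X and each Q are lists indexed by state."""
    return max(-sum(q * x for q, x in zip(Q, X)) for Q in scenarios)


def check_axioms(scenarios, n_states, trials=1000, seed=0, tol=1e-9):
    """Spot-check (i)-(iv) of Definition 8.22 on random positions."""
    rng = random.Random(seed)
    for _ in range(trials):
        X = [rng.uniform(-10, 10) for _ in range(n_states)]
        Y = [rng.uniform(-10, 10) for _ in range(n_states)]
        m, lam = rng.uniform(-5, 5), rng.uniform(0, 3)
        rX, rY = rho_R(X, scenarios), rho_R(Y, scenarios)
        bigger = [max(a, b) for a, b in zip(X, Y)]  # X <= bigger pointwise
        if rho_R(bigger, scenarios) > rX + tol:  # (i) monotone
            return False
        if abs(rho_R([x + m for x in X], scenarios) - (rX - m)) > tol:  # (ii)
            return False
        if abs(rho_R([lam * x for x in X], scenarios) - lam * rX) > tol:  # (iii)
            return False
        if rho_R([a + b for a, b in zip(X, Y)], scenarios) > rX + rY + tol:  # (iv)
            return False
    return True


# Illustrative 4-state space; P charges every state, so each Q below satisfies Q << P.
P = [0.1, 0.2, 0.3, 0.4]
R = [P, [0.4, 0.3, 0.2, 0.1], [0.0, 0.5, 0.5, 0.0]]
print(check_axioms(R, len(P)))
```

Taking $R = \{P\}$ in this sketch gives $-E_P(X)$, the risk measure $\rho_{\min}$ discussed in Example 8.26.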

AVaR was our first example of such a coherent risk measure: taking
$$R = \mathcal{P}_\alpha = \left\{ Q : Q \ll P,\ \frac{dQ}{dP} \le \frac{1}{\alpha} \right\}$$
gives $\mathrm{AVaR}_\alpha$, as we saw in Theorem 8.10.
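On a finite sample space the supremum over $\mathcal{P}_\alpha$ can be computed explicitly: maximising $-E_Q(X)$ subject to $Q(\omega) \le P(\omega)/\alpha$ and $\sum_\omega Q(\omega) = 1$ is a fractional knapsack problem, solved by placing as much mass as allowed on the worst outcomes first. The Python sketch below compares this supremum with formula (8.2); the four-state data and the function names are illustrative assumptions, and all states are assumed to have positive probability.

```python
def avar_sup(X, P, alpha):
    """sup{-E_Q(X) : Q << P, dQ/dP <= 1/alpha} on a finite space, by greedy filling."""
    remaining, value = 1.0, 0.0
    for x, p in sorted(zip(X, P)):  # worst outcomes first
        w = min(p / alpha, remaining)  # density cap: Q(omega) <= P(omega) / alpha
        value -= w * x
        remaining -= w
        if remaining <= 0.0:
            break
    return value


def avar_formula(X, P, alpha):
    """AVaR via (8.2), with q the upper quantile inf{x : F_X(x) > alpha}."""
    pairs = sorted(zip(X, P))
    q, cum = pairs[-1][0], 0.0
    for x, p in pairs:
        cum += p
        if cum > alpha:
            q = x
            break
    e_below = sum(x * p for x, p in pairs if x < q)  # E(X 1{X < q})
    p_below = sum(p for x, p in pairs if x < q)  # P(X < q)
    return -(e_below + q * (alpha - p_below)) / alpha


X = [-100.0, -50.0, -10.0, 5.0]
P = [0.02, 0.03, 0.05, 0.90]
for a in (0.03, 0.06, 0.2):
    print(a, avar_sup(X, P, a), avar_formula(X, P, a))
```

The optimal $Q$ found by the greedy step has density $1/\alpha$ on the worst outcomes, a fractional density on the outcome at the quantile, and zero elsewhere.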
We now consider some further examples.

Example 8.26
Take $R_{\min} = \{P\}$, which gives $\rho_{\min}(X) = -E_P(X)$. This is a coherent risk measure by Proposition 8.25, but is not very useful. We see that if $E_P(X) \ge 0$ then $\rho_{\min}(X) \le 0$, so any random variable with non-negative expectation is acceptable.

Example 8.27
At the other extreme, we obtain a risk measure that is too stringent for practical use if we define
$$\rho_{\max}(X) = -\operatorname{ess\,inf} X.$$
Here $\operatorname{ess\,inf} X$ is the largest constant $c$ such that $X \ge c$ $P$-a.s.; that is, $X(\omega) < \operatorname{ess\,inf} X$ only on a $P$-null set. The requirement $\rho_{\max}(X) \le 0$ therefore means that this risk measure allows negative positions $X(\omega)$ only for a $P$-null set of $\omega$ in $\Omega$. Hence $\rho_{\max}(X) = \inf\{m \in \mathbb{R} : X + m \ge 0\ P\text{-a.s.}\}$.

Exercise 8.18 Show that ρmax is coherent.

A potentially more useful risk measure is given by fixing $\alpha \in (0, 1)$ and taking $R$ to include all conditional distributions $P(\cdot \mid A)$ with $P(A) > \alpha$, as is done in the following definition.
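Definition 8.28
Let
$$R_\alpha = \left\{ Q_A : A \text{ is measurable},\ P(A) > \alpha, \text{ and } Q_A(B) = P(B \mid A) = \frac{P(B \cap A)}{P(A)} \right\}.$$
We call
$$\mathrm{WCE}_\alpha(X) = \sup\{-E_{Q_A}(X) : Q_A \in R_\alpha\}$$
the worst conditional expectation (WCE) at level $\alpha$.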


Since each $Q_A \in R_\alpha$ is absolutely continuous with respect to $P$, Proposition 8.25 shows that $\mathrm{WCE}_\alpha$ is a coherent risk measure.

Exercise 8.19 Consider the probability space $(\Omega, \mathcal{F}, P)$ and the random variables $X, Y$ defined in Example 8.13. Verify that $\mathrm{WCE}_\alpha(X) = \mathrm{WCE}_\alpha(Y) = 50$ and $\mathrm{AVaR}_\alpha(X) = \mathrm{AVaR}_\alpha(Y) = 60$ when $\alpha = 0.05$. Verify that in this example WCE is additive. Compare the risk measures VaR, TCE, WCE and AVaR for $X$.

We obtain the following inequalities from our definitions of the risk measures explored so far (recall that $\mathrm{TCE}_\alpha$ and $\mathrm{WCE}_\alpha$ were defined in Definitions 8.12 and 8.28, respectively).
Proposition 8.29
For any $X$ we have
$$\mathrm{AVaR}_\alpha(X) \ge \mathrm{WCE}_\alpha(X) \ge \mathrm{TCE}_\alpha(X) \ge \mathrm{VaR}_\alpha(X).$$
When $F_X$ is continuous, the first three quantities coincide.
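Proof Since
$$Q_A(B) = \frac{P(B \cap A)}{P(A)} = \frac{1}{P(A)}\int_B \mathbf{1}_A\, dP,$$
we see that
$$\frac{dQ_A}{dP} = \frac{\mathbf{1}_A}{P(A)}.$$
Taking any $A$ satisfying $P(A) > \alpha$, we see that $\frac{dQ_A}{dP} \le \frac{1}{\alpha}$, so
$$Q_A \in \mathcal{P}_\alpha = \left\{ Q : Q \ll P,\ \frac{dQ}{dP} \le \frac{1}{\alpha} \right\},$$
hence
$$\mathrm{AVaR}_\alpha(X) = \sup_{Q \in \mathcal{P}_\alpha}\{-E_Q(X)\} \ge \sup_{Q_A,\ P(A) > \alpha}\{-E_{Q_A}(X)\} = \mathrm{WCE}_\alpha(X).$$
This proves the first inequality.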


For the second, let ε > 0 be given. Since qα (X) = inf{x : F X (x) > α} and
F X is non-decreasing,
α < P(X ≤ qα (X) + ε),
so that Aε = {X ≤ qα (X) + ε} has probability P(Aε ) > α, which means that
WCEα (X) ≥ −EQAε (X) = −E (X|X ≤ qα (X) + ε) ,
for all ε > 0. Letting ε ↓ 0 we have WCEα (X) ≥ TCEα (X).
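The final inequality follows by taking $B = \{X \le -\mathrm{VaR}_\alpha(X)\}$ and computing
$$\begin{aligned}
\mathrm{TCE}_\alpha(X) &= -E(X \mid X \le -\mathrm{VaR}_\alpha(X)) && \text{(by (8.18))} \\
&= -\frac{1}{P(B)}\int_B X\, dP \\
&\ge -\frac{1}{P(B)}\int_B \left(-\mathrm{VaR}_\alpha(X)\right) dP && \text{(on } B,\ X \le -\mathrm{VaR}_\alpha(X)\text{)} \\
&= \mathrm{VaR}_\alpha(X). && \text{(since } \mathrm{VaR}_\alpha(X) \text{ is a constant)}
\end{aligned}$$
In (8.19) we have shown that when $F_X$ is continuous, $\mathrm{AVaR}_\alpha(X) = \mathrm{TCE}_\alpha(X)$, hence both equal $\mathrm{WCE}_\alpha(X)$. □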
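On a finite space every subset is an event, so $\mathrm{WCE}_\alpha$ is a maximum over finitely many sets and the chain of inequalities can be checked by brute force. The Python sketch below is illustrative: the data are made up, and it assumes $\mathrm{VaR}_\alpha(X) = -q^\alpha(X)$ with the upper quantile and $\mathrm{TCE}_\alpha(X) = -E(X \mid X \le q^\alpha(X))$, which is consistent with the final step of the proof above.

```python
from itertools import combinations


def upper_quantile(X, P, alpha):
    """q^alpha = inf{x : F_X(x) > alpha} on a finite space."""
    cum = 0.0
    for x, p in sorted(zip(X, P)):
        cum += p
        if cum > alpha:
            return x
    return max(X)


def avar(X, P, alpha):
    """AVaR via (8.2)."""
    q = upper_quantile(X, P, alpha)
    below = [(x, p) for x, p in zip(X, P) if x < q]
    return -(sum(x * p for x, p in below) + q * (alpha - sum(p for _, p in below))) / alpha


def wce(X, P, alpha):
    """Maximum of -E(X | A) over all events A with P(A) > alpha, by brute force."""
    best = float("-inf")
    for k in range(1, len(X) + 1):
        for A in combinations(range(len(X)), k):
            pA = sum(P[i] for i in A)
            if pA > alpha:
                best = max(best, -sum(X[i] * P[i] for i in A) / pA)
    return best


def tce(X, P, alpha):
    """-E(X | X <= q^alpha(X)) (assumed form of TCE)."""
    q = upper_quantile(X, P, alpha)
    tail = [(x, p) for x, p in zip(X, P) if x <= q]
    return -sum(x * p for x, p in tail) / sum(p for _, p in tail)


def var(X, P, alpha):
    """-q^alpha(X) (assumed VaR convention)."""
    return -upper_quantile(X, P, alpha)


X = [-100.0, -50.0, -10.0, 5.0]
P = [0.02, 0.03, 0.05, 0.90]
print([f(X, P, 0.06) for f in (avar, wce, tce, var)])
```

For these data and $\alpha = 0.06$ the four values are 60, 40, 40 and 10, so here WCE and TCE coincide while AVaR is strictly larger, which is possible because $F_X$ has jumps.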
One potential difficulty with AVaR is that it restricts attention to the α-
tail of the distribution function F X rather than taking the whole distribution
of X into account. Moreover, in taking averages it assigns the same weight
to any qβ (X) for β < α. A natural route to more general risk measures is to
assign different weights to different β.
Definition 8.30
Let $\varphi : (0, 1) \to \mathbb{R}$ be a non-negative, non-increasing function satisfying
$$\int_0^1 \varphi(x)\, dx = 1.$$
We define
$$\rho^\varphi(X) = -\int_0^1 q^\beta(X)\varphi(\beta)\, d\beta$$
as the spectral risk measure for $\varphi$.



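Example 8.31
For $\alpha \in (0, 1)$ we recover $\mathrm{AVaR}_\alpha(X)$ by choosing $\varphi(\beta) = \frac{1}{\alpha}\mathbf{1}_{[0,\alpha]}(\beta)$, since
$$-\int_0^1 q^\beta(X)\varphi(\beta)\, d\beta = -\frac{1}{\alpha}\int_0^\alpha q^\beta(X)\, d\beta = \mathrm{AVaR}_\alpha(X).$$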
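A spectral risk measure can be approximated by one-dimensional quadrature whenever the quantile function is available. The Python sketch below uses a midpoint rule (an illustrative choice) and assumes $X$ is standard normal, so that $q^\beta(X) = \Phi^{-1}(\beta)$; for this $X$ one also has the closed form $\mathrm{AVaR}_\alpha(X) = \Phi'(\Phi^{-1}(\alpha))/\alpha$, which gives a check on Example 8.31. The function names are hypothetical.

```python
from statistics import NormalDist

N = NormalDist()  # X standard normal, so q^beta(X) = N.inv_cdf(beta)


def spectral_risk(phi, quantile, n=100_000):
    """Midpoint-rule approximation of rho^phi(X) = -int_0^1 q^beta(X) phi(beta) d beta."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        beta = (k + 0.5) * h
        w = phi(beta)
        if w:  # skip quantile evaluations where the weight vanishes
            total += quantile(beta) * w
    return -h * total


def avar_spectrum(alpha):
    """Risk spectrum of Example 8.31: 1/alpha on [0, alpha], 0 elsewhere."""
    return lambda beta: 1.0 / alpha if beta <= alpha else 0.0


alpha = 0.05
numeric = spectral_risk(avar_spectrum(alpha), N.inv_cdf)
closed_form = N.pdf(N.inv_cdf(alpha)) / alpha  # AVaR_alpha of a standard normal X
print(numeric, closed_form)
```

Both numbers should be close to 2.063; any other admissible $\varphi$ can be passed to `spectral_risk` in the same way.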

The function $\varphi$ is also called a risk-aversion function, since it reflects the investor's attitude to risk by assigning weights (adding to 1) to the quantiles $q^\beta(X)$ of the distribution $F_X$. In the case of $\mathrm{AVaR}_\alpha(X)$ these weights are simply uniformly distributed over the left $\alpha$-tail of $F_X$, and are zero elsewhere. The requirement that the weighting function $\varphi$ should be non-negative is obvious. The requirement that it be non-increasing expresses the view that a rational investor is more concerned about worse outcomes in an assessment of risk: a spectral risk measure should assign at least as much weight to worse potential outcomes as to better ones.

Theorem 8.32
A spectral risk measure ρϕ is coherent.

Proof We recast $\rho^\varphi$ in the form $\rho_\mu$ as defined in Proposition 8.23, which will prove coherence. For this, we consider the family $\{\rho_\alpha : \alpha \in (0, 1)\}$ of coherent risk measures with $\rho_\alpha = \mathrm{AVaR}_\alpha$ and construct an appropriate probability measure $\mu$ on $(0, 1)$.
First, given a function $\varphi$ as in Definition 8.30, define a set function $\nu$ on intervals in $(0, 1)$ by letting, for $0 < x < 1$,
$$\nu((x, 1)) = \varphi(x), \qquad (8.31)$$
and, for $0 < a < b < 1$, setting $\nu((a, b]) = \varphi(a) - \varphi(b)$. This defines $\nu$ as an additive set function on intervals $(a, b] \subset (0, 1)$, which extends to a unique measure $\nu$ on all Borel sets $A$ in $(0, 1)$. Now set
$$\mu(A) = \int_A x\, d\nu(x).$$

For pairs $(x, y)$, read the inequalities $0 < y < x < 1$ from left to right and right to left respectively, to obtain
$$\mathbf{1}_{(0,x)}(y) = \mathbf{1}_{(y,1)}(x). \qquad (8.32)$$
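Hence, using Fubini's theorem, we obtain
$$\begin{aligned}
\mu((0, 1)) &= \int_{(0,1)} x\, d\nu(x) \\
&= \int_{(0,1)} \left( \int_{(0,1)} \mathbf{1}_{(0,x)}(y)\, dy \right) d\nu(x) \\
&= \int_{(0,1)} \left( \int_{(0,1)} \mathbf{1}_{(0,x)}(y)\, d\nu(x) \right) dy && \text{(Fubini's theorem)} \\
&= \int_{(0,1)} \left( \int_{(0,1)} \mathbf{1}_{(y,1)}(x)\, d\nu(x) \right) dy && \text{(by (8.32))} \\
&= \int_{(0,1)} \left( \int_{(y,1)} d\nu(x) \right) dy \\
&= \int_{(0,1)} \varphi(y)\, dy && \text{(by (8.31))} \\
&= 1. && \text{(by Definition 8.30)}
\end{aligned}$$
Hence $\mu$ is a probability measure on $(0, 1)$, so that $\rho_\mu$ is coherent by Proposition 8.23.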
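We have $d\mu(\alpha) = \alpha\, d\nu(\alpha)$, and
$$\begin{aligned}
\rho_\mu(X) &= \int_{(0,1)} \mathrm{AVaR}_\alpha(X)\, d\mu(\alpha) \\
&= \int_{(0,1)} \mathrm{AVaR}_\alpha(X)\, \alpha\, d\nu(\alpha) \\
&= \int_{(0,1)} \left( -\int_{(0,\alpha)} q^\beta(X)\, d\beta \right) d\nu(\alpha) \\
&= -\int_{(0,1)} \left( \int_{(0,1)} \mathbf{1}_{(0,\alpha)}(\beta)\, q^\beta(X)\, d\beta \right) d\nu(\alpha) \\
&= -\int_{(0,1)} \left( \int_{(0,1)} \mathbf{1}_{(0,\alpha)}(\beta)\, q^\beta(X)\, d\nu(\alpha) \right) d\beta && \text{(Fubini's theorem)} \\
&= -\int_{(0,1)} q^\beta(X) \left( \int_{(0,1)} \mathbf{1}_{(\beta,1)}(\alpha)\, d\nu(\alpha) \right) d\beta && \text{(by (8.32))} \\
&= -\int_{(0,1)} q^\beta(X) \left( \int_{(\beta,1)} d\nu(\alpha) \right) d\beta \\
&= -\int_{(0,1)} q^\beta(X)\, \varphi(\beta)\, d\beta && \text{(by (8.31))} \\
&= \rho^\varphi(X),
\end{aligned}$$
hence the theorem is proved. □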
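The identity $\rho_\mu = \rho^\varphi$ can be checked numerically for a concrete spectrum. The Python sketch below takes $\varphi(x) = 2(1 - x)$ (an illustrative choice satisfying Definition 8.30), for which $\nu$ has density 2 on $(0, 1)$ and hence $d\mu(\alpha) = 2\alpha\, d\alpha$, together with a standard normal $X$, for which $\mathrm{AVaR}_\alpha(X) = \Phi'(\Phi^{-1}(\alpha))/\alpha$. A short calculation gives $1/\sqrt{\pi}$ for both sides, and the midpoint-rule values should agree with it to about three decimal places.

```python
from math import pi, sqrt
from statistics import NormalDist

N = NormalDist()  # X standard normal, so q^beta(X) = N.inv_cdf(beta)
n = 100_000
h = 1.0 / n
grid = [(k + 0.5) * h for k in range(n)]  # midpoints of (0, 1)
quantiles = [N.inv_cdf(b) for b in grid]

# rho^phi(X) = -int_0^1 q^beta(X) phi(beta) d beta with phi(beta) = 2(1 - beta).
rho_phi = -h * sum(q * 2.0 * (1.0 - b) for q, b in zip(quantiles, grid))

# rho_mu(X) = int_0^1 AVaR_alpha(X) d mu(alpha) with d mu(alpha) = 2 alpha d alpha,
# using AVaR_alpha(X) = pdf(q^alpha) / alpha for standard normal X.
rho_mu = h * sum(N.pdf(q) / a * 2.0 * a for q, a in zip(quantiles, grid))

print(rho_phi, rho_mu, 1.0 / sqrt(pi))
```

The left-hand computation uses only quantiles, the right-hand one only AVaR values, so their agreement illustrates the change of order of integration in the proof.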
The flexibility inherent in the choice of $\varphi$ means that individuals' subjective risk profiles can be mapped onto spectral risk measures to obtain different assessments of risk. We content ourselves with just one example.

Example 8.33
Recall the exponential utility function $u(x) = -e^{-ax}$ introduced in Chapter 6, where $a > 0$ is the investor's absolute risk aversion coefficient. We obtain the corresponding weighting function in the form $\varphi(x) = ke^{-ax}$: with $k > 0$ we have $\varphi \ge 0$, and $\varphi$ is strictly decreasing on $[0, 1]$. To ensure that it is an admissible risk spectrum, we simply need to choose $k$ such that $\int_0^1 \varphi(t)\, dt = 1$, which forces $k = \frac{a}{1 - e^{-a}}$. The spectral risk measure
$$\rho^\varphi(X) = \frac{a}{1 - e^{-a}} \int_{(0,1)} \left(-q^\beta(X)\right) e^{-a\beta}\, d\beta$$
thus takes account of the investor's risk aversion by giving most weight to the worst outcomes.
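The normalising constant and the role of the coefficient $a$ can be explored numerically. The Python sketch assumes a standard normal $X$ and a midpoint rule (both illustrative choices). It checks that $\varphi$ integrates to 1 and shows $\rho^\varphi(X)$ increasing with $a$, as more weight moves onto the worst quantiles; `math.expm1` keeps $k = a/(1 - e^{-a})$ accurate when $a$ is small.

```python
from math import exp, expm1
from statistics import NormalDist

N = NormalDist()  # X standard normal, so q^beta(X) = N.inv_cdf(beta)
n = 100_000
h = 1.0 / n
grid = [(k + 0.5) * h for k in range(n)]  # midpoints of (0, 1)
quantiles = [N.inv_cdf(b) for b in grid]


def exp_spectrum(a):
    """phi(x) = k e^{-a x} with k = a / (1 - e^{-a}), as in Example 8.33."""
    k = a / -expm1(-a)  # expm1(-a) = e^{-a} - 1, accurate for small a
    return lambda b: k * exp(-a * b)


def rho_exp(a):
    """Midpoint-rule approximation of the spectral risk measure for exp_spectrum(a)."""
    phi = exp_spectrum(a)
    return -h * sum(q * phi(b) for q, b in zip(quantiles, grid))


for a in (1.0, 5.0, 20.0):
    mass = h * sum(exp_spectrum(a)(b) for b in grid)  # should be close to 1
    print(a, mass, rho_exp(a))
```

As $a \to 0$ the spectrum tends to the constant 1 and $\rho^\varphi(X)$ tends to $-E(X)$, which is 0 for this $X$.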

8.5 Proofs
Lemma 8.4
Let X : Ω → R be a random variable. Assume that U is a uniformly
distributed random variable on (0, 1). Then the random variable Y, defined
by Y(x) = qU(x) (X), has the same distribution as X.

Proof Let us write $g : (0, 1) \to \mathbb{R}$ for
$$g(\alpha) = q^\alpha(X).$$
Then $Y = g(U)$.
Since U is a uniformly distributed random variable on (0, 1), for any
Borel set A ⊂ (0, 1) the probability that U is in A is

Prob(U ∈ A) = m(A),

where m stands for the Lebesgue measure.


Let $y \in \mathbb{R}$ be fixed. By the definition of the upper quantile, i.e.
$$q^\alpha(X) = \inf\{x : \alpha < F_X(x)\}, \qquad (8.34)$$
we see that if $\alpha < F_X(y)$ then $q^\alpha(X) \le y$. This means that
$$\{\alpha : \alpha < F_X(y)\} \subset \{\alpha : q^\alpha(X) \le y\} = \{\alpha : g(\alpha) \le y\}, \qquad (8.35)$$
hence
$$\begin{aligned}
F_Y(y) &= \mathrm{Prob}(Y \le y) \\
&= \mathrm{Prob}(g(U) \le y) \\
&\ge \mathrm{Prob}(U < F_X(y)) && \text{(by (8.35))} \\
&= F_X(y).
\end{aligned}$$
Note that $g$ need not be strictly increasing: it is constant on an interval wherever $F_X$ has a jump, and it jumps across any flat part of $F_X$ (see Figure 7.1 on page 100, which shows a distribution function with a flat part). For the reverse inequality, take $\alpha > F_X(y)$. Since $F_X$ is non-decreasing and right-continuous, there is $\delta > 0$ such that $F_X(x) < \alpha$ for all $x < y + \delta$, so by (8.34) we have $q^\alpha(X) \ge y + \delta > y$. Hence
$$\{\alpha : g(\alpha) \le y\} \subset \{\alpha : \alpha \le F_X(y)\}, \qquad (8.36)$$
which gives
$$\begin{aligned}
F_Y(y) &= \mathrm{Prob}(g(U) \le y) \\
&\le \mathrm{Prob}(U \le F_X(y)) && \text{(by (8.36))} \\
&= F_X(y).
\end{aligned}$$
We have shown that $F_Y(y) = F_X(y)$ for every $y \in \mathbb{R}$, which concludes our proof. □
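Lemma 8.4 is the basis of the inverse transform method of simulation. The Python sketch below applies it to a discrete $X$ with atoms, where the quantile function has flat parts; the values, probabilities and seed are illustrative. `bisect.bisect_right` returns the first index whose cumulative probability strictly exceeds $u$, which is exactly the upper quantile $q^u(X) = \inf\{x : F_X(x) > u\}$.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def make_upper_quantile(values, probs):
    """Return u -> q^u(X) = inf{x : F_X(x) > u} for a discrete X.

    `values` must be sorted increasingly; `probs` are their probabilities.
    """
    cum = list(accumulate(probs))  # F_X at each support point

    def q(u):
        i = bisect_right(cum, u)  # first index with cum[i] > u
        return values[min(i, len(values) - 1)]  # guard against rounding in cum[-1]

    return q


values, probs = [-1.0, 0.0, 2.0], [0.2, 0.5, 0.3]  # illustrative X with atoms
q = make_upper_quantile(values, probs)
rng = random.Random(42)
# Y = q^U(X) with U uniform; rng.random() can return 0, an event of probability zero.
samples = [q(rng.random()) for _ in range(100_000)]
freqs = [samples.count(v) / len(samples) for v in values]
print(freqs)  # should be close to probs
```

Here $g(u) = q^u(X)$ equals 0 for every $u \in [0.2, 0.7)$, an interval of positive length, which is how $Y$ inherits the atom of $X$ at 0.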
Corollary 8.7
If $X$ is a random variable whose distribution function $F_X$ is strictly increasing and continuous, then
$$\mathrm{AVaR}_\alpha(X) = -E(X \mid X \le q^\alpha(X)).$$
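Proof Since $F_X$ is continuous, for any $q \in \mathbb{R}$,
$$P(X = q) = 0. \qquad (8.37)$$
By Lemma 7.5,
$$q^\alpha(X) = F_X^{-1}(\alpha), \qquad (8.38)$$
hence
$$\begin{aligned}
P(X < q^\alpha(X)) &= P(X \le q^\alpha(X)) - P(X = q^\alpha(X)) \\
&= P(X \le q^\alpha(X)) && \text{(using (8.37))} \\
&= F_X(q^\alpha(X)) \\
&= \alpha. && \text{(using (8.38))}
\end{aligned} \qquad (8.39)$$
Substituting into (8.2) gives
$$\begin{aligned}
\mathrm{AVaR}_\alpha(X) &= -\frac{1}{\alpha}\left[ E(X\mathbf{1}_{\{X < q^\alpha(X)\}}) + q^\alpha(X)\left(\alpha - P(X < q^\alpha(X))\right) \right] \\
&= -\frac{1}{\alpha} E(X\mathbf{1}_{\{X < q^\alpha(X)\}}) \\
&= -\frac{1}{\alpha} E(X\mathbf{1}_{\{X \le q^\alpha(X)\}}) && \text{(since by (8.37), } P(X = q^\alpha(X)) = 0\text{)} \\
&= -\frac{1}{P(X \le q^\alpha(X))} E(X\mathbf{1}_{\{X \le q^\alpha(X)\}}) && \text{(from (8.38))} \\
&= -E(X \mid X \le q^\alpha(X)),
\end{aligned}$$
as required. □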
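Lemma 8.8
For $\alpha \in (0, 1)$, let $q^\alpha = q^\alpha(X)$ and set
$$\mathbf{1}^\alpha_X = \begin{cases} \mathbf{1}_{\{X < q^\alpha\}} & \text{if } P(X = q^\alpha) = 0, \\ \mathbf{1}_{\{X < q^\alpha\}} + \kappa\mathbf{1}_{\{X = q^\alpha\}} & \text{if } P(X = q^\alpha) > 0, \end{cases}$$
where $\kappa = \frac{\alpha - P(X < q^\alpha)}{P(X = q^\alpha)}$. Then
$$E(\mathbf{1}^\alpha_X) = \alpha, \qquad (8.40)$$
and for all $\omega \in \Omega$,
$$\mathbf{1}^\alpha_X(\omega) \in [0, 1].$$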
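Proof For (8.40) observe that if $P(X = q^\alpha) = 0$ then
$$E(\mathbf{1}^\alpha_X) = P(X < q^\alpha) = P(X \le q^\alpha) = \alpha,$$
while if $P(X = q^\alpha) > 0$ we have
$$E(\mathbf{1}^\alpha_X) = P(X < q^\alpha) + \kappa P(X = q^\alpha) = P(X < q^\alpha) + \alpha - P(X < q^\alpha) = \alpha.$$
To prove the second claim we start by observing that when $P(X = q^\alpha) = 0$, then $\mathbf{1}^\alpha_X = \mathbf{1}_{\{X < q^\alpha\}} \in \{0, 1\}$. If $P(X = q^\alpha) > 0$ and $\omega \notin \{X = q^\alpha\}$, then
$$\mathbf{1}^\alpha_X(\omega) = \mathbf{1}_{\{X < q^\alpha\}}(\omega) \in \{0, 1\}.$$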
The only non-trivial case is when $P(X = q^\alpha) > 0$ and $\omega \in \{X = q^\alpha\}$. In such a case (using the standard notation $F_X(x^-) = \lim_{y \uparrow x} F_X(y)$),
$$P(X = q^\alpha) = F_X(q^\alpha) - F_X((q^\alpha)^-),$$
so that for $\omega \in \{X = q^\alpha\}$,
$$\mathbf{1}^\alpha_X(\omega) = \kappa = \frac{\alpha - F_X((q^\alpha)^-)}{F_X(q^\alpha) - F_X((q^\alpha)^-)}. \qquad (8.41)$$
By definition
$$q^\alpha = \inf\{x : \alpha < F_X(x)\},$$
and we see that for any $q < q^\alpha$ we have $\alpha \ge F_X(q)$, hence
$$\alpha \ge F_X((q^\alpha)^-).$$
For any $q > q^\alpha$, $\alpha < F_X(q)$, and by right continuity of $F_X$ we have
$$\alpha \le F_X(q^\alpha).$$
We have shown that $\alpha \in [F_X((q^\alpha)^-), F_X(q^\alpha)]$, hence the quotient in (8.41) lies in $[0, 1]$. □
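For a discrete $X$ the function $\mathbf{1}^\alpha_X$ of Lemma 8.8 can be computed state by state. The Python sketch below uses illustrative data and a hypothetical helper name; it checks (8.40) and the range $[0, 1]$, and, since $E(X\mathbf{1}^\alpha_X) = E(X\mathbf{1}_{\{X<q^\alpha\}}) + q^\alpha(\alpha - P(X < q^\alpha))$, it also confirms that $-\frac{1}{\alpha}E(X\mathbf{1}^\alpha_X)$ reproduces the value of (8.2).

```python
def indicator_alpha(X, P, alpha):
    """Values of 1^alpha_X state by state on a finite space (Lemma 8.8).

    Assumes every state has positive probability, so P(X = q^alpha) > 0 and
    the second case of the definition applies.
    """
    cum, q = 0.0, max(X)
    for x, p in sorted(zip(X, P)):
        cum += p
        if cum > alpha:  # first support point with F_X(x) > alpha: q^alpha
            q = x
            break
    p_below = sum(p for x, p in zip(X, P) if x < q)  # P(X < q^alpha)
    p_at = sum(p for x, p in zip(X, P) if x == q)  # P(X = q^alpha)
    kappa = (alpha - p_below) / p_at
    return [1.0 if x < q else (kappa if x == q else 0.0) for x in X]


X = [-100.0, -50.0, -10.0, 5.0]
P = [0.02, 0.03, 0.05, 0.90]
alpha = 0.06
ind = indicator_alpha(X, P, alpha)
expectation = sum(i * p for i, p in zip(ind, P))  # (8.40): should equal alpha
avar = -sum(x * i * p for x, i, p in zip(X, ind, P)) / alpha  # -(1/alpha) E(X 1^alpha_X)
print(ind, expectation, avar)
```

For these data $q^\alpha = -10$ and $\kappa = 0.2$, so $\mathbf{1}^\alpha_X$ takes the values 1, 1, 0.2 and 0, and the resulting AVaR is 60.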