Cambridge Books Online
http://ebooks.cambridge.org/
Mathematical Finance
Chapter
Let us consider a situation where we select randomly and independently n samples from
a population whose distribution has mean μ and variance σ 2 . The set of such samples,
denoted as (x1 , x2 , . . . , xn ), is referred to as a random sample of size n. Random sam-
ples are an important foundation of statistical theory, because a majority of the results
known in mathematical statistics rely on assumptions that are consistent with a random
sample. Let us start with a simple question: How can we estimate the population mean
μ X and population variance σ X2 from the random sample (x1 , x2 , . . . , xn )?
The sample mean (also called the empirical average) x̄ is defined as

   x̄ = (1/n) ∑_{i=1}^{n} x_i.   (6.1)

Viewed as a function of the random sample, the sample mean is itself a random variable:

   X̄ = (1/n) ∑_{i=1}^{n} X_i.   (6.2)
In fact, the term "sample mean" is often used in statistical theory to describe the variable X̄, but the quantity we can actually observe is its instance x̄.
Downloaded from Cambridge Books Online by IP 109.171.137.60 on Sun Aug 21 20:02:24 BST 2016.
http://dx.doi.org/10.1017/CBO9780511977770.007
Cambridge Books Online © Cambridge University Press, 2016
6.1 Sample mean and sample variance
By the linearity of expectation,

   E[X̄] = (1/n) ∑_{i=1}^{n} E[X_i],   (6.3)

so that

   E[X̄] = μ_X;   (6.4)

that is, the sample mean is an unbiased estimate of the population mean.
   X̄ − E[X̄] = (1/n) ∑_{i=1}^{n} (X_i − μ_X) = (1/n) ∑_{i=1}^{n} Y_i,   (6.6)

where

   Y_i ≜ X_i − μ_X,   i = 1, 2, . . . , n.
Therefore,
   Var[X̄] = E[((1/n) ∑_{i=1}^{n} Y_i)²] = (1/n²) ∑_{i=1}^{n} E[Y_i²] + (1/n²) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} E[Y_i Y_j].   (6.7)
Since the random variables {Yi ; 1 ≤ i ≤ n} are statistically independent with zero mean
and variance σ X2 , we have
   Var[X̄] = σ_X²/n.   (6.8)
Thus, the variance of the sample mean variable is the population variance divided by
the sample size.
The deviations of the individual observations from the sample mean provide information about the dispersion of the x_i about x̄. We define the sample variance s_x² by

   s_x² ≜ (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)².   (6.9)
Fundamentals of statistical data analysis
The corresponding random variable is

   S_X² ≜ (1/(n − 1)) ∑_{i=1}^{n} (X_i − X̄)²,   (6.10)
which is also commonly called the sample variance. We find, after some rearrangement
(Problem 6.1),
   S_X² = (1/n) ∑_{i=1}^{n} Y_i² − (1/(n(n − 1))) ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} Y_i Y_j.   (6.11)
Since E[Y_i Y_j] = 0 for i ≠ j, taking expectations of (6.11) gives

   E[S_X²] = (1/n) ∑_{i=1}^{n} E[Y_i²] = σ_X².   (6.12)
The reason for using n − 1 rather than n as the divisor in (6.9) is to make E[S_X²] equal to σ_X²; that is, to make S_X² an unbiased estimate of σ_X². The positive square root of the sample variance, s_x, is called the sample standard deviation.
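As a quick illustration (a Python sketch of our own, not code from the text), (6.1) and (6.9) can be computed directly, and a small simulation shows the effect of the n − 1 divisor:

```python
import random

def sample_mean(xs):
    """Sample mean x-bar of Eq. (6.1)."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the n - 1 divisor, Eq. (6.9)."""
    n = len(xs)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Averaging s^2 over many random samples of size n from N(0, 1):
# with the n - 1 divisor the average stays near sigma^2 = 1, whereas
# an n divisor would be biased low by the factor (n - 1)/n.
random.seed(1)
n, trials = 5, 10000
est = sum(sample_variance([random.gauss(0.0, 1.0) for _ in range(n)])
          for _ in range(trials)) / trials
```

The helper names `sample_mean` and `sample_variance` are ours; `est` should land close to the population variance 1.0.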
When the observed data take on discrete values, we can simply count the number of occurrences of the individual values. Suppose that the sample size n is given and k (≤ n) distinct values exist. Let n_j be the number of times that the jth value is observed, 1 ≤ j ≤ k. Then the fraction

   f_j = n_j/n,   j = 1, 2, . . . , k,   (6.13)

is called the relative frequency of the jth value. When the observations are grouped into k class intervals (c_{j−1}, c_j], the class lengths

   Δ_j ≜ c_j − c_{j−1},   j = 1, 2, . . . , k,

need not be equal. Let n_j denote the number of observations that fall in the jth class interval. Then the relative frequency of the jth class takes the same form as (6.13).
The grouped distribution may be represented graphically as the following “staircase
function” in an (x, h)-coordinate system:
   h(x) = f_j/Δ_j = n_j/(n Δ_j),   for x ∈ (c_{j−1}, c_j],   j = 1, 2, . . . , k.   (6.14)
6.3 Graphical presentations 141
Such a diagram is called a histogram and can be regarded as an estimate of the PDF
of the population. If the class lengths Δ_j are all the same, the shape of the histogram
remains unchanged whether we use the relative frequency of the classes { f j } or the
frequency counts of the classes {n j } as the ordinate. Such diagrams are also called
histograms.
The choice of the class intervals in the histogram representation is by no means trivial.
Certainly, we should choose them in such a way that the characteristic features of the
distribution are emphasized and chance variations are obscured. If the class lengths
are too small, chance variations dominate because each interval includes only a small
number of observations. On the other hand, if the class lengths are too large, a great deal
of information concerning the characteristics of the distribution will be lost.
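A sketch of (6.14) in Python (our own helper, not code from the text), using half-open intervals (c_{j−1}, c_j] as in the text; it handles unequal class lengths Δ_j:

```python
from bisect import bisect_left

def histogram_density(data, edges):
    """Relative-frequency histogram of Eq. (6.14):
    h(x) = f_j / Delta_j = n_j / (n * Delta_j) on (c_{j-1}, c_j].
    `edges` = [c_0, c_1, ..., c_k]; observations outside (c_0, c_k]
    are ignored."""
    n = len(data)
    k = len(edges) - 1
    counts = [0] * k
    for x in data:
        if edges[0] < x <= edges[-1]:
            # bisect_left finds the first edge >= x, so x in
            # (edges[j], edges[j+1]] maps to class index j.
            j = bisect_left(edges, x) - 1
            counts[j] += 1
    return [counts[j] / (n * (edges[j + 1] - edges[j])) for j in range(k)]
```

Because each class density is f_j/Δ_j, the returned step function integrates to one over (c_0, c_k] when all observations fall inside.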
Let {xk : 1 ≤ k ≤ n} denote n observations in the order observed and let {x(i) : 1 ≤
i ≤ n} denote the same observations ranked in order of magnitude. The frequency H (x)
of observations that are smaller than or equal to x is called the cumulative relative
frequency, and is given by
   H(x) = ⎧ 0,     for x < x_(1),
          ⎨ i/n,   for x_(i) ≤ x < x_(i+1),   i = 1, 2, . . . , n − 1,   (6.15)
          ⎩ 1,     for x ≥ x_(n),
which is the empirical analog of the CDF FX (x). If we use the unit step function
   u(x) = ⎧ 1,   for x ≥ 0,
          ⎩ 0,   for x < 0,   (6.16)
the above H (x) can be more concisely written as
   H(x) = (1/n) ∑_{i=1}^{n} u(x − x_(i)) = (1/n) ∑_{k=1}^{n} u(x − x_k),   −∞ < x < ∞.   (6.17)
Interestingly enough, use of the unit step function makes it unnecessary to obtain the
rank-ordered data in order to find H (x). The graphical plot of H (x) is a nondecreasing
step curve, which increases from zero to one in “jumps” of 1/n at points x = x(1) ,
x(2) , . . ., x(n) . If several observations take on the same value, the jump is a multiple
of 1/n.
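Equation (6.17) translates directly into code; a minimal Python sketch (names are ours) that needs no rank ordering of the data:

```python
def unit_step(x):
    """Unit step function u(x) of Eq. (6.16)."""
    return 1.0 if x >= 0 else 0.0

def empirical_cdf(data, x):
    """Cumulative relative frequency H(x) of Eq. (6.17):
    H(x) = (1/n) * sum_k u(x - x_k), computed over the data
    in the order observed -- no sorting required."""
    n = len(data)
    return sum(unit_step(x - xk) for xk in data) / n
```

`empirical_cdf` jumps by 1/n at each observation (by a multiple of 1/n at tied values), rising from 0 to 1 as the text describes.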
When grouped data are presented as a cumulative relative frequency distribution, it
is usually called the cumulative histogram. The cumulative histogram is far less sensi-
tive to variations in class lengths than the histogram. This is because the accumulation is
essentially equivalent to integration along the x-axis, which filters out the chance vari-
ations contained in the histogram. The cumulative relative frequency distribution or the
cumulative histogram is, therefore, quite helpful in portraying the gross features of data.
Reducing primary data to the sample mean, sample variance, and histogram can reveal
a great amount of information concerning the nature of the population distribution. But
sometimes important features of the underlying distribution are obscured or hidden by
the data reduction procedures. In this section we will discuss some graphical methods
that are often valuable in an exploratory analysis of measurement data. They are: (a) his-
tograms on the probability or log-normal probability papers; (b) the survivor functions
on log-linear and log-log papers; and (c) the dot diagram and correlation coefficient.
The distribution function

   P = F(x)   (6.18)
provides the dependence of the cumulative distribution on the variable x. The inverse
function
   x_P = F^{−1}(P)   (6.19)
gives the value of the variable x that corresponds to the given cumulative probability
P. The value x P is called the P-fractile. Some authors use the terms percentile or
quantile instead of the term fractile.
The distribution function of the standard normal distribution N(0, 1) is often denoted by Φ(·), as defined in (4.46):

   Φ(u) = (1/√(2π)) ∫_{−∞}^{u} exp(−t²/2) dt.   (6.20)
Then the fractile u_P of the distribution N(0, 1) is given by the inverse function

   u_P = Φ^{−1}(P).   (6.21)
Suppose that for a given cumulative relative frequency H (x) we wish to test whether
this empirical distribution resembles a normal distribution; that is, to test whether
   H(x) ≅ Φ((x − μ)/σ)   (6.22)

holds for some parameters μ and σ, where the symbol ≅ means "to have the distribution of." Testing this relation is equivalent to testing the relation

   u_{H(x)} ≈ (x − μ)/σ.   (6.23)
1 This term should not be confused with a similar term “fractal diagram” known in fractal geometry.
According to the definition of H (x), the plot of u H (x) versus x forms a step (or staircase)
curve:
   u_{H(x)} = ⎧ −∞,       for x < x_(1),
              ⎨ u_{i/n},   for x_(i) ≤ x < x_(i+1),   i = 1, 2, . . . , n − 1,   (6.24)
              ⎩ ∞,        for x ≥ x_(n).
   u = u_{H(x)}   (6.25)
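A small Python sketch (hypothetical helper name, not from the text) of the fractile-diagram computation, using the standard library's inverse normal CDF for u_P = Φ^{−1}(P):

```python
from statistics import NormalDist

def fractile_diagram_points(data):
    """Points (x_(i), u_{i/n}) of the step curve (6.24), i = 1..n-1.
    u_P = Phi^{-1}(P) is the standard normal P-fractile; for a
    normal sample the points should scatter about the straight line
    u = (x - mu)/sigma."""
    xs = sorted(data)                 # rank-order the observations
    n = len(xs)
    inv = NormalDist().inv_cdf        # Phi^{-1}
    return [(xs[i - 1], inv(i / n)) for i in range(1, n)]
```

Plotting these points on ordinary linear axes reproduces the normal probability-paper construction: the closer the points hug a straight line, the better the normal fit, with slope 1/σ and intercept −μ/σ.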
Figure 6.1 The fractile diagram of normal variates: (a) step curve; (b) dot diagram.
Figure 6.2 The fractile diagram of log-normal variates: (a) step curve; (b) dot diagram.
The function

   S_X(t) ≜ P[X > t] = 1 − F_X(t)   (6.27)

is often called the survivor function, or the survival function in reliability theory. It is equivalent to the complementary distribution function F_X^c(t) defined earlier.
The natural logarithm of (6.27) is known as the log-survivor function or the log-survival function (Cox and Lewis [71]):

   log S_X(t) = log[1 − F_X(t)].   (6.28)

The log-survivor function shows the details of the tail end of the distribution more effectively than the distribution itself.
If, for instance, F_X(x) is an exponential distribution with mean 1/α, then its log-survivor function is a straight line: log S_X(t) = log e^{−αt} = −αt.
If F_X(x) is a mixed exponential distribution (or hyperexponential distribution), with survivor function

   S_X(t) = π_1 e^{−α_1 t} + π_2 e^{−α_2 t},   π_1 + π_2 = 1,   (6.29)

then its log-survivor function has two asymptotic straight lines, one for each exponential term:

   log S_X(t) ≈ log π_i − α_i t,   i = 1, 2,   (6.30)

in the range of t where the corresponding term dominates.
The empirical log-survivor function is obtained by plotting

   log[1 − H(t)],   (6.31)

where H(t) represents the cumulative relative frequency (ungrouped data) or the cumulative histogram (grouped data). In the ungrouped case we find from (6.15) that

   log(1 − i/n),   1 ≤ i ≤ n,   (6.32)

should be plotted against x_(i), where the subscript (i) represents the rank as in (6.15). In order to avoid difficulties at i = n, we may sometimes modify (6.32) into

   log(1 − i/(n + 1)),   1 ≤ i ≤ n.   (6.33)
As an example, Figure 6.3 plots the log-survivor function using a sample of size 1000 drawn from the above hyperexponential distribution with parameters π_1 = 0.0526, π_2 = 1 − π_1, α_1 = 0.1, and α_2 = 2.0.
Out of the 1000 samples taken, 18 sample points that exceed x = 10 fall outside the
scale of the figure; hence they are not shown. The asymptotes of (6.30) can be easily
recognized from this log-survivor function.
Characteristically, the log-survivor function of the mixed exponential distribution
(6.29) is convex with a linear tail. Observations of (or departures from) such char-
acteristic shapes are used to postulate a functional form for a distribution. See Gaver
et al. [115] and Lewis and Shedler [225].
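The plotting recipe (6.33) is easy to code; the sketch below (our own Python, not code from the text) also draws a hyperexponential sample like the one behind Figure 6.3:

```python
import math
import random

def log_survivor_points(data):
    """Empirical log-survivor function: log(1 - i/(n+1)) plotted
    against the rank-ordered x_(i), using the modification (6.33)
    to avoid log 0 at i = n."""
    xs = sorted(data)
    n = len(xs)
    return [(xs[i - 1], math.log(1.0 - i / (n + 1))) for i in range(1, n + 1)]

def sample_hyperexponential(pi1, a1, a2, rng):
    """One draw from the mixed exponential: rate a1 w.p. pi1, else a2."""
    rate = a1 if rng.random() < pi1 else a2
    return rng.expovariate(rate)

rng = random.Random(42)
pts = log_survivor_points(
    [sample_hyperexponential(0.0526, 0.1, 2.0, rng) for _ in range(1000)])
# A plot of pts should be convex with a roughly linear tail whose
# slope approaches -a1 = -0.1, as in Figure 6.3.
```

The ordinate values decrease monotonically with the rank i, so the curve always slopes downward, as in Figures 6.3 and 6.4.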
Figure 6.3 The log-survivor function of a mixed-exponential (or hyperexponential) distribution with π1 = 0.0526,
π2 = 1 − π1 , α1 = 0.1, and α2 = 2.0.
Figure 6.4 The log-survivor function of a mixed Pareto distribution, β1 = β2 = 1, π2 = 1 − π1 ,
α1 = 1.5, α2 = 5, and π1 = 0.2.
   S_X(t) = π_1 (β_1^{α_1}/t^{α_1}) + (1 − π_1)(β_2^{α_2}/t^{α_2}),   0 < max{β_1, β_2} ≤ t.   (6.35)
As an example, Figure 6.4 plots the log-survivor function of 500 samples drawn from
the mixed Pareto distribution with β1 = β2 = 1, α1 = 1.5, α2 = 5, and π1 = 0.2.
   h_X(t) = f_X(t)/S_X(t) = f_X(t)/(1 − F_X(t))   (6.36)
is called the hazard function or the failure rate, because h X (t) dt represents the prob-
ability that the life will end in the interval (t, t + dt], given that X has survived up to
age t; i.e., X ≥ t. If X represents the service time of a customer, as in queueing theory,
h X (t) is called the completion rate function.
The hazard functions of the exponential, Weibull, Pareto, and log-normal distribu-
tions are given as follows:
   h_X(t) = ⎧ λ,   t ≥ 0,   for exponential,
            ⎪ (α/β)(t/β)^{α−1},   t ≥ 0,   for Weibull,
            ⎨ α/t,   t ≥ β,   for Pareto,   (6.37)
            ⎪ t^{−1} exp[−(log t − μ_Y)²/(2σ_Y²)] / ∫_{log t}^{∞} exp[−(u − μ_Y)²/(2σ_Y²)] du,   t > 0,   for log-normal,
            ⎩

where, in the log-normal case, μ_Y and σ_Y² are the mean and variance of Y = log X.
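The first three cases of (6.37) translate directly into code; a Python sketch (hypothetical function names; the log-normal case is omitted here because its denominator requires numerical integration):

```python
def hazard_exponential(t, lam):
    """Constant hazard lam for t >= 0: the memoryless case."""
    return lam

def hazard_weibull(t, alpha, beta):
    """(alpha/beta) * (t/beta)**(alpha - 1) for t >= 0.
    Increasing in t when alpha > 1, decreasing when alpha < 1,
    and constant (exponential) when alpha = 1."""
    return (alpha / beta) * (t / beta) ** (alpha - 1)

def hazard_pareto(t, alpha, beta):
    """alpha / t for t >= beta: a decreasing hazard, reflecting
    the heavy tail of the Pareto distribution."""
    return alpha / t
```

Each function is just f_X(t)/S_X(t) evaluated in closed form, so they can be checked against a numerical ratio of density to survivor function.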
Let

   R = X − t   (6.40)

denote
Figure 6.5 The mean residual life curves of a Pareto distribution with α = 3.0 and β = 1.0, and a Weibull
distribution with α = 1.5 and β = 1.0.
the residual life conditioned on X > t. Then the mean residual life function is given by

   R_X(t) = E[R | X > t] = (∫_t^∞ S_X(u) du) / S_X(t).   (6.41)
In particular, R_X(0) = E[X], as expected. Figure 6.5 shows mean residual life curves of a Pareto distribution and a Weibull distribution.
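Equation (6.41) can be evaluated numerically for any survivor function; a Python sketch of our own (truncating the infinite integral at a finite `upper`):

```python
import math

def mean_residual_life(survivor, t, upper, steps=100000):
    """R_X(t) = (integral_t^upper S_X(u) du) / S_X(t), Eq. (6.41),
    computed by the composite trapezoidal rule. `upper` truncates
    the infinite integral, so it must be chosen large enough for
    the survivor function to be negligible beyond it."""
    h = (upper - t) / steps
    total = 0.5 * (survivor(t) + survivor(upper))
    for i in range(1, steps):
        total += survivor(t + i * h)
    return total * h / survivor(t)

# For the memoryless exponential with rate 2, R_X(t) = 1/2 for all t.
r_exp = mean_residual_life(lambda u: math.exp(-2.0 * u), 1.0, 30.0)
```

For the Pareto survivor (β/t)^α with α > 1, the same routine reproduces the linearly increasing mean residual life R_X(t) = t/(α − 1) for t ≥ β, matching the rising Pareto curve in Figure 6.5.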
A simple way to present a set of paired observations

   (x_i, y_i),   1 ≤ i ≤ n,   (6.43)
is to plot the points (xi , yi ) one by one as coordinates. Such a diagram is called a dot
or scatter diagram. The density of dots in a given region is proportional to the relative
frequency of the pairs (X, Y ) in the region.
Example 6.1: Scatter diagram of Internet distances [134, 364]. The approximate geo-
graphic distance between a pair of Internet hosts can be inferred by sending probe
packets between the two hosts and measuring the round-trip delays experienced by the
probes. The relationship between geographic distance g and round-trip delay d from a
given Internet host to Internet hosts can be characterized by a scatter diagram consist-
ing of points (g, d). Owing to the inherent randomness in round-trip delays over the
Internet, delay measurements taken between a given pair of hosts separated by a fixed
geographic distance g at different times yield different delays d.
The scatter diagram in Figure 6.6 was obtained by sending probe packets from a
host at Stanford University to 79 other hosts on the Internet across the USA [364]. The
line labeled baseline provides a lower bound on d as a function of g, based on the
observation that the packet propagation speed over the Internet is at most the speed
of light through an optical fiber. If the refractive index of the fiber is denoted by η,
the propagation speed of the optical signal is v = c/η, where c is the speed of light in
vacuo. Typically, the value of η is slightly less than 1.5, so we make the approxima-
tion v ≈ 2c/3. If the round-trip delay between a pair of hosts is measured to be d, the
corresponding (one-way) geographical distance is upper bounded by ĝ = vd/2 ≈ cd/3.
When the unit of time is milliseconds and the unit of geographical distance is kilometers,
c ≈ 300 km/ms, so d and ĝ can be related approximately by
   d ≈ (1/100) ĝ,   (6.44)
which is the equation of baseline in Figure 6.6.
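The baseline reasoning can be packaged in a couple of lines (a Python sketch with names of our choosing; it assumes the propagation speed v = 2c/3 ≈ 200 km/ms stated above):

```python
def max_distance_km(rtt_ms, v_km_per_ms=200.0):
    """Upper bound g_hat = v * d / 2 on the one-way geographic
    distance implied by a measured round-trip delay d (in ms),
    with propagation speed v = 2c/3 ~ 200 km/ms in optical fiber."""
    return v_km_per_ms * rtt_ms / 2.0

def baseline_delay_ms(g_km):
    """Baseline of Eq. (6.44): the minimum round-trip delay
    d ~ g_hat / 100 for a one-way distance g_hat in km."""
    return g_km / 100.0
```

Any measured (g, d) scatter point must lie on or above this baseline; queueing and processing delays only push points upward, which is why the bestline fit of [134] is solved as a constrained linear program.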
Since packets generally traverse multiple hops between two hosts and experience
queueing and processing delays at each hop, the measured round-trip delay will
Figure 6.6 Scatter diagram of delay measurements from Internet host at Stanford University to 79 other
hosts across the USA [364].
typically be much larger than the delay predicted by the equation of the baseline
in (6.44). Gueye et al. [134] propose a tighter linear bound determined by solving a
linear programming problem that minimizes the slope and y-intercept of the line subject
to the constraints imposed by the set of scatter points. This deterministic bound corre-
sponds to the line labeled bestline in Figure 6.6. An alternative approach that retains
more of the statistical information captured by the scatter points is discussed in [364].
The most frequently used measure of statistical association between a pair of variables is the correlation coefficient. For a given pair of random variables X and Y, the covariance of X and Y, written Cov[X, Y] or σ_XY, is defined as

   σ_XY ≜ E[(X − μ_X)(Y − μ_Y)].   (6.45)

The correlation coefficient of X and Y is then defined by

   ρ_XY = σ_XY/(σ_X σ_Y).   (6.46)
It satisfies

   −1 ≤ ρ_XY ≤ 1.   (6.47)
We say that X and Y are properly linearly dependent if there exist nonzero constants a and b such that aX − bY is a constant c; that is,

   P[aX − bY = c] = 1.   (6.48)

Therefore,

   Var[aX − bY − c] = 0,   (6.49)

in which case

   ρ_XY = +1 or −1.   (6.50)
   s_xy = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)(y_i − ȳ) = (1/(n − 1)) ∑_{i=1}^{n} x_i y_i − (n/(n − 1)) x̄ ȳ,   (6.52)

where x̄ and ȳ are the sample means of {x_i} and {y_i}, respectively. The sample correlation coefficient is defined accordingly:

   r_xy = s_xy/(s_x s_y),   (6.53)

where s_x² and s_y² are the sample variances of {x_i} and {y_i}, respectively.
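Equations (6.52) and (6.53) in Python (hypothetical helper names; a direct two-pass computation of our own):

```python
import math

def sample_covariance(xs, ys):
    """s_xy of Eq. (6.52), with the unbiased n - 1 divisor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y), Eq. (6.53).  Note that the
    sample variance is the covariance of a sequence with itself."""
    sxx = sample_covariance(xs, xs)
    syy = sample_covariance(ys, ys)
    return sample_covariance(xs, ys) / math.sqrt(sxx * syy)
```

When the y_i are an exact linear function of the x_i, r_xy is ±1, the empirical counterpart of the proper linear dependence discussed above.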
Most textbooks on probability theory and mathematical statistics do not seem to deal
with graphical presentations of real data. We consider that this is an unfortunate state
of affairs. Various types of graphical presentations of collected data should be explored
6.6 Problems
6.2 Recursive computation of sample mean and sample variance. Let x̄_i and s_i² denote the sample mean and sample variance based on the first i observations, so that x̄ = x̄_n and s² = s_n².
(a) Derive the following recursive formula for the sample mean:

   x̄_i = x̄_{i−1} + (x_i − x̄_{i−1})/i,   i ≥ 1,

with the initial value x̄_0 = 0.
(b) Similarly, show the recursive formula for the sample variance:

   s_i² = ((i − 2)/(i − 1)) s_{i−1}² + (x_i − x̄_{i−1})²/i,   i > 1,
with the initial values s_0² = s_1² = 0.
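The recursions of parts (a) and (b) give a numerically convenient single-pass routine; a Python sketch (our own naming):

```python
def recursive_mean_var(data):
    """Single-pass update of Problem 6.2:
        xbar_i = xbar_{i-1} + (x_i - xbar_{i-1}) / i,
        s_i^2  = ((i - 2)/(i - 1)) s_{i-1}^2
                 + (x_i - xbar_{i-1})^2 / i,   i > 1,
    with xbar_0 = 0 and s_0^2 = s_1^2 = 0.  Returns the final
    sample mean and sample variance."""
    xbar, s2 = 0.0, 0.0
    for i, x in enumerate(data, start=1):
        if i > 1:
            # The variance update must use xbar_{i-1}, i.e. the mean
            # from BEFORE this observation is folded in.
            s2 = (i - 2) / (i - 1) * s2 + (x - xbar) ** 2 / i
        xbar = xbar + (x - xbar) / i
    return xbar, s2
```

This updates each observation in O(1) memory, so the sample mean and variance can be maintained on streaming data without storing the sample.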
6.4 Expectation and variance of the cumulative histogram. Find expressions for the expectation and variance of H_j (the cumulative histogram in the jth interval) in terms of the underlying distribution function F_X(x). Explain why the shape of the cumulative histogram is rather insensitive to the choice of class lengths {Δ_j}.
(a) constant a;
(b) uniformly distributed in [a, b].
6.7 Hazard function and distribution functions. Show that the distribution function
FX (x) is given in terms of the corresponding hazard function h X (x) as follows:
   F_X(x) = 1 − e^{−∫_0^x h_X(t) dt},   x ≥ 0,   (6.54)
and hence

   f_X(x) = h_X(x) e^{−∫_0^x h_X(t) dt}.   (6.55)
6.11∗ Mean residual life function and the hazard function. Show that the mean
residual life function R X (t) is a monotone-decreasing function if and only if the hazard
function h X (t) is monotone increasing.
Hint: Consider the conditional survivor function of R = X − t, given that X is greater than t, defined by

   S_X(r | t) ≜ P[R > r | X > t] = S_X(t + r)/S_X(t),

and find its relations with the hazard function h_X(t) and the mean residual life function.
6.12∗ Conditional survivor and mean residual life functions for standard Weibull
distribution.
(a) Find the conditional survivor function S X (r |t) (see Problem 6.11) of the standard
Weibull distribution.
(b) Find the mean residual life function R X (t) for the standard Weibull distribution.
6.14 Mean residual life functions – continued. Find an expression for the mean
residual life function R X (t) for each of the following distributions:
(a) Pareto distribution with parameters α > 1 and β > 0.
(b) Two-parameter Weibull distribution with parameters α and β.
6.15∗ Covariance between two RVs. Suppose that RVs X and Y are functionally
related according to
Y = cos X.
6.17 Correlation coefficient – continued. Show that if ρ = ±1, then (6.51) holds.
6.18∗ Sample covariance. Show that the sample covariance s_xy defined by (6.52) is an unbiased estimate of the covariance σ_XY.
6.19 Recursive formula for sample covariance. Generalize the recursive computa-
tion formula of Problem 6.2 to the sample covariance.