
Cambridge Books Online
http://ebooks.cambridge.org/

Probability, Random Processes, and Statistical Analysis:
Applications to Communications, Signal Processing, Queueing Theory and Mathematical Finance
Hisashi Kobayashi, Brian L. Mark, William Turin

Book DOI: http://dx.doi.org/10.1017/CBO9780511977770
Online ISBN: 9780511977770
Hardback ISBN: 9780521895446

Chapter 6 - Fundamentals of statistical data analysis, pp. 138-156
Chapter DOI: http://dx.doi.org/10.1017/CBO9780511977770.007
Cambridge University Press


6 Fundamentals of statistical data analysis

The theory of statistics involves interpreting a set of finite observations as a sample point drawn at random from a sample space. The study of statistics has the following three objectives: (i) to make the best estimate of important parameters of the population; (ii) to assess the uncertainty of the estimate; and (iii) to reduce a bulk of data to understandable forms. In much the same way as an examination of the properties of probability distribution functions forms the basic theory of probability, the foundation of statistical analysis is to examine the empirical distributions and certain descriptive measures associated with them. This chapter provides basic concepts of statistical data analysis.

6.1 Sample mean and sample variance

Let us consider a situation where we select randomly and independently n samples from a population whose distribution has mean μ and variance σ². The set of such samples, denoted as (x_1, x_2, ..., x_n), is referred to as a random sample of size n. Random samples are an important foundation of statistical theory, because a majority of the results known in mathematical statistics rely on assumptions that are consistent with a random sample. Let us start with a simple question: How can we estimate the population mean μ_X and population variance σ_X² from the random sample (x_1, x_2, ..., x_n)?

The sample mean (also called the empirical average) x̄ is defined as

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.    (6.1)

Each sample x_i can be viewed as an instance or realization of the associated RV X_i. Thus, the sample mean x̄ is an instance of the sample mean variable X̄ defined by

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.    (6.2)

In fact the term "sample mean" is often used in statistical theory to describe the variable X̄, but the quantity we can actually observe is its instance x̄.


Taking the expectation of both sides in (6.2), we have

E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i].    (6.3)

Since X_1, X_2, ..., X_n are independent and identically distributed (i.i.d.) RVs, we have E[X_1] = E[X_2] = ... = E[X_n] = μ_X. Substituting these values into (6.3), we find that the expectation of the sample mean variable satisfies

E[\bar{X}] = \mu_X,    (6.4)

which asserts that x̄ of (6.1) is an unbiased estimate of μ_X. An unbiased estimate is one that is, on the average, right on target.
Consider the variance of X̄:

\mathrm{Var}[\bar{X}] = E\bigl[(\bar{X} - E[\bar{X}])^2\bigr],    (6.5)

where X̄ − E[X̄] = X̄ − μ_X can be rewritten as

\bar{X} - E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_X) = \frac{1}{n}\sum_{i=1}^{n} Y_i,    (6.6)

where

Y_i \triangleq X_i - \mu_X, \quad i = 1, 2, \ldots, n.

Therefore,

\mathrm{Var}[\bar{X}] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)^{2}\right] = \frac{1}{n^2}\sum_{i=1}^{n} E[Y_i^2] + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1,\, j \ne i}^{n} E[Y_i Y_j].    (6.7)

Since the random variables {Y_i; 1 ≤ i ≤ n} are statistically independent with zero mean and variance σ_X², we have

\mathrm{Var}[\bar{X}] = \frac{\sigma_X^2}{n}.    (6.8)

Thus, the variance of the sample mean variable is the population variance divided by the sample size.

The deviations of the individual observations from the sample mean provide information about the dispersion of the x_i about x̄. We define the sample variance s_x² by

s_x^2 \triangleq \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.    (6.9)


This quantity can be viewed as an instance of the sample variance variable

S_X^2 \triangleq \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2,    (6.10)

which is also commonly called the sample variance. We find, after some rearrangement (Problem 6.1),

S_X^2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j=1,\, j \ne i}^{n} Y_i Y_j.    (6.11)

Taking expectations, we have

E[S_X^2] = \frac{1}{n}\sum_{i=1}^{n} E[Y_i^2] = \sigma_X^2.    (6.12)

The reason for using n − 1 rather than n as the divisor in (6.9) is to make E[S_X²] equal to σ_X²; that is, to make s_x² an unbiased estimate of σ_X². The positive square root of the sample variance, s_x, is called the sample standard deviation.
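As a concrete illustration (ours, not from the text), the following Python sketch computes x̄, s_x², and s_x for a random sample and checks the unbiasedness claims (6.4) and (6.12) by averaging the estimators over many simulated samples; the population parameters and sample size are arbitrary choices.

```python
import numpy as np

def sample_stats(x):
    """Sample mean (6.1) and unbiased sample variance (6.9)."""
    n = len(x)
    xbar = x.sum() / n
    s2 = ((x - xbar) ** 2).sum() / (n - 1)   # divisor n-1 makes E[S^2] = sigma^2
    return xbar, s2

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 30

# Average the estimators over many independent samples: their means should
# approach mu and sigma^2, illustrating (6.4) and (6.12).
means, variances = zip(*(sample_stats(rng.normal(mu, sigma, n)) for _ in range(10_000)))
print(np.mean(means))       # ~ 5.0
print(np.mean(variances))   # ~ 4.0
```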

6.2 Relative frequency and histograms

When the observed data takes on discrete values, we can just count the number of occurrences for the individual values. Suppose that the sample size n is given and k (≤ n) distinct values exist. Let n_j be the number of times that the jth value is observed, 1 ≤ j ≤ k. Then the fraction

f_j = \frac{n_j}{n}, \quad j = 1, 2, \ldots, k,    (6.13)

is, as defined in (2.1), the relative frequency of the jth value.

When the underlying random variable X is a continuous variable, we often adopt the method of "grouping" or "classifying" the data: the range of observations is divided into k intervals, called class intervals, at points c_0, c_1, c_2, ..., c_k. Let us designate the interval (c_{j−1}, c_j] as the jth class, 1 ≤ j ≤ k. Note that the lengths of the class intervals

\Delta_j \triangleq c_j - c_{j-1}, \quad j = 1, 2, \ldots, k,

need not be equal. Let n_j denote the number of observations that fall in the jth class interval. Then the relative frequency of the jth class takes the same form as (6.13). The grouped distribution may be represented graphically as the following "staircase function" in an (x, h)-coordinate system:

h(x) = \frac{f_j}{\Delta_j} = \frac{n_j}{n\,\Delta_j}, \quad x \in (c_{j-1}, c_j], \; j = 1, 2, \ldots, k.    (6.14)


Such a diagram is called a histogram and can be regarded as an estimate of the PDF of the population. If the class lengths Δ_j are all the same, the shape of the histogram remains unchanged whether we use the relative frequencies of the classes {f_j} or the frequency counts of the classes {n_j} as the ordinate. Such diagrams are also called histograms.
The choice of the class intervals in the histogram representation is by no means trivial.
Certainly, we should choose them in such a way that the characteristic features of the
distribution are emphasized and chance variations are obscured. If the class lengths
are too small, chance variations dominate because each interval includes only a small
number of observations. On the other hand, if the class lengths are too large, a great deal
of information concerning the characteristics of the distribution will be lost.
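The following sketch (an illustration we add, with arbitrary breakpoints c_j) computes the density-scaled histogram h(x) of (6.14) over unequal class intervals:

```python
import numpy as np

def histogram_density(x, edges):
    """h(x) of (6.14): relative frequency per unit length in each class interval."""
    edges = np.asarray(edges, dtype=float)
    n_j, _ = np.histogram(x, bins=edges)   # counts per class (NumPy uses left-closed bins)
    widths = np.diff(edges)                # class lengths Delta_j
    return n_j / (len(x) * widths)         # f_j / Delta_j

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500)
edges = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]   # deliberately unequal class lengths
h = histogram_density(x, edges)
print(h)                               # PDF estimate over each class
print(np.sum(h * np.diff(edges)))      # ~1, minus the mass beyond the last edge
```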
Let {x_k : 1 ≤ k ≤ n} denote n observations in the order observed and let {x_{(i)} : 1 ≤ i ≤ n} denote the same observations ranked in order of magnitude. The frequency H(x) of observations that are smaller than or equal to x is called the cumulative relative frequency, and is given by

H(x) = \begin{cases} 0, & x < x_{(1)}, \\ \dfrac{i}{n}, & x_{(i)} \le x < x_{(i+1)}, \; i = 1, 2, \ldots, n-1, \\ 1, & x \ge x_{(n)}, \end{cases}    (6.15)

which is the empirical analog of the CDF F_X(x). If we use the unit step function

u(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases}    (6.16)

the above H(x) can be more concisely written as

H(x) = \frac{1}{n}\sum_{i=1}^{n} u(x - x_{(i)}) = \frac{1}{n}\sum_{k=1}^{n} u(x - x_k), \quad -\infty < x < \infty.    (6.17)

Interestingly enough, use of the unit step function makes it unnecessary to obtain the rank-ordered data in order to find H(x). The graphical plot of H(x) is a nondecreasing step curve, which increases from zero to one in "jumps" of 1/n at the points x = x_{(1)}, x_{(2)}, ..., x_{(n)}. If several observations take on the same value, the jump is a multiple of 1/n.
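As a short added sketch of (6.17), the empirical CDF can be evaluated without rank-ordering the data, by averaging unit step functions:

```python
import numpy as np

def ecdf(samples, x):
    """H(x) of (6.17): fraction of samples <= x, via the unit step u(x - x_k)."""
    samples = np.asarray(samples)
    x = np.atleast_1d(x)
    # u(x - x_k) is 1 exactly when x_k <= x; average over all n samples.
    return (samples[None, :] <= x[:, None]).mean(axis=1)

data = np.array([2.3, 0.7, 1.1, 0.7, 3.9])
print(ecdf(data, [0.0, 0.7, 2.0, 4.0]))   # [0.  0.4 0.6 1. ]; the tie at 0.7 jumps 2/n
```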
When grouped data are presented as a cumulative relative frequency distribution, it is usually called the cumulative histogram. The cumulative histogram is far less sensitive to variations in class lengths than the histogram. This is because the accumulation is essentially equivalent to integration along the x-axis, which filters out the chance variations contained in the histogram. The cumulative relative frequency distribution or the cumulative histogram is, therefore, quite helpful in portraying the gross features of data.

6.3 Graphical presentations

Reducing primary data to the sample mean, sample variance, and histogram can reveal a great amount of information concerning the nature of the population distribution. But sometimes important features of the underlying distribution are obscured or hidden by the data reduction procedures. In this section we will discuss some graphical methods that are often valuable in an exploratory analysis of measurement data. They are: (a) histograms on the probability or log-normal probability papers; (b) the survivor functions on log-linear and log-log papers; and (c) the dot diagram and correlation coefficient.

6.3.1 Histogram on probability paper


6.3.1.1 Testing the normal distribution hypothesis
As we stated in Section 4.2.4, RVs occurring in physical situations often have the normal (or Gaussian) distribution, or can at least be treated approximately as normal RVs. As we shall see in subsequent sections, most statistical analysis techniques are based on the assumption of normality of measured variables. Thus, when we collect measurement data and obtain some empirical distribution, the first thing we might do is to examine whether the underlying distribution is normal. A fractile diagram¹ (Hald [139]) is useful for this purpose. For a given distribution function F(x),

P = F(x)    (6.18)

provides the dependence of the cumulative distribution on the variable x. The inverse function

x_P = F^{-1}(P)    (6.19)

gives the value of the variable x that corresponds to the given cumulative probability P. The value x_P is called the P-fractile. Some authors use the terms percentile or quantile instead of the term fractile.

The distribution function of the standard normal distribution N(0, 1) is often denoted by Φ(·), as defined in (4.46):

\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\left(-\frac{t^2}{2}\right) dt.    (6.20)

Then the fractile, u_P, of the distribution N(0, 1) is derived as

u_P = \Phi^{-1}(P).    (6.21)

Suppose that for a given cumulative relative frequency H(x) we wish to test whether this empirical distribution resembles a normal distribution; that is, to test whether

H(x) \cong \Phi\left(\frac{x - \mu}{\sigma}\right)    (6.22)

holds for some parameters μ and σ, where the symbol ≅ means "to have the distribution of." Testing this relation is equivalent to testing the relation

u_{H(x)} \approx \frac{x - \mu}{\sigma}.    (6.23)

¹ This term should not be confused with a similar term "fractal diagram" known in fractal geometry.


According to the definition of H(x), the plot of u_{H(x)} versus x forms a step (or staircase) curve:

u_{H(x)} = \begin{cases} -\infty, & x < x_{(1)}, \\ u_{i/n}, & x_{(i)} \le x < x_{(i+1)}, \; i = 1, 2, \ldots, n-1, \\ \infty, & x \ge x_{(n)}. \end{cases}    (6.24)

Therefore, the staircase function plot

u = u_{H(x)}    (6.25)

provides an estimate of the straight line

u = \frac{x - \mu}{\sigma}    (6.26)

in the same way that the cumulative frequency distribution y = H(x) forms an estimate of the CDF y = F(x). The graphical plot of the function (6.25) in an (x, u)-coordinate system is called the fractile diagram.

Instead of plotting (x, u_P) on ordinary graph paper, we may plot (x, P) directly on special graph paper called probability paper. On the ordinate axis of a probability paper, the corresponding values of P = Φ(u) are marked, rather than the u values. Probability paper is used in the same manner as other special graph papers, such as logarithmic paper. Figure 6.1(a) shows a probability paper with step curve u = u_{H(x)}, based on n = 50 sample points drawn from a normal distribution with zero mean and unit variance. Instead of the step curve, we often plot n points (x_{(i)}, (i − 1/2)/n), which are situated at the midpoints of the vertical parts of the step curve. The advantages are that it is easier to plot n points than to draw a step curve, and that possible systematic deviations from a straight line are more easily detected from this dot diagram. The result of this procedure is shown in Figure 6.1(b).

If the distribution in question is normal, the points of the fractile diagram should vary randomly about a straight line. In a small sample, say n < 20, the permissible random variation of points in the fractile diagram is so large that it is generally difficult to examine whether systematic deviations from a straight line exist.
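As an added illustration (ours, with SciPy's norm.ppf standing in for Φ⁻¹), the dot-diagram form of the fractile plot: if the sample is normal, the points (x_{(i)}, Φ⁻¹((i − 1/2)/n)) scatter about the straight line (6.26), with slope 1/σ and intercept −μ/σ.

```python
import numpy as np
from scipy.stats import norm

def fractile_points(x):
    """Dot diagram of Section 6.3.1.1: (x_(i), Phi^{-1}((i - 1/2)/n))."""
    x_sorted = np.sort(x)
    n = len(x_sorted)
    p = (np.arange(1, n + 1) - 0.5) / n    # midpoints of the step curve's vertical parts
    return x_sorted, norm.ppf(p)           # abscissa x_(i), ordinate u_P

rng = np.random.default_rng(2)
x, u = fractile_points(rng.normal(0.0, 1.0, 50))
# A least-squares line through the points estimates 1/sigma (slope) and -mu/sigma (intercept).
slope, intercept = np.polyfit(x, u, 1)
print(slope, intercept)   # ~1 and ~0 for N(0, 1) data
```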

6.3.1.2 Testing the log-normal distribution hypothesis

Some random variables we deal with are often modeled by a log-normal distribution (see Section 7.4). In order to test whether a log-normal distribution fits given empirical data, we should plot the step curve or dot diagram on log-normal paper, which is a simple modification of the above probability paper. The ordinate axis is the same as in the probability paper, i.e., u_P = Φ⁻¹(P), whereas the horizontal axis is changed from the linear scale (in the probability paper) to the logarithmic scale, i.e., log₁₀ x = log x / log 10.² If the empirical data exhibit a straight line on this log-normal probability paper, then a log-normal distribution should be a good candidate to represent this variable.

² We use log to mean the natural logarithm; i.e., log_e or ln.

Figure 6.1 The fractile diagram of normal variates: (a) step curve; (b) dot diagram. (Ordinate: u_P, with the corresponding P × 100 scale; abscissa: x.)

In Figure 6.2(a) and (b) we plot the step curve and dot diagram, respectively, of a simulated set of n = 50 sample points x_i, where x_i = e^{y_i} and y_i is drawn from N(2, 4); i.e., μ_Y = 2 and σ_Y = 2. From the results to be discussed in Section 7.4, we find that

\mu_X = e^{\mu_Y + \sigma_Y^2/2} = e^4 \quad \text{and} \quad \sigma_X^2 = e^{2\mu_Y + \sigma_Y^2}\left(e^{\sigma_Y^2} - 1\right) = e^8(e^4 - 1).
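A companion sketch (ours) of the log-normal check, using the same parameters as the example above: plotting u_P against log x_{(i)} should be nearly linear for log-normal data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
y = rng.normal(2.0, 2.0, 50)     # Y ~ N(2, 4)
x = np.exp(y)                    # X = e^Y is log-normal

n = len(x)
p = (np.arange(1, n + 1) - 0.5) / n
log_x = np.log(np.sort(x))       # logarithmic abscissa of log-normal paper
u = norm.ppf(p)                  # ordinate u_P = Phi^{-1}(P)

# On log-normal paper the sample is nearly linear:
# slope ~ 1/sigma_Y, intercept ~ -mu_Y/sigma_Y.
slope, intercept = np.polyfit(log_x, u, 1)
print(1.0 / slope, -intercept / slope)   # estimates of sigma_Y (~2) and mu_Y (~2)
```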

6.3.2 Log-survivor function curve


Suppose that a random variable X represents the life of some item (e.g., a light bulb) or the interval between failures of some machine. Given the distribution function F_X(x) of the RV X, the probability that X survives time duration t,

S_X(t) \triangleq P[X > t] = 1 - F_X(t),    (6.27)


Figure 6.2 The fractile diagram of log-normal variates: (a) step curve; (b) dot diagram. (Ordinate: u_P, with the corresponding P × 100 scale; abscissa: x on a logarithmic scale.)

is often called the survivor function, or the survival function in reliability theory. It is equivalent to the complementary distribution function F_X^c(t) defined earlier.

The natural logarithm of (6.27) is known as the log-survivor function or the log-survival function (Cox and Lewis [71]):

\log S_X(t) = \log(1 - F_X(t)).    (6.28)

The log-survivor function will show the details of the tail end of the distribution more effectively than the distribution itself.


If, for instance, F_X(x) is an exponential distribution with mean 1/α, then its log-survivor function is a straight line: log S_X(t) = log e^{−αt} = −αt.

If F_X(x) is a mixed exponential distribution (or hyperexponential distribution)

F_X(x) = \pi_1 (1 - e^{-\alpha_1 x}) + \pi_2 (1 - e^{-\alpha_2 x}), \quad \alpha_1 > \alpha_2, \; \pi_1 + \pi_2 = 1,    (6.29)

then its log-survivor function has two asymptotic straight lines, since

\log S_X(t) = \log(\pi_1 e^{-\alpha_1 t} + \pi_2 e^{-\alpha_2 t}) \approx \begin{cases} -\alpha_1 t + \log \pi_1, & \text{for small } t, \\ -\alpha_2 t + \log \pi_2, & \text{for large } t. \end{cases}    (6.30)

The sample log-survivor function or empirical log-survivor function is similarly defined as

\log[1 - H(t)],    (6.31)

where H(t) represents the cumulative relative frequency (ungrouped data) or the cumulative histogram (grouped data). In the ungrouped case we find from (6.15) that

\log\left(1 - \frac{i}{n}\right), \quad 1 \le i \le n,    (6.32)

should be plotted against x_{(i)}, where the subscript (i) represents the rank as in (6.15). In order to avoid difficulties at i = n, we may sometimes modify (6.32) into

\log\left(1 - \frac{i}{n+1}\right), \quad 1 \le i \le n.    (6.33)
As an example, Figure 6.3 plots the log-survivor function using a sample of size 1000 drawn from the above hyperexponential distribution with parameters

\pi_1 = 0.0526, \quad \pi_2 = 1 - \pi_1, \quad \alpha_1 = 0.1, \quad \alpha_2 = 2.0.    (6.34)

Out of the 1000 samples taken, 18 sample points that exceed x = 10 fall outside the scale of the figure; hence they are not shown. The asymptotes of (6.30) can be easily recognized from this log-survivor function.

Characteristically, the log-survivor function of the mixed exponential distribution (6.29) is convex with a linear tail. Observations of (or departures from) such characteristic shapes are used to postulate a functional form for a distribution. See Gaver et al. [115] and Lewis and Shedler [225].
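The following sketch (ours) reproduces the construction behind Figure 6.3: draw 1000 hyperexponential samples with the parameters of (6.34) and compute the empirical log-survivor values via (6.33).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
pi1, a1, a2 = 0.0526, 0.1, 2.0

# Hyperexponential draw: pick a branch, then an exponential with that branch's rate.
branch = rng.random(n) < pi1
t = np.where(branch, rng.exponential(1 / a1, n), rng.exponential(1 / a2, n))

t_sorted = np.sort(t)
i = np.arange(1, n + 1)
log_surv = np.log(1 - i / (n + 1))   # (6.33) avoids log(0) at i = n

# Plotting log_surv against t_sorted on a linear t-axis exhibits the
# two straight-line asymptotes of (6.30).
print(t_sorted[:3], log_surv[:3])
```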

6.3.2.1 Testing the Pareto distribution hypothesis


As discussed in Section 4.2.6, a simple way to examine whether the tail of an empirical
distribution fits the power law of the Pareto distribution is to plot the log-survivor
function on paper with the log-log scale, whereas the log-survivor function curve
discussed above is plotted in the log-linear scale.


Figure 6.3 The log-survivor function of a mixed-exponential (or hyperexponential) distribution with π₁ = 0.0526, π₂ = 1 − π₁, α₁ = 0.1, and α₂ = 2.0. (Ordinate: log[1 − H(t)]; abscissa: t.)

Figure 6.4 The log-survivor function of a mixed Pareto distribution with β₁ = β₂ = 1, π₂ = 1 − π₁, α₁ = 1.5, α₂ = 5, and π₁ = 0.2. (Ordinate: log[1 − H(t)]; abscissa: t on a logarithmic scale.)

Analogous to the mixed exponential distribution, a mixed Pareto distribution is considered:

S_X(t) = \pi_1 \frac{\beta_1^{\alpha_1}}{t^{\alpha_1}} + (1 - \pi_1) \frac{\beta_2^{\alpha_2}}{t^{\alpha_2}}, \quad 0 < \max\{\beta_1, \beta_2\} \le t.    (6.35)

As an example, Figure 6.4 plots the log-survivor function of 500 samples drawn from the mixed Pareto distribution with β₁ = β₂ = 1, α₁ = 1.5, α₂ = 5, and π₁ = 0.2.
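A similar sketch (ours) for the log-log check of this subsection: for the mixed Pareto law (6.35), the empirical log-survivor plotted against log t is asymptotically a straight line whose slope approaches −min{α₁, α₂}.

```python
import numpy as np

rng = np.random.default_rng(5)
n, pi1 = 500, 0.2
a1, a2, b1, b2 = 1.5, 5.0, 1.0, 1.0

# Inverse-transform draw from a Pareto branch with S(t) = (beta/t)^alpha:
# t = beta * U^{-1/alpha} for U uniform on (0, 1).
u = rng.random(n)
branch = rng.random(n) < pi1
alpha = np.where(branch, a1, a2)
beta = np.where(branch, b1, b2)
t = beta / u ** (1 / alpha)

t_sorted = np.sort(t)
i = np.arange(1, n + 1)
log_surv = np.log(1 - i / (n + 1))

# Fit the upper tail on log-log axes; the slope is roughly -min(a1, a2) = -1.5 here.
tail = slice(int(0.8 * n), n - 5)
slope, _ = np.polyfit(np.log(t_sorted[tail]), log_surv[tail], 1)
print(slope)
```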


6.3.3 Hazard function and mean residual life curves


Other graphical plots that can be derived from the histogram or distribution function are the hazard function curve and the mean residual life curve. These notions are also related to reliability theory and renewal process theory, which will be briefly discussed in Section 14.3.

Suppose that X represents the life of some item, with the distribution function F_X(x). The function defined by

h_X(t) = \frac{f_X(t)}{S_X(t)} = \frac{f_X(t)}{1 - F_X(t)}    (6.36)

is called the hazard function or the failure rate, because h_X(t) dt represents the probability that the life will end in the interval (t, t + dt], given that X has survived up to age t; i.e., X ≥ t. If X represents the service time of a customer, as in queueing theory, h_X(t) is called the completion rate function.
The hazard functions of the exponential, Weibull, Pareto, and log-normal distributions are given as follows:

h_X(t) = \begin{cases} \lambda, & t \ge 0, & \text{exponential}, \\[4pt] \dfrac{\alpha}{\beta}\left(\dfrac{t}{\beta}\right)^{\alpha-1}, & t \ge 0, & \text{Weibull}, \\[4pt] \dfrac{\alpha}{t}, & t \ge \beta, & \text{Pareto}, \\[4pt] \dfrac{t^{-1}\exp\left(-\frac{(\log t - \mu_Y)^2}{2\sigma_Y^2}\right)}{\int_{\log t}^{\infty}\exp\left(-\frac{(u - \mu_Y)^2}{2\sigma_Y^2}\right) du}, & t > 0, & \text{log-normal}, \end{cases}    (6.37)

where in the log-normal distribution the parameters μ_Y and σ_Y² are given as

\mu_Y = \log \mu_X - \frac{1}{2}\log\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right) \quad \text{and} \quad \sigma_Y^2 = \log\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right).
From (6.36) we can express the survivor function in terms of the hazard function:

S_X(x) = e^{-\int_0^x h_X(t)\, dt}, \quad x \ge 0,    (6.38)

from which we have

h_X(t) = -\frac{d \log S_X(t)}{dt}, \quad t \ge 0.    (6.39)

The last equation, of course, could have been readily derived from (6.36).
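As a small numerical sketch (ours) of the identities (6.37) and (6.38) for the Weibull case: integrating the hazard function recovers the survivor function.

```python
import numpy as np

alpha, beta = 1.5, 1.0
t = np.linspace(1e-6, 4.0, 4000)

hazard = (alpha / beta) * (t / beta) ** (alpha - 1)   # Weibull row of (6.37)
cum_hazard = np.cumsum(hazard) * (t[1] - t[0])        # crude integral of h_X from 0 to t
surv_from_hazard = np.exp(-cum_hazard)                # (6.38)
surv_exact = np.exp(-(t / beta) ** alpha)             # known Weibull survivor function

print(np.max(np.abs(surv_from_hazard - surv_exact)))  # small discretization error
```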
Given that the service time variable X is greater than t, we call the difference

R = X - t    (6.40)


Figure 6.5 The mean residual life curves of a Pareto distribution with α = 3.0 and β = 1.0, and a Weibull distribution with α = 1.5 and β = 1.0. (Ordinate: R_X(t); abscissa: t.)

the residual life conditioned on X > t. Then the mean residual life function is given by

R_X(t) = E[R \mid X > t] = \frac{\int_t^{\infty} S_X(u)\, du}{S_X(t)}.    (6.41)

At t = 0, the mean residual life becomes

R_X(0) = \int_0^{\infty} S_X(u)\, du = E[X],    (6.42)

as expected. Figure 6.5 shows mean residual life curves of a Pareto distribution and a Weibull distribution.
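A numerical sketch (ours) of (6.41) for the Pareto curve of Figure 6.5; for the Pareto law with t ≥ β, the exact value R_X(t) = t/(α − 1) provides a check.

```python
import numpy as np

alpha, beta = 3.0, 1.0

def surv(u):
    """Survivor function of the Pareto distribution: S_X(u) = (beta/u)^alpha for u >= beta."""
    u = np.asarray(u, dtype=float)
    return np.where(u < beta, 1.0, (beta / u) ** alpha)

def mean_residual_life(t, upper=500.0, m=100_000):
    """R_X(t) of (6.41) by trapezoidal integration of S_X from t to a large cutoff."""
    u = np.linspace(t, upper, m)
    s = surv(u)
    tail = np.sum(s[:-1] + s[1:]) / 2 * (u[1] - u[0])
    return tail / surv(t)

for t in (1.0, 2.0, 4.0):
    print(t, float(mean_residual_life(t)), t / (alpha - 1))   # numeric vs exact t/(alpha-1)
```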

6.3.4 Dot diagram and correlation coefficient


In analyzing a simulation model or an operational system, we usually measure a number of variables, and we wish to find possible statistical associations among them. Thus, the search for correlations between two or more quantities is one of the most important functions in the output analysis of the measurement and evaluation process. A typical method of graphically examining correlations between two variables X and Y based on n observations of the pair

(x_i, y_i), \quad 1 \le i \le n,    (6.43)

is to plot the points (x_i, y_i) one by one as coordinates. Such a diagram is called a dot or scatter diagram. The density of dots in a given region is proportional to the relative frequency of the pairs (X, Y) in the region.


Example 6.1: Scatter diagram of Internet distances [134, 364]. The approximate geographic distance between a pair of Internet hosts can be inferred by sending probe packets between the two hosts and measuring the round-trip delays experienced by the probes. The relationship between geographic distance g and round-trip delay d from a given Internet host to other Internet hosts can be characterized by a scatter diagram consisting of points (g, d). Owing to the inherent randomness in round-trip delays over the Internet, delay measurements taken between a given pair of hosts separated by a fixed geographic distance g at different times yield different delays d.

The scatter diagram in Figure 6.6 was obtained by sending probe packets from a host at Stanford University to 79 other hosts on the Internet across the USA [364]. The line labeled baseline provides a lower bound on d as a function of g based on the observation that the packet propagation speed over the Internet is at most the speed of light through an optical fiber. If the refractive index of the fiber is denoted by η, the propagation speed of the optical signal is v = c/η, where c is the speed of light in vacuo. Typically, the value of η is slightly less than 1.5, so we make the approximation v ≈ 2c/3. If the round-trip delay between a pair of hosts is measured to be d, the corresponding (one-way) geographical distance is upper bounded by ĝ = vd/2 ≈ cd/3. When the unit of time is milliseconds and the unit of geographical distance is kilometers, c ≈ 300 km/ms, so d and ĝ can be related approximately by

d \approx \frac{1}{100}\, \hat{g},    (6.44)

which is the equation of the baseline in Figure 6.6.

Since packets generally traverse multiple hops between two hosts and experience queueing and processing delays at each hop, the measured round-trip delay will
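As a quick worked illustration (ours) of the baseline bound, converting a measured round-trip delay into an upper bound on the one-way distance:

```python
C_KM_PER_MS = 300.0   # speed of light, km per millisecond

def max_distance_km(rtt_ms):
    """Upper bound g_hat = v*d/2 ~ c*d/3 on one-way distance from a round-trip delay."""
    return C_KM_PER_MS * rtt_ms / 3.0

print(max_distance_km(20.0))   # a 20 ms round trip caps the distance at ~2000 km
```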

Figure 6.6 Scatter diagram of delay measurements from an Internet host at Stanford University to 79 other hosts across the USA [364]. (Ordinate: round-trip delay d, in ms; abscissa: geographical distance g, in km; the baseline and bestline bounds are drawn as straight lines.)


typically be much larger than the delay predicted by the equation of the baseline
in (6.44). Gueye et al. [134] propose a tighter linear bound determined by solving a
linear programming problem that minimizes the slope and y-intercept of the line subject
to the constraints imposed by the set of scatter points. This deterministic bound corre-
sponds to the line labeled bestline in Figure 6.6. An alternative approach that retains
more of the statistical information captured by the scatter points is discussed in [364].

The most frequently used measure of statistical association between a pair of variables is the correlation coefficient. For a given pair of random variables X and Y, the covariance of X and Y, written Cov[X, Y] or σ_XY, is defined as

\sigma_{XY} \triangleq \mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y.    (6.45)

We say X and Y are uncorrelated if σ_XY = 0. If X and Y are statistically independent, then they are uncorrelated, but the converse is not true: the condition σ_XY = 0 does not imply that X and Y are independent (see Problem 6.15). The correlation coefficient ρ_XY between X and Y is defined as

\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.    (6.46)

The correlation coefficient always satisfies the condition

-1 \le \rho_{XY} \le 1.    (6.47)

We say that X and Y are properly linearly dependent if there exist nonzero constants a and b such that aX − bY is a constant c; that is,

P[aX - bY = c] = 1.    (6.48)

Therefore,

\mathrm{Var}[aX - bY - c] = 0,    (6.49)

from which we have

\rho_{XY} = +1 \text{ or } -1,    (6.50)

depending on whether ab is positive or negative. Conversely, if ρ_XY = ±1, then it implies (Problem 6.17) that

P\left[\mp \frac{X - \mu_X}{\sigma_X} + \frac{Y - \mu_Y}{\sigma_Y} = 0\right] = 1.    (6.51)


The sample covariance of the two variables based on observations {(x_i, y_i); 1 ≤ i ≤ n} is defined as

s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1}\sum_{i=1}^{n} x_i y_i - \frac{n\,\bar{x}\,\bar{y}}{n-1},    (6.52)

where x̄ and ȳ are the sample means of {x_i} and {y_i}, respectively. The sample correlation coefficient is defined accordingly:

r_{xy} = \frac{s_{xy}}{s_x s_y},    (6.53)

where s_x² and s_y² are the sample variances of {x_i} and {y_i}, respectively.
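A sketch (ours) of (6.52) and (6.53), cross-checked against NumPy's built-in unbiased estimators:

```python
import numpy as np

def sample_corr(x, y):
    """Sample covariance (6.52) and sample correlation coefficient (6.53)."""
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    s_xy = ((x - xbar) * (y - ybar)).sum() / (n - 1)
    s_x = np.sqrt(((x - xbar) ** 2).sum() / (n - 1))
    s_y = np.sqrt(((y - ybar) ** 2).sum() / (n - 1))
    return s_xy, s_xy / (s_x * s_y)

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 0.8 * x + 0.6 * rng.normal(size=200)   # correlated pair with rho = 0.8 by construction

s_xy, r_xy = sample_corr(x, y)
print(s_xy, np.cov(x, y)[0, 1])        # matches NumPy's unbiased covariance
print(r_xy, np.corrcoef(x, y)[0, 1])   # matches NumPy's correlation coefficient
```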

6.4 Summary of Chapter 6

Sample mean: x̄ = (1/n) Σ_{i=1}^{n} x_i (6.1)
Variance of the sample mean: Var[X̄] = σ_X²/n (6.8)
Sample variance: s_x² ≜ (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)² (6.9)
Unbiasedness of the sample variance: E[S_X²] = σ_X² (6.12)
Relative frequency: f_j = n_j/n, j = 1, 2, ..., k (6.13)
Histogram: h(x) = f_j/Δ_j, for x ∈ (c_{j−1}, c_j] (6.14)
Fractile diagram: (x, u_{H(x)}) (6.24)
Log-survivor function: log S_X(t) = log(1 − F_X(t)) (6.28)
Sample log-survivor function: log(1 − H(t)) (6.31)
Hazard function: h_X(t) = f_X(t)/S_X(t) = f_X(t)/(1 − F_X(t)) (6.36)
Mean residual life function: R_X(t) = E[R | X > t] = (∫_t^∞ S_X(u) du)/S_X(t) (6.41)
Dot diagram: (x_i, y_i), 1 ≤ i ≤ n (6.43)
Covariance: σ_XY = E[XY] − μ_X μ_Y (6.45)
Uncorrelated if: σ_XY = 0
Correlation coefficient: ρ_XY = σ_XY/(σ_X σ_Y) (6.46)
Sample correlation coefficient: r_xy = s_xy/(s_x s_y) (6.53)

6.5 Discussion and further reading

Most textbooks on probability theory and mathematical statistics do not seem to deal with graphical presentations of real data. We consider that this is an unfortunate state of affairs. Various types of graphical presentations of collected data should be explored before we can narrow down proper directions of mathematical modeling or analysis of the system in question.

Hald [139] seems to be one of the few textbooks that discusses the fractile diagram. A monograph by Cox and Lewis [71] presents several empirical log-survivor functions as well as scatter diagrams. Much of the material given in this chapter is taken from the first author's earlier book [197] on system modeling and analysis, in which additional examples of graphical plots based on computer performance data are found.

The exploratory data analysis (EDA) approach developed by Tukey [328] and others indeed exploits various graphical techniques as well as quantitative techniques in analyzing data to formulate plausible hypotheses. Two graphical techniques introduced by Tukey are the box plot and the stem-and-leaf diagram. A box plot, also known as a box-and-whiskers plot, graphically depicts the sample minimum, lower quartile, median, upper quartile, and sample maximum, and may also indicate outliers of a data set. A stem-and-leaf plot, also called a stemplot, tabulates the data in ascending order in two columns. The first consists of the stems of the data set in ascending order, while the second consists of the leaves corresponding to each stem. Typically, a leaf contains the last digit of the associated sample value while the stem contains the remaining digits. Exploratory data analysis complements the conventional statistical theory, which places more emphasis on formal testing of a hypothesis and estimation of model parameters, two subjects to be studied in Chapter 18.

6.6 Problems

Section 6.1: Sample mean and sample variance


6.1∗ Derivation of (6.11). Derive (6.11).

6.2 Recursive formula for sample mean and variance. Let x̄_i and s_i² be the sample mean and sample variance based on data (x_1, x_2, ..., x_i), where i ≤ n. Then the last values of the sequences – that is, x̄_n and s_n² – are the desired quantities:

\bar{x} = \bar{x}_n \quad \text{and} \quad s^2 = s_n^2.

(a) Derive the following recursive formula for the sample mean:

\bar{x}_i = \bar{x}_{i-1} + \frac{x_i - \bar{x}_{i-1}}{i}, \quad i \ge 1,

with the initial value x̄_0 = 0.

(b) Similarly, show the recursive formula for the sample variance:

s_i^2 = \frac{i-2}{i-1}\, s_{i-1}^2 + \frac{(x_i - \bar{x}_{i-1})^2}{i}, \quad i > 1,

with the initial values s_0² = s_1² = 0.
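As an added sketch, the recursions of this problem implemented as a one-pass update and checked against the batch formulas (6.1) and (6.9):

```python
import numpy as np

def running_stats(xs):
    """One-pass sample mean and variance via the recursions of Problem 6.2."""
    mean, s2 = 0.0, 0.0
    for i, x in enumerate(xs, start=1):
        delta = x - mean                 # x_i - x_bar_{i-1}, taken before the mean update
        mean += delta / i                # x_bar_i update
        if i > 1:
            s2 = (i - 2) / (i - 1) * s2 + delta ** 2 / i   # s_i^2 update
    return mean, s2

rng = np.random.default_rng(7)
x = rng.normal(3.0, 1.5, 100)
m, v = running_stats(x)
print(m - x.mean())        # ~0
print(v - x.var(ddof=1))   # ~0 (ddof=1 is the n-1 divisor of (6.9))
```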

Section 6.2: Relative frequency and histograms


6.3 Expectation and variance of the histogram. Consider the histogram value h(x) in the jth class interval x ∈ (c_{j−1}, c_j] given by (6.14), which we denote as h_j(x), where x = (x_1, x_2, ..., x_n) is the n random samples. Then h_j(X) is a random variable, where the argument x is replaced by the corresponding RV X = (X_1, X_2, ..., X_n).

(a) Show that the expectation of the RV h_j(X) is given by

E[h_j(\mathbf{X})] = \frac{F_X(c_j) - F_X(c_{j-1})}{\Delta_j} \approx f_X(c_j).

(b) Show that the variance of h_j(X) is

\mathrm{Var}[h_j(\mathbf{X})] = \frac{[F_X(c_j) - F_X(c_{j-1})][1 - F_X(c_j) + F_X(c_{j-1})]}{n\,\Delta_j^2} \approx \frac{f_X(c_j)}{n\,\Delta_j}.

6.4 Expectation and variance of the cumulative histogram. Find expressions for the expectation and variance of H_j (the cumulative histogram in the jth interval) in terms of the underlying distribution function F_X(x). Explain why the shape of the cumulative histogram is rather insensitive to the choice of class lengths {Δ_j}.

Section 6.3: Graphical presentations


6.5 Log-survivor function curve of Erlang distributions. Plot the sample log-survivor function by generating 1000 values of a random variable X that has the two-stage Erlang distribution of mean one. Do the same for the four-stage Erlang distribution.
Hint: To generate samples drawn from the k-stage Erlang distribution, apply the transform method of Example 5.7 in Section 5.4.2 to generate k samples drawn from an exponential distribution.

6.6∗ Log-survivor functions and hazard functions of constant and uniform RVs. Find the expression for the log-survivor function and the completion rate function when the service time is

(a) constant a;
(b) uniformly distributed in [a, b].

6.7 Hazard function and distribution functions. Show that the distribution function F_X(x) is given in terms of the corresponding hazard function h_X(x) as follows:

F_X(x) = 1 - e^{-\int_0^x h_X(t)\, dt}, \quad x \ge 0,    (6.54)


and hence

f_X(x) = h_X(x)\, e^{-\int_0^x h_X(t)\, dt}.    (6.55)

6.8 Hazard function of a k-stage hyperexponential distribution. Consider the k-stage hyperexponential (or mixed exponential) distribution defined in (4.166) of Chapter 4. Show that its hazard function h_X(t) is monotone decreasing. Find lim_{t→∞} h_X(t).

6.9 Hazard function of the Pareto distribution. Find the hazard function of the Pareto distribution.

6.10 Hazard function of the Weibull distribution. The Weibull distribution is often used in modeling reliability problems.
(a) Find the hazard function h_X(t) of the standard Weibull distribution. What functional form does h_X(t) take for α = 1 and α = 2?
(b) Plot the hazard function of the standard Weibull distribution for α = 0.1, 0.5, 1, 2, and 5, and confirm that they agree with the curves of Figure 4.5.

6.11∗ Mean residual life function and the hazard function. Show that the mean residual life function R_X(t) is a monotone-decreasing function if and only if the hazard function h_X(t) is monotone increasing.
Hint: Consider the conditional survivor function of R = X − t, given that X is greater than t, defined by

S_X(r \mid t) \triangleq P[R > r \mid X > t],    (6.56)

and find its relations with the hazard function h_X(t) and the mean residual life function.

6.12∗ Conditional survivor and mean residual life functions for the standard Weibull distribution.
(a) Find the conditional survivor function S_X(r | t) (see Problem 6.11) of the standard Weibull distribution.
(b) Find the mean residual life function R_X(t) for the standard Weibull distribution.

6.13 Mean residual life functions.

(a) For the hyperexponential distribution (6.29), show that

\lim_{t \to \infty} R_X(t) = \frac{1}{\alpha_2}.

(b) Consider the standard gamma distribution defined in (4.32):

f_X(x; \beta) \triangleq \frac{x^{\beta-1} e^{-x}}{\Gamma(\beta)}, \quad x \ge 0, \; \beta > 0.

Show that R_X(t) is a monotone-increasing (decreasing) function if β < 1 (β > 1). Find R_X(0) and lim_{t→∞} R_X(t).


6.14 Mean residual life functions – continued. Find an expression for the mean residual life function R_X(t) for each of the following distributions:
(a) Pareto distribution with parameters α > 1 and β > 0.
(b) Two-parameter Weibull distribution with parameters α and β.

6.15∗ Covariance between two RVs. Suppose that RVs X and Y are functionally related according to Y = cos X. Let the probability density function of X be given by

f_X(x) = \begin{cases} \dfrac{1}{2\pi}, & -\pi < x < \pi, \\ 0, & \text{elsewhere}. \end{cases}

Find Cov[X, Y].

6.16 Correlation coefficient. Given two RVs X and Y, define a new RV

Z = \left(\frac{X - \mu_X}{\sigma_X}\, t + \frac{Y - \mu_Y}{\sigma_Y}\right)^{2},

where t is a real constant.
(a) Compute E[Z].
(b) Show that −1 ≤ ρ_XY ≤ 1, where ρ_XY is the correlation coefficient between X and Y.

6.17 Correlation coefficient – continued. Show that if ρ_XY = ±1, then (6.51) holds.

6.18∗ Sample covariance. Show that the sample covariance s_xy defined by (6.52) is an unbiased estimate of the covariance σ_XY.

6.19 Recursive formula for sample covariance. Generalize the recursive computation formula of Problem 6.2 to the sample covariance.
