
Cambridge Books Online
http://ebooks.cambridge.org/

Probability, Random Processes, and Statistical Analysis:
Applications to Communications, Signal Processing, Queueing Theory and Mathematical Finance
Hisashi Kobayashi, Brian L. Mark, William Turin

Book DOI: http://dx.doi.org/10.1017/CBO9780511977770
Online ISBN: 9780511977770
Hardback ISBN: 9780521895446

Chapter 6 - Fundamentals of statistical data analysis, pp. 138-156
Chapter DOI: http://dx.doi.org/10.1017/CBO9780511977770.007
Cambridge University Press


6 Fundamentals of statistical data analysis

The theory of statistics involves interpreting a set of finite observations as a sample point drawn at random from a sample space. The study of statistics has the following three objectives: (i) to make the best estimate of important parameters of the population; (ii) to assess the uncertainty of the estimate; and (iii) to reduce a bulk of data to understandable forms. In much the same way as an examination of the properties of probability distribution functions forms the basic theory of probability, the foundation of statistical analysis is to examine the empirical distributions and certain descriptive measures associated with them. This chapter provides basic concepts of statistical data analysis.

6.1 Sample mean and sample variance

Let us consider a situation where we select randomly and independently n samples from a population whose distribution has mean μ and variance σ². The set of such samples, denoted as (x_1, x_2, ..., x_n), is referred to as a random sample of size n. Random samples are an important foundation of statistical theory, because a majority of the results known in mathematical statistics rely on assumptions that are consistent with a random sample. Let us start with a simple question: How can we estimate the population mean μ_X and population variance σ_X² from the random sample (x_1, x_2, ..., x_n)?

The sample mean (also called the empirical average) x̄ is defined as

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.    (6.1)

Each sample x_i can be viewed as an instance or realization of the associated RV X_i. Thus, the sample mean x̄ is an instance of the sample mean variable X̄ defined by

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.    (6.2)

In fact the term "sample mean" is often used in statistical theory to describe the variable X̄, but the quantity we can actually observe is its instance x̄.


Taking the expectation of both sides in (6.2), we have

E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i].    (6.3)

Since X_1, X_2, ..., X_n are independent and identically distributed (i.i.d.) RVs, we have E[X_1] = E[X_2] = ... = E[X_n] = μ_X. Substituting these values into (6.3), we find that the expectation of the sample mean variable satisfies

E[\bar{X}] = \mu_X,    (6.4)

which asserts that x̄ of (6.1) is an unbiased estimate of μ_X. An unbiased estimate is one that is, on the average, right on target.
Consider the variance of X̄:

\mathrm{Var}[\bar{X}] = E\bigl[(\bar{X} - E[\bar{X}])^2\bigr],    (6.5)

where X̄ − E[X̄] = X̄ − μ_X can be rewritten as

\bar{X} - E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_X) = \frac{1}{n}\sum_{i=1}^{n} Y_i,    (6.6)

where

Y_i \triangleq X_i - \mu_X, \quad i = 1, 2, \ldots, n.

Therefore,

\mathrm{Var}[\bar{X}] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)^{2}\right] = \frac{1}{n^2}\sum_{i=1}^{n} E[Y_i^2] + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1,\, j \ne i}^{n} E[Y_i Y_j].    (6.7)

Since the random variables {Y_i; 1 ≤ i ≤ n} are statistically independent with zero mean and variance σ_X², we have

\mathrm{Var}[\bar{X}] = \frac{\sigma_X^2}{n}.    (6.8)

Thus, the variance of the sample mean variable is the population variance divided by the sample size.

The deviations of the individual observations from the sample mean provide information about the dispersion of the x_i about x̄. We define the sample variance s_x² by

s_x^2 \triangleq \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.    (6.9)


This quantity can be viewed as an instance of the sample variance variable

S_X^2 \triangleq \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2,    (6.10)

which is also commonly called the sample variance. We find, after some rearrangement (Problem 6.1),

S_X^2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j=1,\, j \ne i}^{n} Y_i Y_j.    (6.11)

Taking expectations, we have

E[S_X^2] = \frac{1}{n}\sum_{i=1}^{n} E[Y_i^2] = \sigma_X^2.    (6.12)

The reason for using n − 1 rather than n as the divisor in (6.9) is to make E[S_X²] equal to σ_X²; that is, to make s_x² an unbiased estimate of σ_X². The positive square root of the sample variance, s_x, is called the sample standard deviation.
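As a concrete illustration (ours, not from the text), the following Python sketch computes x̄, s_x², and s_x for a random sample and checks the unbiasedness claims (6.4) and (6.12) by averaging the estimators over many simulated samples; the population parameters and sample size are arbitrary choices.

```python
import numpy as np

def sample_stats(x):
    """Sample mean (6.1) and unbiased sample variance (6.9)."""
    n = len(x)
    xbar = x.sum() / n
    s2 = ((x - xbar) ** 2).sum() / (n - 1)   # divisor n-1 makes E[S^2] = sigma^2
    return xbar, s2

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 30

# Average the estimators over many independent samples: their means should
# approach mu and sigma^2, illustrating (6.4) and (6.12).
means, variances = zip(*(sample_stats(rng.normal(mu, sigma, n)) for _ in range(10_000)))
print(np.mean(means))       # ~ 5.0
print(np.mean(variances))   # ~ 4.0
```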

6.2 Relative frequency and histograms

When the observed data takes on discrete values, we can just count the number of occurrences for the individual values. Suppose that the sample size n is given and k (≤ n) distinct values exist. Let n_j be the number of times that the jth value is observed, 1 ≤ j ≤ k. Then the fraction

f_j = \frac{n_j}{n}, \quad j = 1, 2, \ldots, k,    (6.13)

is, as defined in (2.1), the relative frequency of the jth value.

When the underlying random variable X is a continuous variable, we often adopt the method of "grouping" or "classifying" the data: the range of observations is divided into k intervals, called class intervals, at points c_0, c_1, c_2, ..., c_k. Let us designate the interval (c_{j−1}, c_j] as the jth class, 1 ≤ j ≤ k. Note that the lengths of the class intervals

\Delta_j \triangleq c_j - c_{j-1}, \quad j = 1, 2, \ldots, k,

need not be equal. Let n_j denote the number of observations that fall in the jth class interval. Then the relative frequency of the jth class takes the same form as (6.13). The grouped distribution may be represented graphically as the following "staircase function" in an (x, h)-coordinate system:

h(x) = \frac{f_j}{\Delta_j} = \frac{n_j}{n\,\Delta_j}, \quad x \in (c_{j-1}, c_j], \; j = 1, 2, \ldots, k.    (6.14)


Such a diagram is called a histogram and can be regarded as an estimate of the PDF of the population. If the class lengths Δ_j are all the same, the shape of the histogram remains unchanged whether we use the relative frequencies of the classes {f_j} or the frequency counts of the classes {n_j} as the ordinate. Such diagrams are also called histograms.
The choice of the class intervals in the histogram representation is by no means trivial.
Certainly, we should choose them in such a way that the characteristic features of the
distribution are emphasized and chance variations are obscured. If the class lengths
are too small, chance variations dominate because each interval includes only a small
number of observations. On the other hand, if the class lengths are too large, a great deal
of information concerning the characteristics of the distribution will be lost.
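The following sketch (an illustration we add, with arbitrary breakpoints c_j) computes the density-scaled histogram h(x) of (6.14) over unequal class intervals:

```python
import numpy as np

def histogram_density(x, edges):
    """h(x) of (6.14): relative frequency per unit length in each class interval."""
    edges = np.asarray(edges, dtype=float)
    n_j, _ = np.histogram(x, bins=edges)   # counts per class (NumPy uses left-closed bins)
    widths = np.diff(edges)                # class lengths Delta_j
    return n_j / (len(x) * widths)         # f_j / Delta_j

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500)
edges = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]   # deliberately unequal class lengths
h = histogram_density(x, edges)
print(h)                               # PDF estimate over each class
print(np.sum(h * np.diff(edges)))      # ~1, minus the mass beyond the last edge
```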
Let {x_k : 1 ≤ k ≤ n} denote n observations in the order observed and let {x_{(i)} : 1 ≤ i ≤ n} denote the same observations ranked in order of magnitude. The frequency H(x) of observations that are smaller than or equal to x is called the cumulative relative frequency, and is given by

H(x) = \begin{cases} 0, & x < x_{(1)}, \\ \dfrac{i}{n}, & x_{(i)} \le x < x_{(i+1)}, \; i = 1, 2, \ldots, n-1, \\ 1, & x \ge x_{(n)}, \end{cases}    (6.15)

which is the empirical analog of the CDF F_X(x). If we use the unit step function

u(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases}    (6.16)

the above H(x) can be more concisely written as

H(x) = \frac{1}{n}\sum_{i=1}^{n} u(x - x_{(i)}) = \frac{1}{n}\sum_{k=1}^{n} u(x - x_k), \quad -\infty < x < \infty.    (6.17)

Interestingly enough, use of the unit step function makes it unnecessary to obtain the rank-ordered data in order to find H(x). The graphical plot of H(x) is a nondecreasing step curve, which increases from zero to one in "jumps" of 1/n at the points x = x_{(1)}, x_{(2)}, ..., x_{(n)}. If several observations take on the same value, the jump is a multiple of 1/n.
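As a short added sketch of (6.17), the empirical CDF can be evaluated without rank-ordering the data, by averaging unit step functions:

```python
import numpy as np

def ecdf(samples, x):
    """H(x) of (6.17): fraction of samples <= x, via the unit step u(x - x_k)."""
    samples = np.asarray(samples)
    x = np.atleast_1d(x)
    # u(x - x_k) is 1 exactly when x_k <= x; average over all n samples.
    return (samples[None, :] <= x[:, None]).mean(axis=1)

data = np.array([2.3, 0.7, 1.1, 0.7, 3.9])
print(ecdf(data, [0.0, 0.7, 2.0, 4.0]))   # [0.  0.4 0.6 1. ]; the tie at 0.7 jumps 2/n
```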
When grouped data are presented as a cumulative relative frequency distribution, it is usually called the cumulative histogram. The cumulative histogram is far less sensitive to variations in class lengths than the histogram. This is because the accumulation is essentially equivalent to integration along the x-axis, which filters out the chance variations contained in the histogram. The cumulative relative frequency distribution or the cumulative histogram is, therefore, quite helpful in portraying the gross features of data.

6.3 Graphical presentations

Reducing primary data to the sample mean, sample variance, and histogram can reveal a great amount of information concerning the nature of the population distribution. But sometimes important features of the underlying distribution are obscured or hidden by the data reduction procedures. In this section we will discuss some graphical methods that are often valuable in an exploratory analysis of measurement data. They are: (a) histograms on the probability or log-normal probability papers; (b) the survivor functions on log-linear and log-log papers; and (c) the dot diagram and correlation coefficient.

6.3.1 Histogram on probability paper


6.3.1.1 Testing the normal distribution hypothesis
As we stated in Section 4.2.4, RVs occurring in physical situations often have the normal (or Gaussian) distribution, or can at least be treated approximately as normal RVs. As we shall see in subsequent sections, most statistical analysis techniques are based on the assumption of normality of measured variables. Thus, when we collect measurement data and obtain some empirical distribution, the first thing we might do is to examine whether the underlying distribution is normal. A fractile diagram¹ (Hald [139]) is useful for this purpose. For a given distribution function F(x),

P = F(x)    (6.18)

provides the dependence of the cumulative distribution on the variable x. The inverse function

x_P = F^{-1}(P)    (6.19)

gives the value of the variable x that corresponds to the given cumulative probability P. The value x_P is called the P-fractile. Some authors use the terms percentile or quantile instead of the term fractile.

The distribution function of the standard normal distribution N(0, 1) is often denoted by Φ(·), as defined in (4.46):

\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\left(-\frac{t^2}{2}\right) dt.    (6.20)

Then the fractile, u_P, of the distribution N(0, 1) is derived as

u_P = \Phi^{-1}(P).    (6.21)

Suppose that for a given cumulative relative frequency H(x) we wish to test whether this empirical distribution resembles a normal distribution; that is, to test whether

H(x) \cong \Phi\left(\frac{x - \mu}{\sigma}\right)    (6.22)

holds for some parameters μ and σ, where the symbol ≅ means "to have the distribution of." Testing this relation is equivalent to testing the relation

u_{H(x)} \approx \frac{x - \mu}{\sigma}.    (6.23)

¹ This term should not be confused with a similar term "fractal diagram" known in fractal geometry.


According to the definition of H(x), the plot of u_{H(x)} versus x forms a step (or staircase) curve:

u_{H(x)} = \begin{cases} -\infty, & x < x_{(1)}, \\ u_{i/n}, & x_{(i)} \le x < x_{(i+1)}, \; i = 1, 2, \ldots, n-1, \\ \infty, & x \ge x_{(n)}. \end{cases}    (6.24)

Therefore, the staircase function plot

u = u_{H(x)}    (6.25)

provides an estimate of the straight line

u = \frac{x - \mu}{\sigma}    (6.26)

in the same way that the cumulative frequency distribution y = H(x) forms an estimate of the CDF y = F(x). The graphical plot of the function (6.25) in an (x, u)-coordinate system is called the fractile diagram.

Instead of plotting (x, u_P) on ordinary graph paper, we may plot (x, P) directly on special graph paper called probability paper. On the ordinate axis of a probability paper, the corresponding values of P = Φ(u) are marked, rather than the u values. Probability paper is used in the same manner as other special graph papers, such as logarithmic paper. Figure 6.1(a) shows a probability paper with step curve u = u_{H(x)}, based on n = 50 sample points drawn from a normal distribution with zero mean and unit variance. Instead of the step curve, we often plot n points (x_{(i)}, (i − 1/2)/n), which are situated at the midpoints of the vertical parts of the step curve. The advantages are that it is easier to plot n points than to draw a step curve, and that possible systematic deviations from a straight line are more easily detected from this dot diagram. The result of this procedure is shown in Figure 6.1(b).

If the distribution in question is normal, the points of the fractile diagram should vary randomly about a straight line. In a small sample, say n < 20, the permissible random variation of points in the fractile diagram is so large that it is generally difficult to examine whether systematic deviations from a straight line exist.
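As an added illustration (ours, with SciPy's norm.ppf standing in for Φ⁻¹), the dot-diagram form of the fractile plot: if the sample is normal, the points (x_{(i)}, Φ⁻¹((i − 1/2)/n)) scatter about the straight line (6.26), with slope 1/σ and intercept −μ/σ.

```python
import numpy as np
from scipy.stats import norm

def fractile_points(x):
    """Dot diagram of Section 6.3.1.1: (x_(i), Phi^{-1}((i - 1/2)/n))."""
    x_sorted = np.sort(x)
    n = len(x_sorted)
    p = (np.arange(1, n + 1) - 0.5) / n    # midpoints of the step curve's vertical parts
    return x_sorted, norm.ppf(p)           # abscissa x_(i), ordinate u_P

rng = np.random.default_rng(2)
x, u = fractile_points(rng.normal(0.0, 1.0, 50))
# A least-squares line through the points estimates 1/sigma (slope) and -mu/sigma (intercept).
slope, intercept = np.polyfit(x, u, 1)
print(slope, intercept)   # ~1 and ~0 for N(0, 1) data
```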

6.3.1.2 Testing the log-normal distribution hypothesis

Some random variables we deal with are often modeled by a log-normal distribution (see Section 7.4). In order to test whether a log-normal distribution fits given empirical data, we should plot the step curve or dot diagram on log-normal paper, which is a simple modification of the above probability paper. The ordinate axis is the same as in the probability paper, i.e., u_P = Φ⁻¹(P), whereas the horizontal axis is changed from the linear scale (in the probability paper) to the logarithmic scale, i.e., log₁₀ x = log x / log 10.² If the empirical data exhibit a straight line on this log-normal probability paper, then a log-normal distribution should be a good candidate to represent this variable.

² We use log to mean the natural logarithm; i.e., log_e or ln.

Figure 6.1 The fractile diagram of normal variates: (a) step curve; (b) dot diagram. (Ordinate: u_P, with the corresponding P × 100 scale; abscissa: x.)

In Figure 6.2(a) and (b) we plot the step curve and dot diagram, respectively, of a simulated set of n = 50 sample points x_i, where x_i = e^{y_i} and y_i is drawn from N(2, 4); i.e., μ_Y = 2 and σ_Y = 2. From the results to be discussed in Section 7.4, we find that

\mu_X = e^{\mu_Y + \sigma_Y^2/2} = e^4 \quad \text{and} \quad \sigma_X^2 = e^{2\mu_Y + \sigma_Y^2}\left(e^{\sigma_Y^2} - 1\right) = e^8(e^4 - 1).
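A companion sketch (ours) of the log-normal check, using the same parameters as the example above: plotting u_P against log x_{(i)} should be nearly linear for log-normal data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
y = rng.normal(2.0, 2.0, 50)     # Y ~ N(2, 4)
x = np.exp(y)                    # X = e^Y is log-normal

n = len(x)
p = (np.arange(1, n + 1) - 0.5) / n
log_x = np.log(np.sort(x))       # logarithmic abscissa of log-normal paper
u = norm.ppf(p)                  # ordinate u_P = Phi^{-1}(P)

# On log-normal paper the sample is nearly linear:
# slope ~ 1/sigma_Y, intercept ~ -mu_Y/sigma_Y.
slope, intercept = np.polyfit(log_x, u, 1)
print(1.0 / slope, -intercept / slope)   # estimates of sigma_Y (~2) and mu_Y (~2)
```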

6.3.2 Log-survivor function curve


Suppose that a random variable X represents the life of some item (e.g., a light bulb) or the interval between failures of some machine. Given the distribution function F_X(x) of the RV X, the probability that X survives time duration t,

S_X(t) \triangleq P[X > t] = 1 - F_X(t),    (6.27)


Figure 6.2 The fractile diagram of log-normal variates: (a) step curve; (b) dot diagram. (Ordinate: u_P, with the corresponding P × 100 scale; abscissa: x on a logarithmic scale.)

is often called the survivor function, or the survival function in reliability theory. It is equivalent to the complementary distribution function F_X^c(t) defined earlier.

The natural logarithm of (6.27) is known as the log-survivor function or the log-survival function (Cox and Lewis [71]):

\log S_X(t) = \log(1 - F_X(t)).    (6.28)

The log-survivor function will show the details of the tail end of the distribution more effectively than the distribution itself.


If, for instance, F_X(x) is an exponential distribution with mean 1/α, then its log-survivor function is a straight line: log S_X(t) = log e^{−αt} = −αt.

If F_X(x) is a mixed exponential distribution (or hyperexponential distribution)

F_X(x) = \pi_1 (1 - e^{-\alpha_1 x}) + \pi_2 (1 - e^{-\alpha_2 x}), \quad \alpha_1 > \alpha_2, \; \pi_1 + \pi_2 = 1,    (6.29)

then its log-survivor function has two asymptotic straight lines, since

\log S_X(t) = \log(\pi_1 e^{-\alpha_1 t} + \pi_2 e^{-\alpha_2 t}) \approx \begin{cases} -\alpha_1 t + \log \pi_1, & \text{for small } t, \\ -\alpha_2 t + \log \pi_2, & \text{for large } t. \end{cases}    (6.30)

The sample log-survivor function or empirical log-survivor function is similarly defined as

\log[1 - H(t)],    (6.31)

where H(t) represents the cumulative relative frequency (ungrouped data) or the cumulative histogram (grouped data). In the ungrouped case we find from (6.15) that

\log\left(1 - \frac{i}{n}\right), \quad 1 \le i \le n,    (6.32)

should be plotted against x_{(i)}, where the subscript (i) represents the rank as in (6.15). In order to avoid difficulties at i = n, we may sometimes modify (6.32) into

\log\left(1 - \frac{i}{n+1}\right), \quad 1 \le i \le n.    (6.33)
As an example, Figure 6.3 plots the log-survivor function using a sample of size 1000 drawn from the above hyperexponential distribution with parameters

\pi_1 = 0.0526, \quad \pi_2 = 1 - \pi_1, \quad \alpha_1 = 0.1, \quad \alpha_2 = 2.0.    (6.34)

Out of the 1000 samples taken, 18 sample points that exceed x = 10 fall outside the scale of the figure; hence they are not shown. The asymptotes of (6.30) can be easily recognized from this log-survivor function.

Characteristically, the log-survivor function of the mixed exponential distribution (6.29) is convex with a linear tail. Observations of (or departures from) such characteristic shapes are used to postulate a functional form for a distribution. See Gaver et al. [115] and Lewis and Shedler [225].
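The following sketch (ours) reproduces the construction behind Figure 6.3: draw 1000 hyperexponential samples with the parameters of (6.34) and compute the empirical log-survivor values via (6.33).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
pi1, a1, a2 = 0.0526, 0.1, 2.0

# Hyperexponential draw: pick a branch, then an exponential with that branch's rate.
branch = rng.random(n) < pi1
t = np.where(branch, rng.exponential(1 / a1, n), rng.exponential(1 / a2, n))

t_sorted = np.sort(t)
i = np.arange(1, n + 1)
log_surv = np.log(1 - i / (n + 1))   # (6.33) avoids log(0) at i = n

# Plotting log_surv against t_sorted on a linear t-axis exhibits the
# two straight-line asymptotes of (6.30).
print(t_sorted[:3], log_surv[:3])
```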

6.3.2.1 Testing the Pareto distribution hypothesis


As discussed in Section 4.2.6, a simple way to examine whether the tail of an empirical
distribution fits the power law of the Pareto distribution is to plot the log-survivor
function on paper with the log-log scale, whereas the log-survivor function curve
discussed above is plotted in the log-linear scale.


Figure 6.3 The log-survivor function of a mixed-exponential (or hyperexponential) distribution with π₁ = 0.0526, π₂ = 1 − π₁, α₁ = 0.1, and α₂ = 2.0. (Ordinate: log[1 − H(t)]; abscissa: t.)

Figure 6.4 The log-survivor function of a mixed Pareto distribution with β₁ = β₂ = 1, π₂ = 1 − π₁, α₁ = 1.5, α₂ = 5, and π₁ = 0.2. (Ordinate: log[1 − H(t)]; abscissa: t on a logarithmic scale.)

Analogous to the mixed exponential distribution, a mixed Pareto distribution is considered:

S_X(t) = \pi_1 \frac{\beta_1^{\alpha_1}}{t^{\alpha_1}} + (1 - \pi_1) \frac{\beta_2^{\alpha_2}}{t^{\alpha_2}}, \quad 0 < \max\{\beta_1, \beta_2\} \le t.    (6.35)

As an example, Figure 6.4 plots the log-survivor function of 500 samples drawn from the mixed Pareto distribution with β₁ = β₂ = 1, α₁ = 1.5, α₂ = 5, and π₁ = 0.2.
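A similar sketch (ours) for the log-log check of this subsection: for the mixed Pareto law (6.35), the empirical log-survivor plotted against log t is asymptotically a straight line whose slope approaches −min{α₁, α₂}.

```python
import numpy as np

rng = np.random.default_rng(5)
n, pi1 = 500, 0.2
a1, a2, b1, b2 = 1.5, 5.0, 1.0, 1.0

# Inverse-transform draw from a Pareto branch with S(t) = (beta/t)^alpha:
# t = beta * U^{-1/alpha} for U uniform on (0, 1).
u = rng.random(n)
branch = rng.random(n) < pi1
alpha = np.where(branch, a1, a2)
beta = np.where(branch, b1, b2)
t = beta / u ** (1 / alpha)

t_sorted = np.sort(t)
i = np.arange(1, n + 1)
log_surv = np.log(1 - i / (n + 1))

# Fit the upper tail on log-log axes; the slope is roughly -min(a1, a2) = -1.5 here.
tail = slice(int(0.8 * n), n - 5)
slope, _ = np.polyfit(np.log(t_sorted[tail]), log_surv[tail], 1)
print(slope)
```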


6.3.3 Hazard function and mean residual life curves


Other graphical plots that can be derived from the histogram or distribution function are the hazard function curve and the mean residual life curve. These notions are also related to reliability theory and renewal process theory, which will be briefly discussed in Section 14.3.

Suppose that X represents the life of some item, with the distribution function F_X(x). The function defined by

h_X(t) = \frac{f_X(t)}{S_X(t)} = \frac{f_X(t)}{1 - F_X(t)}    (6.36)

is called the hazard function or the failure rate, because h_X(t) dt represents the probability that the life will end in the interval (t, t + dt], given that X has survived up to age t; i.e., X ≥ t. If X represents the service time of a customer, as in queueing theory, h_X(t) is called the completion rate function.
The hazard functions of the exponential, Weibull, Pareto, and log-normal distributions are given as follows:

h_X(t) = \begin{cases} \lambda, & t \ge 0, & \text{exponential}, \\[4pt] \dfrac{\alpha}{\beta}\left(\dfrac{t}{\beta}\right)^{\alpha-1}, & t \ge 0, & \text{Weibull}, \\[4pt] \dfrac{\alpha}{t}, & t \ge \beta, & \text{Pareto}, \\[4pt] \dfrac{t^{-1}\exp\left(-\frac{(\log t - \mu_Y)^2}{2\sigma_Y^2}\right)}{\int_{\log t}^{\infty}\exp\left(-\frac{(u - \mu_Y)^2}{2\sigma_Y^2}\right) du}, & t > 0, & \text{log-normal}, \end{cases}    (6.37)

where in the log-normal distribution the parameters μ_Y and σ_Y² are given as

\mu_Y = \log \mu_X - \frac{1}{2}\log\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right) \quad \text{and} \quad \sigma_Y^2 = \log\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right).
From (6.36) we can express the survivor function in terms of the hazard function:

S_X(x) = e^{-\int_0^x h_X(t)\, dt}, \quad x \ge 0,    (6.38)

from which we have

h_X(t) = -\frac{d \log S_X(t)}{dt}, \quad t \ge 0.    (6.39)

The last equation, of course, could have been readily derived from (6.36).
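As a small numerical sketch (ours) of the identities (6.37) and (6.38) for the Weibull case: integrating the hazard function recovers the survivor function.

```python
import numpy as np

alpha, beta = 1.5, 1.0
t = np.linspace(1e-6, 4.0, 4000)

hazard = (alpha / beta) * (t / beta) ** (alpha - 1)   # Weibull row of (6.37)
cum_hazard = np.cumsum(hazard) * (t[1] - t[0])        # crude integral of h_X from 0 to t
surv_from_hazard = np.exp(-cum_hazard)                # (6.38)
surv_exact = np.exp(-(t / beta) ** alpha)             # known Weibull survivor function

print(np.max(np.abs(surv_from_hazard - surv_exact)))  # small discretization error
```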
Given that the service time variable X is greater than t, we call the difference

R = X - t    (6.40)


Figure 6.5 The mean residual life curves of a Pareto distribution with α = 3.0 and β = 1.0, and a Weibull distribution with α = 1.5 and β = 1.0. (Ordinate: R_X(t); abscissa: t.)

the residual life conditioned on X > t. Then the mean residual life function is given by

R_X(t) = E[R \mid X > t] = \frac{\int_t^{\infty} S_X(u)\, du}{S_X(t)}.    (6.41)

At t = 0, the mean residual life becomes

R_X(0) = \int_0^{\infty} S_X(u)\, du = E[X],    (6.42)

as expected. Figure 6.5 shows mean residual life curves of a Pareto distribution and a Weibull distribution.
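A numerical sketch (ours) of (6.41) for the Pareto curve of Figure 6.5; for the Pareto law with t ≥ β, the exact value R_X(t) = t/(α − 1) provides a check.

```python
import numpy as np

alpha, beta = 3.0, 1.0

def surv(u):
    """Survivor function of the Pareto distribution: S_X(u) = (beta/u)^alpha for u >= beta."""
    u = np.asarray(u, dtype=float)
    return np.where(u < beta, 1.0, (beta / u) ** alpha)

def mean_residual_life(t, upper=500.0, m=100_000):
    """R_X(t) of (6.41) by trapezoidal integration of S_X from t to a large cutoff."""
    u = np.linspace(t, upper, m)
    s = surv(u)
    tail = np.sum(s[:-1] + s[1:]) / 2 * (u[1] - u[0])
    return tail / surv(t)

for t in (1.0, 2.0, 4.0):
    print(t, float(mean_residual_life(t)), t / (alpha - 1))   # numeric vs exact t/(alpha-1)
```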

6.3.4 Dot diagram and correlation coefficient


In analyzing a simulation model or an operational system, we usually measure a number of variables, and we wish to find possible statistical associations among them. Thus, the search for correlations between two or more quantities is one of the most important functions in the output analysis of the measurement and evaluation process. A typical method of graphically examining correlations between two variables X and Y based on n observations of the pair

(x_i, y_i), \quad 1 \le i \le n,    (6.43)

is to plot the points (x_i, y_i) one by one as coordinates. Such a diagram is called a dot or scatter diagram. The density of dots in a given region is proportional to the relative frequency of the pairs (X, Y) in the region.


Example 6.1: Scatter diagram of Internet distances [134, 364]. The approximate geographic distance between a pair of Internet hosts can be inferred by sending probe packets between the two hosts and measuring the round-trip delays experienced by the probes. The relationship between geographic distance g and round-trip delay d from a given Internet host to other Internet hosts can be characterized by a scatter diagram consisting of points (g, d). Owing to the inherent randomness in round-trip delays over the Internet, delay measurements taken between a given pair of hosts separated by a fixed geographic distance g at different times yield different delays d.

The scatter diagram in Figure 6.6 was obtained by sending probe packets from a host at Stanford University to 79 other hosts on the Internet across the USA [364]. The line labeled baseline provides a lower bound on d as a function of g based on the observation that the packet propagation speed over the Internet is at most the speed of light through an optical fiber. If the refractive index of the fiber is denoted by η, the propagation speed of the optical signal is v = c/η, where c is the speed of light in vacuo. Typically, the value of η is slightly less than 1.5, so we make the approximation v ≈ 2c/3. If the round-trip delay between a pair of hosts is measured to be d, the corresponding (one-way) geographical distance is upper bounded by ĝ = vd/2 ≈ cd/3. When the unit of time is milliseconds and the unit of geographical distance is kilometers, c ≈ 300 km/ms, so d and ĝ can be related approximately by

d \approx \frac{1}{100}\, \hat{g},    (6.44)

which is the equation of the baseline in Figure 6.6.

Since packets generally traverse multiple hops between two hosts and experience queueing and processing delays at each hop, the measured round-trip delay will
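As a quick worked illustration (ours) of the baseline bound, converting a measured round-trip delay into an upper bound on the one-way distance:

```python
C_KM_PER_MS = 300.0   # speed of light, km per millisecond

def max_distance_km(rtt_ms):
    """Upper bound g_hat = v*d/2 ~ c*d/3 on one-way distance from a round-trip delay."""
    return C_KM_PER_MS * rtt_ms / 3.0

print(max_distance_km(20.0))   # a 20 ms round trip caps the distance at ~2000 km
```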

Figure 6.6 Scatter diagram of delay measurements from an Internet host at Stanford University to 79 other hosts across the USA [364]. (Ordinate: round-trip delay d, in ms; abscissa: geographical distance g, in km; the baseline and bestline bounds are drawn as straight lines.)


typically be much larger than the delay predicted by the equation of the baseline
in (6.44). Gueye et al. [134] propose a tighter linear bound determined by solving a
linear programming problem that minimizes the slope and y-intercept of the line subject
to the constraints imposed by the set of scatter points. This deterministic bound corre-
sponds to the line labeled bestline in Figure 6.6. An alternative approach that retains
more of the statistical information captured by the scatter points is discussed in [364].

The most frequently used measure of statistical association between a pair of variables is the correlation coefficient. For a given pair of random variables X and Y, the covariance of X and Y, written Cov[X, Y] or σ_XY, is defined as

\sigma_{XY} \triangleq \mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y.    (6.45)

We say X and Y are uncorrelated if σ_XY = 0. If X and Y are statistically independent, then they are uncorrelated, but the converse is not true: the condition σ_XY = 0 does not imply that X and Y are independent (see Problem 6.15). The correlation coefficient ρ_XY between X and Y is defined as

\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.    (6.46)

The correlation coefficient always satisfies the condition

-1 \le \rho_{XY} \le 1.    (6.47)

We say that X and Y are properly linearly dependent if there exist nonzero constants a and b such that aX − bY is a constant c; that is,

P[aX - bY = c] = 1.    (6.48)

Therefore,

\mathrm{Var}[aX - bY - c] = 0,    (6.49)

from which we have

\rho_{XY} = +1 \text{ or } -1,    (6.50)

depending on whether ab is positive or negative. Conversely, if ρ_XY = ±1, then it implies (Problem 6.17) that

P\left[\mp \frac{X - \mu_X}{\sigma_X} + \frac{Y - \mu_Y}{\sigma_Y} = 0\right] = 1.    (6.51)


The sample covariance of the two variables based on observations {(x_i, y_i); 1 ≤ i ≤ n} is defined as

s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1}\sum_{i=1}^{n} x_i y_i - \frac{n\,\bar{x}\,\bar{y}}{n-1},    (6.52)

where x̄ and ȳ are the sample means of {x_i} and {y_i}, respectively. The sample correlation coefficient is defined accordingly:

r_{xy} = \frac{s_{xy}}{s_x s_y},    (6.53)

where s_x² and s_y² are the sample variances of {x_i} and {y_i}, respectively.
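A sketch (ours) of (6.52) and (6.53), cross-checked against NumPy's built-in unbiased estimators:

```python
import numpy as np

def sample_corr(x, y):
    """Sample covariance (6.52) and sample correlation coefficient (6.53)."""
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    s_xy = ((x - xbar) * (y - ybar)).sum() / (n - 1)
    s_x = np.sqrt(((x - xbar) ** 2).sum() / (n - 1))
    s_y = np.sqrt(((y - ybar) ** 2).sum() / (n - 1))
    return s_xy, s_xy / (s_x * s_y)

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 0.8 * x + 0.6 * rng.normal(size=200)   # correlated pair with rho = 0.8 by construction

s_xy, r_xy = sample_corr(x, y)
print(s_xy, np.cov(x, y)[0, 1])        # matches NumPy's unbiased covariance
print(r_xy, np.corrcoef(x, y)[0, 1])   # matches NumPy's correlation coefficient
```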

6.4 Summary of Chapter 6

Sample mean: x̄ = (1/n) Σ_{i=1}^{n} x_i (6.1)
Variance of the sample mean: Var[X̄] = σ_X²/n (6.8)
Sample variance: s_x² ≜ (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)² (6.9)
Unbiasedness of the sample variance: E[S_X²] = σ_X² (6.12)
Relative frequency: f_j = n_j/n, j = 1, 2, ..., k (6.13)
Histogram: h(x) = f_j/Δ_j, for x ∈ (c_{j−1}, c_j] (6.14)
Fractile diagram: (x, u_{H(x)}) (6.24)
Log-survivor function: log S_X(t) = log(1 − F_X(t)) (6.28)
Sample log-survivor function: log(1 − H(t)) (6.31)
Hazard function: h_X(t) = f_X(t)/S_X(t) = f_X(t)/(1 − F_X(t)) (6.36)
Mean residual life function: R_X(t) = E[R | X > t] = (∫_t^∞ S_X(u) du)/S_X(t) (6.41)
Dot diagram: (x_i, y_i), 1 ≤ i ≤ n (6.43)
Covariance: σ_XY = E[XY] − μ_X μ_Y (6.45)
Uncorrelated if: σ_XY = 0
Correlation coefficient: ρ_XY = σ_XY/(σ_X σ_Y) (6.46)
Sample correlation coefficient: r_xy = s_xy/(s_x s_y) (6.53)

6.5 Discussion and further reading

Most textbooks on probability theory and mathematical statistics do not seem to deal with graphical presentations of real data. We consider that this is an unfortunate state of affairs. Various types of graphical presentations of collected data should be explored before we can narrow down proper directions of mathematical modeling or analysis of the system in question.

Hald [139] seems to be one of the few textbooks that discusses the fractile diagram. A monograph by Cox and Lewis [71] presents several empirical log-survivor functions as well as scatter diagrams. Much of the material given in this chapter is taken from the first author's earlier book [197] on system modeling and analysis, in which additional examples of graphical plots based on computer performance data are found.

The exploratory data analysis (EDA) approach developed by Tukey [328] and others indeed exploits various graphical techniques as well as quantitative techniques in analyzing data to formulate plausible hypotheses. Two graphical techniques introduced by Tukey are the box plot and the stem-and-leaf diagram. A box plot, also known as a box-and-whiskers plot, graphically depicts the sample minimum, lower quartile, median, upper quartile, and sample maximum, and may also indicate outliers of a data set. A stem-and-leaf plot, also called a stemplot, tabulates the data in ascending order in two columns. The first consists of the stems of the data set in ascending order, while the second consists of the leaves corresponding to each stem. Typically, a leaf contains the last digit of the associated sample value while the stem contains the remaining digits. Exploratory data analysis complements the conventional statistical theory, which places more emphasis on formal testing of a hypothesis and estimation of model parameters, two subjects to be studied in Chapter 18.

6.6 Problems

Section 6.1: Sample mean and sample variance


6.1∗ Derivation of (6.11). Derive (6.11).

6.2 Recursive formula for sample mean and variance. Let x̄_i and s_i² be the sample mean and sample variance based on data (x_1, x_2, ..., x_i), where i ≤ n. Then the last values of the sequences – that is, x̄_n and s_n² – are the desired quantities:

\bar{x} = \bar{x}_n \quad \text{and} \quad s^2 = s_n^2.

(a) Derive the following recursive formula for the sample mean:

\bar{x}_i = \bar{x}_{i-1} + \frac{x_i - \bar{x}_{i-1}}{i}, \quad i \ge 1,

with the initial value x̄_0 = 0.

(b) Similarly, show the recursive formula for the sample variance:

s_i^2 = \frac{i-2}{i-1}\, s_{i-1}^2 + \frac{(x_i - \bar{x}_{i-1})^2}{i}, \quad i > 1,

with the initial values s_0² = s_1² = 0.
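As an added sketch, the recursions of this problem implemented as a one-pass update and checked against the batch formulas (6.1) and (6.9):

```python
import numpy as np

def running_stats(xs):
    """One-pass sample mean and variance via the recursions of Problem 6.2."""
    mean, s2 = 0.0, 0.0
    for i, x in enumerate(xs, start=1):
        delta = x - mean                 # x_i - x_bar_{i-1}, taken before the mean update
        mean += delta / i                # x_bar_i update
        if i > 1:
            s2 = (i - 2) / (i - 1) * s2 + delta ** 2 / i   # s_i^2 update
    return mean, s2

rng = np.random.default_rng(7)
x = rng.normal(3.0, 1.5, 100)
m, v = running_stats(x)
print(m - x.mean())        # ~0
print(v - x.var(ddof=1))   # ~0 (ddof=1 is the n-1 divisor of (6.9))
```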

Section 6.2: Relative frequency and histograms


6.3 Expectation and variance of the histogram. Consider the histogram value h(x) in the jth class interval x ∈ (c_{j−1}, c_j] given by (6.14), which we denote as h_j(x), where x = (x_1, x_2, ..., x_n) is the n random samples. Then h_j(X) is a random variable, where the argument x is replaced by the corresponding RV X = (X_1, X_2, ..., X_n).

(a) Show that the expectation of the RV h_j(X) is given by

E[h_j(\mathbf{X})] = \frac{F_X(c_j) - F_X(c_{j-1})}{\Delta_j} \approx f_X(c_j).

(b) Show that the variance of h_j(X) is

\mathrm{Var}[h_j(\mathbf{X})] = \frac{[F_X(c_j) - F_X(c_{j-1})][1 - F_X(c_j) + F_X(c_{j-1})]}{n\,\Delta_j^2} \approx \frac{f_X(c_j)}{n\,\Delta_j}.

6.4 Expectation and variance of the cumulative histogram. Find expressions for the expectation and variance of H_j (the cumulative histogram in the jth interval) in terms of the underlying distribution function F_X(x). Explain why the shape of the cumulative histogram is rather insensitive to the choice of class lengths {Δ_j}.

Section 6.3: Graphical presentations


6.5 Log-survivor function curve of Erlang distributions. Plot the sample log-survivor function by generating 1000 values of a random variable X that has the two-stage Erlang distribution of mean one. Do the same for the four-stage Erlang distribution.
Hint: To generate samples drawn from the k-stage Erlang distribution, apply the transform method of Example 5.7 in Section 5.4.2 to generate k samples drawn from an exponential distribution.

6.6∗ Log-survivor functions and hazard functions of constant and uniform RVs. Find the expression for the log-survivor function and the completion rate function when the service time is

(a) constant a;
(b) uniformly distributed in [a, b].

6.7 Hazard function and distribution functions. Show that the distribution function F_X(x) is given in terms of the corresponding hazard function h_X(x) as follows:

F_X(x) = 1 - e^{-\int_0^x h_X(t)\, dt}, \quad x \ge 0,    (6.54)


and hence

f_X(x) = h_X(x)\, e^{-\int_0^x h_X(t)\, dt}.    (6.55)

6.8 Hazard function of a k-stage hyperexponential distribution. Consider the k-stage hyperexponential (or mixed exponential) distribution defined in (4.166) of Chapter 4. Show that its hazard function h_X(t) is monotone decreasing. Find lim_{t→∞} h_X(t).

6.9 Hazard function of the Pareto distribution. Find the hazard function of the Pareto distribution.

6.10 Hazard function of the Weibull distribution. The Weibull distribution is often used in modeling reliability problems.
(a) Find the hazard function h_X(t) of the standard Weibull distribution. What functional form does h_X(t) take for α = 1 and α = 2?
(b) Plot the hazard function of the standard Weibull distribution for α = 0.1, 0.5, 1, 2, and 5, and confirm that they agree with the curves of Figure 4.5.

6.11∗ Mean residual life function and the hazard function. Show that the mean residual life function R_X(t) is a monotone-decreasing function if and only if the hazard function h_X(t) is monotone increasing.
Hint: Consider the conditional survivor function of R = X − t, given that X is greater than t, defined by

S_X(r \mid t) \triangleq P[R > r \mid X > t],    (6.56)

and find its relations with the hazard function h_X(t) and the mean residual life function.

6.12∗ Conditional survivor and mean residual life functions for the standard Weibull distribution.
(a) Find the conditional survivor function S_X(r | t) (see Problem 6.11) of the standard Weibull distribution.
(b) Find the mean residual life function R_X(t) for the standard Weibull distribution.

6.13 Mean residual life functions.

(a) For the hyperexponential distribution (6.29), show that

\lim_{t \to \infty} R_X(t) = \frac{1}{\alpha_2}.

(b) Consider the standard gamma distribution defined in (4.32):

f_X(x; \beta) \triangleq \frac{x^{\beta-1} e^{-x}}{\Gamma(\beta)}, \quad x \ge 0, \; \beta > 0.

Show that R_X(t) is a monotone-increasing (decreasing) function if β < 1 (β > 1). Find R_X(0) and lim_{t→∞} R_X(t).


6.14 Mean residual life functions – continued. Find an expression for the mean residual life function R_X(t) for each of the following distributions:
(a) Pareto distribution with parameters α > 1 and β > 0.
(b) Two-parameter Weibull distribution with parameters α and β.

6.15∗ Covariance between two RVs. Suppose that RVs X and Y are functionally related according to Y = cos X. Let the probability density function of X be given by

f_X(x) = \begin{cases} \dfrac{1}{2\pi}, & -\pi < x < \pi, \\ 0, & \text{elsewhere}. \end{cases}

Find Cov[X, Y].

6.16 Correlation coefficient. Given two RVs X and Y, define a new RV

Z = \left(\frac{X - \mu_X}{\sigma_X}\, t + \frac{Y - \mu_Y}{\sigma_Y}\right)^{2},

where t is a real constant.
(a) Compute E[Z].
(b) Show that −1 ≤ ρ_XY ≤ 1, where ρ_XY is the correlation coefficient between X and Y.

6.17 Correlation coefficient – continued. Show that if ρ_XY = ±1, then (6.51) holds.

6.18∗ Sample covariance. Show that the sample covariance s_xy defined by (6.52) is an unbiased estimate of the covariance σ_XY.

6.19 Recursive formula for sample covariance. Generalize the recursive computation formula of Problem 6.2 to the sample covariance.
