Multivariate Distributions
D = {(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)}.
Thus X1 and X2 are two random variables defined on the space C, and, in this
example, the space of these random variables is the two-dimensional set D, which is
a subset of two-dimensional Euclidean space R2 . Hence (X1 , X2 ) is a vector function
from C to D. We now formulate the definition of a random vector.
with random variables in Section 1.5 we can uniquely define PX1 ,X2 in terms of the
cumulative distribution function (cdf), which is given by
$$F_{X_1,X_2}(x_1, x_2) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\}], \qquad (2.1.1)$$
for all (x1, x2) ∈ R2. Because X1 and X2 are random variables, each of the events
in the above intersection and the intersection of the events are events in the original
sample space C. Thus the expression is well defined. As with random variables, we
write P[{X1 ≤ x1} ∩ {X2 ≤ x2}] as P[X1 ≤ x1, X2 ≤ x2]. As Exercise 2.1.3 shows,
$$P[a_1 < X_1 \le b_1,\ a_2 < X_2 \le b_2] = F_{X_1,X_2}(b_1, b_2) - F_{X_1,X_2}(a_1, b_2) - F_{X_1,X_2}(b_1, a_2) + F_{X_1,X_2}(a_1, a_2). \qquad (2.1.2)$$
Hence, all induced probabilities of sets of the form (a1, b1] × (a2, b2] can be formulated
in terms of the cdf. We often call this cdf the joint cumulative distribution
function of (X1 , X2 ).
As with random variables, we are mainly concerned with two types of random
vectors, namely discrete and continuous. We first discuss the discrete type.
A random vector (X1 , X2 ) is a discrete random vector if its space D is finite
or countable. Hence, X1 and X2 are both discrete also. The joint probability
mass function (pmf) of (X1, X2) is defined by
$$p_{X_1,X_2}(x_1, x_2) = P[X_1 = x_1,\ X_2 = x_2],$$
for all (x1, x2) ∈ D. As with random variables, the pmf uniquely defines the cdf. It
also is characterized by the two properties
(i) $0 \le p_{X_1,X_2}(x_1, x_2) \le 1$ and (ii) $\displaystyle\mathop{\sum\sum}_{(x_1,x_2)\in\mathcal{D}} p_{X_1,X_2}(x_1, x_2) = 1$. (2.1.4)
Example 2.1.1. Consider the example at the beginning of this section where a fair
coin is flipped three times and X1 and X2 are the number of heads on the first two
flips and all 3 flips, respectively. We can conveniently table the pmf of (X1 , X2 ) as
                        Support of X2
                       0     1     2     3
               0      1/8   1/8    0     0
Support of X1  1       0    2/8   2/8    0
               2       0     0    1/8   1/8
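Because the sample space here consists of only eight equally likely outcomes, the entries of this table can be verified by direct enumeration. The following R sketch (the variable names are ours, not from the text) builds the joint pmf that way.

```r
# Enumerate the 8 equally likely outcomes of three fair coin flips and
# tabulate the joint pmf of (X1, X2) from Example 2.1.1.
flips <- expand.grid(f1 = 0:1, f2 = 0:1, f3 = 0:1)   # 1 = head, 0 = tail
x1 <- flips$f1 + flips$f2                             # heads on first two flips
x2 <- flips$f1 + flips$f2 + flips$f3                  # heads on all three flips
table(x1, x2) / nrow(flips)                           # matches the 1/8 and 2/8 entries
```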
In the next example, we use the general fact that double integrals can be ex-
pressed as iterated univariate integrals. Thus double integrations can be carried
out using iterated univariate integrations. This is discussed in some detail with
examples in Section 4.2 of the accompanying resource Mathematical Comments.
The aid of a simple sketch of the region of integration is valuable in setting up the
upper and lower limits of integration for each of the iterated integrals.
Example 2.1.3. Suppose an electrical component has two batteries. Let X and Y
denote the lifetimes in standard units of the respective batteries. Assume that the
pdf of (X, Y) is
$$f(x, y) = \begin{cases} 4xye^{-(x^2+y^2)} & x > 0,\ y > 0 \\ 0 & \text{elsewhere.} \end{cases}$$
The surface z = f (x, y) is sketched in Figure 2.1.1 where the grid squares are
0.1 by 0.1. From the figure, the pdf peaks at about (x, y) = (0.7, 0.7). Solving
the equations ∂f/∂x = 0 and ∂f/∂y = 0 simultaneously shows that the maximum of f(x, y) actually occurs at (x, y) = (√2/2, √2/2). The batteries are more likely to die in regions near the peak. The surface tapers to 0 as x and y get large in any direction. For instance, the probability that both batteries survive beyond √2/2 units is given by
$$P\left(X > \frac{\sqrt{2}}{2},\ Y > \frac{\sqrt{2}}{2}\right) = \int_{\sqrt{2}/2}^{\infty}\int_{\sqrt{2}/2}^{\infty} 4xye^{-(x^2+y^2)}\,dx\,dy$$
$$= \int_{\sqrt{2}/2}^{\infty} 2xe^{-x^2}\left[\int_{\sqrt{2}/2}^{\infty} 2ye^{-y^2}\,dy\right] dx$$
$$= \int_{1/2}^{\infty} e^{-z}\left[\int_{1/2}^{\infty} e^{-w}\,dw\right] dz = \left(e^{-1/2}\right)^2 \approx 0.3679.$$
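As a quick numerical check (an R sketch, not part of the text), note that the joint pdf factors, so the double integral is the square of a single univariate integral.

```r
# Numerical check of P(X > sqrt(2)/2, Y > sqrt(2)/2) = e^{-1} for Example 2.1.3.
inner <- integrate(function(y) 2 * y * exp(-y^2),
                   lower = sqrt(2)/2, upper = Inf)$value
inner^2   # approximately 0.3679
exp(-1)   # exact value
```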
Figure 2.1.1: A sketch of the surface of the joint pdf discussed in Example 2.1.3. On the figure, the origin is located at the intersection of the x and z axes and the grid squares are 0.1 by 0.1, so points are easily located. As discussed in the text, the peak of the pdf occurs at the point (√2/2, √2/2).
Likewise we may extend the pmf pX1 ,X2 (x1 , x2 ) over a convenient set by using zero
elsewhere. Hence, we replace
pX1 ,X2 (x1 , x2 ) by p(x1 , x2 ).
Table 2.1.1: Joint and Marginal Distributions for the discrete random vector
(X1 , X2 ) of Example 2.1.1.
                        Support of X2
                       0     1     2     3    pX1(x1)
               0      1/8   1/8    0     0      2/8
Support of X1  1       0    2/8   2/8    0      4/8
               2       0     0    1/8   1/8     2/8
        pX2(x2)       1/8   3/8   3/8   1/8
$$F_{X_1}(x_1) = P[X_1 \le x_1] = P[X_1 \le x_1,\ -\infty < X_2 < \infty], \qquad (2.1.7)$$
for all x1 ∈ R. By Theorem 1.3.6 we can write this equation as $F_{X_1}(x_1) = \lim_{x_2\uparrow\infty} F(x_1, x_2)$. Thus we have a relationship between the cdfs, which we can
extend to either the pmf or pdf depending on whether (X1 , X2 ) is discrete or con-
tinuous.
First consider the discrete case. Let DX1 be the support of X1 . For x1 ∈ DX1 ,
Equation (2.1.7) is equivalent to
$$F_{X_1}(x_1) = \sum_{w_1 \le x_1,\ -\infty < x_2 < \infty} p_{X_1,X_2}(w_1, x_2) = \sum_{w_1 \le x_1}\left\{\sum_{x_2 < \infty} p_{X_1,X_2}(w_1, x_2)\right\}.$$
By the uniqueness of cdfs, the quantity in braces must be the pmf of X1 evaluated
at w1 ; that is,
$$p_{X_1}(x_1) = \sum_{x_2 < \infty} p_{X_1,X_2}(x_1, x_2), \qquad (2.1.8)$$
for all x1 ∈ DX1 . Hence, to find the probability that X1 is x1 , keep x1 fixed and
sum pX1 ,X2 over all of x2 . In terms of a tabled joint pmf with rows comprised of
X1 support values and columns comprised of X2 support values, this says that the
distribution of X1 can be obtained by the marginal sums of the rows. Likewise, the
pmf of X2 can be obtained by marginal sums of the columns.
Consider the joint discrete distribution of the random vector (X1 , X2 ) as pre-
sented in Example 2.1.1. In Table 2.1.1, we have added these marginal sums. The
final row of this table is the pmf of X2 , while the final column is the pmf of X1 .
In general, because these distributions are recorded in the margins of the table, we
often refer to them as marginal pmfs.
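Continuing the enumeration sketch above, the marginal pmfs in Table 2.1.1 are simply the row and column sums of the joint pmf matrix; the short R check below (our own code) illustrates this.

```r
# Marginal pmfs of Example 2.1.1 as row and column sums of the joint pmf.
p <- matrix(c(1, 1, 0, 0,
              0, 2, 2, 0,
              0, 0, 1, 1) / 8,
            nrow = 3, byrow = TRUE,
            dimnames = list(x1 = 0:2, x2 = 0:3))
rowSums(p)   # pmf of X1: 2/8, 4/8, 2/8
colSums(p)   # pmf of X2: 1/8, 3/8, 3/8, 1/8
```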
                   x2
     x1        1      2     p1(x1)
      1       1/10   1/10    2/10
      2       1/10   2/10    3/10
      3       2/10   3/10    5/10
   p2(x2)     4/10   6/10
The joint probabilities have been summed in each row and each column and these
sums recorded in the margins to give the marginal probability mass functions of X1
and X2 , respectively. Note that it is not necessary to have a formula for p(x1 , x2 )
to do this.
We next consider the continuous case. Let DX1 be the support of X1 . For
x1 ∈ DX1 , Equation (2.1.7) is equivalent to
$$F_{X_1}(x_1) = \int_{-\infty}^{x_1}\int_{-\infty}^{\infty} f_{X_1,X_2}(w_1, x_2)\,dx_2\,dw_1 = \int_{-\infty}^{x_1}\left\{\int_{-\infty}^{\infty} f_{X_1,X_2}(w_1, x_2)\,dx_2\right\} dw_1.$$
By the uniqueness of cdfs, the quantity in braces must be the pdf of X1 , evaluated
at w1; that is,
$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} f_{X_1,X_2}(x_1, x_2)\,dx_2 \qquad (2.1.9)$$
for all x1 ∈ DX1 . Hence, in the continuous case the marginal pdf of X1 is found by
integrating out x2 . Similarly, the marginal pdf of X2 is found by integrating out
x1 .
Example 2.1.5 (Example 2.1.2, continued). Consider the vector of continuous
random variables (X, Y ) discussed in Example 2.1.2. The space of the random
vector is the unit circle with center at (0, 0) as shown in Figure 2.1.2. To find the
marginal distribution of X, fix x between −1 and 1 and then integrate out y from $-\sqrt{1-x^2}$ to $\sqrt{1-x^2}$, as the arrow shows on Figure 2.1.2. Hence, the marginal pdf of X is
$$f_X(x) = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \frac{1}{\pi}\,dy = \frac{2}{\pi}\sqrt{1 - x^2}, \quad -1 < x < 1.$$
Although (X, Y ) has a joint uniform distribution, the distribution of X is unimodal
with peak at 0. This is not surprising. Since the joint distribution is uniform, from
Figure 2.1.2 X is more likely to be near 0 than at either extreme −1 or 1. Because
the joint pdf is symmetric in x and y, the marginal pdf of Y is the same as that of
X.
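As a sanity check (an R sketch under the stated pdf), the marginal pdf of X integrates to 1 over (−1, 1).

```r
# The marginal pdf (2/pi) * sqrt(1 - x^2) of Example 2.1.5 integrates to 1.
integrate(function(x) (2 / pi) * sqrt(1 - x^2), lower = -1, upper = 1)
```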
Figure 2.1.2: Region of integration for Example 2.1.5. It depicts the integration with respect to y, from $(x, -\sqrt{1-x^2})$ to $(x, \sqrt{1-x^2})$, at a fixed but arbitrary x.
Notice the space of the random vector is the interior of the square with vertices
(0, 0), (1, 0), (1, 1) and (0, 1). The marginal pdf of X1 is
$$f_1(x_1) = \int_0^1 (x_1 + x_2)\,dx_2 = x_1 + \frac{1}{2}, \quad 0 < x_1 < 1,$$
zero elsewhere. A probability like P(X1 ≤ 1/2) can be computed from either f1(x1) or f(x1, x2) because
$$\int_0^{1/2}\int_0^1 f(x_1, x_2)\,dx_2\,dx_1 = \int_0^{1/2} f_1(x_1)\,dx_1 = \frac{3}{8}.$$
Suppose, though, we want to find the probability P (X1 + X2 ≤ 1). Notice that
the region of integration is the interior of the triangle with vertices (0, 0), (1, 0) and
(0, 1). The reader should sketch this region on the space of (X1 , X2 ). Fixing x1 and
integrating with respect to x2 , we have
$$P(X_1 + X_2 \le 1) = \int_0^1\int_0^{1-x_1} (x_1 + x_2)\,dx_2\,dx_1$$
$$= \int_0^1\left[x_1(1 - x_1) + \frac{(1 - x_1)^2}{2}\right] dx_1$$
$$= \int_0^1\left[\frac{1}{2} - \frac{1}{2}x_1^2\right] dx_1 = \frac{1}{3}.$$
This latter probability is the volume under the surface f (x1 , x2 ) = x1 + x2 above
the set {(x1 , x2 ) : 0 < x1 , x1 + x2 ≤ 1}.
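The value 1/3 can be confirmed numerically; the following R sketch (not from the text) evaluates the iterated integral directly.

```r
# Numerical check of P(X1 + X2 <= 1) = 1/3 for f(x1, x2) = x1 + x2 on the unit square.
integrate(function(x1) {
  sapply(x1, function(a) integrate(function(x2) a + x2, 0, 1 - a)$value)
}, 0, 1)$value
```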
Example 2.1.7 (Example 2.1.3, Continued). Recall that the random variables X
and Y of Example 2.1.3 were the lifetimes of two batteries installed in an electrical
component. The joint pdf of (X, Y ) is sketched in Figure 2.1.1. Its space is the
positive quadrant of R2 so there are no constraints involving both x and y. Using
the change-in-variable w = y^2, the marginal pdf of X is
$$f_X(x) = \int_0^{\infty} 4xye^{-(x^2+y^2)}\,dy = 2xe^{-x^2}\int_0^{\infty} e^{-w}\,dw = 2xe^{-x^2},$$
for x > 0. By the symmetry of x and y in the model, the pdf of Y is the same as
that of X. To determine the median lifetime, θ, of these batteries, we need to solve
$$\frac{1}{2} = \int_0^{\theta} 2xe^{-x^2}\,dx = 1 - e^{-\theta^2},$$
where again we have made use of the change-in-variable z = x^2. Solving this equation, we obtain $\theta = \sqrt{\log 2} \approx 0.8326$. So 50% of the batteries have lifetimes exceeding 0.83 units.
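A brief R check of the median (a sketch, assuming the marginal pdf 2x e^{-x^2} derived above):

```r
# The median lifetime solves the equation above.
sqrt(log(2))                                                     # approximately 0.8326
integrate(function(x) 2 * x * exp(-x^2), 0, sqrt(log(2)))$value  # approximately 0.5
```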
2.1.2 Expectation
The concept of expectation extends in a straightforward manner. Let (X1 , X2 ) be a
random vector and let Y = g(X1 , X2 ) for some real-valued function; i.e., g : R2 → R.
Then Y is a random variable and we could determine its expectation by obtaining
the distribution of Y . But Theorem 1.8.1 is true for random vectors also. Note the
proof we gave for this theorem involved the discrete case, and Exercise 2.1.12 shows
its extension to the random vector case.
Suppose (X1 , X2 ) is of the continuous type. Then E(Y ) exists if
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g(x_1, x_2)|\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 < \infty.$$
Then
$$E(Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1, x_2)\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2. \qquad (2.1.10)$$
Likewise, if (X1, X2) is of the discrete type, then E(Y) exists if
$$\sum_{x_1}\sum_{x_2} |g(x_1, x_2)|\, p_{X_1,X_2}(x_1, x_2) < \infty.$$
Then
$$E(Y) = \sum_{x_1}\sum_{x_2} g(x_1, x_2)\, p_{X_1,X_2}(x_1, x_2). \qquad (2.1.11)$$
Proof: We prove it for the continuous case. The existence of the expected value of
k1 Y1 + k2 Y2 follows directly from the triangle inequality and linearity of integrals;
i.e.,
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |k_1g_1(x_1, x_2) + k_2g_2(x_1, x_2)|\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2$$
$$\le |k_1|\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g_1(x_1, x_2)|\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 + |k_2|\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g_2(x_1, x_2)|\, f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 < \infty.$$
We also note that the expected value of any function g(X2 ) of X2 can be found
in two ways:
$$E(g(X_2)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_2) f(x_1, x_2)\,dx_1\,dx_2 = \int_{-\infty}^{\infty} g(x_2) f_{X_2}(x_2)\,dx_2,$$
the latter single integral being obtained from the double integral by integrating on
x1 first. The following example illustrates these ideas.
In addition,
$$E(X_2) = \int_0^1\int_0^{x_2} x_2(8x_1x_2)\,dx_1\,dx_2 = \frac{4}{5}.$$
Since X2 has the pdf f2(x2) = 4x2^3, 0 < x2 < 1, zero elsewhere, the latter expectation can also be found by
$$E(X_2) = \int_0^1 x_2(4x_2^3)\,dx_2 = \frac{4}{5}.$$
Example 2.1.9. Continuing with Example 2.1.8, suppose the random variable Y
is defined by Y = X1 /X2 . We determine E(Y ) in two ways. The first way is by
definition; i.e., find the distribution of Y and then determine its expectation. The
cdf of Y , for 0 < y ≤ 1, is
$$F_Y(y) = P(Y \le y) = P(X_1 \le yX_2) = \int_0^1\int_0^{yx_2} 8x_1x_2\,dx_1\,dx_2 = \int_0^1 4y^2x_2^3\,dx_2 = y^2.$$
Hence the pdf of Y is f_Y(y) = 2y, 0 < y < 1, zero elsewhere, which leads to
$$E(Y) = \int_0^1 y(2y)\,dy = \frac{2}{3}.$$
Figure 2.1.3: Region of integration for Example 2.1.8. The arrow depicts the integration with respect to x1 at a fixed but arbitrary x2.
For the second way, we make use of expression (2.1.10) and find E(Y ) directly by
$$E(Y) = E\left(\frac{X_1}{X_2}\right) = \int_0^1\left[\int_0^{x_2}\frac{x_1}{x_2}\,8x_1x_2\,dx_1\right] dx_2 = \int_0^1\frac{8}{3}x_2^3\,dx_2 = \frac{2}{3}.$$
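Both routes give 2/3, which can also be confirmed numerically; the R sketch below (our own code) evaluates the iterated integral of x1/x2 times the joint pdf.

```r
# Numerical check of E(X1/X2) = 2/3 for the joint pdf 8*x1*x2 on 0 < x1 < x2 < 1.
integrate(function(x2) {
  sapply(x2, function(b) integrate(function(x1) (x1 / b) * 8 * x1 * b, 0, b)$value)
}, 0, 1)$value
```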
The moment-generating function (mgf) of the random vector (X1, X2), when it exists, is defined by $M_{X_1,X_2}(t_1, t_2) = E\left(e^{t_1X_1 + t_2X_2}\right)$, so it is quite similar to the mgf of a random variable. Also, the mgfs of X1 and X2
are immediately seen to be MX1 ,X2 (t1 , 0) and MX1 ,X2 (0, t2 ), respectively. If there
is no confusion, we often drop the subscripts on M .
Example 2.1.10. Let the continuous-type random variables X and Y have the
joint pdf
$$f(x, y) = \begin{cases} e^{-y} & 0 < x < y < \infty \\ 0 & \text{elsewhere.} \end{cases}$$
The reader should sketch the space of (X, Y ). The mgf of this joint distribution is
$$M(t_1, t_2) = \int_0^{\infty}\int_x^{\infty}\exp(t_1x + t_2y - y)\,dy\,dx = \frac{1}{(1 - t_1 - t_2)(1 - t_2)},$$
provided that t1 + t2 < 1 and t2 < 1. Furthermore, the moment-generating func-
tions of the marginal distributions of X and Y are, respectively,
$$M(t_1, 0) = \frac{1}{1 - t_1}, \quad t_1 < 1,$$
$$M(0, t_2) = \frac{1}{(1 - t_2)^2}, \quad t_2 < 1.$$
These moment-generating functions are, of course, respectively, those of the
marginal probability density functions,
$$f_1(x) = \int_x^{\infty} e^{-y}\,dy = e^{-x}, \quad 0 < x < \infty,$$
and
$$f_2(y) = \int_0^y e^{-y}\,dx = ye^{-y}, \quad 0 < y < \infty,$$
zero elsewhere.
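A numerical spot check of the closed form for the joint mgf (an R sketch at one arbitrarily chosen point (t1, t2) = (0.2, 0.3)):

```r
# Compare the double integral defining M(t1, t2) with the closed form of Example 2.1.10.
t1 <- 0.2; t2 <- 0.3
num <- integrate(function(x) {
  sapply(x, function(a) integrate(function(y) exp(t1 * a + t2 * y - y), a, Inf)$value)
}, 0, Inf)$value
c(numerical = num, closed_form = 1 / ((1 - t1 - t2) * (1 - t2)))
```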
We also need to define the expected value of the random vector itself, but this
is not a new concept because it is defined in terms of componentwise expectation:
Definition 2.1.3 (Expected Value of a Random Vector). Let X = (X1 , X2 ) be a
random vector. Then the expected value of X exists if the expectations of X1 and
X2 exist. If it exists, then the expected value is given by
$$E[\mathbf{X}] = \begin{bmatrix} E(X_1) \\ E(X_2) \end{bmatrix}. \qquad (2.1.14)$$
EXERCISES
2.1.1. Let f(x1, x2) = 4x1x2, 0 < x1 < 1, 0 < x2 < 1, zero elsewhere, be the pdf of X1 and X2. Find P(0 < X1 < 1/2, 1/4 < X2 < 1), P(X1 = X2), P(X1 < X2), and P(X1 ≤ X2).
Hint: Recall that P (X1 = X2 ) would be the volume under the surface f (x1 , x2 ) =
4x1 x2 and above the line segment 0 < x1 = x2 < 1 in the x1 x2 -plane.
2.1.3. Let F (x, y) be the distribution function of X and Y . For all real constants
a < b, c < d, show that P (a < X ≤ b, c < Y ≤ d) = F (b, d) − F (b, c) − F (a, d) +
F (a, c).
2.1.4. Show that the function F (x, y) that is equal to 1 provided that x + 2y ≥ 1,
and that is equal to zero provided that x + 2y < 1, cannot be a distribution function
of two random variables.
Hint: Find four numbers a < b, c < d such that F(b, d) − F(b, c) − F(a, d) + F(a, c) < 0.
2.1.5. Given that the nonnegative function g(x) has the property that
$$\int_0^{\infty} g(x)\,dx = 1,$$
show that
$$f(x_1, x_2) = \frac{2g\left(\sqrt{x_1^2 + x_2^2}\right)}{\pi\sqrt{x_1^2 + x_2^2}}, \quad 0 < x_1 < \infty,\ 0 < x_2 < \infty,$$
zero elsewhere, satisfies the conditions for a pdf of two continuous-type random
variables X1 and X2 .
Hint: Use polar coordinates.
(a) Show that P (a < X < b, c < Y < d) = (exp{−a2 } − exp{−b2 })(exp{−c2 } −
exp{−d2 }).
(b) Using Part (a) and the notation in Example 2.1.3, show that P [(X, Y ) ∈ A] =
0.1879 while P [(X, Y ) ∈ B] = 0.0026.
(c) Show that the following R program computes P (a < X < b, c < Y < d).
Then use it to compute the probabilities in Part (b).
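The original program listing is not reproduced in this excerpt; a minimal sketch consistent with Part (a) is given below (prob_rect is a hypothetical name).

```r
# Evaluate P(a < X < b, c < Y < d) using the factorization in Part (a).
prob_rect <- function(a, b, c, d) {
  (exp(-a^2) - exp(-b^2)) * (exp(-c^2) - exp(-d^2))
}
prob_rect(0, 1, 0, 2)   # example call
```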
2.1.7. Let f (x, y) = e−x−y , 0 < x < ∞, 0 < y < ∞, zero elsewhere, be the pdf of
X and Y . Then if Z = X + Y , compute P (Z ≤ 0), P (Z ≤ 6), and, more generally,
P (Z ≤ z), for 0 < z < ∞. What is the pdf of Z?
2.1.8. Let X and Y have the pdf f (x, y) = 1, 0 < x < 1, 0 < y < 1, zero elsewhere.
Find the cdf and pdf of the product Z = XY .
2.1.9. Let 13 cards be taken, at random and without replacement, from an ordinary
deck of playing cards. If X is the number of spades in these 13 cards, find the pmf of
X. If, in addition, Y is the number of hearts in these 13 cards, find the probability
P (X = 2, Y = 5). What is the joint pmf of X and Y ?
2.1.10. Let the random variables X1 and X2 have the joint pmf described as follows:
2.1.13. Let X1, X2 be two random variables with the joint pmf p(x1, x2) = (x1 + x2)/12, for x1 = 1, 2, x2 = 1, 2, zero elsewhere. Compute E(X1), E(X1^2), E(X2), E(X2^2), and E(X1X2). Is E(X1X2) = E(X1)E(X2)? Find E(2X1 − 6X2^2 + 7X1X2).
2.1.14. Let X1, X2 be two random variables with joint pdf f(x1, x2) = 4x1x2, 0 < x1 < 1, 0 < x2 < 1, zero elsewhere. Compute E(X1), E(X1^2), E(X2), E(X2^2), and E(X1X2). Is E(X1X2) = E(X1)E(X2)? Find E(3X2 − 2X1^2 + 6X1X2).
2.1.15. Let X1, X2 be two random variables with joint pmf p(x1, x2) = (1/2)^{x1+x2},
for 1 ≤ xi < ∞, i = 1, 2, where x1 and x2 are integers, zero elsewhere. Determine
the joint mgf of X1 , X2 . Show that M (t1 , t2 ) = M (t1 , 0)M (0, t2 ).
2.1.16. Let X1 , X2 be two random variables with joint pdf f (x1 , x2 ) = x1 exp{−x2 },
for 0 < x1 < x2 < ∞, zero elsewhere. Determine the joint mgf of X1 , X2 . Does
M (t1 , t2 ) = M (t1 , 0)M (0, t2 )?
2.1.17. Let X and Y have the joint pdf f(x, y) = 6(1 − x − y), x + y < 1, 0 < x, 0 < y, zero elsewhere. Compute P(2X + 3Y < 1) and E(XY + 2X^2).
Example 2.2.1. In a large metropolitan area during flu season, suppose that two
strains of flu, A and B, are occurring. For a given week, let X1 and X2 be the
respective numbers of reported cases of strains A and B, with the joint pmf
$$p_{X_1,X_2}(x_1, x_2) = \frac{\mu_1^{x_1}\mu_2^{x_2}e^{-\mu_1}e^{-\mu_2}}{x_1!\,x_2!}, \quad x_1 = 0, 1, 2, \ldots,\ x_2 = 0, 1, 2, \ldots,$$
where μ1 and μ2 are fixed positive real numbers. In particular, the expected number of reported cases of strain A for the week is
$$E(X_1) = \sum_{x_1=1}^{\infty} x_1\frac{\mu_1^{x_1}}{x_1!}e^{-\mu_1} = \mu_1\sum_{x_1=1}^{\infty}\frac{\mu_1^{x_1-1}}{(x_1-1)!}e^{-\mu_1} = \mu_1\cdot 1 = \mu_1.$$
The pmf of the total number of reported cases for the week, Y1 = X1 + X2, is obtained by summing the joint pmf over the points (y1 − y2, y2), y2 = 0, 1, ..., y1:
$$p_{Y_1}(y_1) = \frac{e^{-\mu_1-\mu_2}}{y_1!}\sum_{y_2=0}^{y_1}\frac{y_1!}{(y_1-y_2)!\,y_2!}\,\mu_1^{y_1-y_2}\mu_2^{y_2} = \frac{(\mu_1+\mu_2)^{y_1}e^{-(\mu_1+\mu_2)}}{y_1!}, \quad y_1 = 0, 1, 2, \ldots,$$
where the last equality follows from the binomial theorem. That is, Y1 has a Poisson distribution with mean μ1 + μ2.
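This Poisson result for the sum is easy to check numerically in R (a sketch with arbitrarily chosen means):

```r
# Convolution of two Poisson pmfs agrees with the Poisson(mu1 + mu2) pmf.
mu1 <- 2; mu2 <- 3; y <- 4
sum(dpois(0:y, mu1) * dpois(y - (0:y), mu2))   # convolution sum at y
dpois(y, mu1 + mu2)                            # same value
```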
For the continuous case we begin with an example that illustrates the cdf tech-
nique.
Example 2.2.2. Consider an experiment in which a person chooses at random
a point (X1 , X2 ) from the unit square S = {(x1 , x2 ) : 0 < x1 < 1, 0 < x2 < 1}.
Suppose that our interest is not in X1 or in X2 but in Z = X1 + X2 . Once a suitable
probability model has been adopted, we shall see how to find the pdf of Z. To be
specific, let the nature of the random experiment be such that it is reasonable to
assume that the distribution of probability over the unit square is uniform. Then
the pdf of X1 and X2 may be written
$$f_{X_1,X_2}(x_1, x_2) = \begin{cases} 1 & 0 < x_1 < 1,\ 0 < x_2 < 1 \\ 0 & \text{elsewhere,} \end{cases} \qquad (2.2.1)$$
and this describes the probability model. Now let the cdf of Z be denoted by
FZ (z) = P (X1 + X2 ≤ z). Then
$$F_Z(z) = \begin{cases} 0 & z < 0 \\[1ex] \displaystyle\int_0^z\int_0^{z-x_1} dx_2\,dx_1 = \frac{z^2}{2} & 0 \le z < 1 \\[2ex] \displaystyle 1 - \int_{z-1}^1\int_{z-x_1}^1 dx_2\,dx_1 = 1 - \frac{(2-z)^2}{2} & 1 \le z < 2 \\[2ex] 1 & 2 \le z. \end{cases}$$
Since F′_Z(z) exists for all values of z, the pdf of Z may then be written
$$f_Z(z) = \begin{cases} z & 0 < z < 1 \\ 2 - z & 1 \le z < 2 \\ 0 & \text{elsewhere.} \end{cases} \qquad (2.2.2)$$
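A short simulation in R (not from the text) illustrates this triangular distribution.

```r
# The sum of two independent uniform(0,1) variables has cdf z^2/2 on (0,1).
set.seed(1)
z <- runif(1e5) + runif(1e5)
mean(z <= 0.5)   # approximately 0.5^2 / 2 = 0.125
mean(z <= 1.5)   # approximately 1 - (2 - 1.5)^2 / 2 = 0.875
```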
In the last example, we used the cdf technique to find the distribution of the
transformed random vector. Recall in Chapter 1, Theorem 1.7.1 gave a transfor-
mation technique to directly determine the pdf of the transformed random variable
for one-to-one transformations. As discussed in Section 4.1 of the accompanying re-
source Mathematical Comments,4 this is based on the change-in-variable technique
for univariate integration. Further Section 4.2 of this resource shows that a simi-
lar change-in-variable technique exists for multiple integration. We now discuss in
general the transformation technique for the continuous case based on this theory.
Let (X1 , X2 ) have a jointly continuous distribution with pdf fX1 ,X2 (x1 , x2 ) and
support set S. Consider the transformed random vector (Y1 , Y2 ) = T (X1 , X2 ) where
T is a one-to-one continuous transformation. Let T = T (S) denote the support of
(Y1 , Y2 ). The transformation is depicted in Figure 2.2.1. Rewrite the transforma-
tion in terms of its components as (Y1 , Y2 ) = T (X1 , X2 ) = (u1 (X1 , X2 ), u2 (X1 , X2 )),
where the functions y1 = u1 (x1 , x2 ) and y2 = u2 (x1 , x2 ) define T . Since the trans-
formation is one-to-one, the inverse transformation T −1 exists. We write it as
x1 = w1 (y1 , y2 ), x2 = w2 (y1 , y2 ). Finally, we need the Jacobian of the transfor-
mation which is the determinant of order 2 given by
$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix}.$$
Note that J plays the role of dx/dy in the univariate case. We assume that these
first-order partial derivatives are continuous and that the Jacobian J is not identi-
cally equal to zero in T .
Let B be any region5 in T and let A = T −1 (B) as shown in Figure 2.2.1.
Because the transformation T is one-to-one, P [(X1 , X2 ) ∈ A] = P [T (X1 , X2 ) ∈
T (A)] = P [(Y1 , Y2 ) ∈ B]. Then based on the change-in-variable technique, cited
above, we have
$$P[(X_1, X_2) \in A] = \iint_A f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2$$
$$= \iint_{T(A)} f_{X_1,X_2}[T^{-1}(y_1, y_2)]\,|J|\,dy_1\,dy_2$$
$$= \iint_B f_{X_1,X_2}[w_1(y_1, y_2), w_2(y_1, y_2)]\,|J|\,dy_1\,dy_2.$$
4 See the reference for Mathematical Comments in the Preface.
5 Technically an event in the support of (Y1 , Y2 ).
Since B is arbitrary, the last integrand must be the joint pdf of (Y1 , Y2 ). That is
the pdf of (Y1, Y2) is
$$f_{Y_1,Y_2}(y_1, y_2) = \begin{cases} f_{X_1,X_2}[w_1(y_1, y_2), w_2(y_1, y_2)]\,|J| & (y_1, y_2) \in \mathcal{T} \\ 0 & \text{elsewhere.} \end{cases} \qquad (2.2.3)$$
Figure 2.2.1: A general sketch of the supports of (X1, X2), (S), and (Y1, Y2), (T).
Example 2.2.3. Reconsider Example 2.2.2, where (X1 , X2 ) have the uniform dis-
tribution over the unit square with the pdf given in expression (2.2.1). The support
of (X1 , X2 ) is the set S = {(x1 , x2 ) : 0 < x1 < 1, 0 < x2 < 1} as depicted in Figure
2.2.2.
Suppose Y1 = X1 + X2 and Y2 = X1 − X2 . The transformation is given by
y1 = u1 (x1 , x2 ) = x1 + x2
y2 = u2 (x1 , x2 ) = x1 − x2 .
The inverse transformation is given by
x1 = w1(y1, y2) = (1/2)(y1 + y2)
x2 = w2(y1, y2) = (1/2)(y1 − y2).
To determine the set T in the y1 y2-plane onto which S is mapped under the transformation, note that the boundaries of S are transformed as follows into the boundaries
Figure 2.2.2: The support S of (X1, X2): the unit square with boundaries x1 = 0, x1 = 1, x2 = 0, and x2 = 1.
of T :
x1 = 0 into 0 = (1/2)(y1 + y2)
x1 = 1 into 1 = (1/2)(y1 + y2)
x2 = 0 into 0 = (1/2)(y1 − y2)
x2 = 1 into 1 = (1/2)(y1 − y2).
Accordingly, T is shown in Figure 2.2.3. Next, the Jacobian is given by
$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} = \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\[1ex] \tfrac{1}{2} & -\tfrac{1}{2} \end{vmatrix} = -\frac{1}{2}.$$
Figure 2.2.3: The support T of (Y1, Y2): the square with boundaries y2 = y1, y2 = −y1, y2 = 2 − y1, and y2 = y1 − 2.
The joint pdf of (Y1, Y2) is therefore $f_{Y_1,Y_2}(y_1, y_2) = \tfrac{1}{2}$ for $(y_1, y_2) \in \mathcal{T}$, zero elsewhere. Integrating out y2 gives the marginal pdf
$$f_{Y_1}(y_1) = \begin{cases} \displaystyle\int_{-y_1}^{y_1}\frac{1}{2}\,dy_2 = y_1 & 0 < y_1 \le 1 \\[2ex] \displaystyle\int_{y_1-2}^{2-y_1}\frac{1}{2}\,dy_2 = 2 - y_1 & 1 < y_1 < 2 \\[2ex] 0 & \text{elsewhere,} \end{cases}$$
which agrees with expression (2.2.2) of Example 2.2.2. In a similar manner, the marginal pdf fY2(y2) is given by
$$f_{Y_2}(y_2) = \begin{cases} \displaystyle\int_{-y_2}^{y_2+2}\frac{1}{2}\,dy_1 = y_2 + 1 & -1 < y_2 \le 0 \\[2ex] \displaystyle\int_{y_2}^{2-y_2}\frac{1}{2}\,dy_1 = 1 - y_2 & 0 < y_2 < 1 \\[2ex] 0 & \text{elsewhere.} \end{cases}$$
Example 2.2.4. Let Y1 = (1/2)(X1 − X2), where X1 and X2 have the joint pdf
$$f_{X_1,X_2}(x_1, x_2) = \begin{cases} \dfrac{1}{4}\exp\left(-\dfrac{x_1+x_2}{2}\right) & 0 < x_1 < \infty,\ 0 < x_2 < \infty \\[1ex] 0 & \text{elsewhere.} \end{cases}$$
which defines the support set T of (Y1 , Y2 ). Hence, the joint pdf of (Y1 , Y2 ) is
zero elsewhere.
in the continuous case, with summations replacing integrals in the discrete case.
Certainly, that function g(X1 , X2 ) could be exp{tu(X1 , X2 )}, so that in reality
we would be finding the mgf of the function Z = u(X1 , X2 ). If we could then
recognize this mgf as belonging to a certain distribution, then Z would have that
distribution. We give two illustrations that demonstrate the power of this technique
by reconsidering Examples 2.2.1 and 2.2.4.
Example 2.2.6 (Continuation of Example 2.2.1). Here X1 and X2 have the joint
pmf
$$p_{X_1,X_2}(x_1, x_2) = \begin{cases} \dfrac{\mu_1^{x_1}\mu_2^{x_2}e^{-\mu_1}e^{-\mu_2}}{x_1!\,x_2!} & x_1 = 0, 1, 2, 3, \ldots,\ x_2 = 0, 1, 2, 3, \ldots \\[1ex] 0 & \text{elsewhere,} \end{cases}$$
where μ1 and μ2 are fixed positive real numbers. Let Y = X1 + X2 and consider
$$E(e^{tY}) = \sum_{x_1=0}^{\infty}\sum_{x_2=0}^{\infty} e^{t(x_1+x_2)}\, p_{X_1,X_2}(x_1, x_2)$$
$$= \left[\sum_{x_1=0}^{\infty} e^{tx_1}\frac{\mu_1^{x_1}e^{-\mu_1}}{x_1!}\right]\left[\sum_{x_2=0}^{\infty} e^{tx_2}\frac{\mu_2^{x_2}e^{-\mu_2}}{x_2!}\right]$$
$$= \left[e^{-\mu_1}\sum_{x_1=0}^{\infty}\frac{(e^t\mu_1)^{x_1}}{x_1!}\right]\left[e^{-\mu_2}\sum_{x_2=0}^{\infty}\frac{(e^t\mu_2)^{x_2}}{x_2!}\right]$$
$$= \left[e^{\mu_1(e^t-1)}\right]\left[e^{\mu_2(e^t-1)}\right] = e^{(\mu_1+\mu_2)(e^t-1)}.$$
Notice that the factors in the brackets in the next-to-last equality are the mgfs of
X1 and X2 , respectively. Hence, the mgf of Y is the same as that of X1 except μ1
has been replaced by μ1 + μ2 . Therefore, by the uniqueness of mgfs, the pmf of Y
must be
$$p_Y(y) = e^{-(\mu_1+\mu_2)}\frac{(\mu_1+\mu_2)^y}{y!}, \quad y = 0, 1, 2, \ldots,$$
which is the same pmf that was obtained in Example 2.2.1.
Example 2.2.7 (Continuation of Example 2.2.4). Here X1 and X2 have the joint pdf
$$f_{X_1,X_2}(x_1, x_2) = \begin{cases} \dfrac{1}{4}\exp\left(-\dfrac{x_1+x_2}{2}\right) & 0 < x_1 < \infty,\ 0 < x_2 < \infty \\[1ex] 0 & \text{elsewhere.} \end{cases}$$
EXERCISES
2.2.1. If p(x1, x2) = (2/3)^{x1+x2}(1/3)^{2−x1−x2}, (x1, x2) = (0, 0), (0, 1), (1, 0), (1, 1), zero
elsewhere, is the joint pmf of X1 and X2 , find the joint pmf of Y1 = X1 − X2 and
Y2 = X1 + X2 .
2.2.2. Let X1 and X2 have the joint pmf p(x1 , x2 ) = x1 x2 /36, x1 = 1, 2, 3 and
x2 = 1, 2, 3, zero elsewhere. Find first the joint pmf of Y1 = X1 X2 and Y2 = X2 ,
and then find the marginal pmf of Y1 .
2.2.3. Let X1 and X2 have the joint pdf h(x1 , x2 ) = 2e−x1 −x2 , 0 < x1 < x2 < ∞,
zero elsewhere. Find the joint pdf of Y1 = 2X1 and Y2 = X2 − X1 .
2.2.4. Let X1 and X2 have the joint pdf h(x1 , x2 ) = 8x1 x2 , 0 < x1 < x2 < 1, zero
elsewhere. Find the joint pdf of Y1 = X1 /X2 and Y2 = X2 .
Hint: Use the inequalities 0 < y1 y2 < y2 < 1 in considering the mapping from S
onto T .
2.2.5. Let X1 and X2 be continuous random variables with the joint probability
density function fX1 ,X2 (x1 , x2 ), −∞ < xi < ∞, i = 1, 2. Let Y1 = X1 + X2 and
Y2 = X2 .
(a) Find the joint pdf fY1 ,Y2 .
(b) Show that
$$f_{Y_1}(y_1) = \int_{-\infty}^{\infty} f_{X_1,X_2}(y_1 - y_2, y_2)\,dy_2, \qquad (2.2.5)$$
which is sometimes called the convolution formula.
2.2.6. Suppose X1 and X2 have the joint pdf fX1 ,X2 (x1 , x2 ) = e−(x1 +x2 ) , 0 < xi <
∞, i = 1, 2, zero elsewhere.
(a) Use formula (2.2.5) to find the pdf of Y1 = X1 + X2 .
(b) Find the mgf of Y1 .
2.2.7. Use the formula (2.2.5) to find the pdf of Y1 = X1 + X2 , where X1 and X2
have the joint pdf fX1 ,X2 (x1 , x2 ) = 2e−(x1 +x2 ) , 0 < x1 < x2 < ∞, zero elsewhere.
2.2.8. Suppose X1 and X2 have the joint pdf
$$f(x_1, x_2) = \begin{cases} e^{-x_1}e^{-x_2} & x_1 > 0,\ x_2 > 0 \\ 0 & \text{elsewhere.} \end{cases}$$
For constants w1 > 0 and w2 > 0, let W = w1 X1 + w2 X2 .
(a) Show that the pdf of W is
$$f_W(w) = \begin{cases} \dfrac{1}{w_1 - w_2}\left(e^{-w/w_1} - e^{-w/w_2}\right) & w > 0 \\[1ex] 0 & \text{elsewhere.} \end{cases}$$
For any fixed x1 with pX1 (x1 ) > 0, this function pX2 |X1 (x2 |x1 ) satisfies the con-
ditions of being a pmf of the discrete type because pX2 |X1 (x2 |x1 ) is nonnegative
and
$$\sum_{x_2} p_{X_2|X_1}(x_2|x_1) = \sum_{x_2}\frac{p_{X_1,X_2}(x_1, x_2)}{p_{X_1}(x_1)} = \frac{1}{p_{X_1}(x_1)}\sum_{x_2} p_{X_1,X_2}(x_1, x_2) = \frac{p_{X_1}(x_1)}{p_{X_1}(x_1)} = 1.$$
We call pX2 |X1 (x2 |x1 ) the conditional pmf of the discrete type of random variable
X2 , given that the discrete type of random variable X1 = x1 . In a similar manner,
provided x2 ∈ S_{X_2}, we define the symbol p_{X_1|X_2}(x_1|x_2) by the relation
$$p_{X_1|X_2}(x_1|x_2) = \frac{p_{X_1,X_2}(x_1, x_2)}{p_{X_2}(x_2)},$$
and we call pX1 |X2 (x1 |x2 ) the conditional pmf of the discrete type of random variable
X1 , given that the discrete type of random variable X2 = x2 . We often abbreviate
pX1 |X2 (x1 |x2 ) by p1|2 (x1 |x2 ) and pX2 |X1 (x2 |x1 ) by p2|1 (x2 |x1 ). Similarly, p1 (x1 )
and p2 (x2 ) are used to denote the respective marginal pmfs.
Now let X1 and X2 denote random variables of the continuous type and have the
joint pdf fX1 ,X2 (x1 , x2 ) and the marginal probability density functions fX1 (x1 ) and
fX2 (x2 ), respectively. We use the results of the preceding paragraph to motivate
a definition of a conditional pdf of a continuous type of random variable. When
f_{X_1}(x_1) > 0, we define the symbol f_{X_2|X_1}(x_2|x_1) by the relation
$$f_{X_2|X_1}(x_2|x_1) = \frac{f_{X_1,X_2}(x_1, x_2)}{f_{X_1}(x_1)},$$
and, in the same way, provided f_{X_2}(x_2) > 0, we define f_{X_1|X_2}(x_1|x_2) = f_{X_1,X_2}(x_1, x_2)/f_{X_2}(x_2).
We often abbreviate these conditional pdfs by f1|2 (x1 |x2 ) and f2|1 (x2 |x1 ), respec-
tively. Similarly, f1 (x1 ) and f2 (x2 ) are used to denote the respective marginal pdfs.
Since each of f2|1 (x2 |x1 ) and f1|2 (x1 |x2 ) is a pdf of one random variable, each
has all the properties of such a pdf. Thus we can compute probabilities and math-
ematical expectations. If the random variables are of the continuous type, the
probability
b
P (a < X2 < b|X1 = x1 ) = f2|1 (x2 |x1 ) dx2
a
is called “the conditional probability that a < X2 < b, given that X1 = x1 .” If there
is no ambiguity, this may be written in the form P (a < X2 < b|x1 ). Similarly, the
conditional probability that c < X1 < d, given X2 = x2 , is
d
P (c < X1 < d|X2 = x2 ) = f1|2 (x1 |x2 ) dx1 .
c
If u(X2) is a function of X2, the conditional expectation of u(X2), given X1 = x1, if it exists, is
$$E[u(X_2)|x_1] = \int_{-\infty}^{\infty} u(x_2) f_{2|1}(x_2|x_1)\,dx_2.$$
Note that E[u(X2)|x1] is a function of x1. If they do exist, then E(X2|x1) is the
mean and E{[X2 − E(X2 |x1 )]2 |x1 } is the conditional variance of the conditional
distribution of X2 , given X1 = x1 , which can be written more simply as Var(X2 |x1 ).
It is convenient to refer to these as the “conditional mean” and the “conditional
variance” of X2, given X1 = x1. Of course, we have
$$\mathrm{Var}(X_2|x_1) = E(X_2^2|x_1) - [E(X_2|x_1)]^2$$
from an earlier result. In a like manner, the conditional expectation of u(X1), given
X2 = x2 , if it exists, is given by
$$E[u(X_1)|x_2] = \int_{-\infty}^{\infty} u(x_1) f_{1|2}(x_1|x_2)\,dx_1.$$
With random variables of the discrete type, these conditional probabilities and
conditional expectations are computed by using summation instead of integration.
An illustrative example follows.
Example 2.3.1. Let X1 and X2 have the joint pdf
$$f(x_1, x_2) = \begin{cases} 2 & 0 < x_1 < x_2 < 1 \\ 0 & \text{elsewhere.} \end{cases}$$
Then the marginal probability density functions are, respectively,
$$f_1(x_1) = \begin{cases} \displaystyle\int_{x_1}^1 2\,dx_2 = 2(1 - x_1) & 0 < x_1 < 1 \\[1ex] 0 & \text{elsewhere,} \end{cases}$$
and
$$f_2(x_2) = \begin{cases} \displaystyle\int_0^{x_2} 2\,dx_1 = 2x_2 & 0 < x_2 < 1 \\[1ex] 0 & \text{elsewhere.} \end{cases}$$
The conditional pdf of X1, given X2 = x2, 0 < x2 < 1, is
$$f_{1|2}(x_1|x_2) = \begin{cases} \dfrac{2}{2x_2} = \dfrac{1}{x_2} & 0 < x_1 < x_2 \\[1ex] 0 & \text{elsewhere.} \end{cases}$$
Here the conditional mean and the conditional variance of X1 , given X2 = x2 , are
respectively,
$$E(X_1|x_2) = \int_{-\infty}^{\infty} x_1 f_{1|2}(x_1|x_2)\,dx_1 = \int_0^{x_2} x_1\frac{1}{x_2}\,dx_1 = \frac{x_2}{2}, \quad 0 < x_2 < 1,$$
and
$$\mathrm{Var}(X_1|x_2) = \int_0^{x_2}\left(x_1 - \frac{x_2}{2}\right)^2\frac{1}{x_2}\,dx_1 = \frac{x_2^2}{12}, \quad 0 < x_2 < 1.$$
Finally, we compare the values of P(0 < X1 < 1/2 | X2 = 3/4) and P(0 < X1 < 1/2). We have
$$P(0 < X_1 < \tfrac{1}{2} \mid X_2 = \tfrac{3}{4}) = \int_0^{1/2} f_{1|2}(x_1|\tfrac{3}{4})\,dx_1 = \int_0^{1/2}\tfrac{4}{3}\,dx_1 = \tfrac{2}{3},$$
but
$$P(0 < X_1 < \tfrac{1}{2}) = \int_0^{1/2} f_1(x_1)\,dx_1 = \int_0^{1/2} 2(1 - x_1)\,dx_1 = \tfrac{3}{4}.$$
Since E(X2 |x1 ) is a function of x1 , then E(X2 |X1 ) is a random variable with its
own distribution, mean, and variance. Let us consider the following illustration of
this.
Example 2.3.2. Let X1 and X2 have the joint pdf
$$f(x_1, x_2) = \begin{cases} 6x_2 & 0 < x_2 < x_1 < 1 \\ 0 & \text{elsewhere.} \end{cases}$$
The marginal pdf of X1 is f_1(x_1) = \int_0^{x_1} 6x_2\,dx_2 = 3x_1^2, 0 < x_1 < 1, zero elsewhere, so the conditional pdf of X2, given X1 = x1, is f_{2|1}(x_2|x_1) = 6x_2/(3x_1^2) = 2x_2/x_1^2, 0 < x_2 < x_1, zero elsewhere, with conditional mean E(X_2|x_1) = \int_0^{x_1} x_2(2x_2/x_1^2)\,dx_2 = 2x_1/3. Now E(X2|X1) = 2X1/3 is a random variable, say Y. The cdf of Y = 2X1/3 is
$$G(y) = P(Y \le y) = P\left(X_1 \le \frac{3y}{2}\right), \quad 0 \le y < \frac{2}{3}.$$
Of course, G(y) = 0 if y < 0, and G(y) = 1 if 2/3 < y. The pdf, mean, and variance of Y = 2X1/3 are
$$g(y) = \frac{81y^2}{8}, \quad 0 \le y < \frac{2}{3},$$
zero elsewhere,
$$E(Y) = \int_0^{2/3} y\,\frac{81y^2}{8}\,dy = \frac{1}{2},$$
and
$$\mathrm{Var}(Y) = \int_0^{2/3} y^2\,\frac{81y^2}{8}\,dy - \frac{1}{4} = \frac{1}{60}.$$
Since the marginal pdf of X2 is
$$f_2(x_2) = \int_{x_2}^1 6x_2\,dx_1 = 6x_2(1 - x_2), \quad 0 < x_2 < 1,$$
zero elsewhere, it is easy to show that E(X2) = 1/2 and Var(X2) = 1/20. That is, here
$$E(Y) = E[E(X_2|X_1)] = E(X_2)$$
and
$$\mathrm{Var}(Y) = \mathrm{Var}[E(X_2|X_1)] \le \mathrm{Var}(X_2).$$
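These values can be checked by simulation; the R sketch below draws X1 from its marginal pdf 3x1^2 by the inverse-cdf method (our own code, not from the text).

```r
# Check E(Y) = 1/2 and Var(Y) = 1/60 for Y = E(X2 | X1) = 2*X1/3 in Example 2.3.2.
set.seed(2)
x1 <- runif(1e6)^(1/3)   # inverse cdf: F1(x) = x^3 on (0, 1)
y  <- 2 * x1 / 3
c(mean(y), var(y))
```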
Theorem 2.3.1. Let (X1 , X2 ) be a random vector such that the variance of X2 is
finite. Then,
(a) E[E(X2 |X1 )] = E(X2 ).
(b) Var[E(X2 |X1 )] ≤ Var(X2 ).
Proof: The proof is for the continuous case. To obtain it for the discrete case,
exchange summations for integrals. We first prove (a). Note that
$$E(X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1, x_2)\,dx_2\,dx_1 = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} x_2\frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2\right] f_1(x_1)\,dx_1$$
$$= \int_{-\infty}^{\infty} E(X_2|x_1) f_1(x_1)\,dx_1 = E[E(X_2|X_1)],$$
which is the first result. Next, with μ2 = E(X2), we have
$$\mathrm{Var}(X_2) = E[(X_2 - \mu_2)^2] = E\{[X_2 - E(X_2|X_1) + E(X_2|X_1) - \mu_2]^2\}$$
$$= E\{[X_2 - E(X_2|X_1)]^2\} + E\{[E(X_2|X_1) - \mu_2]^2\} + 2E\{[X_2 - E(X_2|X_1)][E(X_2|X_1) - \mu_2]\}.$$
We show that the last term of the right-hand member of the immediately preceding
equation is zero. It is equal to
$$2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [x_2 - E(X_2|x_1)][E(X_2|x_1) - \mu_2] f(x_1, x_2)\,dx_2\,dx_1$$
$$= 2\int_{-\infty}^{\infty} [E(X_2|x_1) - \mu_2]\left\{\int_{-\infty}^{\infty} [x_2 - E(X_2|x_1)]\frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2\right\} f_1(x_1)\,dx_1.$$
But E(X2|x1) is the conditional mean of X2, given X1 = x1. Since the expression in the inner braces is equal to E(X2|x1) − E(X2|x1) = 0, the cross term is zero, and therefore
$$\mathrm{Var}(X_2) = E\{[X_2 - E(X_2|X_1)]^2\} + E\{[E(X_2|X_1) - \mu_2]^2\}.$$
The first term in the right-hand member of this equation is nonnegative because it
is the expected value of a nonnegative function, namely [X2 − E(X2 |X1 )]2 . Since
E[E(X2|X1)] = μ2, the second term is Var[E(X2|X1)]. Hence we have
$$\mathrm{Var}(X_2) \ge \mathrm{Var}[E(X_2|X_1)],$$
which is the desired result.
Intuitively, this result could have this useful interpretation. Both the random
variables X2 and E(X2 |X1 ) have the same mean μ2 . If we did not know μ2 , we
could use either of the two random variables to guess at the unknown μ2 . Since,
however, Var(X2 ) ≥ Var[E(X2 |X1 )], we would put more reliance in E(X2 |X1 ) as a
guess. That is, if we observe the pair (X1 , X2 ) to be (x1 , x2 ), we could prefer to use
E(X2 |x1 ) to x2 as a guess at the unknown μ2 . When studying the use of sufficient
statistics in estimation in Chapter 7, we make use of this famous result, attributed
to C. R. Rao and David Blackwell.
We finish this section with an example illustrating Theorem 2.3.1.
Example 2.3.3. Let X1 and X2 be discrete random variables. Suppose the condi-
tional pmf of X1 given X2 and the marginal distribution of X2 are given by
$$p(x_1|x_2) = \binom{x_2}{x_1}\left(\frac{1}{2}\right)^{x_2}, \quad x_1 = 0, 1, \ldots, x_2,$$
$$p(x_2) = \frac{2}{3}\left(\frac{1}{3}\right)^{x_2-1}, \quad x_2 = 1, 2, 3, \ldots.$$
Let us determine the mgf of X1 . For fixed x2 , by the binomial theorem,
$$E\left(e^{tX_1}|x_2\right) = \sum_{x_1=0}^{x_2}\binom{x_2}{x_1} e^{tx_1}\left(\frac{1}{2}\right)^{x_1}\left(\frac{1}{2}\right)^{x_2-x_1} = \left(\frac{1}{2} + \frac{1}{2}e^t\right)^{x_2}.$$
Hence, by the geometric series and Theorem 2.3.1,
$$E\left(e^{tX_1}\right) = E\left[E\left(e^{tX_1}|X_2\right)\right] = \sum_{x_2=1}^{\infty}\left(\frac{1}{2} + \frac{1}{2}e^t\right)^{x_2}\frac{2}{3}\left(\frac{1}{3}\right)^{x_2-1}$$
$$= \frac{2}{3}\left(\frac{1}{2} + \frac{1}{2}e^t\right)\sum_{x_2=1}^{\infty}\left(\frac{1}{6} + \frac{1}{6}e^t\right)^{x_2-1}$$
$$= \frac{2}{3}\left(\frac{1}{2} + \frac{1}{2}e^t\right)\frac{1}{1 - [(1/6) + (1/6)e^t]}.$$
EXERCISES
2.3.1. Let X1 and X2 have the joint pdf f (x1 , x2 ) = x1 + x2 , 0 < x1 < 1, 0 <
x2 < 1, zero elsewhere. Find the conditional mean and variance of X2 , given
X1 = x1 , 0 < x1 < 1.
2.3.2. Let f_{1|2}(x_1|x_2) = c_1x_1/x_2^2, 0 < x_1 < x_2, 0 < x_2 < 1, zero elsewhere, and f_2(x_2) = c_2x_2^4, 0 < x_2 < 1, zero elsewhere, denote, respectively, the conditional pdf of X_1, given X_2 = x_2, and the marginal pdf of X_2. Determine:
2.3.3. Let f(x_1, x_2) = 21x_1^2x_2^3, 0 < x_1 < x_2 < 1, zero elsewhere, be the joint pdf of X_1 and X_2.
(a) Find the conditional mean and variance of X1 , given X2 = x2 , 0 < x2 < 1.
(c) Determine E(Y ) and Var(Y ) and compare these to E(X1 ) and Var(X1 ), re-
spectively.
2.3.4. Suppose X1 and X2 are random variables of the discrete type that have
the joint pmf p(x1 , x2 ) = (x1 + 2x2 )/18, (x1 , x2 ) = (1, 1), (1, 2), (2, 1), (2, 2), zero
elsewhere. Determine the conditional mean and variance of X2 , given X1 = x1 , for
x1 = 1 or 2. Also, compute E(3X1 − 2X2 ).
2.3.5. Let X1 and X2 be two random variables such that the conditional distribu-
tions and means exist. Show that:
(a) Compute the marginal pdf of X and the conditional pdf of Y , given X = x.
(b) For a fixed X = x, compute E(1 + x + Y |x) and use the result to compute
E(Y |x).
2.3.7. Suppose X1 and X2 are discrete random variables which have the joint pmf
p(x1 , x2 ) = (3x1 + x2 )/24, (x1 , x2 ) = (1, 1), (1, 2), (2, 1), (2, 2), zero elsewhere. Find
the conditional mean E(X2 |x1 ), when x1 = 1.
2.3.8. Let X and Y have the joint pdf f (x, y) = 2 exp{−(x + y)}, 0 < x < y < ∞,
zero elsewhere. Find the conditional mean E(Y |x) of Y , given X = x.
2.3.9. Five cards are drawn at random and without replacement from an ordinary
deck of cards. Let X1 and X2 denote, respectively, the number of spades and the
number of hearts that appear in the five cards.
and p(x1 , x2 ) is equal to zero elsewhere. Find the two marginal probability mass
functions and the two conditional means.
Hint: Write the probabilities in a rectangular array.
2.3.11. Let us choose at random a point from the interval (0, 1) and let the random
variable X1 be equal to the number that corresponds to that point. Then choose
a point at random from the interval (0, x1 ), where x1 is the experimental value of
X1 ; and let the random variable X2 be equal to the number that corresponds to
this point.
(a) Make assumptions about the marginal pdf f1 (x1 ) and the conditional pdf
f2|1 (x2 |x1 ).
Suppose that we have an instance where f2|1 (x2 |x1 ) does not depend upon x1 . Then
the marginal pdf of X2 is, for random variables of the continuous type,
$$f_2(x_2) = \int_{-\infty}^{\infty} f_{2|1}(x_2|x_1) f_1(x_1)\,dx_1 = f_{2|1}(x_2|x_1)\int_{-\infty}^{\infty} f_1(x_1)\,dx_1 = f_{2|1}(x_2|x_1).$$
Accordingly,
$$f_2(x_2) = f_{2|1}(x_2|x_1) \quad \text{and} \quad f(x_1, x_2) = f_1(x_1)f_{2|1}(x_2|x_1) = f_1(x_1)f_2(x_2)$$
when f_{2|1}(x_2|x_1) does not depend upon x_1. That is, if the conditional distribution
of X2 , given X1 = x1 , is independent of any assumption about x1 , then f (x1 , x2 ) =
f1 (x1 )f2 (x2 ).
The same discussion applies to the discrete case too, which we summarize in
parentheses in the following definition.
Definition 2.4.1 (Independence). Let the random variables X1 and X2 have the
joint pdf f (x1 , x2 ) [joint pmf p(x1 , x2 )] and the marginal pdfs [pmfs] f1 (x1 ) [p1 (x1 )]
and f2 (x2 ) [p2 (x2 )], respectively. The random variables X1 and X2 are said to be
independent if, and only if, f (x1 , x2 ) ≡ f1 (x1 )f2 (x2 ) [p(x1 , x2 ) ≡ p1 (x1 )p2 (x2 )].
Random variables that are not independent are said to be dependent.
Remark 2.4.1. Two comments should be made about the preceding definition.
First, the product of two positive functions f1 (x1 )f2 (x2 ) means a function that is
positive on the product space. That is, if f1 (x1 ) and f2 (x2 ) are positive on, and
only on, the respective spaces S1 and S2 , then the product of f1 (x1 ) and f2 (x2 )
is positive on, and only on, the product space S = {(x1 , x2 ) : x1 ∈ S1 , x2 ∈ S2 }.
For instance, if S1 = {x1 : 0 < x1 < 1} and S2 = {x2 : 0 < x2 < 3}, then
S = {(x1 , x2 ) : 0 < x1 < 1, 0 < x2 < 3}. The second remark pertains to the
identity. The identity in Definition 2.4.1 should be interpreted as follows. There
may be certain points (x1 , x2 ) ∈ S at which f (x1 , x2 ) = f1 (x1 )f2 (x2 ). However, if A
is the set of points (x1 , x2 ) at which the equality does not hold, then P (A) = 0. In
subsequent theorems and the subsequent generalizations, a product of nonnegative
functions and an identity should be interpreted in an analogous manner.
Example 2.4.1. Suppose an urn contains 10 blue, 8 red, and 7 yellow balls that
are the same except for color. Suppose 4 balls are drawn without replacement. Let
X and Y be the number of blue and red balls drawn, respectively. The joint pmf of (X, Y) is
$$p(x, y) = \frac{\binom{10}{x}\binom{8}{y}\binom{7}{4-x-y}}{\binom{25}{4}}, \quad 0 \le x, y \le 4;\ x + y \le 4.$$
Since X + Y ≤ 4, it would seem that X and Y are dependent. To see that this is
true by definition, we first find the marginal pmf’s which are:
$$p_X(x) = \frac{\binom{10}{x}\binom{15}{4-x}}{\binom{25}{4}}, \quad 0 \le x \le 4; \qquad p_Y(y) = \frac{\binom{8}{y}\binom{17}{4-y}}{\binom{25}{4}}, \quad 0 \le y \le 4.$$
To show dependence, we need to find only one point in the support of (X, Y) where the joint pmf does not factor into the product of the marginal pmfs. Suppose we select the point x = 1 and y = 1. Then, using R for calculation, we compute (to 4 places):
$$p(1, 1) = \binom{10}{1}\binom{8}{1}\binom{7}{2}\Big/\binom{25}{4} = 0.1328,$$
$$p_X(1) = \binom{10}{1}\binom{15}{3}\Big/\binom{25}{4} = 0.3597,$$
$$p_Y(1) = \binom{8}{1}\binom{17}{3}\Big/\binom{25}{4} = 0.4300.$$
Since 0.1328 ≠ 0.1547 = 0.3597 · 0.4300, X and Y are dependent random variables.
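A sketch of the R calculation referred to in the text:

```r
# Probabilities of Example 2.4.1, rounded to 4 places.
p11 <- choose(10, 1) * choose(8, 1) * choose(7, 2) / choose(25, 4)   # 0.1328
pX1 <- choose(10, 1) * choose(15, 3) / choose(25, 4)                 # 0.3597
pY1 <- choose(8, 1)  * choose(17, 3) / choose(25, 4)                 # 0.4300
round(c(p11, pX1, pY1, pX1 * pY1), 4)
```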
Example 2.4.2. Let the joint pdf of X1 and X2 be f(x1, x2) = x1 + x2, 0 < x1 < 1, 0 < x2 < 1, zero elsewhere. We show that X1 and X2 are dependent. Here the marginal probability density functions are
$$f_1(x_1) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 = \int_0^1 (x_1 + x_2)\,dx_2 = x_1 + \frac{1}{2} & 0 < x_1 < 1 \\[1ex] 0 & \text{elsewhere,} \end{cases}$$
and
$$f_2(x_2) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 = \int_0^1 (x_1 + x_2)\,dx_1 = \frac{1}{2} + x_2 & 0 < x_2 < 1 \\[1ex] 0 & \text{elsewhere.} \end{cases}$$
Since f(x1, x2) ≢ f1(x1)f2(x2), the random variables X1 and X2 are dependent.
The following theorem makes it possible to assert that the random variables X1
and X2 of Example 2.4.2 are dependent, without computing the marginal probability
density functions.
Theorem 2.4.1. Let the random variables X1 and X2 have supports S1 and S2 ,
respectively, and have the joint pdf f(x1, x2). Then X1 and X2 are independent if and only if f(x1, x2) can be written as
$$f(x_1, x_2) \equiv g(x_1)h(x_2),$$
where g(x1) > 0, x1 ∈ S1, zero elsewhere, and h(x2) > 0, x2 ∈ S2, zero elsewhere.
Proof. If X1 and X2 are independent, then f (x1 , x2 ) ≡ f1 (x1 )f2 (x2 ), where f1 (x1 )
and f2 (x2 ) are the marginal probability density functions of X1 and X2 , respectively.
Thus the condition f (x1 , x2 ) ≡ g(x1 )h(x2 ) is fulfilled.
Conversely, if f (x1 , x2 ) ≡ g(x1 )h(x2 ), then, for random variables of the contin-
uous type, we have
$$f_1(x_1) = \int_{-\infty}^{\infty} g(x_1)h(x_2)\,dx_2 = g(x_1)\int_{-\infty}^{\infty} h(x_2)\,dx_2 = c_1g(x_1)$$
and
$$f_2(x_2) = \int_{-\infty}^{\infty} g(x_1)h(x_2)\,dx_1 = h(x_2)\int_{-\infty}^{\infty} g(x_1)\,dx_1 = c_2h(x_2),$$
where c1 and c2 are constants, not functions of x1 or x2. Moreover, c1c2 = 1 because
$$1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1)h(x_2)\,dx_1\,dx_2 = \left[\int_{-\infty}^{\infty} g(x_1)\,dx_1\right]\left[\int_{-\infty}^{\infty} h(x_2)\,dx_2\right] = c_2c_1.$$
These results imply that f(x1, x2) ≡ g(x1)h(x2) ≡ c1g(x1)c2h(x2) ≡ f1(x1)f2(x2). Accordingly, X1 and X2 are independent.
This theorem is true for the discrete case also. Simply replace the joint pdf by
the joint pmf. For instance, the discrete random variables X and Y of Example
2.4.1 are immediately seen to be dependent because the support of (X, Y ) is not a
product space.
Next, consider the joint distribution of the continuous random vector (X, Y )
given in Example 2.1.3. The joint pdf is
$$f(x, y) = 4xe^{-x^2}\,ye^{-y^2}, \quad x > 0,\ y > 0.$$
Because the support is a product space and the pdf is the product of a positive function of x alone and a positive function of y alone, Theorem 2.4.1 shows that X and Y are independent random variables.
Instead of working with pdfs (or pmfs) we could have presented independence
in terms of cumulative distribution functions. The following theorem shows the
equivalence.
Theorem 2.4.2. Let (X1 , X2 ) have the joint cdf F (x1 , x2 ) and let X1 and X2 have
the marginal cdfs F1 (x1 ) and F2 (x2 ), respectively. Then X1 and X2 are independent
if and only if
$$F(x_1, x_2) = F_1(x_1)F_2(x_2) \quad \text{for all } (x_1, x_2) \in R^2. \qquad (2.4.1)$$
Proof: We give the proof for the continuous case. Suppose expression (2.4.1) holds.
Then the mixed second partial is
$$\frac{\partial^2}{\partial x_1\,\partial x_2} F(x_1, x_2) = f_1(x_1)f_2(x_2).$$
Hence, X1 and X2 are independent. Conversely, suppose X1 and X2 are indepen-
dent. Then by the definition of the joint cdf,
$$F(x_1, x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f_1(w_1)f_2(w_2)\,dw_2\,dw_1 = \int_{-\infty}^{x_1} f_1(w_1)\,dw_1 \cdot \int_{-\infty}^{x_2} f_2(w_2)\,dw_2 = F_1(x_1)F_2(x_2).$$
Theorem 2.4.3. The random variables X1 and X2 are independent random vari-
ables if and only if the following condition holds:
$$P(a < X_1 \le b,\ c < X_2 \le d) = P(a < X_1 \le b)P(c < X_2 \le d) \qquad (2.4.2)$$
for every a < b and c < d, where a, b, c, and d are constants.
Proof: If X1 and X2 are independent, then an application of the last theorem and
expression (2.1.2) shows that
$$P(a < X_1 \le b,\ c < X_2 \le d) = F(b, d) - F(a, d) - F(b, c) + F(a, c)$$
$$= F_1(b)F_2(d) - F_1(a)F_2(d) - F_1(b)F_2(c) + F_1(a)F_2(c) = [F_1(b) - F_1(a)][F_2(d) - F_2(c)],$$
which is the right side of expression (2.4.2). Conversely, condition (2.4.2) implies
that the joint cdf of (X1 , X2 ) factors into a product of the marginal cdfs, which in
turn by Theorem 2.4.2 implies that X1 and X2 are independent.
For the dependent random variables X1 and X2 of Example 2.4.2, for instance,
$$P(0 < X_1 < \tfrac{1}{2},\ 0 < X_2 < \tfrac{1}{2}) = \int_0^{1/2}\int_0^{1/2} (x_1 + x_2)\,dx_1\,dx_2 = \tfrac{1}{8},$$
whereas
$$P(0 < X_1 < \tfrac{1}{2}) = \int_0^{1/2}\left(x_1 + \tfrac{1}{2}\right)dx_1 = \tfrac{3}{8}$$
and
$$P(0 < X_2 < \tfrac{1}{2}) = \int_0^{1/2}\left(\tfrac{1}{2} + x_2\right)dx_2 = \tfrac{3}{8}.$$
Hence, condition (2.4.2) does not hold.
Not merely are calculations of some probabilities usually simpler when we have
independent random variables, but many expectations, including certain moment
generating functions, have comparably simpler computations. The following result
proves so useful that we state it in the form of a theorem.
Theorem 2.4.4. Suppose X1 and X2 are independent and that E(u(X1 )) and
E(v(X2)) exist. Then
$$E[u(X_1)v(X_2)] = E[u(X_1)]E[v(X_2)].$$
Proof. We give the proof in the continuous case. The independence of X1 and X2
implies that the joint pdf of X1 and X2 is f1 (x1 )f2 (x2 ). Thus we have, by definition
of expectation,
$$E[u(X_1)v(X_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} u(x_1)v(x_2) f_1(x_1)f_2(x_2)\,dx_1\,dx_2$$
$$= \left[\int_{-\infty}^{\infty} u(x_1)f_1(x_1)\,dx_1\right]\left[\int_{-\infty}^{\infty} v(x_2)f_2(x_2)\,dx_2\right] = E[u(X_1)]E[v(X_2)].$$
Upon taking the functions u(·) and v(·) to be the identity functions in Theorem
2.4.4, we have that for independent random variables X1 and X2,
$$E(X_1X_2) = E(X_1)E(X_2). \qquad (2.4.3)$$
We next prove a very useful theorem about independent random variables. The
proof of the theorem relies heavily upon our assertion that an mgf, when it exists,
is unique and that it uniquely determines the distribution of probability.
Theorem 2.4.5. Suppose the joint mgf, M (t1 , t2 ), exists for the random variables
X1 and X2. Then X1 and X2 are independent if and only if
$$M(t_1, t_2) = M(t_1, 0)M(0, t_2);$$
that is, the joint mgf is identically equal to the product of the marginal mgfs.
Proof: If X1 and X2 are independent, then
$$M(t_1, t_2) = E\left(e^{t_1X_1 + t_2X_2}\right) = E\left(e^{t_1X_1}\right)E\left(e^{t_2X_2}\right) = M(t_1, 0)M(0, t_2).$$
Thus the independence of X1 and X2 implies that the mgf of the joint distribution
factors into the product of the moment-generating functions of the two marginal
distributions.
Suppose next that the mgf of the joint distribution of X1 and X2 is given by
M (t1 , t2 ) = M (t1 , 0)M (0, t2 ). Now X1 has the unique mgf, which, in the continuous
case, is given by
$$M(t_1, 0) = \int_{-\infty}^{\infty} e^{t_1x_1} f_1(x_1)\,dx_1.$$
Thus we have
$$M(t_1, 0)M(0, t_2) = \left[\int_{-\infty}^{\infty} e^{t_1x_1}f_1(x_1)\,dx_1\right]\left[\int_{-\infty}^{\infty} e^{t_2x_2}f_2(x_2)\,dx_2\right]$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1+t_2x_2} f_1(x_1)f_2(x_2)\,dx_1\,dx_2.$$
The uniqueness of the mgf implies that the two distributions of probability that are
described by f1(x1)f2(x2) and f(x1, x2) are the same. Thus
$$f(x_1, x_2) \equiv f_1(x_1)f_2(x_2).$$
That is, if M (t1 , t2 ) = M (t1 , 0)M (0, t2 ), then X1 and X2 are independent. This
completes the proof when the random variables are of the continuous type. With
random variables of the discrete type, the proof is made by using summation instead
of integration.
Recall from Example 2.1.10 that the joint mgf of X and Y is
$$M(t_1, t_2) = \frac{1}{(1 - t_1 - t_2)(1 - t_2)},$$
provided that t1 + t2 < 1 and t2 < 1. Because M(t1, t2) ≢ M(t1, 0)M(0, t2), the random variables are dependent.
Example 2.4.6 (Exercise 2.1.15, Continued). For the random variable X1 and X2
defined in Exercise 2.1.15, we showed that the joint mgf is
$$M(t_1, t_2) = \frac{\exp\{t_1\}\exp\{t_2\}}{(2 - \exp\{t_1\})(2 - \exp\{t_2\})}, \quad t_i < \log 2,\ i = 1, 2.$$
We showed further that M (t1 , t2 ) = M (t1 , 0)M (0, t2 ). Hence, X1 and X2 are inde-
pendent random variables.
EXERCISES
2.4.1. Show that the random variables X1 and X2 with joint pdf
"
12x1 x2 (1 − x2 ) 0 < x1 < 1, 0 < x2 < 1
f (x1 , x2 ) =
0 elsewhere
are independent.
2.4.2. If the random variables X1 and X2 have the joint pdf f (x1 , x2 ) = 2e−x1 −x2 , 0 <
x1 < x2 , 0 < x2 < ∞, zero elsewhere, show that X1 and X2 are dependent.
2.4.3. Let p(x1, x2) = 1/16, x1 = 1, 2, 3, 4, and x2 = 1, 2, 3, 4, zero elsewhere, be the
joint pmf of X1 and X2 . Show that X1 and X2 are independent.
2.4.4. Find P(0 < X1 < 1/3, 0 < X2 < 1/3) if the random variables X1 and X2 have
the joint pdf f (x1 , x2 ) = 4x1 (1 − x2 ), 0 < x1 < 1, 0 < x2 < 1, zero elsewhere.
2.4.5. Find the probability of the union of the events a < X1 < b, −∞ < X2 < ∞,
and −∞ < X1 < ∞, c < X2 < d if X1 and X2 are two independent variables with
P(a < X1 < b) = 2/3 and P(c < X2 < d) = 5/8.
2.4.6. If f (x1 , x2 ) = e−x1 −x2 , 0 < x1 < ∞, 0 < x2 < ∞, zero elsewhere, is the
joint pdf of the random variables X1 and X2 , show that X1 and X2 are independent
and that M (t1 , t2 ) = (1 − t1 )−1 (1 − t2 )−1 , t2 < 1, t1 < 1. Also show that
E(et(X1 +X2 ) ) = (1 − t)−2 , t < 1.
Accordingly, find the mean and the variance of Y = X1 + X2 .
2.4.7. Let the random variables X1 and X2 have the joint pdf f (x1 , x2 ) = 1/π, for
(x1 − 1)2 + (x2 + 2)2 < 1, zero elsewhere. Find f1 (x1 ) and f2 (x2 ). Are X1 and X2
independent?
2.4.8. Let X and Y have the joint pdf f (x, y) = 3x, 0 < y < x < 1, zero elsewhere.
Are X and Y independent? If not, find E(X|y).
2.4.9. Suppose that a man leaves for work between 8:00 a.m. and 8:30 a.m. and
takes between 40 and 50 minutes to get to the office. Let X denote the time of
departure and let Y denote the time of travel. If we assume that these random
variables are independent and uniformly distributed, find the probability that he
arrives at the office before 9:00 a.m.
2.4.10. Let X and Y be random variables with the space consisting of the four
points (0, 0), (1, 1), (1, 0), (1, −1). Assign positive probabilities to these four points
so that the correlation coefficient is equal to zero. Are X and Y independent?
2.4.11. Two line segments, each of length two units, are placed along the x-axis.
The midpoint of the first is between x = 0 and x = 14 and that of the second is
between x = 6 and x = 20. Assuming independence and uniform distributions for
these midpoints, find the probability that the line segments overlap.
2.4.12. Cast a fair die and let X = 0 if 1, 2, or 3 spots appear, let X = 1 if 4 or 5
spots appear, and let X = 2 if 6 spots appear. Do this two independent times,
obtaining X1 and X2 . Calculate P (|X1 − X2 | = 1).
2.4.13. For X1 and X2 in Example 2.4.6, show that the mgf of Y = X1 + X2 is
e2t /(2 − et )2 , t < log 2, and then compute the mean and variance of Y .
$$\mathrm{cov}(X, Y) = E(XY - \mu_2X - \mu_1Y + \mu_1\mu_2) = E(XY) - \mu_2E(X) - \mu_1E(Y) + \mu_1\mu_2 = E(XY) - \mu_1\mu_2. \qquad (2.5.2)$$
It should be noted that the expected value of the product of two random variables
is equal to the product of their expectations plus their covariance; that is, E(XY ) =
μ1 μ2 + cov(X, Y ) = μ1 μ2 + ρσ1 σ2 .
As illustrations, we present two examples. The first is for a discrete model while
the second concerns a continuous model.
Example 2.5.1. Reconsider the random vector (X1 , X2 ) of Example 2.1.1 where a
fair coin is flipped three times and X1 is the number of heads on the first two flips
while X2 is the number of heads on all three flips. Recall that Table 2.1.1 contains
the marginal distributions of X1 and X2 . By symmetry of these pmfs, we have
E(X1 ) = 1 and E(X2 ) = 3/2. To compute the correlation coefficient of (X1 , X2 ),
we next sketch the computation of the required moments:
$$E(X_1^2) = \frac{1}{2} + 2^2\cdot\frac{1}{4} = \frac{3}{2} \;\Rightarrow\; \sigma_1^2 = \frac{3}{2} - 1^2 = \frac{1}{2};$$
$$E(X_2^2) = \frac{3}{8} + 4\cdot\frac{3}{8} + 9\cdot\frac{1}{8} = 3 \;\Rightarrow\; \sigma_2^2 = 3 - \left(\frac{3}{2}\right)^2 = \frac{3}{4};$$
$$E(X_1X_2) = \frac{2}{8} + 1\cdot2\cdot\frac{2}{8} + 2\cdot2\cdot\frac{1}{8} + 2\cdot3\cdot\frac{1}{8} = 2 \;\Rightarrow\; \mathrm{cov}(X_1, X_2) = 2 - 1\cdot\frac{3}{2} = \frac{1}{2}.$$
From this it follows that $\rho = (1/2)/\sqrt{(1/2)(3/4)} = 0.816$.
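These moment computations are easy to verify directly from the joint pmf; the R sketch below (our own code) does so.

```r
# Correlation coefficient of (X1, X2) in Example 2.5.1 from the joint pmf.
pts <- rbind(c(0, 0), c(0, 1), c(1, 1), c(1, 2), c(2, 2), c(2, 3))
p   <- c(1, 1, 2, 2, 1, 1) / 8
x1 <- pts[, 1]; x2 <- pts[, 2]
covar <- sum(x1 * x2 * p) - sum(x1 * p) * sum(x2 * p)
covar / sqrt((sum(x1^2 * p) - sum(x1 * p)^2) * (sum(x2^2 * p) - sum(x2 * p)^2))  # 0.816
```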
Example 2.5.2. Let the random variables X and Y have the joint pdf
$$f(x, y) = \begin{cases} x + y & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere.} \end{cases}$$
Then
$$\mu_1 = E(X) = \int_0^1\int_0^1 x(x + y)\,dx\,dy = \frac{7}{12}$$
and
$$\sigma_1^2 = E(X^2) - \mu_1^2 = \int_0^1\int_0^1 x^2(x + y)\,dx\,dy - \left(\frac{7}{12}\right)^2 = \frac{11}{144}.$$
Similarly,
$$\mu_2 = E(Y) = \frac{7}{12} \quad \text{and} \quad \sigma_2^2 = E(Y^2) - \mu_2^2 = \frac{11}{144}.$$
The covariance of X and Y is
$$E(XY) - \mu_1\mu_2 = \int_0^1\int_0^1 xy(x + y)\,dx\,dy - \left(\frac{7}{12}\right)^2 = -\frac{1}{144}.$$
Accordingly, the correlation coefficient of X and Y is
$$\rho = \frac{-\frac{1}{144}}{\sqrt{\left(\frac{11}{144}\right)\left(\frac{11}{144}\right)}} = -\frac{1}{11}.$$
For the proof of Theorem 2.5.1 (that −1 ≤ ρ ≤ 1), consider the function h(v) = E{[(X − μ1) + v(Y − μ2)]²}. Then h(v) ≥ 0, for all v. Hence, the discriminant of h(v) is less than or equal to 0.
To obtain the discriminant, we expand h(v) as
$$h(v) = \sigma_1^2 + 2v\rho\sigma_1\sigma_2 + v^2\sigma_2^2.$$
Hence, the discriminant of h(v) is $4\rho^2\sigma_1^2\sigma_2^2 - 4\sigma_2^2\sigma_1^2$. Since this is less than or equal to 0, we have
$$4\rho^2\sigma_1^2\sigma_2^2 \le 4\sigma_2^2\sigma_1^2 \quad \text{or} \quad \rho^2 \le 1,$$
which is the result sought.
Theorem 2.5.2. If X and Y are independent random variables then cov(X, Y ) = 0
and, hence, ρ = 0.
Proof: Because X and Y are independent, it follows from expression (2.4.3) that
E(XY ) = E(X)E(Y ). Hence, by (2.5.2) the covariance of X and Y is 0; i.e., ρ = 0.
As the following example shows, the converse of this theorem is not true:
Example 2.5.3. Let X and Y be jointly discrete random variables whose distri-
bution has mass 1/4 at each of the four points (−1, 0), (0, −1), (1, 0) and (0, 1). It
follows that both X and Y have the same marginal distribution with range {−1, 0, 1}
and respective probabilities 1/4, 1/2, and 1/4. Hence, μ1 = μ2 = 0 and a quick cal-
culation shows that E(XY ) = 0. Thus, ρ = 0. However, P (X = 0, Y = 0) = 0
while P (X = 0)P (Y = 0) = (1/2)(1/2) = 1/4. Thus, X and Y are dependent but
the correlation coefficient of X and Y is 0.
Although the converse of Theorem 2.5.2 is not true, the contrapositive is; i.e.,
if ρ ≠ 0 then X and Y are dependent. For instance, in Example 2.5.1, since ρ = 0.816 ≠ 0, we know that the random variables X1 and X2 discussed in this example
are dependent. As discussed in Section 10.8, this contrapositive is often used in
Statistics.
Exercise 2.5.7 points out that in the proof of Theorem 2.5.1, the discriminant
of the polynomial h(v) is 0 if and only if ρ = ±1. In that case X and Y are linear
functions of one another with probability one; although, as shown, the relationship is
degenerate. This suggests the following interesting question: When ρ does not have
one of its extreme values, is there a line in the xy-plane such that the probability
for X and Y tends to be concentrated in a band about this line? Under certain
restrictive conditions this is, in fact, the case, and under those conditions we can
look upon ρ as a measure of the intensity of the concentration of the probability for
X and Y about that line.
We summarize these thoughts in the next theorem. For notation, let f (x, y)
denote the joint pdf of two random variables X and Y and let f1 (x) denote the
marginal pdf of X. Recall from Section 2.3 that the conditional pdf of Y , given
X = x, is
f (x, y)
f2|1 (y|x) =
f1 (x)
at points where f1 (x) > 0, and the conditional mean of Y , given X = x, is given by
∞
∞ yf (x, y) dy
−∞
E(Y |x) = yf2|1 (y|x) dy = ,
−∞ f1 (x)
when dealing with random variables of the continuous type. This conditional mean
of Y , given X = x, is, of course, a function of x, say u(x). In a like vein, the
conditional mean of X, given Y = y, is a function of y, say v(y).
In case u(x) is a linear function of x, say u(x) = a + bx, we say the conditional
mean of Y is linear in x; or that Y has a linear conditional mean. When u(x) =
a + bx, the constants a and b have simple values which we show in the following
theorem.
Theorem 2.5.3. Suppose (X, Y ) have a joint distribution with the variances of X
and Y finite and positive. Denote the means and variances of X and Y by μ1 , μ2
and σ12 , σ22 , respectively, and let ρ be the correlation coefficient between X and Y . If
E(Y |X) is linear in X then
$$E(Y|X) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(X - \mu_1) \qquad (2.5.4)$$
and
$$E[\mathrm{Var}(Y|X)] = \sigma_2^2(1 - \rho^2). \qquad (2.5.5)$$
Proof: The proof is given in the continuous case. The discrete case follows similarly
Note that if the variance, Equation (2.5.9), is denoted by k(x), then E[k(X)] =
σ22 (1 − ρ2 ) ≥ 0. Accordingly, ρ2 ≤ 1, or −1 ≤ ρ ≤ 1. This verifies Theorem 2.5.1
for the special case of linear conditional means.
As a corollary to Theorem 2.5.3, suppose that the variance, Equation (2.5.9), is
positive but not a function of x; that is, the variance is a constant k > 0. Now if k
is multiplied by f1 (x) and integrated on x, the result is k, so that k = σ22 (1 − ρ2 ).
Thus, in this case, the variance of each conditional distribution of Y , given X = x, is
σ22 (1 − ρ2 ). If ρ = 0, the variance of each conditional distribution of Y , given X = x,
is σ22 , the variance of the marginal distribution of Y . On the other hand, if ρ2 is near
1, the variance of each conditional distribution of Y , given X = x, is relatively small,
and there is a high concentration of the probability for this conditional distribution
near the mean E(Y |x) = μ2 + ρ(σ2 /σ1 )(x − μ1 ). Similar comments can be made
about E(X|y) if it is linear. In particular, E(X|y) = μ1 + ρ(σ1 /σ2 )(y − μ2 ) and
E[Var(X|Y )] = σ12 (1 − ρ2 ).
Example 2.5.4. Let the random variables X and Y have the linear conditional means E(Y|x) = 4x + 3 and E(X|y) = (1/16)y − 3. In accordance with the general formulas for the linear conditional means, we see that E(Y|x) = μ2 if x = μ1 and E(X|y) = μ1 if y = μ2. Accordingly, in this special case, we have μ2 = 4μ1 + 3 and μ1 = (1/16)μ2 − 3, so that μ1 = −15/4 and μ2 = −12. The general formulas for the linear conditional means also show that the product of the coefficients of x and y, respectively, is equal to ρ² and that the quotient of these coefficients is equal to σ2²/σ1². Here ρ² = 4(1/16) = 1/4 with ρ = 1/2 (not −1/2), and σ2²/σ1² = 64. Thus, from the two linear conditional means, we are able to find the values of μ1, μ2, ρ, and σ2/σ1, but not the values of σ1 and σ2.
Figure 2.5.1: The band −a + bx < y < a + bx, −h < x < h, over which (X, Y) of Example 2.5.5 has a uniform distribution; the line E(Y|x) = bx runs through the middle of the band.
Example 2.5.5. To illustrate how the correlation coefficient measures the intensity
of the concentration of the probability for X and Y about a line, let these random
variables have a distribution that is uniform over the area depicted in Figure 2.5.1.
That is, the joint pdf of X and Y is
$$f(x, y) = \begin{cases} \dfrac{1}{4ah} & -a + bx < y < a + bx,\ -h < x < h \\[1ex] 0 & \text{elsewhere.} \end{cases}$$
We assume here that b ≥ 0, but the argument can be modified for b ≤ 0. It is easy
to show that the pdf of X is uniform, namely
$$f_1(x) = \begin{cases} \displaystyle\int_{-a+bx}^{a+bx}\frac{1}{4ah}\,dy = \frac{1}{2h} & -h < x < h \\[2ex] 0 & \text{elsewhere.} \end{cases}$$
If the joint mgf M(t1, t2) of (X, Y) exists, moments of (X, Y) can be obtained by differentiation, so that
$$\left.\frac{\partial^{k+m} M(t_1, t_2)}{\partial t_1^k\,\partial t_2^m}\right|_{t_1=t_2=0} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^ky^m f(x, y)\,dx\,dy = E(X^kY^m).$$
$$\mu_1 = E(X) = \frac{\partial M(0, 0)}{\partial t_1}, \qquad \mu_2 = E(Y) = \frac{\partial M(0, 0)}{\partial t_2},$$
$$\sigma_1^2 = E(X^2) - \mu_1^2 = \frac{\partial^2 M(0, 0)}{\partial t_1^2} - \mu_1^2, \qquad \sigma_2^2 = E(Y^2) - \mu_2^2 = \frac{\partial^2 M(0, 0)}{\partial t_2^2} - \mu_2^2,$$
$$E[(X - \mu_1)(Y - \mu_2)] = \frac{\partial^2 M(0, 0)}{\partial t_1\,\partial t_2} - \mu_1\mu_2, \qquad (2.5.10)$$
and from these we can compute the correlation coefficient ρ.
It is fairly obvious that the results of equations (2.5.10) hold if X and Y are
random variables of the discrete type. Thus the correlation coefficients may be com-
puted by using the mgf of the joint distribution if that function is readily available.
An illustrative example follows.
Example 2.5.6 (Example 2.1.10, Continued). In Example 2.1.10, we considered
the joint density
$$f(x, y) = \begin{cases} e^{-y} & 0 < x < y < \infty \\ 0 & \text{elsewhere,} \end{cases}$$
and showed that the mgf was
$$M(t_1, t_2) = \frac{1}{(1 - t_1 - t_2)(1 - t_2)},$$
for t1 + t2 < 1 and t2 < 1. For this distribution, equations (2.5.10) become
$$\mu_1 = 1, \quad \mu_2 = 2, \quad \sigma_1^2 = 1, \quad \sigma_2^2 = 2, \quad E[(X - \mu_1)(Y - \mu_2)] = 1. \qquad (2.5.11)$$
EXERCISES
2.5.1. Let the random variables X and Y have the joint pmf
(a) p(x, y) = 1/3, (x, y) = (0, 0), (1, 1), (2, 2), zero elsewhere.
(b) p(x, y) = 1/3, (x, y) = (0, 2), (1, 1), (2, 0), zero elsewhere.
(c) p(x, y) = 1/3, (x, y) = (0, 0), (1, 1), (2, 0), zero elsewhere.
(b) Compute E(Y |X = 1), E(Y |X = 2), and the line μ2 + ρ(σ2 /σ1 )(x − μ1 ). Do
the points [k, E(Y |X = k)], k = 1, 2, lie on this line?
2.5.3. Let f (x, y) = 2, 0 < x < y, 0 < y < 1, zero elsewhere, be the joint pdf of
X and Y . Show that the conditional means are, respectively, (1 + x)/2, 0 < x < 1,
and y/2, 0 < y < 1. Show that the correlation coefficient of X and Y is ρ = 1/2.
2.5.9. Let X and Y have the joint pmf p(x, y) = 1/7, (x, y) = (0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (2, 2), zero elsewhere. Find the correlation coefficient ρ.
2.5.10. Let X1 and X2 have the joint pmf described by the following table:
or as
$$F_{\mathbf{X}}(\mathbf{x}) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2}\cdots\int_{-\infty}^{x_n} f(w_1, \ldots, w_n)\,dw_n\cdots dw_1.$$
A point function f essentially satisfies the conditions of being a joint pdf if (a) f is defined and is nonnegative for all real values of its argument(s) and (b) its integral over all real values of its argument(s) is 1. Likewise, a point
function p essentially satisfies the conditions of being a joint pmf if (a) p is defined
and is nonnegative for all real values of its argument(s) and (b) its sum over all real
values of its argument(s) is 1. As in previous sections, it is sometimes convenient
to speak of the support set of a random vector. For the discrete case, this would be
all points in D that have positive mass, while for the continuous case these would
be all points in D that can be embedded in an open set of positive probability. We
use S to denote support sets.
Example 2.6.1. Let
"
e−(x+y+z) 0 < x, y, z < ∞
f (x, y, z) =
0 elsewhere
be the pdf of the random variables X, Y , and Z. Then the distribution function of
X, Y , and Z is given by
F(x, y, z) = P(X ≤ x, Y ≤ y, Z ≤ z)
           = ∫_0^z ∫_0^y ∫_0^x e^{−u−v−w} du dv dw
           = (1 − e^{−x})(1 − e^{−y})(1 − e^{−z}),   0 ≤ x, y, z < ∞,
and is equal to zero elsewhere. The relationship (2.6.2) can easily be verified.
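As a numerical sanity check of this cdf (an illustration only, assuming the numpy and scipy packages), the triple integral can be compared with the closed form at a few points:

```python
import numpy as np
from scipy import integrate

def F_closed(x, y, z):
    # closed form of the cdf from Example 2.6.1
    return (1 - np.exp(-x)) * (1 - np.exp(-y)) * (1 - np.exp(-z))

def F_numeric(x, y, z):
    # triple integral of the joint pdf e^{-(u+v+w)} over (0,x) x (0,y) x (0,z)
    val, _ = integrate.tplquad(
        lambda w, v, u: np.exp(-(u + v + w)),
        0, x,                            # outer variable u
        lambda u: 0, lambda u: y,        # middle variable v
        lambda u, v: 0, lambda u, v: z)  # inner variable w
    return val

for pt in [(0.5, 1.0, 2.0), (1.0, 1.0, 1.0), (3.0, 0.2, 1.5)]:
    print(pt, round(F_numeric(*pt), 6), round(F_closed(*pt), 6))
```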
Let (X1 , X2 , . . . , Xn ) be a random vector and let Y = u(X1 , X2 , . . . , Xn ) for
some function u. As in the bivariate case, the expected value of the random variable
exists if the n-fold integral
∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} |u(x1, x2, . . . , xn)| f(x1, x2, . . . , xn) dx1 dx2 ··· dxn
exists when the random variables are of the continuous type, or if the n-fold sum
Σ_{xn} ··· Σ_{x1} |u(x1, x2, . . . , xn)| p(x1, x2, . . . , xn)
exists when the random variables are of the discrete type. If the expected value of
Y exists, then its expectation is given by
E(Y) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} u(x1, x2, . . . , xn) f(x1, x2, . . . , xn) dx1 dx2 ··· dxn   (2.6.3)

for the continuous case, with the corresponding n-fold sum over the support replacing the integral in the discrete case. The properties of expectation discussed in Section 2.1 hold for the n-dimensional case also. In particular, E is a linear operator; that is, if Y1 and Y2 are functions of the random variables and k1 and k2 are constants, then E(k1Y1 + k2Y2) = k1E(Y1) + k2E(Y2).
Therefore, f1 (x1 ) is the pdf of the random variable X1 and f1 (x1 ) is called the
marginal pdf of X1 . The marginal probability density functions f2 (x2 ), . . . , fn (xn )
of X2 , . . . , Xn , respectively, are similar (n − 1)-fold integrals.
Up to this point, each marginal pdf has been a pdf of one random variable.
It is convenient to extend this terminology to joint probability density functions,
which we do now. Let f (x1 , x2 , . . . , xn ) be the joint pdf of the n random variables
X1 , X2 , . . . , Xn , just as before. Now, however, take any group of k < n of these
random variables and find the joint pdf of them. This joint pdf is called the marginal
pdf of this particular group of k variables. To fix the ideas, take n = 6, k = 3, and
let us select the group X2 , X4 , X5 . Then the marginal pdf of X2 , X4 , X5 is the joint
pdf of this particular group of three variables, namely,
∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2, x3, x4, x5, x6) dx1 dx3 dx6,

which is a function of x2, x4, and x5 alone.
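As a small symbolic illustration of such a marginal (a sketch assuming the sympy library; the factorized six-variate pdf below is chosen only for simplicity and is not taken from the text), integrating out x1, x3, and x6:

```python
import sympy as sp

x = sp.symbols('x1:7', positive=True)   # x1, ..., x6
f = sp.exp(-sum(x))                     # illustrative joint pdf on (0, oo)^6

# marginal pdf of (X2, X4, X5): integrate out x1, x3, x6
marginal = sp.integrate(f, (x[0], 0, sp.oo), (x[2], 0, sp.oo), (x[5], 0, sp.oo))
print(sp.simplify(marginal))            # exp(-x2 - x4 - x5)
```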
If f1(x1) > 0, we define the symbol f2,...,n|1(x2, . . . , xn | x1) by the relation

f2,...,n|1(x2, . . . , xn | x1) = f(x1, x2, . . . , xn) / f1(x1),
and f2,...,n|1 (x2 , . . . , xn |x1 ) is called the joint conditional pdf of X2 , . . . , Xn ,
given X1 = x1 . The joint conditional pdf of any n − 1 random variables, say
X1 , . . . , Xi−1 , Xi+1 , . . . , Xn , given Xi = xi , is defined as the joint pdf of X1 , . . . , Xn
divided by the marginal pdf fi (xi ), provided that fi (xi ) > 0. More generally, the
joint conditional pdf of n−k of the random variables, for given values of the remain-
ing k variables, is defined as the joint pdf of the n variables divided by the marginal
2.6. Extension to Several Random Variables 137
pdf of the particular group of k variables, provided that the latter pdf is positive.
We remark that there are many other conditional probability density functions; for
instance, see Exercise 2.3.12.
Because a conditional pdf is the pdf of a certain number of random variables,
the expectation of a function of these random variables has been defined. To em-
phasize the fact that a conditional pdf is under consideration, such expectations
are called conditional expectations. For instance, the conditional expectation of
u(X2 , . . . , Xn ), given X1 = x1 , is, for random variables of the continuous type,
given by
E[u(X2, . . . , Xn) | x1] = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} u(x2, . . . , xn) f2,...,n|1(x2, . . . , xn | x1) dx2 ··· dxn,
provided f1(x1) > 0 and the integral converges (absolutely). A useful random variable is given by h(X1) = E[u(X2, . . . , Xn) | X1].
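As an illustration of such a conditional expectation (a sketch assuming the sympy library; the trivariate pdf below is our own choice, not one from the text), take f(x1, x2, x3) = 6 on 0 < x1 < x2 < x3 < 1 and compute E(X3 | x1):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
f = sp.Integer(6)                               # joint pdf on 0 < x1 < x2 < x3 < 1

# marginal pdf of X1: integrate out x2 and x3 over x1 < x2 < x3 < 1
f1 = sp.integrate(f, (x3, x2, 1), (x2, x1, 1))  # equals 3*(1 - x1)**2

# joint conditional pdf of (X2, X3) given X1 = x1, and then E(X3 | x1)
f23_given_1 = f / f1
E_X3 = sp.integrate(x3 * f23_given_1, (x3, x2, 1), (x2, x1, 1))
print(sp.simplify(E_X3))                        # simplifies to (x1 + 2)/3
```

Since this f is the joint pdf of the order statistics of three independent uniform(0, 1) variables, the answer (x1 + 2)/3 matches the intuition that, given the smallest value x1, the largest of the three is on average two-thirds of the way from x1 to 1.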
The above discussion of marginal and conditional distributions generalizes to
random variables of the discrete type by using pmfs and summations instead of
integrals.
Let the random variables X1 , X2 , . . . , Xn have the joint pdf f (x1 , x2 , . . . , xn ) and
the marginal probability density functions f1 (x1 ), f2 (x2 ), . . . , fn (xn ), respectively.
The definition of the independence of X1 and X2 is generalized to the mutual
independence of X1 , X2 , . . . , Xn as follows: The random variables X1 , X2 , . . . , Xn
are said to be mutually independent if and only if

f(x1, x2, . . . , xn) ≡ f1(x1) f2(x2) · · · fn(xn)

for the continuous case. In the discrete case, X1, X2, . . . , Xn are said to be mutually independent if and only if

p(x1, x2, . . . , xn) ≡ p1(x1) p2(x2) · · · pn(xn).
Theorem 2.6.1. Suppose X1, X2, . . . , Xn are independent random variables and suppose Xi has mgf Mi(t) for −hi < t < hi, where hi > 0, for i = 1, . . . , n. Let T = Σ_{i=1}^n ki Xi, where k1, . . . , kn are constants. Then T has the mgf

MT(t) = ∏_{i=1}^n Mi(ki t),   −min_i{hi} < t < min_i{hi}.

Proof. Assume t is in the interval (−min_i{hi}, min_i{hi}). Then, by independence,

MT(t) = E[e^{t Σ_{i=1}^n ki Xi}] = E[ ∏_{i=1}^n e^{(t ki)Xi} ] = ∏_{i=1}^n E[e^{t ki Xi}] = ∏_{i=1}^n Mi(ki t),

which establishes the result.
P(Y ≤ 1/2) = P(X1 ≤ 1/2, X2 ≤ 1/2, X3 ≤ 1/2)
           = ∫_0^{1/2} ∫_0^{1/2} ∫_0^{1/2} 8 x1 x2 x3 dx1 dx2 dx3
           = (1/2)^6 = 1/64.
Obviously, if i = j, we have
The following is a useful corollary to Theorem 2.6.1 for iid random variables. Its
proof is asked for in Exercise 2.6.7.
Corollary 2.6.1. Suppose X1, X2, . . . , Xn are iid random variables with the common mgf M(t), for −h < t < h, where h > 0. Let T = Σ_{i=1}^n Xi. Then T has the mgf given by

MT(t) = [M(t)]^n,   −h < t < h.      (2.6.9)
2.6.1∗ Multivariate Variance-Covariance Matrix

This section makes explicit use of matrix algebra and may be considered optional.
In Section 2.5 we discussed the covariance between two random variables. In
this section we want to extend this discussion to the n-variate case. Let X =
(X1, . . . , Xn)′ be an n-dimensional random vector. Recall that we defined E(X) = (E(X1), . . . , E(Xn))′; that is, the expectation of a random vector is just the vector of the expectations of its components. Now suppose W is an m × n matrix of random variables, say, W = [Wij] for the random variables Wij, 1 ≤ i ≤ m and 1 ≤ j ≤ n. Note that we can always string out the matrix into an mn × 1 random vector. Hence, we define the expectation of a random matrix componentwise: E[W] = [E(Wij)].
As the following theorem shows, the linearity of the expectation operator easily
follows from this definition:
Theorem 2.6.2. Let W1 and W2 be m × n matrices of random variables, and let A1 and A2 be k × m matrices of constants. Then

E[A1W1 + A2W2] = A1E[W1] + A2E[W2].      (2.6.11)
Proof: Because of the linearity of the operator E on random variables, we have for
the (i, j)th components of expression (2.6.11) that
E[ Σ_{s=1}^m a1is W1sj + Σ_{s=1}^m a2is W2sj ] = Σ_{s=1}^m a1is E[W1sj] + Σ_{s=1}^m a2is E[W2sj].

Hence (2.6.11) is true.
μ1 = 1,   μ2 = 2,
σ1² = 1,   σ2² = 2,      (2.6.14)
E[(X − μ1)(Y − μ2)] = 1.

For the random vector Z = (X, Y)′, these give

E[Z] = (1, 2)′   and   Cov(Z) = [ 1  1 ]
                                [ 1  2 ].
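A quick Monte Carlo check of this mean vector and covariance matrix (our own sketch, assuming numpy; it uses the facts that, under the pdf e^{−y} on 0 < x < y, Y has a gamma distribution with shape 2 and scale 1 and that, given Y = y, X is uniform on (0, y)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# simulate from f(x, y) = e^{-y}, 0 < x < y
y = rng.gamma(shape=2.0, scale=1.0, size=n)   # marginal distribution of Y
x = rng.uniform(0.0, y)                       # X | Y = y is uniform on (0, y)

Z = np.vstack((x, y))
print(Z.mean(axis=1))   # approximately (1, 2)
print(np.cov(Z))        # approximately [[1, 1], [1, 2]]
```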
EXERCISES
2.6.1. Let X, Y, Z have joint pdf f (x, y, z) = 2(x + y + z)/3, 0 < x < 1, 0 < y <
1, 0 < z < 1, zero elsewhere.
(a) Find the marginal probability density functions of X, Y, and Z.
(b) Compute P(0 < X < 1/2, 0 < Y < 1/2, 0 < Z < 1/2) and P(0 < X < 1/2) = P(0 < Y < 1/2) = P(0 < Z < 1/2).
(c) Are X, Y , and Z independent?
(d) Calculate E(X²YZ + 3XY⁴Z²).
(e) Determine the cdf of X, Y, and Z.
(f ) Find the conditional distribution of X and Y , given Z = z, and evaluate
E(X + Y |z).
(g) Determine the conditional distribution of X, given Y = y and Z = z, and
compute E(X|y, z).
2.6.2. Let f (x1 , x2 , x3 ) = exp[−(x1 + x2 + x3 )], 0 < x1 < ∞, 0 < x2 < ∞, 0 <
x3 < ∞, zero elsewhere, be the joint pdf of X1 , X2 , X3 .
(a) Compute P (X1 < X2 < X3 ) and P (X1 = X2 < X3 ).
(b) Determine the joint mgf of X1 , X2 , and X3 . Are these random variables
independent?
2.6.3. Let X1 , X2 , X3 , and X4 be four independent random variables, each with
pdf f (x) = 3(1 − x)2 , 0 < x < 1, zero elsewhere. If Y is the minimum of these four
variables, find the cdf and the pdf of Y .
Hint: P (Y > y) = P (Xi > y , i = 1, . . . , 4).
2.6.4. A fair die is cast at random three independent times. Let the random variable
Xi be equal to the number of spots that appear on the ith trial, i = 1, 2, 3. Let the
random variable Y be equal to max(Xi ). Find the cdf and the pmf of Y .
Hint: P (Y ≤ y) = P (Xi ≤ y, i = 1, 2, 3).
2.6.5. Let M (t1 , t2 , t3 ) be the mgf of the random variables X1 , X2 , and X3 of
Bernstein’s example, described in the remark following Example 2.6.2. Show that
M (t1 , t2 , 0) = M (t1 , 0, 0)M (0, t2 , 0), M (t1 , 0, t3 ) = M (t1 , 0, 0)M (0, 0, t3),
and
M (0, t2 , t3 ) = M (0, t2 , 0)M (0, 0, t3)
are true, but that M(t1, t2, t3) ≠ M(t1, 0, 0)M(0, t2, 0)M(0, 0, t3); that is, X1, X2, X3 are pairwise independent but not mutually independent.
Whenever the conditions of this theorem are satisfied, we can determine the joint pdf
of n functions of n random variables. Appropriate changes of notation in Section
2.2 (to indicate n-space as opposed to 2-space) are all that are needed to show
that the joint pdf of the random variables Y1 = u1 (X1 , X2 , . . . , Xn ), . . . , Yn =
un(X1, X2, . . . , Xn), where the joint pdf of X1, . . . , Xn is f(x1, . . . , xn), is given by

g(y1, . . . , yn) = f[w1(y1, . . . , yn), . . . , wn(y1, . . . , yn)] |J|,   (y1, . . . , yn) ∈ T ,

and zero elsewhere, where x1 = w1(y1, . . . , yn), . . . , xn = wn(y1, . . . , yn) are the inverse functions and J is the Jacobian of the transformation.
Because g(y1 , y2 , y3 ) = g1 (y1 )g2 (y2 )g3 (y3 ), the random variables Y1 , Y2 , Y3 are mu-
tually independent.
Example 2.7.2. Let X1 , X2 , X3 be iid with common pdf
" −x
e 0<x<∞
f (x) =
0 elsewhere.
x1 = y1 y3 , x2 = y2 y3 , and x3 = y3 − y1 y3 − y2 y3 ,
f(0) = 0? Then our new S is S = {x : −∞ < x < ∞, x ≠ 0}. We then take
A1 = {x : −∞ < x < 0} and A2 = {x : 0 < x < ∞}. Thus y = x², with the
inverse x = −√y, maps A1 onto T = {y : 0 < y < ∞} and the transformation is
one-to-one. Moreover, the transformation y = x², with inverse x = √y, maps A2
onto T = {y : 0 < y < ∞} and the transformation is one-to-one. Consider the
probability P(Y ∈ B), where B ⊂ T . Let A3 = {x : x = −√y, y ∈ B} ⊂ A1 and
let A4 = {x : x = √y, y ∈ B} ⊂ A2. Then Y ∈ B when and only when X ∈ A3 or
X ∈ A4. Thus we have
P(Y ∈ B) = P(X ∈ A3) + P(X ∈ A4)
         = ∫_{A3} f(x) dx + ∫_{A4} f(x) dx.
In the first of these integrals, let x = −√y. Thus the Jacobian, say J1, is −1/(2√y); furthermore, the set A3 is mapped onto B. In the second integral let x = √y. Thus the Jacobian, say J2, is 1/(2√y); furthermore, the set A4 is also mapped onto B.
Finally,
P(Y ∈ B) = ∫_B f(−√y) |−1/(2√y)| dy + ∫_B f(√y) (1/(2√y)) dy
         = ∫_B [f(−√y) + f(√y)] (1/(2√y)) dy.

Hence the pdf of Y = X² is given by

g(y) = (1/(2√y)) [f(−√y) + f(√y)],   y ∈ T .
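As a concrete instance of this formula (our own illustration, assuming numpy and scipy): if f is the standard normal pdf, then Y = X² has the chi-square(1) density, and the formula above reproduces it numerically.

```python
import numpy as np
from scipy import stats

def g(y, f):
    # pdf of Y = X^2 built from the pdf f of X via
    # g(y) = [f(-sqrt(y)) + f(sqrt(y))] / (2 sqrt(y))
    r = np.sqrt(y)
    return (f(-r) + f(r)) / (2.0 * r)

y = np.linspace(0.1, 5.0, 50)
formula = g(y, stats.norm.pdf)       # the formula, with f = standard normal pdf
exact = stats.chi2.pdf(y, df=1)      # known chi-square(1) density

print(np.max(np.abs(formula - exact)))   # essentially zero
```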
may not be one-to-one. Suppose, however, that we can represent S as the union of
a finite number, say k, of mutually disjoint sets A1 , A2 , . . . , Ak so that
y1 = u1(x1, x2, . . . , xn), . . . , yn = un(x1, x2, . . . , xn)

define a one-to-one transformation of each Ai onto T . Let

x1 = w1i(y1, . . . , yn), . . . , xn = wni(y1, . . . , yn),   i = 1, 2, . . . , k,

denote the k groups of n inverse functions, one group for each of these k transformations. Let the first partial derivatives be continuous and let each
Ji =  | ∂w1i/∂y1   ∂w1i/∂y2   · · ·   ∂w1i/∂yn |
      | ∂w2i/∂y1   ∂w2i/∂y2   · · ·   ∂w2i/∂yn |
      |     .           .                 .    |
      | ∂wni/∂y1   ∂wni/∂y2   · · ·   ∂wni/∂yn | ,      i = 1, 2, . . . , k,

be not identically equal to zero in T . Then the joint pdf of Y1, Y2, . . . , Yn is
g(y1, y2, . . . , yn) = Σ_{i=1}^k f[w1i(y1, . . . , yn), . . . , wni(y1, . . . , yn)] |Ji|,
provided that (y1 , y2 , . . . , yn ) ∈ T , and equals zero elsewhere. The pdf of any Yi ,
say Y1 , is then
g1(y1) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g(y1, y2, . . . , yn) dy2 ··· dyn.
Example 2.7.3. Let X1 and X2 have the joint pdf defined over the unit circle
given by

f(x1, x2) = 1/π for 0 < x1² + x2² < 1, and f(x1, x2) = 0 elsewhere.
Let Y1 = X1² + X2² and Y2 = X1²/(X1² + X2²). Thus y1y2 = x1² and x2² = y1(1 − y2).
The support S maps onto T = {(y1 , y2 ) : 0 < yi < 1, i = 1, 2}. For each ordered
pair (y1 , y2 ) ∈ T , there are four points in S, given by
(x1, x2) such that x1 = √(y1y2) and x2 = √(y1(1 − y2)),
(x1, x2) such that x1 = √(y1y2) and x2 = −√(y1(1 − y2)),
(x1, x2) such that x1 = −√(y1y2) and x2 = √(y1(1 − y2)),
and (x1, x2) such that x1 = −√(y1y2) and x2 = −√(y1(1 − y2)).
It is easy to see that the absolute value of each of the four Jacobians equals 1/(4√(y2(1 − y2))). Hence, the joint pdf of Y1 and Y2 is the sum of four terms and can be written as

g(y1, y2) = 4 (1/π) 1/(4√(y2(1 − y2))) = 1/(π√(y2(1 − y2))),   (y1, y2) ∈ T .
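Since g(y1, y2) factors into a function of y1 alone (a constant) times a function of y2 alone, Y1 and Y2 are independent, with Y1 uniform on (0, 1) and Y2 having the arcsine density 1/(π√(y2(1 − y2))), i.e., a beta(1/2, 1/2) distribution. A short simulation check (our own sketch, assuming numpy and scipy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000

# sample (X1, X2) uniformly over the unit disk by rejection from the square
x1 = rng.uniform(-1, 1, 3 * n)
x2 = rng.uniform(-1, 1, 3 * n)
keep = x1**2 + x2**2 < 1
x1, x2 = x1[keep], x2[keep]

y1 = x1**2 + x2**2
y2 = x1**2 / (x1**2 + x2**2)

print(stats.kstest(y1, 'uniform').pvalue)                 # Y1 compatible with uniform(0, 1)
print(stats.kstest(y2, stats.beta(0.5, 0.5).cdf).pvalue)  # Y2 compatible with beta(1/2, 1/2)
print(abs(np.corrcoef(y1, y2)[0, 1]))                     # near 0, consistent with independence
```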
Of course, as in the bivariate case, we can use the mgf technique by noting that
if Y = g(X1 , X2 , . . . , Xn ) is a function of the random variables, then the mgf of Y
is given by
E(e^{tY}) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} e^{t g(x1, x2, . . . , xn)} f(x1, x2, . . . , xn) dx1 dx2 ··· dxn,
in the continuous case, where f (x1 , x2 , . . . , xn ) is the joint pdf. In the discrete case,
summations replace the integrals. This procedure is particularly useful in cases in
which we are dealing with linear functions of independent random variables.
If Y = X1 + X2 + X3 , the mgf of Y is
E(e^{tY}) = E(e^{t(X1+X2+X3)})
          = E(e^{tX1} e^{tX2} e^{tX3})
          = E(e^{tX1}) E(e^{tX2}) E(e^{tX3}).

Hence,

E(e^{tY}) = exp{(μ1 + μ2 + μ3)(e^t − 1)}.
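The mgf exp{μ(e^t − 1)} is that of a Poisson distribution with mean μ, so Y here has a Poisson distribution with mean μ1 + μ2 + μ3. A brief simulation check (our own sketch, assuming numpy and scipy; the particular means are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu = (1.0, 2.5, 0.7)    # illustrative means, not values from the text
n = 100_000

y = sum(rng.poisson(m, n) for m in mu)   # Y = X1 + X2 + X3, independent Poisson draws

# empirical frequencies versus the Poisson(mu1 + mu2 + mu3) pmf
total = sum(mu)
for k in range(6):
    print(k, round(np.mean(y == k), 4), round(stats.poisson.pmf(k, total), 4))
```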
150 Multivariate Distributions
Hence,
E(e^{tY}) = (1 − t)^{−4}.
In Section 3.3, we find that this is the mgf of a distribution with pdf
" 1 3 −y
3! y e 0<y<∞
fY (y) =
0 elsewhere.
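If, for instance, Y is the sum of four iid random variables each having the pdf e^{−x}, x > 0 (so each has mgf (1 − t)^{−1}), then Corollary 2.6.1 gives exactly this mgf. A quick empirical check of the stated pdf (our own sketch, assuming numpy):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(3)
n = 200_000

y = rng.exponential(scale=1.0, size=(n, 4)).sum(axis=1)   # sum of 4 iid standard exponentials

# compare crude density estimates with f_Y(t) = t^3 e^{-t} / 3! at a few points
h = 0.1
for t in (1.0, 3.0, 5.0):
    empirical = np.mean((y > t - h / 2) & (y < t + h / 2)) / h
    exact = t**3 * np.exp(-t) / factorial(3)
    print(t, round(empirical, 3), round(exact, 3))
```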
2.7.5. Let X1 , X2 , X3 be iid with common pdf f (x) = e−x , x > 0, 0 elsewhere.
Find the joint pdf of Y1 = X1 /X2 , Y2 = X3 /(X1 + X2 ), and Y3 = X1 + X2 . Are
Y1 , Y2 , Y3 mutually independent?
2.7.6. Let X1, X2 have the joint pdf f(x1, x2) = 1/π, 0 < x1² + x2² < 1. Let
Y1 = X1² + X2² and Y2 = X2. Find the joint pdf of Y1 and Y2.
2.7.7. Let X1 , X2 , X3 , X4 have the joint pdf f (x1 , x2 , x3 , x4 ) = 24, 0 < x1 < x2 <
x3 < x4 < 1, 0 elsewhere. Find the joint pdf of Y1 = X1/X2, Y2 = X2/X3, Y3 = X3/X4, Y4 = X4 and show that they are mutually independent.
2.7.8. Let X1, X2, X3 be iid with common mgf M(t) = ((3/4) + (1/4)e^t)², for all t ∈ R.
(a) Determine the probabilities, P (X1 = k), k = 0, 1, 2.
(b) Find the mgf of Y = X1 + X2 + X3 and then determine the probabilities,
P (Y = k), k = 0, 1, 2, . . . , 6.
Let

T = Σ_{i=1}^n ai Xi      (2.8.1)

for specified constants a1, . . . , an. We obtain expressions for the mean and variance of T.
The mean of T follows immediately from linearity of expectation. For reference,
we state it formally as a theorem.
Theorem 2.8.1. Suppose T is given by expression (2.8.1). Suppose E(Xi) = μi, for i = 1, . . . , n. Then

E(T) = Σ_{i=1}^n ai μi.      (2.8.2)
Proof: Using the definition of the covariance and Theorem 2.8.1, we have the first
equality below, while the second equality follows from the linearity of E:
Cov(T, W) = E[ Σ_{i=1}^n Σ_{j=1}^m (ai Xi − ai E(Xi))(bj Yj − bj E(Yj)) ]
          = Σ_{i=1}^n Σ_{j=1}^m ai bj E[(Xi − E(Xi))(Yj − E(Yj))]
          = Σ_{i=1}^n Σ_{j=1}^m ai bj Cov(Xi, Yj).
Var(T) = Cov(T, T) = Σ_{i=1}^n ai² Var(Xi) + 2 Σ_{i<j} ai aj Cov(Xi, Xj).      (2.8.4)
Note that if Xi and Xj are uncorrelated for all i ≠ j, then the covariance terms vanish and (2.8.4) reduces to Var(T) = Σ_{i=1}^n ai² Var(Xi).
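A small numerical check of (2.8.4) (our own sketch, assuming numpy): for a random vector with covariance matrix Σ, the variance of Σ ai Xi computed from (2.8.4) must agree with the matrix form a′Σa.

```python
import numpy as np

rng = np.random.default_rng(4)

# an arbitrary positive definite covariance matrix and coefficient vector
B = rng.normal(size=(4, 4))
Sigma = B @ B.T                        # plays the role of Cov(X)
a = np.array([1.0, -2.0, 0.5, 3.0])    # the constants a_i

# right-hand side of (2.8.4)
var_formula = sum(a[i]**2 * Sigma[i, i] for i in range(4)) \
    + 2 * sum(a[i] * a[j] * Sigma[i, j] for i in range(4) for j in range(i + 1, 4))

# matrix form a' Sigma a
var_matrix = a @ Sigma @ a
print(var_formula, var_matrix)         # identical up to rounding
```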
Next, in addition to independence, we assume that the random variables have
the same distribution. We call such a collection of random variables a random
sample which we now state in a formal definition.
Definition 2.8.1. If the random variables X1 , X2 , . . . , Xn are independent and
identically distributed, i.e. each Xi has the same distribution, then we say that
these random variables constitute a random sample of size n from that common
distribution. We abbreviate independent and identically distributed by iid.
In the next two examples, we find some properties of two functions of a random
sample, namely the sample mean and variance.
Example 2.8.1 (Sample Mean). Let X1 , . . . , Xn be independent and identically
distributed random variables with common mean μ and variance σ 2 . The sample
mean is defined by X̄ = n^{−1} Σ_{i=1}^n Xi. This is a linear combination of the sample observations with ai ≡ n^{−1}; hence, by Theorem 2.8.1 and Corollary 2.8.2, we have

E(X̄) = μ   and   Var(X̄) = σ²/n.      (2.8.6)
The sample variance is defined by

S² = (n − 1)^{−1} Σ_{i=1}^n (Xi − X̄)² = (n − 1)^{−1} ( Σ_{i=1}^n Xi² − n X̄² ),      (2.8.7)

where the second equality follows after some algebra; see Exercise 2.8.1.
In the average that defines the sample variance S 2 , the division is by n − 1
instead of n. One reason for this is that it makes S 2 unbiased for σ 2 , as next
shown. Using the above theorems, the results of the last example, and the facts
that E(Xi²) = σ² + μ² and E(X̄²) = (σ²/n) + μ², we have the following:

E(S²) = (n − 1)^{−1} [ Σ_{i=1}^n E(Xi²) − n E(X̄²) ]
      = (n − 1)^{−1} { nσ² + nμ² − n[(σ²/n) + μ²] }
      = σ².      (2.8.8)
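A short simulation illustrating (2.8.6) and (2.8.8) (our own sketch, assuming numpy): across many samples, the variance of X̄ should be close to σ²/n and the average of S² close to σ².

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 10, 50_000

# repeated samples of size n from an exponential distribution with mean 2 (sigma^2 = 4)
samples = rng.exponential(scale=2.0, size=(reps, n))

xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)   # division by n - 1, as in the definition of S^2

print(xbar.var(), 4.0 / n)         # Var(X-bar) is approximately sigma^2 / n = 0.4
print(s2.mean(), 4.0)              # E(S^2) is approximately sigma^2 = 4
```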
EXERCISES
2.8.1. Derive the second equality in expression (2.8.7).
2.8.2. Let X1 , X2 , X3 , X4 be four iid random variables having the same pdf f (x) =
2x, 0 < x < 1, zero elsewhere. Find the mean and variance of the sum Y of these
four random variables.
2.8.3. Let X1 and X2 be two independent random variables so that the variances of X1 and X2 are σ1² = k and σ2² = 2, respectively. Given that the variance of Y = 3X2 − X1 is 25, find k.
2.8.4. If the independent variables X1 and X2 have means μ1, μ2 and variances σ1², σ2², respectively, show that the mean and variance of the product Y = X1X2 are μ1μ2 and σ1²σ2² + μ1²σ2² + μ2²σ1², respectively.
2.8.5. Find the mean and variance of the sum Y = Σ_{i=1}^5 Xi, where X1, . . . , X5 are iid, having pdf f(x) = 6x(1 − x), 0 < x < 1, zero elsewhere.
2.8.6. Determine the mean and variance of the sample mean X̄ = 5^{−1} Σ_{i=1}^5 Xi, where X1, . . . , X5 is a random sample from a distribution having pdf f(x) = 4x³, 0 < x < 1, zero elsewhere.
2.8.7. Let X and Y be random variables with μ1 = 1, μ2 = 4, σ1² = 4, σ2² = 6, ρ = 1/2. Find the mean and variance of the random variable Z = 3X − 2Y.
2.8.8. Let X and Y be independent random variables with means μ1, μ2 and variances σ1², σ2². Determine the correlation coefficient of X and Z = X − Y in terms of μ1, μ2, σ1², σ2².
2.8.9. Let μ and σ 2 denote the mean and variance of the random variable X. Let
Y = c + bX, where b and c are real constants. Show that the mean and variance of
Y are, respectively, c + bμ and b2 σ 2 .
2.8.14. Let X1 and X2 have a joint distribution with parameters μ1, μ2, σ1², σ2²,
and ρ. Find the correlation coefficient of the linear functions of Y = a1 X1 + a2 X2
and Z = b1 X1 + b2 X2 in terms of the real constants a1 , a2 , b1 , b2 , and the
parameters of the distribution.
2.8.15. Let X1 , X2 , and X3 be random variables with equal variances but with
correlation coefficients ρ12 = 0.3, ρ13 = 0.5, and ρ23 = 0.2. Find the correlation
coefficient of the linear functions Y = X1 + X2 and Z = X2 + X3 .
2.8.16. Find the variance of the sum of 10 random variables if each has variance 5
and if each pair has correlation coefficient 0.5.
2.8.17. Let X and Y have the parameters μ1, μ2, σ1², σ2², and ρ. Show that the
correlation coefficient of X and [Y − ρ(σ2 /σ1 )X] is zero.
2.8.18. Let S 2 be the sample variance of a random sample from a distribution with
variance σ 2 > 0. Since E(S 2 ) = σ 2 , why isn’t E(S) = σ?
Hint: Use Jensen’s inequality to show that E(S) < σ.