Joint Probability Distribution Reference 2
If the random variables are continuous, then the joint probability density function is a function f(x, y) ≥ 0 satisfying

∫∫_{state space} f(x, y) dx dy = 1
The probability that a ≤ X ≤ b and c ≤ Y ≤ d is obtained from the joint probability density
function as
∫_{x=a}^{b} ∫_{y=c}^{d} f(x, y) dy dx
Example 19: Air Conditioner Maintenance
A company that services air conditioner units in residences and office blocks is interested in how to schedule its technicians in the most efficient manner. Specifically, the company is interested in how long a technician takes on a visit to a particular location, and the company recognizes that this mainly depends on the number of air conditioner units at the location that need to be serviced.
If the random variable X, taking the values 1, 2, 3, and 4, is the service time in hours taken
at a particular location, and the random variable Y , taking the values 1, 2, and 3, is the number
of air conditioner units at the location, then these two random variables can be thought of as
jointly distributed.
Suppose that their joint probability mass function pi j is given in Figure 2.58. The figure
indicates, for example, that there is a probability of 0.12 that X = 1 and Y = 1, so that there
is a probability of 0.12 that a particular location chosen at random has one air conditioner
unit that takes a technician one hour to service. Similarly, there is a probability of 0.07 that
a location has three air conditioner units that take four hours to service. Notice that this is a
valid probability mass function since the probability values sum to one, Σ_{i=1}^{4} Σ_{j=1}^{3} p_{ij} = 1.
The joint cumulative distribution function F(x, y) = P(X ≤ x, Y ≤ y) is given in Figure 2.59. For example, the probability that a location has no more than two air
conditioner units that take no more than two hours to service is
F(2, 2) = p11 + p12 + p21 + p22 = 0.12 + 0.08 + 0.08 + 0.15 = 0.43
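As a quick check of calculations like this one (not part of the original text), the quoted entries of the joint probability mass function can be stored in a small Python dictionary and summed directly; the variable names below are illustrative.

# Entries p_ij of the joint pmf quoted in the text, keyed by (x, y)
p = {(1, 1): 0.12, (1, 2): 0.08,
     (2, 1): 0.08, (2, 2): 0.15}

# Joint cumulative distribution function F(2, 2) = P(X <= 2 and Y <= 2)
F_2_2 = sum(prob for (x, y), prob in p.items() if x <= 2 and y <= 2)
print(round(F_2_2, 2))   # 0.43, matching p11 + p12 + p21 + p22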
Example 20: Mineral Deposits
In order to determine the economic viability of mining in a certain area, a mining company obtains samples of ore from the location and measures their zinc content and their iron content. Suppose that the random variable X is the zinc content of the ore, taking values between 0.5 and 1.5, and that the random variable Y is the iron content of the ore, taking values between 20.0 and 35.0. Furthermore, suppose that their joint probability density function is

f(x, y) = 39/400 − 17(x − 1)²/50 − (y − 25)²/10,000

for 0.5 ≤ x ≤ 1.5 and 20.0 ≤ y ≤ 35.0.
The joint probability density function provides complete information about the joint prob-
abilistic properties of the random variables X and Y . For example, the probability that a
randomly chosen sample of ore has a zinc content between 0.8 and 1.0 and an iron content
between 25 and 30 is
∫_{x=0.8}^{1.0} ∫_{y=25.0}^{30.0} f(x, y) dy dx
which can be calculated to be 0.092. Consequently only about 9% of the ore at the location
has mineral levels within these limits.
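The probability 0.092 can be reproduced numerically. The sketch below is an illustration rather than the text's own method: it approximates the double integral of the joint density given above with a simple midpoint rule, and the grid size is arbitrary.

# Joint density of zinc content x and iron content y from the example above
def f(x, y):
    return 39/400 - 17*(x - 1)**2/50 - (y - 25)**2/10_000

# Midpoint-rule approximation of the integral over 0.8 <= x <= 1.0, 25 <= y <= 30
n = 400
dx, dy = (1.0 - 0.8)/n, (30.0 - 25.0)/n
prob = sum(f(0.8 + (i + 0.5)*dx, 25.0 + (j + 0.5)*dy)*dx*dy
           for i in range(n) for j in range(n))
print(round(prob, 3))   # approximately 0.092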
For two discrete random variables, the probability mass function of the marginal distribution of X is obtained by summing the joint probability mass function over the values of Y, so that P(X = x_i) = p_{i+} = Σ_j p_{ij}, and for two continuous random variables, the probability density function of the marginal distribution of X is

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
where in practice the summation and integration limits can be curtailed at the appropriate
boundaries of the state space. Note that the marginal distribution of a random variable X
and the marginal distribution of a random variable Y do not uniquely determine their joint
distribution.
The expectations and variances of the random variables X and Y can be obtained from
their marginal distributions in the usual manner, as illustrated in the following examples.
Example 19: Air Conditioner Maintenance
The marginal probability mass function of X, the time taken to service the air conditioner units at a particular location, is given in Figure 2.60 and is obtained by summing the appropriate values of the joint probability mass function. For example,

P(X = 1) = Σ_{j=1}^{3} p_{1j} = 0.12 + 0.08 + 0.01 = 0.21
[FIGURE 2.60 Joint probability mass function of X and Y with the marginal distributions of X (0.21, 0.24, 0.30, 0.25) and of Y.
FIGURE 2.61 Marginal probability mass function of the service time, with E(X) = 2.59 and σ = 1.08.
FIGURE 2.62 Marginal probability mass function of the number of air conditioner units, with E(Y) = 1.79 and σ = 0.62.
FIGURE 2.63 Marginal probability density function of the zinc content, f_X(x) = 57/40 − 51(x − 1)²/10 for 0.5 ≤ x ≤ 1.5, with E(X) = 1.0 and σ = 0.23.]
The expected number of air conditioner units can be calculated to be E(Y) = 1.79, and the standard deviation is σ = 0.62.
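These values for Y can be verified from its marginal probability mass function. In the short sketch below, P(Y = 2) = 0.57 and P(Y = 3) = 0.11 are the values quoted in the text, while P(Y = 1) = 0.32 is inferred from the requirement that the marginal probabilities sum to one.

# Marginal pmf of Y (number of units); P(Y = 1) inferred as 1 - 0.57 - 0.11
pY = {1: 0.32, 2: 0.57, 3: 0.11}

EY = sum(y*q for y, q in pY.items())            # expectation of Y
var = sum(y**2*q for y, q in pY.items()) - EY**2
print(round(EY, 2), round(var**0.5, 2))         # 1.79 and about 0.62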
Example 20: Mineral Deposits
The marginal probability density function of X, the zinc content of the ore, is

f_X(x) = ∫_{y=20.0}^{35.0} f(x, y) dy = ∫_{y=20.0}^{35.0} ( 39/400 − 17(x − 1)²/50 − (y − 25)²/10,000 ) dy
= [ 39y/400 − 17y(x − 1)²/50 − (y − 25)³/30,000 ]_{y=20.0}^{y=35.0}
= 57/40 − 51(x − 1)²/10
for 0.5 ≤ x ≤ 1.5. This is shown in Figure 2.63, and since it is symmetric about the point
x = 1, the expected zinc content is E(X ) = 1. The variance of the zinc content is
Var(X) = E((X − E(X))²)
= ∫_{0.5}^{1.5} (x − 1)² f_X(x) dx = ∫_{0.5}^{1.5} (x − 1)² ( 57/40 − 51(x − 1)²/10 ) dx
= [ 19(x − 1)³/40 − 51(x − 1)⁵/50 ]_{0.5}^{1.5} = [0.0275] − [−0.0275] = 0.055

and the standard deviation is therefore σ = √0.055 = 0.23.
The probability that a sample of ore has a zinc content between 0.8 and 1.0 can be calculated
from the marginal probability density function to be
P(0.8 ≤ X ≤ 1.0) = ∫_{0.8}^{1.0} f_X(x) dx = ∫_{0.8}^{1.0} ( 57/40 − 51(x − 1)²/10 ) dx
= [ 57x/40 − 17(x − 1)³/10 ]_{0.8}^{1.0} = [1.425] − [1.1536] = 0.2714
Consequently about 27% of the ore has a zinc content within these limits.
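Both the standard deviation of 0.23 and the probability of 0.2714 can be checked numerically from the marginal density. The following sketch uses a simple midpoint-rule integrator instead of the exact antiderivatives above; the helper function is illustrative.

# Marginal density of the zinc content from the calculation above
def fX(x):
    return 57/40 - 51*(x - 1)**2/10

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a)/n
    return sum(g(a + (i + 0.5)*h) for i in range(n))*h

var = integrate(lambda x: (x - 1)**2*fX(x), 0.5, 1.5)   # Var(X), about 0.055
prob = integrate(fX, 0.8, 1.0)                          # P(0.8 <= X <= 1.0)
print(round(var**0.5, 2), round(prob, 4))               # 0.23 and 0.2714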
[FIGURE 2.64 Marginal probability density function of the iron content, with E(Y) = 27.36 and σ = 4.27.]
The marginal probability density function of Y , the iron content of the ore, is
f_Y(y) = ∫_{x=0.5}^{1.5} f(x, y) dx = ∫_{x=0.5}^{1.5} ( 39/400 − 17(x − 1)²/50 − (y − 25)²/10,000 ) dx
= [ 39x/400 − 17(x − 1)³/150 − x(y − 25)²/10,000 ]_{x=0.5}^{x=1.5}
= 83/1200 − (y − 25)²/10,000
for 20.0 ≤ y ≤ 35.0. This is shown in Figure 2.64 together with the expected iron content
and the standard deviation of the iron content, which can be calculated to be E(Y ) = 27.36
and σ = 4.27.
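A similar numerical check can be run on the marginal density of the iron content; the sketch below is illustrative and reproduces values close to the E(Y) = 27.36 and σ = 4.27 quoted above.

# Marginal density of the iron content from the calculation above
def fY(y):
    return 83/1200 - (y - 25)**2/10_000

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a)/n
    return sum(g(a + (i + 0.5)*h) for i in range(n))*h

EY = integrate(lambda y: y*fY(y), 20.0, 35.0)
var = integrate(lambda y: (y - EY)**2*fY(y), 20.0, 35.0)
print(round(EY, 2), round(var**0.5, 2))   # close to 27.36 and 4.27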
Example 19: Air Conditioner Maintenance
Suppose that a technician is visiting a location that is known to have three air conditioner units, an event that has a probability of

P(Y = 3) = p_{+3} = 0.01 + 0.01 + 0.02 + 0.07 = 0.11
The conditional distribution of the service time X consists of the probability values
p_{1|Y=3} = P(X = 1 | Y = 3) = p_{13}/p_{+3} = 0.01/0.11 = 0.091
p_{2|Y=3} = P(X = 2 | Y = 3) = p_{23}/p_{+3} = 0.01/0.11 = 0.091
p_{3|Y=3} = P(X = 3 | Y = 3) = p_{33}/p_{+3} = 0.02/0.11 = 0.182
p_{4|Y=3} = P(X = 4 | Y = 3) = p_{43}/p_{+3} = 0.07/0.11 = 0.636
These values are shown in Figure 2.65, and they are clearly different from the marginal
distribution of the service time given in Figure 2.61. Conditioning on a location having three
air conditioner units increases the chances of a large service time being required.
The conditional expectation of the service time is
E(X | Y = 3) = Σ_{i=1}^{4} i p_{i|Y=3} = (1 × 0.091) + (2 × 0.091) + (3 × 0.182) + (4 × 0.636) = 3.36
which, as expected, is considerably larger than the “overall” expected service time of 2.59
hours. The difference between these expected values can be interpreted in the following way.
[FIGURE 2.65 Conditional probability mass function of the service time when Y = 3, with E(X | Y = 3) = 3.36.
FIGURE 2.66 Conditional probability density function of the iron content when X = 0.55, f_{Y|X=0.55}(y) = 0.073 − (y − 25)²/3922.5, with E(Y | X = 0.55) = 27.14.]
If a technician sets off for a location for which the number of air conditioner units is not
known, then the expected service time at the location is 2.59 hours. However, if the technician
knows that there are three air conditioner units at the location that need servicing, then the
expected service time is 3.36 hours.
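The conditional probabilities and the conditional expectation above follow directly from the Y = 3 column of the joint probability mass function, as the following illustrative sketch confirms.

# Joint probabilities p_i3 for the Y = 3 column quoted in the text
p_col3 = {1: 0.01, 2: 0.01, 3: 0.02, 4: 0.07}
p_Y3 = sum(p_col3.values())                        # P(Y = 3) = 0.11

# Conditional pmf of the service time given Y = 3, and its expectation
cond = {x: q/p_Y3 for x, q in p_col3.items()}
E_cond = sum(x*q for x, q in cond.items())
print({x: round(q, 3) for x, q in cond.items()})   # 0.091, 0.091, 0.182, 0.636
print(round(E_cond, 2))                            # about 3.36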
Example 20: Mineral Deposits
Suppose that a sample of ore has a zinc content of X = 0.55. What is known about its iron content? The information about the iron content Y is summarized in the conditional probability density function for the iron content, which is

f_{Y|X=0.55}(y) = f(0.55, y) / f_X(0.55)
where the denominator is the marginal distribution of the zinc content X evaluated at 0.55.
Since
f_X(0.55) = 57/40 − 51(0.55 − 1.00)²/10 = 0.39225
the conditional probability density function is
f_{Y|X=0.55}(y) = f(0.55, y)/0.39225 = 39/(400 × 0.39225) − 17(0.55 − 1.00)²/(50 × 0.39225) − (y − 25)²/(10,000 × 0.39225)
= 0.073 − (y − 25)²/3922.5
for 20.0 ≤ y ≤ 35.0. This is shown in Figure 2.66 with the conditional expectation of the
iron content, which can be calculated to be 27.14, and with the conditional standard deviation,
which is 4.14.
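The conditional expectation of 27.14 can be checked numerically from the joint and marginal densities; the sketch below is an illustration with an arbitrary grid size, not part of the original example.

# Joint density and marginal zinc density from the example above
def f(x, y):
    return 39/400 - 17*(x - 1)**2/50 - (y - 25)**2/10_000

def fX(x):
    return 57/40 - 51*(x - 1)**2/10

def f_cond(y):
    """Conditional density of the iron content given a zinc content of 0.55."""
    return f(0.55, y)/fX(0.55)

def integrate(g, a, b, n=10_000):
    h = (b - a)/n
    return sum(g(a + (i + 0.5)*h) for i in range(n))*h

E = integrate(lambda y: y*f_cond(y), 20.0, 35.0)
print(round(E, 2))   # about 27.14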
Notice that if the random variables X and Y are independent, then their conditional
distributions are identical to their marginal distributions. If the random variables are discrete,
this is because
p_{i|Y=y_j} = p_{ij}/p_{+j} = p_{i+} p_{+j}/p_{+j} = p_{i+}
and if the random variables are continuous, this is because
f_{X|Y=y}(x) = f(x, y)/f_Y(y) = f_X(x) f_Y(y)/f_Y(y) = f_X(x)
In either case, the conditional distributions do not depend upon the value conditioned upon, and
they are equal to the marginal distributions. This result has the interpretation that knowledge
of the value taken by the random variable Y does not influence the distribution of the random
variable X , and vice versa.
As a simple example of two independent random variables, suppose that X and Y have a
joint probability density function of
f(x, y) = 6xy²
for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1 and f (x, y) = 0 elsewhere. The fact that this joint density
function is a function of x multiplied by a function of y (and that the state spaces of the random
variables [0, 1] do not depend upon each other) immediately indicates that the two random
variables are independent. Specifically, the marginal distribution of X is
f_X(x) = ∫_{y=0}^{1} 6xy² dy = 2x
for 0 ≤ x ≤ 1, and the marginal distribution of Y is
f_Y(y) = ∫_{x=0}^{1} 6xy² dx = 3y²
[FIGURE 2.67 Joint probability mass function and marginal probability mass functions for X and Y in the coin-tossing game.
FIGURE 2.68 Joint probability mass function and marginal probability mass functions for X and Z in the coin-tossing game.]
for 0 ≤ y ≤ 1. The fact that f(x, y) = f_X(x) f_Y(y) confirms that the random variables X
and Y are independent.
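The factorization argument can also be carried out symbolically. The following sketch (which assumes the sympy library is available) computes both marginal densities and confirms that their product recovers the joint density.

import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = 6*x*y**2                                  # joint density on the unit square

fX = sp.integrate(f, (y, 0, 1))               # marginal of X: 2*x
fY = sp.integrate(f, (x, 0, 1))               # marginal of Y: 3*y**2
print(fX, fY, sp.simplify(f - fX*fY) == 0)    # 2*x  3*y**2  True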
GAMES OF CHANCE
Suppose that a fair coin is tossed three times so that there are eight equally likely outcomes,
and that the random variable X is the number of heads obtained in the first and second tosses,
the random variable Y is the number of heads in the third toss, and the random variable Z is
the number of heads obtained in the second and third tosses.
The joint probability mass function of X and Y is given in Figure 2.67 together with the
marginal distributions of X and Y . For example, P(X = 0, Y = 0) = P(T T T ) = 1/8 and
P(X = 0) = P(T T T ) + P(T T H ) = 1/4. It is easy to check that
P(X = i, Y = j) = P(X = i )P(Y = j)
for all values of i = 0, 1, 2 and j = 0, 1, so that the joint probability mass function is equal
to the product of the two marginal probability mass functions. Consequently, the random
variables X and Y are independent, which is not surprising since the outcome of the third coin
toss should be unrelated to the outcomes of the first two coin tosses.
Figure 2.68 shows the joint probability mass function of X and Z together with the marginal
distributions of X and Z. For example, P(X = 1, Z = 1) = P(H T H ) + P(T H T ) = 1/4.
Notice, however, that
P(X = 0, Z = 0) = P(TTT) = 1/8
P(X = 0) = P(TTH) + P(TTT) = 1/4
and
P(Z = 0) = P(HTT) + P(TTT) = 1/4
so that
P(X = 0, Z = 0) ≠ P(X = 0)P(Z = 0)
This result indicates that the random variables X and Z are not independent. In fact, their
dependence is a result of their both depending upon the result of the second coin toss.
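Because there are only eight equally likely outcomes, both independence statements can be verified by direct enumeration. The sketch below is illustrative: it tabulates the joint probability mass functions of (X, Y) and (X, Z) and tests whether each factorizes into its marginals.

from collections import Counter
from itertools import product

outcomes = list(product([0, 1], repeat=3))            # three fair tosses, 1 = heads

def joint_pmf(stat):
    """Joint pmf of a pair of statistics over the eight outcomes."""
    counts = Counter(stat(o) for o in outcomes)
    return {k: v/8 for k, v in counts.items()}

XY = joint_pmf(lambda o: (o[0] + o[1], o[2]))         # X and Y
XZ = joint_pmf(lambda o: (o[0] + o[1], o[1] + o[2]))  # X and Z

def independent(joint):
    pA, pB = {}, {}
    for (a, b), q in joint.items():
        pA[a] = pA.get(a, 0) + q
        pB[b] = pB.get(b, 0) + q
    return all(abs(joint.get((a, b), 0) - pA[a]*pB[b]) < 1e-12
               for a in pA for b in pB)

print(independent(XY), independent(XZ))               # True  False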
The strength of the dependence of two random variables on each other is indicated by
their covariance, which is defined to be
Cov(X, Y ) = E((X − E(X ))(Y − E(Y )))
The covariance can be any positive or negative number, and independent random variables
have a covariance of zero. It is often convenient to calculate the covariance from an alternative
expression
Cov(X, Y ) = E((X − E(X))(Y − E(Y )))
= E(X Y − X E(Y ) − E(X )Y + E(X )E(Y ))
= E(X Y ) − E(X )E(Y ) − E(X)E(Y ) + E(X)E(Y )
= E(X Y ) − E(X )E(Y )
Covariance
The covariance of two random variables X and Y is defined to be
Cov(X, Y ) = E((X − E(X ))(Y − E(Y ))) = E(X Y ) − E(X )E(Y )
The covariance can be any positive or negative number, and independent random
variables have a covariance of 0.
In practice, the most convenient way to assess the strength of the dependence between
two random variables is through their correlation.
Correlation
The correlation between two random variables X and Y is defined to be
Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))
The correlation takes values between −1 and 1, and independent random variables
have a correlation of 0.
Random variables with a positive correlation are said to be positively correlated, and in
such cases there is a tendency for high values of one random variable to be associated with
high values of the other random variable. Random variables with a negative correlation are
said to be negatively correlated, and in such cases there is a tendency for high values of one
random variable to be associated with low values of the other random variable. The strength
of these tendencies increases as the correlation moves further away from 0 to 1 or to −1.
As an illustration of the calculation of a covariance, consider again the simple example where

f(x, y) = 6xy²

for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. The marginal distributions give E(X) = ∫_{0}^{1} x · 2x dx = 2/3 and E(Y) = ∫_{0}^{1} y · 3y² dy = 3/4, while E(XY) = ∫_{0}^{1} ∫_{0}^{1} xy · 6xy² dx dy = 1/2, so that

Cov(X, Y) = E(XY) − E(X)E(Y) = 1/2 − (2/3 × 3/4) = 0

as expected for two independent random variables.
Example 19: Air Conditioner Maintenance
The expected service time is E(X) = 2.59 hours, and the expected number of units serviced is E(Y) = 1.79. In addition,

E(XY) = Σ_{i=1}^{4} Σ_{j=1}^{3} i j p_{ij} = (1 × 1 × 0.12) + (1 × 2 × 0.08) + · · · + (4 × 3 × 0.07) = 4.86
Since Var(X ) = 1.162 and Var(Y ) = 0.384, the correlation between the service time and the
number of units serviced is
Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)) = 0.224 / √(1.162 × 0.384) = 0.34
As expected, the service time and the number of units serviced are not independent but are
positively correlated. This makes sense because there is a tendency for locations with a large
number of air conditioner units to require relatively long service times.
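These calculations can be reproduced from the full table of joint probabilities. In the sketch below the entries not quoted explicitly in the text are inferred from the stated row and column totals, so the table should be read as an illustration of the calculation rather than as a transcription of Figure 2.58.

# Joint pmf p[(x, y)]; values quoted in the text plus entries inferred
# from the marginal totals of X (0.21, 0.24, 0.30, 0.25) and Y
p = {(1, 1): 0.12, (2, 1): 0.08, (3, 1): 0.07, (4, 1): 0.05,
     (1, 2): 0.08, (2, 2): 0.15, (3, 2): 0.21, (4, 2): 0.13,
     (1, 3): 0.01, (2, 3): 0.01, (3, 3): 0.02, (4, 3): 0.07}

EX = sum(x*q for (x, y), q in p.items())
EY = sum(y*q for (x, y), q in p.items())
EXY = sum(x*y*q for (x, y), q in p.items())
VarX = sum(x**2*q for (x, y), q in p.items()) - EX**2
VarY = sum(y**2*q for (x, y), q in p.items()) - EY**2

cov = EXY - EX*EY
corr = cov/(VarX*VarY)**0.5
# prints 4.86, about 0.224, and about 0.33 (0.34 when the rounded variances
# 1.162 and 0.384 quoted in the text are used instead)
print(round(EXY, 2), round(cov, 3), round(corr, 2))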
GAMES OF CHANCE
Consider again the tossing of three coins and the random variables X, Y, and Z. It is easy to check that E(X) = 1 and E(Y) = 1/2, and that

E(XY) = Σ_{i=0}^{2} Σ_{j=0}^{1} i j p_{ij} = (1 × 1 × 1/4) + (2 × 1 × 1/8) = 1/2
Consequently,
Cov(X, Y) = E(XY) − E(X)E(Y) = 1/2 − (1 × 1/2) = 0
which is expected because the random variables X and Y are independent.
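As a final check, the expectation E(XY) = 1/2 and the zero covariance can also be obtained by enumerating the eight outcomes directly; the sketch below is illustrative.

from itertools import product

outcomes = list(product([0, 1], repeat=3))   # eight equally likely outcomes

X = [o[0] + o[1] for o in outcomes]          # heads in the first two tosses
Y = [o[2] for o in outcomes]                 # heads in the third toss

EX, EY = sum(X)/8, sum(Y)/8
EXY = sum(x*y for x, y in zip(X, Y))/8
print(EXY, EXY - EX*EY)                      # 0.5 and 0.0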