1.12 Multivariate Random Variables
We will be using matrix notation to denote multivariate rvs and their distributions.
Denote by X = (X1 , . . . , Xn )T an n-dimensional random vector whose compo-
nents are random variables. Then, all the definitions given for bivariate rvs extend
to the multivariate case. For example, if X is continuous, then we may write
\[
F_X(x_1, \dots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f_X(x_1, \dots, x_n)\,dx_1 \dots dx_n
\]
and
\[
P(X \in A) = \int \cdots \int_A f_X(x_1, \dots, x_n)\,dx_1 \dots dx_n,
\]
where $A \subseteq \mathcal{X}$ and $\mathcal{X} \subseteq \mathbb{R}^n$ is the support of $f_X$.
Example 1.37. Let X = (X1 , X2 , X3 , X4 )T be a four-dimensional random vector
with the joint pdf given by
\[
f_X(x_1, x_2, x_3, x_4) = \frac{3}{4}\,(x_1^2 + x_2^2 + x_3^2 + x_4^2)\, I_{\mathcal{X}}(x_1, x_2, x_3, x_4),
\]
where $\mathcal{X} = \{(x_1, x_2, x_3, x_4) \in \mathbb{R}^4 : 0 < x_i < 1,\ i = 1, 2, 3, 4\}$. Calculate:

1. the marginal pdf of $(X_1, X_2)$;

2. the expectation $E(X_1 X_2)$;

3. the conditional pdf $f\!\left(x_3, x_4 \,\middle|\, x_1 = \tfrac{1}{3}, x_2 = \tfrac{2}{3}\right)$;

4. the probability $P\!\left(X_1 < \tfrac{1}{2},\ X_2 < \tfrac{3}{4},\ X_4 > \tfrac{1}{2}\right)$.
Solution:
1. Here we have to calculate the double integral of the joint pdf with respect
to x3 and x4 , that is,
\[
\begin{aligned}
f(x_1, x_2) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f_X(x_1, x_2, x_3, x_4)\,dx_3\,dx_4 \\
&= \int_0^1\!\int_0^1 \tfrac{3}{4}(x_1^2 + x_2^2 + x_3^2 + x_4^2)\,dx_3\,dx_4 \\
&= \tfrac{3}{4}(x_1^2 + x_2^2) + \tfrac{1}{2}.
\end{aligned}
\]
2. By definition of expectation we have
\[
\begin{aligned}
E(X_1 X_2) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_1 x_2\, f(x_1, x_2)\,dx_1\,dx_2 \\
&= \int_0^1\!\int_0^1 x_1 x_2 \left[\tfrac{3}{4}(x_1^2 + x_2^2) + \tfrac{1}{2}\right] dx_1\,dx_2 = \frac{5}{16}.
\end{aligned}
\]
3. By definition of a conditional pdf we have,
\[
f(x_3, x_4 \mid x_1, x_2) = \frac{f_X(x_1, x_2, x_3, x_4)}{f(x_1, x_2)}
= \frac{\tfrac{3}{4}(x_1^2 + x_2^2 + x_3^2 + x_4^2)}{\tfrac{3}{4}(x_1^2 + x_2^2) + \tfrac{1}{2}}
= \frac{x_1^2 + x_2^2 + x_3^2 + x_4^2}{x_1^2 + x_2^2 + \tfrac{2}{3}}.
\]
Hence,
\[
f\!\left(x_3, x_4 \,\middle|\, x_1 = \tfrac{1}{3}, x_2 = \tfrac{2}{3}\right)
= \frac{\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 + x_3^2 + x_4^2}{\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 + \tfrac{2}{3}}
= \frac{5}{11} + \frac{9}{11}x_3^2 + \frac{9}{11}x_4^2.
\]
4. Here we use (indirectly) the marginal pdf of $(X_1, X_2, X_4)$:
\[
\begin{aligned}
P\!\left(X_1 < \tfrac{1}{2},\ X_2 < \tfrac{3}{4},\ X_4 > \tfrac{1}{2}\right)
&= \int_{\frac{1}{2}}^{1}\!\int_{0}^{1}\!\int_{0}^{\frac{3}{4}}\!\int_{0}^{\frac{1}{2}} \tfrac{3}{4}(x_1^2 + x_2^2 + x_3^2 + x_4^2)\,dx_1\,dx_2\,dx_3\,dx_4 \\
&= \frac{171}{1024}.
\end{aligned}
\]
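The solution above can be verified numerically. The following short Python sketch (an addition to these notes, assuming numpy and scipy are available) checks parts 2 and 4 by numerical integration of the joint pdf.
\begin{verbatim}
# Numerical check of Example 1.37 using scipy's nquad.
import numpy as np
from scipy.integrate import nquad

def f(x1, x2, x3, x4):
    # joint pdf on the open unit cube (0,1)^4
    return 0.75 * (x1**2 + x2**2 + x3**2 + x4**2)

# Part 2: E(X1 X2) -- integrate x1*x2*f over the unit cube; expect 5/16.
e_x1x2, _ = nquad(lambda x1, x2, x3, x4: x1 * x2 * f(x1, x2, x3, x4),
                  [(0, 1)] * 4)
print(e_x1x2, 5 / 16)

# Part 4: P(X1 < 1/2, X2 < 3/4, X4 > 1/2); expect 171/1024.
p, _ = nquad(f, [(0, 0.5), (0, 0.75), (0, 1), (0.5, 1)])
print(p, 171 / 1024)
\end{verbatim}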
The following results will be very useful in the second part of this course. They
are extensions of Definition 1.18, Theorem 1.13, and Theorem 1.14, respectively,
to n random variables X1 , X2 , . . . , Xn .
Definition 1.22. Let X = (X1 , X2 , . . . , Xn )T denote a continuous n-dimensional
rv with joint pdf fX (x1 , x2 , . . . , xn ) and marginal pdfs fXi (xi ), i = 1, 2, . . . , n.
The random variables are called mutually independent (or just independent) if
\[
f_X(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i).
\]
In particular, all pairs $X_i, X_j$, $i \neq j$, are then independent (pairwise independence alone, however, does not imply mutual independence).
Example 1.38. Suppose that Yi ∼ Exp(λ) independently for i = 1, 2, . . . , n.
Then the joint pdf of Y = (Y1 , Y2 , . . . , Yn )T is
\[
f_Y(y_1, \dots, y_n) = \prod_{i=1}^{n} \lambda e^{-\lambda y_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} y_i}.
\]
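As a quick illustration, the following Python sketch (an addition to the notes, using an arbitrary rate $\lambda = 2$ and $n = 5$) evaluates the joint pdf of an iid exponential sample in both the product form and the closed form and confirms that they agree.
\begin{verbatim}
# Sanity check for Example 1.38: the product of Exp(lambda) densities
# equals lambda^n * exp(-lambda * sum(y)) at any sample point y.
import numpy as np

lam = 2.0                                    # arbitrary rate for the illustration
rng = np.random.default_rng(0)
y = rng.exponential(scale=1 / lam, size=5)   # a point y in (0, infinity)^5
product_form = np.prod(lam * np.exp(-lam * y))
closed_form = lam**len(y) * np.exp(-lam * y.sum())
print(np.isclose(product_form, closed_form))  # True
\end{verbatim}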
Theorem 1.21. Let $X_1, X_2, \dots, X_n$ be mutually independent rvs. Then, for $g_j(X_j)$, a function of $X_j$ only, $j = 1, 2, \dots, m$, $m \le n$, we have
\[
E\!\left( \prod_{j=1}^{m} g_j(X_j) \right) = \prod_{j=1}^{m} E\!\left( g_j(X_j) \right).
\]
Theorem 1.22. Let $X = (X_1, X_2, \dots, X_n)^T$ be a vector of mutually independent rvs with mgfs $M_{X_1}(t), M_{X_2}(t), \dots, M_{X_n}(t)$ and let $a_1, a_2, \dots, a_n$ and $b_1, b_2, \dots, b_n$ be fixed constants. Then the mgf of the random variable $Z = \sum_{i=1}^{n}(a_i X_i + b_i)$ is
\[
M_Z(t) = e^{t \sum_{i=1}^{n} b_i} \prod_{i=1}^{n} M_{X_i}(a_i t).
\]
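The following Python sketch illustrates Theorem 1.22 by Monte Carlo for independent Exp($\lambda$) variables, whose mgf is $M(t) = \lambda/(\lambda - t)$ for $t < \lambda$; the constants $\lambda$, $t$, $a_i$ and $b_i$ below are arbitrary choices made for the illustration only.
\begin{verbatim}
# Monte Carlo illustration of Theorem 1.22 with independent Exp(lambda) rvs.
import numpy as np

rng = np.random.default_rng(1)
lam, t = 3.0, 0.4
a = np.array([1.0, 2.0, 0.5])
b = np.array([0.2, -1.0, 0.3])

X = rng.exponential(scale=1 / lam, size=(200_000, 3))  # rows: iid copies of (X1,X2,X3)
Z = (a * X + b).sum(axis=1)                            # Z = sum_i (a_i X_i + b_i)

mc_mgf = np.exp(t * Z).mean()                          # estimate of E exp(tZ)
theory = np.exp(t * b.sum()) * np.prod(lam / (lam - a * t))  # e^{t sum b_i} prod M_{X_i}(a_i t)
print(mc_mgf, theory)
\end{verbatim}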
Exercise 1.23. Prove Theorem 1.22.
Example 1.39. Calculate the mean and the variance of the random variable $Y = \sum_{i=1}^{n} X_i$, where $X_i \sim \text{Gamma}(\alpha_i, \lambda)$ independently.

First, we will find the mgf of $Y$ and then generate the first and second moments using this mgf (Theorem 1.7). The $X_i$ are independent, hence, by Theorem 1.22 we have
\[
M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t).
\]
The pdf of a single rv $X \sim \text{Gamma}(\alpha, \lambda)$ is
\[
f_X(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}\, I_{[0,\infty)}(x).
\]
Thus, by the definition of the mgf we have
\[
\begin{aligned}
M_X(t) = E\!\left(e^{tX}\right)
&= \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} e^{tx}\, x^{\alpha-1} e^{-\lambda x}\,dx \\
&= \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-1} e^{-(\lambda - t)x}\,dx \\
&= \frac{\lambda^{\alpha}}{(\lambda - t)^{\alpha}}\,
\underbrace{\frac{(\lambda - t)^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-1} e^{-(\lambda - t)x}\,dx}_{=1,\ \text{(integral of a Gamma pdf)}} \\
&= \left(\frac{\lambda}{\lambda - t}\right)^{\alpha} = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}, \qquad t < \lambda.
\end{aligned}
\]
Hence,
\[
M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n} \left(1 - \frac{t}{\lambda}\right)^{-\alpha_i} = \left(1 - \frac{t}{\lambda}\right)^{-\sum_{i=1}^{n} \alpha_i}.
\]
This has the same form as the mgf of a Gamma random variable with parameters $\sum_{i=1}^{n} \alpha_i$ and $\lambda$, that is,
\[
Y \sim \text{Gamma}\!\left(\sum_{i=1}^{n} \alpha_i,\ \lambda\right).
\]
The mean and variance of a Gamma rv can be obtained by calculating the derivatives of the mgf at $t = 0$, see Theorem 1.7. For $X \sim \text{Gamma}(\alpha, \lambda)$ we have
\[
\begin{aligned}
M_X(t) &= \left(1 - \frac{t}{\lambda}\right)^{-\alpha}, \\
E\,X &= \frac{\alpha}{\lambda}, \\
E\,X^2 &= \frac{\alpha(\alpha + 1)}{\lambda^2}, \\
\operatorname{var}(X) &= E\,X^2 - [E\,X]^2 = \frac{\alpha}{\lambda^2}.
\end{aligned}
\]
Hence, for $Y \sim \text{Gamma}\left(\sum_{i=1}^{n} \alpha_i, \lambda\right)$ we get
\[
E\,Y = \frac{\sum_{i=1}^{n} \alpha_i}{\lambda} \qquad \text{and} \qquad \operatorname{var}(Y) = \frac{\sum_{i=1}^{n} \alpha_i}{\lambda^2}.
\]
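A simulation sketch of this result is given below (Python, with arbitrary shape parameters and rate; note that numpy parameterises the Gamma distribution by shape and scale $= 1/\text{rate}$).
\begin{verbatim}
# Simulation check of Example 1.39: if X_i ~ Gamma(alpha_i, lambda) independently,
# then Y = sum X_i ~ Gamma(sum alpha_i, lambda).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alphas, lam = np.array([1.5, 2.0, 0.7]), 2.5   # arbitrary illustrative values

X = rng.gamma(shape=alphas, scale=1 / lam, size=(100_000, 3))
Y = X.sum(axis=1)

print(Y.mean(), alphas.sum() / lam)       # mean     ~ sum(alpha_i)/lambda
print(Y.var(), alphas.sum() / lam**2)     # variance ~ sum(alpha_i)/lambda^2
# Kolmogorov-Smirnov comparison with Gamma(sum alpha_i, lambda):
print(stats.kstest(Y, stats.gamma(a=alphas.sum(), scale=1 / lam).cdf))
\end{verbatim}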
The following definition is often used when we consider realizations of rvs (sam-
ples) coming from populations having the same distribution.
Definition 1.23. The random variables X1 , X2 , . . . , Xn are identically distributed
if their distribution functions are identical, that is,
\[
F_{X_1}(x) = F_{X_2}(x) = \dots = F_{X_n}(x) \quad \text{for all } x \in \mathbb{R}.
\]
If they are also independent then we denote this briefly as IID, which means Independently, Identically Distributed. For example, the notation
\[
\{X_i\}_{i=1,2,\dots,n} \sim \text{IID}
\]
means that the variables $X_i$ are IID but the type of the distribution is not specified. We will often use IID normal rvs denoted by
\[
X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2), \qquad i = 1, 2, \dots, n.
\]
Exercise 1.24. Find the pdf of the random variable $X = \frac{1}{n}\sum_{i=1}^{n} X_i$, where $X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \dots, n$.
1.12.1 Expectation and Variance of Random Vectors
The expectation of a random vector X is a vector of expectations of its compo-
nents, that is,
\[
E(X) = E\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
= \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{pmatrix}
= \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \mu.
\]
The variance-covariance matrix of X is
\[
\begin{aligned}
V = \operatorname{Var}(X)
&= E\left[(X - E(X))(X - E(X))^T\right] \\
&= \begin{pmatrix}
\operatorname{var}(X_1) & \operatorname{cov}(X_1, X_2) & \dots & \operatorname{cov}(X_1, X_n) \\
\operatorname{cov}(X_2, X_1) & \operatorname{var}(X_2) & \dots & \operatorname{cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(X_n, X_1) & \operatorname{cov}(X_n, X_2) & \dots & \operatorname{var}(X_n)
\end{pmatrix}
\end{aligned}
\tag{1.20}
\]
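These definitions can be illustrated empirically: the sample mean vector and the sample covariance matrix of a large number of simulated realisations of $X$ approximate $\mu$ and $V$. The particular $\mu$ and $V$ in the Python sketch below are arbitrary choices for the illustration.
\begin{verbatim}
# Empirical illustration of E(X) and Var(X) for a random vector.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5])
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])   # symmetric, non-negative definite

X = rng.multivariate_normal(mu, V, size=200_000)  # each row is one realisation of X
print(X.mean(axis=0))                             # ~ mu
print(np.cov(X, rowvar=False))                    # ~ V; entry (i,j) = cov(X_i, X_j)
\end{verbatim}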
The following theorem shows a basic property of the variance-covariance matrix.
Theorem 1.23. If X is a random vector then its variance-covariance matrix V
is a non-negative definite matrix, that is, for any constant vector $b$ the quadratic form $b^T V b$ is non-negative.
Proof. For any constant vector $b \in \mathbb{R}^n$ we can construct a one-dimensional variable $Y = b^T X$ whose variance is
\[
\begin{aligned}
0 \le \operatorname{var}(Y) &= E\left[(Y - E(Y))^2\right] \\
&= E\left[(b^T X - E(b^T X))^2\right] \\
&= E\left[(b^T X - E(b^T X))(b^T X - E(b^T X))^T\right] \\
&= E\left[b^T (X - E(X))(X - E(X))^T b\right] \\
&= b^T E\left[(X - E(X))(X - E(X))^T\right] b \\
&= b^T \operatorname{Var}(X)\, b = b^T V b.
\end{aligned}
\]
That is, $b^T V b \ge 0$ and so $V$ is a non-negative definite matrix.
The proof of the above theorem shows that the variance of a linear combination $Y = \sum_{i=1}^{n} b_i X_i$ of the random variables $X_i$ is a quadratic form in the variance-covariance matrix of $X$ and the vector $b$ of the coefficients of the combination. More generally, if $X$ is an $n$-dimensional rv, $B$ is an $m \times n$ constant matrix and $a$ is a real $m \times 1$ vector, then the expectation and the variance of the random vector
\[
Y = a + BX
\]
are, respectively,
\[
E(Y) = a + B\,E(X) = a + B\mu,
\]
and
\[
\operatorname{Var}(Y) = B \operatorname{Var}(X)\, B^T.
\]
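A short simulation sketch of these two rules (with arbitrary $\mu$, $V$, $B$ and $a$, chosen only for illustration) is given below.
\begin{verbatim}
# Sketch of E(a + BX) = a + B mu and Var(a + BX) = B V B^T.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0, 0.5])
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
B = np.array([[1.0, 0.0, 2.0],
              [0.5, -1.0, 0.0]])   # m x n = 2 x 3
a = np.array([3.0, -1.0])

X = rng.multivariate_normal(mu, V, size=200_000)
Y = a + X @ B.T                    # rows: realisations of Y = a + B X

print(Y.mean(axis=0), a + B @ mu)             # expectations agree
print(np.cov(Y, rowvar=False), B @ V @ B.T)   # covariance matrices agree
\end{verbatim}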
The covariance of two random vectors, n-dimensional X and m-dimensional Y ,
is defined as
\[
\operatorname{Cov}(X, Y) = E\left[(X - E(X))(Y - E(Y))^T\right].
\]
It is an n × m-dimensional matrix.
1.12.2 Joint Moment Generating Function
Definition 1.24. Let $X = (X_1, X_2, \dots, X_n)^T$ be a random vector. We define the joint mgf as
\[
M_X(t) = E\left(e^{t^T X}\right),
\]
where $t = (t_1, t_2, \dots, t_n)^T$ is an $n$-dimensional argument of $M$.
As in the univariate case, there is a unique relationship between the joint
pdf and the joint mgf. The mgf related to a marginal distribution of a subset of
variables Xi1 , . . . , Xis can be obtained by setting tj = 0 for all j not in the set
{i1 , . . . , is }.
Note also that if the variables X1 , X2 , . . . , Xn are mutually independent, then the
joint mgf is a product of the marginal mgfs, that is
\[
M_X(t) = E\left(e^{t^T X}\right) = E\left(e^{\sum_{j=1}^{n} t_j X_j}\right)
= E\left(\prod_{j=1}^{n} e^{t_j X_j}\right) = \prod_{j=1}^{n} M_{X_j}(t_j).
\]
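This factorisation can be illustrated by Monte Carlo. The Python sketch below uses independent normal components, whose marginal mgfs are $\exp(\mu_j t_j + \sigma_j^2 t_j^2/2)$; the parameter values are arbitrary and serve only as an example.
\begin{verbatim}
# Monte Carlo sketch: for independent N(mu_j, sigma_j^2) components, the joint
# mgf E exp(t'X) matches the product of the marginal normal mgfs.
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([1.0, 0.5, 1.5])
t = np.array([0.3, -0.2, 0.1])

X = rng.normal(loc=mu, scale=sigma, size=(500_000, 3))    # independent components
joint_mgf = np.exp(X @ t).mean()                          # estimate of E exp(t'X)
product = np.prod(np.exp(mu * t + 0.5 * sigma**2 * t**2)) # product of marginal mgfs
print(joint_mgf, product)
\end{verbatim}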
Another useful property of the joint mgf is given in the following theorem.
Theorem 1.24. Let X = (X1 , X2 , . . . , Xn )T be a random vector. If the joint mgf
of X can be written as a product of some functions gj (tj ), j = 1, 2, . . . , n, that is
\[
M_X(t) = \prod_{j=1}^{n} g_j(t_j),
\]
then the variables X1 , X2 , . . . , Xn are independent.
Proof. Let $t_i = 0$ for all $i \neq j$. Then, the marginal mgf $M_{X_j}(t_j)$ is
\[
M_{X_j}(t_j) = g_j(t_j) \prod_{i \neq j} g_i(0).
\]
Also, note that if $t_i = 0$ for all $i = 1, 2, \dots, n$, then
\[
M_X(t) = E\left(e^{\sum_{j=1}^{n} t_j X_j}\right) = E\left(e^0\right) = 1.
\]
This gives
\[
1 = M_X(t) = \prod_{j=1}^{n} g_j(0) \quad \Rightarrow \quad \prod_{i \neq j} g_i(0) = \frac{1}{g_j(0)}.
\]
Therefore,
\[
M_{X_j}(t_j) = \frac{g_j(t_j)}{g_j(0)},
\]
and hence
\[
M_X(t) = \prod_{j=1}^{n} g_j(t_j) = \prod_{j=1}^{n} g_j(0) M_{X_j}(t_j) = 1 \times \prod_{j=1}^{n} M_{X_j}(t_j).
\]
By the uniqueness of the relationship between joint mgfs and joint pdfs, this means that the joint pdf can be written as a product of the marginal pdfs, where the marginal mgf of $X_j$ equals $M_{X_j}(t_j) = \frac{g_j(t_j)}{g_j(0)}$. Hence, the random variables $X_1, X_2, \dots, X_n$ are independent.
1.12.3 Transformations of Random Vectors
Let X = (X1 , X2 , . . . , Xn )T be a continuous random vector and let g : Rn → Rn
be a one-to-one and onto function denoted by
\[
g(x) = \left(g_1(x), g_2(x), \dots, g_n(x)\right)^T,
\]
where x = (x1 , x2 , . . . , xn )T and gi : Rn → R. Then, for a transformed random
vector Y = g(X) we have the following result.
Theorem 1.25. The density of Y = g(X) is given by
\[
f_Y(y) = f_X\!\left(h(y)\right) \left|J_h(y)\right|,
\]
where $h(y) = g^{-1}(y)$ and $\left|J_h(y)\right|$ denotes the absolute value of the Jacobian
\[
J_h(y) = \det \frac{\partial}{\partial y} h(y) = \det
\begin{pmatrix}
\frac{\partial}{\partial y_1} h_1(y) & \frac{\partial}{\partial y_1} h_2(y) & \dots & \frac{\partial}{\partial y_1} h_n(y) \\
\frac{\partial}{\partial y_2} h_1(y) & \frac{\partial}{\partial y_2} h_2(y) & \dots & \frac{\partial}{\partial y_2} h_n(y) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial}{\partial y_n} h_1(y) & \frac{\partial}{\partial y_n} h_2(y) & \dots & \frac{\partial}{\partial y_n} h_n(y)
\end{pmatrix}.
\]
Another useful form of the Jacobian is
\[
J_h(y) = \left[J_g\!\left(h(y)\right)\right]^{-1},
\]
where
\[
J_g(x) = \det \frac{\partial}{\partial x} g(x).
\]
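The transformation theorem can be checked numerically on a concrete example. In the Python sketch below (an illustration added to the notes), $X_1, X_2$ are iid Exp(1) and $g(x) = (x_1 + x_2,\ x_1/(x_1 + x_2))$, so that $h(y) = (y_1 y_2,\ y_1(1 - y_2))$ and $|J_h(y)| = y_1$; the probability of a rectangle computed from the resulting density $f_Y$ is compared with a Monte Carlo estimate.
\begin{verbatim}
# Numerical check of Theorem 1.25 for a concrete transformation.
import numpy as np
from scipy.integrate import dblquad

def f_X(x1, x2):
    # joint pdf of two iid Exp(1) variables
    return np.exp(-x1 - x2) * (x1 > 0) * (x2 > 0)

def f_Y(y1, y2):
    # Theorem 1.25 with h(y) = (y1*y2, y1*(1 - y2)) and |J_h(y)| = y1
    return f_X(y1 * y2, y1 * (1 - y2)) * abs(y1)

# Monte Carlo estimate of P(Y1 < 1, Y2 < 0.5) ...
rng = np.random.default_rng(6)
x = rng.exponential(size=(500_000, 2))
y1, y2 = x.sum(axis=1), x[:, 0] / x.sum(axis=1)
print(np.mean((y1 < 1) & (y2 < 0.5)))

# ... compared with the integral of f_Y over (0,1) x (0,0.5).
# dblquad integrates func(inner, outer): inner variable y2, outer variable y1.
print(dblquad(lambda y2, y1: f_Y(y1, y2), 0, 1, 0, 0.5)[0])
\end{verbatim}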
Exercise 1.25. Let A be a non-singular n × n real matrix and let X be an n-
dimensional random vector. Show that the linearly transformed random variable
Y = AX has the joint pdf given by
\[
f_Y(y) = \frac{1}{|\det A|}\, f_X\!\left(A^{-1} y\right).
\]
1.12.4 Multivariate Normal Distribution
A random vector $X$ has a multivariate normal distribution, written $X \sim N_n(\mu, V)$, if its joint pdf can be written as
\[
f_X(x_1, \dots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{\det V}} \exp\!\left\{ -\frac{1}{2} (x - \mu)^T V^{-1} (x - \mu) \right\},
\]
where the mean is
\[
\mu = (\mu_1, \dots, \mu_n)^T,
\]
and the variance-covariance matrix $V$ has the form (1.20).
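The density formula can be checked against scipy's implementation; in the Python sketch below $\mu$, $V$ and the evaluation point $x$ are arbitrary illustrative values.
\begin{verbatim}
# Evaluate the multivariate normal density from the formula above and compare
# with scipy.stats.multivariate_normal.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0, 0.5])
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
x = np.array([0.7, -1.5, 1.0])

d = x - mu
quad_form = d @ np.linalg.solve(V, d)          # (x - mu)' V^{-1} (x - mu)
pdf_formula = np.exp(-0.5 * quad_form) / np.sqrt((2 * np.pi)**3 * np.linalg.det(V))
print(pdf_formula, multivariate_normal(mean=mu, cov=V).pdf(x))
\end{verbatim}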
Exercise 1.26. Use the result from Exercise 1.25 to show that if $X \sim N_n(\mu, V)$ then $Y = AX$ has an $n$-dimensional normal distribution with expectation $A\mu$ and variance-covariance matrix $A V A^T$.
Lemma 1.3. If $X \sim N_n(\mu, V)$, $B$ is an $m \times n$ matrix, and $a$ is a real $m \times 1$ vector, then the random vector
\[
Y = a + BX
\]
is also multivariate normal with
\[
E(Y) = a + B\,E(X) = a + B\mu,
\]
and the variance-covariance matrix
\[
V_Y = B V B^T.
\]
Note that taking $B = b^T$, where $b$ is an $n \times 1$ vector, and $a = 0$, we obtain
\[
Y = b^T X = b_1 X_1 + \dots + b_n X_n,
\]
and
\[
Y \sim N(b^T \mu,\ b^T V b).
\]
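This closing remark can be illustrated by simulation; the Python sketch below uses arbitrary values of $\mu$, $V$ and $b$.
\begin{verbatim}
# Simulation check: for X ~ N_n(mu, V) and fixed b, Y = b'X is N(b'mu, b'Vb).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu = np.array([1.0, -2.0, 0.5])
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
b = np.array([0.5, 1.0, -2.0])

Y = rng.multivariate_normal(mu, V, size=200_000) @ b
print(Y.mean(), b @ mu)                      # ~ b'mu
print(Y.var(), b @ V @ b)                    # ~ b'Vb
print(stats.kstest(Y, stats.norm(loc=b @ mu, scale=np.sqrt(b @ V @ b)).cdf))
\end{verbatim}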