
STAT7005 Multivariate Methods

Chapter 2: Multivariate Normal and Related Distributions

Tony Fung

University of Hong Kong


Department of Statistics and Actuarial Science

HKU



Contents

1 Multivariate normal distribution

2 Estimating µ and Σ

3 Wishart distribution

4 Assessing normality assumption

5 Transformations to near normality



1. Multivariate normal distribution

Multivariate normal distribution

A random vector x is said to have a multivariate normal


distribution (multinormal distribution) if every linear combination
of its components has a univariate normal distribution.
Suppose a = (a1, a2)′ and x = (x1, x2)′. The multinormality of x
requires that a′x = a1x1 + a2x2 is univariate normal for all a1 and a2.





1. Multivariate normal distribution

Multivariate normal distribution (cont.)


The multivariate normal distribution has many convenient properties, listed below.
1 (Linearity) If x is multinormal, then for any constant vector a,

a′x ∼ N(a′µ, a′Σa).

2 (Moment generating function) The mgf of a multinormal random vector x with mean vector µ and covariance matrix Σ is given by

Mx(t) = exp(t′µ + ½ t′Σt).

Thus, a multinormal distribution is completely identified by its mean µ and covariance Σ. We use the notation x ∼ Np(µ, Σ).



1. Multivariate normal distribution

Multivariate normal distribution (cont.)


3 (Additivity) Suppose x ∼ Np(µ1, Σ1) and y ∼ Np(µ2, Σ2). If x and y are independent, then

x + y ∼ Np(µ1 + µ2, Σ1 + Σ2).

4 (Multiple linear combinations) If x ∼ Np(µ, Σ), then for any constant m × p matrix A and constant m × 1 vector d,

Ax + d ∼ Nm(Aµ + d, AΣA′).

5 (Standardization) Let Σ be positive definite (i.e. non-singular, or invertible). Then x ∼ Np(µ, Σ) iff there exists a non-singular matrix B and z ∼ Np(0, I) such that

x = µ + Bz.

In this case, Σ = BB′.
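
As an illustration of property 5 (not part of the original notes), one can simulate draws from Np(µ, Σ) by taking B to be the Cholesky factor of Σ. A minimal sketch, assuming NumPy is available and using purely illustrative values of µ and Σ:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative mean vector and positive definite covariance matrix.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])

# Property 5: x = mu + B z with z ~ N_p(0, I) and Sigma = B B'.
# The (lower-triangular) Cholesky factor is one valid choice of B.
B = np.linalg.cholesky(Sigma)
z = rng.standard_normal((10_000, 3))   # rows are independent N_p(0, I) draws
x = mu + z @ B.T                       # rows are N_p(mu, Sigma) draws

print(x.mean(axis=0))                  # should be close to mu
print(np.cov(x, rowvar=False))         # should be close to Sigma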



1. Multivariate normal distribution

Multivariate normal distribution (cont.)


6 (Density) The pdf of x ∼ Np(µ, Σ), where Σ is positive definite, is given by

f(x) = (2π)^(-p/2) |Σ|^(-1/2) exp{-½ (x − µ)′Σ^(-1)(x − µ)}.

7 (Partition) Let x = (x1′, x2′)′, µ = (µ1′, µ2′)′, and partition Σ conformably into blocks Σ11, Σ12, Σ21, Σ22, where x1 consists of the first q components and x2 consists of the last (p − q) components.
  1 x1 and x2 are independent iff Cov(x1, x2) = Σ12 = 0.
  2 x1 ∼ Nq(µ1, Σ11) and x2 ∼ Np−q(µ2, Σ22).
  3 (x1 − Σ12 Σ22^(-1) x2) is independent of x2 and is distributed as Nq(µ1 − Σ12 Σ22^(-1) µ2, Σ11 − Σ12 Σ22^(-1) Σ21).
  4 Given x2, x1 ∼ Nq(µ1 + Σ12 Σ22^(-1)(x2 − µ2), Σ11 − Σ12 Σ22^(-1) Σ21).
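
The conditional-distribution formulas in property 7 translate directly into a few lines of linear algebra. A minimal sketch (illustrative numbers only, not from the notes; NumPy assumed) computing the conditional mean and covariance of x1 given an observed x2:

import numpy as np

# Partition a 3-dimensional normal into x1 (first q = 2 components) and x2 (last component).
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])
q = 2
mu1, mu2 = mu[:q], mu[q:]
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

x2_obs = np.array([3.5])   # hypothetical observed value of x2

# Property 7.4: x1 | x2 ~ N_q(mu1 + S12 S22^{-1} (x2 - mu2), S11 - S12 S22^{-1} S21)
cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2_obs - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean)
print(cond_cov)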



1. Multivariate normal distribution

Multivariate normal distribution (cont.)


8 (Quadratic form) Suppose E(x) = µ and Var(x) = Σ, and let A be a p × p symmetric matrix. Then

E(x′Ax) = µ′Aµ + tr(AΣ).

9 (“Squaring” a standard normal) Suppose x ∼ Np(µ, Σ) with Σ positive definite. Then

(x − µ)′Σ^(-1)(x − µ) ∼ χ²(p).

10 Let x ∼ Np(µ, Σ) with Σ positive definite. Then, for any m × p matrix A and n × p matrix B,
  1 Ax is independent of Bx iff AΣB′ = 0.
  2 x′Ax (A symmetric p × p) is independent of Bx iff BΣA = 0.
  3 x′Ax and x′Bx (A and B both symmetric p × p) are independent iff AΣB = 0.
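
Property 8 is easy to check by simulation. A small sketch (illustrative values, not from the notes; NumPy assumed) comparing the theoretical value of E(x′Ax) with a Monte Carlo estimate:

import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 0.3],
              [0.3, 2.0]])           # symmetric

# Property 8: E(x'Ax) = mu'A mu + tr(A Sigma)
theoretical = mu @ A @ mu + np.trace(A @ Sigma)

x = rng.multivariate_normal(mu, Sigma, size=200_000)
empirical = np.mean(np.einsum("ij,jk,ik->i", x, A, x))   # x_i' A x_i for each row, then averaged

print(theoretical, empirical)        # the two numbers should be close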



2. Estimating µ and Σ

Maximum likelihood estimation

Suppose x1, . . . , xn are i.i.d. Np(µ, Σ), where Σ is positive definite. The sample mean vector x̄ and the sample covariance matrix S (as previously defined) are unbiased estimators of µ and Σ. By the Law of Large Numbers, these sample quantities converge in probability to µ and Σ, i.e.,

x̄ →p µ;   S →p Σ.

The likelihood function for the random sample is

L(µ, Σ) = (2π)^(-np/2) |Σ|^(-n/2) exp{-½ ∑_{i=1}^{n} (xi − µ)′Σ^(-1)(xi − µ)}.

It can be shown that the MLEs of µ and Σ are

µ̂ = x̄;   Σ̂ = W/n = (n − 1)S/n.
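
In code, the MLEs take one line each once the data are arranged as an n × p matrix. A minimal sketch (NumPy assumed; the data here are simulated purely for illustration):

import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)   # n x p data matrix

x_bar = X.mean(axis=0)               # MLE of mu
S = np.cov(X, rowvar=False)          # unbiased sample covariance (divisor n - 1)
Sigma_hat = (n - 1) / n * S          # MLE of Sigma (divisor n)

print(x_bar)
print(Sigma_hat)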
2. Estimating µ and Σ

Maximum likelihood estimation (cont.)

Properties of the MLEs:

1 x̄ and S are sufficient statistics for Np (µ, Σ). That means x̄ and S
contain all information needed to make inference on µ and Σ.
2 x̄ ∼ Np(µ, (1/n)Σ) and (n − 1)S ∼ Wp(n − 1, Σ), a central Wishart
distribution (the multivariate analogue of the chi-squared distribution), which is defined
in the next section.
3 µ̂ is unbiased but Σ̂ is biased (like the univariate case). However, S is
unbiased for Σ.
4 The MLEs possess an invariance property: if MLE of θ is θ̂, then MLE
of φ = h(θ) is φ̂ = h(θ̂), provided that h(·) is a one-to-one function.



3. Wishart distribution

Definition

The Wishart distribution is a distribution on matrices, and can be thought of as a multivariate generalization of the chi-squared distribution.
Definition: Suppose xi (i = 1, . . . , k) are independent Np(µi, Σ), and let X be the k × p matrix with rows xi′. Define the symmetric p × p matrix

V = ∑_{i=1}^{k} xi xi′ = X′X.

Then V is said to follow a p-dimensional Wishart distribution, denoted by Wp(k, Σ, Ψ), where:
I k is the degrees of freedom;
I Σ is the scaling matrix;
I Ψ = ∑_{i=1}^{k} µi µi′ is the (p × p symmetric) noncentrality matrix.
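
The definition can be mirrored directly in code: stack the xi as rows of X and form X′X. For the central case, SciPy also provides the Wishart distribution itself. A minimal sketch (NumPy/SciPy assumed; Σ and k are illustrative choices, not from the notes):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

p, k = 3, 10
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])

# Central case: x_i ~ N_p(0, Sigma), so V = X'X ~ W_p(k, Sigma).
X = rng.multivariate_normal(np.zeros(p), Sigma, size=k)   # k x p matrix with rows x_i'
V = X.T @ X

# scipy.stats.wishart covers the central Wishart (density evaluation and sampling).
W = stats.wishart(df=k, scale=Sigma)
print(W.logpdf(V))                 # log-density of the realized V
print(W.rvs(random_state=rng))     # an independent W_p(k, Sigma) draw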



3. Wishart distribution

Definition (cont.)

When Ψ = 0, we call it a central Wishart distribution, denoted simply by Wp(k, Σ).
Note that when p = 1, the pdf of the central Wishart distribution reduces to that of the central chi-squared distribution.
The pdf of the central Wishart distribution is given by

f(V) = c(p, k) |Σ|^(-k/2) |V|^((k−p−1)/2) exp{-½ tr(Σ^(-1)V)},

where c(p, k) = [2^(kp/2) π^(p(p−1)/4) Γ(k/2) Γ((k−1)/2) · · · Γ((k−p+1)/2)]^(-1).



3. Wishart distribution

Properties

Some key properties of the Wishart distribution are listed below:


1 (Additivity) If V1 ∼ Wp(k1, Σ, Ψ1) and V2 ∼ Wp(k2, Σ, Ψ2) are independent, then V1 + V2 ∼ Wp(k1 + k2, Σ, Ψ1 + Ψ2).
2 If V ∼ Wp(k, Σ, Ψ), then AVA′ ∼ Wq(k, AΣA′, AΨA′) for any given q × p matrix A.
3 (Getting chi-squared)
  1 If V ∼ Wp(k, Σ, Ψ) and a is a constant vector, then a′Va/a′Σa ∼ χ²(k, a′Ψa), a non-central chi-squared r.v. In particular, for the ith diagonal element of V, vii/σii ∼ χ²(k, Ψii).
  2 If y is any random vector independent of V ∼ Wp(k, Σ), then y′Vy/y′Σy ∼ χ²(k) and is independent of y.
  3 If y is any random vector independent of V ∼ Wp(k, Σ) with k > p, then the ratio y′Σ^(-1)y/y′V^(-1)y ∼ χ²(k − p + 1) and is independent of y.



3. Wishart distribution

Properties (cont.)

4 (Normal random sample) Let x1 , . . . , xn be a random sample from


Np (µ, Σ).
1 x̄ ∼ Np(µ, (1/n)Σ).
2 (n − 1)S (= W, the CSSP matrix) ∼ Wp (n − 1, Σ).
3 x̄ and S are independent.

Several other properties are given in the lecture notes. The Wishart
distribution will be used for the construction of other random variables in
the next few chapters.



4. Assessing normality assumption

Univariate checks

A number of the statistical analyses we will introduce in the following chapters depend on the assumption that the observations are (multivariate) normally distributed.
How do we check normality when there are multiple variables?
A simple method is to check each variable for univariate normality (necessary for multinormality but not sufficient):

1 Q-Q plot (quantile against quantile plot) for normal distribution:


I Sample quantiles are plotted against the theoretical quantiles of a
standard normal distribution.
I A straight line indicates univariate normality.
I Non-linearity may indicate a need to transform the variable.
I One may use a P-P plot as well (sample cdf vs theoretical cdf).



4. Assessing normality assumption

Univariate checks (cont.)

[Figure: two normal Q-Q plots, labelled "OK" and "May need transformation"]


2 Shapiro-Wilk W test — a statistic for checking normality (together
with p-value) conveniently obtained from statistical software.
3 Kolmogorov-Smirnov-Lilliefors (KSL) test for large samples —
comparing empirical and fitted normal cdf.
4 Check for (almost) zero skewness and excess kurtosis.
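
As a sketch of how these univariate checks might be run in practice (SciPy and Matplotlib assumed; the sample below is deliberately non-normal and purely illustrative):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.lognormal(mean=0.0, sigma=0.7, size=200)   # right-skewed, so not normal

# Normal Q-Q plot: sample quantiles against standard normal quantiles.
stats.probplot(x, dist="norm", plot=plt)
plt.title("Normal Q-Q plot")
plt.show()

# Shapiro-Wilk W test: a small p-value suggests a departure from normality.
W, p_value = stats.shapiro(x)
print(f"Shapiro-Wilk W = {W:.3f}, p-value = {p_value:.4f}")

# A Lilliefors-type test (KS test with estimated mean and variance) is available in
# statsmodels (statsmodels.stats.diagnostic.lilliefors); skewness and excess kurtosis
# can be checked with stats.skew(x) and stats.kurtosis(x).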



4. Assessing normality assumption

Multivariate checks

When n − p is large enough, we make use of property (9) of Section 2.1: if x ∼ Np(µ, Σ), then

(x − µ)′Σ^(-1)(x − µ) ∼ χ²(p).

Check whether the squared generalized distance (Mahalanobis


distance) as defined below follows a chi-squared distribution by a Q-Q
plot (necessary and sufficient condition for very large sample size).
Mahalanobis distance is similar to Euclidean distance, but takes into
account the ellipsoidal contour of the covariance matrix.



4. Assessing normality assumption

Multivariate checks (cont.)

Steps to calculate distance and produce Q-Q plot:

1 Define the squared generalized distance as d²_i = (xi − x̄)′S^(-1)(xi − x̄), i = 1, . . . , n.
2 Order d²_1, d²_2, . . . , d²_n as d²_(1) ≤ d²_(2) ≤ · · · ≤ d²_(n).
3 Plot χ²_(i) vs d²_(i), where χ²_(i) is the 100(i − ½)/n percentile of the χ²(p) distribution.
4 A (roughly) straight line indicates multivariate normality (see the sketch of these steps below).
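
A minimal sketch of these steps (NumPy, SciPy and Matplotlib assumed; the data are simulated purely for illustration):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def chisq_qq_plot(X):
    # Chi-squared Q-Q plot of squared Mahalanobis distances for an n x p data matrix X.
    n, p = X.shape
    x_bar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    diff = X - x_bar
    # Step 1: d_i^2 = (x_i - x_bar)' S^{-1} (x_i - x_bar)
    d2 = np.sum(diff @ np.linalg.inv(S) * diff, axis=1)
    # Step 2: order the squared distances
    d2_sorted = np.sort(d2)
    # Step 3: chi-squared(p) percentiles at probabilities (i - 1/2)/n
    probs = (np.arange(1, n + 1) - 0.5) / n
    chi2_quantiles = stats.chi2.ppf(probs, df=p)
    # Step 4: a roughly straight line indicates multivariate normality
    plt.scatter(chi2_quantiles, d2_sorted)
    plt.xlabel("chi-squared(p) quantiles")
    plt.ylabel("ordered squared Mahalanobis distances")
    plt.show()

rng = np.random.default_rng(5)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=100)   # multinormal example data
chisq_qq_plot(X)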



4. Assessing normality assumption

Multivariate checks (cont.)

[Figure: two chi-squared Q-Q plots, labelled "OK" and "May need transformation"]

We may check the principal components (PCs) of the data — each


PC is a linear combination of the variables; if the full data set is
multivariate normal, the PCs must be univariate normal. Hence this is
only a necessary condition (unless n is large enough).
More about principal component analysis (PCA) in Chapter 8.



5. Transformations to near normality

Common transformations

The standard procedure is to transform each variable to near


normality.
This does not guarantee that the transformed data follow a multivariate normal
distribution; one may need to check multinormality again afterwards.
Common univariate transformations:
I For right-skewed data (e.g., log-normal), log(x) can work well.
I For count data (e.g., Poisson), we may use √x.
I For a binomial proportion, we may use the arcsine transformation arcsin(√x), or the logit transformation log(x/(1 − x)).
More systematic approach?



5. Transformations to near normality

Box-Cox transformation

The Box-Cox transformation (applied in a univariate fashion) has a parameter λ. The transformed value of xi is denoted x_i^[λ], where

x_i^[λ] = (x_i^λ − 1)/λ   for λ ≠ 0,  i = 1, . . . , n;
x_i^[λ] = log x_i         for λ = 0,  i = 1, . . . , n.

Essentially, this performs a power transformation.



5. Transformations to near normality

Box-Cox transformation (cont.)


“Convenient” choice of λ:
Power, λ Transformation
3 cubic
2 square
1 no transform
0.5 square-root
1/3 cubic-root
0 log
−1/3 inverse of cubic-root
−0.5 inverse of square-root
−1 inverse
−2 inverse of square
−3 inverse of cubic



5. Transformations to near normality

Box-Cox transformation (cont.)

Since the x^[λ]'s are on different scales for different λ, we instead consider the following "standardized" Box-Cox transformation, so that we can compare the quality of the transformation across different λ's:

x_i^[λ] = (x_i^λ − 1) / (λ [GM(x)]^(λ−1))   if λ ≠ 0,  i = 1, . . . , n;
x_i^[λ] = GM(x) log x_i                     if λ = 0,  i = 1, . . . , n,

where GM(x) = (x_1 x_2 · · · x_n)^(1/n) is the geometric mean of the observations x_1, x_2, . . . , x_n. We choose the value of λ that minimizes the sum of squares of residuals (equivalently, the sample variance) after transformation.
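
A minimal sketch of this procedure for one variable (NumPy/SciPy assumed; the data are simulated and purely illustrative). It grid-searches λ by minimizing the sample variance of the standardized transform, and compares the result with scipy.stats.boxcox, which chooses λ by maximizing the profile log-likelihood instead:

import numpy as np
from scipy import stats

def boxcox_standardized(x, lam):
    # Standardized Box-Cox transform of a positive sample x for a given lambda.
    gm = np.exp(np.mean(np.log(x)))              # geometric mean GM(x)
    if lam == 0:
        return gm * np.log(x)
    return (x**lam - 1.0) / (lam * gm**(lam - 1.0))

rng = np.random.default_rng(6)
x = rng.lognormal(mean=1.0, sigma=0.5, size=200)   # positive, right-skewed sample

# Pick the lambda whose standardized transform has the smallest sample variance.
grid = np.linspace(-2.0, 2.0, 81)
variances = [np.var(boxcox_standardized(x, lam), ddof=1) for lam in grid]
lam_best = grid[int(np.argmin(variances))]
print("lambda from grid search:", lam_best)

# scipy.stats.boxcox maximizes the profile log-likelihood over lambda.
_, lam_mle = stats.boxcox(x)
print("lambda from scipy.stats.boxcox:", lam_mle)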



5. Transformations to near normality

Box-Cox transformation (cont.)

[Figure: the data before transformation and after transformation with λ = 2]


This procedure is repeated for other variables (with potentially
different λ’s). After all variables have been transformed, we can
calculate the squared Mahalanobis distances and construct the
chi-squared Q-Q plot to check if the transformed data set is closer to
multivariate normal.

