
Stata User Guide Release 18 - Multivariate Statistics, Mvtest Normality

This document summarizes the mvtest normality command in Stata, which performs tests for univariate, bivariate, and multivariate normality. It provides an example using Fisher's iris data to test whether four features (petal length, petal width, sepal length, sepal width) are normally distributed within the Iris setosa species. The results show that petal width fails the univariate normality tests, that every bivariate pair including petal width is rejected at the 5% level, and that only the Doornik-Hansen test rejects multivariate normality. Computing time is roughly linear in the number of observations for the Doornik-Hansen and Mardia kurtosis tests and roughly quadratic for the Henze-Zirkler and Mardia skewness tests.


Title
mvtest normality — Multivariate normality tests

Description Quick start Menu


Syntax Options Remarks and examples
Stored results Methods and formulas Acknowledgment
References Also see

Description
mvtest normality performs tests for univariate, bivariate, and multivariate normality.
See [MV] mvtest for more multivariate tests.

Quick start
Doornik–Hansen omnibus test for multivariate normality for v1, v2, v3, and v4
    mvtest normality v1 v2 v3 v4

Also show Henze–Zirkler’s consistent test, Mardia’s multivariate kurtosis test, and Mardia’s multivariate
skewness test
    mvtest normality v1 v2 v3 v4, ///
        stats(dhansen hzirkler kurtosis skewness)

Same as above
    mvtest normality v1 v2 v3 v4, stats(all)

Also show Doornik–Hansen test for bivariate normality for each pair of variables
    mvtest normality v1 v2 v3 v4, stats(all) bivariate

As above, but show univariate normality tests from sktest instead of the bivariate tests
    mvtest normality v1 v2 v3 v4, stats(all) univariate

Show all multivariate, bivariate, and univariate tests of normality for v1, v2, v3, and v4
    mvtest normality v1 v2 v3 v4, stats(all) bivariate univariate

Menu
Statistics > Multivariate analysis > MANOVA, multivariate regression, and related > Multivariate test of means,
covariances, and normality


Syntax

    mvtest normality varlist [if] [in] [weight] [, options]

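    options         Description
    ----------------------------------------------------------------------
    Options
      univariate    display tests for univariate normality (sktest)
      bivariate     display tests for bivariate normality (Doornik–Hansen)
      stats(stats)  statistics to be computed
    ----------------------------------------------------------------------

    stats           Description
    ----------------------------------------------------------------------
      dhansen       Doornik–Hansen omnibus test; the default
      hzirkler      Henze–Zirkler’s consistent test
      kurtosis      Mardia’s multivariate kurtosis test
      skewness      Mardia’s multivariate skewness test
      all           all tests listed here
    ----------------------------------------------------------------------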

bootstrap, by, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
fweights are allowed; see [U] 11.1.6 weight.

Options

univariate specifies that tests for univariate normality be displayed, as obtained from sktest; see
[R] sktest.
bivariate specifies that the Doornik–Hansen (2008) test for bivariate normality be displayed for
each pair of variables.
stats(stats) specifies one or more test statistics for multivariate normality. Multiple stats are separated
by white space. The following stats are available:
dhansen produces the Doornik–Hansen (2008) omnibus test.
hzirkler produces Henze–Zirkler’s (1990) consistent test.
kurtosis produces the test based on Mardia’s (1970) measure of multivariate kurtosis.
skewness produces the test based on Mardia’s (1970) measure of multivariate skewness.
all is a convenient shorthand for stats(dhansen hzirkler kurtosis skewness).

Remarks and examples


Univariate and multivariate tests of normality are provided by the mvtest normality command.

Example 1
The classic Fisher iris data from Anderson (1935) consists of four features measured on 50 samples
from each of three iris species. The four features are the length and width of the sepal and petal. The
three species are Iris setosa, Iris versicolor, and Iris virginica. We hypothesize that these features
might be normally distributed within species, though they are likely not normally distributed across
species. We will examine the Iris setosa data.
. use http://www.stata-press.com/data/r14/iris
(Iris data)
. kdensity petlen if iris==1, name(petlen, replace) title(petal length)
. kdensity petwid if iris==1, name(petwid, replace) title(petal width)
. kdensity sepwid if iris==1, name(sepwid, replace) title(sepal width)
. kdensity seplen if iris==1, name(seplen, replace) title(sepal length)
. graph combine petlen petwid seplen sepwid, title("Iris setosa data")

[Figure: "Iris setosa data". Four kernel density plots (Epanechnikov kernel), each with density on the vertical axis:
petal length in cm (bandwidth = 0.0610), petal width in cm (bandwidth = 0.0305),
sepal length in cm (bandwidth = 0.1220), and sepal width in cm (bandwidth = 0.1525).]


We perform all multivariate, univariate, and bivariate tests of normality.


. mvtest norm pet* sep* if iris==1, bivariate univariate stats(all)

Test for univariate normality

                                                        joint
    Variable   Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)   Prob>chi2
    -----------------------------------------------------------------
      petlen      0.7403         0.1447          2.36        0.3074
      petwid      0.0010         0.0442         12.03        0.0024
      seplen      0.7084         0.8157          0.19        0.9075
      sepwid      0.8978         0.1627          2.07        0.3553

Doornik-Hansen test for bivariate normality

    Pair of variables       chi2    df   Prob>chi2
    -----------------------------------------------
      petlen    petwid     17.47     4     0.0016
                seplen      5.76     4     0.2177
                sepwid      8.50     4     0.0748
      petwid    seplen     14.97     4     0.0048
                sepwid     19.15     4     0.0007
      seplen    sepwid      5.92     4     0.2049

Test for multivariate normality

    Mardia mSkewness =  3.079721   chi2(20) =  27.860   Prob>chi2 = 0.1128
    Mardia mKurtosis =  26.53766   chi2(1)  =   1.677   Prob>chi2 = 0.1953
    Henze-Zirkler    =  .9488453   chi2(1)  =   2.707   Prob>chi2 = 0.0999
    Doornik-Hansen                 chi2(8)  =  24.414   Prob>chi2 = 0.0020

From the univariate tests of normality, petwid does not appear to be normally distributed: p-values
of 0.0010 for skewness, 0.0442 for kurtosis, and 0.0024 for the joint univariate test. The univariate
tests of the other three variables do not lead to a rejection of the null hypothesis of normality.
The bivariate tests of normality show a rejection (at the 5% level) of the null hypothesis of
bivariate normality for all pairs of variables that include petwid. Other pairings fail to reject the null
hypothesis of bivariate normality.
Of the four multivariate normality tests, only the Doornik–Hansen test rejects the null hypothesis
of multivariate normality at the 5% level, with a p-value of 0.0020.
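
The 5% decision rule used in the discussion above can be written out explicitly. The following is a minimal Python sketch, not part of Stata: the p-values are copied from the output above, and the names `bivariate_p`, `multivariate_p`, and `rejected` are hypothetical.

```python
# p-values copied from the bivariate Doornik-Hansen output above
bivariate_p = {
    ("petlen", "petwid"): 0.0016,
    ("petlen", "seplen"): 0.2177,
    ("petlen", "sepwid"): 0.0748,
    ("petwid", "seplen"): 0.0048,
    ("petwid", "sepwid"): 0.0007,
    ("seplen", "sepwid"): 0.2049,
}

# p-values copied from the multivariate output above
multivariate_p = {
    "Mardia mSkewness": 0.1128,
    "Mardia mKurtosis": 0.1953,
    "Henze-Zirkler": 0.0999,
    "Doornik-Hansen": 0.0020,
}

ALPHA = 0.05  # significance level used in the text


def rejected(pvalues, alpha=ALPHA):
    """Return the keys whose p-value falls below alpha, in input order."""
    return [key for key, p in pvalues.items() if p < alpha]


rejected_pairs = rejected(bivariate_p)
rejected_tests = rejected(multivariate_p)
print(rejected_pairs)
print(rejected_tests)
```

Every rejected pair contains petwid, and the only multivariate test below the threshold is Doornik-Hansen, which matches the reading given in the text.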

The Doornik–Hansen (2008) test and Mardia’s (1970) test for multivariate kurtosis take computing
time roughly proportional to the number of observations. In contrast, the computing times of the
Henze–Zirkler (1990) test and of Mardia’s (1970) test for multivariate skewness are roughly proportional
to the square of the number of observations, because both statistics sum over all pairs of observations.

Stored results
mvtest normality stores the following in r():
Scalars
    r(p_dh)          significance of chi2_dh (stats(dhansen))
    r(df_dh)         degrees of freedom of chi2_dh (stats(dhansen))
    r(chi2_dh)       Doornik–Hansen statistic (stats(dhansen))
    r(rank_hz)       rank of covariance matrix (stats(hzirkler))
    r(p_hz)          two-sided significance of hz (stats(hzirkler))
    r(z_hz)          normal variate associated with hz (stats(hzirkler))
    r(V_hz)          expected variance of log(hz) (stats(hzirkler))
    r(E_hz)          expected value of log(hz) (stats(hzirkler))
    r(hz)            Henze–Zirkler discrepancy statistic (stats(hzirkler))
    r(rank_mkurt)    rank of covariance matrix (stats(kurtosis))
    r(p_mkurt)       significance of Mardia mKurtosis test (stats(kurtosis))
    r(z_mkurt)       normal variate associated with Mardia mKurtosis (stats(kurtosis))
    r(chi2_mkurt)    chi-squared of Mardia mKurtosis (stats(kurtosis))
    r(mkurt)         Mardia mKurtosis test statistic (stats(kurtosis))
    r(rank_mskew)    rank for Mardia mSkewness test (stats(skewness))
    r(p_mskew)       significance of Mardia mSkewness test (stats(skewness))
    r(df_mskew)      degrees of freedom of Mardia mSkewness test (stats(skewness))
    r(chi2_mskew)    chi-squared of Mardia mSkewness test (stats(skewness))
    r(mskew)         Mardia mSkewness test statistic (stats(skewness))
Matrices
    r(U_dh)          matrix with the skewness and kurtosis of orthonormalized variables
                     (used in the Doornik–Hansen test): b1, b2, z(b1), and z(b2) (stats(dhansen))
    r(Btest)         bivariate test statistics (bivariate)
    r(Utest)         univariate test statistics (univariate)

Methods and formulas


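There are $N$ independent $k$-variate observations, $\mathbf{x}_i$, $i = 1, \ldots, N$. Let $\mathbf{X}$ denote the $N \times k$
matrix of observations. We wish to test whether these observations are multivariate normal distributed,
$\mathrm{MVN}_k(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. The sample mean is $\bar{\mathbf{x}} = (1/N) \sum_i \mathbf{x}_i$, and the sample covariance matrix is
$\mathbf{S} = (1/N) \sum_i (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'$.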
Methods and formulas are presented under the following headings:
Mardia mSkewness and mKurtosis
Henze–Zirkler
Doornik–Hansen

Mardia mSkewness and mKurtosis


Mardia (1970) defined multivariate skewness, $b_{1,k}$, and kurtosis, $b_{2,k}$, as

$$
b_{1,k} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} g_{ij}^3
\qquad \text{and} \qquad
b_{2,k} = \frac{1}{N} \sum_{i=1}^{N} g_{ii}^2
$$

where $g_{ij} = (\mathbf{x}_i - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_j - \bar{\mathbf{x}})$. The test statistic

$$
z_1 = \frac{(k + 1)(N + 1)(N + 3)}{6\{(N + 1)(k + 1) - 6\}} \, b_{1,k}
$$

is approximately $\chi^2$ distributed with $k(k + 1)(k + 2)/6$ degrees of freedom. The test statistic

$$
z_2 = \frac{b_{2,k} - k(k + 2)}{\sqrt{8k(k + 2)/N}}
$$

is approximately $N(0, 1)$ distributed. Also see Rencher and Christensen (2012, 108); Mardia, Kent,
and Bibby (1979, 20–22); and Seber (1984, 148–149).
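
As a check on the two test statistics above, the following Python sketch (not Stata code) recomputes the Mardia lines of the example output from the reported mSkewness and mKurtosis values with N = 50 and k = 4. The function names are hypothetical. The chi-squared tail helper uses a closed form that holds only for even degrees of freedom, which covers this example (20 degrees of freedom) but not every k. The chi2(1) value that the output reports for mKurtosis is taken here to be the square of z2.

```python
import math


def mardia_skewness_test(b1k, n, k):
    """Return (z1, df): scaled skewness statistic and its chi-squared df."""
    z1 = (k + 1) * (n + 1) * (n + 3) / (6 * ((n + 1) * (k + 1) - 6)) * b1k
    df = k * (k + 1) * (k + 2) // 6  # product of three consecutive integers, divisible by 6
    return z1, df


def mardia_kurtosis_test(b2k, n, k):
    """Return z2, the approximately standard normal kurtosis statistic."""
    return (b2k - k * (k + 2)) / math.sqrt(8 * k * (k + 2) / n)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0  # j = 0 term of the finite sum
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def two_sided_normal_p(z):
    """Two-sided p-value 2 * Phi(-|z|) for a standard normal variate."""
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported in the example output (Iris setosa, N = 50, k = 4)
z1, df = mardia_skewness_test(3.079721, n=50, k=4)
z2 = mardia_kurtosis_test(26.53766, n=50, k=4)
p_skew = chi2_sf_even_df(z1, df)
p_kurt = two_sided_normal_p(z2)
print(round(z1, 3), df, round(p_skew, 4))
print(round(z2 ** 2, 3), round(p_kurt, 4))
```

Under these assumptions the sketch gives the same chi2(20), chi2(1), and Prob>chi2 values as the two Mardia rows of the output, to the precision shown there.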

Henze–Zirkler
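The Henze–Zirkler (1990) test statistic, under the assumption that $\mathbf{S}$ is nonsingular, is

$$
\begin{aligned}
T ={} & \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \exp\left\{ -\frac{\beta^2}{2} (\mathbf{x}_i - \mathbf{x}_j)' \mathbf{S}^{-1} (\mathbf{x}_i - \mathbf{x}_j) \right\} \\
& - 2(1 + \beta^2)^{-k/2} \sum_{i=1}^{N} \exp\left\{ -\frac{\beta^2}{2(1 + \beta^2)} (\mathbf{x}_i - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) \right\} \\
& + N(1 + 2\beta^2)^{-k/2}
\end{aligned}
$$

where

$$
\beta = \frac{1}{\sqrt{2}} \left\{ \frac{N(2k + 1)}{4} \right\}^{1/(k+4)}
$$

As $N \to \infty$, the first two moments of $T$ are given by

$$
E(T) = 1 - (1 + 2\beta^2)^{-k/2} \left\{ 1 + \frac{k\beta^2}{1 + 2\beta^2} + \frac{k(k + 2)\beta^4}{2(1 + 2\beta^2)^2} \right\}
$$

$$
\begin{aligned}
\mathrm{Var}(T) ={} & 2(1 + 4\beta^2)^{-k/2} + 2(1 + 2\beta^2)^{-k} \left\{ 1 + \frac{2k\beta^4}{(1 + 2\beta^2)^2} + \frac{3k(k + 2)\beta^8}{4(1 + 2\beta^2)^4} \right\} \\
& - 4w^{-k/2} \left\{ 1 + \frac{3k\beta^4}{2w} + \frac{k(k + 2)\beta^8}{2w^2} \right\}
\end{aligned}
$$

where $w = (1 + \beta^2)(1 + 3\beta^2)$.
Henze–Zirkler suggest obtaining a p-value from the assumption, supported by a series of simulations,
that $T$ is approximately lognormal distributed. Thus let $VZ = \ln\{1 + \mathrm{Var}(T)/E(T)^2\}$ and
$EZ = \ln\{E(T)\} - VZ/2$. The transformation is then $Z = \{\ln(T) - EZ\}/\sqrt{VZ}$. The p-value of $Z$ is computed
as $p = 2\Phi(-|Z|)$, where $\Phi(\cdot)$ is the standard normal cumulative distribution function.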
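
The lognormal approximation above can be checked against the example output. The following Python sketch (not Stata code; the function name `henze_zirkler_z` is hypothetical) takes the reported Henze-Zirkler statistic T = .9488453 with N = 50 and k = 4 and computes Z and its two-sided p-value. It assumes that the chi2(1) value shown in the output is the square of Z.

```python
import math


def henze_zirkler_z(t, n, k):
    """Return (Z, p) from the lognormal approximation to the distribution of T."""
    beta = (1 / math.sqrt(2)) * (n * (2 * k + 1) / 4) ** (1 / (k + 4))
    b2, b4, b8 = beta ** 2, beta ** 4, beta ** 8
    a = 1 + 2 * b2
    w = (1 + b2) * (1 + 3 * b2)

    # Asymptotic mean and variance of T
    mean_t = 1 - a ** (-k / 2) * (1 + k * b2 / a + k * (k + 2) * b4 / (2 * a ** 2))
    var_t = (
        2 * (1 + 4 * b2) ** (-k / 2)
        + 2 * a ** (-k) * (1 + 2 * k * b4 / a ** 2 + 3 * k * (k + 2) * b8 / (4 * a ** 4))
        - 4 * w ** (-k / 2) * (1 + 3 * k * b4 / (2 * w) + k * (k + 2) * b8 / (2 * w ** 2))
    )

    # Parameters of log(T) under the lognormal approximation
    vz = math.log(1 + var_t / mean_t ** 2)
    ez = math.log(mean_t) - vz / 2

    z = (math.log(t) - ez) / math.sqrt(vz)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * Phi(-|Z|)
    return z, p


# Henze-Zirkler statistic reported in the example output (N = 50, k = 4)
z_hz, p_hz = henze_zirkler_z(0.9488453, n=50, k=4)
print(round(z_hz ** 2, 3), round(p_hz, 4))
```

With these inputs the squared Z and the p-value agree with the Henze-Zirkler row of the output (chi2(1) = 2.707, Prob>chi2 = 0.0999).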

Doornik–Hansen
For the Doornik–Hansen (2008) test, the multivariate observations are transformed, then the
univariate skewness and kurtosis of each transformed variable are computed, and then these are
combined into an approximate $\chi^2$ statistic.

Let $\mathbf{V}$ be a diagonal matrix with $i$th diagonal element equal to $S_{ii}^{-1/2}$, where $S_{ii}$ is the $i$th diagonal
element of $\mathbf{S}$. $\mathbf{C} = \mathbf{V}\mathbf{S}\mathbf{V}$ is then the correlation matrix. Let $\mathbf{H}$ be a matrix with columns equal to
the eigenvectors of $\mathbf{C}$, and let $\boldsymbol{\Lambda}$ be a diagonal matrix with the corresponding eigenvalues. Let $\breve{\mathbf{X}}$ be
the centered version of $\mathbf{X}$, that is, $\bar{\mathbf{x}}$ subtracted from each row. The data are then transformed using
$\dot{\mathbf{X}} = \breve{\mathbf{X}} \mathbf{V} \mathbf{H} \boldsymbol{\Lambda}^{-1/2} \mathbf{H}'$.
The univariate skewness and kurtosis for each column of $\dot{\mathbf{X}}$ are then computed. The general
formula for univariate skewness is $\sqrt{b_1} = m_3/m_2^{3/2}$ and kurtosis is $b_2 = m_4/m_2^2$, where
$m_p = (1/N) \sum_{i=1}^{N} (x_i - \bar{x})^p$. Let $\dot{x}_i$ denote the $i$th observation from the selected column of $\dot{\mathbf{X}}$. Because
by construction the mean of $\dot{x}$ is zero and the variance $m_2$ is one, the formulas simplify to $\sqrt{b_1} = m_3$
and $b_2 = m_4$, where $m_p = (1/N) \sum_{i=1}^{N} \dot{x}_i^p$.
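
The general moment formulas above can be illustrated with a short Python sketch (not Stata code). The function names and the five data values are hypothetical and chosen only so that the result is easy to verify by hand. The moments use the divisor N, as in the definition of $m_p$.

```python
def central_moment(values, p):
    """p-th central moment with divisor N."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** p for v in values) / n


def skewness_kurtosis(values):
    """Return (sqrt_b1, b2) = (m3 / m2**1.5, m4 / m2**2)."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    m4 = central_moment(values, 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical symmetric sample: m2 = 2, m3 = 0, m4 = 6.8
sqrt_b1, b2 = skewness_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
print(sqrt_b1, b2)
```

For this symmetric sample the skewness is zero and the kurtosis is 6.8/4 = 1.7. For a column of the transformed data, where the mean is zero and $m_2$ is one, the same function reduces to returning $m_3$ and $m_4$.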

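The univariate skewness, $\sqrt{b_1}$, is transformed into an approximately normal variate, $z_1$, as in
D’Agostino (1970):

$$
z_1 = \delta \log\left( y + \sqrt{1 + y^2} \right)
$$

where

$$
\begin{aligned}
y &= \sqrt{b_1} \left\{ \frac{(\omega^2 - 1)(N + 1)(N + 3)}{12(N - 2)} \right\}^{1/2} \\
\delta &= \left( \log \sqrt{\omega^2} \right)^{-1/2} \\
\omega^2 &= -1 + \sqrt{2(\beta - 1)} \\
\beta &= \frac{3(N^2 + 27N - 70)(N + 1)(N + 3)}{(N - 2)(N + 5)(N + 7)(N + 9)}
\end{aligned}
$$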

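The univariate kurtosis, $b_2$, is transformed from a gamma variate into a $\chi^2$-variate and then into
a standard normal variable, $z_2$, using the Wilson–Hilferty (1931) transform:

$$
z_2 = \sqrt{9\alpha} \left\{ \left( \frac{\chi}{2\alpha} \right)^{1/3} - 1 + \frac{1}{9\alpha} \right\}
$$

where

$$
\begin{aligned}
\chi &= 2f(b_2 - 1 - b_1) \\
\alpha &= a + b_1 c \\
f &= \frac{(N + 5)(N + 7)(N^3 + 37N^2 + 11N - 313)}{12\delta} \\
c &= \frac{(N - 7)(N + 5)(N + 7)(N^2 + 2N - 5)}{6\delta} \\
a &= \frac{(N - 2)(N + 5)(N + 7)(N^2 + 27N - 70)}{6\delta} \\
\delta &= (N - 3)(N + 1)(N^2 + 15N - 4)
\end{aligned}
$$

Here $\delta$ is redefined and is distinct from the $\delta$ used in the skewness transform.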

The $z_1$ and $z_2$ associated with the columns of $\dot{\mathbf{X}}$ are collected into vectors $\mathbf{Z}_1$ and $\mathbf{Z}_2$. The statistic
$\mathbf{Z}_1'\mathbf{Z}_1 + \mathbf{Z}_2'\mathbf{Z}_2$ is approximately $\chi^2$ distributed with $2k$ degrees of freedom.
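
The two univariate transforms and their combination can be sketched in Python as follows. This is not Stata's implementation, and all function names are hypothetical. The denominators 6δ for a and c follow Doornik and Hansen (2008) and should be read as an assumption of this sketch. The identity log(y + sqrt(1 + y^2)) = asinh(y) is used for numerical stability. The inputs are the per-column skewness and kurtosis of the transformed data, and the values in the usage lines are illustrative only.

```python
import math


def dh_skewness_z(sqrt_b1, n):
    """D'Agostino-type transform of univariate skewness into z1."""
    beta = (3 * (n ** 2 + 27 * n - 70) * (n + 1) * (n + 3)
            / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    omega2 = -1 + math.sqrt(2 * (beta - 1))
    delta = 1 / math.sqrt(math.log(math.sqrt(omega2)))
    y = sqrt_b1 * math.sqrt((omega2 - 1) * (n + 1) * (n + 3) / (12 * (n - 2)))
    return delta * math.asinh(y)  # asinh(y) = log(y + sqrt(1 + y^2))


def dh_kurtosis_z(sqrt_b1, b2, n):
    """Wilson-Hilferty transform of univariate kurtosis into z2."""
    b1 = sqrt_b1 ** 2
    delta = (n - 3) * (n + 1) * (n ** 2 + 15 * n - 4)  # not the skewness delta
    a = (n - 2) * (n + 5) * (n + 7) * (n ** 2 + 27 * n - 70) / (6 * delta)
    c = (n - 7) * (n + 5) * (n + 7) * (n ** 2 + 2 * n - 5) / (6 * delta)
    f = (n + 5) * (n + 7) * (n ** 3 + 37 * n ** 2 + 11 * n - 313) / (12 * delta)
    alpha = a + b1 * c
    chi = 2 * f * (b2 - 1 - b1)
    return math.sqrt(9 * alpha) * ((chi / (2 * alpha)) ** (1 / 3) - 1 + 1 / (9 * alpha))


def dh_statistic(columns, n):
    """Combine per-column (sqrt_b1, b2) pairs into (statistic, df) with df = 2k."""
    stat = sum(dh_skewness_z(s, n) ** 2 + dh_kurtosis_z(s, b, n) ** 2
               for s, b in columns)
    return stat, 2 * len(columns)


# Illustrative (made-up) skewness and kurtosis values for k = 2 columns, N = 50
stat, df = dh_statistic([(0.1, 3.0), (-0.2, 2.8)], n=50)
print(stat, df)
```

A zero skewness maps to z1 = 0, and the transform is odd in the skewness, so only the magnitude of the skewness affects the final statistic.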

Acknowledgment
An earlier implementation of the Doornik and Hansen (2008) test is the omninorm package of
Baum and Cox (2007).

References
Anderson, E. 1935. The irises of the Gaspe Peninsula. Bulletin of the American Iris Society 59: 2–5.
Baum, C. F., and N. J. Cox. 2007. omninorm: Stata module to calculate omnibus test for univariate/multivariate
normality. Boston College Department of Economics, Statistical Software Components S417501.
http://ideas.repec.org/c/boc/bocode/s417501.html.
D’Agostino, R. B. 1970. Transformation to normality of the null distribution of g1. Biometrika 57: 679–681.
Doornik, J. A., and H. Hansen. 2008. An omnibus test for univariate and multivariate normality. Oxford Bulletin of
Economics and Statistics 70: 927–939.
Henze, N., and B. Zirkler. 1990. A class of invariant consistent tests for multivariate normality. Communications in
Statistics—Theory and Methods 19: 3595–3617.
Mardia, K. V. 1970. Measures of multivariate skewness and kurtosis with applications. Biometrika 57: 519–530.
Mardia, K. V., J. T. Kent, and J. M. Bibby. 1979. Multivariate Analysis. London: Academic Press.
Rencher, A. C., and W. F. Christensen. 2012. Methods of Multivariate Analysis. 3rd ed. Hoboken, NJ: Wiley.
Seber, G. A. F. 1984. Multivariate Observations. New York: Wiley.
Wilson, E. B., and M. M. Hilferty. 1931. The distribution of chi-square. Proceedings of the National Academy of
Sciences 17: 684–688.

Also see
[R] sktest — Skewness and kurtosis test for normality
[R] swilk — Shapiro – Wilk and Shapiro – Francia tests for normality
