Stata User Guide Release 18 - Multivariate Statistics, Mvtest Normality
mvtest normality — Multivariate normality tests
Description
mvtest normality performs tests for univariate, bivariate, and multivariate normality.
See [MV] mvtest for more multivariate tests.
Quick start
Doornik–Hansen omnibus test for multivariate normality for v1, v2, v3, and v4
mvtest normality v1 v2 v3 v4
Also show Henze–Zirkler’s consistent test, Mardia’s multivariate kurtosis test, and Mardia’s multivariate
skewness test
mvtest normality v1 v2 v3 v4, ///
stats(dhansen hzirkler kurtosis skewness)
Same as above
mvtest normality v1 v2 v3 v4, stats(all)
Also show Doornik–Hansen test for bivariate normality for each pair of variables
mvtest normality v1 v2 v3 v4, stats(all) bivariate
As above, but show univariate normality tests from sktest instead of the bivariate tests
mvtest normality v1 v2 v3 v4, stats(all) univariate
Show all multivariate, bivariate, and univariate tests of normality for v1, v2, v3, and v4
mvtest normality v1 v2 v3 v4, stats(all) bivariate univariate
Menu
Statistics > Multivariate analysis > MANOVA, multivariate regression, and related > Multivariate test of means,
covariances, and normality
Syntax
mvtest normality varlist [if] [in] [weight] [, options]
options Description
Options
univariate display tests for univariate normality (sktest)
bivariate display tests for bivariate normality (Doornik–Hansen)
stats(stats) statistics to be computed
stats Description
dhansen Doornik–Hansen omnibus test; the default
hzirkler Henze–Zirkler’s consistent test
kurtosis Mardia’s multivariate kurtosis test
skewness Mardia’s multivariate skewness test
all all tests listed here
bootstrap, by, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
fweights are allowed; see [U] 11.1.6 weight.
Options
univariate specifies that tests for univariate normality be displayed, as obtained from sktest; see
[R] sktest.
bivariate specifies that the Doornik–Hansen (2008) test for bivariate normality be displayed for
each pair of variables.
stats(stats) specifies one or more test statistics for multivariate normality. Multiple stats are separated
by white space. The following stats are available:
dhansen produces the Doornik–Hansen (2008) omnibus test.
hzirkler produces Henze–Zirkler’s (1990) consistent test.
kurtosis produces the test based on Mardia’s (1970) measure of multivariate kurtosis.
skewness produces the test based on Mardia’s (1970) measure of multivariate skewness.
all is a convenient shorthand for stats(dhansen hzirkler kurtosis skewness).
Example 1
The classic Fisher iris data from Anderson (1935) consists of four features measured on 50 samples
from each of three iris species. The four features are the length and width of the sepal and petal. The
three species are Iris setosa, Iris versicolor, and Iris virginica. We hypothesize that these features
might be normally distributed within species, though they are likely not normally distributed across
species. We will examine the Iris setosa data.
. use http://www.stata-press.com/data/r14/iris
(Iris data)
. kdensity petlen if iris==1, name(petlen, replace) title(petal length)
. kdensity petwid if iris==1, name(petwid, replace) title(petal width)
. kdensity sepwid if iris==1, name(sepwid, replace) title(sepal width)
. kdensity seplen if iris==1, name(seplen, replace) title(sepal length)
. graph combine petlen petwid seplen sepwid, title("Iris setosa data")
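[Figure: kernel density estimates titled "petal length", "petal width", "sepal length", and "sepal width" (y axis: Density), combined under the title "Iris setosa data"]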
Variable   Pr(Skewness)   Pr(Kurtosis)   joint adj chi2(2)   Prob>chi2
From the univariate tests of normality, petwid does not appear to be normally distributed: p-values
of 0.0010 for skewness, 0.0442 for kurtosis, and 0.0024 for the joint univariate test. The univariate
tests of the other three variables do not lead to a rejection of the null hypothesis of normality.
The bivariate tests of normality show a rejection (at the 5% level) of the null hypothesis of
bivariate normality for all pairs of variables that include petwid. Other pairings fail to reject the null
hypothesis of bivariate normality.
Of the four multivariate normality tests, only the Doornik–Hansen test rejects the null hypothesis
of multivariate normality (p-value = 0.0020).
The Doornik–Hansen (2008) test and Mardia’s (1970) test for multivariate kurtosis take computing
time roughly proportional to the number of observations. In contrast, the computing times of the
Henze–Zirkler (1990) test and Mardia’s (1970) test for multivariate skewness are roughly proportional
to the square of the number of observations.
Stored results
mvtest normality stores the following in r():
Scalars
r(p_dh)          significance of chi2_dh (stats(dhansen))
r(df_dh)         degrees of freedom of chi2_dh (stats(dhansen))
r(chi2_dh)       Doornik–Hansen statistic (stats(dhansen))
r(rank_hz)       rank of covariance matrix (stats(hzirkler))
r(p_hz)          two-sided significance of hz (stats(hzirkler))
r(z_hz)          normal variate associated with hz (stats(hzirkler))
r(V_hz)          expected variance of log(hz) (stats(hzirkler))
r(E_hz)          expected value of log(hz) (stats(hzirkler))
r(hz)            Henze–Zirkler discrepancy statistic (stats(hzirkler))
r(rank_mkurt)    rank of covariance matrix (stats(kurtosis))
r(p_mkurt)       significance of Mardia mKurtosis test (stats(kurtosis))
r(z_mkurt)       normal variate associated with Mardia mKurtosis (stats(kurtosis))
r(chi2_mkurt)    chi-squared of Mardia mKurtosis (stats(kurtosis))
r(mkurt)         Mardia mKurtosis test statistic (stats(kurtosis))
r(rank_mskew)    rank for Mardia mSkewness test (stats(skewness))
r(p_mskew)       significance of Mardia mSkewness test (stats(skewness))
r(df_mskew)      degrees of freedom of Mardia mSkewness test (stats(skewness))
r(chi2_mskew)    chi-squared of Mardia mSkewness test (stats(skewness))
r(mskew)         Mardia mSkewness test statistic (stats(skewness))
Matrices
r(U_dh)          matrix with the skewness and kurtosis of orthonormalized variables
                 (used in the Doornik–Hansen test): b1, b2, z(b1), and z(b2) (stats(dhansen))
r(Btest)         bivariate test statistics (bivariate)
r(Utest)         univariate test statistics (univariate)
Mardia’s (1970) measures of multivariate skewness and kurtosis are

$$b_{1,k} = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} g_{ij}^3 \qquad \text{and} \qquad b_{2,k} = \frac{1}{N}\sum_{i=1}^{N} g_{ii}^2$$

where $g_{ij} = (x_i - \bar{x})' S^{-1} (x_j - \bar{x})$. The test statistic

$$z_1 = \frac{(k+1)(N+1)(N+3)}{6\{(N+1)(k+1) - 6\}}\, b_{1,k}$$

is approximately $\chi^2$ distributed with $k(k+1)(k+2)/6$ degrees of freedom. The test statistic

$$z_2 = \frac{b_{2,k} - k(k+2)}{\sqrt{8k(k+2)/N}}$$

is approximately $N(0,1)$ distributed. Also see Rencher and Christensen (2012, 108); Mardia, Kent,
and Bibby (1979, 20–22); and Seber (1984, 148–149).
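As a cross-check of the formulas above, the following Python sketch (standard library only) computes $b_{1,k}$, $b_{2,k}$, $z_1$, and $z_2$ directly. It is not Stata's implementation: the function names are hypothetical, and the sketch assumes that $S$ is the covariance matrix with divisor $N$ and is nonsingular. The double loop over $g_{ij}$ is why the skewness measure costs on the order of $N^2$ operations, while the kurtosis measure needs only the $N$ diagonal terms $g_{ii}$.

```python
import math


def mean_and_cov(X):
    """Column means and covariance matrix; the divisor N is an assumption."""
    n, k = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(k)]
    S = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in X) / n
          for b in range(k)] for a in range(k)]
    return mu, S


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting; A must be nonsingular."""
    k = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(A)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[k:] for row in M]


def mardia(X):
    """Hypothetical helper: Mardia's b_{1,k}, b_{2,k}, z1 (chi-squared) and z2 (normal)."""
    n, k = len(X), len(X[0])
    mu, S = mean_and_cov(X)
    Sinv = inverse(S)
    D = [[row[j] - mu[j] for j in range(k)] for row in X]          # x_i - xbar
    W = [[sum(Sinv[a][b] * d[b] for b in range(k)) for a in range(k)] for d in D]

    def g(i, j):                                                    # g_ij
        return sum(D[i][a] * W[j][a] for a in range(k))

    b1 = sum(g(i, j) ** 3 for i in range(n) for j in range(n)) / n ** 2  # N^2 terms
    b2 = sum(g(i, i) ** 2 for i in range(n)) / n                        # N terms
    z1 = (k + 1) * (n + 1) * (n + 3) / (6 * ((n + 1) * (k + 1) - 6)) * b1
    z2 = (b2 - k * (k + 2)) / math.sqrt(8 * k * (k + 2) / n)
    return {
        "b1": b1, "b2": b2,
        "z1": z1, "df_skew": k * (k + 1) * (k + 2) // 6,
        # two-sided normal tail, i.e. P(chi2(1) > z2**2)
        "z2": z2, "p_kurt": math.erfc(abs(z2) / math.sqrt(2)),
    }
```

With one variable ($k = 1$), $b_{1,1}$ and $b_{2,1}$ reduce to the familiar univariate $m_3^2/m_2^3$ and $m_4/m_2^2$, and both measures are unchanged by nonsingular affine transformations of the data.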
Henze–Zirkler
The Henze–Zirkler (1990) test, under the assumption that S is nonsingular, is
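$$T = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} \exp\left\{-\frac{\beta^2}{2}(x_i - x_j)' S^{-1} (x_i - x_j)\right\} - 2(1+\beta^2)^{-k/2}\sum_{i=1}^{N} \exp\left\{-\frac{\beta^2}{2(1+\beta^2)}(x_i - \bar{x})' S^{-1} (x_i - \bar{x})\right\} + N(1+2\beta^2)^{-k/2}$$

where

$$\beta = \frac{1}{\sqrt{2}}\left\{\frac{N(2k+1)}{4}\right\}^{1/(k+4)}$$

$$E(T) = 1 - (1+2\beta^2)^{-k/2}\left\{1 + \frac{k\beta^2}{1+2\beta^2} + \frac{k(k+2)\beta^4}{2(1+2\beta^2)^2}\right\}$$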
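A minimal Python sketch of $T$, $\beta$, and $E(T)$ as written above. The function names are hypothetical, and $S$ is assumed to be the covariance matrix with divisor $N$ (the divisor is not stated in this section). The pairwise sum over $(i, j)$ is the source of the $N^2$ computing time mentioned earlier. Turning $T$ into a p-value requires further moments that are not reproduced here.

```python
import math


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting; A must be nonsingular."""
    k = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(A)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[k:] for row in M]


def hz_beta(n, k):
    """Smoothing parameter beta as a function of N and k."""
    return (n * (2 * k + 1) / 4) ** (1 / (k + 4)) / math.sqrt(2)


def henze_zirkler(X):
    """Hypothetical helper returning (T, E(T), beta); S uses divisor N (assumption)."""
    n, k = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(k)]
    D = [[row[j] - mu[j] for j in range(k)] for row in X]        # x_i - xbar
    S = [[sum(d[a] * d[b] for d in D) / n for b in range(k)] for a in range(k)]
    Sinv = inverse(S)

    def quad(u):                                                  # u' S^{-1} u
        return sum(u[a] * Sinv[a][b] * u[b] for a in range(k) for b in range(k))

    beta = hz_beta(n, k)
    b2 = beta * beta
    # x_i - x_j equals (x_i - xbar) - (x_j - xbar); N^2 terms
    pairs = sum(math.exp(-b2 / 2 * quad([p - q for p, q in zip(D[i], D[j])]))
                for i in range(n) for j in range(n)) / n
    centre = sum(math.exp(-b2 / (2 * (1 + b2)) * quad(d)) for d in D)
    T = pairs - 2 * (1 + b2) ** (-k / 2) * centre + n * (1 + 2 * b2) ** (-k / 2)
    ET = 1 - (1 + 2 * b2) ** (-k / 2) * (
        1 + k * b2 / (1 + 2 * b2) + k * (k + 2) * b2 ** 2 / (2 * (1 + 2 * b2) ** 2))
    return T, ET, beta
```

Because $T$ is $N$ times a weighted squared distance between characteristic functions, it should never be negative, and like Mardia's measures it is invariant to nonsingular affine transformations of the data.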
Doornik–Hansen
For the Doornik–Hansen (2008) test, the multivariate observations are transformed, the univariate
skewness and kurtosis of each transformed variable are computed, and these are then combined into
an approximately $\chi^2$ distributed statistic.
Let $V$ be a diagonal matrix with $i$th diagonal element equal to $S_{ii}^{-1/2}$, where $S_{ii}$ is the $i$th diagonal
element of $S$. Then $C = VSV$ is the correlation matrix. Let $H$ be a matrix with columns equal to
the eigenvectors of $C$, and let $\Lambda$ be a diagonal matrix with the corresponding eigenvalues. Let $\breve{X}$ be
the centered version of $X$, that is, $X$ with $\bar{x}$ subtracted from each row. The data are then transformed using

$$\dot{X} = \breve{X} V H \Lambda^{-1/2} H'$$
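The transformation can be sketched in pure Python as follows. The eigen-decomposition of $C$ uses cyclic Jacobi rotations, a technique chosen here only to avoid a NumPy dependency, and the function names are hypothetical. $S$ is assumed to have divisor $N$; under that assumption the columns of $\dot{X}$ have mean zero and covariance equal to the identity, up to rounding.

```python
import math


def jacobi_eigen(A, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of H) of a symmetric matrix
    via cyclic Jacobi rotations."""
    k = len(A)
    A = [row[:] for row in A]
    H = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(max_sweeps):
        if sum(A[p][q] ** 2 for p in range(k) for q in range(k) if p != q) < 1e-24:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if abs(A[p][q]) < 1e-18:
                    continue
                # rotation angle chosen so that the (p, q) element becomes zero
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for r in range(k):          # A <- A P   (columns p, q)
                    A[r][p], A[r][q] = c * A[r][p] - s * A[r][q], s * A[r][p] + c * A[r][q]
                for r in range(k):          # A <- P' A  (rows p, q)
                    A[p][r], A[q][r] = c * A[p][r] - s * A[q][r], s * A[p][r] + c * A[q][r]
                for r in range(k):          # H <- H P   (accumulate eigenvectors)
                    H[r][p], H[r][q] = c * H[r][p] - s * H[r][q], s * H[r][p] + c * H[r][q]
    return [A[i][i] for i in range(k)], H


def dh_transform(X):
    """Return X-dot = X-breve V H Lambda^{-1/2} H' (S with divisor N: an assumption)."""
    n, k = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(k)]
    Xc = [[row[j] - mu[j] for j in range(k)] for row in X]              # X-breve
    S = [[sum(r[a] * r[b] for r in Xc) / n for b in range(k)] for a in range(k)]
    v = [S[j][j] ** -0.5 for j in range(k)]                             # diag(V)
    C = [[v[a] * S[a][b] * v[b] for b in range(k)] for a in range(k)]    # C = VSV
    lam, H = jacobi_eigen(C)
    # M = V H Lambda^{-1/2} H'
    M = [[v[a] * sum(H[a][m] * lam[m] ** -0.5 * H[b][m] for m in range(k))
          for b in range(k)] for a in range(k)]
    return [[sum(r[a] * M[a][b] for a in range(k)) for b in range(k)] for r in Xc]
```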
The univariate skewness and kurtosis of each column of $\dot{X}$ are then computed. The general
formula for univariate skewness is $\sqrt{b_1} = m_3/m_2^{3/2}$ and for kurtosis is $b_2 = m_4/m_2^2$, where
$m_p = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^p$. Let $\dot{x}_i$ denote the $i$th observation from the selected column of $\dot{X}$. Because by
construction the mean of $\dot{x}$ is zero and the variance $m_2$ is one, the formulas simplify to $\sqrt{b_1} = m_3$
and $b_2 = m_4$, where $m_p = \frac{1}{N}\sum_{i=1}^{N}\dot{x}_i^p$.
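For a single column, the two moment ratios can be written as below (hypothetical helper names, a sketch rather than Stata's code). The second function shows the simplified form that applies once a column has already been centered and scaled to $m_2 = 1$.

```python
def skewness_kurtosis(col):
    """Return (sqrt(b1), b2) = (m3 / m2**1.5, m4 / m2**2) for one column."""
    n = len(col)
    mean = sum(col) / n
    m2, m3, m4 = (sum((x - mean) ** p for x in col) / n for p in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def standardized_moments(col):
    """For a column with mean 0 and m2 == 1: sqrt(b1) = m3 and b2 = m4."""
    n = len(col)
    return sum(x ** 3 for x in col) / n, sum(x ** 4 for x in col) / n
```

For example, the symmetric column $(-2, -1, 0, 1, 2)$ has $\sqrt{b_1} = 0$ and $b_2 = 6.8/2^2 = 1.7$.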
The univariate skewness, $\sqrt{b_1}$, is transformed into an approximately normal variate, $z_1$, as in
D’Agostino (1970):

$$z_1 = \delta \log\left(y + \sqrt{1 + y^2}\right)$$

where

$$y = \sqrt{b_1}\left\{\frac{(\omega^2 - 1)(N+1)(N+3)}{12(N-2)}\right\}^{1/2}$$

$$\delta = \left(\log\sqrt{\omega^2}\right)^{-1/2}$$

$$\omega^2 = -1 + \sqrt{2(\beta - 1)}$$

$$\beta = \frac{3(N^2 + 27N - 70)(N+1)(N+3)}{(N-2)(N+5)(N+7)(N+9)}$$
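A direct Python transcription of the D’Agostino transform follows; the function name is hypothetical. It assumes $N \ge 8$, which keeps $\omega^2 > 1$ so that the logarithm inside $\delta$ is positive. Because $\log(y + \sqrt{1+y^2}) = \operatorname{asinh}(y)$, the result is an odd function of $\sqrt{b_1}$.

```python
import math


def dagostino_z1(sqrt_b1, n):
    """Map the sample skewness sqrt(b1) of n observations to an approximate N(0,1) variate."""
    beta = (3 * (n ** 2 + 27 * n - 70) * (n + 1) * (n + 3)
            / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    omega2 = -1 + math.sqrt(2 * (beta - 1))
    delta = math.log(math.sqrt(omega2)) ** -0.5
    y = sqrt_b1 * math.sqrt((omega2 - 1) * (n + 1) * (n + 3) / (12 * (n - 2)))
    return delta * math.log(y + math.sqrt(1 + y * y))
```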
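The univariate kurtosis, $b_2$, is transformed from a gamma variate into a $\chi^2$ variate and then into
a standard normal variable, $z_2$, using the Wilson–Hilferty (1931) transform:

$$z_2 = \sqrt{9\alpha}\left\{\left(\frac{\chi}{2\alpha}\right)^{1/3} - 1 + \frac{1}{9\alpha}\right\}$$

where $b_1$ is the square of the skewness $\sqrt{b_1}$ and

$$\chi = 2f(b_2 - 1 - b_1)$$

$$\alpha = a + b_1 c$$

$$f = \frac{(N+5)(N+7)(N^3 + 37N^2 + 11N - 313)}{12\delta}$$

$$c = \frac{(N-7)(N+5)(N+7)(N^2 + 2N - 5)}{6\delta}$$

$$a = \frac{(N-2)(N+5)(N+7)(N^2 + 27N - 70)}{6\delta}$$

$$\delta = (N-3)(N+1)(N^2 + 15N - 4)$$

(this $\delta$ is distinct from the $\delta$ used for $z_1$).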
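The kurtosis transform can be written in Python as below (hypothetical function name, a sketch under stated assumptions). It takes $b_1$ as the squared skewness and assumes $N \ge 8$. Sample moments always satisfy $b_2 \ge 1 + b_1$, so $\chi \ge 0$; the `max` clamp only guards against floating-point rounding, which would otherwise make the cube root of a tiny negative number complex in Python.

```python
import math


def wilson_hilferty_z2(b1, b2, n):
    """Map kurtosis b2 (with squared skewness b1) of n observations to an approximate N(0,1) variate."""
    delta = (n - 3) * (n + 1) * (n ** 2 + 15 * n - 4)
    a = (n - 2) * (n + 5) * (n + 7) * (n ** 2 + 27 * n - 70) / (6 * delta)
    c = (n - 7) * (n + 5) * (n + 7) * (n ** 2 + 2 * n - 5) / (6 * delta)
    f = (n + 5) * (n + 7) * (n ** 3 + 37 * n ** 2 + 11 * n - 313) / (12 * delta)
    alpha = a + b1 * c
    chi = max(2 * f * (b2 - 1 - b1), 0.0)   # clamp only protects against rounding
    return math.sqrt(9 * alpha) * ((chi / (2 * alpha)) ** (1 / 3) - 1 + 1 / (9 * alpha))
```

With no skewness and the normal-theory kurtosis of 3, a large sample gives $z_2$ close to 0, and $z_2$ grows with $b_2$.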
The $z_1$ and $z_2$ associated with the columns of $\dot{X}$ are collected into vectors $Z_1$ and $Z_2$. The statistic
$Z_1'Z_1 + Z_2'Z_2$ is approximately $\chi^2$ distributed with $2k$ degrees of freedom.
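Given the per-column $z_1$ and $z_2$ values, the statistic and an upper-tail p-value can be computed as in this sketch (hypothetical function name, not Stata's code). Because the degrees of freedom $2k$ are even, the $\chi^2$ tail probability has the closed form $e^{-x/2}\sum_{i=0}^{k-1}(x/2)^i/i!$, so no special-function library is needed.

```python
import math


def doornik_hansen(z1s, z2s):
    """Combine per-column z1 and z2 into Z1'Z1 + Z2'Z2 with 2k degrees of freedom."""
    k = len(z1s)
    stat = sum(z * z for z in z1s) + sum(z * z for z in z2s)
    # Upper tail of chi-squared with even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    half = stat / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, 2 * k, p
```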
Acknowledgment
An earlier implementation of the Doornik and Hansen (2008) test is the omninorm package of
Baum and Cox (2007).
References
Anderson, E. 1935. The irises of the Gaspe Peninsula. Bulletin of the American Iris Society 59: 2–5.
Baum, C. F., and N. J. Cox. 2007. omninorm: Stata module to calculate omnibus test for univariate/multivariate
normality. Boston College Department of Economics, Statistical Software Components S417501.
http://ideas.repec.org/c/boc/bocode/s417501.html.
D’Agostino, R. B. 1970. Transformation to normality of the null distribution of g1. Biometrika 57: 679–681.
Doornik, J. A., and H. Hansen. 2008. An omnibus test for univariate and multivariate normality. Oxford Bulletin of
Economics and Statistics 70: 927–939.
Henze, N., and B. Zirkler. 1990. A class of invariant consistent tests for multivariate normality. Communications in
Statistics—Theory and Methods 19: 3595–3617.
Mardia, K. V. 1970. Measures of multivariate skewness and kurtosis with applications. Biometrika 57: 519–530.
Mardia, K. V., J. T. Kent, and J. M. Bibby. 1979. Multivariate Analysis. London: Academic Press.
Rencher, A. C., and W. F. Christensen. 2012. Methods of Multivariate Analysis. 3rd ed. Hoboken, NJ: Wiley.
Seber, G. A. F. 1984. Multivariate Observations. New York: Wiley.
Wilson, E. B., and M. M. Hilferty. 1931. The distribution of chi-square. Proceedings of the National Academy of
Sciences 17: 684–688.
Also see
[R] sktest — Skewness and kurtosis test for normality
[R] swilk — Shapiro–Wilk and Shapiro–Francia tests for normality