0% found this document useful (0 votes)
71 views

Multivariate Methods

This document provides an overview of multivariate methods for data analysis. It describes multivariate data as having more than one outcome variable measured across subjects, resulting in interrelated observations within subjects. The document outlines two main categories of multivariate methods: exploratory methods that extract and visualize data patterns to form hypotheses, and confirmatory methods that test specific hypotheses. It provides examples of commonly used exploratory and confirmatory multivariate techniques.

Uploaded by

Matthew Grayson
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
71 views

Multivariate Methods

This document provides an overview of multivariate methods for data analysis. It describes multivariate data as having more than one outcome variable measured across subjects, resulting in interrelated observations within subjects. The document outlines two main categories of multivariate methods: exploratory methods that extract and visualize data patterns to form hypotheses, and confirmatory methods that test specific hypotheses. It provides examples of commonly used exploratory and confirmatory multivariate techniques.

Uploaded by

Matthew Grayson
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

Overview

Multivariate methods
Yueh-Yun Chi∗
Since the early 20th century, the need for multivariate analysis has driven the
development of various types of multivariate methods for applications ranging
from social science to biomedical research. Multivariate data, as distinguished
from univariate and multivariable data, consist of more than one outcome variable
measured on a number of subjects. There exist exploratory data analysis methods
that aim to extract, summarize, and visualize empirical data information for the
purpose of formulating hypotheses as well as confirmatory data analysis methods
that allow testing specific research questions and hypotheses. Major exploratory
tools for dimension reduction and classification include principal component
analysis, canonical correlation, exploratory factor analysis, discriminant, and
cluster analysis. Advanced confirmatory modeling techniques encompass general
linear multivariate model, mixed model, and structural equation model. This article
aims to provide an overview of these methods as well as recent developments in
the area of high dimension, low sample size analysis for data from high-throughput
bioassays or high-resolution medical images.  2011 Wiley Periodicals, Inc.

How to cite this article:


WIREs Comp Stat 2012, 4:35–47. doi: 10.1002/wics.185

Keywords: MANOVA; exploratory data analysis

INTRODUCTION for dimension reduction, classification and pathway


analysis. In many cases, the covariance structure can
M ultivariate data, as distinguished from univari-
ate and multivariable data, consist of more than
one outcome (response, dependent) variable measured
have more effect on the validity and quality of a
statistical analysis than any other data features.
The source of multivariate outcomes dictates the
on a number of subjects (independent sampling units). type and measurement scales of the multivariate data,
The data collected are usually presented as a matrix, which in turn, determine the underlying covariance
for example, as an N × p outcome matrix Y, where
structure. Pure multivariate outcomes include vari-
the N rows represent the subjects and the p columns
ables measured at different metrics (e.g., height and
the variables. Observations from one subject (one row
weight). In contrast, commensurate multivariate out-
of Y) are statistically independent from observations
comes involve variables measured at the same scale
from any other subject while observations from the
and unit (e.g., concentrations of different types of
same subject may be correlated. In a clinical trial, the
fatty acids in blood). Repeated measures belong to a
effect of a treatment may be evaluated by a series of
special case of commensurate multivariate outcomes
cognitive measures, which are interrelated within any
and have multiple observations of the subject on a
study participant. The structure of the dependence
single variable at several points in time or space (e.g.,
and variability of each cognitive measure constitute
blood cholesterol level every week for a month). Dou-
the covariance matrix of the outcome variables and
bly multivariate outcomes extend the dimensionality
play an important role in the history of developing
by including repeated measures of two or more (pure
multivariate methods. Depending on the application,
or commensurate) multivariate variables (e.g., height
the covariance matrix (second moment property) can
and weight every year) or repeated measures in time
exist as the nuisance parameters for the primary
and space (e.g., blood pressure every day for each
interest of inferring about the mean structure (first
residing family member).
moment property) or become the primary parameters
The type of measurement scale(s) used in observ-
∗ Correspondenceto: [email protected]
ing the multivariate outcomes is central to the choice
Department of Biostatistics, University of Florida, Gainesville, FL,
and validity of a multivariate analysis. A nominal scale
USA only defines categories or groups of a variable (e.g.,

Volume 4, January/February 2012  2011 Wiley Periodicals, Inc. 35


Overview wires.wiley.com/compstats

blood type) while an ordinal scale provides mean- of predictors and can be used to test all sorts of
ingful ranking (e.g., pain level: minimal, moderate, general linear hypotheses defined by between- and
severe, unbearable). Both nominal and ordinal scale within-subject contrasts. Special cases for testing
data imply using specific methods for categorical data general linear hypotheses include multivariate anal-
analysis. Continuous data can be on either an interval ysis of variance (MANOVA) for comparison between
scale if all differences of the same size are equiva- groups, multivariate analysis of covariance (MAN-
lent or a ratio scale if ratios of the same size are COVA) for adjusting for covariate effect in group
equivalent. The majority of multivariate methods can comparison, and univariate approach to repeated
be applied to analyzing continuous data if additional measures (UNIREP) method for analysis of longi-
assumptions such as Gaussian distribution are valid. tudinal profiles. Generalizations of the general linear
multivariate model include the seemingly unrelated
model (SUM) which allows a distinct set of predictors
Taxonomy of Multivariate Methods
for different outcomes, and the growth curve model
Multivariate methods1–3 can be divided in two (GCM) which characterizes the functional forms of
categories: exploratory data analysis and confirmatory repeated measures (e.g., the rate of tumor cell growth).
data analysis methods (Figure 1). For exploratory Mixed model incorporates both fixed and random
data analysis, empirical information from the data effects such that mean and covariance structures of
is extracted, summarized, and visualized for an the multivariate data can be modeled simultaneously.
ultimate goal of formulating research hypotheses. Structural equation model, useful for pathway analysis
Principal component analysis (PCA) summarizes and causal inference, assumes both outcome and pre-
data dimensions by a handful of components dictor variables are random for the primary objective
that together account for a meaningful portion of of modeling their joint covariance structure.
outcome variation. Canonical correlation analysis
can be viewed as a generalization of PCA in
that information regarding the relationship between Overview of the Article
two sets of variables is summarized and reduced. The review of multivariate theory and distributions
Exploratory factor analysis (EFA) identifies latent relies heavily on matrix operations and algebra,
factors underlying the observations of the multivariate which can be found in the work of Schott4 and
outcomes. Discriminant analysis finds a linear most multivariate analysis textbooks. The rest of
combination of outcomes which best separate groups the article is devoted to giving an overview of
of subjects for the purpose of data reduction and multivariate methods listed in Figure 1. All methods
prediction. Cluster analysis forms groups of subjects apply to continuous multivariate outcomes, however
based on the proximity of multivariate outcomes. some can be generalized to analyze categorical
For confirmatory data analysis, specific research data (e.g., correspondence analysis for categorical
questions and hypotheses are posed at the outset, version of PCA). Recent advances, especially in the
and multivariate models are utilized for estimation area of high dimension, low sample size (HDLSS)
and hypothesis testing. The general linear multivari- research for genomics, metabolomics, proteomics, and
ate model assumes mean multivariate (continuous) medical imaging studies are summarized within each
outcomes as linear functions of a common, fixed set subsection if applicable.

Cluster analysis Mixed model

Discriminant analysis

Exploratory Confirmatory General linear


data analysis data analysis multivariate model

Principal component
Canonical correlation

Exploratory factor Structural equation


analysis model

FIGURE 1 | Taxonomy of multivariate methods.

36  2011 Wiley Periodicals, Inc. Volume 4, January/February 2012


WIREs Computational Statistics Multivariate methods

PRINCIPAL COMPONENT ANALYSIS sets of variables. Applications include relating types of


products purchased to consumers’ lifestyles and per-
For multivariate data with sufficient sample size, sonalities. When one set of variables is the predictor set
PCA is useful for dimension reduction, information
and the other set of variables is the outcome set, the
extraction, and visualization. Important information
objective of canonical correlation analysis becomes
from multiple intercorrelated continuous outcome
determining whether the predictor set affects the
variables can be extracted and expressed as a set
outcome set. Mathematically, canonical correlation
of new uncorrelated variables, known as principal
analysis is related to the singular value decomposition
components, which collectively account for the
of  XY (the covariance matrix between variables X in
majority of data variation. Principal components are
the first set and variables Y in the second set) with the
linear combinations of the original variables, and each
singular values the canonical correlations, and the left
accounts for a portion of variation of the original
and right singular vectors the coefficients linking the
variables, which indicates its relative importance.
original variables X and Y to their respective canonical
Dominant components possess large variation (more
variables. The procedure is equivalent to solving two
important) and can be selected to reduce outcome
eigenvalue decompositions, one for  XY  −1 −1
YY  YX  XX
dimension and achieve a succinct data summary and −1 −1
and the other for  YX  XX  XY  YY . Unlike PCA,
visualization. The amount of information reduction is
canonical correlations are invariant to changes in
inversely related to total variation explained by the
location and scale, and thus analysis of correlation
selected components.
between X and Y leads to the same canonical corre-
Mathematically, PCA is related to the eigenvalue
lations as the analysis of covariance between X and
decomposition of the covariance or correlation matrix
Y. When the Gaussian assumption is met, the four
of the outcome variables, for example,  =  ,
multivariate tests for the general linear multivariate
with diagonal  of ordered eigenvalues (largest to
model (discussed later in the article) are all simple
smallest) and orthonormal  for  =   = I p
(identity matrix of size p). The eigenvectors (columns functions of the squared canonical correlations.
of ) give the coefficients of the linear combinations The explosion of number of variables with mod-
used to define uncorrelated principal components and ern high-throughput analytical techniques highlights
compute the factor scores. The factor scores matrix the need for data compression and information extrac-
Y results from projecting observations of N subjects tion. Recent developments of PCA center on imposing
onto the principal components. The orthonormal the assumption of sparseness to give simple popula-
projection leads to independent principal components tion covariance structure.7 The methodology allows
with covariance matrix the eigenvalue matrix  (i.e., restricted variations of PCA to succeed, but cannot
V [rowi (Y)] = ). Dimension reduction is achieved be generalized to handle general covariance structure.
by considering the decomposition  = 1 1 1 + Jung and Marron8 studied the asymptotic behavior of

  the principal component directions and listed sets of
2 2 2 where  = 1 2 and  = 2k=1 k ,
sufficient conditions for consistency, strong inconsis-
and using a reduced factor scores matrix Y1 (with
tency, and subspace consistency. Success of PCA with
covariance matrix 1 ) to interpret and investigate the
HDLSS data requires nearly all variation to occur in
data structure. Here is the direct sum operator
population components far fewer in number than the
that creates a block diagonal matrix from 1 and
2 . To facilitate the interpretation, a rotation of the number of subjects. Parkhomenko et al.9 presented
components can be used to create sparse coefficients. sparse canonical correlation analysis with solutions
Abdi and Williams5 reviewed the two main types based on small subsets of variables.
of rotations: orthogonal and oblique rotations in a
detailed overview of PCA.
EXPLORATORY FACTOR ANALYSIS
Extensions of PCA include correspondence anal-
ysis and canonical correlation analysis. Correspon- Factor analysis can be either confirmatory or
dence analysis is useful for analyzing multivariate exploratory depending on the availability of a priori
categorical variables by providing factor scores for knowledge of the factor structure. The structural
both the rows and columns of the contingency table.6 equation model discussed earlier in the article
Canonical correlation analysis was developed to iden- can serve as the confirmatory method for testing
tify canonical variables that are linear combinations the hypothesized factor model. EFA, on the other
of two sets of variables and have maximum corre- hand, allows one to identify and characterize latent
lation (canonical correlation) with each other. Pairs factors (constructs) that underlie or attribute to the
of canonical variables are uncorrelated, and can be relationships of the observed variables. The procedure
used to summarize the relationship between the two was developed in the early 1900s to understand

Volume 4, January/February 2012  2011 Wiley Periodicals, Inc. 37


Overview wires.wiley.com/compstats

the causal relationship between the latent traits of the subset that lead to maximum separation among
of human intelligence and test scores obtained in the groups or populations. The identified variables
several domains. It was believed that the relationships or their functions can be used to develop a rule to
of the test scores can be fully explained by one classify future observations. For instance, a medical
single common latent intelligence factor, and that researcher may be interested in determining features
if this factor was removed, the test scores would that significantly differentiate patients who have had a
be uncorrelated. The model was later generalized to heart attack from those who have not yet had a heart
multiple factors. EFA can be viewed as a dimension attack, and using the identified features to predict
reduction tool as the number of factors typically is whether a patient is likely to have a heart attack in
much smaller than the number of variables. the future. Linear discriminant analysis finds linear
Like most of the exploratory multivariate combinations of variables which best separate the
methods, EFA models the covariance structure of groups.
the data. Contrasting to PCA which constructs new For a two-group linear discriminant analysis
variables as linear combinations of the original with two multivariate Gaussian populations of com-
variables, EFA assumes each observed variable is a mon covariance matrix, namely independent y1i ∼
linear combinations of the latent factors, namely for Np (µ1 , ) and y2j ∼ Np (µ2 , ) for i ∈ {1, . . . , N1 }
subject i ∈ {1, . . . , N} and j ∈ {1, . . . , N2 }, Fisher’s linear discriminant func-
tion has the discriminant weights (coefficients) vec-
 
yi =  yf i + ei , tor as =  −1 y1 − y2 applied linearly to original
variables. Here yk is the sample mean vector for
(p × 1) (p × m)(m × 1) (p × 1),  is the pooled covari-
group k (k = 1, 2), and 
ance matrix. Realizations of the discriminant func-
with  y a matrix of weights, f i a vector of
tion comprise discriminant scores for all subjects,
random, unobserved latent factors, and ei a vector
that is, as y1i and as y2j . Difference in the mean dis-
of random errors. The  assumption of independent criminant scores, which is exactly the Mahalanobis
f i and ei with V f i =  f and V (ei ) =  e results   −1  
statistic y1 − y2   y1 − y2 , achieves maximum
in a structured covariance matrix of yi , that is,
separation between the two groups. Generalizing the
V yi =  y  f  y +  e . The model then decomposes
two-group Fisher’s procedure to G groups requires
the covariance of yi into the portion that can be
quantitatively defining the separation of the G group
attributed to the common factors  y  f  y , and the
mean vectors. Under the MANOVA framework,
portion that cannot be accounted for by the common
separation of the G group mean vectors can be
factor  e . The communality or common variance is
determined by the ratio of the between group vari-
given by the diagonal elements of  y  f  y , while
ation to the within group variation. Adopting the
the uniqueness or specific variance is given by the
notation given in Table 1, a solution to the eigenequa-
diagonal elements of  e . The diagonal matrix  e
tion |Sh − λSe | =  0 yields  eigenvectors am for m ∈
indicates that errors {ei } are uncorrelated given
{1, . . . , s = min G − 1, p } that maximize the ratio of
the latent factors, and leads to the interpretation
the between group variation to the  within group vari-

that the inter-relationships between the p outcome
ation by maximizing the ratios am Sh am / am Se am .
variables are completely explained by the m latent
For the unique solution, standardized eigenvectors
factors. With   f = I m , the model reduces further are used as discriminant weights for defining discrimi-
such that V yi =  y  y +  e . Common approaches
nant functions. The maximum number of discriminant
for parameter estimation include the least square
functions, known as the rank or dimensionality of the
principle, which minimizes the sum of squared
separation, is given by s, leading to the determination
differences between elements of the population and
of one discriminant function for a two-group problem.
sample covariance matrices, and the ML principle
For any observation, discriminant score(s) can be
which assumes a Gaussian distribution for yi . Lawley
used for classification. If the mean discriminant score
and Maxwell10 gave a comprehensive review of these
is higher for group 1 (i.e., as y1 > as y2 ) in a two-group
methods.
comparison, the Fisher’s classification  rule assigns
 an
observation y∗ to group 1 if as y∗ > as y1 + y2 /2, and
to group 2 otherwise. Fisher’s classification assumes
DISCRIMINANT ANALYSIS common covariance of the two groups, but does not
When a set of associated variables are collected for two require normality. The rule is optimal for minimizing
or more groups or populations, discriminant analysis the total probability of misclassification when sample
allows identifying a subset of the variables or functions size is large. Cost of misclassification may be taken

38  2011 Wiley Periodicals, Inc. Volume 4, January/February 2012


WIREs Computational Statistics Multivariate methods

TABLE 1 Parameters and Constants for General Linear Multivariate Model Y = XB + E and Associated General Linear Hypothesis H0 :  = 0
Symbol Size Definition Least squares estimator
N 1×1 Sample size
p 1×1 Number of outcome variables
q 1×1 Number of predictors
νe 1×1 N −rank(X), error degrees of freedom
a 1×1 Number of between-subject contrasts
b 1×1 Number of within-subject contrasts
C a×q Between-subject contrast matrix
U p×b Within-subject contrast matrix
M a×a C(X  X)− C , middle matrix
B q×p Primary parameters 
B = (X  X) − X  Y
  
 p×p Error covariance matrix  = Y  I N − X X  X − X  Y/νe

 a×b CBU, Secondary parameters  = C
 BU ∼ Na,b (CBU, M,  ∗ )
∗ b ×b 
U U, hypothesis error covariance ∗ = U  U,
  νe 
∗ ∼ Wb (νe ,  ∗ )
 
 1 ×1 tr2 ( ∗ )/ b tr( 2∗ ) , sphericity parameter
b ×b Hypothesis sum of squares matrix  − 0 ) M−1 (
Sh = (  − 0 ) ∼ Wb (a ,  ∗ , ) ,
 = ( − 0 ) M−1 ( − 0 )
b ×b Error sum of squares matrix ∗ ∼ Wb (νe ,  ∗ )
Se = νe 

into account by considering the expected cost of the singular or near singular sample covariance
misclassification as the objective function. Evaluation matrix by shrinking its eigenvalues.14 The sparse
of classification rules can be made by computing linear discriminant analysis applies the method of
the observed error rate (proportion of incorrectly regularization to obtain sparse discriminant functions
classified subjects). To avoid underestimation of the with only a small number of nonzero components.15
error rate, the split-sample method is commonly
employed by taking one part of the sample (training CLUSTER ANALYSIS
sample) to derive a classification rule and the
other part of the sample (validation sample) to Cluster analysis is a tool for classifying subjects
compute the error rate and evaluate the rule. The into groups in a manner that subjects within a
jackknife leave-one-out approach (special case of group are homogeneous or similar while subjects in
cross-validation) provides an alternative solution different groups are heterogeneous or not similar.
The similarity or dissimilarity between a pair of
by iteratively omitting one observation from the
subjects is determined by a proximity measure of
development of a classification rule and summarizing
the multivariate outcomes. For continuous variables,
the misclassification of observations incorrectly
a common measure of similarity is the Pearson
classified by the respective rules developed without
product–moment correlation coefficient, and the
the holdout observation. most common dissimilarity measure is the Euclidean
Huberty11 provided a book-length review of distance. The choice of the proximity measure
applied discriminant analysis. Recent advances in depends on the objective of the study and the
discriminant analysis center on methods for high- type of measurement scale(s) used in observing the
dimensional data. The dimension reduction approach multivariate outcomes. For many applications of
includes the partial least squares regression that cluster analysis, an N × N proximity matrix of the
utilizes a reduced set of latent components, obtained N subjects, rather than the data matrix Y, is the
from maximizing the correlation between the outcome starting point of the calculations. Algorithms designed
and predictors, in place of the original, high- to perform cluster analysis can be divided into two
dimensional predictor variables.12 Partial least squares broad classes called hierarchical and nonhierarchical
regression and its variations are commonly used clustering methods.
in metabolomics and chemometrics.13 Regularization Hierarchical clustering can either begin with N
provides an alternative approach seeking to stabilize clusters, each containing one subject, and iteratively

Volume 4, January/February 2012  2011 Wiley Periodicals, Inc. 39


Overview wires.wiley.com/compstats

combine clusters until all subjects belong to one Classical assumptions adopted from the uni-
single cluster (bottom-up or agglomerative process), variate model include 1) homogeneous covariance
or begin with one cluster and successively split them structure given predictors (i.e., V [rowi (Y) |X] = ,
until N clusters is formed (top-down or divisive i ∈ {1, . . . , N}), 2) independence between outcomes
process). Both agglomerative and divisive methods from different subjects (rows of Y), 3) linearity
generate a tree diagram or dendogram to document between outcomes and predictors (i.e., rowi (Y) is
the process. Variations of the agglomerative method linearly related to rowi (X), i ∈ {1, . . . , N}), and 4)
can be derived by changing the criterion used to existence of a finite covariance matrix . Additional
combine subjects in different clusters. Examples are assumptions unique to the multivariate model are 1)
the nearest neighbor (single linkage) method of using no missing data, and 2) each outcome variable (col-
the minimum dissimilarity to combine clusters, and the umn of Y) is measured in a consistent way (i.e., no
farthest neighbor (complete linkage) method of using appreciable mistiming allowed). These assumptions
the maximum dissimilarity to combine clusters. The guarantee least squares estimations of both the pri-
divisive method is conceptually and computationally mary parameters B and the nuisance parameters 
more complex than the agglomerative method as it (Table 1).
allows clusters to be formed while information about The Gaussian assumption states that the rows
all subjects is taken into account at the first step. of E are identically and independently distributed,
Unlike hierarchical clustering, nonhierarchical namely rowi (E) ∼ Np (0, ). Equivalently E follows
k-means clustering starts with a predetermined a matrix Gaussian distribution, E ∼ NN,p (0, I N , ),
number of clusters and set of centroids or seeds.
as defined by Muller and Stewart17 (Chapter 8). The
Each subject is assigned to its nearest centroid
assumption facilitates distributional derivations for
before a new set of centroids based on the new
hypothesis testing. With full rank , the maximum
allocation is formed. The process continues until there
likelihood (ML) estimator for B is equivalent to its
is no reallocation of subjects or until reassignment
least squares estimator, and the ML estimator for 
meets some convergence criterion. In practice, the 
is νe /N.
hierarchical and nonhierarchical methods may be
Secondary (mean) parameters can be defined
combined to facilitate the identification of clusters.
One may used a hierarchical procedure to identify the as  = CBU, with both C and U contrast matrices
number of clusters and centroids, which can then be of known constants. The C matrix defines contrasts
input into the nonhierarchical procedure to refine the between groups or levels of predictors, and implicitly
cluster solution. Comprehensive discussions of cluster leads to computing linear combinations of columns
analysis can be found in the book written by Everitt.16 of X, the predictor variables. The U matrix defines
contrasts within subjects, and implicitly leads to
computing linear combinations of columns of Y,
the outcome variables. An estimable  and − testable
 
GENERAL LINEAR MULTIVARIATE
requires rank(C) = a ≤ q, C = C X  X X  X , and
MODEL rank(U) = b ≤ p. Under these regularity conditions,
The general linear multivariate model is defined by  is an unbiased estimator of  and follows a matrix

Gaussian distribution as detailed in Table 1. The
multivariate quadratic form of  ∗ leads to a central
 Y  = XB
  +  E, 
N×p N×q q×p N×p , Wishart distribution, the multivariate extension of
a central chi-square distribution.
with rows of the outcome matrix Y, error matrix The multivariate general linear (null) hypothesis
E, and design matrix X corresponding to subjects. regarding the secondary parameters may be stated
Columns of Y and E correspond to repeated measures H0 :  = 0 . The hypothesis and error sum of
or multivariate outcomes (continuous), columns squares matrices lie in the heart of testing the general
of X correspond to fixed predictors (discrete or linear hypothesis. By the definitions given in Table 1,
continuous), and B denotes the matrix of primary both matrices have multivariate quadratic forms and
(mean) parameters. Table 1 summarizes the definitions follow a Wishart distribution with the shared scale
of the parameters and constants for the model. parameters  ∗ . The error sum of squares matrix has
Detailed discussions about estimation and hypothesis a central Wishart form regardless of the underlying
testing of the general linear multivariate model can be hypothesis (null or alternative). The hypothesis sum of
found in the study of Muller and Stewart17 (Chapters squares matrix has a central Wishart form only under
3, 12, 16) and Timm2 (Chapters 3 and 4). the null hypothesis.

40  2011 Wiley Periodicals, Inc. Volume 4, January/February 2012


WIREs Computational Statistics Multivariate methods

Tests for general linear hypothesis have been when  = 1 (sphericity holds). The Geisser-Greenhouse
derived under different principles. Table 2 summarizes adjustment of the degrees of freedom uses the ML
the four multivariate approaches (HLT, PBT, WLK, estimator of , while the Huynh-Feldt adjustment
RLR) and the UNIREP method. All five statistics uses unbiased estimators of the numerator and
are functions of the hypothesis and error sum of denominator of . Both methods give tests that have
squares matrices. The four multivariate tests are approximate control of the Type I error rate. When
unbiased for their control of the target type I error the number of within-subject contrasts is one (b = 1),
rate (α) when the null hypothesis is true; however, the UNIREP and four multivariate tests become
none of them is uniformly most powerful among the equivalent and provide exactly the same p-value.
other three tests. The statistical powers vary with Applications for MANOVA, MANCOVA, and
the pattern of noncentrality parameters in , which repeated measures analyses belong to special cases for
implies one test may be preferred (more powerful) in testing general linear hypothesis with the general linear
some settings. All four multivariate tests are invariant multivariate model. The MANOVA analysis allows
to full rank linear transformations (excluding location the overall comparison of group means and always
shifts) of the original variables, and thus are suitable uses U = I b . The MANCOVA analysis evaluates
for analyzing multivariate outcomes measured with the overall group mean differences after adjusting
different metrics. for the effect of covariates, and like MANOVA,
Under the null hypothesis for one- and two- always assumes U = I b . The distinction between the
group comparisons, each of the four multivariate MANOVA and MANCOVA methods lies in the
statistics in Table 2 can be expressed exactly as a one- selection of the design matrix, and consequentially, the
to-one function of each other, and of an F random specification of the between-subject contrast matrix C.
variable with numerator degrees of freedom one and The analysis of repeated measures emphasizes more
denominator degrees of freedom νe . Furthermore, on the trend of changes across repeated measures and
the four tests are of exact size α and uniformly less on the overall differences. Thus, for this analysis,
most powerful among the class of unbiased and the choice of the within-subject contrast matrix U
scale-invariant tests. If the number of groups in the consists of trends that characterize changes over time
comparison is greater than two, then the statistics or space.
are not one-to-one functions of each other, and the
exact distributions are known only for special cases.
Approximations matching the first two moments lead MULTIVARIATE ANALYSIS
to using an F random variable for the HLT, PBT and OF VARIANCE
WLK statistics (Ref 17, Chapter 16).
The UNIREP approach stems from the null As its name suggests, MANOVA method extends the
approximation that the statistic [tr(Sh )/a]/ [tr(Se )/νe ] univariate analysis of variance18 to allow comparison
follows an F distribution with numerator degrees of of means of several variables among two or
freedom ab and denominator degrees of freedom more groups of independent and vector Gaussian
bνe . The parameter , defined in Table 1, quantifies observations with a common covariance structure.
the spread of population eigenvalues of  ∗ , with The full rank formulation gives
maximum sphericity requiring all eigenvalues be
equal and  = 1, and minimal sphericity requiring yjg = µg + ejg ,
only one nonzero eigenvalue and  = 1/b. The F
approximation with numerator degrees of freedom with µg the p × 1 mean vector of group g ∈
ab and denominator degrees of freedom bνe gives {1, . . . , G}, and yjg and ejg the outcome and error
an unbiased and uniformly most powerful test only vector for subject j ∈ {1, . . . , Ng } in group g,

TABLE 2 The Definitions of the General Linear Hypothesis Tests


Test Principle Statistic
Hotelling–Lawley (HLT) ANOVA analog tr(Sh S−1 )
 e
Pillai–Bartlett (PBT) Substitution law tr Sh (Sh + Se )−1
Wilks Likelihood (WLK) Likelihood ratio |Se (Sh + Se )−1 |
Roy’s Largest Root (RLR) Union–intersection Largest eigenvalue of Sh (Sh + Se )−1
   
UNIREP Box’s F approximation tr(Sh )/a / tr(Se )/νe

Volume 4, January/February 2012  2011 Wiley Periodicals, Inc. 41


Overview wires.wiley.com/compstats

 
respectively. The null hypothesis of no overall group parameters FR = µ1 − µ2 · · · µ1 − µG , and
difference leads to H0 : µ1 = · · · = µG for the full the null matrix 0 = 0.
rank model. The classic less than full rank formulation The less than full rank model, in contrast,
overparameterizes the model as includes intercept and G indicator variables as
predictors to have
yjg = µ + δ g + ejg .
 
1N1 1N1 0 0 0
With the sum-to-zero constraint on {δ g } (i.e.,  1N 0 1N2 0 0 

G  2 
X LTFR = . ,
g=1 δ g = 0), µ indicates the population mean of  .. ..

0 0 . 0 
balanced groups (i.e., µ = G g=1 µg /G), and δ g is the 1NG 0 0 0 1NG
deviation from the population mean for group g. The  
null hypothesis states H0 : δ 1 = · · · = δ G = 0 for the µ
 δ 1 
less than full rank model. The Gaussian assumption  
 
in combination with the least squares assumptions BLTFR =  δ2  .
leads to identically and independently distributed  .. 
 . 
error vectors, namely ejg ∼ Np (0, ). δ G
Current computational methods for MANOVA
are based on the fact that MANOVA can be expressed
The general linear hypothesis for H0 : δ 1 = · · · =
and interpreted as a general linear multivariate model.
δ G = 0 can be tested with a = G between-subject
This is accomplished by defining one indicator variable
contrasts for CLTFR = 0 I G , b = p within-subject
for each group. The full rank model includes G
contrasts  for U = I b, the secondary parameters
indicator variables as predictors to have 
LTFR = δ 1 · · · δ G , and the null matrix 0 = 0.
  By framing MANOVA hypotheses as the general
y11
 .  linear hypotheses, the multivariate tests listed in Table
 ..  2 can be utilized to compute p-values and assess
 
 y  statistical significance. The fact that U = I b implies
 N1 1 
 y    that MANOVA compares means across all outcome
 12  1N1 0 0 0
 .  variables. The tests are sensitive to a collection of
 ..   0 1N2 0 0 
    small differences, but have no capacity for identifying
Y =   , X FR = .. ,
 yN2 2   0 0 . 0  individual variables that contribute to the overall
 . 
 .  0 0 0 1NG difference.
 . 
  
 y1G 
 . 
 . 
 . 
yNG G
MULTIVARIATE ANALYSIS OF
  COVARIANCE
e11
 .  The MANCOVA model allows comparisons of
 ..  group mean vectors in the presence of covariates.
 
 e  A simple example would be comparing improvements
 N1 1 
   e  in quality of life measures (appetite, mood, coping,
µ1  12 
 .  and physical-welling) among therapeutic regimens
 µ   .. 
 2   after adjusting for the effect of age. The general
BFR =  . , E =   .
 ..   eN2 2  linear multivariate model for the MANCOVA design
 . 
µG  .  extends the MANOVA design by adding covariates
 . 
   as additional predictors in the design matrix X. The
 e1G 
 .  full rank MANCOVA
 .    model leads
 to considering
 .  X = X FR Z and B = BFR   , with columns of
eNG G Z covariates needed to be accounted for and  the
parameters that capture the effects of covariates on
The general linear hypothesis for H0 : µ1 = · · · = µG the mean outcome vectors. The secondary parameters
can be tested with a = G −  1 (G > 1) between-
  may be thought as nuisance parameters of no
subject contrasts for CFR = 1G−1 −I G−1 , b = p intrinsic interest. Similarly the less than full rank
within-subject contrasts for U = I b , the secondary MANCOVA model overparameterizes by setting

42  2011 Wiley Periodicals, Inc. Volume 4, January/February 2012


WIREs Computational Statistics Multivariate methods

   
X = X LTFR Z and B = BLTFR   . The between- and perform hypothesis testing on each subspace
subject contrast matrix is C = CFR 0  for the separately. The one-dimensional subspace has the
full rank formulation and C = CLTFR 0 for the within-subject contrast matrix U the vector 1p /p
less than full rank formulation. The within-subject for testing the average
 outcome across repeated
contrast matrix remains U = I b for the overall measurements. The p − 1 -dimensional subspace can
comparison. be defined by the pairwise differences  between

The MANCOVA design implicitly assumes the repeated measures (i.e., U = 1p−1 −I p−1 ) or the
effects of covariates are the same across the groups orthonormal polynomial trends (i.e., columns of
of comparison. This assumption can be evaluated by U corresponding to the linear  trend,
 quadratic
testing the significance of the interactions between trend, cubic trend, etc). These p − 1 within-subject
the covariates and indicator variables that define contrasts allow an overall assessment of the trend over
the groups. The interaction term can be examined repeated measures.
by expanding a MANCOVA design matrix with The multivariate and UNIREP methods listed in
additional columns from X FR  Z. Here  denotes Table 2 can be used for testing any repeated measures
the operator of horizontal direct product that create a hypothesis if no missing or mistimed data are present
new matrix by elementwise multiplication of pairs (i.e., balanced within-subject design). Non-iterative
of columns from X FR and Z. The corresponding approximation methods or mixed models can be used
between-subject contrasts identify individual columns when there are missing data. The UNIREP method
of X FR  Z and allow simultaneously testing the with no correction for degrees of freedom (i.e., F
equivalence of the effect of each covariate on each approximation with degrees of freedom ab and bνe )
outcome variable (i.e., U = Ib ) across the G groups. has exact control of the Type I error rate and is
uniformly most powerful when repeated measures
are exchangeable (i.e., with equal variance and equal
Repeated Measures Analysis correlation between any pair).
Repeated measures design is used when there
are multiple observations of the subject on a
single outcome variable at several points in time Generalizations
or space. Observations from any single subject One of the limitations of the general linear
may be correlated. As an example, in agricultural multivariate model is that there must be a common
experiments, the split plot design creates exchangeable design matrix for all outcome variables. The seemingly
subplots by splitting a field (a plot of land) and unrelated model (SUM) overcomes this limitation
randomly assigns one treatment of interest to each by allowing a distinct set of predictors for each
subplot. Fields are the independent sampling units variable. Much of the work on SUM was motivated
(subjects), and observations from subplots of a field by econometric applications and studies with time-
give repeated measures which have equal correlation varying predictors. The model expands the design
with each other. In biomedical and behavioral matrix and primary parameters by defining
research, repeated measurements are commonly
observed over time, which undermine exchangeability  
because the order of the measurements in time cannot β1 0 0 0
 
 0 β2 0 0
be inter-changed. Correlations between observations 
Y = X1 X2 · · · Xp  ..  + E,
further apart in time are likely to be weaker than 0 0 . 0
observations made closer in time. 0 0 0 βp
The general linear multivariate model takes
into account any correlation structure inherent to
repeated measures design. The coding scheme and  
with X k n × qk the design matrix for the kth
setup for between-subject comparisons with and
outcome and β k qk × 1 the corresponding primary
without adjusting for covariates follow the designs of
parameters. By stacking Y columnwise, the model
MANCOVA and MANOVA, respectively. The overall
can be fit using the ML or generalized least squares
within-subject comparison, as employed in MANOVA
estimation for β = β 1 β 2 · · · βp . The stacking

and MANCOVA, is typically of little interest p
especially when measurements are taken repeatedly model has y =vec(Y) with mean k=1
X k β and
over time. Common approaches to repeated measures covariance  ⊗ I N . Here vec(·) is the operator that
analysis decompose  the p-dimensional outcome stacks the columns of Y. Srivastava and Giles19
space into one- and p − 1 -dimensional subspaces, provided a book-length treatment of this model.

Volume 4, January/February 2012  2011 Wiley Periodicals, Inc. 43


Overview wires.wiley.com/compstats

The GCM generalizes the general linear LINEAR MIXED MODEL


multivariate model for investigating functional forms
The linear mixed model relaxes the requirement for
of repeated measures that are observed in a monotone
balanced within-subject designs to allow analyzing
order. Applications include the characterization of
missing and mistimed data. The need to accommodate
children’s heights as a function of age (when repeated
missing and mistimed data arose in large part from
measures are taken at different ages) and the
the growth of biomedical and behavioral research.
analysis of dose–response relationship (when repeated
Repeated measures designs, common in biomedical
measures are taken at different dose levels). The
and behavioral research, allow multiple observations
model assumes that the mean outcome vector is
from the same subject and often lead to missing
a linear function of the between-subject predictors
observations or observations measured beyond the
defined in the design matrix X and the within-
planned schedule. Cross-sectional designs at one single
subject predictors defined in the design matrix T,
time point or space may also result in missing
namely
data when high-capacity bioassays (e.g., microarrays)
or multidimensional behavioral assessments are
Y = XBT+ E, used.
     
N×p N×q q×m m×p N×p . The power of the linear mixed model stems
from allowing subject-specific modeling, namely for
  each subject i ∈ {1, . . . , N},
The full GCM has a full rank T m = p , whose rows
are the 0th to the (p − 1)th order polynomials of the
yi = X iβ + Z i bi + ei ,
repeated measures index (e.g., age for the growth data; (ni × 1) (ni × q)(q × 1) (ni × r) (r × 1) (ni × 1)
dose for the dose–response data). The full model can
be fit and tested using methods for a general linear with yi the outcome vector of the ith subject. Elements
multivariate model for YT −1 . The reduced model of yi are repeated measures or multivariate outcomes,
contains only some low order polynomials for the and the number of elements can vary subjects
trend, and can be fitted and tested using methods to accommodate missing observations. Predictors
for a restricted general linear multivariate model for (discrete or continuous) can have fixed effects
Y or methods for a linear mixed model for vec(Y). (characterized in β) on the outcome means (first
Details can be found in the work of Kshirsagar and moment) or random effects (characterized in bi ) that
Smith.20 lead to modeling the covariance (second moment) of
the multivariate outcomes. The fixed effects predictors
specified in the design matrix X i typically encompass
Advances in HDLSS Designs the random effects predictors specified in the design
All multivariate statistics in Table 2 become undefined matrix Zi . The consideration of both fixed and
when νe < b, the condition that occurs when the random effects (hence the name mixed model) permits
number of outcome variables is greater than the simultaneous modeling of the mean and covariance
sample size. High-throughput technologies used structures, and attributes to the model’s flexibility and
in genomic, metabolomic, and proteomic research complexity.
frequently give rise to data with thousands of variables Assumptions of the linear mixed model, include
but relatively limited number of subjects. Advances to (1) independence between outcomes from different
overcome the nonexistence of the error covariance subjects, (2) independence between random effects
matrix inverse needed in MANOVA statistics include bi and errors ei , (3) bi ∼ Nr (0, ), and (4) ei ∼
using the Moore–Penrose generalized inverse21 or Np (0, ). Together
 they lead to a structured mean
regularizing the error covariance matrix to create a where E yi = X i β  and a structured covariance
full rank matrix.22 Srivastava and Du23 inverted the matrix where V yi = Zi Zi + . Exchangeability of
diagonal matrix of sample variances to compute a within-subject observations such as with the split
form like the HLT statistic. Proposals that improve plot design gives a special case when  = σ 2 I p
the conventional UNIREP method includes Srivastava (independent errors), Zi = 1ni (random intercept
and Fujikoshi24 for a modified UNIREP statistic, only) and  = τ 2 (scalar). In turn, the covariance
and Ahmad et al.25 for a specialized test for matrix has a compound symmetry structure with
high-dimensional repeated measures studies. Most equal
 2 variance
 τ 2 + σ 2 , and equal correlation τ 2 /
of the existing HDLSS tests require large sample σ + τ2 .
approximations which may not be practical with The Gaussian assumptions allow the use of
HDLSS designs. ML estimation for an unbiased estimate of the

44  2011 Wiley Periodicals, Inc. Volume 4, January/February 2012


WIREs Computational Statistics Multivariate methods

primary parameters β, and ML or restricted maximum the general linear multivariate model where predictors
likelihood (REML) estimation for the secondary are linked to the response means, the structural
covariance parameters  and . The REML estimates equation model focuses on establishing variable inter-
arise from maximizing the reduced profile log- relationships and/or latent structures through fitting
likelihood equation. Both ML and REML estimates the variances and covariances of the entire observed
typically have biases especially in small samples, data. Both response and predictor variables are
with REML estimates less biased than the ML assumed to be random and typically centered at
estimates. The estimation process requires iterative their observed means to restrict confounding from
computations, which can be computational intensive the means. The response variables are endogenous for
for large data and may be slow to converge or being predicted within the model, while the predic-
not converge at all. Cheng et al.26 recommended tor variables are exogenous for being predicted by
centering, scaling, and full rank coding of all predictor variables outside the model. A non-recursive model
variables to improve computing speed, the chances of allows backward causation, causal loops, or bidirec-
convergence, and numerical accuracy. tional paths in the model, as opposed to a recursive
Accurate hypothesis testing and confidence inter- model which considers only unidirectional causation.
vals for linear mixed model require a large sample The structural equation model encompasses
approximation. Among the widely available methods, three processes: (1) path analysis for structural
the Kenward–Roger approximation for the degrees models of observed variables such as examining
of freedom of the Wald statistic of the REML esti- the inter-relationship between genes expressed under
mates appears to provide the most accurate control an experimental condition, (2) confirmatory factor
of the Type I error rate. However, as Muller et al.27 analysis for a priori measurement models such as
pointed out, the inflation of the Type I error rate testing the latent factor underlying the four indicators
can be substantial in small samples even with cor- of quality of life, and (3) a synthesis of path and
rectly specified covariance model. Alternately, if the confirmatory factor analysis. The structural model for
assumptions of the general linear multivariate model path analysis has
are satisfied (i.e., no missing and mistimed data and
no time-varying predictors), then MANOVA, MAN- yi = yi + Bxi +
i,
COVA, and UNIREP tests can be used. Even in small          
p×1 p×p p×1 p×q q×1 p×1 ,
samples, an appropriate MANOVA, MANCOVA, or
UNIREP test always controls the Type I error rate and for each independent subject i ∈ {1, . . . , N}, no missing
has a good power approximation, in sharp contrast data, and  a matrix with zeros on the diagonal. The
to mixed model tests. specification allows modeling any response variable
The flexibility of linear mixed model permits as a linear function of predictors and other response
analysis of HDLSS high-throughput data with existing variables. By assuming xi ∼ Nq (0,  x ),
i ∼ Np (0,   )
methods. However, small-sample limitations of mixed and independence between xi and
i , the covariance
model tests extend to HDLSS applications to matrix of the model is given by

   −1   −t  −1 


y Ip −  B x B +   I p −  Ip −  B x
V i =   −t .
xi  x B I p −  x

discourage considering the mixed model when sample The measurement models for confirmatory
discourage considering the mixed model when sample factor analysis
 relate the unobserved latent constructs
 
size is small. Furthermore, little is known about the (variables) ηi , δ i to observed variables yi , xi
performance of mixed model tests as the number of through
variables substantially outgrows the sample size.
yi =  y ηi + ei ,
(p × 1) (p × m)(m × 1) (p × 1),
STRUCTURAL EQUATION MODEL
The structural equation model provides a statistical and
solution for estimating and testing structural relations
between response and predictor variables, and is useful xi =  xδi + ζ i,
for causal inference and analysis of networks. Unlike (q × 1) (q × k)(k × 1) (q × 1).

Volume 4, January/February 2012  2011 Wiley Periodicals, Inc. 45


Overview wires.wiley.com/compstats

To ensure identifiability, the dimensions of the latent CONCLUSIONS


constructs have to be no more than the numbers of
The inter-relationship between observations of a sub-
observed variables, namely m ≤ p and k ≤ q. A syn-
ject gives the hallmark of multivariate data. In many
thesis of path and confirmatory factor analysis permits
applications, covariance structure can have more effect
relating endogenous latent constructs ηi to exogenous
on the validity and quality of a statistical analysis than
latent constructs δ i in addition to the measurement
any other data features. For exploratory as well as con-
models. Specifically,
firmatory data analysis, multivariate methods either
model the covariance structure as primary parameters
ηi =  η ηi + Bδ i +
i,
or take into account the structure as nuisance param-
(m × 1) (m × m)(m × 1) (m × k)(k × 1) (m × 1). eters. Dimension reduction methods such as PCA,
  EFA, and canonical correlation share similar goals
Assuming that
i ∼ Nm (0,  ), δ i ∼ Nk 0,  η , ei ∼ but diverge in their approaches to define and identify
Np (0,  e ), and ζ i ∼ Nq 0,  ζ are independent leads the new dimensions (principal components for PCA,
to the covariance matrix

    −1   −t   −1 


yi y Im − η B η B +   I m −  η  y + e y Im − η B η  x
V =   −t .
xi  x  η B I m −  η  y  x  η  x +  ζ

Parameter estimation for the structural equation latent constructs for EFA, and canonical variables for
model relies on minimizing differences between canonical correlation). Classification methods such as
observed covariance (correlation) matrix of all vari- discriminant analysis (supervised) and cluster analy-
ables and the expected covariance (correlation) matrix sis (unsupervised) differ with respect to the presence
of the model. Graphical tools are typically used to rep- of a leading univariate outcome. While benefiting
resent the structural and measurement models, with from its flexibility of allowing missing, mistimed data
squares or rectangles representing observed variables, and time-varying predictors, mixed models rely heav-
circles or ellipses representing latent variables, and ily on large samples to achieve unbiased hypothesis
arrows indicating directional relationships. Pearl28 testing. Alternately, assuming a general linear multi-
gave a book-length treatment of graphical mod- variate model whenever applicable permits the use of
els. Recent advances for analyzing high-dimensional MANOVA, MANCOVA, and UNIREP tests which
graphs including using the LASSO algorithm29 and result in good control of the Type I error rate even in
joint estimation.30 small samples.

ACKNOWLEDGMENTS
This work was supported by a UF CTSI core grant (NCRR U54RR025208 and UL1RR029890), NINDS
R21-NS065098, and NIDDK R01-DK072398-05.

REFERENCES
1. Mardia KV, Kent JT, Bibby JM. Multivariate Analysis. 6. Greenacre MJ. Correspondence analysis. Wiley Inter-
New York: Academic Press; 1979. discip Rev: Comp Stat 2010, 2:613–619. doi:10.1002/
wics.114.
2. Timm NH. Applied Multivariate Analysis. New York:
Springer; 2002. 7. Zou H, Hastie T, Tibshirani R. Sparse principal com-
ponent analysis. J Comp Graph Stat 2006, 15:
3. Anderson TW. An Introduction to Multivariate Statis- 265–286.
tical Analysis. New York: John Wiley & Sons; 2003.
8. Jung S, Marron JS. PCA consistency in high dimen-
4. Schott JR. Matrix Analysis for Statistics. New York: sion, low sample size context. Ann Stat 2009, 37:
John Wiley & Sons; 1997. 4104–4130.
5. Abdi H, Williams LJ. Principal component analysis. 9. Parkhomenko E, Tritchler D, Beyene J. Sparse canoni-
WIREs Comp Stat 2010, 2:433–459. cal correlation analysis with application to genomics

46  2011 Wiley Periodicals, Inc. Volume 4, January/February 2012


WIREs Computational Statistics Multivariate methods

data integration. Stat Appl Genet Mol Biol 2009, 20. Kshirsagar AM, Smith WB. Growth Curves. New York:
8:1–34. Marcel Dekker; 1995.
10. Lawley DN, Maxwell AF. Factor Analysis as a Statisti- 21. Srivastava MS. Multivariate theory for analyzing high
cal Model. London: Butterworth; 1971. dimensional data. J Jpn Stat Soc 2007, 37:53–86.
11. Huberty CJ. Applied Discriminant Analysis. New York: 22. Warton DI. Penalized normal likelihood and ridge regu-
John Wiley & Sons; 1994. larization of correlation and covariance matrices. J Am
12. Garthwaite PH. An interpretation of partial least Stat Assoc 2008, 103:340–349.
squares. J Am Stat Assoc 1994, 89:122–127. 23. Srivastava MS, Du M. A test for the mean vector with
13. Fonville JM, Richards SE, Barton RH, Boulange CL, fewer observations than the dimension. J Multivariate
Ebbels TMD, Nicholson JK, Holmes E, Dumas MC. Anal 2008, 99:386–402.
The evolution of partial least squares models and 24. Srivastava MS, Fujikoshi Y. Multivariate analysis of
related chemometric approaches in metabonomics and variance with fewer observations than the dimension.
metabolic phenotying. J Chemomet 2010, 24: 636–649. J Multivariate Anal 2006, 97:1927–1940.
14. Guo Y, Hastie T, Tibshirani R. Regularized discrimi- 25. Ahmad MR, Werner C, Brunner E. Analysis of high-
nant analysis and its application in microarrays. Bio- dimensional repeated measures designs: the one sample
statistics 2007, 8:86–100. case. Comp Stat Data Anal 2008, 53:416–427.
15. Qiao Z, Zhou L, Huang JZ. Sparse linear discriminant 26. Cheng J, Edwards LJ, Maldonado-Molina MM,
analysis with applications to high dimensional low sam- Komro KA, Muller KE. Real longitudinal data anal-
ple size data. IAENG Int J Appl Math 2009, 39:1. ysis for real people: building a good enough mixed
IJAM_39_1_06. model. Stat Med 2010, 29:504–520.
16. Everitt BS. Cluster Analysis. London: Edward Arnold; 27. Muller KE, Edwards LJ, Simpson SL, Taylor DJ. Statis-
1993. tical tests with accurate size and power for balanced
17. Muller KE, Stewart PW. Linear Model Theory; Uni- linear mixed models. Stat Med 2007, 26:3639–3660.
variate, Multivariate, and Mixed Models. New York: 28. Pearl J. Causality–Models, Reasoning, and Inference.
John Wiley & Sons; 2006. Cambridge, UK: Cambridge University Press; 2000.
18. Muller KE. Analysis of Variance Concepts and Com- 29. Meinshausen N, Buhlmann P. High-dimensional graphs
putations. WIREs Comp Stat 2009, 1:271–282. with the lasso. Ann Stat 2006, 34:1436–1462.
19. Srivastava VK, Giles DEA. Seeminly Unrelated Regres- 30. Guo J, Levina E, Michailidis G, Zhu J. Joint estima-
sion Equation Models: Estimation and Inference. New tion of multiple graphical models. Biometrika 2011,
York: Marcel Dekker; 1987. 98:1–15.

Volume 4, January/February 2012  2011 Wiley Periodicals, Inc. 47

You might also like