Multivariate Methods
Yueh-Yun Chi∗
Since the early 20th century, the need for multivariate analysis has driven the
development of various types of multivariate methods for applications ranging
from social science to biomedical research. Multivariate data, as distinguished
from univariate and multivariable data, consist of more than one outcome variable
measured on a number of subjects. There exist exploratory data analysis methods
that aim to extract, summarize, and visualize empirical data information for the
purpose of formulating hypotheses as well as confirmatory data analysis methods
that allow testing specific research questions and hypotheses. Major exploratory
tools for dimension reduction and classification include principal component
analysis, canonical correlation, exploratory factor analysis, discriminant, and
cluster analysis. Advanced confirmatory modeling techniques encompass general
linear multivariate model, mixed model, and structural equation model. This article
aims to provide an overview of these methods as well as recent developments in
the area of high dimension, low sample size analysis for data from high-throughput
bioassays or high-resolution medical images.
blood type), while an ordinal scale provides meaningful ranking (e.g., pain level: minimal, moderate, severe, unbearable). Both nominal and ordinal scale data imply using specific methods for categorical data analysis. Continuous data can be on either an interval scale, if all differences of the same size are equivalent, or a ratio scale, if ratios of the same size are equivalent. The majority of multivariate methods can be applied to analyzing continuous data if additional assumptions such as Gaussian distribution are valid.

Taxonomy of Multivariate Methods

Multivariate methods1–3 can be divided into two categories: exploratory data analysis and confirmatory data analysis methods (Figure 1). For exploratory data analysis, empirical information from the data is extracted, summarized, and visualized with the ultimate goal of formulating research hypotheses. Principal component analysis (PCA) summarizes data dimensions by a handful of components that together account for a meaningful portion of outcome variation. Canonical correlation analysis can be viewed as a generalization of PCA in that information regarding the relationship between two sets of variables is summarized and reduced. Exploratory factor analysis (EFA) identifies latent factors underlying the observations of the multivariate outcomes. Discriminant analysis finds a linear combination of outcomes which best separates groups of subjects for the purpose of data reduction and prediction. Cluster analysis forms groups of subjects based on the proximity of multivariate outcomes.

For confirmatory data analysis, specific research questions and hypotheses are posed at the outset, and multivariate models are utilized for estimation and hypothesis testing. The general linear multivariate model assumes mean multivariate (continuous) outcomes as linear functions of a common, fixed set of predictors and can be used to test all sorts of general linear hypotheses defined by between- and within-subject contrasts. Special cases for testing general linear hypotheses include multivariate analysis of variance (MANOVA) for comparison between groups, multivariate analysis of covariance (MANCOVA) for adjusting for covariate effects in group comparison, and the univariate approach to repeated measures (UNIREP) method for analysis of longitudinal profiles. Generalizations of the general linear multivariate model include the seemingly unrelated model (SUM), which allows a distinct set of predictors for different outcomes, and the growth curve model (GCM), which characterizes the functional forms of repeated measures (e.g., the rate of tumor cell growth). The mixed model incorporates both fixed and random effects such that mean and covariance structures of the multivariate data can be modeled simultaneously. The structural equation model, useful for pathway analysis and causal inference, assumes both outcome and predictor variables are random, with the primary objective of modeling their joint covariance structure.

Overview of the Article

The review of multivariate theory and distributions relies heavily on matrix operations and algebra, which can be found in the work of Schott4 and most multivariate analysis textbooks. The rest of the article is devoted to giving an overview of the multivariate methods listed in Figure 1. All methods apply to continuous multivariate outcomes; however, some can be generalized to analyze categorical data (e.g., correspondence analysis for a categorical version of PCA). Recent advances, especially in the area of high dimension, low sample size (HDLSS) research for genomics, metabolomics, proteomics, and medical imaging studies, are summarized within each subsection where applicable.
[Figure 1. Taxonomy of multivariate methods, including principal component analysis, canonical correlation, and discriminant analysis.]
the causal relationship between the latent traits of human intelligence and test scores obtained in several domains. It was believed that the relationships of the test scores can be fully explained by one single common latent intelligence factor, and that if this factor was removed, the test scores would be uncorrelated. The model was later generalized to multiple factors. EFA can be viewed as a dimension reduction tool as the number of factors typically is much smaller than the number of variables.

Like most of the exploratory multivariate methods, EFA models the covariance structure of the data. In contrast to PCA, which constructs new variables as linear combinations of the original variables, EFA assumes each observed variable is a linear combination of the latent factors, namely for subject i ∈ {1, . . . , N},

y_i = Λ_y f_i + e_i,
(p × 1) = (p × m)(m × 1) + (p × 1),

with Λ_y a matrix of weights, f_i a vector of random, unobserved latent factors, and e_i a vector of random errors. The assumption of independent f_i and e_i with V(f_i) = Φ_f and V(e_i) = Θ_e results in a structured covariance matrix of y_i, that is, V(y_i) = Λ_y Φ_f Λ_y' + Θ_e. The model then decomposes the covariance of y_i into the portion that can be attributed to the common factors, Λ_y Φ_f Λ_y', and the portion that cannot be accounted for by the common factors, Θ_e. The communality or common variance is given by the diagonal elements of Λ_y Φ_f Λ_y', while the uniqueness or specific variance is given by the diagonal elements of Θ_e. The diagonal matrix Θ_e indicates that the errors {e_i} are uncorrelated given the latent factors, and leads to the interpretation that the inter-relationships between the p outcome variables are completely explained by the m latent factors. With Φ_f = I_m, the model reduces further such that V(y_i) = Λ_y Λ_y' + Θ_e. Common approaches for parameter estimation include the least squares principle, which minimizes the sum of squared differences between elements of the population and sample covariance matrices, and the ML principle, which assumes a Gaussian distribution for y_i. Lawley and Maxwell10 gave a comprehensive review of these methods.

DISCRIMINANT ANALYSIS

When a set of associated variables is collected for two or more groups or populations, discriminant analysis allows identifying a subset of the variables, or functions of the subset, that lead to maximum separation among the groups or populations. The identified variables or their functions can be used to develop a rule to classify future observations. For instance, a medical researcher may be interested in determining features that significantly differentiate patients who have had a heart attack from those who have not yet had a heart attack, and using the identified features to predict whether a patient is likely to have a heart attack in the future. Linear discriminant analysis finds linear combinations of variables which best separate the groups.

For a two-group linear discriminant analysis with two multivariate Gaussian populations of common covariance matrix, namely independent y_1i ∼ N_p(µ_1, Σ) and y_2j ∼ N_p(µ_2, Σ) for i ∈ {1, . . . , N_1} and j ∈ {1, . . . , N_2}, Fisher's linear discriminant function has the discriminant weights (coefficients) vector a_s = S^-1(ȳ_1 − ȳ_2) applied linearly to the original variables. Here ȳ_k is the sample mean vector for group k (k = 1, 2), and S is the pooled covariance matrix. Realizations of the discriminant function comprise discriminant scores for all subjects, that is, a_s'y_1i and a_s'y_2j. The difference in the mean discriminant scores, which is exactly the Mahalanobis statistic (ȳ_1 − ȳ_2)'S^-1(ȳ_1 − ȳ_2), achieves maximum separation between the two groups. Generalizing the two-group Fisher's procedure to G groups requires quantitatively defining the separation of the G group mean vectors. Under the MANOVA framework, separation of the G group mean vectors can be determined by the ratio of the between-group variation to the within-group variation. Adopting the notation given in Table 1, a solution to the eigenequation |Sh − λSe| = 0 yields eigenvectors a_m for m ∈ {1, . . . , s = min(G − 1, p)} that maximize the ratio of the between-group variation to the within-group variation by maximizing the ratios a_m'Sh a_m / (a_m'Se a_m). For a unique solution, standardized eigenvectors are used as discriminant weights for defining discriminant functions. The maximum number of discriminant functions, known as the rank or dimensionality of the separation, is given by s, leading to the determination of one discriminant function for a two-group problem.

For any observation, discriminant score(s) can be used for classification. If the mean discriminant score is higher for group 1 (i.e., a_s'ȳ_1 > a_s'ȳ_2) in a two-group comparison, Fisher's classification rule assigns an observation y* to group 1 if a_s'y* > a_s'(ȳ_1 + ȳ_2)/2, and to group 2 otherwise. Fisher's classification assumes common covariance of the two groups, but does not require normality. The rule is optimal for minimizing the total probability of misclassification when sample size is large.
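As a concrete illustration of the two-group rule just described, the following minimal sketch (hypothetical data and plain numpy; all variable names are illustrative) computes the pooled covariance S, the discriminant weights a_s = S^-1(ȳ_1 − ȳ_2), the Mahalanobis separation, and the midpoint classification of a new observation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-group data: N1 x p and N2 x p outcome matrices
y1 = rng.normal(loc=0.0, size=(30, 4))
y2 = rng.normal(loc=0.7, size=(25, 4))

m1, m2 = y1.mean(axis=0), y2.mean(axis=0)
n1, n2 = len(y1), len(y2)

# Pooled (within-group) covariance matrix S
S = ((n1 - 1) * np.cov(y1, rowvar=False)
     + (n2 - 1) * np.cov(y2, rowvar=False)) / (n1 + n2 - 2)

# Fisher's discriminant weights a_s = S^{-1}(m1 - m2)
a = np.linalg.solve(S, m1 - m2)

# Mahalanobis statistic: difference in the mean discriminant scores
mahalanobis = (m1 - m2) @ a

# Classification rule: assign y_new to group 1 if a'y_new > a'(m1 + m2)/2
y_new = rng.normal(size=4)
group = 1 if a @ y_new > a @ (m1 + m2) / 2 else 2
print(mahalanobis, group)
```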
TABLE 1 | Parameters and Constants for the General Linear Multivariate Model Y = XB + E and the Associated General Linear Hypothesis H0: Θ = Θ0

Symbol | Size | Definition | Least squares estimator
N | 1 × 1 | Sample size
p | 1 × 1 | Number of outcome variables
q | 1 × 1 | Number of predictors
νe | 1 × 1 | N − rank(X), error degrees of freedom
a | 1 × 1 | Number of between-subject contrasts
b | 1 × 1 | Number of within-subject contrasts
C | a × q | Between-subject contrast matrix
U | p × b | Within-subject contrast matrix
M | a × a | C(X'X)^-C', middle matrix
B | q × p | Primary parameters | B̂ = (X'X)^-X'Y
Σ | p × p | Error covariance matrix | Σ̂ = Y'[I_N − X(X'X)^-X']Y / νe
Θ | a × b | CBU, secondary parameters | Θ̂ = CB̂U ∼ N_{a,b}(CBU, M, Σ*)
Σ* | b × b | U'ΣU, hypothesis error covariance | Σ̂* = U'Σ̂U, with νe Σ̂* ∼ W_b(νe, Σ*)
ε | 1 × 1 | tr(Σ*)^2 / [b tr(Σ*^2)], sphericity parameter
Sh | b × b | Hypothesis sum of squares matrix | Sh = (Θ̂ − Θ0)'M^-1(Θ̂ − Θ0) ∼ W_b(a, Σ*, Ω), with Ω = (Θ − Θ0)'M^-1(Θ − Θ0)
Se | b × b | Error sum of squares matrix | Se = νe Σ̂* ∼ W_b(νe, Σ*)
Cost of misclassification may be taken into account by considering the expected cost of misclassification as the objective function. Evaluation of classification rules can be made by computing the observed error rate (proportion of incorrectly classified subjects). To avoid underestimation of the error rate, the split-sample method is commonly employed by taking one part of the sample (training sample) to derive a classification rule and the other part of the sample (validation sample) to compute the error rate and evaluate the rule. The jackknife leave-one-out approach (a special case of cross-validation) provides an alternative solution by iteratively omitting one observation from the development of a classification rule and summarizing the misclassification of observations incorrectly classified by the respective rules developed without the holdout observation.

Huberty11 provided a book-length review of applied discriminant analysis. Recent advances in discriminant analysis center on methods for high-dimensional data. The dimension reduction approach includes partial least squares regression, which utilizes a reduced set of latent components, obtained from maximizing the correlation between the outcome and predictors, in place of the original, high-dimensional predictor variables.12 Partial least squares regression and its variations are commonly used in metabolomics and chemometrics.13 Regularization provides an alternative approach seeking to stabilize the singular or near singular sample covariance matrix by shrinking its eigenvalues.14 Sparse linear discriminant analysis applies the method of regularization to obtain sparse discriminant functions with only a small number of nonzero components.15

CLUSTER ANALYSIS

Cluster analysis is a tool for classifying subjects into groups in a manner that subjects within a group are homogeneous or similar while subjects in different groups are heterogeneous or not similar. The similarity or dissimilarity between a pair of subjects is determined by a proximity measure of the multivariate outcomes. For continuous variables, a common measure of similarity is the Pearson product–moment correlation coefficient, and the most common dissimilarity measure is the Euclidean distance. The choice of the proximity measure depends on the objective of the study and the type of measurement scale(s) used in observing the multivariate outcomes. For many applications of cluster analysis, an N × N proximity matrix of the N subjects, rather than the data matrix Y, is the starting point of the calculations. Algorithms designed to perform cluster analysis can be divided into two broad classes called hierarchical and nonhierarchical clustering methods.
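A minimal sketch of the agglomerative approach on hypothetical data, using SciPy's hierarchical clustering routines; the Euclidean proximity computation, the linkage choice, and the three-cluster cutoff are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
Y = rng.normal(size=(20, 5))        # hypothetical N x p data matrix

# Pairwise Euclidean dissimilarities (condensed form of the N x N proximity matrix)
d = pdist(Y, metric="euclidean")

# Agglomerative (bottom-up) clustering; "single" gives the nearest neighbor
# criterion and "complete" the farthest neighbor criterion
Z = linkage(d, method="complete")

# Cut the resulting dendrogram to obtain, e.g., three clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```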
Hierarchical clustering can either begin with N clusters, each containing one subject, and iteratively combine clusters until all subjects belong to one single cluster (bottom-up or agglomerative process), or begin with one cluster and successively split clusters until N clusters are formed (top-down or divisive process). Both agglomerative and divisive methods generate a tree diagram or dendrogram to document the process. Variations of the agglomerative method can be derived by changing the criterion used to combine subjects in different clusters. Examples are the nearest neighbor (single linkage) method of using the minimum dissimilarity to combine clusters, and the farthest neighbor (complete linkage) method of using the maximum dissimilarity to combine clusters. The divisive method is conceptually and computationally more complex than the agglomerative method as it allows clusters to be formed while information about all subjects is taken into account at the first step.

Unlike hierarchical clustering, nonhierarchical k-means clustering starts with a predetermined number of clusters and set of centroids or seeds. Each subject is assigned to its nearest centroid before a new set of centroids based on the new allocation is formed. The process continues until there is no reallocation of subjects or until reassignment meets some convergence criterion. In practice, the hierarchical and nonhierarchical methods may be combined to facilitate the identification of clusters. One may use a hierarchical procedure to identify the number of clusters and centroids, which can then be input into the nonhierarchical procedure to refine the cluster solution. Comprehensive discussions of cluster analysis can be found in the book written by Everitt.16

GENERAL LINEAR MULTIVARIATE MODEL

The general linear multivariate model is defined by

Y = XB + E,
(N × p) = (N × q)(q × p) + (N × p),

with rows of the outcome matrix Y, error matrix E, and design matrix X corresponding to subjects. Columns of Y and E correspond to repeated measures or multivariate outcomes (continuous), columns of X correspond to fixed predictors (discrete or continuous), and B denotes the matrix of primary (mean) parameters. Table 1 summarizes the definitions of the parameters and constants for the model. Detailed discussions about estimation and hypothesis testing of the general linear multivariate model can be found in Muller and Stewart17 (Chapters 3, 12, 16) and Timm2 (Chapters 3 and 4).

Classical assumptions adopted from the univariate model include (1) homogeneous covariance structure given predictors (i.e., V[row_i(Y)|X] = Σ, i ∈ {1, . . . , N}), (2) independence between outcomes from different subjects (rows of Y), (3) linearity between outcomes and predictors (i.e., row_i(Y) is linearly related to row_i(X), i ∈ {1, . . . , N}), and (4) existence of a finite covariance matrix Σ. Additional assumptions unique to the multivariate model are (1) no missing data, and (2) each outcome variable (column of Y) is measured in a consistent way (i.e., no appreciable mistiming allowed). These assumptions guarantee least squares estimation of both the primary parameters B and the nuisance parameters Σ (Table 1).

The Gaussian assumption states that the rows of E are identically and independently distributed, namely row_i(E) ∼ N_p(0, Σ). Equivalently, E follows a matrix Gaussian distribution, E ∼ N_{N,p}(0, I_N, Σ), as defined by Muller and Stewart17 (Chapter 8). The assumption facilitates distributional derivations for hypothesis testing. With a full rank design matrix X, the maximum likelihood (ML) estimator for B is equivalent to its least squares estimator, and the ML estimator for Σ is νe Σ̂/N.

Secondary (mean) parameters can be defined as Θ = CBU, with both C and U contrast matrices of known constants. The C matrix defines contrasts between groups or levels of predictors, and implicitly leads to computing linear combinations of columns of X, the predictor variables. The U matrix defines contrasts within subjects, and implicitly leads to computing linear combinations of columns of Y, the outcome variables. An estimable and testable Θ requires rank(C) = a ≤ q, C = C(X'X)^-(X'X), and rank(U) = b ≤ p. Under these regularity conditions, Θ̂ is an unbiased estimator of Θ and follows a matrix Gaussian distribution as detailed in Table 1. The multivariate quadratic form of Σ̂* leads to a central Wishart distribution, the multivariate extension of a central chi-square distribution.

The multivariate general linear (null) hypothesis regarding the secondary parameters may be stated as H0: Θ = Θ0. The hypothesis and error sum of squares matrices lie at the heart of testing the general linear hypothesis. By the definitions given in Table 1, both matrices have multivariate quadratic forms and follow a Wishart distribution with the shared scale parameter Σ*. The error sum of squares matrix has a central Wishart form regardless of the underlying hypothesis (null or alternative). The hypothesis sum of squares matrix has a central Wishart form only under the null hypothesis.
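To make the quantities of Table 1 concrete, a minimal numpy sketch (hypothetical X, Y, C, and U, not tied to any particular application) computes the least squares estimators and the hypothesis and error sums of squares that the tests discussed next operate on.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 40, 3
# Hypothetical design: intercept plus one continuous predictor (q = 2, full rank)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
B_true = np.array([[1.0, 0.5, 0.0],
                   [0.3, 0.3, 0.3]])
Y = X @ B_true + rng.normal(scale=0.5, size=(N, p))

nu_e = N - np.linalg.matrix_rank(X)          # error degrees of freedom
XtX_inv = np.linalg.inv(X.T @ X)

# Least squares estimators from Table 1
B_hat = XtX_inv @ X.T @ Y
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / nu_e

# General linear hypothesis H0: Theta = C B U = 0, here testing the
# predictor effect (C) jointly on all outcomes (U = I_p)
C = np.array([[0.0, 1.0]])                   # a x q between-subject contrast
U = np.eye(p)                                # p x b within-subject contrast
Theta_hat = C @ B_hat @ U
M = C @ XtX_inv @ C.T                        # "middle" matrix

S_h = Theta_hat.T @ np.linalg.inv(M) @ Theta_hat   # hypothesis SSCP matrix
S_e = nu_e * (U.T @ Sigma_hat @ U)                 # error SSCP matrix
print(S_h, S_e, sep="\n")
```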
Tests for the general linear hypothesis have been derived under different principles. Table 2 summarizes the four multivariate approaches (HLT, PBT, WLK, RLR) and the UNIREP method. All five statistics are functions of the hypothesis and error sum of squares matrices. The four multivariate tests are unbiased for their control of the target Type I error rate (α) when the null hypothesis is true; however, none of them is uniformly most powerful among the other three tests. The statistical powers vary with the pattern of noncentrality parameters in Ω, which implies one test may be preferred (more powerful) in some settings. All four multivariate tests are invariant to full rank linear transformations (excluding location shifts) of the original variables, and thus are suitable for analyzing multivariate outcomes measured with different metrics.

Under the null hypothesis for one- and two-group comparisons, each of the four multivariate statistics in Table 2 can be expressed exactly as a one-to-one function of each other, and of an F random variable with numerator degrees of freedom one and denominator degrees of freedom νe. Furthermore, the four tests are of exact size α and uniformly most powerful among the class of unbiased and scale-invariant tests. If the number of groups in the comparison is greater than two, then the statistics are not one-to-one functions of each other, and the exact distributions are known only for special cases. Approximations matching the first two moments lead to using an F random variable for the HLT, PBT, and WLK statistics (Ref 17, Chapter 16).

The UNIREP approach stems from the null approximation that the statistic [tr(Sh)/a]/[tr(Se)/νe] follows an F distribution with numerator degrees of freedom ab and denominator degrees of freedom bνe. The parameter ε, defined in Table 1, quantifies the spread of the population eigenvalues of Σ*, with maximum sphericity requiring all eigenvalues be equal and ε = 1, and minimal sphericity requiring only one nonzero eigenvalue and ε = 1/b. The F approximation with numerator degrees of freedom ab and denominator degrees of freedom bνe gives an unbiased and uniformly most powerful test only when ε = 1 (sphericity holds). The Geisser–Greenhouse adjustment of the degrees of freedom uses the ML estimator of ε, while the Huynh–Feldt adjustment uses unbiased estimators of the numerator and denominator of ε. Both methods give tests that have approximate control of the Type I error rate. When the number of within-subject contrasts is one (b = 1), the UNIREP and four multivariate tests become equivalent and provide exactly the same p-value.

Applications for MANOVA, MANCOVA, and repeated measures analyses belong to special cases of testing the general linear hypothesis with the general linear multivariate model. The MANOVA analysis allows the overall comparison of group means and always uses U = I_b. The MANCOVA analysis evaluates the overall group mean differences after adjusting for the effect of covariates and, like MANOVA, always assumes U = I_b. The distinction between the MANOVA and MANCOVA methods lies in the selection of the design matrix, and consequentially, the specification of the between-subject contrast matrix C. The analysis of repeated measures emphasizes more the trend of changes across repeated measures and less the overall differences. Thus, for this analysis, the choice of the within-subject contrast matrix U consists of trends that characterize changes over time or space.

MULTIVARIATE ANALYSIS OF VARIANCE

As its name suggests, the MANOVA method extends the univariate analysis of variance18 to allow comparison of means of several variables among two or more groups of independent and vector Gaussian observations with a common covariance structure. The full rank formulation gives

y_jg = µ_g + e_jg,

with µ_g the p × 1 mean vector of group g ∈ {1, . . . , G}, and y_jg and e_jg the outcome and error vector for subject j ∈ {1, . . . , N_g} in group g, respectively.
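Before turning to the specific MANOVA hypotheses, a short sketch of how the criteria are typically assembled from Sh and Se may be helpful; the four multivariate statistics are functions of the eigenvalues of Se^-1 Sh, and the sphericity parameter ε of Table 1 can be estimated from Σ̂* = Se/νe. The exact scalings used in Table 2 are not reproduced here; only the raw form of each criterion is shown.

```python
import numpy as np

def multivariate_tests(S_h, S_e, nu_e):
    """Raw multivariate criteria and estimated sphericity from SSCP matrices."""
    b = S_h.shape[0]
    # Eigenvalues of Se^{-1} Sh drive all four multivariate criteria
    lam = np.linalg.eigvals(np.linalg.solve(S_e, S_h)).real
    results = {
        "HLT": lam.sum(),                    # Hotelling-Lawley trace
        "PBT": (lam / (1.0 + lam)).sum(),    # Pillai-Bartlett trace
        "WLK": np.prod(1.0 / (1.0 + lam)),   # Wilks' lambda
        "RLR": lam.max(),                    # Roy's largest root
    }
    # Estimated sphericity parameter from Sigma*_hat = Se / nu_e
    sigma_star = S_e / nu_e
    results["epsilon"] = (np.trace(sigma_star) ** 2
                          / (b * np.trace(sigma_star @ sigma_star)))
    return results
```

Applied to the S_h, S_e, and nu_e of the previous sketch, the function returns the criteria that the F approximations described above would then convert to p-values.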
The null hypothesis of no overall group difference leads to H0: µ_1 = · · · = µ_G for the full rank model. The classic less than full rank formulation overparameterizes the model as

y_jg = µ + δ_g + e_jg.

With the sum-to-zero constraint on {δ_g} (i.e., Σ_{g=1}^{G} δ_g = 0), µ indicates the population mean of balanced groups (i.e., µ = Σ_{g=1}^{G} µ_g/G), and δ_g is the deviation from the population mean for group g. The null hypothesis states H0: δ_1 = · · · = δ_G = 0 for the less than full rank model. The Gaussian assumption in combination with the least squares assumptions leads to identically and independently distributed error vectors, namely e_jg ∼ N_p(0, Σ).

Current computational methods for MANOVA are based on the fact that MANOVA can be expressed and interpreted as a general linear multivariate model. This is accomplished by defining one indicator variable for each group. The full rank model includes G indicator variables as predictors to have

Y = [y_11 · · · y_{N_1 1}  y_12 · · · y_{N_2 2}  · · ·  y_1G · · · y_{N_G G}]',
X_FR = blockdiag(1_{N_1}, 1_{N_2}, . . . , 1_{N_G}),
B_FR = [µ_1 µ_2 · · · µ_G]',
E = [e_11 · · · e_{N_1 1}  e_12 · · · e_{N_2 2}  · · ·  e_1G · · · e_{N_G G}]',

with blockdiag(·) denoting the N × G block diagonal matrix of group indicator columns. The general linear hypothesis for H0: µ_1 = · · · = µ_G can be tested with a = G − 1 (G > 1) between-subject contrasts for C_FR = [1_{G−1} −I_{G−1}], b = p within-subject contrasts for U = I_b, the secondary parameters Θ_FR = [µ_1 − µ_2 · · · µ_1 − µ_G]', and the null matrix Θ0 = 0.

The less than full rank model, in contrast, includes an intercept and G indicator variables as predictors to have

X_LTFR = [1_N  X_FR],
B_LTFR = [µ δ_1 δ_2 · · · δ_G]'.

The general linear hypothesis for H0: δ_1 = · · · = δ_G = 0 can be tested with a = G between-subject contrasts for C_LTFR = [0 I_G], b = p within-subject contrasts for U = I_b, the secondary parameters Θ_LTFR = [δ_1 · · · δ_G]', and the null matrix Θ0 = 0. By framing MANOVA hypotheses as general linear hypotheses, the multivariate tests listed in Table 2 can be utilized to compute p-values and assess statistical significance. The fact that U = I_b implies that MANOVA compares means across all outcome variables. The tests are sensitive to a collection of small differences, but have no capacity for identifying individual variables that contribute to the overall difference.

MULTIVARIATE ANALYSIS OF COVARIANCE

The MANCOVA model allows comparisons of group mean vectors in the presence of covariates. A simple example would be comparing improvements in quality of life measures (appetite, mood, coping, and physical well-being) among therapeutic regimens after adjusting for the effect of age. The general linear multivariate model for the MANCOVA design extends the MANOVA design by adding covariates as additional predictors in the design matrix X. The full rank MANCOVA model leads to considering X = [X_FR Z], with B stacking B_FR on top of the parameters that capture the effects of covariates on the mean outcome vectors, and with the columns of Z the covariates that need to be accounted for. The secondary parameters associated with the covariates may be thought of as nuisance parameters of no intrinsic interest. Similarly, the less than full rank MANCOVA model overparameterizes by setting
X = [X_LTFR Z] and augmenting B_LTFR with the covariate-effect parameters. The between-subject contrast matrix is C = [C_FR 0] for the full rank formulation and C = [C_LTFR 0] for the less than full rank formulation. The within-subject contrast matrix remains U = I_b for the overall comparison.

The MANCOVA design implicitly assumes the effects of covariates are the same across the groups of comparison. This assumption can be evaluated by testing the significance of the interactions between the covariates and the indicator variables that define the groups. The interaction term can be examined by expanding a MANCOVA design matrix with additional columns from X_FR ⊙ Z. Here ⊙ denotes the operator of the horizontal direct product that creates a new matrix by elementwise multiplication of pairs of columns from X_FR and Z. The corresponding between-subject contrasts identify individual columns of X_FR ⊙ Z and allow simultaneously testing the equivalence of the effect of each covariate on each outcome variable (i.e., U = I_b) across the G groups.

Repeated Measures Analysis

Repeated measures design is used when there are multiple observations of the subject on a single outcome variable at several points in time or space. Observations from any single subject may be correlated. As an example, in agricultural experiments, the split plot design creates exchangeable subplots by splitting a field (a plot of land) and randomly assigns one treatment of interest to each subplot. Fields are the independent sampling units (subjects), and observations from subplots of a field give repeated measures which have equal correlation with each other. In biomedical and behavioral research, repeated measurements are commonly observed over time, which undermines exchangeability because the order of the measurements in time cannot be interchanged. Correlations between observations further apart in time are likely to be weaker than between observations made closer in time.

The general linear multivariate model takes into account any correlation structure inherent to a repeated measures design. The coding scheme and setup for between-subject comparisons with and without adjusting for covariates follow the designs of MANCOVA and MANOVA, respectively. The overall within-subject comparison, as employed in MANOVA and MANCOVA, is typically of little interest, especially when measurements are taken repeatedly over time. Common approaches to repeated measures analysis decompose the p-dimensional outcome space into one- and (p − 1)-dimensional subspaces, and perform hypothesis testing on each subspace separately. The one-dimensional subspace has as the within-subject contrast matrix U the vector 1_p/p for testing the average outcome across repeated measurements. The (p − 1)-dimensional subspace can be defined by the pairwise differences between repeated measures (i.e., U = [1_{p−1} −I_{p−1}]') or the orthonormal polynomial trends (i.e., columns of U corresponding to the linear trend, quadratic trend, cubic trend, etc.). These p − 1 within-subject contrasts allow an overall assessment of the trend over repeated measures.

The multivariate and UNIREP methods listed in Table 2 can be used for testing any repeated measures hypothesis if no missing or mistimed data are present (i.e., a balanced within-subject design). Non-iterative approximation methods or mixed models can be used when there are missing data. The UNIREP method with no correction for degrees of freedom (i.e., the F approximation with degrees of freedom ab and bνe) has exact control of the Type I error rate and is uniformly most powerful when repeated measures are exchangeable (i.e., with equal variance and equal correlation between any pair).

Generalizations

One of the limitations of the general linear multivariate model is that there must be a common design matrix for all outcome variables. The seemingly unrelated model (SUM) overcomes this limitation by allowing a distinct set of predictors for each variable. Much of the work on SUM was motivated by econometric applications and studies with time-varying predictors. The model expands the design matrix and primary parameters by defining

Y = [X_1 X_2 · · · X_p] blockdiag(β_1, β_2, . . . , β_p) + E,

with X_k the N × q_k design matrix for the kth outcome and β_k the corresponding q_k × 1 primary parameters. By stacking Y columnwise, the model can be fit using ML or generalized least squares estimation for β = [β_1' β_2' · · · β_p']'. The stacked model has y = vec(Y) with mean given by the block diagonal matrix blockdiag(X_1, . . . , X_p) multiplied by β, and covariance Σ ⊗ I_N. Here vec(·) is the operator that stacks the columns of Y. Srivastava and Giles19 provided a book-length treatment of this model.
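A minimal sketch of the columnwise stacking just described, with hypothetical outcome-specific design matrices and a known Σ used purely to make the Σ ⊗ I_N weighting of generalized least squares explicit (in practice Σ would be estimated, e.g., by feasible GLS or ML).

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
N, p = 50, 3
# Hypothetical outcome-specific design matrices X_1, ..., X_p (distinct predictors)
X_list = [np.column_stack([np.ones(N), rng.normal(size=(N, k))]) for k in (1, 2, 1)]

Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
E = rng.multivariate_normal(np.zeros(p), Sigma, size=N)
Y = np.column_stack([X @ rng.normal(size=X.shape[1]) for X in X_list]) + E

# y = vec(Y) stacks the columns of Y; the matching design is block diagonal,
# and the stacked errors have covariance Sigma kron I_N
y = Y.flatten(order="F")
X_stack = block_diag(*X_list)
V_inv = np.linalg.inv(np.kron(Sigma, np.eye(N)))

# Generalized least squares estimate of beta = (beta_1', ..., beta_p')'
beta_gls = np.linalg.solve(X_stack.T @ V_inv @ X_stack, X_stack.T @ V_inv @ y)
print(beta_gls)
```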
primary parameters β, and ML or restricted maximum likelihood (REML) estimation for the secondary covariance parameters. The REML estimates arise from maximizing the reduced profile log-likelihood equation. Both ML and REML estimates typically have biases, especially in small samples, with REML estimates less biased than the ML estimates. The estimation process requires iterative computations, which can be computationally intensive for large data and may be slow to converge or not converge at all. Cheng et al.26 recommended centering, scaling, and full rank coding of all predictor variables to improve computing speed, the chances of convergence, and numerical accuracy.

Accurate hypothesis testing and confidence intervals for the linear mixed model require a large sample approximation. Among the widely available methods, the Kenward–Roger approximation for the degrees of freedom of the Wald statistic of the REML estimates appears to provide the most accurate control of the Type I error rate. However, as Muller et al.27 pointed out, the inflation of the Type I error rate can be substantial in small samples even with a correctly specified covariance model. Alternately, if the assumptions of the general linear multivariate model are satisfied (i.e., no missing and mistimed data and no time-varying predictors), then MANOVA, MANCOVA, and UNIREP tests can be used. Even in small samples, an appropriate MANOVA, MANCOVA, or UNIREP test always controls the Type I error rate and has a good power approximation, in sharp contrast to mixed model tests.

The flexibility of the linear mixed model permits analysis of HDLSS high-throughput data with existing methods. However, the small-sample limitations of mixed model tests extend to HDLSS applications and discourage considering the mixed model when sample size is small. Furthermore, little is known about the performance of mixed model tests as the number of variables substantially outgrows the sample size.

STRUCTURAL EQUATION MODEL

The structural equation model provides a statistical solution for estimating and testing structural relations between response and predictor variables, and is useful for causal inference and analysis of networks. Unlike the general linear multivariate model, where predictors are linked to the response means, the structural equation model focuses on establishing variable inter-relationships and/or latent structures through fitting the variances and covariances of the entire observed data. Both response and predictor variables are assumed to be random and typically centered at their observed means to restrict confounding from the means. The response variables are endogenous for being predicted within the model, while the predictor variables are exogenous for being predicted by variables outside the model. A non-recursive model allows backward causation, causal loops, or bidirectional paths in the model, as opposed to a recursive model which considers only unidirectional causation.

The structural equation model encompasses three processes: (1) path analysis for structural models of observed variables, such as examining the inter-relationship between genes expressed under an experimental condition, (2) confirmatory factor analysis for a priori measurement models, such as testing the latent factor underlying the four indicators of quality of life, and (3) a synthesis of path and confirmatory factor analysis. The structural model for path analysis has

y_i = Γ y_i + B x_i + ζ_i,
(p × 1) = (p × p)(p × 1) + (p × q)(q × 1) + (p × 1),

for each independent subject i ∈ {1, . . . , N}, no missing data, and a Γ matrix with zeros on the diagonal. The specification allows modeling any response variable as a linear function of predictors and other response variables. By assuming x_i ∼ N_q(0, Φ_x), ζ_i ∼ N_p(0, Ψ), and independence between x_i and ζ_i, the covariance matrix of the model is given in terms of Γ, B, Φ_x, and Ψ, as sketched below.

The measurement models for confirmatory factor analysis relate the unobserved latent constructs (variables) η_i and ξ_i to the observed variables y_i and x_i through

y_i = Λ_y η_i + e_i,
(p × 1) = (p × m)(m × 1) + (p × 1),

and

x_i = Λ_x ξ_i + δ_i,
(q × 1) = (q × k)(k × 1) + (q × 1).
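Returning to the path-analysis model, the covariance structure referenced above follows from the reduced form of the structural equation; a short derivation under the stated assumptions (zero diagonal for Γ and I_p − Γ invertible), written in the notation used above, is

```latex
\begin{aligned}
\mathbf{y}_i &= \boldsymbol{\Gamma}\mathbf{y}_i + \mathbf{B}\mathbf{x}_i + \boldsymbol{\zeta}_i
  \;\Longrightarrow\;
  \mathbf{y}_i = (\mathbf{I}_p - \boldsymbol{\Gamma})^{-1}(\mathbf{B}\mathbf{x}_i + \boldsymbol{\zeta}_i),\\[4pt]
V(\mathbf{y}_i) &= (\mathbf{I}_p - \boldsymbol{\Gamma})^{-1}
  \bigl(\mathbf{B}\boldsymbol{\Phi}_x\mathbf{B}' + \boldsymbol{\Psi}\bigr)
  \bigl[(\mathbf{I}_p - \boldsymbol{\Gamma})^{-1}\bigr]',\qquad
\operatorname{Cov}(\mathbf{y}_i,\mathbf{x}_i) = (\mathbf{I}_p - \boldsymbol{\Gamma})^{-1}\mathbf{B}\boldsymbol{\Phi}_x,
\end{aligned}
```

so the joint covariance matrix of (y_i', x_i')' is determined entirely by Γ, B, Φ_x, and Ψ.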
Parameter estimation for the structural equation model relies on minimizing differences between the observed covariance (correlation) matrix of all variables and the expected covariance (correlation) matrix of the model. Graphical tools are typically used to represent the structural and measurement models, with squares or rectangles representing observed variables, circles or ellipses representing latent variables, and arrows indicating directional relationships. Pearl28 gave a book-length treatment of graphical models. Recent advances for analyzing high-dimensional graphs include using the LASSO algorithm29 and joint estimation.30

latent constructs for EFA, and canonical variables for canonical correlation). Classification methods such as discriminant analysis (supervised) and cluster analysis (unsupervised) differ with respect to the presence of a leading univariate outcome. While benefiting from its flexibility of allowing missing and mistimed data and time-varying predictors, the mixed model relies heavily on large samples to achieve unbiased hypothesis testing. Alternately, assuming a general linear multivariate model whenever applicable permits the use of MANOVA, MANCOVA, and UNIREP tests, which result in good control of the Type I error rate even in small samples.
ACKNOWLEDGMENTS
This work was supported by a UF CTSI core grant (NCRR U54RR025208 and UL1RR029890), NINDS
R21-NS065098, and NIDDK R01-DK072398-05.
REFERENCES

1. Mardia KV, Kent JT, Bibby JM. Multivariate Analysis. New York: Academic Press; 1979.
2. Timm NH. Applied Multivariate Analysis. New York: Springer; 2002.
3. Anderson TW. An Introduction to Multivariate Statistical Analysis. New York: John Wiley & Sons; 2003.
4. Schott JR. Matrix Analysis for Statistics. New York: John Wiley & Sons; 1997.
5. Abdi H, Williams LJ. Principal component analysis. WIREs Comp Stat 2010, 2:433–459.
6. Greenacre MJ. Correspondence analysis. Wiley Interdiscip Rev: Comp Stat 2010, 2:613–619. doi:10.1002/wics.114.
7. Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. J Comp Graph Stat 2006, 15:265–286.
8. Jung S, Marron JS. PCA consistency in high dimension, low sample size context. Ann Stat 2009, 37:4104–4130.
9. Parkhomenko E, Tritchler D, Beyene J. Sparse canonical correlation analysis with application to genomics data integration. Stat Appl Genet Mol Biol 2009, 8:1–34.
10. Lawley DN, Maxwell AE. Factor Analysis as a Statistical Method. London: Butterworth; 1971.
11. Huberty CJ. Applied Discriminant Analysis. New York: John Wiley & Sons; 1994.
12. Garthwaite PH. An interpretation of partial least squares. J Am Stat Assoc 1994, 89:122–127.
13. Fonville JM, Richards SE, Barton RH, Boulange CL, Ebbels TMD, Nicholson JK, Holmes E, Dumas MC. The evolution of partial least squares models and related chemometric approaches in metabonomics and metabolic phenotyping. J Chemomet 2010, 24:636–649.
14. Guo Y, Hastie T, Tibshirani R. Regularized discriminant analysis and its application in microarrays. Biostatistics 2007, 8:86–100.
15. Qiao Z, Zhou L, Huang JZ. Sparse linear discriminant analysis with applications to high dimensional low sample size data. IAENG Int J Appl Math 2009, 39: IJAM_39_1_06.
16. Everitt BS. Cluster Analysis. London: Edward Arnold; 1993.
17. Muller KE, Stewart PW. Linear Model Theory: Univariate, Multivariate, and Mixed Models. New York: John Wiley & Sons; 2006.
18. Muller KE. Analysis of Variance Concepts and Computations. WIREs Comp Stat 2009, 1:271–282.
19. Srivastava VK, Giles DEA. Seemingly Unrelated Regression Equations Models: Estimation and Inference. New York: Marcel Dekker; 1987.
20. Kshirsagar AM, Smith WB. Growth Curves. New York: Marcel Dekker; 1995.
21. Srivastava MS. Multivariate theory for analyzing high dimensional data. J Jpn Stat Soc 2007, 37:53–86.
22. Warton DI. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. J Am Stat Assoc 2008, 103:340–349.
23. Srivastava MS, Du M. A test for the mean vector with fewer observations than the dimension. J Multivariate Anal 2008, 99:386–402.
24. Srivastava MS, Fujikoshi Y. Multivariate analysis of variance with fewer observations than the dimension. J Multivariate Anal 2006, 97:1927–1940.
25. Ahmad MR, Werner C, Brunner E. Analysis of high-dimensional repeated measures designs: the one sample case. Comp Stat Data Anal 2008, 53:416–427.
26. Cheng J, Edwards LJ, Maldonado-Molina MM, Komro KA, Muller KE. Real longitudinal data analysis for real people: building a good enough mixed model. Stat Med 2010, 29:504–520.
27. Muller KE, Edwards LJ, Simpson SL, Taylor DJ. Statistical tests with accurate size and power for balanced linear mixed models. Stat Med 2007, 26:3639–3660.
28. Pearl J. Causality: Models, Reasoning, and Inference. Cambridge, UK: Cambridge University Press; 2000.
29. Meinshausen N, Buhlmann P. High-dimensional graphs and variable selection with the lasso. Ann Stat 2006, 34:1436–1462.
30. Guo J, Levina E, Michailidis G, Zhu J. Joint estimation of multiple graphical models. Biometrika 2011, 98:1–15.