Multivariate Analysis: An Overview
Abstract
Introduction: Multivariate analysis (MVA) techniques allow more than two variables to be analysed at once. There are two general types of MVA technique: analysis of dependence and analysis of interdependence. The technique is selected depending on the type of data and the reason for the analysis. Cluster analysis: "Techniques for identifying separate groups of similar cases". Also used to summarize data by defining segments of similar cases in the data. Discriminant analysis: "A statistical technique for classifying individuals or objects into mutually exclusive and exhaustive groups on the basis of a set of independent variables". Factor analysis: Multiple factor analysis (MFA) is a "statistical method used to describe variability among observed variables in terms of a potentially lower number of unobserved variables called factors". Correspondence analysis: "Technique that generates graphical representations of the interactions between modalities (or "categories") of two categorical variables". Regression analysis: "Refers to any techniques for modelling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables." Multiple linear regression analysis (MLR): In multiple linear regression, several independent variables are used to predict one dependent variable with a least-squares approach. Multivariate analysis of variance (MANOVA): A generalized form of univariate analysis of variance (ANOVA). Conclusion: Because there are many potential problems and pitfalls in the use of multivariable techniques in clinical research, these procedures should be used with care.
(Kumar S, Singh SK, Mishra P. Multivariate Analysis: An Overview. www.journalofdentofacialsciences.com, 2013; 2(3): 19-26)
Introduction
*Sr Lecturer, Sri Aurobindo College of Dentistry, Indore, Madhya Pradesh
Address for Correspondence:
**Dr Sandeep Kumar
Flat No. 304, Sanskar Block, SAIMS Campus, Sanwar Road, Indore, Madhya Pradesh
e-mail: [email protected]

Multivariate analysis is the "application of methods that deal with a reasonably large number of measurements made on each object in one or more samples simultaneously".1 Many statistical techniques focus on just one or two variables; multivariate analysis (MVA) techniques allow more than two variables to be analysed at once. The ultimate goal of these analyses is either explanation or prediction, i.e., more than just establishing an association.
Multivariate Analysis Methods: There are two general types of MVA technique:1
• Analysis of dependence: where one (or more) variables are dependent variables, to be explained or predicted by others, e.g. multiple regression, discriminant analysis, MANOVA, partial least squares.
• Analysis of interdependence: no variables are thought of as "dependent"; the interest is in the relationships among variables, objects or cases, e.g. cluster analysis, factor analysis, and principal component analysis.

Selection of technique depends on:1
• The type of data under analysis: nominal data, ordinal data
• The reason for the analysis: classifying data, data reduction

Classifying data
• Hierarchical cluster analysis
• Two-step cluster analysis
• K-means cluster analysis
• Discriminant analysis

Data reduction
• Factor analysis
• Correspondence analysis

Level of Measurement and Multivariate Statistical Technique:2

Independent Variable    Dependent Variable           Technique
Numerical               Numerical                    Multiple Regression
Nominal or Numerical    Nominal                      Logistic Regression
Nominal or Numerical    Numerical (censored)         Cox Regression
Nominal or Numerical    Numerical                    ANOVA, MANOVA
Nominal or Numerical    Nominal (2 or more values)   Discriminant Analysis

Cluster Analysis:3 "Techniques for identifying separate groups of similar cases". Also used to summarize data by defining segments of similar cases in the data; this use of cluster analysis is known as "dissection".

Applications:
• Psychology: classifying individuals according to personality types.
• Regional analysis: classifying cities into typologies based on demographic variables.
• Marketing research: classifying customers based on product use.
• Chemistry: classifying compounds based on their properties.

Overview of cluster analysis3
Step 1: n objects measured on p variables
Step 2: transform into an n × n similarity matrix
Step 3: cluster formation (mutually exclusive clusters or hierarchical clusters)
Step 4: cluster profile

Types of Cluster Analysis4
• Hierarchical
• Two-step
• K-means

Hierarchical clustering: performs successive fusions or divisions of the data, and is one of the most straightforward methods. It can be either agglomerative or divisive.

Agglomerative hierarchical clustering: starts with every case as a cluster unto itself; at successive steps, similar clusters are merged, forming a series of fusions of the n objects into groups.

Divisive clustering: starts with everybody in one cluster and ends up with everyone in individual clusters; it partitions the set of n objects into finer and finer subdivisions.

Agglomerative methods
• Single linkage, or nearest neighbour, method
• Complete linkage, or farthest neighbour, method
• Average linkage
• Ward's error sum of squares method

Divisive methods
• Splinter average distance method
• Automatic interaction detection

The output from both agglomerative and divisive methods is typically summarized by a dendrogram, as in the sketch below.
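A minimal sketch of agglomerative hierarchical clustering in Python with SciPy; the random data set, its dimensions, the choice of Ward's method, and the three-cluster cut are illustrative assumptions, not taken from the article:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))  # Step 1: n = 20 objects measured on p = 4 variables

# Steps 2-3: pairwise distances and successive fusions.
# method= can be "single", "complete", "average", or "ward",
# matching the agglomerative methods listed above.
Z = linkage(X, method="ward")

# Cut the fusion tree into a chosen number of mutually exclusive clusters
labels = fcluster(Z, t=3, criterion="maxclust")

# Step 4: summarize the fusion sequence as a dendrogram
dendrogram(Z)
plt.show()
```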
Discriminant Analysis: "A statistical technique for classifying individuals or objects into mutually exclusive and exhaustive groups on the basis of a set of independent variables".

Types
• Discrete discriminant analysis
• Logistic discrimination

Error rate estimation (illustrated in the sketch at the end of this section)
• The re-substitution method
• The hold-out method
• The U method, or cross-validation
• The jackknife method

Applications
• Face identification
• Bankruptcy detection
• Marketing research
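A minimal sketch of a linear discriminant classifier with scikit-learn, comparing three of the error rate estimates listed above; the two-group synthetic data, the choice of linear discriminant analysis, and the fold count are illustrative assumptions, not from the article:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)  # two mutually exclusive groups

lda = LinearDiscriminantAnalysis()

# Re-substitution method: error rate on the same data used for fitting
resub_error = 1 - lda.fit(X, y).score(X, y)

# Hold-out method: fit on one part of the data, estimate the error on the rest
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_error = 1 - lda.fit(X_tr, y_tr).score(X_te, y_te)

# The U method (cross-validation): average error over left-out folds
cv_error = 1 - cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
```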
Factor Analysis1
Multiple factor analysis (MFA): "Statistical method used to describe variability among observed variables in terms of a potentially lower number of unobserved variables called factors". Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. It is often used in data reduction, and can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis.

Types of factor analysis
Exploratory factor analysis (EFA): used to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis.
Confirmatory factor analysis (CFA): seeks to determine whether the number of factors, and the loadings of measured (indicator) variables on them, conform to what is expected on the basis of pre-established theory. The researcher's a priori assumption is that each factor is associated with a specified subset of indicator variables.
Factor analysis is related to principal component analysis (PCA), but the two are not identical. The two methods become essentially equivalent if the error terms in the factor analysis model can be assumed to all have the same variance.

Procedure1
Performed in two steps (the checklist below reflects the dialogs of a menu-driven package such as SPSS, and is illustrated in the sketch below).
Step 1: select the factors.
• Descriptives: initial solution; coefficients; KMO and Bartlett's test. Bartlett's test should be significant at .05; the KMO value lies between 0 and 1, and the closer to 1 the better. Factors are then selected using the Kaiser criterion; a scree plot can be used for better judgement.
• Extraction: principal components; correlation matrix; eigenvalues; scree plot.
• Rotation: no change.
• Scores: exclude cases listwise.
Step 2:
• Descriptives: untick.
• Extraction: scree plot; untick univariate; select the factors.
• Rotation: varimax for independent variables; oblimin for dependent variables.
Each variable then receives a number called a "loading" on each factor: higher loadings are selected, and variables with close loadings are eliminated.
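A minimal sketch of the same workflow in Python using the third-party factor_analyzer package; the package choice and the random illustrative data are assumptions on my part, since the article describes a dialog-driven procedure:

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"v{i}" for i in range(6)])

# KMO and Bartlett's test: Bartlett should be significant (p < .05),
# and the overall KMO should be close to 1
chi_sq, p_value = calculate_bartlett_sphericity(df)
kmo_per_item, kmo_total = calculate_kmo(df)

# Kaiser criterion: keep factors whose eigenvalue exceeds 1
# (a scree plot of `eigenvalues` can be used for better judgement)
fa = FactorAnalyzer(n_factors=df.shape[1], rotation=None)
fa.fit(df)
eigenvalues, _ = fa.get_eigenvalues()
n_factors = max(1, int((eigenvalues > 1).sum()))

# Re-fit with the chosen number of factors and a varimax rotation,
# then inspect the loadings matrix, keeping the high loadings
fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
fa.fit(df)
loadings = fa.loadings_
```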
Types of factoring1
Principal Component Analysis:2 PCA was invented in 1901 by Karl Pearson. The goal of PCA is to decompose a data table with correlated measurements into a new set of uncorrelated (i.e., orthogonal) variables. These variables are called, depending upon the context, principal components, factors, eigenvectors, singular vectors, or loadings. The results of the analysis are often presented with graphs plotting the projections of the units onto the components and the loadings of the variables (see the sketch after this list).
Canonical factor analysis: also called Rao's canonical factoring, it seeks the factors which have the highest canonical correlation with the observed variables. It is unaffected by arbitrary rescaling of the data.
Common factor analysis: also called principal factor analysis (PFA) or principal axis factoring (PAF), it seeks the least number of factors which can account for the common variance (correlation) of a set of variables.
Image factoring: based on the correlation matrix of predicted variables rather than actual variables, where each variable is predicted from the others using multiple regression.
Alpha factoring: based on maximizing the reliability of factors, assuming variables are randomly sampled from a universe of variables.
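A minimal PCA sketch with scikit-learn; the synthetic data, the standardization step, and the choice of two components are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)  # two correlated measurements

Xs = StandardScaler().fit_transform(X)   # standardize: PCA on the correlation matrix
pca = PCA(n_components=2)
scores = pca.fit_transform(Xs)           # projections of the units onto the components
loadings = pca.components_.T             # loadings of the variables
explained = pca.explained_variance_ratio_  # share of variance per component
```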
Correspondence Analysis6
"Technique that generates graphical representations of the interactions between modalities (or "categories") of two categorical variables". It allows the visual discovery and interpretation of these interactions, that is, of the departure from independence of the two variables.

Steps6
Run a chi-square test of independence on the two variables (a sketch follows this section).
• If the test fails to reject the independence hypothesis, then correspondence analysis will not deliver any useful information, and can be skipped.
• Only if the independence hypothesis is rejected should correspondence analysis be considered as the next step in the analysis of the pair of variables.
A PCA-like transformation then allows the modalities of the variables to be represented as points in factorial planes.

Interpretation
Correspondence analysis plots should be interpreted by looking at points relative to the origin:
• Points that lie in similar directions are positively associated.
• Points that lie on opposite sides of the origin are negatively associated.
• Points that are far from the origin exhibit the strongest associations.
Also, the results reflect relative associations, not just which rows are highest or lowest overall.
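A minimal sketch of the chi-square gate with SciPy, followed by the correspondence analysis itself via the third-party prince package; the contingency table is hypothetical, and the choice of prince is my assumption, not named in the article:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows and columns are the categories
# of the two categorical variables
table = pd.DataFrame([[20, 30, 10],
                      [15, 25, 40]])

chi2, p, dof, expected = chi2_contingency(table)
if p < 0.05:
    # Independence rejected, so correspondence analysis is worth running
    import prince  # assumed third-party CA implementation
    ca = prince.CA(n_components=2).fit(table)
    row_points = ca.row_coordinates(table)     # points for the row categories
    col_points = ca.column_coordinates(table)  # points for the column categories
```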
Regression Analysis1
"Refers to any techniques for modelling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables." Regression helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.

Types2
• Multiple linear regression analysis (MLR)
• Partial least squares regression (PLSR)
• Principal component regression (PCR)
• Ridge regression (RR)
• Reduced rank regression (RRR), or redundancy analysis
• Poisson regression analysis
• Logistic regression analysis

Multiple Linear Regression Analysis (MLR):1 In multiple linear regression, several independent variables are used to predict one dependent variable with a least-squares approach. If the independent variables are orthogonal, the problem reduces to a set of univariate regressions. When the independent variables are correlated, their importance is estimated from the partial coefficients of correlation. An important problem arises when one of the independent variables can be predicted from the other variables; this is called multicollinearity.
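A minimal MLR sketch with statsmodels, including a variance inflation factor (VIF) check for multicollinearity; the synthetic data, the variable names, and the VIF diagnostic are illustrative assumptions, not prescribed by the article:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
y = 1.0 + 2.0 * df["x1"] - df["x2"] + rng.normal(size=100)

X = sm.add_constant(df)       # design matrix with an intercept column
model = sm.OLS(y, X).fit()    # least-squares fit
print(model.summary())        # coefficients, t-tests, R-squared

# Multicollinearity check: a VIF much larger than 1 means that predictor
# can itself be predicted from the other independent variables
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
```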
The main approaches to choosing the independent variables are:
• Forward selection: start with no variables in the model, try the candidate variables one by one, and include each one that is 'statistically significant'.
• Backward elimination: start with all candidate variables and test them one by one for statistical significance, deleting any that are not significant.
• Combinations of the above, testing at each stage for variables to be included or excluded.
A forward-selection sketch follows this list.
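A minimal forward-selection sketch built on statsmodels; the greedy loop, the 0.05 threshold, and the helper name forward_select are illustrative assumptions reflecting the textbook procedure, not a canned library routine:

```python
import statsmodels.api as sm

def forward_select(y, candidates, alpha=0.05):
    """Greedy forward selection: at each step, add the candidate predictor
    with the smallest p-value, stopping when none is significant at alpha."""
    selected = []
    remaining = list(candidates.columns)
    while remaining:
        pvals = {}
        for var in remaining:
            X = sm.add_constant(candidates[selected + [var]])
            pvals[var] = sm.OLS(y, X).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break  # nothing left that is 'statistically significant'
        selected.append(best)
        remaining.remove(best)
    return selected

# e.g. forward_select(y, df) with the hypothetical df and y from the MLR sketch
```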