Lecture-12 Canonical Correlation
Canonical Correlation
Canonical correlation analysis is the study of the linear relations between two sets of variables.
It is the multivariate extension of correlation analysis.
Although we will present a brief introduction to the subject here, you will probably need a text that
covers the subject in depth such as Tabachnick (1989).
Suppose you have given a group of students two tests of ten questions each and wish to determine
the overall correlation between these two tests.
Canonical correlation finds a weighted average of the questions from the first test and correlates
this with a weighted average of the questions from the second test.
The weights are constructed to maximize the correlation between these two averages.
This correlation is called the first canonical correlation coefficient.
You can create another set of weighted averages, uncorrelated with the first, and calculate their
correlation. This correlation is the second canonical correlation coefficient.
This process continues until the number of canonical correlations equals the number of variables in
the smaller set.
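The weighting idea above can be written compactly. The following is only a sketch in assumed notation (none of these symbols appear in the lecture): x and y are the vectors of question scores for the two tests, a and b are the weight vectors, and the Sigma matrices are the within-set and between-set covariance matrices.

```latex
\rho_1
  = \max_{\mathbf{a},\,\mathbf{b}} \operatorname{corr}\!\left(\mathbf{a}^{\top}\mathbf{x},\; \mathbf{b}^{\top}\mathbf{y}\right)
  = \max_{\mathbf{a},\,\mathbf{b}}
    \frac{\mathbf{a}^{\top}\Sigma_{xy}\,\mathbf{b}}
         {\sqrt{\mathbf{a}^{\top}\Sigma_{xx}\,\mathbf{a}}\;\sqrt{\mathbf{b}^{\top}\Sigma_{yy}\,\mathbf{b}}}
```

Here rho_1 is the first canonical correlation coefficient. Because a correlation does not change when a or b is rescaled, the weights are usually fixed by requiring each weighted average to have unit variance.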
Simple Correlation
We study the relation between two variables.
Multiple Correlation
In multiple correlation we study the relationship between a variable Y and a set of variables (X1, X2, X3, …, Xp). To
do this we take a linear combination of X1, X2, X3, …, Xp and find its correlation with Y.
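As a rough illustration of multiple correlation, the sketch below fits the least-squares linear combination of the X variables and correlates it with Y. The data are simulated, and the function name `multiple_correlation` is hypothetical; NumPy is assumed to be available.

```python
import numpy as np

def multiple_correlation(X, y):
    """Correlation between y and its least-squares linear combination of the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Add an intercept column so the fit is not forced through the origin.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return float(np.corrcoef(fitted, y)[0, 1])

# Simulated data (an assumption for illustration): Y depends on X1 and X3 plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.6 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(scale=0.5, size=200)

R = multiple_correlation(X, y)
simple = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
print(f"multiple R = {R:.3f}; largest simple |r| = {max(simple):.3f}")
```

The multiple correlation can never be smaller than the largest simple correlation, because using a single X variable on its own is one of the linear combinations the fit is allowed to choose.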
Canonical Correlation
In canonical correlation we study the relationship between two sets of variables (Y1, Y2, Y3, …, Yq) and
(X1, X2, X3, …, Xp).
Canonical correlation requires that each set of variables be reduced to a single variable, after which the
correlation between those two variables is found. Usually the two variables are obtained by taking a linear
combination of the variables in each set, subject to prespecified criteria.
The variables obtained from these linear combinations are known as canonical variables, and the
correlation between them as the canonical correlation.
What is Canonical Correlation analysis?
Canonical correlation is a multivariate analysis of correlation.
Canonical correlation analysis is the analysis of multiple-X, multiple-Y correlation. The canonical
correlation coefficient measures the strength of association between two canonical variates.
A canonical variate is a weighted sum of the variables in one of the two sets.
The canonical variate is denoted CV. As with the argument for using factor analysis
instead of creating unweighted indices as independent variables in regression analysis, canonical
correlation analysis is preferable for analyzing the strength of association between two constructs.
This is because it creates an internal structure, for example, a different importance for each of the single
item scores that make up the overall score (as found in satisfaction measurements and aptitude
testing).
For multiple x and y, canonical correlation analysis constructs two variates, CVX1 = a1x1 + a2x2 + a3x3
+ … and CVY1 = b1y1 + b2y2 + b3y3 + …
A pair of canonical variates is called a canonical root.
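One way to obtain the weights a and b numerically is sketched below. It uses a QR decomposition of each centered data matrix followed by a singular value decomposition, which is one common numerical route and not necessarily the method the lecture has in mind. The function name `cca` and the simulated data are assumptions for illustration.

```python
import numpy as np

def cca(X, Y):
    """Return canonical correlations r and weight matrices a (for X) and b (for Y)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each variable
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)        # orthonormal basis for each set
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the canonical correlations, largest first.
    U, r, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    a = np.linalg.solve(Rx, U)       # weights for the X variables, one column per root
    b = np.linalg.solve(Ry, Vt.T)    # weights for the Y variables, one column per root
    return np.clip(r, 0.0, 1.0), a, b

# Simulated data (assumption): one shared trait drives three x and two y variables.
rng = np.random.default_rng(1)
n = 300
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n) for _ in range(3)])
Y = np.column_stack([shared + rng.normal(size=n) for _ in range(2)])

r, a, b = cca(X, Y)
CVX = (X - X.mean(axis=0)) @ a       # canonical variates of the X set
CVY = (Y - Y.mean(axis=0)) @ b       # canonical variates of the Y set
print("canonical correlations:", np.round(r, 3))
print("corr(CVX1, CVY1):", round(float(np.corrcoef(CVX[:, 0], CVY[:, 0])[0, 1]), 3))
```

With three x variables and two y variables the sketch returns two canonical correlations, one per canonical root, and the correlation between CVX1 and CVY1 reproduces the first of them.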
For example, if we calculate the canonical correlation between three variables for test scores and five
variables for aptitude testing, we would extract three pairs of canonical variates or three canonical
roots.
Note that this is a major difference from factor analysis. In factor analysis the factors are extracted
to account for as much as possible of the variance shared among the observed variables.
They are factors because they group the underlying variables.
Canonical variates are not factors because only the first pair of canonical variates groups the
variables in such a way that the correlation between them is maximized.
The second pair is constructed from the residuals of the first pair, again so as to maximize the
correlation between its two members.
Therefore the canonical variates cannot be interpreted in the same way as factors in factor analysis.
Also, the canonical variates from different pairs are automatically orthogonal, i.e., they are
uncorrelated with each other.
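The orthogonality property can be stated more precisely. In the sketch below the notation is assumed rather than taken from the lecture: U_i and V_i are the i-th canonical variates of the X and Y sets, p and q are the numbers of variables in the two sets, and rho_i is the i-th canonical correlation.

```latex
\operatorname{corr}(U_i, V_i) = \rho_i, \qquad
\rho_1 \ge \rho_2 \ge \dots \ge \rho_k \ge 0, \qquad k = \min(p, q)

\operatorname{corr}(U_i, U_j) = 0, \quad
\operatorname{corr}(V_i, V_j) = 0, \quad
\operatorname{corr}(U_i, V_j) = 0 \qquad \text{for } i \ne j
```

Only the two members of the same pair are correlated with each other; every other combination of variates has zero correlation.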
Discriminant analysis, MANOVA, and multiple regression are all special cases of canonical
correlation.
It provides the most general multivariate framework. Because of this generality, it is probably the
least used of the multivariate procedures. Researchers would rather use the specific procedure
designed for their data. However, there are instances when canonical correlation techniques are
useful.
Basic Issues: Some of the issues that must be dealt with during a canonical correlation analysis are:
1. Determining the number of canonical variate pairs to use. The number of pairs possible is equal
to the number of variables in the smaller set.
2. The canonical variates themselves often need to be interpreted. As in factor analysis, you are
dealing with mathematically constructed variates that are usually difficult to interpret.
However, in this case, you must relate two constructed variates to each other.
3. The importance of each variate must be evaluated from two points of view. You have to determine
the strength of the relationship between the variate and the variables from which it was created.
You also need to study the strength of the relationship between the corresponding X and Y variates.
4. Do you have a large enough sample size? In social science work you will often need a minimum of
ten cases per variable. In fields with more reliable data, you can get by with somewhat fewer.
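The two points of view in item 3 are often quantified with canonical loadings and a redundancy index. The block below is a sketch in assumed notation (not the lecture's own): U_i and V_i are the i-th canonical variates of the X and Y sets, q is the number of Y variables, r_i is the i-th canonical correlation, and all variables are taken to be standardized.

```latex
% Canonical loading: correlation between an original variable and its own variate
\ell^{(x)}_{ij} = \operatorname{corr}(x_j, U_i), \qquad
\ell^{(y)}_{ij} = \operatorname{corr}(y_j, V_i)

% Share of the Y set's variance reproduced by its own i-th variate
\overline{R}^{2}_{y \mid V_i} = \frac{1}{q} \sum_{j=1}^{q} \bigl(\ell^{(y)}_{ij}\bigr)^{2}

% Redundancy: share of the Y set's variance accounted for by the i-th X variate
\mathrm{Rd}_{y \mid U_i} = \overline{R}^{2}_{y \mid V_i} \; r_i^{2}
```

A large canonical correlation with small loadings therefore gives a small redundancy: the two variates are strongly related to each other but say little about the original variables.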
• The canonical correlation coefficients test for the existence of overall relationships between the two
sets of variables, and redundancy measures the magnitude of those relationships. Lastly, Wilks' lambda
(also called the U value) and Bartlett's V are used as tests of significance of the canonical correlation
coefficients.
• Typically Wilks' lambda is used to test the significance of the first canonical correlation coefficient
and Bartlett's V is used to test the significance of all canonical correlation coefficients.
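A minimal sketch of how these two statistics are commonly computed from the canonical correlations follows. It assumes the usual chi-square approximation attributed to Bartlett, V = -[n - 1 - (p + q + 1)/2] ln(lambda) with (p - m)(q - m) degrees of freedom after the first m roots are set aside. The function names and the example values are hypothetical.

```python
import math

def wilks_lambda(canon_corrs, skip=0):
    """Wilks' lambda for the canonical correlations left after dropping the first `skip`."""
    return math.prod(1.0 - r * r for r in canon_corrs[skip:])

def bartlett_v(canon_corrs, n, p, q, skip=0):
    """Bartlett's chi-square approximation and its degrees of freedom."""
    lam = wilks_lambda(canon_corrs, skip)
    v = -(n - 1 - (p + q + 1) / 2) * math.log(lam)
    df = (p - skip) * (q - skip)
    return v, df

# Hypothetical values: two canonical correlations from n = 100 cases, p = q = 2.
rs = [0.8, 0.5]
n, p, q = 100, 2, 2

lam = wilks_lambda(rs)
v_all, df_all = bartlett_v(rs, n, p, q)            # all roots together
v_rest, df_rest = bartlett_v(rs, n, p, q, skip=1)  # roots remaining after the first
print(f"Wilks' lambda = {lam:.3f}")
print(f"all roots:   V = {v_all:.2f} on {df_all} df")
print(f"after first: V = {v_rest:.2f} on {df_rest} df")
```

Each V is compared with a chi-square critical value for its degrees of freedom. A lambda close to 0 indicates a strong overall relationship between the two sets, and a lambda close to 1 indicates a weak one.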
Summary
• Multivariate analysis is an effective way to summarize a data table with many
variables by creating a few new variables containing most of the information.
These new variables are then used for problem solving and display, i.e.,
classification, relationships, control charts, and more.
• Canonical correlation analysis is used to identify and measure the associations
between two sets of variables.
• An understanding of the various multivariate methods and techniques is useful
in business and management.