Canonical Correlation Analysis: An Overview With Application To Learning Methods

Canonical correlation analysis (CCA) finds basis vectors for two sets of variables such that the correlation between the projections of the variables onto the basis vectors is maximized. CCA extracts successive pairs of canonical variates, with each pair having a canonical correlation coefficient representing its linear relationship. CCA differs from correlation in that it is not dependent on the coordinate system of variables and finds the directions that yield maximum correlations between the two sets of variables. CCA can be used to find nonlinear relationships through kernel CCA and has applications in areas like speaker recognition, image retrieval, and extracting semantic representations from multiple views of data.


Canonical Correlation Analysis: An overview with application to learning methods
By David R. Hardoon, Sandor Szedmak, John Shawe-Taylor
School of Electronics and Computer Science, University of Southampton
Published in Neural Computation, 2004

Presented by:
Shankar Bhargav
Canonical Correlation Analysis
 Measures the linear relationship between two multidimensional variables
 Finds two sets of basis vectors such that the correlation between the projections of the variables onto these basis vectors is maximized
 Determines the correlation coefficients
Canonical Correlation Analysis
 More than one canonical correlation may be found, each corresponding to a different pair of basis vectors (canonical variates)
 Correlations between successively extracted canonical variates are successively smaller
 Correlation coefficients: the proportion of the correlation between the canonical variates accounted for by a particular variable
Differences with Correlation
 Not dependent on the coordinate system of the variables
 Finds the directions that yield maximum correlations
Find basis vectors for two sets of variables x, y such that the correlations between the projections of the variables onto these basis vectors are maximized:

  S_x = \langle x, w_x \rangle  and  S_y = \langle y, w_y \rangle

  \rho = \frac{E[S_x S_y]}{\sqrt{E[S_x^2]\, E[S_y^2]}}
       = \max_{w_x, w_y} \frac{E[w_x^T x\, y^T w_y]}{\sqrt{E[w_x^T x\, x^T w_x]\, E[w_y^T y\, y^T w_y]}}
       = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x \; w_y^T C_{yy} w_y}}

Solving this with the constraints w_x^T C_{xx} w_x = 1 and w_y^T C_{yy} w_y = 1 leads to the eigenproblems

  C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx} w_x = \rho^2 w_x
  C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy} w_y = \rho^2 w_y

or, equivalently, to the coupled generalized eigenproblem

  C_{xy} w_y = \rho \lambda_x C_{xx} w_x
  C_{yx} w_x = \rho \lambda_y C_{yy} w_y

  where \lambda_x = \lambda_y^{-1} = \sqrt{\frac{w_y^T C_{yy} w_y}{w_x^T C_{xx} w_x}}
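The eigenproblem above can be sketched numerically. The following minimal NumPy implementation (an illustrative sketch, not the authors' code; function and variable names are mine) solves C_xx^{-1} C_xy C_yy^{-1} C_yx w_x = ρ² w_x and recovers w_y from the coupled equations:

```python
import numpy as np

def linear_cca(X, Y):
    """Linear CCA via the eigenproblem Cxx^-1 Cxy Cyy^-1 Cyx wx = rho^2 wx.

    X: (n, p) and Y: (n, q) data matrices, rows are observations.
    Returns canonical correlations (decreasing) and basis vectors Wx, Wy.
    """
    X = X - X.mean(axis=0)            # center each variable
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1)           # within-set covariance of x
    Cyy = Y.T @ Y / (n - 1)           # within-set covariance of y
    Cxy = X.T @ Y / (n - 1)           # between-set covariance
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    eigvals, Wx = np.linalg.eig(M)    # eigenvalues are rho^2
    d = min(X.shape[1], Y.shape[1])
    order = np.argsort(eigvals.real)[::-1][:d]
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    Wx = Wx[:, order].real
    # wy follows from the coupled equations: wy proportional to Cyy^-1 Cyx wx
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx)
    Wy /= np.linalg.norm(Wy, axis=0)
    return rho, Wx, Wy
```

The empirical correlation of the projected variates X @ Wx[:, 0] and Y @ Wy[:, 0] then equals (up to sign) the first canonical correlation.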
CCA in Matlab
[ A, B, r, U, V ] = canoncorr(x, y)

 x, y: the two sets of variables, in the form of matrices
   Each row is an observation
   Each column is an attribute/feature
 A, B: the canonical coefficients (basis vectors) for x and y
 r: vector of the canonical correlations (successively decreasing)
 U, V: the canonical variates, i.e. the projections of x and y onto A and B, respectively
Interpretation of CCA
 The correlation coefficient represents the unique contribution of each variable to the relation
 Multicollinearity may obscure relationships
 Factor loadings: correlations between the canonical variates (basis vectors) and the variables in each set
 The proportion of variance explained by the canonical variates can be inferred from the factor loadings
Redundancy Calculation

  Redundancy_left = [ ∑ (loadings_left²) / p ] · Rc²
  Redundancy_right = [ ∑ (loadings_right²) / q ] · Rc²

 p – number of variables in the first (left) set
 q – number of variables in the second (right) set
 Rc² – the respective squared canonical correlation
 Since successively extracted roots are uncorrelated, we can sum the redundancies across all correlations to get a single index of redundancy.
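As a small illustration, the formula above reduces to the mean squared factor loading scaled by the squared canonical correlation. The loadings and Rc² below are made-up numbers, not values from the paper:

```python
import numpy as np

def redundancy(loadings, rc_squared):
    """Redundancy = [sum(loadings^2) / n_vars] * Rc^2,
    i.e. the mean squared factor loading times the
    squared canonical correlation."""
    loadings = np.asarray(loadings, dtype=float)
    return loadings.dot(loadings) / loadings.size * rc_squared

# hypothetical loadings for a 3-variable (left) set with Rc^2 = 0.64
left = redundancy([0.8, 0.6, 0.3], 0.64)   # (1.09 / 3) * 0.64
```

Summing this quantity over all extracted roots gives the single redundancy index mentioned above.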
Application
 Kernel CCA can be used to find nonlinear relationships between multivariates
 Two views of the same semantic object can be used to extract a representation of the semantics:
   Speaker recognition – audio and lip movement
   Image retrieval – image features (HSV, texture) and associated text
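In the dual (kernel) form, the same maximization is carried out over kernel matrices of the two views. A minimal sketch of regularized KCCA follows; the regularization constant kappa and its placement follow the standard formulation rather than the paper's exact implementation, and the helper names are mine:

```python
import numpy as np

def center_kernel(K):
    """Double-center a kernel (Gram) matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Kx, Ky, kappa=0.1, n_components=2):
    """Regularized kernel CCA (sketch): solve
    (Kx + kappa I)^-1 Ky (Ky + kappa I)^-1 Kx a = rho^2 a
    for the dual coefficients a. kappa > 0 guards against the
    degenerate rho = 1 solutions of invertible kernel matrices."""
    n = Kx.shape[0]
    I = np.eye(n)
    M = np.linalg.solve(Kx + kappa * I, Ky) @ np.linalg.solve(Ky + kappa * I, Kx)
    eigvals, A = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    return rho, A[:, order].real
```

With linear kernels, Kx = center_kernel(X @ X.T), this approximately recovers linear CCA; nonlinear relationships come from swapping in, e.g., a Gaussian kernel.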
Use of KCCA in cross-modal retrieval
 400 records of JPEG images for each class, with associated text, and a total of 3 classes
 Data was split randomly into two parts, for training and test
 Features:
   Image – HSV color, Gabor texture
   Text – term frequencies
 Results were averaged over 10 runs
Cross-modal retrieval
 Content-based retrieval: retrieve images in the same class as the text query
 Tested with retrieved sets of 10 and 30 images, where count_jk = 1 if image k in the set has the same label as the text query, else count_jk = 0
 Comparison of KCCA (with 5 and 30 eigenvectors) against GVSM
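The success measure described above (count_jk averaged over all queries and retrieved images) can be sketched as follows; the function and variable names are mine, not the paper's:

```python
import numpy as np

def content_retrieval_accuracy(retrieved_labels, query_labels):
    """Average of count_jk over all queries j and retrieved images k,
    where count_jk = 1 if image k retrieved for query j shares the
    query's class label, else 0."""
    retrieved = np.asarray(retrieved_labels)      # (n_queries, set_size)
    queries = np.asarray(query_labels)[:, None]   # (n_queries, 1)
    return float(np.mean(retrieved == queries))

# e.g. two text queries, a retrieved set of 3 images each
acc = content_retrieval_accuracy([[1, 1, 2], [2, 2, 2]], [1, 2])  # 5/6
```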
Content-based retrieval
Mate-based retrieval
 Match the exact image among the selected retrieved images
 Tested with retrieved sets of 10 and 30 images, where count_j = 1 if the exact matching image was present in the set, else 0
 Comparison of KCCA (with 30 and 150 eigenvectors) against GVSM
Mate-based retrieval
Comments
The good
 Good explanation of CCA and KCCA
 Innovative use of KCCA in an image-retrieval application

The bad
 The data set and the number of classes used were small
 The image-set size is not taken into account when calculating accuracy in mate-based retrieval
 Cross-validation tests could have been done
Limitations and Assumptions of CCA
 At least 40 to 60 times as many cases as variables is recommended to get reliable estimates for two roots – Barcikowski & Stevens (1986)
 Outliers can greatly affect the canonical correlation
 Variables in the two sets should not be completely redundant
Thank you
