Similarity Learning
Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to
regression and classification, but the goal is to learn a similarity function that measures how similar or
related two objects are. It has applications in ranking, recommendation systems, visual identity tracking,
face verification, and speaker verification.
Learning setup
There are four common setups for similarity and metric distance learning.

Regression similarity learning. Pairs of objects $(x_i^1, x_i^2)$ are given together with a measure of their similarity $y_i \in \mathbb{R}$, and the goal is to learn a function $f$ such that $f(x_i^1, x_i^2) \approx y_i$.

Classification similarity learning. Pairs of objects are given labeled as similar or dissimilar, and the goal is to learn a classifier that decides whether a new pair of objects is similar.

Ranking similarity learning. Triplets of objects $(x_i, x_i^+, x_i^-)$ are given, where $x_i$ is known to be more similar to $x_i^+$ than to $x_i^-$, and the goal is to learn a function $f$ such that $f(x_i, x_i^+) > f(x_i, x_i^-)$ for any new triplet.[1]

Locality sensitive hashing (LSH).[2] Input items are hashed so that similar items map to the same "buckets" in memory with high probability. This is often applied in nearest neighbor search on large-scale, high-dimensional data.[3]

A common approach for learning similarity is to model the similarity function as a bilinear form. For
example, in the case of ranking similarity learning, one aims to learn a matrix $W$ that parametrizes the
similarity function $f_W(x, z) = x^\top W z$. When data is abundant, a common approach is to learn a Siamese
network, a deep network model with parameter sharing.
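As a concrete illustration, such a bilinear similarity can be evaluated directly in NumPy. In this minimal sketch the matrix W is random, standing in for parameters that a learning algorithm would actually fit:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # feature dimension

# Stand-in for a learned parameter matrix; a real method would
# fit W from supervised pairs or triplets rather than sample it.
W = rng.standard_normal((d, d))

def bilinear_similarity(x, z, W):
    """Evaluate the bilinear form f_W(x, z) = x^T W z."""
    return x @ W @ z

x = rng.standard_normal(d)
z = rng.standard_normal(d)
print(bilinear_similarity(x, z, W))
```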
Metric learning
Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a
distance function over objects. A metric or distance function has to obey four axioms: non-negativity,
identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric
learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric.
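Written out for a distance function $d$ over objects $x_1, x_2, x_3$, the four axioms read:

```latex
\begin{align*}
  &\text{non-negativity:}             && d(x_1, x_2) \ge 0 \\
  &\text{identity of indiscernibles:} && d(x_1, x_2) = 0 \iff x_1 = x_2 \\
  &\text{symmetry:}                   && d(x_1, x_2) = d(x_2, x_1) \\
  &\text{triangle inequality:}        && d(x_1, x_3) \le d(x_1, x_2) + d(x_2, x_3)
\end{align*}
```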
When the objects $x_i$ are vectors in $\mathbb{R}^d$, then any matrix $W$ in the symmetric positive semi-definite cone
$S_+^d$ defines a distance pseudo-metric of the space of $x$ through the form
$D_W(x_1, x_2)^2 = (x_1 - x_2)^\top W (x_1 - x_2)$. When $W$ is a symmetric positive definite matrix, $D_W$ is a
metric. Moreover, as any symmetric positive semi-definite matrix $W \in S_+^d$ can be decomposed as $W = L^\top L$
where $L \in \mathbb{R}^{e \times d}$ and $e \ge \mathrm{rank}(W)$, the distance function can be rewritten equivalently as
$D_W(x_1, x_2)^2 = (x_1 - x_2)^\top L^\top L (x_1 - x_2) = \| L (x_1 - x_2) \|_2^2$. The distance
$D_W(x_1, x_2)^2$ thus corresponds to the squared Euclidean distance between the transformed feature
vectors $x_1' = L x_1$ and $x_2' = L x_2$.
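The equivalence between the bilinear form and the transformed Euclidean distance can be checked numerically. In the sketch below L is random, standing in for a learned transform:

```python
import numpy as np

rng = np.random.default_rng(0)
d, e = 5, 3  # input dimension, embedding dimension

L = rng.standard_normal((e, d))  # stand-in for a learned transform
W = L.T @ L                      # symmetric positive semi-definite by construction

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)

diff = x1 - x2
d_bilinear = diff @ W @ diff                  # (x1 - x2)^T W (x1 - x2)
d_euclidean = np.sum((L @ x1 - L @ x2) ** 2)  # ||L x1 - L x2||_2^2

assert np.isclose(d_bilinear, d_euclidean)
print(d_bilinear)
```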
Many formulations for metric learning have been proposed.[4][5] Some well-known approaches for metric
learning include learning from relative comparisons,[6] which is based on the triplet loss, large margin
nearest neighbor (LMNN),[7] and information-theoretic metric learning (ITML).[8]
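The triplet loss penalizes an anchor that is not closer to its positive than to its negative by at least a margin. A minimal NumPy version (the margin value here is an arbitrary illustration):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss max(0, ||a - p||^2 - ||a - n||^2 + margin)
    on squared Euclidean distances."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
a, p, n = rng.standard_normal((3, 4))  # anchor, positive, negative
print(triplet_loss(a, p, n))
```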
In statistics, the covariance matrix of the data is sometimes used to define a distance metric called
Mahalanobis distance.
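For instance, SciPy exposes this distance directly, given the inverse covariance matrix of a (here synthetic) sample:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))            # synthetic sample, rows are observations

VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance matrix

print(mahalanobis(X[0], X[1], VI))           # sqrt((u - v)^T VI (u - v))
```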
Applications
Similarity learning is used in information retrieval for learning to rank, in face verification or face
identification,[9][10] and in recommendation systems. Also, many machine learning approaches rely on
some metric. This includes unsupervised learning such as clustering, which groups together close or similar
objects. It also includes supervised approaches like the k-nearest neighbor algorithm, which rely on the labels of
nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing
step for many of these approaches.[11]
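The preprocessing idea can be sketched as follows: map the features through a linear transform L and run ordinary k-nearest neighbors in the transformed space. Here L is random to keep the sketch self-contained; in practice it would come from a metric learning algorithm such as LMNN:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Stand-in for a transform produced by a metric learning algorithm;
# it is random here purely for illustration.
rng = np.random.default_rng(0)
L = rng.standard_normal((2, X.shape[1]))

# Euclidean k-NN on L-transformed features equals k-NN under D_W on raw features.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X @ L.T, y)
print(knn.score(X @ L.T, y))
```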
Scalability
Metric and similarity learning naively scale quadratically with the dimension of the input space, as one can
easily see when the learned metric has a bilinear form $f_W(x, z) = x^\top W z$. Scaling to higher dimensions
can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL[12] and
with COMET.[13]
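The parameter count makes the quadratic growth concrete; a quick illustration:

```python
# A full matrix W has d^2 parameters; a diagonal model, one extreme
# of a sparse structure, has only d.
for d in (100, 1_000, 10_000):
    print(f"d={d:>6}: full W: {d * d:>12,} parameters, diagonal: {d:>6,}")
```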
Software
metric-learn (https://github.com/scikit-learn-contrib/metric-learn) is a free software Python
library which offers efficient implementations of several supervised and weakly-supervised
similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-
learn;[14] a usage sketch appears after this list.
OpenMetricLearning (https://github.com/OML-Team/open-metric-learning) is a Python
framework to train and validate models that produce high-quality embeddings.
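As an illustration of that scikit-learn-compatible API, a supervised algorithm such as NCA from metric-learn can be fit and used as a transformer (a minimal sketch; class and method names follow the metric-learn documentation):

```python
from metric_learn import NCA
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

nca = NCA()                    # neighborhood components analysis
nca.fit(X, y)                  # learn a linear transform from labeled data
X_embedded = nca.transform(X)  # features mapped into the learned metric space
print(X_embedded.shape)
```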
See also
Kernel method
Learning to rank
Latent semantic analysis
Further reading
For further information on this topic, see the surveys on metric and similarity learning by Bellet et al.[4] and
Kulis.[5]
References
1. Chechik, G.; Sharma, V.; Shalit, U.; Bengio, S. (2010). "Large Scale Online Learning of
Image Similarity Through Ranking" (http://www.jmlr.org/papers/volume11/chechik10a/chechi
k10a.pdf) (PDF). Journal of Machine Learning Research. 11: 1109–1135.
2. Gionis, A.; Indyk, P.; Motwani, R. (1999). "Similarity search in high dimensions via
hashing". VLDB. Vol. 99, No. 6.
3. Rajaraman, A.; Ullman, J. (2010). "Mining of Massive Datasets, Ch. 3" (http://infolab.stanford.
edu/~ullman/mmds.html).
4. Bellet, A.; Habrard, A.; Sebban, M. (2013). "A Survey on Metric Learning for Feature Vectors
and Structured Data". arXiv:1306.6709 (https://arxiv.org/abs/1306.6709) [cs.LG (https://arxiv.
org/archive/cs.LG)].
5. Kulis, B. (2012). "Metric Learning: A Survey" (https://www.nowpublishers.com/article/Details/
MAL-019). Foundations and Trends in Machine Learning. 5 (4): 287–364.
doi:10.1561/2200000019 (https://doi.org/10.1561%2F2200000019).
6. Schultz, M.; Joachims, T. (2004). "Learning a distance metric from relative comparisons" (htt
ps://papers.nips.cc/paper/2366-learning-a-distance-metric-from-relative-comparisons.pdf)
(PDF). Advances in Neural Information Processing Systems. 16: 41–48.
7. Weinberger, K. Q.; Blitzer, J. C.; Saul, L. K. (2006). "Distance Metric Learning for Large
Margin Nearest Neighbor Classification" (http://books.nips.cc/papers/files/nips18/NIPS2005
_0265.pdf) (PDF). Advances in Neural Information Processing Systems. 18: 1473–1480.
8. Davis, J. V.; Kulis, B.; Jain, P.; Sra, S.; Dhillon, I. S. (2007). "Information-theoretic metric
learning" (http://www.cs.utexas.edu/users/pjain/itml/). International Conference in Machine
Learning (ICML): 209–216.
9. Guillaumin, M.; Verbeek, J.; Schmid, C. (2009). "Is that you? Metric learning approaches for
face identification" (http://hal.inria.fr/docs/00/58/50/36/PDF/verbeek09iccv2.pdf) (PDF). IEEE
International Conference on Computer Vision (ICCV).
10. Mignon, A.; Jurie, F. (2012). "PCCA: A new approach for distance learning from sparse
pairwise constraints" (http://hal.archives-ouvertes.fr/docs/00/80/60/07/PDF/12_cvpr_ldca.pd
f) (PDF). IEEE Conference on Computer Vision and Pattern Recognition.
11. Xing, E. P.; Ng, A. Y.; Jordan, M. I.; Russell, S. (2002). "Distance Metric Learning, with
Application to Clustering with Side-information" (https://ai.stanford.edu/~ang/papers/nips02-
metric.pdf) (PDF). Advances in Neural Information Processing Systems. 15: 505–512.
12. Liu; Bellet; Sha (2015). "Similarity Learning for High-Dimensional Sparse Data" (http://jmlr.or
g/proceedings/papers/v38/liu15.pdf) (PDF). International Conference on Artificial
Intelligence and Statistics (AISTATS). arXiv:1411.2374 (https://arxiv.org/abs/1411.2374).
Bibcode:2014arXiv1411.2374L (https://ui.adsabs.harvard.edu/abs/2014arXiv1411.2374L).
13. Atzmon; Shalit; Chechik (2015). "Learning Sparse Metrics, One Feature at a Time" (http://jml
r.org/proceedings/papers/v44/atzmon2015.pdf) (PDF). J. Mach. Learn. Research (JMLR).
14. Vazelhes; Carey; Tang; Vauquier; Bellet (2020). "metric-learn: Metric Learning Algorithms in
Python" (https://www.jmlr.org/papers/volume21/19-678/19-678.pdf) (PDF). J. Mach. Learn.
Research (JMLR). arXiv:1908.04710 (https://arxiv.org/abs/1908.04710).