18CSE397T - Computational Data Analysis Unit - 3: Session - 8: SLO - 2
Similarity Measure
Numerical measure of how alike two data objects are.
Often falls between 0 (no similarity) and 1 (complete similarity).
Dissimilarity Measure
Numerical measure of how different two data objects are.
Ranges from 0 (objects are alike) upward; the upper limit varies and may be ∞ (objects are different).
Proximity refers to a similarity or dissimilarity.
Similarity/Dissimilarity for Simple Attributes
Here, p and q are the attribute values for two data objects.
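Attribute type, with similarity (s) and dissimilarity (d):
Nominal
  Similarity: $s = 1$ if $p = q$, $s = 0$ if $p \neq q$
  Dissimilarity: $d = 0$ if $p = q$, $d = 1$ if $p \neq q$
Ordinal (values mapped to integers 0 to n-1, where n is the number of values)
  Similarity: $s = 1 - \frac{|p - q|}{n - 1}$
  Dissimilarity: $d = \frac{|p - q|}{n - 1}$
Interval or Ratio
  Similarity: $s = 1 - |p - q|$ or $s = \frac{1}{1 + |p - q|}$
  Dissimilarity: $d = |p - q|$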
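The per-attribute rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are assumptions made for this example, and for interval or ratio attributes only the s = 1/(1 + |p - q|) option is shown.

```python
def nominal_similarity(p, q):
    """Nominal attribute: 1 if the values match, otherwise 0."""
    return 1 if p == q else 0


def nominal_dissimilarity(p, q):
    """Nominal attribute: 0 if the values match, otherwise 1."""
    return 0 if p == q else 1


def ordinal_dissimilarity(p, q, n):
    """Ordinal attribute: p and q are integer ranks in 0..n-1."""
    if n < 2:
        raise ValueError("an ordinal scale needs at least two values")
    return abs(p - q) / (n - 1)


def ordinal_similarity(p, q, n):
    """Ordinal attribute: s = 1 - |p - q| / (n - 1)."""
    return 1 - ordinal_dissimilarity(p, q, n)


def interval_dissimilarity(p, q):
    """Interval or ratio attribute: d = |p - q|."""
    return abs(p - q)


def interval_similarity(p, q):
    """Interval or ratio attribute, using s = 1 / (1 + |p - q|)."""
    return 1 / (1 + abs(p - q))
```

For instance, on a hypothetical four-level scale (poor = 0, fair = 1, good = 2, excellent = 3), ordinal_dissimilarity(0, 2, 4) gives 2/3, so the corresponding similarity is 1/3.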
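Weighted Euclidean distance, with weight $W_k$ on the kth attribute:
$d_{WE}(i,j) = \left( \sum_{k=1}^{p} W_k (x_{ik} - x_{jk})^2 \right)^{1/2}$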
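As a rough sketch, the weighted Euclidean distance can be computed directly from the formula. The function name and the sample vectors below are assumptions chosen for illustration only.

```python
import math


def weighted_euclidean(x_i, x_j, weights):
    """sqrt( sum_k W_k * (x_ik - x_jk)^2 ) for two objects with p attributes."""
    if not (len(x_i) == len(x_j) == len(weights)):
        raise ValueError("x_i, x_j and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))


# Hypothetical objects with p = 2 attributes.
x_i = (1, 3)
x_j = (4, 7)
print(weighted_euclidean(x_i, x_j, (1, 1)))  # all weights 1: ordinary Euclidean distance, 5.0
print(weighted_euclidean(x_i, x_j, (4, 0)))  # only the first attribute counts, 6.0
```

Setting every weight to 1 recovers the ordinary Euclidean distance, while a weight of 0 removes that attribute from the comparison.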
Minkowski Distance
The Minkowski distance is a generalization of the Euclidean distance.
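$\lim_{\lambda \to \infty} \left( \sum_{k=1}^{p} |x_{ik} - x_{jk}|^{\lambda} \right)^{1/\lambda} = \max\left( |x_{i1} - x_{j1}|, \ldots, |x_{ip} - x_{jp}| \right)$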
Note that λ and p are two different parameters: λ is the order of the distance, while p is the number of attributes. The dimension of the data matrix remains finite.
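A minimal sketch of the Minkowski distance, assuming the general form (sum over k of |x_ik - x_jk|^λ)^(1/λ) that appears inside the limit above. The function name, the lam >= 1 check, and the sample vectors are assumptions made for this example; the λ → ∞ case is handled separately as the maximum absolute difference.

```python
import math


def minkowski(x_i, x_j, lam):
    """Minkowski distance of order lam between two objects with p attributes."""
    if len(x_i) != len(x_j):
        raise ValueError("both objects must have the same number of attributes")
    diffs = [abs(a - b) for a, b in zip(x_i, x_j)]
    if math.isinf(lam):
        # Limiting case: the largest absolute coordinate difference.
        return max(diffs)
    if lam < 1:
        # Assumption of this sketch: for lam < 1 the triangle inequality fails.
        raise ValueError("lam must be at least 1")
    return sum(d ** lam for d in diffs) ** (1 / lam)


# Hypothetical objects with p = 3 attributes; absolute differences are 3, 4, 0.
a = (1, 2, 3)
b = (4, 6, 3)
print(minkowski(a, b, 1))         # lam = 1: sum of absolute differences
print(minkowski(a, b, 2))         # lam = 2: Euclidean distance
print(minkowski(a, b, 50))        # large lam: already close to the maximum difference
print(minkowski(a, b, math.inf))  # limit: maximum absolute difference
```

Here p = 3 stays fixed while λ varies, which is the distinction the note above draws between the two parameters.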
Mahalanobis Distance
Let X be an N × p matrix. Then the ith row of X is
$x_i^T = (x_{i1}, \ldots, x_{ip})$
The Mahalanobis distance is
$d_{MH}(i,j) = \left( (x_i - x_j)^T \Sigma^{-1} (x_i - x_j) \right)^{1/2}$
where $\Sigma$ is the p × p sample covariance matrix.
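The definition can be sketched with the standard library alone. The helper names, the N - 1 divisor for the sample covariance, and the small data matrix are assumptions made for this illustration. Rather than forming the inverse explicitly, the sketch solves Σ y = (x_i - x_j) by Gaussian elimination, which gives the same quadratic form.

```python
import math


def sample_covariance(X):
    """p x p sample covariance matrix (divisor N - 1) of the rows of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in X) / (n - 1)
             for b in range(p)] for a in range(p)]


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [[float(v) for v in A[r]] + [float(b[r])] for r in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * m
    for r in range(m - 1, -1, -1):
        y[r] = (M[r][m] - sum(M[r][c] * y[c] for c in range(r + 1, m))) / M[r][r]
    return y


def mahalanobis(X, i, j):
    """Mahalanobis distance between rows i and j of X (0-based indices)."""
    diff = [a - b for a, b in zip(X[i], X[j])]
    y = solve(sample_covariance(X), diff)  # y = Sigma^{-1} (x_i - x_j)
    return math.sqrt(sum(d * v for d, v in zip(diff, y)))


# Hypothetical 4 x 2 data matrix, not taken from the course material.
X = [[1, 2], [3, 5], [5, 5], [7, 8]]
print(mahalanobis(X, 0, 3))  # sqrt(6), about 2.449
```

Because the covariance matrix is estimated from all rows of X, the distance between two objects depends on the whole data set, not only on the two rows being compared.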
Self-check
Think About It!
Calculate the answers to these questions by yourself.
1. We have $X = \begin{pmatrix} 1 & 3 & 1 & 2 & 4 \\ 1 & 2 & 1 & 2 & 1 \\ 2 & 2 & 2 & 2 & 2 \end{pmatrix}$.
Calculate the Minkowski distance (λ = 1, λ = 2, and λ → ∞ cases) between the first and second objects.
2. We have $X = \begin{pmatrix} 2 & 3 \\ 10 & 7 \\ 3 & 2 \end{pmatrix}$.
Calculate the Mahalanobis distance between the first and second objects.
p = 1 0 0 0 0 0 0 0 0 0
q = 0 0 0 0 0 0 1 0 0 1