Euclidean Distance Matrix
Abstract
Often, the information we have about points in a Euclidean space is incomplete, and as problems grew more complex we began looking for methods to recover interpoint distance information. A Euclidean distance matrix (EDM) is a matrix that records the spacing of n points in Euclidean space, a Euclidean space being a finite-dimensional real vector space; for n points the EDM is an n×n matrix. Interpoint distance calculations turn out to underlie many applications, and an EDM represents them conveniently: EDMs are frequently used in self-localization, 3D animation, network topology and molecular conformation. In this paper we look at how to construct a map of a geographic area to a certain degree of accuracy. We convert travel times, in minutes, into a distance problem using EDMs; for that we need the eigenvalue decomposition of a symmetric matrix. Using the travelling time, in minutes, from an initial point, we calculate the approximate distance of each point from that initial point and obtain a rough 2-D impression.
What is a matrix?
A matrix (plural: matrices) is a set of numbers arranged in rows and columns. It can represent any kind of related data, and it lets us state compactly problems that would otherwise take hours and hours of calculation. Matrices are supported as a data type by several programming languages and are comparatively more flexible than a static array.
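As a small illustration (the values are arbitrary), a matrix can be written as a nested list or, more conveniently, as a NumPy array:

```python
import numpy as np

# A 2x3 matrix written as a nested list: two rows, three columns.
A = [[1, 2, 3],
     [4, 5, 6]]

# The same matrix as a NumPy array, which adds shape information
# and fast matrix arithmetic on top of the raw numbers.
M = np.array(A)
print(M.shape)     # (2, 3)
print((M + M)[0])  # [2 4 6]: element-wise arithmetic on whole rows
```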
Distance Matrix
A distance matrix is a matrix that records the distance between every pair of points on an object. There are two types of distance matrices, metric and non-metric; in this paper we will be focusing on metric distance matrices. Each diagonal entry is the distance of a point from itself, so the main diagonal is 0. For example, for four points A, B, C and D:

       A    B    C    D
  A    0   43   21  135
  B   43    0   56  180
  C   21   56    0  212
  D  135  180  212    0

There are also alternative ways of displaying a distance matrix, such as listing only the lower triangle, since the matrix is symmetric.
A metric distance matrix has the following general properties:

Hollowness: the entries on the main diagonal are all zero, i.e. xii = 0 for all 1 ≤ i ≤ N.
Non-negativity: all the off-diagonal entries are positive (xij > 0 if i ≠ j).
Symmetry: the matrix is a symmetric matrix (xij = xji).
The triangle inequality: for any i and j, xij ≤ xik + xkj for all k. This can also be stated in terms of tropical matrix multiplication.
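These four properties translate directly into code. A minimal NumPy sketch (the function name is ours) that checks them, applied to a distance matrix built from actual points:

```python
import numpy as np

def is_metric_distance_matrix(X, tol=1e-9):
    """Check the four properties of a metric distance matrix:
    hollowness, non-negativity, symmetry, triangle inequality."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    hollow = np.allclose(np.diag(X), 0.0)
    nonneg = bool(np.all(X >= -tol))
    symmetric = np.allclose(X, X.T)
    # Triangle inequality: x_ij <= x_ik + x_kj for every i, j, k.
    triangle = all(
        X[i, j] <= X[i, k] + X[k, j] + tol
        for i in range(n) for j in range(n) for k in range(n)
    )
    return hollow and nonneg and symmetric and triangle

# Pairwise distances between three real points always pass the test.
pts = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]])
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
print(is_metric_distance_matrix(D))  # True
```

A matrix that violates any one property, for instance the triangle inequality, fails the check.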
Development of EDM
The contributions of Carl Menger, Schoenberg, Blumenthal, and Young and Householder towards the development of the EDM are commendable. Their work laid the foundation for the applications and tools in which EDMs now play a pivotal part. The most important class of EDM applications was developed for data visualization: in 1952, Torgerson introduced the concept of multidimensional scaling. Its formulation plays a crucial part in what follows, as the derivation of the algorithm below rests on it.
Multidimensional Scaling
Multidimensional scaling (MDS) is a method for visually representing the degree of similarity between individual units of a data set. MDS takes information about the pairwise distances among a set of n objects and translates it into a configuration of points in a Cartesian space. Simply put, multidimensional scaling displays spatial data as a map, generally a 2-dimensional one. (WISH, 1978)
Hypothetically, if a person is shown a map and asked to construct a matrix of distances between cities, he can easily measure each distance and convert it using the map scale: with a scale of 2 cm : 15 km, for example, 7 cm on the map means the city is 52.5 km away. The problem arises when one must go in reverse, given only the matrix of distances. Although many geometrical methods are available, the most convenient of them is multidimensional scaling. The simplest use is to produce a 2-dimensional map, but the number of dimensions is not limited to two.
We can use the above-mentioned formulas to calculate the distance between two points. Next, we arrange all the vectors as columns and use the centering-and-decomposition procedure described below to create a 2-D view. Tools such as NumPy, PyTorch and TensorFlow can be used to carry out these computations. (Albani, 2019) (Grainti)
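The kind of vectorized computation Albani (2019) describes can be sketched in NumPy as follows; the coordinates here are made up for illustration. Instead of a double loop over pairs, all squared distances are obtained at once from the Gram matrix of inner products, using d²(xi, xj) = |xi|² + |xj|² − 2 xi·xj:

```python
import numpy as np

# Five example points in the plane (hypothetical coordinates).
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [6.0, 2.0], [5.0, 3.0]])

# d2_ij = |x_i|^2 + |x_j|^2 - 2 x_i . x_j, computed for all pairs at once.
G = X @ X.T                       # Gram matrix of inner products
sq = np.diag(G)                   # squared norms |x_i|^2
D2 = sq[:, None] + sq[None, :] - 2 * G
D = np.sqrt(np.maximum(D2, 0))    # clip tiny negatives from round-off

print(np.round(D, 3))
```

The result is the full n×n Euclidean distance matrix of the points, symmetric with a zero diagonal.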
Converting the Minute Matrix into a Map
For our calculations we assume three imaginary towns, namely Hurbes, Tanes and Galvat, and some values for the travelling time between them:
From Hurbes to Hurbes: 0 minutes
From Hurbes to Tanes: 33 minutes
From Hurbes to Galvat: 128 minutes
From Tanes to Galvat: 158 minutes
T =
       H    T    G
  H    0   33  128
  T   33    0  158
  G  128  158    0
Before we begin with our calculations, let us establish that, of the many ways of doing this, we will be utilizing the classical MDS method. The algorithm is as follows.
The first step is to geometrically center the matrix. For that we use the formula

J = I − (1/n) · 1 · 1ᵀ

where 1 is the all-ones column vector. With n = 3:

    [1 0 0]         [1]
J = [0 1 0] − 1/3 · [1] · [1 1 1]
    [0 0 1]         [1]

    [ 0.67  −0.33  −0.33]
J = [−0.33   0.67  −0.33]
    [−0.33  −0.33   0.67]
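As a quick check of this step, the centering matrix for n = 3 can be built in one line of NumPy:

```python
import numpy as np

n = 3
# Geometric centering matrix J = I - (1/n) * 1 * 1^T.
J = np.eye(n) - np.ones((n, n)) / n
print(np.round(J, 2))

# J annihilates the all-ones vector, which is what makes it a
# centering operator: applying it subtracts the mean.
print(np.allclose(J @ np.ones(n), 0))  # True
```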
Now we calculate the Gram matrix from the squared travel times (T⁽²⁾ denotes the matrix with entries tij²):

G = −1/2 · J T⁽²⁾ J

    [ 1109.1   1994.6  −3103.7]
G = [ 1994.6   3969.1  −5963.7]
    [−3103.7  −5963.7   9067.4]
We use the eigenvalue decomposition to compute the eigenvalues λ and eigenvectors v of G:

    [14055.2     0     0]
Λ = [    0    90.4     0]
    [    0       0     0]

with corresponding unit eigenvectors

     [−0.27]        [−0.77]        [0.58]
v1 = [−0.53],  v2 = [ 0.62],  v3 = [0.58]
     [ 0.80]        [ 0.15]        [0.58]

(The zero eigenvalue, with eigenvector proportional to the all-ones vector, reflects the fact that G is centered; the two positive eigenvalues mean the towns embed exactly in two dimensions.)
Now we can finally reconstruct the map: the coordinates of the towns are X = V·√Λ, keeping the two largest eigenvalues. This places, approximately, Hurbes at (−32.5, −7.3), Tanes at (−62.7, 5.9) and Galvat at (95.2, 1.4); we use these points as anchors and scale, rotate or reflect them to match the actual map.
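The whole procedure, centering, eigendecomposition and reconstruction, can be sketched in a few lines of NumPy. The reconstructed pairwise distances come back equal to the minute matrix, confirming that T embeds exactly in two dimensions:

```python
import numpy as np

# Travel times (minutes) between Hurbes, Tanes and Galvat.
T = np.array([[0.0,  33.0, 128.0],
              [33.0,  0.0, 158.0],
              [128.0, 158.0, 0.0]])

n = T.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
G = -0.5 * J @ (T ** 2) @ J           # Gram matrix from squared times

# Eigendecomposition of the symmetric Gram matrix, largest first.
lam, V = np.linalg.eigh(G)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Keep the two largest eigenvalues for a 2-D embedding.
X = V[:, :2] * np.sqrt(np.maximum(lam[:2], 0.0))

# Pairwise distances of the embedded points reproduce T.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
print(np.round(D, 1))
```

Up to an arbitrary rotation and reflection (eigenvector signs are not unique), the rows of X are the anchor coordinates used above.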
As a further example, consider five points plotted on a grid, say p1 = (0, 0), p2 = (1, 1), p3 = (3, 0), p4 = (6, 2) and p5 = (5, 3) (these coordinates reproduce the matrix below exactly), with the distance matrix

        p1    p2    p3    p4    p5
  p1     0    √2     3  2√10   √34
  p2    √2     0    √5   √26   2√5
  p3     3    √5     0   √13   √13
  p4  2√10   √26   √13     0    √2
  p5   √34   2√5   √13    √2     0
Once we have completed our distance matrix, we observe the smallest distance between two data points (d1 and d2) and cluster them together. We then form a new distance matrix by substituting the two data points with a cluster 'A', and repeat this process to form another cluster 'B'. We keep examining the remaining data points and adding them to the existing clusters until no further data points are left and only the clusters remain. We can represent our results in the form of a dendrogram to compare similarities between the data points.
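The merging loop described above can be sketched directly in NumPy (the function and point names are ours; we use single linkage, i.e. the distance to a merged cluster is the minimum over its members), applied to the five-point matrix:

```python
import numpy as np

def single_linkage_merges(D, names):
    """Agglomerative clustering as described above: repeatedly find
    the closest pair, merge it into a cluster, and rebuild the
    distance matrix with the minimum distance to the new cluster."""
    D = np.array(D, dtype=float)
    clusters = [[name] for name in names]
    merges = []
    while len(clusters) > 1:
        # Mask the diagonal, then find the closest pair (i, j).
        M = D.copy()
        np.fill_diagonal(M, np.inf)
        i, j = divmod(int(np.argmin(M)), len(M))
        if i > j:
            i, j = j, i
        merges.append((list(clusters[i]), list(clusters[j]), float(D[i, j])))
        # Single linkage: min of the distances to the two members.
        row = np.minimum(D[i], D[j])
        D = np.delete(np.delete(D, j, axis=0), j, axis=1)
        row = np.delete(row, j)
        D[i, :] = row
        D[:, i] = row
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# The five-point distance matrix from above (entries are square roots).
D = np.sqrt(np.array([
    [ 0,  2,  9, 40, 34],
    [ 2,  0,  5, 26, 20],
    [ 9,  5,  0, 13, 13],
    [40, 26, 13,  0,  2],
    [34, 20, 13,  2,  0],
], dtype=float))

merges = single_linkage_merges(D, ["p1", "p2", "p3", "p4", "p5"])
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

For these five points, the first two merges happen at distance √2 (p1 with p2, then p4 with p5), and the final merge joins cluster {p1, p2, p3} with {p4, p5} at distance √13; this sequence of merges is exactly what a dendrogram would display.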
While this process might look trivial, cluster analysis is quite an intuitive tool in machine learning. A good example of its use: if you are trying to teach a computer to differentiate between two kinds of objects, such as cats and dogs, and you plot them as data points on a graph with weight and height on the x- and y-axes, you can observe that the data points representing cats cluster in one corner while the data points representing dogs cluster in another. While a computer cannot differentiate between the two directly, using cluster analysis we can teach it to look for the cluster nearest to a new data point to make a decision.
Bibliography
Albani, S. (2019). Euclidean Distance Matrix Trick. University of Oxford.
Dokmanic, I., Parhizkar, R., Ranieri, J., & Vetterli, M. (2015). Euclidean Distance Matrices: Essential Theory, Algorithms and Applications.