
TRIANGLE INEQUALITIES FOR LOG-DETERMINANT AND JENSEN-SHANNON DIVERGENCES

VAN-QUY NGUYEN, DUC-CHIN VAN, BA-CAN QUOC VO

Abstract. In this note we provide three elementary proofs of the triangle inequality for the scalar log-determinant divergence, which was proved in [S. Sra, Proc. Amer. Math. Soc. 144 (2016)]. We also prove that the square root of the Jensen-Shannon divergence satisfies the triangle inequality, and hence is a metric on $(0, \infty)$.

2010 Mathematics Subject Classification. 15A24, 65F10, 47H10.
Key words and phrases. Triangle inequality, metric property, Jensen-Shannon divergence, log-determinant divergence.

1. Introduction

For $0 < a \le b$, let us consider the following least-squares problem with respect to some distance function $d$ on $\mathbb{R}$:
$$\min_{x\in[a,b]} \big(d^2(x,a) + d^2(x,b)\big). \tag{1}$$

For the Euclidean distance $d_E(a,b) = |a-b|$, it is obvious that the arithmetic mean $(a+b)/2$ is the unique solution of (1), while with respect to the Riemannian distance $d_R(a,b) = |\log a - \log b| = d_E(\log a, \log b)$, the geometric mean $\sqrt{ab}$ is the unique solution of (1). The difference
$$D(a,b) = \frac{a+b}{2} - \sqrt{ab} \tag{2}$$
is nothing but the distance between the two solutions of problem (1) with respect to the different distance functions $d_E$ and $d_R$.
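As a quick numerical illustration (ours, not part of the original note), the following Python sketch solves (1) for both distances and confirms the two minimizers; all helper names are our own.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sanity check (ours): the minimizer of d^2(x,a) + d^2(x,b) over [a,b]
# is (a+b)/2 for the Euclidean distance and sqrt(a*b) for the Riemannian one.
a, b = 2.0, 8.0

f_euclid = lambda x: (x - a) ** 2 + (x - b) ** 2
f_riemann = lambda x: (np.log(x) - np.log(a)) ** 2 + (np.log(x) - np.log(b)) ** 2

x_e = minimize_scalar(f_euclid, bounds=(a, b), method="bounded").x
x_r = minimize_scalar(f_riemann, bounds=(a, b), method="bounded").x

print(x_e, (a + b) / 2)              # both ~5.0, the arithmetic mean
print(x_r, np.sqrt(a * b))           # both ~4.0, the geometric mean
print((a + b) / 2 - np.sqrt(a * b))  # the gap D(a,b) from (2), here 1.0
```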
Now let $A$ and $B$ be positive definite matrices in the algebra $M_n$ of $n \times n$ matrices over $\mathbb{C}$. The Euclidean distance $D_E(A,B)$ and the Riemannian distance $D_R(A,B)$ are defined as
$$D_E(A,B) = \big(\mathrm{Tr}\,((A-B)^2)\big)^{1/2} \quad\text{and}\quad D_R(A,B) = \left(\sum_{i=1}^{n}\log^2\lambda_i(A^{-1}B)\right)^{1/2},$$
where $\lambda_i(A^{-1}B)$ are the eigenvalues of the matrix $A^{-1/2}BA^{-1/2}$.


In many applications, positive definite matrices play the role of data points. Therefore, the following least-squares problem for matrices is very meaningful:
$$\min_{X>0}\big(D^2(A,X) + D^2(B,X)\big), \tag{3}$$
where $D$ is some distance on the cone $P_n$ of positive definite elements of $M_n$. It is also well known that the matrix arithmetic mean $(A+B)/2$ and the matrix geometric mean $A \sharp B = A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}$ are the unique positive definite solutions of (3) for $D_E$ and $D_R$, respectively. Here, for a real-valued function $f$ and a Hermitian matrix $A \in M_n$, the matrix $f(A)$ is understood by means of the functional calculus.
In recent years, many researchers have paid attention to different distance functions on the cone $P_n$ of positive definite matrices. Along with the traditional Riemannian metric $D_R(A,B)$, there are other important functions:

• The Bures-Wasserstein distance, which is adapted from the theory of optimal transport [1], is defined as
$$D_B(A,B) = \Big(\mathrm{Tr}\,(A+B) - 2\,\mathrm{Tr}\big((A^{1/2}BA^{1/2})^{1/2}\big)\Big)^{1/2};$$
• The Hellinger metric or Bhattacharya metric in quantum information [4]:
$$D_H(A,B) = \Big(\mathrm{Tr}\,(A+B) - 2\,\mathrm{Tr}\,(A^{1/2}B^{1/2})\Big)^{1/2};$$
• The log-determinant distance in quantum information and machine learning [5]:
$$D_L(A,B) = \left(\log\det\frac{A+B}{2} - \frac{1}{2}\log\det(AB)\right)^{1/2}.$$
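As a rough numerical illustration (ours, not from the note), the sketch below computes $D_B$, $D_H$ and $D_L$ for random positive definite matrices and spot-checks the triangle inequality; `rand_spd` and the other names are our own.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)

def rand_spd(n):
    """Random symmetric positive definite n x n matrix."""
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)

def d_bures(A, B):
    rA = sqrtm(A)
    val = np.trace(A + B) - 2 * np.trace(sqrtm(rA @ B @ rA))
    return np.sqrt(max(val.real, 0.0))  # clip tiny negative round-off

def d_hellinger(A, B):
    val = np.trace(A + B) - 2 * np.trace(sqrtm(A) @ sqrtm(B))
    return np.sqrt(max(val.real, 0.0))

def d_logdet(A, B):
    ld = lambda M: np.linalg.slogdet(M)[1]  # numerically stable log det
    return np.sqrt(ld((A + B) / 2) - (ld(A) + ld(B)) / 2)

for dist in (d_bures, d_hellinger, d_logdet):
    for _ in range(100):
        A, B, C = rand_spd(4), rand_spd(4), rand_spd(4)
        assert dist(A, B) <= dist(A, C) + dist(B, C) + 1e-8
```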

Note that the three distances mentioned above are matrix generalizations of the difference (2) for scalars. In addition, all the distances $D_B$, $D_H$ and $D_L$ are metrics on $P_n$ (see [1, 4, 6]). Interestingly, the proof of the metric property of $D_L$ is based on the following scalar case.

Theorem 1 ([6, Lemma 3.4]). Define the scalar version of $D_L$ as
$$\delta_l(a,b) = \sqrt{\log\frac{a+b}{2\sqrt{ab}}}, \qquad a, b > 0.$$

Then $\delta_l$ satisfies the triangle inequality
$$\delta_l(a,b) \le \delta_l(a,c) + \delta_l(b,c), \tag{4}$$
and hence $\delta_l$ is a metric on $(0,\infty)$.

Interestingly, Cherian et al. [3] claimed that $D_L$ might not be a metric, whereas Chebbi and Moakher [2] conjectured that $D_L$ is a metric. Finally, it was mentioned in [5] that $D_L$ is a metric, and the proof was given in [6]. The proof of Theorem 1 by Sra is short but not trivial: he has to prove the positive definiteness of the function $\left(\frac{x+y}{2}\right)^{-\beta}$ for $x, y > 0$ and $\beta > 0$, which is not elementary.
Recently, using the metric property of $D_L$, Virosztek [7] proved that for $f(t) = t\log t$ $(t > 0)$ the square root of the quantum Jensen-Shannon divergence, given by
$$D_{JS}(A,B) = \frac{1}{2}\,\mathrm{Tr}\,(f(A) + f(B)) - \mathrm{Tr}\, f\!\left(\frac{A+B}{2}\right),$$
is a metric on the cone of positive matrices, and hence in particular on the quantum state space. Nevertheless, we have not seen any elementary proof of this fact, even for scalars.
Motivated by the works mentioned above, in this note we provide three elementary proofs of Theorem 1 using the AGM inequality and some basic properties of the logarithmic and exponential functions. Since we do not know any elementary proof of Virosztek's result for scalars, here we show that the square root of the scalar Jensen-Shannon divergence
$$d_{JS}(a,b) = \left(\frac{1}{2}(f(a) + f(b)) - f\!\left(\frac{a+b}{2}\right)\right)^{1/2} \tag{5}$$
satisfies the triangle inequality, and hence is a metric on $(0,\infty)$.
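Before turning to the proofs, here is a minimal random spot-check (ours, not part of the argument) of Theorem 1 and of the metric claim for (5); `delta_l` and `d_js` are our own names.

```python
import numpy as np

rng = np.random.default_rng(1)

def delta_l(a, b):
    # sqrt(log((a+b)/(2*sqrt(ab)))), the scalar log-determinant metric
    return np.sqrt(max(np.log((a + b) / (2 * np.sqrt(a * b))), 0.0))

def d_js(a, b):
    # square root of the scalar Jensen-Shannon divergence, f(t) = t*log(t)
    f = lambda t: t * np.log(t)
    return np.sqrt(max((f(a) + f(b)) / 2 - f((a + b) / 2), 0.0))

for _ in range(100_000):
    a, b, c = rng.uniform(0.01, 100.0, size=3)
    assert delta_l(a, b) <= delta_l(a, c) + delta_l(b, c) + 1e-10
    assert d_js(a, b) <= d_js(a, c) + d_js(b, c) + 1e-10
```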

2. Proofs

Without loss of generality, we use the natural logarithm in all proofs. We first give two different proofs of Theorem 1.
The first proof of Theorem 1. Squaring both sides of (4) and simplifying like terms, we obtain
$$\ln\frac{a+b}{2\sqrt{ab}} \le \ln\frac{a+c}{2\sqrt{ac}} + \ln\frac{b+c}{2\sqrt{bc}} + 2\sqrt{\ln\frac{a+c}{2\sqrt{ac}}\cdot\ln\frac{b+c}{2\sqrt{bc}}}, \tag{6}$$

or, equivalently,
$$\ln\frac{2c(a+b)}{(a+c)(b+c)} \le 2\sqrt{\ln\frac{a+c}{2\sqrt{ac}}\cdot\ln\frac{b+c}{2\sqrt{bc}}}. \tag{7}$$
It is obvious that (7) is true when $2c(a+b) \le (a+c)(b+c)$, since then its left-hand side is nonpositive while its right-hand side is nonnegative. Therefore, it is sufficient to prove (7) in the case $2c(a+b) > (a+c)(b+c)$, which is equivalent to $(a-c)(c-b) > 0$. Recall that for any $x \ge 0$,
$$\frac{x}{x+1} \le \ln(1+x) \le x. \tag{8}$$
On account of (8) we have
$$\ln\frac{2c(a+b)}{(a+c)(b+c)} = \ln\left(1 + \frac{(a-c)(c-b)}{(a+c)(b+c)}\right) \le \frac{(a-c)(c-b)}{(a+c)(b+c)},$$
and
$$\ln\frac{a+c}{2\sqrt{ac}} = \ln\left(1 + \frac{(\sqrt a - \sqrt c)^2}{2\sqrt{ac}}\right) \ge \frac{(\sqrt a - \sqrt c)^2}{a+c}.$$
Similarly, we also have
$$\ln\frac{b+c}{2\sqrt{bc}} \ge \frac{(\sqrt c - \sqrt b)^2}{b+c}.$$
Therefore, inequality (7) follows if
$$\frac{(a-c)(c-b)}{(a+c)(b+c)} \le 2\sqrt{\frac{(\sqrt a - \sqrt c)^2}{a+c}\cdot\frac{(\sqrt c - \sqrt b)^2}{b+c}},$$
or, equivalently (note that $(a-c)(c-b) = (\sqrt a - \sqrt c)(\sqrt a + \sqrt c)(\sqrt c - \sqrt b)(\sqrt c + \sqrt b)$ and $(\sqrt a - \sqrt c)(\sqrt c - \sqrt b) > 0$ in the case under consideration),
$$(\sqrt a + \sqrt c)^2(\sqrt b + \sqrt c)^2 \le 4(a+c)(b+c).$$
The last inequality follows from the fact that $(\sqrt a + \sqrt c)^2 \le 2(a+c)$ and $(\sqrt b + \sqrt c)^2 \le 2(b+c)$.

In contrast to the first proof, here we apply the exponential function and two basic facts about the natural logarithm.
The second proof of Theorem 1. Applying the exponential function to both sides of (6), we obtain
$$\frac{a+b}{2\sqrt{ab}} \le \frac{a+c}{2\sqrt{ac}}\cdot\frac{c+b}{2\sqrt{cb}}\,e^{2\left(\ln\frac{a+c}{2\sqrt{ac}}\,\ln\frac{c+b}{2\sqrt{cb}}\right)^{1/2}}. \tag{9}$$
Using the fact that $e^t \ge 1 + t$ $(t \ge 0)$, we have
$$e^{2\left(\ln\frac{a+c}{2\sqrt{ac}}\,\ln\frac{c+b}{2\sqrt{cb}}\right)^{1/2}} \ge 1 + 2\left(\ln\frac{a+c}{2\sqrt{ac}}\,\ln\frac{c+b}{2\sqrt{cb}}\right)^{1/2}.$$

Therefore, (9) follows if
$$\frac{a+b}{2\sqrt{ab}} \le \frac{a+c}{2\sqrt{ac}}\cdot\frac{c+b}{2\sqrt{cb}}\left(1 + 2\left(\ln\frac{a+c}{2\sqrt{ac}}\,\ln\frac{c+b}{2\sqrt{cb}}\right)^{1/2}\right). \tag{10}$$
On account of the fact that $\ln t \ge 1 - \frac{1}{t}$ for any $t \ge 1$, the inequality (10) follows if the inequality
$$\frac{a+c}{2\sqrt{ac}}\cdot\frac{c+b}{2\sqrt{cb}}\left(1 + 2\left(1 - \frac{2\sqrt{ac}}{a+c}\right)^{1/2}\left(1 - \frac{2\sqrt{cb}}{c+b}\right)^{1/2}\right) \ge \frac{a+b}{2\sqrt{ab}}$$
holds true. The last inequality is equivalent to
$$(a-c)(c-b) \le 2\big((a+c)(b+c)\big)^{1/2}\big(\sqrt a - \sqrt c\big)\big(\sqrt c - \sqrt b\big). \tag{11}$$

From here it is obvious that if $c \le a$ or $c \ge b$, then the original inequality holds. For the case $a < c < b$, squaring both sides of (11) and simplifying like terms, we obtain
$$4(a+c)(c+b) \ge (\sqrt a + \sqrt c)^2(\sqrt c + \sqrt b)^2,$$
which is true according to the AGM inequality.
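As a quick random spot-check (ours, not part of the proof), one can verify the reduction (11) numerically in the remaining case $a < c < b$:

```python
import numpy as np

# Check (ours) of (11): (a-c)(c-b) <= 2*sqrt((a+c)(b+c))*(sqrt(a)-sqrt(c))*(sqrt(c)-sqrt(b))
# for random sorted triples a < c < b.
rng = np.random.default_rng(2)
for _ in range(100_000):
    a, c, b = np.sort(rng.uniform(0.01, 100.0, size=3))
    lhs = (a - c) * (c - b)
    rhs = 2 * np.sqrt((a + c) * (b + c)) * (np.sqrt(a) - np.sqrt(c)) * (np.sqrt(c) - np.sqrt(b))
    assert lhs <= rhs * (1 + 1e-12) + 1e-9
```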

Now we give an elementary proof of the metric property of the Jensen-Shannon divergence. Using the same idea, we then give a third proof of Theorem 1. We need the following lemma.

Lemma 2. If $x, y$ are positive real numbers satisfying $x + y = 2$, then
$$x\ln x + y\ln y + \ln x\,\ln y \le 0. \tag{12}$$

Proof. Without loss of generality, we may assume that $x \le y$, so that $x \in (0,1]$. The inequality (12) can be rewritten as $h(x) \le 0$, where
$$h(t) = t\ln t + (2-t)\ln(2-t) + \ln t\,\ln(2-t).$$

Notice that the function $h(t)$ is continuous on $(0,1]$ and, for $t \in (0,1)$,
$$h'(t) = \frac{(1-t)\big((2-t)\ln(2-t) + t\ln t\big)}{t(2-t)} > 0,$$
where the positivity of $h'(t)$ follows from the Jensen inequality
$$\frac{(2-t)\ln(2-t) + t\ln t}{2} > \frac{2-t+t}{2}\,\ln\frac{2-t+t}{2} = 0, \qquad t \in (0,1).$$

Therefore, h(t) is increasing on the interval (0, 1]. Consequently,

h(x) ≤ h(1) = 0.
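Lemma 2 can also be confirmed numerically on a fine grid; the following check (ours, not part of the proof) does so:

```python
import numpy as np

# Lemma 2 (grid check, ours): x*ln(x) + y*ln(y) + ln(x)*ln(y) <= 0 when x + y = 2.
x = np.linspace(1e-6, 2 - 1e-6, 500_000)
y = 2 - x
lhs = x * np.log(x) + y * np.log(y) + np.log(x) * np.log(y)
assert lhs.max() <= 1e-12  # equality is attained only at x = y = 1
```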

Now we are ready to show that the square root of the scalar Jensen-Shannon divergence (5) satisfies the triangle inequality.

Theorem 3. Let $a, b, c$ be positive real numbers. Then
$$d_{JS}(a,b) + d_{JS}(b,c) \ge d_{JS}(a,c). \tag{13}$$

Proof. Notice that
$$d_{JS}^2(a,b) = \frac{a\ln a + b\ln b}{2} - \frac{a+b}{2}\,\ln\frac{a+b}{2} = \frac{1}{2}\left(a\ln\frac{2a}{a+b} + b\ln\frac{2b}{a+b}\right).$$
Since $d_{JS}$ is symmetric, we may assume that $c \le a$. It is easy to see that, for $a \ge b$, the function $d_{JS}(a,b)$ is increasing in $a$ and decreasing in $b$ (indeed, $\frac{\partial}{\partial a}\,d_{JS}^2(a,b) = \frac{1}{2}\ln\frac{2a}{a+b} \ge 0$ when $a \ge b$). Therefore, if $b \ge a$, then $d_{JS}(a,c) \le d_{JS}(b,c)$. Similarly, if $b \le c$, then $d_{JS}(a,c) \le d_{JS}(a,b)$. In both cases (13) follows, so it is sufficient to consider the case $c < b < a$.
Now, let us consider the function
$$g(x) = \sqrt{2}\,\big(d_{JS}(x,b) - d_{JS}(x,c)\big).$$
The function $g(x)$ is continuous on $[b,+\infty)$, and
$$g'(x) = \frac{1}{2}\left(\frac{\ln\frac{2x}{x+b}}{\sqrt{x\ln\frac{2x}{x+b} + b\ln\frac{2b}{x+b}}} - \frac{\ln\frac{2x}{x+c}}{\sqrt{x\ln\frac{2x}{x+c} + c\ln\frac{2c}{x+c}}}\right).$$

We now show that $g'(x) \ge 0$ for any $x > b$. For this, we need to show that $\varphi(b) \ge \varphi(c)$, where
$$\varphi(t) = \frac{\ln\frac{2x}{x+t}}{\sqrt{x\ln\frac{2x}{x+t} + t\ln\frac{2t}{x+t}}}, \qquad t \in [c,b].$$
Indeed, the function $\varphi(t)$ is continuous on the interval $[c,b]$ and


$$\varphi'(t) = -\frac{\frac{2t}{x+t}\ln\frac{2t}{x+t} + \frac{2x}{x+t}\ln\frac{2x}{x+t} + \ln\frac{2t}{x+t}\,\ln\frac{2x}{x+t}}{2\left(x\ln\frac{2x}{x+t} + t\ln\frac{2t}{x+t}\right)^{3/2}}.$$

Since $\frac{2t}{x+t} + \frac{2x}{x+t} = 2$, by Lemma 2 we have $\varphi'(t) \ge 0$ for any $t \in (c,b)$. Therefore, the function $\varphi(t)$ is increasing on $[c,b]$, and hence $\varphi(b) \ge \varphi(c)$. That means the function $g(x)$ is increasing on the interval $[b,+\infty)$. Therefore,
$$-d_{JS}(b,c) = d_{JS}(b,b) - d_{JS}(b,c) = \frac{g(b)}{\sqrt 2} \le \frac{g(a)}{\sqrt 2} = d_{JS}(a,b) - d_{JS}(a,c).$$

From here it follows that
$$d_{JS}(a,c) \le d_{JS}(a,b) + d_{JS}(b,c),$$
which completes the proof.
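The key monotonicity step above admits a quick numerical spot-check. The sketch below (ours, not part of the proof) drops the constant factor $\sqrt 2$, which does not affect monotonicity.

```python
import numpy as np

def d_js(a, b):
    f = lambda t: t * np.log(t)
    return np.sqrt(max((f(a) + f(b)) / 2 - f((a + b) / 2), 0.0))

# For c < b, the map x -> d_JS(x,b) - d_JS(x,c) should be non-decreasing on [b, +inf).
c, b = 0.5, 2.0
xs = np.linspace(b, 50.0, 2000)
g = np.array([d_js(x, b) - d_js(x, c) for x in xs])
assert np.all(np.diff(g) >= -1e-12)
```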

Using arguments similar to those in the proof of Theorem 3, we now give a new proof of Theorem 1.
The third proof of Theorem 1. First note that the function
$$h(t) = \left(\ln\frac{1+t}{2\sqrt t}\right)^{1/2} = \left(\ln(1+t) - \ln 2 - \frac{1}{2}\ln t\right)^{1/2}$$
is increasing on $[1,\infty)$ and decreasing on $(0,1]$. Since $\delta_l(a,b) = h(a/b)$ and $\delta_l$ is symmetric, we may assume that $a \le b$; then $\delta_l(a,b)$ is decreasing in $a$ on $(0,b]$ and increasing in $b$ on $[a,\infty)$. Hence, for $c \le a$, we have $\delta_l(a,b) \le \delta_l(c,b)$. Similarly, for $c \ge b$, we have $\delta_l(a,b) \le \delta_l(a,c)$. Thus, in these cases, the triangle inequality (4) is true.
Now, suppose that $a < c < b$. If we show that the function
$$H(x) = \delta_l(c,x) - \delta_l(x,a)$$
is increasing on $[c,b]$, then from $H(c) \le H(b)$ we obtain
$$\delta_l(c,c) - \delta_l(c,a) \le \delta_l(c,b) - \delta_l(b,a),$$
which is equivalent to the triangle inequality (4), since $\delta_l(c,c) = 0$.


We have
$$H'(x) = \frac{x-c}{4x(x+c)\,\delta_l(c,x)} - \frac{x-a}{4x(x+a)\,\delta_l(x,a)}.$$
In order to show that $H'(x) \ge 0$ for any $x \in (c,b)$, we need to prove that the function $\psi(t) = \dfrac{x-t}{(x+t)\,\delta_l(t,x)}$ is increasing on $[a,c]$. Indeed, we have
$$\psi'(t) = -\frac{x\left(2\ln\frac{t+x}{2\sqrt{xt}} - \frac{(t-x)^2}{4xt}\right)}{(x+t)^2\,\delta_l(x,t)^3}.$$

If we put $u = (t+x)^2$ and $v = 4xt$, then $u \ge v$ and
$$2\ln\frac{t+x}{2\sqrt{xt}} - \frac{(t-x)^2}{4xt} = \ln\frac{u}{v} - \frac{u-v}{v} = \ln\frac{u}{v} + 1 - \frac{u}{v} \le 0,$$
where the inequality follows from (8). Therefore, the function $\psi(t)$ is increasing on $[a,c]$. Hence, the function $H(x)$ is increasing on $[c,b]$. This finishes the proof.
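The monotonicity of $H$ used above can be spot-checked numerically in the same way (ours, not part of the proof):

```python
import numpy as np

def delta_l(a, b):
    return np.sqrt(max(np.log((a + b) / (2 * np.sqrt(a * b))), 0.0))

# For a < c < b, H(x) = delta_l(c,x) - delta_l(x,a) should be increasing on [c,b].
a, c, b = 1.0, 2.0, 9.0
xs = np.linspace(c, b, 2000)
H = np.array([delta_l(c, x) - delta_l(x, a) for x in xs])
assert np.all(np.diff(H) >= -1e-12)
```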

Acknowledgement. We would like to express our sincere thanks to Dr. Trung Hoa Dinh for proposing these problems to us. We also thank him for valuable comments and suggestions that helped improve this note.

References

[1] R. Bhatia, T. Jain, Y. Lim. On the Bures-Wasserstein distance between positive definite matrices. Expositiones Mathematicae. DOI: 10.1016/j.exmath.2018.01.002.
[2] Z. Chebbi, M. Moakher. Means of Hermitian positive-definite matrices based on the log-determinant α-divergence function. Linear Algebra Appl. 436 (2012), 1872-1889.
[3] A. Cherian, S. Sra, A. Banerjee, N. Papanikolopoulos. Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman Log-Det Divergence. International Conference on Computer Vision (ICCV), Nov. 2011.
[4] D. Spehner, F. Illuminati, M. Orszag, W. Roga. Geometric measures of quantum correlations with Bures and Hellinger distances. ArXiv e-prints, November 2016.
[5] S. Sra. A new metric on the manifold of kernel matrices with application to matrix geometric means. Advances in Neural Information Processing Systems (NIPS), Dec. 2012.
[6] S. Sra. Positive definite matrices and the S-divergence. Proc. Amer. Math. Soc. 144 (2016), 2787-2797.
[7] D. Virosztek. The metric property of the quantum Jensen-Shannon divergence. Adv. Math. 380 (2021), 107595.

Van-Quy Nguyen, Department of Mathematics, Ha Noi University of Science, Ha Noi, Viet Nam

Duc-Chin Van, Luong The Vinh Highschool, Yen Xa, Tan Trieu, Thanh Tri, Ha Noi, Viet Nam

Ba-Can Quoc Vo, Archimedes Academy, 10 Trung Yen, Trung Hoa, Cau Giay, Ha Noi, Viet Nam
