
Tensor Rings for Learning Circular Hidden Markov Models

Mohammad Ali Javidian, Vaneet Aggarwal, and Zubin Jacob


School of Electrical and Computer Engineering
Purdue University
West Lafayette, IN, 47907
{mjavidia, vaneet, zjacob}@purdue.edu

Abstract
In this paper, we propose circular Hidden Quantum Markov Models (c-
HQMMs), which can be applied for modeling temporal data in quantum
datasets (with classical datasets as a special case). We show that c-HQMMs
are equivalent to a constrained tensor network (more precisely, circular
Local Purified State with positive-semidefinite decomposition) model. This
equivalence enables us to provide an efficient learning model for c-HQMMs.
The proposed learning approach is evaluated on six real datasets and demon-
strates the advantage of c-HQMMs on multiple datasets as compared to
HQMMs, circular HMMs, and HMMs.

1 Introduction
Hidden Markov Models (HMMs) are commonly used for modeling temporal data, usually
in cases where the underlying probability distribution is unknown but certain output
observations are known [Rabiner and Juang, 1986, Zucchini and MacDonald, 2009]. Hidden
Quantum Markov Models (HQMMs) [Monras et al., 2010, Clark et al., 2015] can be thought
of as a reformulation of HMMs in the language of quantum systems (see Section 2 for the formal
definition). It has been shown that the quantum formalism allows for a more efficient description
of a given stochastic process as compared to the classical case [Monras et al., 2010, Clark
et al., 2015, Adhikary et al., 2021]. In this paper, we propose circular HQMMs and show
empirically that they are more effective than HQMMs.
We note that circular HMMs (c-HMMs) have been proposed as a variant of HMMs in which the
initial and terminal hidden states are connected through the state transition probability
[Arica and Vural, 2000]. c-HMMs have found application in speech recognition [Zheng
and Yuan, 1988, Shahin, 2006], biology and meteorology [Holzmann et al., 2006], shape
recognition [Arica and Vural, 2000, Cai et al., 2007], and biomedical engineering [Coast et al.,
1990], among others. Given the improved performance of c-HMMs as compared to HMMs, it
remains open whether such an extension can be made for HQMMs, which is the focus of this paper.
Even though multiple algorithms for learning HQMMs have been studied, direct learning of
the model parameters is inefficient and ends up at poor local optima [Adhikary et al.,
2020]. In order to deal with this challenge, a tensor-network-based approach was used to learn
HQMMs [Adhikary et al., 2021], based on the result that HQMMs are equivalent to uniform
locally purified state (LPS) tensor networks. The model in [Adhikary et al., 2021] deals
with infinite-horizon HQMMs, which involve uniform Kraus operators and thus uniform
LPS. In this paper, we model a finite sequence of random variables, which allows us to have
different Kraus operators at each time instant. We further extend the finite-horizon HQMMs
to circular HQMMs (c-HQMMs).

In order to train the parameters of the c-HQMM, we show the equivalence of c-HQMM with
a restricted class of tensor networks. In order to do that, we first define a class of tensor
networks, called circular LPS (c-LPS). Then, we show that c-HQMM is equivalent to c-LPS
where certain matrices formed from the decomposition are positive semi-definite (p.s.d.).
Finally, we propose an algorithm to train c-LPS with p.s.d. restrictions, thus providing an
efficient algorithm for learning c-HQMMs.
The results in this paper show equivalence of finite-horizon HQMMs and c-HQMMs to the
corresponding tensor networks. Further, we show that c-HMM is equivalent to a class of
tensor networks (circular Matrix Product State (c-MPS)) with non-negative real entries.
This allows for an alternate approach of learning c-HMMs, and may be of independent
interest. The key contributions of this work are summarized as follows:
• We propose c-HQMM for modeling finite-horizon temporal data.
• We show that c-HQMMs are equivalent to c-LPS tensor networks with positive semi-definite
matrix structure in the decomposition. Further, equivalent tensor structures for finite horizon
HQMMs and c-HMMs are also provided.
• A learning algorithm for c-HQMMs is provided using the tensor network equivalence.
In order to validate the proposed framework of c-HQMM and the proposed learning algorithm,
we compare with standard HMMs (equivalent to MPS with non-negative real entries in the
decomposition), c-HMMs, and HQMMs. Evaluation on realistic datasets demonstrates the
improved performance of c-HQMMs for modeling temporal data.

2 Related Work and Background


In this section, we briefly review the key related literature on hidden Markov models and
tensor networks, with relevant definitions.
Hidden Markov Models (HMMs) [Rabiner and Juang, 1986, Zucchini and MacDonald, 2009]
are a class of probabilistic graphical models that have found greatest use in problems that
enjoy an inherent temporality. These problems consist of a process that unfolds in time,
i.e., we have states at time t that are influenced directly by a state at t − 1. HMMs have
found application in such problems, for instance speech recognition [Juang and Rabiner,
1991], gesture recognition [Wilson and Bobick, 1999], face recognition [Nefian and Hayes,
1998], finance [Mamon and Elliott, 2007], computational biology [Siepel and Haussler, 2004,
Koski, 2001, Krogh et al., 1994], among others. A finite-horizon hidden Markov model or
HMM, as shown in Figure (1a), consists of a discrete-time, discrete-state Markov chain with
hidden states $X_t \in \{1, \cdots, d\}$, $t \in \{1, \cdots, N\}$,¹ plus an observation model $p(o_t \mid x_t)$. The
corresponding joint distribution has the form:
$$p(X_{1:N}, O_{1:N}) = p(X_{1:N})\, p(O_{1:N} \mid X_{1:N}) = \Big( p(x_1) \prod_{t=2}^{N} p(x_t \mid x_{t-1}) \Big) \Big( \prod_{t=1}^{N} p(o_t \mid x_t) \Big) \qquad (1)$$

As shown in Figure (1a), the evolution of hidden states is governed by column-stochastic matrices
$A_i$, called transition matrices, and the emission matrices are column-stochastic matrices $C_i$
that determine the observation probabilities. Although HMMs are a powerful
and versatile tool for statistical modeling of complex time-series data and stochastic dynamic
systems, many real-world problems include circular data (e.g., measurements in the form
of angles or other periodic values), for instance in biology, climatology, oceanography, geophysics,
and astronomy. In these problems, the periodic nature of the boundary requires a hidden
Markov topology which is both temporal (has a sequential order) and ergodic, to allow
revisits of a state as the boundary returns to the starting point and repeats itself [Arica and
Vural, 2000]. c-HMMs were proposed to address these problems. Formally, a c-HMM is a
modification of the HMM model in which the initial and terminal hidden states are connected
through the state transition probability $A_N$, as shown in Figure (1b). The corresponding
joint distribution has the form:
$$p(X_{1:N}, O_{1:N}) = p(X_{1:N})\, p(O_{1:N} \mid X_{1:N}) = \Big( p(x_1 \mid x_N) \prod_{t=2}^{N} p(x_t \mid x_{t-1}) \Big) \Big( \prod_{t=1}^{N} p(o_t \mid x_t) \Big) \qquad (2)$$

¹ Finite horizon implies that N is finite, and thus the distributions can depend on the time index.
This paper focuses on finite-horizon probability distributions.

Figure 1: Hidden Markov Model and Cyclic Hidden Markov Model. (a) Representation of an
HMM with observation variables $O_i$, latent variables $X_i$, transition matrices $A_i$, and emission
matrices $C_i$. (b) Representation of a c-HMM with the same components, where the additional
transition matrix $A_N$ connects $X_N$ back to $X_1$.

Hidden Quantum Markov Model (HQMM) was introduced in [Monras et al., 2010] to model
evolution from one quantum state to another, while generating classical output symbols.
To produce an output symbol, a measurement or Kraus operation [Kraus et al., 1983] is
performed on the internal state of the machine. To implement a Kraus operation, one can
use an auxiliary quantum system, called ancilla. In every time step, the internal state of
the HQMM interacts with its ancilla, which is then read out by a projective measurement.
After every measurement, the ancilla is reset into its initial state, while the internal state
of the HQMM remains hidden [Clark et al., 2015]. As in the classical case, an HQMM
can be composed by the repeated application of the quantum sum rule (which plays the role of the
transition matrices in HMMs) and the quantum Bayes rule (which plays the role of the emission
matrices in HMMs) [Adhikary et al., 2019], encoded using the sets of Kraus operators $\{K_{t,w}\}$ and
$\{K_{t,x}\}$, respectively, for $t \in \{1, \cdots, N\}$:

$$\rho'_t = \sum_{w} K_{t,w}\, \rho_{t-1}\, K_{t,w}^{\dagger} \qquad \text{(quantum sum rule)}$$

$$\rho_t = \frac{K_{t,x}\, \rho'_t\, K_{t,x}^{\dagger}}{\operatorname{tr}\!\big(\sum_{x} K_{t,x}\, \rho'_t\, K_{t,x}^{\dagger}\big)} \qquad \text{(quantum Bayes rule)}$$

We can condense these two expressions into a single term for a given observation x by setting
$K_{t,x,w} = K_{t,x} K_{t,w}$, for $t = 1, \cdots, N$:

$$\rho_t = \frac{\sum_{w} K_{t,x,w}\, \rho_{t-1}\, K_{t,x,w}^{\dagger}}{\operatorname{tr}\!\big(\sum_{w} K_{t,x,w}\, \rho_{t-1}\, K_{t,x,w}^{\dagger}\big)} \qquad \text{(state update rule)}$$
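For concreteness, the state update rule can be sketched in a few lines of NumPy. This is an illustrative sketch (the function name and data layout are our own, not from the paper), assuming the Kraus operators for the observed symbol are given as dense complex matrices; multiplying the returned probabilities over $t = 1, \cdots, N$ reproduces the joint probability in equation (3) below.

```python
import numpy as np

def hqmm_state_update(rho, kraus_ops):
    """One step of the HQMM state update rule for an observed symbol x at time t.

    rho       : (d, d) complex density matrix, the current hidden state.
    kraus_ops : iterable of (d, d) complex Kraus operators {K_{t,x,w}} for the
                observed symbol x, indexed by w.
    Returns (p, rho_next): the probability of observing x and the updated,
    renormalized density matrix.
    """
    unnormalized = sum(K @ rho @ K.conj().T for K in kraus_ops)
    p = np.trace(unnormalized).real   # the trace is the observation probability
    return p, unnormalized / p
```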

We now formally define HQMMs using the Kraus operator-sum representation (the definition
is modified from [Adhikary et al., 2020] to account for finite N ).
Definition 1 (HQMM). An N-horizon d-dimensional Hidden Quantum Markov Model with
a set of discrete observations O is a tuple $(\mathbb{C}^{d \times d}, \{K_{i,x,w_x}\})$, where the initial state $\rho_0 \in \mathbb{C}^{d \times d}$
and the Kraus operators $\{K_{i,x,w_x}\} \in \mathbb{C}^{d \times d}$, for all $x \in O$, $i \in \{1, \cdots, N\}$, $w_x \in \mathbb{N}$, satisfy the
following constraints:
• $\rho_0$ is a density matrix of arbitrary rank, and
• the full set of Kraus operators across all observables provides a quantum operation,² i.e.,
$\sum_{x, w_x} K_{i,x,w_x}^{\dagger} K_{i,x,w_x} = I$ for all $i \in \{1, \cdots, N\}$.

The joint probability of a given sequence is given by:
$$p(x_1, \cdots, x_N) = \vec{I}^{\,T} \left( \sum_{w_{x_N}} \bar{K}_{N,x_N,w_{x_N}} \otimes K_{N,x_N,w_{x_N}} \right) \cdots \left( \sum_{w_{x_1}} \bar{K}_{1,x_1,w_{x_1}} \otimes K_{1,x_1,w_{x_1}} \right) \vec{\rho}_0 \qquad (3)$$
where $\vec{I}^{\,T}$, $\vec{\rho}_0$ indicate vectorization (column-first convention) of the identity matrix and $\rho_0$,
respectively. This model is illustrated in Figure (3a).
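Equation (3) rests on the standard identity that, with column-first vectorization, $\mathrm{vec}(K \rho K^{\dagger}) = (\bar{K} \otimes K)\, \mathrm{vec}(\rho)$, and that $\vec{I}^{\,T} \mathrm{vec}(\sigma) = \operatorname{tr}(\sigma)$. A small numerical check of these two facts (an illustrative sketch, not code from the paper):

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)
K = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
rho = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

vec = lambda M: M.flatten(order="F")      # column-first vectorization

lhs = vec(K @ rho @ K.conj().T)           # vec(K rho K^dagger)
rhs = np.kron(K.conj(), K) @ vec(rho)     # (K-bar kron K) vec(rho)
assert np.allclose(lhs, rhs)

# the row vector vec(I)^T turns a vectorized matrix back into its trace
assert np.isclose(vec(np.eye(d)) @ vec(rho), np.trace(rho))
```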
² If $\sum_{w} K_w^{\dagger} K_w = I$, then $\mathcal{K}(\rho) = \sum_{w} K_w \rho K_w^{\dagger}$ is called a quantum channel. However, if
$\sum_{w} K_w^{\dagger} K_w < I$, then $\mathcal{K}$ is called a stochastic quantum operation.

This representation was used in [Srinivasan et al., 2018] to show that any $d$-dimensional
HMM can be simulated as an equivalent $d^2$-dimensional HQMM [Srinivasan et al., 2018,
Algorithm 1].
HQMMs enable us to generate more complex random output sequences than HMMs, even
when using the same number of internal states [Clark et al., 2015, Srinivasan et al., 2018].
In other words, HQMMs are strictly more expressive than classical HMMs [Adhikary et al.,
2021].
A tensor network is a set of tensors (high-dimensional arrays) in which some or all of the indices
are contracted according to some pattern [Oseledets, 2011, Orús, 2014]. They have been used
in the study of many-body quantum systems [Orús, 2019, Montangero et al., 2018]. Further,
they have been adopted for supervised learning in large-scale machine learning [Stoudenmire
and Schwab, 2016, Wang et al., 2017, 2018]. Some of the classes of tensor networks we use
in this work include variants of Matrix Product States (MPSs) and Locally Purified States
(LPSs).
One class of tensor networks is the Matrix Product State (MPS), where an order-N tensor
$T_{d \times \cdots \times d}$ with rank r has entry $(x_1, \cdots, x_N)$ ($x_i \in \{1, \cdots, d\}$) given as
$$T_{x_1, \cdots, x_N} = \sum_{\{\alpha_i\}_{i=0}^{N} = 1}^{r} A_0^{\alpha_0}\, A_{1,x_1}^{\alpha_0,\alpha_1}\, A_{2,x_2}^{\alpha_1,\alpha_2} \cdots A_{N-1,x_{N-1}}^{\alpha_{N-2},\alpha_{N-1}}\, A_{N,x_N}^{\alpha_{N-1},\alpha_N}\, A_{N+1}^{\alpha_N} \qquad (4)$$
where $A_k$, $k \in \{0, N+1\}$, is a vector of dimension r, whose element $(\alpha_k)$ is denoted
$A_k^{\alpha_k}$. Further, $A_k$, $k \in \{1, \cdots, N\}$, is an order-3 tensor of dimension $d \times r \times r$, whose element
$(x, \alpha_L, \alpha_R)$ is denoted $A_{k,x}^{\alpha_L, \alpha_R}$.
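As a sketch of how equation (4) is evaluated in practice (the interface below is illustrative; the core shapes follow the definition above):

```python
import numpy as np

def mps_entry(a0, cores, a_last, x):
    """Evaluate one entry T[x_1, ..., x_N] of an order-N MPS, as in Eq. (4).

    a0, a_last : boundary vectors A_0 and A_{N+1} of length r.
    cores      : list of N arrays A_k of shape (d, r, r);
                 A_k[x, aL, aR] is the element A_{k,x}^{aL, aR}.
    x          : sequence of N (0-based) physical indices x_1, ..., x_N.
    """
    v = a0
    for A_k, x_k in zip(cores, x):
        v = v @ A_k[x_k]          # contract the shared bond index alpha_k
    return v @ a_last
```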

Figure 2: Tensor diagrams corresponding to different tensor networks: (a) an order-N tensor T,
(b) an order-N circular tensor T, (c) an order-N MPS, (d) an order-N c-MPS, (e) an order-N LPS,
and (f) an order-N c-LPS. Black end dots indicate boundary vectors.

Another class of tensor networks studied in this paper is the Locally Purified State
(LPS). An order-N tensor with d-dimensional indices admits an LPS representation of
puri-rank r and purification dimension µ when the entries of T can be written as:
$$T_{x_1, \cdots, x_N} = \sum_{\{\alpha_i, \alpha'_i\}_{i=0}^{N} = 1}^{r} \; \sum_{\{\beta_i\}_{i=1}^{N} = 1}^{\mu} A_0^{\alpha_0, \alpha'_0}\, A_{1,x_1}^{\beta_1,\alpha_0,\alpha_1}\, \bar{A}_{1,x_1}^{\beta_1,\alpha'_0,\alpha'_1}\, A_{2,x_2}^{\beta_2,\alpha_1,\alpha_2}\, \bar{A}_{2,x_2}^{\beta_2,\alpha'_1,\alpha'_2} \cdots A_{N,x_N}^{\beta_N,\alpha_{N-1},\alpha_N}\, \bar{A}_{N,x_N}^{\beta_N,\alpha'_{N-1},\alpha'_N}\, A_{N+1}^{\alpha_N, \alpha'_N} \qquad (5)$$
where $A_k$, $k \in \{0, N+1\}$, is an $r \times r$ matrix, whose element $(\alpha_k, \alpha'_k)$ is denoted $A_k^{\alpha_k, \alpha'_k}$.
Further, $A_k$, $k \in \{1, \cdots, N\}$, is an order-4 tensor of dimension $d \times \mu \times r \times r$, whose
element $(x, \beta, \alpha_L, \alpha_R)$ is denoted $A_{k,x}^{\beta, \alpha_L, \alpha_R}$, and the elements belong to $\mathbb{R}$ or $\mathbb{C}$, as defined
based on the context.
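The same left-to-right sweep evaluates an LPS entry, except that each site carries a doubled bond ($\alpha_i$, $\alpha'_i$) and the purification index $\beta_i$ is contracted between a core and its complex conjugate. A minimal sketch under these conventions (illustrative, not the authors' code):

```python
import numpy as np

def lps_entry(a0, cores, a_last, x):
    """Evaluate one entry T[x_1, ..., x_N] of an order-N LPS, as in Eq. (5).

    a0, a_last : boundary matrices A_0 and A_{N+1} of shape (r, r).
    cores      : list of N arrays A_k of shape (d, mu, r, r);
                 A_k[x, b, aL, aR] is the element A_{k,x}^{b, aL, aR}.
    x          : sequence of N (0-based) physical indices.
    """
    E = a0                                   # E[aL, aL'] accumulates the sweep
    for A_k, x_k in zip(cores, x):
        M = A_k[x_k]                         # shape (mu, r, r)
        # contract beta between the core and its conjugate, and absorb E
        E = np.einsum("ab,sac,sbd->cd", E, M, M.conj())
    return np.einsum("ab,ab->", E, a_last)
```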
The tensor networks can be represented using tensor diagrams, where tensors are represented
by boxes, and indices in the tensors are represented by lines emerging from the boxes. The
lines connecting tensors between each other correspond to contracted indices, whereas lines
that do not go from one tensor to another correspond to open indices Orús [2014]. The
tensor diagrams corresponding to tensor networks MPS and LPS can be seen in Figure (2c)
and (2e), respectively.

Relation between HMMs and Tensor Networks: As shown recently in [Glasser
et al., 2019, Adhikary et al., 2021], tensor networks have a direct correspondence with HMMs.
In particular, non-negative matrix product states (MPS) are HMMs [Glasser et al., 2019],
and uniform locally purified states are HQMMs [Adhikary et al., 2021]. Note that the equivalence
assumes that the tensor networks are normalized as probabilities; we do not explicitly
normalize the tensor networks in the proofs, but the normalization is accounted for in the learning.

Learning of HQMM: Two state-of-the-art algorithms for learning HQMMs were pro-
posed in [Srinivasan et al., 2018, Adhikary et al., 2020]. Both algorithms use an iterative
maximum-likelihood algorithm to learn Kraus operators to model sequential data using an
HQMM. The proposed algorithm in [Srinivasan et al., 2018] is slow, and there is no theoretical
guarantee that the algorithm steps towards the optimum at every iteration [Adhikary et al.,
2020]. The proposed algorithm in [Adhikary et al., 2020], in contrast, uses a gradient-based
approach. Although the algorithm in [Adhikary et al., 2020] is able to learn an
HQMM that outperforms the corresponding HMM, this comes at the cost of a rapid scaling
in the number of parameters. In order to deal with this issue, the equivalence between HQMMs
and tensor networks has been considered to achieve efficient learning [Glasser et al., 2019,
Adhikary et al., 2021].

3 Proposed c-HQMM

In this section, we propose Circular HQMM (c-HQMM) for modeling temporal data.

Definition 2 (c-HQMM). An N-horizon d-dimensional circular Hidden Quantum Markov
Model (c-HQMM) with a set of discrete observations O is a tuple $(\mathbb{C}^{d \times d}, \{K_{i,x,w_x}\}, \operatorname{tr}(\cdot))$,
where the Kraus operators are given as $\{K_{i,x,w_x}\} \in \mathbb{C}^{d \times d}$, for all $x \in O$, $i \in \{1, \cdots, N\}$, $w_x \in \mathbb{N}$.
The full set of Kraus operators across all observables provides a quantum operation, i.e.,
$\sum_{x, w_x} K_{i,x,w_x}^{\dagger} K_{i,x,w_x} = I$, for all $i \in \{1, \cdots, N\}$. The joint probability of a given sequence
is given by:

$$p(x_1, \cdots, x_N) = \operatorname{tr}\!\left( \left( \sum_{w_{x_N}} \bar{K}_{N,x_N,w_{x_N}} \otimes K_{N,x_N,w_{x_N}} \right) \cdots \left( \sum_{w_{x_1}} \bar{K}_{1,x_1,w_{x_1}} \otimes K_{1,x_1,w_{x_1}} \right) \right) \qquad (6)$$
where $\operatorname{tr}(\cdot)$ indicates the trace of the resulting matrix. This model is illustrated in Figure (3b).
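A direct transcription of equation (6) as a sketch (the data layout kraus[i][x], holding the list of operators $\{K_{i,x,w_x}\}$, is our own assumption):

```python
import numpy as np

def chqmm_probability(kraus, observations):
    """Joint probability of an observation sequence under a c-HQMM, Eq. (6).

    kraus        : kraus[i][x] is a list of (d, d) complex Kraus operators
                   {K_{i,x,w_x}} for time step i and observed symbol x.
    observations : sequence (x_1, ..., x_N) of observed symbols.
    """
    d = kraus[0][observations[0]][0].shape[0]
    M = np.eye(d * d, dtype=complex)
    # build the operator product from time N down to time 1, as in Eq. (6)
    for i, x in reversed(list(enumerate(observations))):
        step = sum(np.kron(K.conj(), K) for K in kraus[i][x])
        M = M @ step
    return np.trace(M).real
```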

We note that this representation, along with the algorithm proposed in [Srinivasan et al., 2018,
Algorithm 1], can be used to show that any $d$-dimensional circular HMM can be simulated as
an equivalent $d^2$-dimensional c-HQMM.

Figure 3: Hidden Quantum Markov Model and circular Hidden Quantum Markov Model,
with observation variables $x_i$, where $K_i$ denotes $K_{i,x_i,w_{x_i}}$ and $\beta_i = |w_{x_i}|$ is determined by the
Kraus rank. (a) A Hidden Quantum Markov Model; the rightmost connecting line at the boundary
represents the application of the identity, and black end dots indicate boundary vectors. (b) A
circular Hidden Quantum Markov Model.

4 Proposed c-LPS Model


In this section, we propose circular LPS (c-LPS), which is an extension of LPS. For this
purpose, we first briefly review the circular MPS (c-MPS) model. Circular MPS (c-MPS) is an
extension of MPS, where an order-N c-MPS T, with d-dimensional indices and rank r, has
entries given as:
$$T_{x_1, \cdots, x_N} = \sum_{\{\alpha_i\}_{i=1}^{N} = 1}^{r} A_{1,x_1}^{\alpha_N, \alpha_1}\, A_{2,x_2}^{\alpha_1, \alpha_2} \cdots A_{N,x_N}^{\alpha_{N-1}, \alpha_N} \qquad (7)$$

where $A_k$, $k \in \{1, \cdots, N\}$, is an order-3 tensor of dimension $d \times r \times r$, as shown in Figure
(2d), whose element $(x, \alpha_L, \alpha_R)$ is denoted $A_{k,x}^{\alpha_L, \alpha_R}$. c-MPS are studied in the literature
as tensor rings. Tensor rings [Zhao et al., 2016, Mickelin and Karaman, 2020] have found
application in the compression of convolutional neural networks [Wang et al., 2018], image and
video compression [Zhao et al., 2019], data completion [Wang et al., 2017], among others.
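Compared with the MPS of equation (4), the boundary vectors are replaced by a shared bond $\alpha_N$ that closes the ring, so a c-MPS entry is the trace of a product of $r \times r$ slices. A sketch (illustrative interface):

```python
import numpy as np

def cmps_entry(cores, x):
    """Evaluate one entry T[x_1, ..., x_N] of an order-N c-MPS (tensor ring), Eq. (7).

    cores : list of N arrays A_k of shape (d, r, r);
            A_k[x, aL, aR] is the element A_{k,x}^{aL, aR}.
    x     : sequence of N (0-based) physical indices.
    """
    r = cores[0].shape[1]
    M = np.eye(r)
    for A_k, x_k in zip(cores, x):
        M = M @ A_k[x_k]          # accumulate the product of r x r slices
    return np.trace(M)            # the trace closes the ring over alpha_N
```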
Now, we introduce circular LPS (c-LPS) as a tensor ring extension of LPS, where an order-N
tensor with d-dimensional indices, puri-rank r, and purification dimension µ has entries given as:
$$T_{x_1, \cdots, x_N} = \sum_{\{\alpha_i, \alpha'_i\}_{i=1}^{N} = 1}^{r} \; \sum_{\{\beta_i\}_{i=1}^{N} = 1}^{\mu} A_{1,x_1}^{\beta_1, \alpha_N, \alpha_1}\, \bar{A}_{1,x_1}^{\beta_1, \alpha'_N, \alpha'_1} \cdots A_{N,x_N}^{\beta_N, \alpha_{N-1}, \alpha_N}\, \bar{A}_{N,x_N}^{\beta_N, \alpha'_{N-1}, \alpha'_N} \qquad (8)$$
where $A_k$, $k \in \{1, \cdots, N\}$, is an order-4 tensor of dimension $d \times \mu \times r \times r$, as shown in Figure
(2f), whose element $(x, \beta, \alpha_L, \alpha_R)$ is denoted $A_{k,x}^{\beta, \alpha_L, \alpha_R}$.³
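Analogously, a c-LPS entry combines the ring closure of the c-MPS with the purification contraction of the LPS: the running object is a transfer matrix over the doubled bond $(\alpha, \alpha')$, and both rings are traced at the end. A sketch under the same illustrative conventions as above:

```python
import numpy as np

def clps_entry(cores, x):
    """Evaluate one entry T[x_1, ..., x_N] of an order-N c-LPS, Eq. (8).

    cores : list of N arrays A_k of shape (d, mu, r, r);
            A_k[x, b, aL, aR] is the element A_{k,x}^{b, aL, aR}.
    x     : sequence of N (0-based) physical indices.
    """
    r = cores[0].shape[-1]
    M = np.eye(r * r, dtype=complex)
    for A_k, x_k in zip(cores, x):
        S = A_k[x_k]                                       # shape (mu, r, r)
        # transfer matrix over the doubled bond (alpha, alpha')
        E = np.einsum("sac,sbd->abcd", S, S.conj()).reshape(r * r, r * r)
        M = M @ E
    # the trace closes both the alpha and the alpha' rings; the result is real
    # because the two branches are complex conjugates of each other
    return np.trace(M).real
```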

5 c-HQMMs are c-LPS with Positive Semi-Definite Matrix Structure
We first note that non-negative MPS (denoted MPS$_{\mathbb{R}\geq 0}$) are HMMs [Glasser et al., 2019].
In other words, any HMM can be mapped to an MPS with non-negative elements, and
any MPS$_{\mathbb{R}\geq 0}$ can be mapped to an HMM. Similarly, local quantum circuits with ancillas are
locally purified states [Glasser et al., 2019]. The authors of Adhikary et al. [2021] recently
considered an infinite-time model of HQMM, where the Kraus operators do not depend on time,
and showed the equivalence of these HQMMs with uniform LPS with a positive semi-definite
matrix structure. However, our work considers a non-uniform finite-time structure by having
the Kraus operators depend on time. We note that the equivalent tensor structures corresponding
to c-HMM and c-HQMM have remained open; they are studied in this section.
The next result describes the relation between c-HQMM and c-LPS:
³ Since Born machines (BMs) are LPS of purification dimension µ = 1, we can similarly define
circular BMs.

Theorem 1. The c-HQMM model is equivalent to a c-LPS structure where the decomposition
entries $A_{i,x}^{b, a_1, a_2}$ are complex, and the $r \times r$ matrices formed by $A_{i,x}^{b, \cdot, \cdot}$ for all $i, x, b$ are positive
semi-definite (p.s.d.).

Proof. See [Javidian et al., 2021] for the proof.

The proof structure can be directly specialized to HQMM, where we can obtain the following
result:
Lemma 1. The HQMM model is equivalent to an LPS structure where the decomposition entries
$A_{i,x}^{b, a_1, a_2}$ are complex, and the $r \times r$ matrices formed by $A_{i,x}^{b, \cdot, \cdot}$ for all $i, x, b$ are positive semi-
definite (p.s.d.) for $i \in \{1, \cdots, N\}$. Further, the $r \times r$ matrix $A_0$ is p.s.d. and the $r \times r$ matrix
$A_{N+1}$ is the identity matrix.

Proof. See [Javidian et al., 2021] for the proof.

Remark 1. Non-terminating uniform LPS (uLPS) are equivalent to HQMMs [Adhikary
et al., 2021]. In uLPS, the boundary vectors originate from density matrices of arbitrary rank.
As shown in [Adhikary et al., 2021], the evaluation functional of uLPS can be rescaled and
transformed into one that converges to $\vec{I}^{\,T}$ as $N \to \infty$. To obtain the equivalence of
finite-horizon HQMM and LPS, we need to restrict LPS models to the evaluation functional
$\vec{I}^{\,T}$, as stated in Lemma 1.

Further, we note that the prior works do not relate c-HMM to tensor networks, to the best
of our knowledge. In the following result, we relate the c-HMM to c-MPS.
Lemma 2. c-HMM model is equivalent to a c-MPS model, where the entries of each
decomposition are real and non-negative.

Proof. See [Javidian et al., 2021] for the proof.

Figure 4: Contraction of the tensor ring to compute $Z_T = \sum_{X_1, \cdots, X_N} T_{X_1, \cdots, X_N}$.

6 Learning Algorithm for Circular LPS Models


In this section we propose an algorithm for learning c-LPS Models as in Theorem 1 via a
maximum likelihood estimation (MLE) approach. The proposed algorithm is a modification
of the algorithm proposed in [Glasser et al., 2019] for learning LPS models, except that
we take into account the positive semi-definite nature of the decomposition and the cyclic
structure.
Problem 1 (MLE for Distribution Approximation). Assume that $\{x^i = (x^i_1, \cdots, x^i_N)\}_{i=1}^{n}$
is a sample of size n from an experiment with N discrete random variables. To estimate this
discrete multivariate distribution, we use the c-LPS model as defined in Section 4. So, we have:
$$p(x_1, \cdots, x_N) \approx \sum_{\{\alpha_i, \alpha'_i\}_{i=1}^{N} = 1}^{r} \; \sum_{\{\beta_i\}_{i=1}^{N} = 1}^{\mu} A_{1,x_1}^{\beta_1, \alpha_N, \alpha_1}\, \bar{A}_{1,x_1}^{\beta_1, \alpha'_N, \alpha'_1} \cdots A_{N,x_N}^{\beta_N, \alpha_{N-1}, \alpha_N}\, \bar{A}_{N,x_N}^{\beta_N, \alpha'_{N-1}, \alpha'_N},$$

where the tensor decomposition entries follow the structure in Theorem 1. Our objective here
is to estimate the tensor elements of the c-LPS, i.e., $w = A_{i,x_i}^{\beta_i, \alpha^L_i, \alpha^R_i}$, for $i = 1, \cdots, N$. For this
purpose, we minimize the negative log-likelihood:
$$\mathcal{L} = -\sum_{i} \log \frac{T_{x^i}}{Z_T} \qquad (9)$$

where $T_{x^i}$ is given by the contraction of the c-LPS, and $Z_T = \sum_{x_1, \cdots, x_N} T_{x_1, \cdots, x_N}$ is a normalization factor.

To find the optimal solution, we calculate the derivative of the log-likelihood with respect to
w as follows:
$$\partial_w \mathcal{L} = -\sum_{i} \left( \frac{\partial_w T_{x^i}}{T_{x^i}} - \frac{\partial_w Z_T}{Z_T} \right) \qquad (10)$$

We use a mini-batch gradient-descent algorithm to minimize the negative log-likelihood. At
each step of the optimization, the sum is computed over a batch of training instances. The
parameters in the tensor network are then updated by a small step in the direction opposite to
the gradient. To satisfy the condition in Theorem 1, we project the $r \times r$ matrices formed by
$A_{i,x}^{b, \cdot, \cdot}$ for all $i, x, b$ onto the positive semi-definite (p.s.d.) matrices using the standard singular
value decomposition (SVD) method [Dattorro, 2010]. Note that we use Wirtinger derivatives
with respect to the conjugated tensor elements. Now, we explain how to compute $\partial_w Z_T$, $Z_T$,
$\partial_w T_{x^i}$, and $T_{x^i}$ in equation (10). For a c-LPS of puri-rank r, the normalization $Z_T$ can be
computed by contracting the tensor network:

$$Z_T = \sum_{x_1, \cdots, x_N} T_{x_1, \cdots, x_N} \qquad (11)$$

This contraction is performed, as shown in Figure (4), from left to right, by contracting at
each step the two vertical indices (corresponding to $d_i$ and $\bar{d}_i$ with respect to the supports
of $X_i$ and $\bar{X}_i$) and then each of the two horizontal indices (with respect to the $\alpha_i$'s and $\alpha'_i$'s,
respectively). Finally, we trace out the indices corresponding to the rings. In this contraction,
intermediate results from the contraction of the first i tensors are stored in $E_i$, and the same
procedure is repeated from the right, with intermediate results of the contraction of the last
$N - i$ tensors stored in $F_{i+1}$. The derivatives of the normalization for each tensor are then
computed as

$\partial Z_T / \partial \bar{A}_{i,m}^{j,k,l}$: the contraction of $E_{i-1}$, $A_i$, and $F_{i+1}$ with the indices $(m, j, k, l)$ of $\bar{A}_i$ left open (depicted as a tensor diagram in the original).

Computing $T_{x^i}$ for a training sample and its derivative is done in the same way, except that
the contracted index corresponding to an observed variable is now fixed to its observed value.
We note that a similar approach to learning the model can be used for the HQMM, HMM, and
c-HMM structures.
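To make the two non-standard steps concrete, the sketch below (our illustration, assuming the same core layout of shape $(d, \mu, r, r)$ as in the earlier sketches) shows (i) the contraction of $Z_T$ for a c-LPS by summing each physical and purification index locally before chaining the transfer matrices of Figure 4, and (ii) the projection of the $r \times r$ slices onto the p.s.d. cone by clipping the negative eigenvalues of their Hermitian part, one standard way of realizing this projection for (near-)Hermitian slices.

```python
import numpy as np

def clps_normalization(cores):
    """Z_T = sum over all (x_1, ..., x_N) of T[x_1, ..., x_N] for a c-LPS, Eq. (11).

    cores : list of N arrays A_k of shape (d, mu, r, r).
    """
    r = cores[0].shape[-1]
    M = np.eye(r * r, dtype=complex)
    for A_k in cores:
        # sum the physical index x and purification index beta locally,
        # then chain the resulting transfer matrices (left-to-right sweep)
        E = np.einsum("xsac,xsbd->abcd", A_k, A_k.conj()).reshape(r * r, r * r)
        M = M @ E
    return np.trace(M).real


def project_psd(cores):
    """Project every r x r slice A_k[x, b, :, :] onto the p.s.d. cone."""
    projected = []
    for A_k in cores:
        P = np.empty_like(A_k, dtype=complex)
        d, mu = A_k.shape[:2]
        for x in range(d):
            for b in range(mu):
                H = 0.5 * (A_k[x, b] + A_k[x, b].conj().T)    # Hermitian part
                w, V = np.linalg.eigh(H)
                P[x, b] = (V * np.clip(w, 0.0, None)) @ V.conj().T
        projected.append(P)
    return projected
```

Storing the partial products of the transfer matrices during such a sweep yields the intermediate tensors $E_i$ and $F_{i+1}$ used above for the derivatives.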

7 Numerical Evaluations: Maximum Likelihood Estimation on
Real Data

To evaluate the performance of the proposed algorithm for learning c-HQMMs, we use
the same datasets as in [Glasser et al., 2019] and learn HMM, c-HMM, HQMM, and
c-HQMM using their respective tensor representations. HMM is equivalent to MPS$_{\mathbb{R}\geq 0}$ and
serves as a baseline for the other structures. We note that the equivalence of both c-HMMs and c-HQMMs
to tensor networks first appears in this paper. Further, the equivalence of LPS and HQMM for
finite N with non-uniform Kraus operators is also studied for the first time in this paper. We
compare the performance of training HMM, c-HMM, HQMM, and c-HQMM using equivalent
tensor representations on six different real datasets of categorical variables, where the following
parameters are used:

• Bond dimension/rank of the tensor networks: r = 2, 3, 4, 5, and 6.
• The learning rate was chosen using a grid search over powers of 10, from $10^{-5}$ to $10^{5}$.
• The batch size, i.e., the number of training samples per minibatch, was set to 20.
• The number of iterations was set to a maximum of 1000.
• The dimension of the purification index, i.e., µ for LPS and c-LPS, was set to 2.

Each data point reported here is the lowest negative log-likelihood obtained from 10 trials
with different initializations of the tensors.
Results: The obtained results, summarized in Figure 5, show that: (1) the tensor repre-
sentations can be used to learn the different HMM variants; (2) despite the different
algorithm choice, on almost all datasets, c-LPS and LPS lead to better modeling of the data
distribution for the same rank as compared to MPS; (3) c-HQMM
outperforms HQMM, HMM, and c-HMM; (4) in many cases, the performance difference
between LPS and c-LPS for ranks 5 and 6 is more significant than for ranks 2
and 3, and the improvement depends on the dataset. We also note that we plot the negative
of the log-likelihoods, so the gap in the actual likelihoods is larger. The results suggest that in generic
settings HQMM and c-HQMM should be preferred over HMM and c-HMM models,
respectively. Further, c-HQMM gives the best performance among the considered models.

Figure 5: Maximum likelihood estimation with tensor rings MPS, c-MPS, LPS, and c-LPS
for learning HMM, c-HMM, HQMM, and c-HQMM from the data on different data sets
(each panel plots the negative log-likelihood per sample against the bond dimension D = 2, ..., 6):
a) biofam data set of family life states from the Swiss Household Panel biographical survey
[Müller et al., 2007]; data sets from the UCI Machine Learning Repository [Dua and Graff,
2017]: b) Lymphography, c) SPECT Heart, d) Congressional Voting Records, e) Primary
Tumor, and f) Solar Flare.

8 Conclusion
This paper proposes a new class of hidden Markov models that we call circular Hidden
Quantum Markov Models (c-HQMMs). c-HQMMs can be used to model temporal data in
quantum datasets (with classical datasets as a special case). We proved that c-HQMMs are
equivalent to circular LPS models with positive-semidefinite constraints on certain matrices
in the LPS decomposition. Leveraging this result, we proposed an MLE-based
algorithm for learning c-HQMMs from data via c-LPS. We evaluated the proposed learning
approach on six real datasets, demonstrating the advantage of c-HQMMs on multiple datasets
as compared to HQMMs, circular HMMs, and HMMs.

Acknowledgments and Disclosure of Funding


This research was supported by the Defense Advanced Research Projects Agency (DARPA)
Quantum Causality [Grant No. HR00112010008].

References
Sandesh Adhikary, Siddarth Srinivasan, and Byron Boots. Learning quantum graphi-
cal models using constrained gradient descent on the stiefel manifold. arXiv preprint
arXiv:1903.03730, 2019.
Sandesh Adhikary, Siddarth Srinivasan, Geoff Gordon, and Byron Boots. Expressiveness
and learning of hidden quantum markov models. In International Conference on Artificial
Intelligence and Statistics, pages 4151–4161. PMLR, 2020.
Sandesh Adhikary, Siddarth Srinivasan, Jacob Miller, Guillaume Rabusseau, and Byron
Boots. Quantum tensor networks, stochastic processes, and weighted automata. In
International Conference on Artificial Intelligence and Statistics, pages 2080–2088. PMLR,
2021.
Nafiz Arica and FT Yarman Vural. A shape descriptor based on circular hidden markov
model. In Proceedings 15th International Conference on Pattern Recognition. ICPR-2000,
volume 1, pages 924–927. IEEE, 2000.
Jinhai Cai, Ming Ee, and Robert Smith. Image retrieval using circular hidden markov models
with a garbage state. In Proceedings of the Image and Vision Computing Conference New
Zealand 2007, pages 115–120. University of Waikato, 2007.
Lewis A Clark, Wei Huang, Thomas M Barlow, and Almut Beige. Hidden quantum
markov models and open quantum systems with instantaneous feedback. In ISCS 2014:
Interdisciplinary Symposium on Complex Systems, pages 143–151. Springer, 2015.
Douglas A Coast, Richard M Stern, Gerald G Cano, and Stanley A Briller. An approach to
cardiac arrhythmia analysis using hidden markov models. IEEE Transactions on biomedical
Engineering, 37(9):826–836, 1990.
Jon Dattorro. Convex optimization & Euclidean distance geometry. Lulu. com, 2010.
Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.
Ivan Glasser, Ryan Sweke, Nicola Pancotti, Jens Eisert, and Ignacio Cirac. Expressive power of
tensor-network factorizations for probabilistic modeling. In H. Wallach, H. Larochelle,
A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural
Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL
https://proceedings.neurips.cc/paper/2019/file/b86e8d03fe992d1b0e19656875ee557c-Paper.pdf.
Hajo Holzmann, Axel Munk, Max Suster, and Walter Zucchini. Hidden markov models for
circular and linear-circular time series. Environmental and Ecological Statistics, 13(3):
325–347, 2006.

Mohammad Ali Javidian, Vaneet Aggarwal, and Zubin Jacob. Learning circular hidden
quantum markov models: A tensor network approach. arXiv preprint arXiv:2111.01536,
2021.
Biing Hwang Juang and Laurence R Rabiner. Hidden markov models for speech recognition.
Technometrics, 33(3):251–272, 1991.
Timo Koski. Hidden Markov models for bioinformatics, volume 2. Springer Science &
Business Media, 2001.
Karl Kraus, Arno Böhm, John D Dollard, and WH Wootters. States, Effects, and Operations:
Fundamental Notions of Quantum Theory. Lectures in Mathematical Physics at the University
of Texas at Austin. Lecture Notes in Physics, 190, 1983.
Anders Krogh, Michael Brown, I Saira Mian, Kimmen Sjölander, and David Haussler. Hidden
markov models in computational biology: Applications to protein modeling. Journal of
molecular biology, 235(5):1501–1531, 1994.
Rogemar S Mamon and Robert James Elliott. Hidden Markov models in finance, volume 4.
Springer, 2007.
Oscar Mickelin and Sertac Karaman. On algorithms for and computing with the tensor ring
decomposition. Numerical Linear Algebra with Applications, 27(3):e2289, 2020.
Alex Monras, Almut Beige, and Karoline Wiesner. Hidden quantum markov models and
non-adaptive read-out of many-body states. arXiv: Quantum Physics, 2010.
Simone Montangero. Introduction to Tensor Network Methods. Springer, 2018.
Nicolas Séverin Müller, Matthias Studer, and Gilbert Ritschard. Classification de parcours
de vie à l’aide de l’optimal matching. XIVe Rencontre de la Société francophone de
classification (SFC 2007), pages 157–160, 2007.
Ara V Nefian and Monson H Hayes. Hidden markov models for face recognition. In
Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal
Processing, ICASSP’98 (Cat. No. 98CH36181), volume 5, pages 2721–2724. IEEE, 1998.
Román Orús. A practical introduction to tensor networks: Matrix product states and
projected entangled pair states. Annals of Physics, 349:117–158, 2014.
Román Orús. Tensor networks for complex quantum systems. Nature Reviews Physics, 1(9):
538–550, 2019.
Ivan V Oseledets. Tensor-train decomposition. SIAM Journal on Scientific Computing, 33
(5):2295–2317, 2011.
Lawrence Rabiner and Biinghwang Juang. An introduction to hidden Markov models. IEEE
ASSP Magazine, 3(1):4–16, 1986.
Ismail Shahin. Enhancing speaker identification performance under the shouted talking
condition using second-order circular hidden markov models. Speech Communication, 48
(8):1047–1055, 2006.
Adam Siepel and David Haussler. Combining phylogenetic and hidden markov models in
biosequence analysis. Journal of Computational Biology, 11(2-3):413–428, 2004.
Siddarth Srinivasan, Geoff Gordon, and Byron Boots. Learning hidden quantum markov
models. In International Conference on Artificial Intelligence and Statistics, pages 1979–
1987. PMLR, 2018.
E Miles Stoudenmire and David J Schwab. Supervised learning with quantum-inspired tensor
networks. arXiv preprint arXiv:1605.05775, 2016.

Wenqi Wang, Vaneet Aggarwal, and Shuchin Aeron. Efficient low rank tensor ring completion.
In Proceedings of the IEEE International Conference on Computer Vision, pages 5697–5705,
2017.
Wenqi Wang, Yifan Sun, Brian Eriksson, Wenlin Wang, and Vaneet Aggarwal. Wide
compression: Tensor ring nets. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 9329–9338, 2018.
Andrew D Wilson and Aaron F Bobick. Parametric hidden markov models for gesture
recognition. IEEE transactions on pattern analysis and machine intelligence, 21(9):
884–900, 1999.
Qibin Zhao, Guoxu Zhou, Shengli Xie, Liqing Zhang, and Andrzej Cichocki. Tensor ring
decomposition. arXiv preprint arXiv:1606.05535, 2016.
Qibin Zhao, Masashi Sugiyama, Longhao Yuan, and Andrzej Cichocki. Learning efficient
tensor representations with ring-structured networks. In ICASSP 2019-2019 IEEE inter-
national conference on acoustics, speech and signal processing (ICASSP), pages 8608–8612.
IEEE, 2019.
Y-C Zheng and B-Z Yuan. Text-dependent speaker identification using circular hidden
markov models. In ICASSP-88., International Conference on Acoustics, Speech, and
Signal Processing, pages 580–581. IEEE Computer Society, 1988.
Walter Zucchini and Iain L MacDonald. Hidden Markov models for time series: an introduc-
tion using R. Chapman and Hall/CRC, 2009.

