Joint Sensing Matrix and Sparsifying Dictionary Optimization for Tensor Compressive Sensing
Xin Ding and Ian J. Wassell are with the Computer Lab, University of Cambridge, UK (e-mail: xd225, [email protected]). Wei Chen is with the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, China, and also with the Computer Lab, University of Cambridge, UK (e-mail: [email protected]).

Abstract—Tensor Compressive Sensing (TCS) is a multidimensional framework of Compressive Sensing (CS), and it is advantageous in terms of reducing the amount of storage, easing hardware implementations and preserving multidimensional structures of signals in comparison to a conventional CS system. In a TCS system, instead of using a random sensing matrix and a predefined dictionary, the average-case performance can be further improved by employing an optimized multidimensional sensing matrix and a learned multilinear sparsifying dictionary. In this paper, we propose a joint optimization approach of the sensing matrix and dictionary for a TCS system. For the sensing matrix design in TCS, an extended separable approach with a closed form solution and a novel iterative non-separable method are proposed when the multilinear dictionary is fixed. In addition, a multidimensional dictionary learning method that takes advantage of the multidimensional structure is derived, and the influence of the sensing matrices is taken into account in the learning process. A joint optimization is achieved via alternately iterating the optimization of the sensing matrix and dictionary. Numerical experiments using both synthetic data and real images demonstrate the superiority of the proposed approaches.

Keywords—Multidimensional system, compressive sensing, tensor compressive sensing, dictionary learning, sensing matrix optimization.

I. INTRODUCTION

The traditional signal acquisition-and-compression paradigm removes the signal redundancy and preserves the essential contents of signals to achieve savings on storage and transmission, where the minimum sampling ratio is restricted by the Shannon-Nyquist Theorem at the signal sampling stage. This wasteful process of sensing-then-compressing is replaced by directly acquiring the compressed version of signals in Compressive Sensing (CS) [1]–[3], a new sampling paradigm that leverages the fact that most signals have sparse representations (i.e., there are only a few non-zero coefficients) in some suitable basis. Successful reconstruction of such signals is guaranteed for a sufficient number of randomly taken samples that are far fewer in number than that required by the Shannon-Nyquist Theorem. Therefore CS is very attractive for applications such as medical imaging and wireless sensor networks, where data acquisition is expensive [4], [5].

Achieving successful CS reconstruction has been characterized by a number of properties, e.g., the Restricted Isometry Property (RIP) [1], the mutual coherence [6] and the null space property [2]. These properties have been used to provide sufficient conditions on sensing matrices and to quantify the worst-case reconstruction performance [2], [6], [7]. Random matrices such as Gaussian or Bernoulli matrices have been shown to fulfill these conditions, and hence are widely used as the sensing matrix in CS applications. Since the mainstream view in the signal processing community considers the average-case performance rather than the worst-case performance, it was later shown that the average-case reconstruction performance can be further enhanced by optimizing the sensing matrix according to the aforementioned conditions, e.g., [8]–[12]. On the other hand, instead of using a fixed signal-sparsifying basis, e.g., a Discrete Wavelet Transform (DWT), one can further enhance CS performance by employing a basis which is learned from a training data set to abstract the basic atoms that compose the signal ensemble. The process of learning such a basis is referred to as "sparsifying dictionary learning" and it has been widely investigated in the literature [13]–[17]. In addition, by further exploiting the interaction between the sensing matrix and the sparsifying dictionary, joint optimization of the two has also been considered in [18]–[20].

However, in the process of sensing and reconstruction, the conventional CS framework considers vectorized signals, and multidimensional signals are mapped to a vector format in a CS system. At the sensing node, such a vectorization requires the hardware to be capable of simultaneously multiplexing along all data dimensions, which is hard to achieve, especially when one of the dimensions is along a timeline. Secondly, a real-world vectorized signal requires an enormous sensing matrix that has as many columns as the number of signal elements; consequently, such an approach imposes large demands on storage and processing power. In addition, the vectorization also results in a loss of structure along the various dimensions, the presence of which is beneficial for developing efficient reconstruction algorithms. For these reasons, applying conventional CS to applications that involve multidimensional signals is challenging.

Extending CS to multidimensional signals has attracted growing interest over the past few years. Most of the related work in the literature focuses on CS for 2D signals (i.e., matrices), e.g., matrix completion [21], [22], and the reconstruction of sparse and low rank matrices [23]–[25]. In [26], Kronecker product matrices are proposed for use in CS systems, which makes it possible to partition the sensing process along signal dimensions and paves the way to developing CS for tensors, i.e., signals with two or more dimensions.
Tensor CS (TCS) has been studied in [27]–[30], where the main focus is on algorithm development for reconstruction. To the best of our knowledge, there is no prior work concerning the enhancement of TCS via optimizing the sensing matrices at the various dimensions of a tensor. In addition, although dictionary learning techniques have been considered for tensors [31]–[33], it is still not clear how to conduct tensor dictionary learning so as to incorporate the influence of the sensing matrices in TCS.

In this paper, we investigate joint sensing matrix design and dictionary learning for TCS systems. Unlike the optimization for a conventional CS system, where a single sensing matrix and a single sparsifying basis for vectorized signals are obtained, we produce a multiplicity of them functioning along the various tensor dimensions, thereby maintaining the advantages of TCS. The contributions of this work are as follows:

• We are the first to consider the optimization of a multidimensional sensing matrix and dictionary for a TCS system, and a joint optimization of the two is designed, which also includes the particular cases of optimizing the sensing matrix for a given multilinear dictionary and learning the dictionary for a given multidimensional sensing matrix.

• We propose a separable approach for sensing matrix design by extending the existing work for conventional CS. In this approach, the optimization is proved to be separable, i.e., the sensing matrix along each dimension can be independently optimized, and the approach has a closed form solution.

• We put forth a non-separable method for sensing matrix design using a combination of state-of-the-art measures for sensing matrix optimization. This approach leads to the best reconstruction performance in our comparison, but it is iterative and hence needs more computing power to implement.

• We propose a multidimensional dictionary learning approach that couples the optimization of the multidimensional sensing matrix. This approach extends KSVD [14] and coupled-KSVD [18] to take full advantage of the multidimensional structure in tensors, with a reduced number of iterations required for the update of the dictionary atoms.

The proposed approaches are demonstrated to enhance the performance of existing TCS systems via the use of extensive simulations using both synthetic data and real images.

The remainder of this paper is organized as follows. Section II formulates CS and TCS, and introduces the related theory. Section III reviews the sensing matrix design approaches for CS and presents the proposed methods for TCS sensing matrix design. In Section IV, the related dictionary learning techniques are reviewed, followed by the elaboration of the proposed multidimensional dictionary learning approach, and the joint optimization algorithm is presented. Experimental results are given in Section V and Section VI concludes the paper.

A. Multilinear Algebra and Notations

Boldface lower-case letters, boldface upper-case letters and non-boldface letters denote vectors, matrices and scalars, respectively. A mode-n tensor is an n-dimensional array X ∈ R^(N1×...×Nn). The mode-i vectors of a tensor are determined by fixing every index except the one in mode i, and the slices of a tensor are its two-dimensional sections, determined by fixing all but two indices. By arranging all the mode-i vectors as columns of a matrix, the mode-i unfolding matrix X(i) ∈ R^(Ni×N1...Ni−1Ni+1...Nn) is obtained. The mode-k tensor by matrix product is defined as Z = X ×k A, where A ∈ R^(J×Nk) and Z ∈ R^(N1×...×Nk−1×J×Nk+1×...×Nn), and it is calculated by Z = fold_k(A X(k)), where fold_k(·) means folding up a matrix along mode k to a tensor. The matrix Kronecker product and the vector outer product are denoted by A ⊗ B and a ◦ b, respectively. The lp norm of a vector is defined as ||x||p = (Σi |xi|^p)^(1/p). For vectors, matrices and tensors, the l0 norm is given by the number of nonzero entries. IN denotes the N × N identity matrix. The operators (·)^−1, (·)^T and tr(·) represent the matrix inverse, the matrix transpose and the trace of a matrix, respectively. The number of elements of a vector, matrix or tensor is denoted by len(·).

II. COMPRESSIVE SENSING (CS) AND TENSOR COMPRESSIVE SENSING (TCS)

A. Sensing Model

Consider a multidimensional signal X ∈ R^(N1×...×Nn). Conventional CS takes measurements from its vectorized version via:

y = Φx + e,  (1)

where x ∈ R^N (N = Πi Ni) denotes the vectorized signal, Φ ∈ R^(M×N) (M < N) is the sensing matrix, y ∈ R^M represents the measurement vector and e ∈ R^M is a noise term. The vectorized signal is assumed to be sparse in some sparsifying basis Ψ ∈ R^(N×N̂) (N ≤ N̂), i.e.,

x = Ψs,  (2)

where s ∈ R^N̂ is the sparse representation of x and it has only K (K ≪ N̂) non-zero coefficients. Thus the sensing model can be rewritten as:

y = ΦΨs + e = As + e,  (3)

where A = ΦΨ ∈ R^(M×N̂) is the equivalent sensing matrix.

Even though CS has been successfully applied to practical sensing systems [34]–[36], this sensing model has a few drawbacks when it comes to tensor signals. First of all, the multidimensional structure present in the original signal X is lost due to the vectorization, which discards information that can lead to efficient reconstruction algorithms. Besides, as stated by (1), the sensing system is required to operate along all dimensions of the signal simultaneously, which is difficult to achieve in practice. Furthermore, the size of Φ associated with the vectorized signal becomes too large to be practical for applications involving multidimensional signals.

TCS tackles these problems by utilizing separable sensing operators along the tensor modes, and its sensing model is:

Y = X ×1 Φ1 ×2 Φ2 ... ×n Φn + E,  (4)

where Y ∈ R^(M1×...×Mn) represents the measurement, E ∈ R^(M1×...×Mn) denotes the noise term, and Φi ∈ R^(Mi×Ni) (i = 1, ..., n) are sensing matrices with Mi < Ni.
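As a concrete illustration of the notation and of the sensing model in (4), the following NumPy sketch implements the mode-k unfolding, folding and mode-k product defined above (0-based mode indices; the helper names and sizes are ours, not from the paper):

```python
import numpy as np

def unfold(X, k):
    # mode-k unfolding: the mode-k vectors become the columns of X_(k)
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def fold(M, k, shape):
    # inverse of unfold: rebuild the tensor with the given target shape
    full = [shape[k]] + [s for i, s in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape(full), 0, k)

def mode_prod(X, A, k):
    # Z = X x_k A, computed as fold_k(A @ X_(k))
    shape = list(X.shape)
    shape[k] = A.shape[0]
    return fold(A @ unfold(X, k), k, shape)

# toy TCS sensing model (4) with n = 2: Y = X x1 Phi1 x2 Phi2 (noise omitted)
rng = np.random.default_rng(0)
N1, N2, M1, M2 = 8, 8, 4, 4
X = rng.standard_normal((N1, N2))
Phi1 = rng.standard_normal((M1, N1))
Phi2 = rng.standard_normal((M2, N2))
Y = mode_prod(mode_prod(X, Phi1, 0), Phi2, 1)
# for 2-D signals the separable model coincides with Phi1 @ X @ Phi2.T
assert np.allclose(Y, Phi1 @ X @ Phi2.T)
```

For n = 2 the model reduces to Y = Φ1 X Φ2^T + E, which is what the final assertion checks.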
The multidimensional signal is assumed to be sparse in a separable sparsifying basis Ψi ∈ R^(Ni×N̂i) (i = 1, ..., n), i.e.,

X = S ×1 Ψ1 ×2 Ψ2 ... ×n Ψn,  (5)

where S ∈ R^(N̂1×...×N̂n) is the sparse representation that has only K (K ≪ Πi N̂i) non-zero coefficients. The equivalent sensing model can then be written as:

Y = S ×1 A1 ×2 A2 ... ×n An,  (6)

where Ai = ΦiΨi (i = 1, ..., n) are the equivalent sensing matrices.

Using the TCS sensing model in (4), the sensing procedure in (1) is partitioned into a few processes having smaller sensing matrices Φi ∈ R^(Mi×Ni) (i = 1, ..., n), and yet it maintains the multidimensional structure of the original signal X. It is also useful to mention that the TCS model in (6) is equivalent to:

y = (An ⊗ An−1 ⊗ ... ⊗ A1)s,  (7)

as derived in [29]. By denoting A = An ⊗ An−1 ⊗ ... ⊗ A1, it becomes a conventional CS model akin to (3), except that the sensing matrix in (7) has a multilinear structure.

B. Signal Reconstruction

In conventional CS, the problem of reconstructing s from the measurement vector y captured using (3) is modeled as an l0 minimization problem as follows:

min_s ||s||0,  s.t. ||y − As||2 ≤ ε,  (8)

where ε is a tolerance parameter. Many algorithms have been developed to solve this problem, including Basis Pursuit (BP) [1]–[3], [37], i.e., conducting convex optimization by relaxing the l0 norm in (8) to the l1 norm, and greedy algorithms such as Orthogonal Matching Pursuit (OMP) [38] and Iterative Hard Thresholding (IHT) [39].
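For reference, a bare-bones OMP iteration for (8) can be sketched as follows; it greedily selects up to K atoms and is only meant to illustrate the mechanics, not the tuned implementation of [38]:

```python
import numpy as np

def omp(A, y, K):
    """Greedy sketch for min ||s||_0 s.t. ||y - A s||_2 <= eps,
    run here for a fixed budget of K atom selections."""
    M, N = A.shape
    residual = y.copy()
    support = []
    s = np.zeros(N)
    for _ in range(K):
        # pick the column most correlated with the current residual
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares fit of y on the currently selected atoms
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    s[support] = coef
    return s
```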
The reconstruction performance of the l1 minimization approach has been studied in [7], [37], where the well-known Restricted Isometry Property (RIP) was introduced to provide a sufficient condition for successful signal recovery.

Definition 1: A matrix A satisfies the RIP of order K with the Restricted Isometry Constant (RIC) δK being the smallest number such that

(1 − δK)||s||2^2 ≤ ||As||2^2 ≤ (1 + δK)||s||2^2  (9)

holds for all s with ||s||0 ≤ K.

Theorem 1: Assume that δ2K < √2 − 1 and ||e||2 ≤ ε. Then the solution ŝ to (8) obeys

||ŝ − s||2 ≤ C0 K^(−1/2) ||s − sK||1 + C1 ε,  (10)

where C0 = (2 + (2√2 − 2)δ2K) / (1 − (√2 + 1)δ2K), C1 = (4√(1 + δ2K)) / (1 − (√2 + 1)δ2K), δ2K is the RIC of the matrix A, and sK is an approximation of s with all but the K largest entries set to zero.

The previous theorem states that, for the noiseless case, any sparse signal with fewer than K non-zero coefficients can be exactly recovered if the RIC of the equivalent sensing matrix satisfies δ2K < √2 − 1; while for the noisy case and the not exactly sparse case, the reconstructed signal is still a good approximation of the original signal under the same condition. The theoretical guarantees of successful reconstruction for the greedy approaches have also been investigated in [38], [39].

The RIP essentially measures the quality of the equivalent sensing matrix A, which closely relates to the design of Φ and Ψ. However, since the RIP is not tractable, another measure is often used for CS projection design, i.e., the mutual coherence of A [6], defined by:

µ(A) = max_{1≤i,j≤N̂, i≠j} |ai^T aj|,  (11)

where ai denotes the ith column of A. It has been shown that the reconstruction error of the l1 minimization problem is bounded if µ(A) < 1/(4K − 1). Based on the concept of mutual coherence, optimal projection design approaches are derived, e.g., in [8], [9], [18].

When it comes to TCS, the reconstruction approaches for CS can still be utilized owing to the relationship in (7). However, for the algorithms where explicit usage of A is required, e.g., OMP, the implementation is restricted by the large dimension of A. By extending the CS reconstruction approaches to utilize tensor-based operations, TCS reconstruction algorithms employing only the small matrices Ai (i = 1, ..., n) have been developed in [29], [30], [40], [41]. These methods maintain the theoretical guarantees of conventional CS when A obeys the condition on the RIC or the mutual coherence, but reduce the computational complexity and relax the storage memory requirement.

Even so, the conditions on A are not intuitive for a practical TCS system, which explicitly utilizes multiple separable sensing matrices Ai (i = 1, ..., n) instead of a single matrix A. Fortunately, the authors of [26] have derived the following relationships to clarify the corresponding conditions on Ai (i = 1, ..., n).

Theorem 2: Let Ai (i = 1, ..., n) be matrices with RICs δK(A1), ..., δK(An), respectively, and mutual coherences µ(A1), ..., µ(An). Then for the matrix A = An ⊗ An−1 ⊗ ... ⊗ A1, we have

µ(A) = Π_{i=1}^{n} µ(Ai),  (12)

δK(A) ≤ Π_{i=1}^{n} (1 + δK(Ai)) − 1.  (13)

In [26], these relationships are then utilized to derive the reconstruction error bounds for a TCS system.
III. OPTIMIZED MULTILINEAR PROJECTIONS FOR TCS

In this section, we show how to optimize the multilinear sensing matrix when the dictionaries Ψi (i = 1, ..., n) for each dimension are fixed. We first introduce the related design approaches for CS, and then present the proposed methods for TCS, including a separable and a non-separable design approach.

A. Sensing Matrix Design for CS

We observe that the sufficient conditions on the RIC or the mutual coherence for successful CS reconstruction, as reviewed in Section II-B, only describe the worst-case bound, which means that the average recovery performance is not reflected. In fact, the most challenging part of CS sensing matrix design lies in deriving a measure that can directly reveal the expected-case reconstruction accuracy.

In [8], Elad et al. proposed the notion of averaged mutual coherence, based on which an iterative algorithm is derived for optimal sensing matrix design. This approach aims to minimize the largest absolute values of the off-diagonal entries in the Gram matrix of A, i.e., GA = A^T A. It has been shown to outperform a random Gaussian sensing matrix in terms of reconstruction accuracy, but it is time-consuming to construct and can ruin the worst-case guarantees by inducing large off-diagonal values that are not in the original Gram matrix. In order to make any subset of columns in A as orthogonal as possible, Sapiro et al. proposed in [18] to make GA as close as possible to an identity matrix, i.e., Ψ^T Φ^T ΦΨ ≈ IN̂. This is then approximated by minimizing ||Λ − ΛΓ^T ΓΛ||F^2, where Λ comes from the eigen-decomposition of Ψ^T Ψ, i.e., Ψ^T Ψ = VΛV^T, and Γ = ΦV. This approach is also iterative, but outperforms Elad's method. Considering the fact that A has minimum coherence when the magnitudes of all the off-diagonal entries of GA are equal, Xu et al. proposed an Equiangular Tight Frame (ETF) based method in [9]. The problem is modeled as min_{Gt∈H} ||Ψ^T Φ^T ΦΨ − Gt||F^2, where Gt is a target Gram matrix drawn from the set H of relaxed ETF Gram matrices.

B. Multidimensional Sensing Matrix Design for TCS

In contrast to the aforementioned methods, we consider optimization of the sensing matrix for TCS. Compared to the design process in conventional CS, the main distinction for TCS is that we would like to optimize multiple separable sensing matrices Φi (i = 1, ..., n), rather than a single matrix Φ. In this section, in addition to extending the approaches in (14) and (15) to the TCS case, we also propose a new approach for TCS sensing matrix design by combining the state-of-the-art ideas in [10], [12], [20]. To simplify our exposition, we elaborate our methods in the following sections for the case of n = 2, i.e., the tensor signal becomes a matrix, but note that the methods can be straightforwardly extended to an n-mode tensor case (n > 2).

As reviewed in Section II-B, the performance of existing TCS reconstruction algorithms relies on the quality of A, where A = A2 ⊗ A1 when n = 2. Therefore, when the multilinear dictionary Ψ = Ψ2 ⊗ Ψ1 is given, one can optimize Φ (where Φ = Φ2 ⊗ Φ1) using the methods for CS as introduced in Section III-A.

However, when implementing a TCS system, it is still necessary to obtain the separable matrices, i.e., Φ1 and Φ2. One intuitive solution is to design Φ using the aforementioned approaches for CS and then to decompose Φ by solving the following problem:

min_{Φ1,Φ2} ||Φ − Φ2 ⊗ Φ1||F^2.  (16)
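The decomposition problem (16) admits a closed-form solution through the rearrangement technique associated with Van Loan [43]: after rearranging the blocks of Φ, a Kronecker product Φ2 ⊗ Φ1 becomes a rank-one matrix, so (16) reduces to a best rank-one approximation obtained from an SVD. A minimal sketch (the function name and sizes are ours):

```python
import numpy as np

def nearest_kron_factors(Phi, M1, N1, M2, N2):
    """Solve min ||Phi - kron(Phi2, Phi1)||_F, i.e., eq. (16), via the
    rearrangement of Van Loan [43] and a rank-one SVD truncation."""
    # block (i2, j2) of kron(Phi2, Phi1) equals Phi2[i2, j2] * Phi1, so
    # rearranging the blocks into rows turns the Kronecker product into
    # the rank-one matrix vec(Phi2) vec(Phi1)^T
    R = (Phi.reshape(M2, M1, N2, N1)
            .transpose(0, 2, 1, 3)
            .reshape(M2 * N2, M1 * N1))
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Phi2 = np.sqrt(s[0]) * U[:, 0].reshape(M2, N2)
    Phi1 = np.sqrt(s[0]) * Vt[0].reshape(M1, N1)
    return Phi1, Phi2

# sanity check: an exact Kronecker product is recovered exactly
rng = np.random.default_rng(3)
P1 = rng.standard_normal((4, 6))
P2 = rng.standard_normal((3, 5))
Phi1, Phi2 = nearest_kron_factors(np.kron(P2, P1), 4, 6, 3, 5)
assert np.allclose(np.kron(Phi2, Phi1), np.kron(P2, P1))
```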
1) A Separable Design Approach: Extending the measure of [18], as reviewed in Section III-A, to the two separable modes leads to the design problem

min_{Φ1,Φ2} ||I_{N̂1N̂2} − (Ψ2 ⊗ Ψ1)^T (Φ2 ⊗ Φ1)^T (Φ2 ⊗ Φ1)(Ψ2 ⊗ Ψ1)||F^2,  (17)

whose solution is characterized by the following theorem.

Theorem 3: Assume for i = 1, 2 that N̄i = rank(Ψi), that Ψi = UΨi [ΛΨi 0; 0 0] VΨi^T is an SVD of Ψi with ΛΨi ∈ R^(N̄i×N̄i), and let Φ̂i ∈ R^(Mi×Ni) (i = 1, 2) be matrices with rank(Φ̂i) = Mi, where Mi ≤ N̄i is assumed. Then:

• the following equation is a solution to (17):

Φ̂i = U [IMi 0] V^T [ΛΨi^(−1) 0; 0 0] UΨi^T,  (18)

where i = 1, 2, and U ∈ R^(Mi×Mi) and V ∈ R^(N̄i×N̄i) are arbitrary orthonormal matrices;

• the resulting equivalent sensing matrices Âi = Φ̂iΨi (i = 1, 2) are Parseval tight frames, i.e., ||Âi^T z||2 = ||z||2, where z ∈ R^Mi is an arbitrary vector;

• the minimum of (17) is N̂1N̂2 − M1M2;

• separately solving the sub-problems

min_{Φi} ||IN̂i − Ψi^T Φi^T ΦiΨi||F^2  (19)

for i = 1, 2 leads to the same solutions as (18), and the resulting objective in (17) attains the same minimum, i.e., N̂1N̂2 − M1M2.

Proof: The proof is given in Appendix A.

Algorithm 1 Design Approach I
Input: Ψi (i = 1, 2).
Output: Φ̂i (i = 1, 2).
1: for i = 1, 2 do
2:   Calculate the optimized Φ̂i using (18);
3: end
4: Normalization for i = 1, 2: Φ̂i = √Ni Φ̂i / ||Φ̂i||F.
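Since the orthonormal matrices U and V in (18) are arbitrary, taking both as identities gives a particularly simple realization of Algorithm 1. The sketch below assumes Mi ≤ rank(Ψi), as required by Theorem 3:

```python
import numpy as np

def design_approach_1(Psi, M):
    """Closed-form separable design (18) for one mode with U = I, V = I.
    Before the final normalization, A = Phi @ Psi is a Parseval tight
    frame (A A^T = I); assumes M <= rank(Psi)."""
    N = Psi.shape[0]
    U_psi, lam, _ = np.linalg.svd(Psi)          # Psi = U_psi diag(lam) V_psi^T
    Phi = (U_psi[:, :M] / lam[:M]).T             # [I_M 0] diag(1/lam) U_psi^T
    Phi *= np.sqrt(N) / np.linalg.norm(Phi)      # step 4 of Algorithm 1
    return Phi

rng = np.random.default_rng(4)
Psi = rng.standard_normal((8, 16))
Phi = design_approach_1(Psi, 4)
A = Phi @ Psi
# after the normalization, A A^T is a scaled identity: the rows of A
# remain orthogonal with equal norms
print(np.round(A @ A.T, 3))
```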
Clearly, Approach I is separable, which means that we can independently design each Φi according to the corresponding sparsifying dictionary Ψi in mode i. This observation stays consistent when we consider the situation in an alternative way. Applying the method in (14) to acquire the optimal Φ1 and Φ2 independently, we are actually trying to make any subset of columns in A1 and A2, respectively, as orthogonal as possible. As a result, the matrix A = A2 ⊗ A1 that is obtained will also be as orthogonal as possible. This follows from the fact that for any two columns of A, we have

|ap^T aq| = |[(a2)l ⊗ (a1)s]^T [(a2)c ⊗ (a1)d]| = |(a2)l^T (a2)c| · |(a1)s^T (a1)d|,  (20)

where a, a1 and a2 denote the columns of A, A1 and A2, respectively, and p, q, l, s, c, d are the column indices.

Using the second statement of Theorem 3, we can derive the following corollary.

Corollary 1: The solution in (18) also solves the following problems for i = 1, 2:

min_{Φi} ||Φi||F^2,  s.t. ΦiΨiΨi^T Φi^T = IMi,  (21)

which represent the separable sub-problems of the following design approach:

min_{Φ1,Φ2} ||Φ2 ⊗ Φ1||F^2,  (22)
s.t. (Φ2 ⊗ Φ1)(Ψ2 ⊗ Ψ1)(Ψ2^T ⊗ Ψ1^T)(Φ2^T ⊗ Φ1^T) = IM1M2,

and it is in fact a multidimensional extension of the CS sensing matrix design approach proposed in [10].

Proof: Since the equivalent sensing matrices designed using Approach I are Parseval tight frames, it follows from the derivation in [10] that the sub-problems in (21) have the same solution as in (18). The problem in (22) can be proved separable simply by revealing the fact that ||Φ2 ⊗ Φ1||F^2 = ||Φ2||F^2 ||Φ1||F^2, and when ΦiΨiΨi^T Φi^T = IMi is satisfied for both i = 1 and 2, the constraint in (22) is also satisfied.

By decomposing the original problems into independent sub-problems, the sensing matrices can be designed in parallel and the problem becomes easier to solve. However, the CS sensing matrix design approaches are not always separable after being extended to the multidimensional case, because a variety of different criteria can be used for sensing matrix design, as reviewed in Section III-A, and in many cases the decomposition is not provable. We therefore propose a non-separable approach in the following section.

2) A Non-separable Design Approach: Taking into account: i) the impact of the sensing cost on reconstruction performance [10]; ii) the benefit of making the equivalent sensing matrix have similar properties to those of the sparsifying dictionary [12]; and iii) the conventional requirement on the mutual coherence, we put forth the following Design Approach II:

min_{Φ1,Φ2} (1 − β)||Ψ^T Ψ − Ψ^T Φ^T ΦΨ||F^2 + α||Φ||F^2 + β||I_{N̂1N̂2} − Ψ^T Φ^T ΦΨ||F^2,  (23)

where Ψ = Ψ2 ⊗ Ψ1, Φ = Φ2 ⊗ Φ1, and α and β are tuning parameters. As investigated in [10] and [20], α ≥ 0 controls the sensing energy, while β ∈ [0, 1] balances the impact of the first and third terms so as to achieve optimal performance under different conditions of the measurement noise. The choice of these parameters will be investigated in Section V-A.

To solve (23), we adopt a coordinate descent method. Denoting the objective as f(Φ1, Φ2), we first compute its gradient with respect to Φ1 and Φ2, respectively, and the result is as follows:

∂f/∂Φi = 4||GAj||F^2 (Ai GAi Ψi^T) − 4β||Aj||F^2 (Ai Ψi^T) + 2α||Φj||F^2 Φi + 4(β − 1)||Ψj Aj^T||F^2 (Ai GΨi Ψi^T),  (24)

where i, j ∈ {1, 2} and j ≠ i, GAi = Ai^T Ai and GΨi = Ψi^T Ψi. For generality, the corresponding result for the n > 2 case can be derived in the same manner.
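The objective (23) and its gradient (24) translate directly into NumPy. The following sketch runs plain gradient-based coordinate descent with the constant step size η used later in Section V (sizes, seed and iteration count are illustrative):

```python
import numpy as np

def grad_f(Phi, Psi, i, alpha, beta):
    """Gradient (24) of the Approach II objective (23) w.r.t. Phi_i."""
    j = 1 - i
    A = [Phi[0] @ Psi[0], Phi[1] @ Psi[1]]
    GA = [a.T @ a for a in A]            # G_Ai = A_i^T A_i
    GPsi = [p.T @ p for p in Psi]        # G_Psi_i = Psi_i^T Psi_i
    return (4 * np.sum(GA[j] ** 2) * (A[i] @ GA[i] @ Psi[i].T)
            - 4 * beta * np.sum(A[j] ** 2) * (A[i] @ Psi[i].T)
            + 2 * alpha * np.sum(Phi[j] ** 2) * Phi[i]
            + 4 * (beta - 1) * np.sum((Psi[j] @ A[j].T) ** 2)
              * (A[i] @ GPsi[i] @ Psi[i].T))

# coordinate descent with a constant step size, as in Section V
rng = np.random.default_rng(5)
Psi = [rng.standard_normal((8, 12)) for _ in range(2)]
Phi = [rng.standard_normal((4, 8)) for _ in range(2)]
alpha, beta, eta = 1.0, 0.8, 1e-7      # low-noise settings from Section V-A
for _ in range(100):
    for i in range(2):
        Phi[i] = Phi[i] - eta * grad_f(Phi, Psi, i, alpha, beta)
```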
Till now, we have considered optimizing the multidimensional sensing matrix when the sparsifying dictionaries for each tensor mode are given. For the purpose of joint optimization, we now proceed to optimize the dictionaries while coupling fixed sensing matrices. The joint optimization will eventually be achieved by alternately optimizing the sensing matrices and the sparsifying dictionaries.

IV. JOINTLY LEARNING THE MULTIDIMENSIONAL DICTIONARY AND SENSING MATRIX

In this section, we first propose a sensing-matrix-coupled method for multidimensional sparsifying dictionary learning. Then it is combined with the previously introduced optimization approach for a multilinear sensing matrix to yield a joint optimization algorithm. In the spirit of the coupled KSVD method [18], our approach for dictionary learning can be viewed as a sensing-matrix-coupled version of a tensor KSVD algorithm. We start by briefly introducing the coupled KSVD method.

A. Coupled KSVD

The Coupled KSVD (cKSVD) [18] is a dictionary learning approach for vectorized signals. Let X = [x1 ... xT] be an N × T matrix containing a training sequence of T signals x1, ..., xT. The cKSVD aims to solve the following problem, i.e., to learn a dictionary Ψ ∈ R^(N×N̂) from X:

min_{Ψ,S} γ||X − ΨS||F^2 + ||Y − ΦΨS||F^2,  s.t. ∀i, ||si||0 ≤ K,  (27)

where si denotes the ith column of S, Y contains the CS measurements of the training signals taken with Φ, and γ > 0 trades the representation error off against the projection error. The dictionary and the sparse coefficients are then updated alternately in the KSVD fashion: for each atom p, the residual matrix is restricted to the training signals whose representations use that atom, giving R̃p, and the largest error is eliminated through a rank-one approximation (the per-atom steps mirror the tensor versions given in (37)–(40) below), where λ1R is the largest singular value of R̃p, and u1R and v1R are the corresponding left and right singular vectors. The updated column p of Ψ is obtained after normalization: ψ̂p = ψ̂p / ||ψ̂p||2. The above process is then iterated to update every atom of Ψ.

Clearly, the sensing matrix has been taken into account during the dictionary learning process, which has been shown to be beneficial for CS reconstruction performance [18]. In order to learn multidimensional separable dictionaries for high dimensional signals, and to achieve joint optimization of the multidimensional dictionary and sensing matrix, we will derive a coupled-KSVD algorithm for a tensor, i.e., cTKSVD, in the following section. Again, for simplicity we will describe the main flow for 2-D signals, i.e., n = 2.

B. The cTKSVD Approach

Consider a training sequence of 2-D signals X1, ..., XT; we obtain a tensor X ∈ R^(N1×N2×T) by stacking them along the third dimension. Denoting the stack of the sparse representations Si ∈ R^(N̂1×N̂2) (i = 1, ..., T) by S ∈ R^(N̂1×N̂2×T), we propose the following optimization problem to learn the multidimensional dictionary:

min_{Ψ1,Ψ2,S} ||Z − S ×1 D1 ×2 D2||F^2,  s.t. ∀i, ||Si||0 ≤ K,  (32)

in which Z ∈ R^((N1+M1)×(N2+M2)×T) is the block tensor

Z = [γ^2 X  γY2; γY1  Y],  Yi = X ×i Φi + Ei,  (33)

D1 = [γIN1; Φ1] Ψ1,  D2 = [γIN2; Φ2] Ψ2,  (34)

and γ > 0 is a tuning parameter.
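The stacked quantities in (33)–(34) can be sanity-checked numerically: in the noiseless case, Z = S ×1 D1 ×2 D2 holds exactly whenever X = S ×1 Ψ1 ×2 Ψ2. A sketch for a single training slice (matrix notation, since n = 2; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
N1, N2, Nh1, Nh2, M1, M2, gamma = 6, 7, 9, 10, 3, 4, 0.5
Psi1 = rng.standard_normal((N1, Nh1))
Psi2 = rng.standard_normal((N2, Nh2))
Phi1 = rng.standard_normal((M1, N1))
Phi2 = rng.standard_normal((M2, N2))
S = rng.standard_normal((Nh1, Nh2))

X = Psi1 @ S @ Psi2.T                   # eq. (5) for a 2-D signal
Y1, Y2 = Phi1 @ X, X @ Phi2.T           # partial projections X x_i Phi_i
Y = Phi1 @ X @ Phi2.T                   # full measurement
Z = np.block([[gamma**2 * X, gamma * Y2],
              [gamma * Y1,   Y]])       # eq. (33), noiseless
D1 = np.vstack([gamma * np.eye(N1), Phi1]) @ Psi1   # eq. (34)
D2 = np.vstack([gamma * np.eye(N2), Phi2]) @ Psi2
assert np.allclose(Z, D1 @ S @ D2.T)    # Z = S x1 D1 x2 D2
```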
The problem in (32) aims to minimize the representation error ||X − S ×1 Ψ1 ×2 Ψ2||F^2 and the overall projection error ||Y − S ×1 A1 ×2 A2||F^2 with constraints on the sparsity of each slice of the tensor. In addition, it also takes into account the projection errors induced by Φ1 and Φ2 individually.

Using an available sparse reconstruction algorithm for TCS, e.g., Tensor OMP (TOMP) [30], and initial dictionaries Ψ1, Ψ2, the sparse representation S can be estimated first. Then we update the multilinear dictionary alternately. We first update the atoms of Ψ1 with Ψ2 fixed. The objective in (32) is rewritten as:

Σ_{q2} ||Rp1 − (d1)p1 ◦ (d2)q2 ◦ s_{(p1−1)N̂2+q2}||F^2,  (35)

where Rp1 = Z − Σ_{q1≠p1} Σ_{q2} (d1)q1 ◦ (d2)q2 ◦ s_{(q1−1)N̂2+q2}; p1 is the index of the atom for the current update, and q1, q2 denote the indices of the remaining atoms of Ψ1 and of all the atoms of Ψ2, respectively; d1, d2 are columns of D1, D2; s is the mode-3 vector of S. Then, to satisfy the sparsity constraint in (32), we only keep the non-zero entries of s_{(p1−1)N̂2+q2} and the corresponding subset of Rp1 to obtain:

Σ_{q2} ||R̃p1 − (d1)p1 ◦ (d2)q2 ◦ s̃_{(p1−1)N̂2+q2}||F^2.  (36)

Assuming that, after carrying out a Higher Order SVD (HOSVD) [44] of R̃p1, the largest singular value is λ1R and the corresponding singular vectors are u1R, v1R and ω1R, we eliminate the largest error by:

(d̂1)p1 = u1R,  D2 S̃p1,:,: = v1R ◦ (λ1R ω1R),  (37)

where S̃p1,:,: denotes the horizontal slice of S at index p1 that contains only non-zero mode-2 vectors. The atom of Ψ1 is then calculated using the pseudo-inverse as:

(ψ̂1)p1 = (γ^2 IN1 + Φ1^T Φ1)^(−1) [γIN1  Φ1^T] u1R.  (38)

The current update is then obtained after normalization:

(ψ̂1)p1 = (ψ̂1)p1 / ||(ψ̂1)p1||2,  (39)

D2 S̃p1,:,: = ||(ψ̂1)p1||2 v1R ◦ (λ1R ω1R).  (40)

Since D2 and the support indices of each mode-2 vector in S̃p1,:,: are known, the updated coefficients S̃p1,:,: can be easily calculated via the Least Squares (LS) solution. The above process is repeated for all the atoms to update the dictionary Ψ1.
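The mode-1 atom update in (37)–(39) is compact in code. The sketch below assumes the support-restricted residual tensor R̃p1 has already been formed, approximates its leading HOSVD factors from the three unfoldings, and then applies the pseudo-inverse step (38) and the normalization (39) (helper names are ours):

```python
import numpy as np

def leading_vectors(R):
    """Leading HOSVD singular vectors of a 3-way residual tensor."""
    u = np.linalg.svd(R.reshape(R.shape[0], -1))[0][:, 0]
    v = np.linalg.svd(np.moveaxis(R, 1, 0).reshape(R.shape[1], -1))[0][:, 0]
    w = np.linalg.svd(np.moveaxis(R, 2, 0).reshape(R.shape[2], -1))[0][:, 0]
    lam = np.einsum("ijk,i,j,k->", R, u, v, w)   # leading core entry
    return lam, u, v, w

def update_atom(R, Phi1, gamma):
    """One Psi1 atom update: rank-one fit (37), pseudo-inverse (38)
    and normalization (39); R is the support-restricted residual."""
    N1 = Phi1.shape[1]
    lam, u, v, w = leading_vectors(R)
    lhs = gamma**2 * np.eye(N1) + Phi1.T @ Phi1
    psi = np.linalg.solve(lhs, np.hstack([gamma * np.eye(N1), Phi1.T]) @ u)
    return psi / np.linalg.norm(psi), lam, v, w

# toy usage: mode-1 size of R must be N1 + M1, matching D1 in (34)
rng = np.random.default_rng(8)
Phi1 = rng.standard_normal((4, 6))       # M1 = 4, N1 = 6
R = rng.standard_normal((10, 5, 6))      # (N1 + M1) x |support| x T'
psi, lam, v, w = update_atom(R, Phi1, gamma=0.5)
```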
The next step is to update Ψ2 with the obtained Ψ1 fixed. It follows a similar procedure to that described previously. Specifically, the objective in (32) is rewritten as:

Σ_{q1} ||R̃p2 − (d1)q1 ◦ (d2)p2 ◦ s̃_{(q1−1)N̂2+p2}||F^2,  (41)

where s̃ is the mode-3 vector with only non-zero entries, R̃p2 is the corresponding subset of Rp2, Rp2 = Z − Σ_{q1} Σ_{q2≠p2} (d1)q1 ◦ (d2)q2 ◦ s_{(q1−1)N̂2+q2}, and p2 is the index of the atom for the current update. A HOSVD is carried out for R̃p2, and the update steps corresponding to (37)–(40) now become:

(d̂2)p2 = v1R,  D1 S̃:,p2,: = u1R ◦ (λ1R ω1R),  (42)

(ψ̂2)p2 = (γ^2 IN2 + Φ2^T Φ2)^(−1) [γIN2  Φ2^T] v1R,  (43)

(ψ̂2)p2 = (ψ̂2)p2 / ||(ψ̂2)p2||2,  (44)

D1 S̃:,p2,: = ||(ψ̂2)p2||2 u1R ◦ (λ1R ω1R),  (45)

in which S̃:,p2,: represents the lateral slice at index p2, whose updated elements can also be calculated using LS. The dictionary Ψ2 is then updated iteratively. The whole process of updating S, Ψ1 and Ψ2 is repeated to obtain the final solution of (32).

The uncoupled version of the proposed cTKSVD method (denoted by TKSVD) can be easily obtained by modifying the problem in (32) to:

min_{Ψ1,Ψ2,S} ||X − S ×1 Ψ1 ×2 Ψ2||F^2,  s.t. ∀i, ||Si||0 ≤ K,  (46)

and it can be solved following the same procedures as described previously for cTKSVD, except that the pseudo-inverse and normalization steps are no longer needed.

The proposed cTKSVD for multidimensional dictionary learning is different to the KHOSVD method [32], i.e., another tensor-based dictionary learning approach obtained by extending the KSVD method. The learning process of KHOSVD follows the same train of thought as the conventional KSVD method, except that, to eliminate the largest error in each iteration, a HOSVD [44], i.e., an SVD for tensors, is employed. However, the process of KHOSVD does not take full advantage of the multilinear structure and involves duplicated updating of the atoms, which leads to a slow convergence speed. The proposed cTKSVD approach is distinct from KHOSVD in the following respects. First, during the update of each atom, a slice of the coefficient tensor is updated accordingly in cTKSVD, while only a vector is updated in KHOSVD. Therefore, in cTKSVD, each iteration of the outer loop contains N̂1 + N̂2 inner iterations, whereas it is N̂1N̂2 for KHOSVD (and for KSVD). This means that cTKSVD requires the HOSVD to be executed N̂1N̂2 − N̂1 − N̂2 fewer times than for the KHOSVD method and hence reduces the complexity. In addition, KHOSVD does not take into account the influence of the sensing matrix. The benefit of coupling the sensing matrices in cTKSVD will be shown by simulations in Section V-B.

Here, we also provide the problem formulation when one needs to learn 3-D sparsifying dictionaries; the cTKSVD for cases where n > 2 can be modeled following a similar strategy. For a training sequence consisting of T stacked 3-D signals X ∈ R^(N1×N2×N3×T), we learn the dictionaries by solving:

min_{Ψ1,Ψ2,Ψ3,S} ||Z − S ×1 D1 ×2 D2 ×3 D3||F^2,  s.t. ∀i, ||Si||0 ≤ K,  (47)
in which

Z = [γ^2 G1  γG2; γG3  G4],  D1 = [γIN1; Φ1] Ψ1,  (48)

D2 = [γIN2; Φ2] Ψ2,  D3 = [γIN3; Φ3] Ψ3.  (49)

The problem can then be solved following similar steps to those introduced earlier in this section.

We have now derived the method for learning the sparsifying dictionaries when the multilinear sensing matrix is fixed. Combining this approach with the methods for optimizing the sensing matrices elaborated in Section III-B, we can then jointly optimize Φ1, Φ2 and Ψ1, Ψ2 by alternating between them. The overall procedure is summarized in Algorithm 3.

Algorithm 3 Joint Optimization
Input: Ψi^(0) (i = 1, 2), Φi^(0) (i = 1, 2), X, α, β, η, γ, iter = 0.
Output: Φ̂i (i = 1, 2), Ψ̂i (i = 1, 2).
1: Repeat until convergence:
2:   For Ψ̂i^(iter) (i = 1, 2) fixed, optimize Φ̂i^(iter+1) (i = 1, 2) using one of the approaches given in Section III-B;
3:   For Ψ̂i^(iter), Φ̂i^(iter+1) (i = 1, 2) fixed, solve (32) using TOMP to obtain Ŝ;
4:   For p1 = 1 to N̂1
5:     Compute R̃p1 using (32)–(35);
6:     Do a HOSVD of R̃p1 to obtain λ1R, u1R, v1R and ω1R;
7:     Update (ψ̂1)p1^(iter+1) and D2 S̃p1,:,: using (38)–(40), and calculate S̃p1,:,: by LS;
8:   end
9:   For p2 = 1 to N̂2
10:    Compute R̃p2 using (32) and (41);
11:    Do a HOSVD of R̃p2 to obtain λ1R, u1R, v1R and ω1R;
12:    Update (ψ̂2)p2^(iter+1) and D1 S̃:,p2,: using (43)–(45), and calculate S̃:,p2,: by LS;
13:  end
14:  iter = iter + 1;

V. EXPERIMENTAL RESULTS

In this section, we evaluate the proposed approaches via simulations using both synthetic data and real images. We first test the sensing matrix design approaches proposed in Section III-B with the sparsifying dictionaries being given. Then the cTKSVD approach is evaluated when the sensing matrices are fixed. Finally, the experiments for the joint optimization of the two are presented.

A. Optimal Multidimensional Sensing Matrix

This section is intended to examine the proposed separable Approach I and non-separable Approach II for multidimensional sensing matrix design. Before doing so, we first test the tuning parameters for Approach II, i.e., the non-separable design approach presented in Section III-B-2. As detailed in Section III-B-1, Approach I has a closed form solution and there are no tuning parameters involved.

We evaluate the Mean Squared Error (MSE) performance of different sensing matrices generated using Approach II with various parameters, and the results are reported by averaging over 500 trials. A random 2D signal S ∈ R^(64×64) with sparsity K = 80 is generated, where the randomly placed non-zero elements follow an i.i.d. zero-mean unit-variance Gaussian distribution. Both the dictionaries Ψi ∈ R^(64×256) (i = 1, 2) and the initial sensing matrices Φi ∈ R^(40×64) (i = 1, 2) are generated randomly with i.i.d. zero-mean unit-variance Gaussian distributions; the dictionaries are then column normalized, while the sensing matrices are normalized by Φi = √64 Φi / ||Φi||F. When taking measurements, random additive Gaussian noise with variance σ^2 is induced. A constant step size η = 1e−7 is used for Approach II, and the BP solver SPGL1 [45] is employed for the reconstructions.

Fig. 1: MSE performance of sensing matrices generated by Approach II with different values of α and β. (a) σ^2 = 0, α = 1; (b) σ^2 = 0, β = 0.8; (c) σ^2 = 10−2, α = 1; (d) σ^2 = 10−2, β = 0.2.

Fig. 1 illustrates the results of the parameter tests. In Fig. 1 (a) and (c), the parameter β is evaluated for the noiseless (σ^2 = 0) and high noise (σ^2 = 10−2) cases, respectively, when α = 1. From both (a) and (c), we can see that when β = 0 or 1, the MSE is larger than that for the other values, which means that both terms of Approach II that are controlled by β are essential for obtaining optimal sensing matrices. In addition, we can see that when β becomes larger in the range [0.1, 0.9], the MSE decreases slightly in (a), but increases slightly in (c). This indicates the choice of β under different conditions of sensing noise, which is consistent with that observed in [20]. Thus, in the remaining experiments, we take β = 0.8 when the sensing noise is low and β = 0.2 when the noise is high. Fig. 1 (b) and (d) demonstrate the MSE results for the tests of parameter α. It is observed that α = 1 is optimal for the noiseless case, while it becomes α = 0.6 when high noise exists. Therefore a larger α is preferred when low noise is involved, and it needs to be reduced accordingly when the noise becomes higher.

We then proceed to examine the performance of both the proposed approaches. As this is the first work to optimize the multidimensional sensing matrix, we take the i.i.d. Gaussian sensing matrices that are commonly used in CS problems for comparison. Besides, since Sapiro's approach [18] has the same spirit as that of Approach I (as reviewed in Section III-A), it can be easily extended to the multidimensional case, i.e., individually generating Φi (i = 1, 2) using the approach in [18]. We hence also include it in the comparisons and denote it by Separable Sapiro's approach (SS). The previously described synthetic data is generated for the experiments, and both BP and OMP are investigated for the reconstruction.

Fig. 2: MSE performance of different sensing matrices for (a) BP, (b) OMP, when Mi (i = 1, 2) varies. (K = 80, N1 = N2 = 64, N̂1 = N̂2 = 256 and σ^2 = 10−4)

Fig. 3: MSE performance of different sensing matrices for (a) BP, (b) OMP, when K varies. (M1 = M2 = 40, N1 = N2 = 64, N̂1 = N̂2 = 256 and σ^2 = 10−4)

Different sensing matrices are first evaluated using BP and OMP when the number of measurements varies. A small amount of noise (σ^2 = 10−4) is added when taking measurements, and the parameters are chosen as α = 1, β = 0.8. From Fig. 2, it can be observed that both the proposed approaches perform much better than the Gaussian sensing matrices, among which Approach II has the better performance. In general, the SS method performs worse than Approach I, although the difference is not obvious at some points. Note that SS is an iterative method while Approach I is non-iterative.

The proposed approaches are again observed to be superior to the other methods when the number of measurements is fixed but the signal sparsity K is varied, as shown in Fig. 3. Compared to Approach I, Approach II exhibits better performance.
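For concreteness, the synthetic setup described in this section can be generated in a few lines (a sketch of the data generation only; the reconstruction solvers themselves are omitted):

```python
import numpy as np

# synthetic data generation following Section V-A (illustrative sketch)
rng = np.random.default_rng(7)
N, Nh, M, K = 64, 256, 40, 80
Psi = [rng.standard_normal((N, Nh)) for _ in range(2)]
Psi = [P / np.linalg.norm(P, axis=0) for P in Psi]        # column-normalize
Phi = [rng.standard_normal((M, N)) for _ in range(2)]
Phi = [np.sqrt(N) * P / np.linalg.norm(P) for P in Phi]   # Phi_i = sqrt(64) Phi_i / ||Phi_i||_F
S = np.zeros(Nh * Nh)
support = rng.choice(Nh * Nh, K, replace=False)           # K randomly placed entries
S[support] = rng.standard_normal(K)                       # i.i.d. N(0, 1) coefficients
S = S.reshape(Nh, Nh)
sigma2 = 1e-4
Y = (Phi[0] @ Psi[0]) @ S @ (Phi[1] @ Psi[1]).T           # measurements, eq. (6)
Y += np.sqrt(sigma2) * rng.standard_normal(Y.shape)       # additive Gaussian noise
```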
Fig. 4: Convergence behavior of cTKSVD with different values of γ compared to that of cKSVD with its optimal parameter setting when (a) M1 = M2 = 7; (b) M1 = M2 = 3.

Fig. 5: MSE performance of different dictionaries when (a) T varies (σ^2 = 0), (b) σ^2 varies (T = 5000). (K = 4, M1 = M2 = 7, N1 = N2 = 10, N̂1 = N̂2 = 18)

Fig. 6: Convergence behavior of various joint optimization methods. (T = 5000, K = 4, M1 = M2 = 6, N1 = N2 = 8, N̂1 = N̂2 = 16, σ^2 = 0)

From Fig. 4, it can be seen that cTKSVD exhibits stable convergence.

VI. CONCLUSIONS

In this paper, we propose to jointly optimize the multidimensional sensing matrix and dictionary for TCS systems. To obtain the optimized sensing matrices, a separable approach with closed form solutions has been presented, and a joint iterative approach with novel design measures has also been proposed. The iterative approach certainly has higher complexity, but also exhibits better performance. An approach to learning the multidimensional dictionary has been designed, which explicitly takes the multidimensional structure into account and removes the redundant updates present in existing multilinear approaches.

APPENDIX A

Denote Σ = [ΛA2^2 0; 0 0] ⊗ [ΛA1^2 0; 0 0] = diag(νA2 ⊗ νA1), with νAi = diag([ΛAi^2 0; 0 0]); then we have

||I_{N̂1N̂2} − (VA2 ⊗ VA1) Σ (VA2^T ⊗ VA1^T)||F^2.  (51)

Let νAi = [(vi)1, ..., (vi)Mi, 0]^T; then the sub-vector of the diagonal of Σ containing its non-zero values is ν̂ = [(v2)1(v1)1, ..., (v2)1(v1)M1, ..., (v2)M2(v1)1, ..., (v2)M2(v1)M1]^T.
REFERENCES

[23] M. Golbabaee and P. Vandergheynst, "Compressed sensing of simultaneous low-rank and joint-sparse matrices," arXiv, 2012.
[24] R. Chartrand, "Nonconvex splitting for regularized low-rank + sparse decomposition," IEEE Trans. Signal Process., vol. 60, no. 11, pp. 5810–5819, 2012.
[25] R. Otazo, E. Candès, and D. K. Sodickson, "Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components," Mag. Res. in Medicine, vol. 73, no. 3, pp. 1125–1136, 2015.
[26] M. F. Duarte and R. G. Baraniuk, "Kronecker compressive sensing," IEEE Trans. Image Process., vol. 21, no. 2, pp. 494–504, Feb. 2012.
[27] N. D. Sidiropoulos and A. Kyrillidis, "Multi-way compressed sensing for sparse low-rank tensors," IEEE Signal Process. Letters, vol. 19, no. 11, pp. 757–760, 2012.
[28] S. Friedland, Q. Li, and D. Schonfeld, "Compressive sensing of sparse tensors," IEEE Trans. Image Process., vol. 23, no. 10, pp. 4438–4447, Oct. 2014.
[29] C. F. Caiafa and A. Cichocki, "Multidimensional compressed sensing and their applications," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 3, no. 6, pp. 355–380, 2013.
[30] C. F. Caiafa and A. Cichocki, "Computing sparse representations of multidimensional signals using Kronecker bases," Neural Comput., vol. 25, no. 1, pp. 186–220, Jan. 2013.
[31] M. Seibert, J. Wormann, R. Gribonval, and M. Kleinsteuber, "Separable cosparse analysis operator learning," in Proc. EUSIPCO, 2014, pp. 770–774.
[32] F. Roemer, G. D. Galdo, and M. Haardt, "Tensor-based algorithms for learning multidimensional separable dictionaries," in Proc. IEEE ICASSP, 2014, pp. 3963–3967.
[33] Y. Peng, D. Meng, Z. Xu, C. Gao, Y. Yang, and B. Zhang, "Decomposable nonlocal tensor dictionary learning for multispectral image denoising," in Proc. IEEE CVPR, 2014, pp. 2949–2956.
[34] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, Ting Sun, K. F. Kelly, and R. G. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Process. Magazine, vol. 25, no. 2, pp. 83–91, March 2008.
[35] R. F. Marcia, Z. T. Harmany, and R. M. Willett, "Compressive coded aperture imaging," SPIE 7246, Comput. Imag. VII, p. 72460G, 2009.
[36] V. Majidzadeh, L. Jacques, A. Schmid, P. Vandergheynst, and Y. Leblebici, "A (256x256) pixel 76.7mW CMOS imager/compressor based on real-time in-pixel compressive sensing," in Proc. IEEE ISCAS, June 2010, pp. 2956–2959.
[37] E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Information Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[38] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Information Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[39] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Applied and Computational Harmonic Analysis, vol. 27, no. 3, pp. 265–274, 2009.
[40] Y. Rivenson and A. Stern, "Compressed imaging with a separable sensing operator," IEEE Signal Process. Letters, vol. 16, no. 6, pp. 449–452, June 2009.
[41] Y. Rivenson and A. Stern, "Practical compressive sensing of large images," in Proc. 16th Int. Conf. Digital Signal Processing, July 2009, pp. 1–8.
[42] W. Chen, M. R. D. Rodrigues, and I. J. Wassell, "On the use of unit-norm tight frames to improve the average MSE performance in compressive sensing applications," IEEE Signal Process. Letters, vol. 19, no. 1, pp. 8–11, 2012.
[43] C. F. Van Loan, "The ubiquitous Kronecker product," J. Comput. Applied Mathematics, vol. 123, no. 1, pp. 85–100, 2000.
[44] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Analysis Applications, vol. 21, no. 4, pp. 1253–1278, 2000.
[45] E. V. D. Berg and M. P. Friedlander, "Probing the pareto frontier for basis pursuit solutions," SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.
[46] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. IEEE ICCV, 2001, vol. 2, pp. 416–423.