Joint Sensing Matrix and Sparsifying Dictionary Optimization for Tensor Compressive Sensing

Xin Ding, Student Member, IEEE, Wei Chen, Member, IEEE, and Ian J. Wassell

arXiv:1601.07804v1 [cs.LG] 28 Jan 2016

Xin Ding and Ian J. Wassell are with the Computer Lab, University of Cambridge, UK (e-mail: xd225, [email protected]). Wei Chen is with the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, China, and also with the Computer Lab, University of Cambridge, UK (e-mail: [email protected]).

Abstract—Tensor Compressive Sensing (TCS) is a multidimensional framework of Compressive Sensing (CS), and it is advantageous in terms of reducing the amount of storage, easing hardware implementations and preserving the multidimensional structures of signals in comparison to a conventional CS system. In a TCS system, instead of using a random sensing matrix and a predefined dictionary, the average-case performance can be further improved by employing an optimized multidimensional sensing matrix and a learned multilinear sparsifying dictionary. In this paper, we propose a joint optimization approach for the sensing matrix and dictionary of a TCS system. For the sensing matrix design in TCS, an extended separable approach with a closed form solution and a novel iterative non-separable method are proposed for the case when the multilinear dictionary is fixed. In addition, a multidimensional dictionary learning method that takes advantage of the multidimensional structure is derived, and the influence of the sensing matrices is taken into account in the learning process. A joint optimization is achieved by alternately iterating the optimization of the sensing matrix and the dictionary. Numerical experiments using both synthetic data and real images demonstrate the superiority of the proposed approaches.

Keywords—Multidimensional system, compressive sensing, tensor compressive sensing, dictionary learning, sensing matrix optimization.

I. INTRODUCTION

The traditional signal acquisition-and-compression paradigm removes signal redundancy and preserves the essential content of signals to achieve savings in storage and transmission, where the minimum sampling ratio at the signal sampling stage is dictated by the Shannon-Nyquist Theorem. This wasteful process of sensing-then-compressing is replaced by directly acquiring a compressed version of the signal in Compressive Sensing (CS) [1]–[3], a sampling paradigm that leverages the fact that most signals have sparse representations (i.e., only a few non-zero coefficients) in some suitable basis. Successful reconstruction of such signals is guaranteed from a sufficient number of randomly taken samples that are far fewer than the number required by the Shannon-Nyquist Theorem. CS is therefore very attractive for applications such as medical imaging and wireless sensor networks where data acquisition is expensive [4], [5].

Achieving successful CS reconstruction has been characterized by a number of properties, e.g., the Restricted Isometry Property (RIP) [1], the mutual coherence [6] and the null space property [2]. These properties have been used to provide sufficient conditions on sensing matrices and to quantify the worst-case reconstruction performance [2], [6], [7]. Random matrices such as Gaussian or Bernoulli matrices have been shown to fulfill these conditions, and hence are widely used as the sensing matrix in CS applications. Since the signal processing community is mainly concerned with average-case rather than worst-case performance, it was later shown that the average-case reconstruction performance can be further enhanced by optimizing the sensing matrix according to the aforementioned conditions, e.g., [8]–[12]. On the other hand, instead of using a fixed signal-sparsifying basis, e.g., a Discrete Wavelet Transform (DWT), one can further enhance CS performance by employing a basis that is learned from a training data set so as to abstract the basic atoms that compose the signal ensemble. The process of learning such a basis is referred to as "sparsifying dictionary learning" and it has been widely investigated in the literature [13]–[17]. In addition, by further exploiting the interaction between the sensing matrix and the sparsifying dictionary, joint optimization of the two has also been considered in [18]–[20].

However, in the process of sensing and reconstruction, the conventional CS framework considers vectorized signals, and multidimensional signals are mapped to a vector format in a CS system. At the sensing node, such a vectorization requires the hardware to be capable of simultaneously multiplexing along all data dimensions, which is hard to achieve, especially when one of the dimensions is along a timeline. Secondly, a real-world vectorized signal requires an enormous sensing matrix that has as many columns as the number of signal elements; consequently, such an approach imposes large demands on storage and processing power. In addition, the vectorization also results in a loss of structure along the various dimensions, the presence of which is beneficial for developing efficient reconstruction algorithms. For these reasons, applying conventional CS to applications that involve multidimensional signals is challenging.

Extending CS to multidimensional signals has attracted growing interest over the past few years. Most of the related work in the literature focuses on CS for 2D signals (i.e., matrices), e.g., matrix completion [21], [22], and the reconstruction of sparse and low rank matrices [23]–[25]. In [26], Kronecker product matrices are proposed for use in CS systems, which makes it possible to partition the sensing process along the signal dimensions and paves the way to developing CS for tensors, i.e., signals with two or more dimensions. Tensor CS (TCS) has been studied in [27]–[30], where the main focus is on algorithm development for reconstruction.

To the best of our knowledge, there is no prior work concerning the enhancement of TCS via optimization of the sensing matrices for the various dimensions of a tensor. In addition, although dictionary learning techniques have been considered for tensors [31]–[33], it is still not clear how to conduct tensor dictionary learning so as to incorporate the influence of the sensing matrices in TCS.

In this paper, we investigate joint sensing matrix design and dictionary learning for TCS systems. Unlike the optimization for a conventional CS system, where a single sensing matrix and a single sparsifying basis for vectorized signals are obtained, we produce a multiplicity of them functioning along the various tensor dimensions, thereby maintaining the advantages of TCS. The contributions of this work are as follows:

• We are the first to consider the optimization of a multidimensional sensing matrix and dictionary for a TCS system, and a joint optimization of the two is designed, which also includes the particular cases of optimizing the sensing matrix for a given multilinear dictionary and learning the dictionary for a given multidimensional sensing matrix.
• We propose a separable approach for sensing matrix design by extending existing work for conventional CS. In this approach, the optimization is proved to be separable, i.e., the sensing matrix along each dimension can be independently optimized, and the approach has a closed form solution.
• We put forth a non-separable method for sensing matrix design using a combination of state-of-the-art measures for sensing matrix optimization. This approach leads to the best reconstruction performance in our comparison, but it is iterative and hence needs more computing power to implement.
• We propose a multidimensional dictionary learning approach that couples the optimization of the multidimensional sensing matrix. This approach extends KSVD [14] and coupled-KSVD [18] to take full advantage of the multidimensional structure in tensors, with a reduced number of iterations required for the update of the dictionary atoms.

The proposed approaches are demonstrated to enhance the performance of existing TCS systems via extensive simulations using both synthetic data and real images.

The remainder of this paper is organized as follows. Section II formulates CS and TCS, and introduces the related theory. Section III reviews the sensing matrix design approaches for CS and presents the proposed methods for TCS sensing matrix design. In Section IV, the related dictionary learning techniques are reviewed, followed by the elaboration of the proposed multidimensional dictionary learning approach; the joint optimization algorithm is then presented. Experimental results are given in Section V, and Section VI concludes the paper.

A. Multilinear Algebra and Notations

Boldface lower-case letters, boldface upper-case letters and non-boldface letters denote vectors, matrices and scalars, respectively. A mode-n tensor is an n-dimensional array X ∈ R^{N1×...×Nn}. The mode-i vectors of a tensor are obtained by fixing every index except the one in mode i, and the slices of a tensor are its two-dimensional sections, obtained by fixing all but two indices. By arranging all the mode-i vectors as the columns of a matrix, the mode-i unfolding matrix X(i) ∈ R^{Ni×(N1...Ni−1 Ni+1...Nn)} is obtained. The mode-k tensor-by-matrix product is defined as Z = X ×k A, where A ∈ R^{J×Nk} and Z ∈ R^{N1×...×Nk−1×J×Nk+1×...×Nn}; it is calculated as Z = foldk(A X(k)), where foldk(·) means folding up a matrix along mode k into a tensor. The matrix Kronecker product and the vector outer product are denoted by A ⊗ B and a ◦ b, respectively. The lp norm of a vector is defined as ||x||p = (Σi |xi|^p)^{1/p}. For vectors, matrices and tensors, the l0 norm is given by the number of non-zero entries. IN denotes the N × N identity matrix. The operators (·)^{−1}, (·)^T and tr(·) represent the matrix inverse, the matrix transpose and the trace of a matrix, respectively. The number of elements of a vector, matrix or tensor is denoted by len(·).

II. COMPRESSIVE SENSING (CS) AND TENSOR COMPRESSIVE SENSING (TCS)

A. Sensing Model

Consider a multidimensional signal X ∈ R^{N1×...×Nn}. Conventional CS takes measurements from its vectorized version via:

y = Φx + e,  (1)

where x ∈ R^N (N = Πi Ni) denotes the vectorized signal, Φ ∈ R^{M×N} (M < N) is the sensing matrix, y ∈ R^M represents the measurement vector and e ∈ R^M is a noise term. The vectorized signal is assumed to be sparse in some sparsifying basis Ψ ∈ R^{N×N̂} (N ≤ N̂), i.e.,

x = Ψs,  (2)

where s ∈ R^{N̂} is the sparse representation of x and has only K (K ≪ N̂) non-zero coefficients. Thus the sensing model can be rewritten as:

y = ΦΨs + e = As + e,  (3)

where A = ΦΨ ∈ R^{M×N̂} is the equivalent sensing matrix.

Even though CS has been successfully applied in practical sensing systems [34]–[36], this sensing model has a few drawbacks when it comes to tensor signals. First of all, the multidimensional structure present in the original signal X is lost in the vectorization, discarding information that could lead to efficient reconstruction algorithms. Besides, as stated by (1), the sensing system is required to operate along all dimensions of the signal simultaneously, which is difficult to achieve in practice. Furthermore, the size of Φ associated with the vectorized signal becomes too large to be practical for applications involving multidimensional signals.

TCS tackles these problems by utilizing separable sensing operators along the tensor modes, and its sensing model is:

Y = X ×1 Φ1 ×2 Φ2 ... ×n Φn + E,  (4)

where Y ∈ R^{M1×...×Mn} represents the measurement, E ∈ R^{M1×...×Mn} denotes the noise term, and Φi ∈ R^{Mi×Ni} (i = 1, ..., n) are the sensing matrices with Mi < Ni.
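The notation above and the separable model (4) are easy to make concrete in code. Below is a minimal NumPy sketch (our own, with hypothetical function names and 0-based mode indices) of the mode-k unfolding, the mode-k product, and the n = 2 case of (4), for which X ×1 Φ1 ×2 Φ2 reduces to Φ1 X Φ2^T; the noise term is omitted.

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding: the mode-k vectors become the columns of a matrix."""
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def fold(M, k, shape):
    """Inverse of unfold: fold a matrix back into a tensor of the given shape."""
    full = [shape[k]] + [s for i, s in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape(full), 0, k)

def mode_prod(X, A, k):
    """Mode-k product Z = fold_k(A @ X_(k)); dimension k changes from N_k to A.shape[0]."""
    shape = list(X.shape)
    shape[k] = A.shape[0]
    return fold(A @ unfold(X, k), k, tuple(shape))

# Separable sensing, model (4), noiseless, for n = 2 (modes 0 and 1 in code).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
Phi1, Phi2 = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
Y = mode_prod(mode_prod(X, Phi1, 0), Phi2, 1)   # Y = X ×1 Φ1 ×2 Φ2
assert np.allclose(Y, Phi1 @ X @ Phi2.T)        # matrix case: Φ1 X Φ2^T
```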


The multidimensional signal is assumed to be sparse in a separable sparsifying basis Ψi ∈ R^{Ni×N̂i} (i = 1, ..., n), i.e.,

X = S ×1 Ψ1 ×2 Ψ2 ... ×n Ψn,  (5)

where S ∈ R^{N̂1×...×N̂n} is the sparse representation, which has only K (K ≪ Πi N̂i) non-zero coefficients. The equivalent sensing model can then be written as:

Y = S ×1 A1 ×2 A2 ... ×n An,  (6)

where Ai = ΦiΨi (i = 1, ..., n) are the equivalent sensing matrices.

Using the TCS sensing model in (4), the sensing procedure in (1) is partitioned into a few processes employing smaller sensing matrices Φi ∈ R^{Mi×Ni} (i = 1, ..., n), and yet it maintains the multidimensional structure of the original signal X. It is also useful to mention that the TCS model in (6) is equivalent to:

y = (An ⊗ An−1 ⊗ ... ⊗ A1)s,  (7)

as derived in [29]. By denoting A = An ⊗ An−1 ⊗ ... ⊗ A1, it becomes a conventional CS model akin to (3), except that the sensing matrix in (7) has a multilinear structure.

B. Signal Reconstruction

In conventional CS, the problem of reconstructing s from the measurement vector y captured using (3) is modeled as an l0 minimization problem as follows:

min_s ||s||0,  s.t. ||y − As||2 ≤ ε,  (8)

where ε is a tolerance parameter. Many algorithms have been developed to solve this problem, including Basis Pursuit (BP) [1]–[3], [37], i.e., conducting convex optimization after relaxing the l0 norm in (8) to the l1 norm, and greedy algorithms such as Orthogonal Matching Pursuit (OMP) [38] and Iterative Hard Thresholding (IHT) [39]. The reconstruction performance of the l1 minimization approach has been studied in [7], [37], where the well known Restricted Isometry Property (RIP) was introduced to provide a sufficient condition for successful signal recovery.

Definition 1: A matrix A satisfies the RIP of order K with the Restricted Isometry Constant (RIC) δK being the smallest number such that

(1 − δK)||s||2² ≤ ||As||2² ≤ (1 + δK)||s||2²  (9)

holds for all s with ||s||0 ≤ K.

Theorem 1: Assume that δ2K < √2 − 1 and ||e||2 ≤ ε. Then the solution ŝ to (8) obeys

||ŝ − s||2 ≤ C0 K^{−1/2} ||s − sK||1 + C1 ε,  (10)

where C0 = (2 + (2√2 − 2)δ2K)/(1 − (√2 + 1)δ2K), C1 = 4√(1 + δ2K)/(1 − (√2 + 1)δ2K), δ2K is the RIC of the matrix A, and sK is an approximation of s with all but the K largest entries set to zero. □

The previous theorem states that for the noiseless case, any sparse signal with fewer than K non-zero coefficients can be exactly recovered if the RIC of the equivalent sensing matrix satisfies δ2K < √2 − 1, while for the noisy case and the not exactly sparse case, the reconstructed signal is still a good approximation of the original signal under the same condition. The theoretical guarantees of successful reconstruction for the greedy approaches have also been investigated in [38], [39].

The RIP essentially measures the quality of the equivalent sensing matrix A, which closely relates to the design of Φ and Ψ. However, since the RIP is not tractable, another measure is often used for CS projection design, i.e., the mutual coherence of A [6], defined by:

µ(A) = max_{1≤i,j≤N̂, i≠j} |ai^T aj|,  (11)

where ai denotes the ith column of A. It has been shown that the reconstruction error of the l1 minimization problem is bounded if µ(A) < 1/(4K − 1). Based on the concept of mutual coherence, optimal projection design approaches have been derived, e.g., in [8], [9], [18].

When it comes to TCS, the reconstruction approaches for CS can still be utilized owing to the relationship in (7). However, for the algorithms where explicit usage of A is required, e.g., OMP, the implementation is restricted by the large dimension of A. By extending the CS reconstruction approaches to utilize tensor-based operations, TCS reconstruction algorithms employing only the small matrices Ai (i = 1, ..., n) have been developed in [29], [30], [40], [41]. These methods maintain the theoretical guarantees of conventional CS when A obeys the condition on the RIC or the mutual coherence, but they reduce the computational complexity and relax the storage requirement.

Even so, the conditions on A are not intuitive for a practical TCS system, which explicitly utilizes multiple separable sensing matrices Ai (i = 1, ..., n) instead of a single matrix A. Fortunately, the authors of [26] have derived the following relationships to clarify the corresponding conditions on Ai (i = 1, ..., n).

Theorem 2: Let Ai (i = 1, ..., n) be matrices with RICs δK(A1), ..., δK(An) and mutual coherences µ(A1), ..., µ(An), respectively. Then for the matrix A = An ⊗ An−1 ⊗ ... ⊗ A1, we have

µ(A) = Π_{i=1}^{n} µ(Ai),  (12)

δK(A) ≤ Π_{i=1}^{n} (1 + δK(Ai)) − 1.  (13)

□

In [26], these relationships are then utilized to derive reconstruction error bounds for a TCS system.

III. OPTIMIZED MULTILINEAR PROJECTIONS FOR TCS

In this section, we show how to optimize the multilinear sensing matrix when the dictionaries Ψi (i = 1, ..., n) for each dimension are fixed. We first introduce the related design approaches for CS, and then present the proposed methods for TCS, including a separable and a non-separable design approach.
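Before turning to the design approaches, the equivalence (7) between the separable model and a Kronecker-structured conventional CS model, and the coherence measure (11), can be checked numerically. The following is a small self-contained sketch (our own code; it assumes column-major vectorization, under which vec(A1 S A2^T) = (A2 ⊗ A1) vec(S)).

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((4, 6)), rng.standard_normal((5, 7))
S = rng.standard_normal((6, 7))              # stand-in for the sparse representation

Y = A1 @ S @ A2.T                            # (6): Y = S ×1 A1 ×2 A2, for n = 2
y = np.kron(A2, A1) @ S.flatten(order="F")   # (7): y = (A2 ⊗ A1) s
assert np.allclose(Y.flatten(order="F"), y)

def coherence(A):
    """Mutual coherence (11), computed on column-normalized A."""
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

print(coherence(np.kron(A2, A1)))            # μ(A) of the equivalent sensing matrix
```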

A. Sensing Matrix Design for CS

We observe that the sufficient conditions on the RIC or the mutual coherence for successful CS reconstruction, as reviewed in Section II-B, only describe the worst case bound, which means that the average recovery performance is not reflected. In fact, the most challenging part of CS sensing matrix design lies in deriving a measure that can directly reveal the expected-case reconstruction accuracy.

In [8], Elad et al. proposed the notion of averaged mutual coherence, based on which an iterative algorithm is derived for optimal sensing matrix design. This approach aims to minimize the largest absolute values of the off-diagonal entries in the Gram matrix of A, i.e., GA = A^T A. It has been shown to outperform a random Gaussian sensing matrix in terms of reconstruction accuracy, but it is time-consuming to construct and can ruin the worst case guarantees by inducing large off-diagonal values that are not in the original Gram matrix. In order to make any subset of columns in A as orthogonal as possible, Sapiro et al. proposed in [18] to make GA as close as possible to an identity matrix, i.e., Ψ^T Φ^T ΦΨ ≈ IN̂. This is then approximated by minimizing ||Λ − ΛΓ^T ΓΛ||²F, where Λ and V come from the eigen-decomposition of Ψ^T Ψ, i.e., Ψ^T Ψ = VΛV^T, and Γ = ΦV. This approach is also iterative, but it outperforms Elad's method. Considering the fact that A has minimum coherence when the magnitudes of all the off-diagonal entries of GA are equal, Xu et al. proposed an Equiangular Tight Frame (ETF) based method in [9]. The problem is modeled as min_{Gt∈H} ||Ψ^T Φ^T ΦΨ − Gt||²F, where Gt is the target Gram matrix and H is the set of ETF Gram matrices. Improved performance has been observed for the sensing matrix so obtained.

More recently, based on the same idea as Sapiro's, the problem

min_Φ ||IN̂ − Ψ^T Φ^T ΦΨ||²F  (14)

has been considered and an analytical solution has been derived in [11]. Meanwhile, in [10], [42], it has been shown that in order to achieve good expected-case Mean Squared Error (MSE) performance, the equivalent sensing matrix ought to be close to a Parseval tight frame, thus leading to the following design approach:

min_Φ ||Φ||²F,  s.t. ΦΨΨ^T Φ^T = IM,  (15)

where ||Φ||²F is the sensing cost, which also affects the reconstruction accuracy (as verified in [10], [42]). A closed form solution to this problem was also obtained in [10], [42]. These approaches have further improved the average reconstruction performance of a CS system that is able to employ an optimized sensing matrix.

On the other hand, using the model of Xu's method [9], Cleju [12] proposed to take Gt = Ψ^T Ψ so that the equivalent sensing matrix has similar properties to those of Ψ; and Bai et al. [20] proposed combining the ETF Grams with that proposed by Cleju, solving min_{Gt∈H} (1 − β)||Ψ^T Ψ − Ψ^T Φ^T ΦΨ||²F + β||Gt − Ψ^T Φ^T ΦΨ||²F, where β is a trade-off parameter. Promising results have been demonstrated for these methods.

B. Multidimensional Sensing Matrix Design for TCS

In contrast to the aforementioned methods, we consider optimization of the sensing matrix for TCS. Compared to the design process in conventional CS, the main distinction for TCS is that we would like to optimize multiple separable sensing matrices Φi (i = 1, ..., n), rather than a single matrix Φ. In this section, in addition to extending the approaches in (14) and (15) to the TCS case, we also propose a new approach for TCS sensing matrix design by combining the state-of-the-art ideas in [10], [12], [20]. To simplify the exposition, we elaborate our methods in the following sections for the case of n = 2, i.e., the tensor signal becomes a matrix, but note that the methods can be straightforwardly extended to the n-mode tensor case (n > 2).

As reviewed in Section II-B, the performance of existing TCS reconstruction algorithms relies on the quality of A, where A = A2 ⊗ A1 when n = 2. Therefore, when the multilinear dictionary Ψ = Ψ2 ⊗ Ψ1 is given, one can optimize Φ (where Φ = Φ2 ⊗ Φ1) using the methods for CS introduced in Section III-A.

However, when implementing a TCS system, it is still necessary to obtain the separable matrices, i.e., Φ1 and Φ2. One intuitive solution is to design Φ using the aforementioned approaches for CS and then to decompose Φ by solving the following problem:

min_{Φ1,Φ2} ||Φ − Φ2 ⊗ Φ1||²F,  (16)

which has been studied as a Nearest Kronecker Product (NKP) problem in [43]. But this is not a feasible solution for TCS sensing matrix design. First of all, Φ can only be exactly decomposed as Φ2 ⊗ Φ1 when a certain permutation of Φ has rank 1 [43], which is not the case for most sensing strategies. When the term in (16) is minimized to a non-zero value, the solution Φ̂1, Φ̂2 leads to a sensing matrix Φ̂2 ⊗ Φ̂1 which may not satisfy the conditions on the sensing matrix Φ for good CS recovery (e.g., the requirement on the mutual coherence), thereby ruining the reconstruction guarantees. Secondly, to solve (16), explicit storage of Φ is necessary, which is restrictive for high dimensional problems. In addition, when the number of tensor modes increases, the problem becomes more complex to solve.

Therefore, we aim to optimize Φ1 and Φ2 directly, without knowing Φ. Extending (14) and (15), we first propose a method that is shown to be separable into independent sub-design-problems. Then a non-separable design approach is presented and a gradient based algorithm is derived.

1) A Separable Design Approach: The proposed separable design approach (Approach I) is as follows:

min_{Φ1,Φ2} ||I_{N̂1N̂2} − (Ψ2^T ⊗ Ψ1^T)(Φ2^T ⊗ Φ1^T)(Φ2 ⊗ Φ1)(Ψ2 ⊗ Ψ1)||²F,  (17)

and it is an extension of (14) to the case when a multilinear sensing matrix is employed. The solution of (17) is presented in Theorem 3, and Approach I is also summarized in Algorithm 1.
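Parenthetically, although we argue above that decomposing a pre-designed Φ is not a viable route, the NKP problem (16) itself is classical: it reduces to a rank-1 approximation of a rearrangement of Φ. The following is a minimal sketch under our own naming (a standard rearrangement construction, not code from [43]); it also illustrates the rank-1 condition, since the decomposition is exact only when the rearranged matrix has rank 1.

```python
import numpy as np

def nearest_kron(Phi, M1, N1, M2, N2):
    """Minimize ||Φ − Φ2 ⊗ Φ1||_F for Φ ∈ R^{(M1 M2)×(N1 N2)} via a rank-1 SVD."""
    # Each M1×N1 block (i, j) of Φ becomes one row of R, so that a true
    # Kronecker product Φ2 ⊗ Φ1 rearranges to the rank-1 matrix vec(Φ2) vec(Φ1)^T.
    R = np.empty((M2 * N2, M1 * N1))
    for i in range(M2):
        for j in range(N2):
            block = Phi[i * M1:(i + 1) * M1, j * N1:(j + 1) * N1]
            R[i + j * M2, :] = block.flatten(order="F")
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Phi2 = np.sqrt(s[0]) * U[:, 0].reshape(M2, N2, order="F")
    Phi1 = np.sqrt(s[0]) * Vt[0, :].reshape(M1, N1, order="F")
    return Phi1, Phi2

# Exact recovery happens only in the rank-1 case, e.g. a true Kronecker product:
rng = np.random.default_rng(2)
P1, P2 = rng.standard_normal((3, 4)), rng.standard_normal((2, 5))
Phi1, Phi2 = nearest_kron(np.kron(P2, P1), 3, 4, 2, 5)
assert np.allclose(np.kron(Phi2, Phi1), np.kron(P2, P1))
```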

Theorem 3: Assume for i = 1, 2 that N̄i = rank(Ψi), that Ψi = UΨi [ΛΨi 0; 0 0] VΨi^T is an SVD of Ψi, and that ΛΨi ∈ R^{N̄i×N̄i}. Let Φ̂i ∈ R^{Mi×Ni} (i = 1, 2) be matrices with rank(Φ̂i) = Mi, where Mi ≤ N̄i is assumed. Then

• the following equation gives a solution to (17):

Φ̂i = U [IMi 0] V^T [ΛΨi^{−1} 0; 0 0] UΨi^T,  (18)

where i = 1, 2, and U ∈ R^{Mi×Mi} and V ∈ R^{N̄i×N̄i} are arbitrary orthonormal matrices;
• the resulting equivalent sensing matrices Âi = Φ̂iΨi (i = 1, 2) are Parseval tight frames, i.e., ||Âi^T z||2 = ||z||2, where z ∈ R^{Mi} is an arbitrary vector;
• the minimum of (17) is N̂1N̂2 − M1M2;
• separately solving the sub-problems

min_{Φi} ||IN̂i − Ψi^T Φi^T Φi Ψi||²F  (19)

for i = 1, 2 leads to the same solutions as (18), and the resulting objective in (17) attains the same minimum, i.e., N̂1N̂2 − M1M2. □

Proof: The proof is given in Appendix A.

Algorithm 1 Design Approach I
Input: Ψi (i = 1, 2).
Output: Φ̂i (i = 1, 2).
1: for i = 1, 2 do
2:   Calculate the optimized Φ̂i using (18);
3: end
4: Normalization for i = 1, 2: Φ̂i = √Ni Φ̂i/||Φ̂i||F.

Clearly, Approach I is separable, which means that we can independently design each Φi according to the corresponding sparsifying dictionary Ψi in mode i. This observation stays consistent when we consider the situation in an alternative way. Applying the method in (14) to acquire the optimal Φ1 and Φ2 independently, we are actually trying to make any subset of columns in A1 and A2, respectively, as orthogonal as possible. As a result, the matrix A = A2 ⊗ A1 that is obtained will also be as orthogonal as possible. This follows from the fact that for any two columns of A, we have

|ap^T aq| = |[(a2)l ⊗ (a1)s]^T [(a2)c ⊗ (a1)d]| = |[(a2)l^T (a2)c][(a1)s^T (a1)d]|,  (20)

where a, a1 and a2 denote columns of A, A1 and A2, respectively, and p, q, l, s, c, d are the column indices.

Using the second statement of Theorem 3, we can derive the following corollary.

Corollary 1: The solution in (18) also solves the following problems for i = 1, 2:

min_{Φi} ||Φi||²F,  s.t. Φi Ψi Ψi^T Φi^T = IMi,  (21)

which represent the separable sub-problems of the following design approach:

min_{Φ1,Φ2} ||Φ2 ⊗ Φ1||²F,  s.t. (Φ2 ⊗ Φ1)(Ψ2 ⊗ Ψ1)(Ψ2^T ⊗ Ψ1^T)(Φ2^T ⊗ Φ1^T) = I_{M1M2},  (22)

and (22) is in fact a multidimensional extension of the CS sensing matrix design approach proposed in [10]. □

Proof: Since the equivalent sensing matrices designed using Approach I are Parseval tight frames, it follows from the derivation in [10] that the sub-problems in (21) have the same solution as in (18). The problem in (22) can be proved separable simply by noting that ||Φ2 ⊗ Φ1||²F = ||Φ2||²F ||Φ1||²F, and that when ΦiΨiΨi^T Φi^T = IMi is satisfied for both i = 1 and 2, the constraint in (22) is also satisfied. □

By decomposing the original problems into independent sub-problems, the sensing matrices can be designed in parallel and the problem becomes easier to solve. However, the CS sensing matrix design approaches are not always separable after being extended to the multidimensional case, because a variety of different criteria can be used for sensing matrix design, as reviewed in Section III-A, and in many cases the decomposition is not provable. We therefore propose a non-separable approach in the following section.

2) A Non-separable Design Approach: Taking into account: i) the impact of the sensing cost on the reconstruction performance [10]; ii) the benefit of making the equivalent sensing matrix have similar properties to those of the sparsifying dictionary [12]; and iii) the conventional requirement on the mutual coherence, we put forth the following Design Approach II:

min_{Φ1,Φ2} (1 − β)||Ψ^T Ψ − Ψ^T Φ^T ΦΨ||²F + α||Φ||²F + β||I_{N̂1N̂2} − Ψ^T Φ^T ΦΨ||²F,  (23)

where Ψ = Ψ2 ⊗ Ψ1, Φ = Φ2 ⊗ Φ1, and α and β are tuning parameters. As investigated in [10] and [20], α ≥ 0 controls the sensing energy, while β ∈ [0, 1] balances the impact of the first and third terms so as to achieve optimal performance under different conditions of measurement noise. The choice of these parameters will be investigated in Section V-A.

To solve (23), we adopt a coordinate descent method. Denoting the objective as f(Φ1, Φ2), we first compute its gradient with respect to Φ1 and Φ2, respectively; the result is:

∂f/∂Φi = 4||GAj||²F (Ai GAi Ψi^T) − 4β||Aj||²F (Ai Ψi^T) + 2α||Φj||²F Φi + 4(β − 1)||Ψj Aj^T||²F (Ai GΨi Ψi^T),  (24)

where i, j ∈ {1, 2} and j ≠ i, GAi = Ai^T Ai and GΨi = Ψi^T Ψi.
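To make the coordinate descent concrete before stating the general algorithm, the following is a minimal NumPy sketch (our own code, not the authors' implementation) that evaluates the gradient (24) and applies the alternating update that is formalized below as (26) and Algorithm 2; all dimensions, parameter values and the stopping rule (a fixed iteration count) are illustrative.

```python
import numpy as np

def grad_f(Phi, Psi, i, alpha, beta):
    """Gradient (24) of the objective (23) w.r.t. Φ_i, for n = 2 (i ∈ {0, 1})."""
    j = 1 - i
    A = [P @ Q for P, Q in zip(Phi, Psi)]   # equivalent matrices A_k = Φ_k Ψ_k
    GA = [a.T @ a for a in A]               # Gram matrices G_{A_k}
    GPsi = Psi[i].T @ Psi[i]                # G_{Ψ_i}
    fro2 = lambda M: np.sum(M * M)          # squared Frobenius norm
    return (4 * fro2(GA[j]) * (A[i] @ GA[i] @ Psi[i].T)
            - 4 * beta * fro2(A[j]) * (A[i] @ Psi[i].T)
            + 2 * alpha * fro2(Phi[j]) * Phi[i]
            + 4 * (beta - 1) * fro2(Psi[j] @ A[j].T) * (A[i] @ GPsi @ Psi[i].T))

rng = np.random.default_rng(3)
Psi = [rng.standard_normal((8, 16)) for _ in range(2)]
Phi = [rng.standard_normal((4, 8)) for _ in range(2)]
alpha, beta, eta = 1.0, 0.8, 1e-7
for _ in range(100):                        # the alternating loop of Algorithm 2
    for i in range(2):
        Phi[i] = Phi[i] - eta * grad_f(Phi, Psi, i, alpha, beta)
Phi = [np.sqrt(8) * P / np.linalg.norm(P) for P in Phi]   # √N_i Φ_i / ||Φ_i||_F
```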

For generality, we also provide the result for the n > 2 case as follows:

∂f/∂Φi = 4ωi (Ai GAi Ψi^T) − 4βθi (Ai Ψi^T) + 2ατi Φi + (4β − 4)ρi (Ai GΨi Ψi^T),  (25)

where i, j ∈ {1, ..., n} and j ≠ i, ωi = Π_{j≠i} ||GAj||²F, θi = Π_{j≠i} ||Aj||²F, τi = Π_{j≠i} ||Φj||²F, and ρi = Π_{j≠i} ||Ψj Aj^T||²F.

With the gradient obtained, we can solve (23) by alternately updating Φ1 and Φ2 as follows:

Φi^(t+1) = Φi^(t) − η ∂f/∂Φi,  (26)

where η > 0 is a step size parameter. The algorithm for solving (23) is summarized in Algorithm 2.

Algorithm 2 Design Approach II
Input: Ψi (i = 1, 2), Φi^(0) (i = 1, 2), α, β, η, t = 0.
Output: Φ̂i (i = 1, 2).
1: Repeat
2:   for i = 1, 2 do
3:     Φi^(t+1) = Φi^(t) − η ∂f/∂Φi, where ∂f/∂Φi is given by (24);
4:   end
5:   t = t + 1;
6: Until a stopping criterion is met.
7: Normalization for i = 1, 2: Φ̂i = √Ni Φi/||Φi||F.

Till now, we have considered optimizing the multidimensional sensing matrix when the sparsifying dictionaries for each tensor mode are given. For the purpose of joint optimization, we will proceed to optimize the dictionaries while coupling fixed sensing matrices. The joint optimization will eventually be achieved by alternately optimizing the sensing matrices and the sparsifying dictionaries.

IV. JOINTLY LEARNING THE MULTIDIMENSIONAL DICTIONARY AND SENSING MATRIX

In this section, we first propose a sensing-matrix-coupled method for multidimensional sparsifying dictionary learning. Then it is combined with the previously introduced optimization approach for a multilinear sensing matrix to yield a joint optimization algorithm. In the spirit of the coupled KSVD method [18], our approach for dictionary learning can be viewed as a sensing-matrix-coupled version of a tensor KSVD algorithm. We start by briefly introducing the coupled KSVD method.

A. Coupled KSVD

The Coupled KSVD (cKSVD) [18] is a dictionary learning approach for vectorized signals. Let X = [x1 ... xT] be an N × T matrix containing a training sequence of T signals x1, ..., xT. The cKSVD aims to solve the following problem, i.e., to learn a dictionary Ψ ∈ R^{N×N̂} from X:

min_{Ψ,S} γ||X − ΨS||²F + ||Y − ΦΨS||²F,  s.t. ∀i, ||si||0 ≤ K,  (27)

where S = [s1 ... sT] is the sparse representation of size N̂ × T, γ > 0 is a tuning parameter and Y ∈ R^{M×T} contains the measurement vectors taken by the sensing matrix Φ ∈ R^{M×N}, i.e., Y = [y1 ... yT] and Y = ΦX + E with E ∈ R^{M×T} representing the noise. The problem in (27) is then reformulated as:

min_{Ψ,S} ||Z − DS||²F,  s.t. ∀i, ||si||0 ≤ K,  (28)

where Z = [γX^T Y^T]^T and D = [γIN Φ^T]^T Ψ. The problem can then be solved following the conventional KSVD algorithm [14] and conducting proper normalization.

Specifically, with an initial arbitrary Ψ, it first recovers S using some available algorithm, e.g., OMP. Then the objective in (28) is rewritten as:

min_{Ψ,S} ||R̃p − dp s̃p^T||²F,  (29)

where p is the index of the current atom we aim to update, s̃p^T is the row of S where the zeros have been removed, Rp = Z − Σ_{q≠p} dq sq^T, and R̃p denotes the columns of Rp corresponding to s̃p^T. Let R̃p = UR ΛR VR^T be an SVD of R̃p; then the highest component of the coupled error R̃p can be eliminated by defining:

ψ̂p = (γ²IN + Φ^T Φ)^{−1} [γIN Φ^T] uR^1,  (30)

s̃p = ||ψ̂p||2 λR^1 vR^1,  (31)

where λR^1 is the largest singular value of R̃p and uR^1, vR^1 are the corresponding left and right singular vectors. The updated column p of Ψ is obtained after normalization: ψ̂p = ψ̂p/||ψ̂p||2. The above process is then iterated to update every atom of Ψ.

Clearly, the sensing matrix has been taken into account during the dictionary learning process, which has been shown to be beneficial for the CS reconstruction performance [18]. In order to learn multidimensional separable dictionaries for high dimensional signals, and to achieve joint optimization of the multidimensional dictionary and sensing matrix, we derive a coupled-KSVD algorithm for a tensor, i.e., cTKSVD, in the following section. Again, for simplicity, we describe the main flow for 2-D signals, i.e., n = 2.

B. The cTKSVD Approach

Consider a training sequence of 2-D signals X1, ..., XT; we obtain a tensor X ∈ R^{N1×N2×T} by stacking them along the third dimension. Denoting the stack of the sparse representations Si ∈ R^{N̂1×N̂2} (i = 1, ..., T) by S ∈ R^{N̂1×N̂2×T}, we propose the following optimization problem to learn the multidimensional dictionary:

min_{Ψ1,Ψ2,S} ||Z − S ×1 D1 ×2 D2||²F,  s.t. ∀i, ||Si||0 ≤ K,  (32)

in which

Z = [γ²X γY2; γY1 Y] (blocks stacked along modes 1 and 2),  Yi = X ×i Φi + Ei,  (33)

D1 = [γI_{N1}; Φ1] Ψ1,  D2 = [γI_{N2}; Φ2] Ψ2,  (34)

and γ > 0 is a tuning parameter.
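A sketch of assembling Z, D1 and D2 from (33) and (34) may help; the code below (our own, assuming noiseless training data so that the E terms vanish, and with the identity blocks γI_{N1}, γI_{N2} sized to match Φ1 and Φ2) builds the stacked tensor blockwise. By construction, if X = S ×1 Ψ1 ×2 Ψ2, then Z = S ×1 D1 ×2 D2 exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
N1, N2, M1, M2, T, gamma = 10, 10, 7, 7, 50, 1 / 64
X = rng.standard_normal((N1, N2, T))
Phi1, Phi2 = rng.standard_normal((M1, N1)), rng.standard_normal((M2, N2))
Psi1, Psi2 = rng.standard_normal((N1, 18)), rng.standard_normal((N2, 18))

def m1(X, A):  # mode-1 product for a 3-way tensor
    return np.einsum('mi,ijt->mjt', A, X)
def m2(X, A):  # mode-2 product for a 3-way tensor
    return np.einsum('mj,ijt->imt', A, X)

Y1, Y2 = m1(X, Phi1), m2(X, Phi2)        # partial projections X ×1 Φ1, X ×2 Φ2
Y = m2(Y1, Phi2)                         # full projection X ×1 Φ1 ×2 Φ2
# Z stacks the four blocks along modes 1 and 2; Z ∈ R^{(N1+M1)×(N2+M2)×T}.
top = np.concatenate([gamma**2 * X, gamma * Y2], axis=1)
bot = np.concatenate([gamma * Y1, Y], axis=1)
Z = np.concatenate([top, bot], axis=0)
D1 = np.vstack([gamma * np.eye(N1), Phi1]) @ Psi1   # (34)
D2 = np.vstack([gamma * np.eye(N2), Phi2]) @ Psi2
```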

The problem in (32) aims to minimize the representation error ||X − S ×1 Ψ1 ×2 Ψ2||²F and the overall projection error ||Y − S ×1 A1 ×2 A2||²F with constraints on the sparsity of each slice of the tensor. In addition, it also takes into account the projection errors induced by Φ1 and Φ2 individually.

Using an available sparse reconstruction algorithm for TCS, e.g., Tensor OMP (TOMP) [30], and initial dictionaries Ψ1, Ψ2, the sparse representation S can be estimated first. Then we update the multilinear dictionary alternately. We first update the atoms of Ψ1 with Ψ2 fixed. The objective in (32) is rewritten as:

Σ_{q2} ||Rp1 − (d1)p1 ◦ (d2)q2 ◦ s_{(p1−1)N̂2+q2}||²F,  (35)

where Rp1 = Z − Σ_{q1≠p1} Σ_{q2} (d1)q1 ◦ (d2)q2 ◦ s_{(q1−1)N̂2+q2}; p1 is the index of the atom for the current update, and q1, q2 denote the indices of the remaining atoms of Ψ1 and of all the atoms of Ψ2, respectively; d1, d2 are columns of D1, D2; s is the mode-3 vector of S. Then, to satisfy the sparsity constraint in (32), we only keep the non-zero entries of s_{(p1−1)N̂2+q2} and the corresponding subset of Rp1 to obtain:

Σ_{q2} ||R̃p1 − (d1)p1 ◦ (d2)q2 ◦ s̃_{(p1−1)N̂2+q2}||²F.  (36)

Assuming that after carrying out a Higher Order SVD (HOSVD) [44] of R̃p1, the largest singular value is λR^1 and the corresponding singular vectors are uR^1, vR^1 and ωR^1, we eliminate the largest error by:

(d̂1)p1 = uR^1,  D2 S̃p1,:,: = vR^1 ◦ (λR^1 ωR^1),  (37)

where S̃p1,:,: denotes the horizontal slice of S at index p1 that contains only the non-zero mode-2 vectors. The atom of Ψ1 is then calculated using the pseudo-inverse as:

(ψ̂1)p1 = (γ²I_{N1} + Φ1^T Φ1)^{−1} [γI_{N1} Φ1^T] uR^1.  (38)

The current update is then obtained after normalization:

(ψ̂1)p1 = (ψ̂1)p1/||(ψ̂1)p1||2,  (39)

D2 S̃p1,:,: = ||(ψ̂1)p1||2 vR^1 ◦ (λR^1 ωR^1).  (40)

Since D2 and the support indices of each mode-2 vector in S̃p1,:,: are known, the updated coefficients S̃p1,:,: can be easily calculated via the Least Squares (LS) solution. The above process is repeated for all the atoms to update the dictionary Ψ1.

The next step is to update Ψ2 with the obtained Ψ1 fixed. It follows a similar procedure to that described previously. Specifically, the objective in (32) is rewritten as:

Σ_{q1} ||R̃p2 − (d1)q1 ◦ (d2)p2 ◦ s̃_{(q1−1)N̂2+p2}||²F,  (41)

where s̃ is the mode-3 vector with only non-zero entries, R̃p2 is the corresponding subset of Rp2, Rp2 = Z − Σ_{q1} Σ_{q2≠p2} (d1)q1 ◦ (d2)q2 ◦ s_{(q1−1)N̂2+q2}, and p2 is the index of the atom for the current update. A HOSVD is carried out for R̃p2 and the update steps corresponding to (37)–(40) now become:

(d̂2)p2 = vR^1,  D1 S̃:,p2,: = uR^1 ◦ (λR^1 ωR^1),  (42)

(ψ̂2)p2 = (γ²I_{N2} + Φ2^T Φ2)^{−1} [γI_{N2} Φ2^T] vR^1,  (43)

(ψ̂2)p2 = (ψ̂2)p2/||(ψ̂2)p2||2,  (44)

D1 S̃:,p2,: = ||(ψ̂2)p2||2 uR^1 ◦ (λR^1 ωR^1),  (45)

in which S̃:,p2,: represents the lateral slice at index p2, and its updated elements can also be calculated using LS. The dictionary Ψ2 is then updated iteratively. The whole process of updating S, Ψ1, Ψ2 is repeated to obtain the final solution of (32).

The uncoupled version of the proposed cTKSVD method (denoted by TKSVD) can be easily obtained by modifying the problem in (32) to:

min_{Ψ1,Ψ2,S} ||X − S ×1 Ψ1 ×2 Ψ2||²F,  s.t. ∀i, ||Si||0 ≤ K,  (46)

and it can be solved following the same procedures as described previously for cTKSVD, except that the pseudo-inverse and normalization steps are no longer needed.

The proposed cTKSVD for multidimensional dictionary learning is different from the KHOSVD method [32], i.e., another tensor-based dictionary learning approach obtained by extending the KSVD method. The learning process of KHOSVD follows the same train of thought as the conventional KSVD method, except that to eliminate the largest error in each iteration, a HOSVD [44], i.e., an SVD for tensors, is employed. However, the process of KHOSVD does not take full advantage of the multilinear structure and involves duplicated updating of the atoms, which leads to a slow convergence speed. The proposed cTKSVD approach is distinct from KHOSVD in the following respects. First, during the update of each atom, a slice of the coefficients is updated accordingly in cTKSVD, while only a vector is updated in KHOSVD. Therefore, in cTKSVD, each iteration of the outer loop contains N̂1 + N̂2 inner iterations, whereas it is N̂1N̂2 for KHOSVD (and for KSVD). This means that cTKSVD requires the HOSVD to be executed N̂1N̂2 − N̂1 − N̂2 fewer times than for the KHOSVD method, and hence it reduces the complexity. In addition, KHOSVD does not take into account the influence of the sensing matrix. The benefit of coupling the sensing matrices in cTKSVD will be shown by simulations in Section V-B.

Here, we also provide the problem formulation when one needs to learn 3-D sparsifying dictionaries. The cTKSVD for cases where n > 2 can be modeled following a similar strategy. For a training sequence consisting of T stacked 3-D signals X ∈ R^{N1×N2×N3×T}, we learn the dictionaries by solving:

min_{Ψ1,Ψ2,Ψ3,S} ||Z − S ×1 D1 ×2 D2 ×3 D3||²F,  s.t. ∀i, ||Si||0 ≤ K,  (47)

in which

Z = [γ²G1 γG2; γG3 G4],  D1 = [γI_{N1}; Φ1] Ψ1,  (48)

D2 = [γI_{N2}; Φ2] Ψ2,  D3 = [γI_{N3}; Φ3] Ψ3,  (49)

and, denoting by "ր3" the operator that stacks tensors along their third mode, in the above formulation of Z we have

G1 = (γX) ր3 (Y3),  G2 = (γY2) ր3 (Y23),  G3 = (γY1) ր3 (Y13),  G4 = (γY) ր3 (Y12),
Yi = X ×i Φi + Ei,  Yij = X ×i Φi ×j Φj + Eij.  (50)

The problem can then be solved following similar steps to those introduced earlier in this section.

We have now derived the method for learning the sparsifying dictionaries when the multilinear sensing matrix is fixed. Combining this approach with the methods for optimizing the sensing matrices elaborated in Section III-B, we can then jointly optimize Φ1, Φ2 and Ψ1, Ψ2 by alternating between them. The overall procedure is summarized in Algorithm 3.

Algorithm 3 Joint Optimization
Input: Ψi^(0) (i = 1, 2), Φi^(0) (i = 1, 2), X, α, β, η, γ, iter = 0.
Output: Φ̂i (i = 1, 2), Ψ̂i (i = 1, 2).
1: Repeat until convergence:
2:   For Ψ̂i^(iter) (i = 1, 2) fixed, optimize Φ̂i^(iter+1) (i = 1, 2) using one of the approaches given in Section III-B;
3:   For Ψ̂i^(iter), Φ̂i^(iter+1) (i = 1, 2) fixed, solve (32) using TOMP to obtain Ŝ;
4:   For p1 = 1 to N̂1
5:     Compute R̃p1 using (32)–(35);
6:     Do the HOSVD of R̃p1 to obtain λR^1, uR^1, vR^1 and ωR^1;
7:     Update (ψ̂1)p1^(iter+1) and D2 S̃p1,:,: using (38)–(40), and calculate S̃p1,:,: by LS;
8:   end
9:   For p2 = 1 to N̂2
10:    Compute R̃p2 using (32) and (41);
11:    Do the HOSVD of R̃p2 to obtain λR^1, uR^1, vR^1 and ωR^1;
12:    Update (ψ̂2)p2^(iter+1) and D1 S̃:,p2,: using (43)–(45), and calculate S̃:,p2,: by LS;
13:  end
14:  iter = iter + 1;

V. EXPERIMENTAL RESULTS

In this section, we evaluate the proposed approaches via simulations using both synthetic data and real images. We first test the sensing matrix design approaches proposed in Section III-B with the sparsifying dictionaries being given. Then the cTKSVD approach is evaluated when the sensing matrices are fixed. Finally, the experiments for the joint optimization of the two are presented.

[Figure] Fig. 1: MSE performance of sensing matrices generated by Approach II with different values of α and β. (a) σ² = 0, α = 1; (b) σ² = 0, β = 0.8; (c) σ² = 10⁻², α = 1; (d) σ² = 10⁻², β = 0.2.

A. Optimal Multidimensional Sensing Matrix

This section is intended to examine the proposed separable Approach I and non-separable Approach II for multidimensional sensing matrix design. Before doing so, we first test the tuning parameters for Approach II, i.e., the non-separable design approach presented in Section III-B-2. As detailed in Section III-B-1, Approach I has a closed form solution and there are no tuning parameters involved.

We evaluate the Mean Squared Error (MSE) performance of different sensing matrices generated using Approach II with various parameters, and the results are reported by averaging over 500 trials. A random 2D signal S ∈ R^{64×64} with sparsity K = 80 is generated, where the randomly placed non-zero elements follow an i.i.d. zero-mean unit-variance Gaussian distribution. Both the dictionaries Ψi ∈ R^{64×256} (i = 1, 2) and the initial sensing matrices Φi ∈ R^{40×64} (i = 1, 2) are generated randomly with i.i.d. zero-mean unit-variance Gaussian distributions; the dictionaries are then column normalized, while the sensing matrices are normalized by Φi = √64 Φi/||Φi||F. When taking measurements, random additive Gaussian noise with variance σ² is induced. A constant step size η = 1e−7 is used for Approach II, and the BP solver SPGL1 [45] is employed for the reconstructions.
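For reproducibility, the following sketch (our own code) mirrors the data generation and normalization just described; the only liberty taken is that, for dimensional consistency with the sparse model (5), we draw the K-sparse representation in R^{N̂×N̂} and synthesize the 2D signal through the dictionaries.

```python
import numpy as np

rng = np.random.default_rng(5)
N, Nh, M, K, sigma2 = 64, 256, 40, 80, 1e-4
S = np.zeros(Nh * Nh)
S[rng.choice(Nh * Nh, size=K, replace=False)] = rng.standard_normal(K)
S = S.reshape(Nh, Nh)                                # K-sparse representation

Psi = [rng.standard_normal((N, Nh)) for _ in range(2)]
Psi = [P / np.linalg.norm(P, axis=0) for P in Psi]   # unit-norm columns
Phi = [rng.standard_normal((M, N)) for _ in range(2)]
Phi = [np.sqrt(N) * P / np.linalg.norm(P) for P in Phi]  # Φ_i = √64 Φ_i / ||Φ_i||_F

X = Psi[0] @ S @ Psi[1].T                            # synthetic 2D signal
noise = np.sqrt(sigma2) * rng.standard_normal((M, M))
Y = Phi[0] @ X @ Phi[1].T + noise                    # noisy separable measurements
# A solver such as BP would then be fed y = vec(Y) and A2 ⊗ A1, with A_i = Φ_i Ψ_i.
```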

Fig. 1 illustrates the results of the parameter tests. In Fig. 1 (a) and (c), the parameter β is evaluated for the noiseless (σ² = 0) and high noise (σ² = 10⁻²) cases, respectively, when α = 1. From both (a) and (c), we can see that when β = 0 or 1, the MSE is larger than for the other values, which means that both terms of Approach II that are controlled by β are essential for obtaining optimal sensing matrices. In addition, we can see that as β becomes larger in the range [0.1, 0.9], the MSE decreases slightly in (a), but increases slightly in (c). This indicates the choice of β under different conditions of sensing noise, which is consistent with that observed in [20]. Thus, in the remaining experiments, we take β = 0.8 when the sensing noise is low and β = 0.2 when the noise is high. Fig. 1 (b) and (d) demonstrate the MSE results of the tests for parameter α. It is observed that α = 1 is optimal for the noiseless case, while the optimum becomes α = 0.6 when high noise exists. Therefore a larger α is preferred when low noise is involved, and it needs to be reduced accordingly when the noise becomes higher.

We then proceed to examine the performance of both of the proposed approaches. As this is the first work to optimize the multidimensional sensing matrix, we take the i.i.d. Gaussian sensing matrices that are commonly used in CS problems for comparison. Besides, since Sapiro's approach [18] has the same spirit as Approach I (as reviewed in Section III-A), it can be easily extended to the multidimensional case, i.e., by individually generating Φi (i = 1, 2) using the approach in [18]. We hence also include it in the comparisons and denote it by the Separable Sapiro's approach (SS). The previously described synthetic data is generated for the experiments, and both BP and OMP are investigated for the reconstruction.

[Figure] Fig. 2: MSE performance of different sensing matrices (Approach I, Approach II, SS, Gaussian) for (a) BP, (b) OMP, when Mi (i = 1, 2) varies. (K = 80, N1 = N2 = 64, N̂1 = N̂2 = 256 and σ² = 10⁻⁴)

[Figure] Fig. 3: MSE performance of different sensing matrices (Approach I, Approach II, SS, Gaussian) for (a) BP, (b) OMP, when K varies. (M1 = M2 = 40, N1 = N2 = 64, N̂1 = N̂2 = 256 and σ² = 10⁻⁴)

Different sensing matrices are first evaluated using BP and OMP when the number of measurements varies. A small amount of noise (σ² = 10⁻⁴) is added when taking measurements, and the parameters are chosen as α = 1, β = 0.8. From Fig. 2, it can be observed that both of the proposed approaches perform much better than the Gaussian sensing matrices, among which Approach II has the better performance. In general, the SS method performs worse than Approach I, although the difference is not obvious at some points. Note that SS is an iterative method, while Approach I is non-iterative.

The proposed approaches are again observed to be superior to the other methods when the number of measurements is fixed but the signal sparsity K is varied, as shown in Fig. 3. Compared to Approach I, Approach II exhibits better performance, but at the cost of higher computational complexity and the need for a proper choice of the parameters.

B. Optimal Multidimensional Dictionary with the Sensing Matrices Coupled

In this section, we evaluate the proposed cTKSVD method with a given multidimensional sensing matrix. A training sequence of 5000 2D signals (T = 5000) is generated, i.e., S ∈ R^{18×18×5000}, where each signal has K = 4 (2 × 2) randomly placed non-zero elements that follow an i.i.d. zero-mean unit-variance Gaussian distribution. The dictionaries Ψi ∈ R^{10×18} (i = 1, 2) are also drawn from i.i.d. Gaussian distributions, followed by normalization such that they have unit-norm columns. The time-domain training signals X ∈ R^{10×10×5000} are then formed by X = S ×1 Ψ1 ×2 Ψ2. The test data of size 10 × 10 × 5000 is generated following the same procedure. Random Gaussian noise with variance σ² is added to both the training and test data. Two i.i.d. random Gaussian matrices are employed as the sensing matrices Φi ∈ R^{Mi×10} (i = 1, 2), normalized by Φi = √10 Φi/||Φi||F. TOMP [29] is utilized in both the training stage and the reconstructions of the test stage for the tensor-based approaches, and OMP is employed for the vector-based approaches.

We first investigate the convergence behavior of the cTKSVD approach and examine the choice of the parameter γ. We define the Average Representation Error (ARE) [14], [19] of cTKSVD as √(||Z − S ×1 D1 ×2 D2||²F / len(Z)), where Z, D1 and D2 have the same definitions as in (32). Fig. 4 shows the AREs of cTKSVD at different numbers of iterations for different values of γ. The cKSVD method [18] (reviewed in Section IV-A) is also tested, and only the results for its optimal γ are displayed in Fig. 4. Note that cKSVD learns a single dictionary Ψ ∈ R^{100×324}, rather than the separable multilinear dictionaries Ψi ∈ R^{10×18} (i = 1, 2). The ARE of cKSVD is thus modified accordingly as √(||Z − DS||²F / len(Z)), in which the symbols follow the definitions in (28).
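The ARE just defined is a one-line computation; for clarity, here is a direct transcription in NumPy (our own helper, using an einsum for S ×1 D1 ×2 D2):

```python
import numpy as np

def are(Z, S, D1, D2):
    """ARE = sqrt(||Z − S ×1 D1 ×2 D2||_F^2 / len(Z)), as defined above."""
    approx = np.einsum('ai,bj,ijt->abt', D1, D2, S)   # S ×1 D1 ×2 D2
    return np.sqrt(np.sum((Z - approx) ** 2) / Z.size)
```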

[Figure] Fig. 4: Convergence behavior of cTKSVD with different values of γ compared to that of cKSVD with its optimal parameter setting when (a) M1 = M2 = 7; (b) M1 = M2 = 3. (Legend MSEs, (a): cTKSVD γ = 1/256: 0.0186, γ = 1/128: 0.0185, γ = 1/64: 0.0183, γ = 1/32: 0.0184, γ = 1/16: 0.0185; cKSVD γ = 1/32: 0.0203. (b): cTKSVD γ = 1/256: 0.0336, γ = 1/128: 0.0335, γ = 1/64: 0.0336; cKSVD γ = 1/64: 0.0350.)

[Figure] Fig. 5: MSE performance of different dictionaries (cTKSVD, cKSVD, TKSVD, KHOSVD) when (a) T varies (σ² = 0), (b) σ² varies (T = 5000). (K = 4, M1 = M2 = 7, N1 = N2 = 10, N̂1 = N̂2 = 18)

From Fig. 4, it can be seen that cTKSVD exhibits stable convergence behavior with the different parameters. It converges to the lowest ARE with γ = 1/64 when Mi = 7, and the optimal γ is 1/128 when Mi = 3. The reconstruction MSE values are also shown in the legend; they are similar to each other but reveal the same optimal choice of γ as described. Thus the optimal γ is lower when the number of measurements decreases, which is consistent with the observation in [18]. In both experiments, cTKSVD with the optimal γ outperforms cKSVD in terms of both ARE and MSE.

Then the MSE performance of the dictionaries learned by cTKSVD is compared with that of cKSVD [18] and KHOSVD [32] when the number of training sequences T and the noise variance σ² vary. We use γ = 1/64 for cTKSVD and γ = 1/32 for cKSVD. To see the benefit of coupling the sensing matrices, we also evaluate the uncoupled version of the proposed approach, i.e., TKSVD, in the experiments. The results can be found in Fig. 5. It is observable that cTKSVD outperforms all the other methods in terms of the reconstruction MSE. The sensing-matrix-coupled approaches (cKSVD and cTKSVD) are superior to the uncoupled approaches (TKSVD and KHOSVD). The TKSVD method leads to a smaller MSE compared to KHOSVD, as it fully exploits the multidimensional structure. In addition, since cKSVD is not an approach that explicitly considers a multidimensional dictionary, it requires longer training sequences to learn the multilinear structure from the vectorized data. As seen in Fig. 5 (a), to achieve an MSE of 0.02, cTKSVD needs only 2000 training data, while approximately 6000 are required for the cKSVD approach. For the same reason, the performance of cKSVD degrades dramatically when the training data is less than 1000.

C. TCS with Jointly Optimized Sensing Matrix and Dictionary

Now we examine the performance of the proposed joint optimization approach in Algorithm 3. The training data consists of 5000 8 × 8 patches obtained by randomly extracting 25 patches from each of the 200 images in a training set from the Berkeley segmentation dataset [46]. The test data is obtained by extracting non-overlapping 8 × 8 patches from the other 100 images in the dataset. A 2D Discrete Cosine Transform (DCT) is employed to initialize the dictionaries Ψi ∈ R^{8×16} (i = 1, 2), and i.i.d. Gaussian matrices are used as the initial sensing matrices Φi ∈ R^{Mi×8} (i = 1, 2). Random Gaussian noise with variance σ² is added to the measurements at the test stage. We employ TOMP for reconstruction, and the Peak Signal to Noise Ratio (PSNR) is used as the evaluation criterion.

[Figure] Fig. 6: Convergence behavior of various joint optimization methods. (T = 5000, K = 4, M1 = M2 = 6, N1 = N2 = 8, N̂1 = N̂2 = 16, σ² = 0)

[Figure] Fig. 7: PSNR performance of different methods when (a) Mi (i = 1, 2) varies (σ² = 0), (b) σ² varies (M1 = M2 = 6). (T = 5000, K = 4, N1 = N2 = 8, N̂1 = N̂2 = 16)

In the first experiment, we examine the convergence behavior of Algorithm 3 when the proposed Approaches I and II are utilized for the sensing matrix optimization step (respectively denoted by I + cTKSVD and II + cTKSVD). We take M1 = M2 = 6, and no noise is added to the measurements at the test stage, i.e., σ² = 0.

performed previously to obtain the results in Fig. 1 and 4, the


parameters are chosen as: α = 3, β = 0.8, γ = 1/8. The
step size for II + cTKSVD is set as: η = 1e − 5. The PSNR
performance for different numbers of iterations is illustrated in
Fig. 6. Since Sapiro’s approach in [18] also jointly optimizes
the sensing matrix and dictionary, we include it in this figure
(denoted by Sapiro’s + cKSVD). The parameter γ is optimal at
1/2 for cKSVD under our settings. However, note that Sapiro’s
approach is only for vectorized signals in the conventional
CS problem, i.e., a single sensing matrix Φ ∈ R36×64 and
a dictionary Ψ ∈ R64×256 are obtained. It is not suitable
for a practical TCS system, where separable multidimensional
sensing matrices Φi ∈ R6×8 (i = 1, 2) are required. Even so,
from Fig. 6, we can see the proposed approaches outperform
Sapiro’s approach. All the methods converge in less than 10
iterations, among which II + cTKSVD leads to the highest
PSNR value.
Then the proposed approaches are compared with various
other approaches when the number of measurements (Mi (i =
1, 2)) and the noise variance (σ 2 ) vary. Specifically, using the
notation employed previously and by denoting the method of
combining sensing matrix design with that of the dictionary
learning using a “+”, the methods for comparison are: II +
TKSVD, Gaussian + cTKSVD, Sapiro’s + cKSVD and SS Fig. 8: Reconstruction example when M1 = M2 = 6. The
+ KHOSVD. In these approaches, II + TKSVD and SS + images from left to right, top to bottom and their PSNR
KHOSVD are uncoupled methods; Gaussian + cTKSVD does (dB) values are: II+cTKSVD (35.41), I+cTKSVD (34.97),
not involve sensing matrix optimization; Sapiro’s + cKSVD is Sapiro’s+cKSVD (33.64), II+TKSVD (33.57), SS+KHOSVD
for conventional CS system only. (28.62), Gaussian+cTKSVD (28.05).
The results are shown in Fig. 7. We can see that the proposed
approaches obtain higher PSNR values than all of the other
methods and II + cTKSVD performs best. To see the gain
in the literature. Further gain is obtained by coupling the
of coupling sensing matrices during dictionary learning and
multidimensional sensing matrix while learning the dictionary.
optimizing the sensing matrices, respectively, we compare II
The performance advantage of the proposed approaches has
+ cTKSVD with II + TKSVD and Gaussian + cTKSVD. For
been demonstrated by experiments using both synthetic data
instance, when Mi = 5, σ 2 = 0, II + cTKSVD has a gain
and real images.
of about 3dB over II + TKSVD and nearly 9dB over Gaus-
sian + cTKSVD. Although Sapiro’s + cKSVD has a similar
performance to ours at some specific settings, it is not for a A PPENDIX A
TCS system that requires multiple separable sensing matrices. P ROOF OF T HEOREM 3
Examples of reconstructed images using these methods are
demonstrated in Fig. 8 and 9 with the corresponding PSNR T
Assume Ai = Φi Ψi = UAi [ ΛAi 0 ] VA i
is an SVD
values listed. All of the conducted simulations verify that of Ai for i = 1, 2 and rank(Ai ) = Mi . Then the objective
the proposed methods of multidimensional sensing matrix and we want to minimize in (17) can be rewritten as:
dictionary optimization improve the performance of a TCS
system.
Λ2A2 Λ2A1
   
0 T 0 T 2
IN̂ − (VA2 VA ) ⊗ (VA1 VA ) F.
1 N̂2 0 0 2 0 0 1

VI. C ONCLUSIONS
Λ2A2
   2 
In this paper, we propose to jointly optimize the multidimen- 0 ΛA1 0
Denote Σ = ⊗ = diag(ν A2 ⊗
sional sensing matrix and dictionary for TCS systems. To ob- 0 0 0 0
Λ2Ai 0

tain the optimized sensing matrices, a separable approach with ν A1 ), ν Ai = diag( ), then we have
closed form solutions has been presented and a joint iterative 0 0
approach with novel design measures has also been proposed.
T T
The iterative approach certainly has higher complexity, but also ||IN̂1 N̂2 − (VA2 ⊗ VA1 )Σ(VA 2
⊗ VA 1
)||2F . (51)
exhibits better performance. An approach to learning the mul-
tidimensional dictionary has been designed, which explicitly Let ν Ai = [(vi )1 , ..., (vi )Mi , 0]T , then the sub-vector of
takes the multidimensional structure into account and removes the diagonal of Σ containing its non-zero values is: ν̂ =
the redundant updates in the existing multilinear approaches [(v2 )1 (v1 )1 , ..., (v2 )1 (v1 )M1 , ..., (v2 )M2 (v1 )1 , ..., (v2 )M2 (v1 )M1 ]T .
13

Since $\mathbf{V}_{A_2} \otimes \mathbf{V}_{A_1}$ is orthonormal and the Frobenius norm is invariant under orthonormal transformations, (51) becomes:

$$\left\| \mathbf{I}_{\hat{N}_1 \hat{N}_2} - \boldsymbol{\Sigma} \right\|_F^2 = \hat{N}_1 \hat{N}_2 - M_1 M_2 + \sum_{p=1}^{M_2} \sum_{q=1}^{M_1} \left( 1 - (v_2)_p (v_1)_q \right)^2 . \quad (52)$$

Therefore the minimum value of (17) is $\hat{N}_1 \hat{N}_2 - M_1 M_2$, and it is achieved when the entries of $\hat{\boldsymbol{\nu}}$ are all unity.

Clearly $\boldsymbol{\Lambda}_{A_i} = \mathbf{I}_{M_i}$ for $i = 1, 2$ is a solution, i.e., $\mathbf{A}_i = \mathbf{U}_{A_i} [\,\mathbf{I}_{M_i} \;\; \mathbf{0}\,] \mathbf{V}_{A_i}^T$ with $\mathbf{U}_{A_i} \in \mathbb{R}^{M_i \times M_i}$ and $\mathbf{V}_{A_i} \in \mathbb{R}^{\hat{N}_i \times \hat{N}_i}$ being arbitrary orthonormal matrices. Then we would like to find $\mathbf{\Phi}_i$ ($i = 1, 2$) such that $\mathbf{\Phi}_i \mathbf{\Psi}_i = \mathbf{U}_{A_i} [\,\mathbf{I}_{M_i} \;\; \mathbf{0}\,] \mathbf{V}_{A_i}^T$. Following the derivation of Theorem 2 in [11], the solution in (18) can be found.

With this solution, for an arbitrary vector $\mathbf{z} \in \mathbb{R}^{M_i}$ we have $\|\mathbf{A}_i^T \mathbf{z}\|_2^2 = \operatorname{tr}(\mathbf{z}^T \mathbf{A}_i \mathbf{A}_i^T \mathbf{z}) = \operatorname{tr}(\mathbf{z}^T \mathbf{z}) = \|\mathbf{z}\|_2^2$, which indicates that the resulting equivalent sensing matrices $\mathbf{A}_i$ ($i = 1, 2$) are Parseval tight frames. In addition, we observe that the solution in (18) can be obtained by separately solving the sub-problems in (19), whose solutions have been derived in [11]. By substituting the solutions of the sub-problems into (17), we can conclude that the minimum remains $\hat{N}_1 \hat{N}_2 - M_1 M_2$.
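As a sanity check on this construction, the sketch below builds $\mathbf{A}_i = \mathbf{U}_{A_i} [\,\mathbf{I}_{M_i} \; \mathbf{0}\,] \mathbf{V}_{A_i}^T$ from random orthonormal factors and verifies both the Parseval tight frame property and the minimum objective value $\hat{N}_1 \hat{N}_2 - M_1 M_2$; all sizes and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def parseval_factor(M, N):
    """A = U [I_M 0] V^T with random orthonormal U (M x M) and V (N x N)."""
    U, _ = np.linalg.qr(rng.standard_normal((M, M)))
    V, _ = np.linalg.qr(rng.standard_normal((N, N)))
    return U @ np.eye(M, N) @ V.T      # np.eye(M, N) plays the role of [I_M 0]

M1, M2, N1, N2 = 3, 4, 6, 7            # illustrative sizes only
A1 = parseval_factor(M1, N1)
A2 = parseval_factor(M2, N2)

# Parseval tight frame property: A_i A_i^T = I_{M_i}.
assert np.allclose(A1 @ A1.T, np.eye(M1))
assert np.allclose(A2 @ A2.T, np.eye(M2))

# The objective attains its minimum N1*N2 - M1*M2.
A = np.kron(A2, A1)
objective = np.linalg.norm(np.eye(N1 * N2) - A.T @ A, "fro") ** 2
assert np.isclose(objective, N1 * N2 - M1 * M2)
```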
REFERENCES

[1] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Information Theory, Feb. 2006.
[2] D. L. Donoho, “Compressed sensing,” IEEE Trans. Information Theory, April 2006.
[3] E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Magazine, vol. 25, no. 2, pp. 21–30, March 2008.
[4] M. Lustig, D. Donoho, and J. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magn. Reson. Med., vol. 58, pp. 1182–1195, 2007.
[5] W. Chen and I. J. Wassell, “Energy-efficient signal acquisition in wireless sensor networks: a compressive sensing framework,” IET Wireless Sensor Systems, vol. 2, no. 1, pp. 1–8, March 2012.
[6] D. L. Donoho, M. Elad, and V. N. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” IEEE Trans. Information Theory, vol. 52, no. 1, pp. 6–18, 2006.
[7] E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589–592, 2008.
[8] M. Elad, “Optimized projections for compressed sensing,” IEEE Trans. Signal Process., vol. 55, no. 12, pp. 5695–5702, 2007.
[9] J. Xu, Y. Pi, and Z. Cao, “Optimized projection matrix for compressive sensing,” EURASIP J. on Advances in Signal Process., vol. 2010, pp. 43, 2010.
[10] W. Chen, M. R. D. Rodrigues, and I. Wassell, “Projection design for statistical compressive sensing: A tight frame based approach,” IEEE Trans. Signal Process., vol. 61, no. 8, pp. 2016–2029, 2013.
[11] G. Li, Z. Zhu, D. Yang, L. Chang, and H. Bai, “On projection matrix optimization for compressive sensing systems,” IEEE Trans. Signal Process., vol. 61, no. 11, pp. 2887–2898, 2013.
[12] N. Cleju, “Optimized projections for compressed sensing via rank-constrained nearest correlation matrix,” Applied Comput. Harmonic Analysis, vol. 36, no. 3, pp. 495–507, 2014.
[13] K. Engan, S. O. Aase, and J. H. Husøy, “Multi-frame compression: Theory and design,” EURASIP J. Signal Process., vol. 80, no. 10, pp. 2121–2140, 2000.
[14] M. Aharon, M. Elad, and A. Bruckstein, “The KSVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, 2006.
[15] I. Tošić and P. Frossard, “Dictionary learning: What is the right representation for my signals,” IEEE Signal Process. Mag., vol. 28, no. 2, pp. 27–38, 2011.
[16] S. K. Sahoo and A. Makur, “Dictionary training for sparse representation as generalization of K-means clustering,” IEEE Signal Process. Letters, vol. 20, no. 6, pp. 587–590, 2013.
[17] W. Dai, T. Xu, and W. Wang, “Simultaneous codeword optimization (SimCO) for dictionary update and learning,” IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6340–6353, 2012.
[18] J. M. Duarte-Carvajalino and G. Sapiro, “Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization,” IEEE Trans. Image Process., vol. 18, no. 7, pp. 1395–1408, 2009.
[19] W. Chen and M. R. D. Rodrigues, “Dictionary learning with optimized projection design for compressive sensing applications,” IEEE Signal Process. Letters, vol. 20, no. 10, pp. 992–995, 2013.
[20] H. Bai, G. Li, S. Li, Q. Li, Q. Jiang, and L. Chang, “Alternating optimization of sensing matrix and sparsifying dictionary for compressed sensing,” IEEE Trans. Signal Process., vol. 63, no. 6, pp. 1581–1594, 2015.
[21] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, vol. 52, no. 3, pp. 471–501, 2010.
[22] E. Candès and B. Recht, “Exact matrix completion via convex optimization,” Commun. ACM, vol. 55, no. 6, pp. 111–119, June 2012.
[23] M. Golbabaee and P. Vandergheynst, “Compressed sensing of simultaneous low-rank and joint-sparse matrices,” arXiv, 2012.
[24] R. Chartrand, “Nonconvex splitting for regularized low-rank + sparse decomposition,” IEEE Trans. Signal Process., vol. 60, no. 11, pp. 5810–5819, 2012.
[25] R. Otazo, E. Candès, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Mag. Res. in Medicine, vol. 73, no. 3, pp. 1125–1136, 2015.
[26] M. F. Duarte and R. G. Baraniuk, “Kronecker compressive sensing,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 494–504, Feb. 2012.
[27] N. D. Sidiropoulos and A. Kyrillidis, “Multi-way compressed sensing for sparse low-rank tensors,” IEEE Signal Process. Letters, vol. 19, no. 11, pp. 757–760, 2012.
[28] S. Friedland, Q. Li, and D. Schonfeld, “Compressive sensing of sparse tensors,” IEEE Trans. Image Process., vol. 23, no. 10, pp. 4438–4447, Oct. 2014.
[29] C. F. Caiafa and A. Cichocki, “Multidimensional compressed sensing and their applications,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 3, no. 6, pp. 355–380, 2013.
[30] C. F. Caiafa and A. Cichocki, “Computing sparse representations of multidimensional signals using Kronecker bases,” Neural Comput., vol. 25, no. 1, pp. 186–220, Jan. 2013.
[31] M. Seibert, J. Wormann, R. Gribonval, and M. Kleinsteuber, “Separable cosparse analysis operator learning,” in Proc. EUSIPCO, 2014, pp. 770–774.
[32] F. Roemer, G. D. Galdo, and M. Haardt, “Tensor-based algorithms for learning multidimensional separable dictionaries,” in Proc. IEEE ICASSP, 2014, pp. 3963–3967.
[33] Y. Peng, D. Meng, Z. Xu, C. Gao, Y. Yang, and B. Zhang, “Decomposable nonlocal tensor dictionary learning for multispectral image denoising,” in Proc. IEEE CVPR, 2014, pp. 2949–2956.
[34] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Magazine, vol. 25, no. 2, pp. 83–91, March 2008.
[35] R. F. Marcia, Z. T. Harmany, and R. M. Willett, “Compressive coded aperture imaging,” SPIE 7246, Comput. Imag. VII, p. 72460G, 2009.
[36] V. Majidzadeh, L. Jacques, A. Schmid, P. Vandergheynst, and Y. Leblebici, “A (256x256) pixel 76.7mW CMOS imager/compressor based on real-time in-pixel compressive sensing,” in Proc. IEEE ISCAS, June 2010, pp. 2956–2959.
[37] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Information Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[38] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Information Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[39] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Applied and Computational Harmonic Analysis, vol. 27, no. 3, pp. 265–274, 2009.
[40] Y. Rivenson and A. Stern, “Compressed imaging with a separable sensing operator,” IEEE Signal Process. Letters, vol. 16, no. 6, pp. 449–452, June 2009.
[41] Y. Rivenson and A. Stern, “Practical compressive sensing of large images,” in Proc. 16th International Conference on Digital Signal Processing, July 2009, pp. 1–8.
[42] W. Chen, M. R. D. Rodrigues, and I. J. Wassell, “On the use of unit-norm tight frames to improve the average MSE performance in compressive sensing applications,” IEEE Signal Process. Letters, vol. 19, no. 1, pp. 8–11, 2012.
[43] C. F. Van Loan, “The ubiquitous Kronecker product,” J. Comput. Applied Mathematics, vol. 123, no. 1, pp. 85–100, 2000.
[44] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM J. on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1253–1278, 2000.
[45] E. van den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.
[46] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proc. IEEE ICCV, 2001, vol. 2, pp. 416–423.
