Analysis of Compressed Sensing With Spatially-Coupled Orthogonal Matrices

arXiv:1402.3215v1 [cs.IT] 13 Feb 2014

Abstract—Recent developments in compressed sensing (CS) have revealed that the use of a specially designed measurement matrix, namely the spatially-coupled matrix, can achieve the information-theoretic limit of CS. In this paper, we consider a measurement matrix that consists of spatially-coupled orthogonal matrices. One example of such matrices is the randomly selected discrete Fourier transform (DFT) matrix. Such a selection enjoys a lower memory complexity and a faster multiplication procedure. Our contribution is the replica calculation of the mean-square error (MSE) of the Bayes-optimal reconstruction for this setup. We illustrate that the reconstruction thresholds under the spatially-coupled orthogonal and Gaussian ensembles are quite different, especially in the noisy cases. In particular, the spatially-coupled orthogonal matrices achieve a faster convergence rate, a lower measurement rate, and a reduced MSE.

I. INTRODUCTION

Compressed sensing (CS) is a signal processing technique that aims to reconstruct a sparse signal in a higher-dimensional (N) space from an underdetermined lower-dimensional (M) measurement space, with the measurement ratio α = M/N as small as possible. In the literature, ℓ1-norm minimization is the most widely used reconstruction scheme because it is convex and hence can be solved very efficiently [1–3]. However, the measurement ratio required by ℓ1-reconstruction for perfect recovery is substantially larger than the information-theoretic limit [4, 5]. If the probabilistic properties of the signal are known, then Bayesian inference offers the optimal reconstruction in the minimum mean-square-error (MSE) sense, but optimal Bayes estimation is not computationally tractable. Building on belief propagation (BP), an efficient and less complex alternative, referred to as approximate message passing (AMP) [6–8], has recently emerged. A remarkable result by Krzakala et al. [8, 9] showed that a sparse signal can be recovered up to its information-theoretic limit if the measurement matrix has a special structure, namely spatial coupling.¹ Roughly speaking, spatially-coupled matrices are random matrices with a band-diagonal structure as shown in Fig. 1. The authors of [8, 9] supported this claim using an insightful statistical physics argument, and the claim has been proven rigorously in [11].

Though AMP is less complex than the Bayes-optimal approach, the implementation of AMP becomes prohibitively complex if the size of the signal is very large. This is not only because AMP still requires matrix multiplications of order O(MN) but also because it requires a large amount of memory to store the measurement matrix. It is therefore of great interest to consider special measurement matrices permitting a faster multiplication procedure and a lower memory complexity. Randomly selected discrete Fourier transform (DFT) matrices are one such example [12, 13]. Using the DFT as the measurement matrix, the fast Fourier transform (FFT) can be used to perform the matrix multiplications in order O(N log₂ N), and the measurement matrix does not need to be stored. However, in contrast to the case of random matrices with independent entries, there are only a few studies on the performance of CS for matrices from the row-orthogonal ensemble [5, 12–17].

The analysis in [5] revealed that the ℓ1-reconstruction thresholds are the same under all measurement matrices sampled from rotationally invariant matrix ensembles. In addition, along the line of ℓ1-reconstruction, the authors of [15] showed that the gain in performance from concatenating random orthogonal matrices is specific to signals with non-uniform sparsity patterns. In a different context, [14] showed that a general class of free random matrices incurs no loss in the noise sensitivity threshold if optimal decoding is adopted. The empirical study in [6] illustrated that the reconstruction ability of AMP is universal with respect to different matrix ensembles. Furthermore, by empirical studies, [12] found that using DFT matrices alters the AMP behavior but does not affect the final performance significantly. From these studies, one might conclude that the reconstruction ability is nearly universal with respect to the measurement matrix ensemble. However, counter-evidence has appeared recently for the case where the measurement is corrupted by additive noise [16, 17], arguing for the superiority of the row-orthogonal ensemble over independent Gaussian ensembles in the noisy setting.² Therefore, it is not fully understood how a measurement matrix from the row-orthogonal ensemble affects the CS performance.

In this paper, our aim is to fill this gap by investigating the MSE of the optimal Bayes inference of sparse signals when the measurement matrix consists of spatially-coupled orthogonal matrices. In particular, by using the replica method, we obtain the state evolution of the MSE for CS with spatially-coupled orthogonal matrices. Based on the derived state evolution, we are able to observe the behavior of CS with orthogonal matrices more closely. Several interesting observations will be made via the statistical physics argument of [8].

¹ The idea of spatial coupling was first introduced in the CS literature by [10], where some limited improvement in performance was observed.
² In fact, the significance of orthogonal matrices in other problems (e.g., CDMA and MIMO) was pointed out in [18, 19].

C.-K. Wen is with the Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan. E-mail: [email protected].
K.-K. Wong is with the Department of Electronic and Electrical Engineering, University College London, London, WC1E 7JE, United Kingdom. E-mail: [email protected].
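To make the memory and complexity argument above concrete, here is a minimal numpy sketch (ours, not the authors'; the function and variable names are hypothetical) of a randomly row-selected DFT measurement: y = Ax is computed as a subsampled FFT in O(N log₂ N) time, without ever forming the M-by-N matrix A.

```python
import numpy as np

def partial_dft_measure(x, rows):
    """Compute y = A x, where A consists of the DFT rows indexed by `rows`
    (unitary 1/sqrt(N) normalization), without forming A explicitly."""
    N = x.shape[0]
    X = np.fft.fft(x)            # O(N log N) instead of O(M N)
    return X[rows] / np.sqrt(N)  # row selection + normalization

# Toy usage: M = 4 measurements of an N = 8 signal with one non-zero entry.
rng = np.random.default_rng(0)
N, M = 8, 4
x = np.zeros(N, dtype=complex)
x[2] = 1.0
rows = rng.choice(N, size=M, replace=False)
y = partial_dft_measure(x, rows)

# Sanity check against the explicitly stored DFT matrix.
F_full = np.fft.fft(np.eye(N)) / np.sqrt(N)
assert np.allclose(y, F_full[rows] @ x)
```

In the spatially-coupled construction of Section II, each block Aq,p would additionally be scaled by √(Jq,p γp); the sketch omits that scaling for brevity.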
As a summary, our finding is that the reconstruction thresholds under the row-orthogonal and i.i.d. Gaussian ensembles are quite different, especially in noisy scenarios. The reconstruction thresholds seem universal only in a very low noise-variance regime. At higher noise variance, the reconstruction thresholds of the row-orthogonal ensemble are significantly lower than those of the i.i.d. Gaussian ensemble. In addition, we notice that the case with spatially-coupled orthogonal matrices enjoys 1) a faster convergence rate, 2) a lower measurement rate, and 3) a lower MSE.

II. PROBLEM FORMULATION

We consider the noisy CS problem

    y = Ax + \sigma z,    (1)

where y ∈ C^M is a measurement vector, A ∈ C^{M×N} denotes a known measurement matrix, x ∈ C^N is a signal vector, z ∈ C^M is a standard Gaussian noise vector, and σ represents the noise magnitude. We denote by α = M/N the measurement ratio (i.e., the number of measurements per variable).

The spatially-coupled matrix A used in this paper follows that in [8]; see Fig. 1. The N components of the signal vector x are split into Lc blocks of Np variables for p = 1, ..., Lc. We denote γp = Np/N. Next, we split the M components of the measurement y into Lr blocks of Mq measurements for q = 1, ..., Lr. As a result, A is composed of Lr × Lc blocks. Each block Aq,p ∈ C^{Mq×Np} is obtained by randomly selecting and re-ordering from the standard DFT matrix multiplied by √(Jq,p γp).³ The measurement ratio of the (q, p)-group is αq,p = Mq/Np. We have an Lr × Lc coupling matrix J ≜ [Jq,p].

³ The standard DFT matrix has been normalized by √(1/Np). However, we assume that the components of Aq,p have variance Jq,p/N. To this end, the factor γp = Np/N is used to adjust the normalization in each block.

CS aims to reconstruct x from y. We suppose that each entry of x is generated independently from a distribution P(x). In particular, the signals are sparse: the fraction of non-zero entries is ρ and their distribution is g. That is,

    P(x) = \prod_{n=1}^{N} P(x_n) = \prod_{n=1}^{N} \big[ (1-\rho)\,\delta(x_n) + \rho\, g(x_n) \big].    (2)

III. ANALYTICAL RESULT

Before proceeding, it is useful to understand the posterior mean estimator (3) by revisiting a scalar single measurement

    y = x + \varsigma^{-\frac{1}{2}} z.    (6)

This is a special case of (1) with M = N = 1. According to (3), the MMSE is achieved by the conditional expectation

    E\{x|y\} = \int x\, p(x|y)\, dx,    (7)

where p(x|y) = p(y|x)p(x)/p(y) and p(y|x) = (ς/π) e^{−ς|y−x|²}. Note that x̂(y) changes with y, but we suppress y for brevity. Finally, we define mmse(·) of this setting as

    \mathrm{mmse}(\varsigma) \triangleq E\big\{ |x - E\{x|y\}|^2 \big\},    (8)

in which the expectation is taken over the joint distribution p(y, x) = p(y|x)p(x).

Explicit expressions of mmse are available for some special signal distributions. For example, if the signal distribution p(x) follows the Bernoulli-Gaussian (BG) density, i.e., g is the standard complex Gaussian distribution, then we have

    E\{x|y\} = \frac{\rho\, \mathcal{N}(y;\, 1+\varsigma^{-1})}{\rho\, \mathcal{N}(y;\, 1+\varsigma^{-1}) + (1-\rho)\, \mathcal{N}(y;\, \varsigma^{-1})} \cdot \frac{y}{1+\varsigma^{-1}},    (9)

where N(y; c) denotes a Gaussian probability density function (pdf) with zero mean and variance c, i.e., N(y; c) ≜ 1/(πc) e^{−|y|²/c}. Then we obtain explicitly

    \mathrm{mmse}(\varsigma) = \rho - \frac{\rho^2 \varsigma}{\varsigma+1} \int \mathrm{D}z\, \frac{|z|^2}{\rho + (1-\rho)(\varsigma+1)\, e^{-|z|^2 \varsigma}},    (10)

where Dz ≜ (dℜz dℑz / π) e^{−|z|²}, with ℜz and ℑz being the real and imaginary parts of z, respectively.

Although the analytical result of mmse for the scalar measurement is available, the task of obtaining the corresponding result for the vector case (1) might appear daunting. Surprisingly, tools from statistical mechanics enable such a development in the large-system limit. The key to finding the statistical properties of (3) is the average free entropy [8]

    \mathcal{F} \triangleq \frac{1}{N}\, E_{y,A}\big[ \log Z(y, A) \big],    (11)
where

    Z(y, A) \triangleq E_x\big[ e^{-\frac{1}{\sigma^2} \| y - Ax \|^2} \big]    (12)

is the partition function. A similar approach has been used under different settings, e.g., [8, 14, 15, 17]. The analysis of (11) is still difficult; the major difficulty lies in the expectations over y and A. We can, nevertheless, greatly facilitate the mathematical derivation by rewriting F as

    \mathcal{F} = \frac{1}{N} \lim_{r \to 0} \frac{\partial}{\partial r} \log E_{y,A}\big[ Z^r(y, A) \big],    (13)

in which we have moved the expectation operator inside the log-function. We first evaluate E_{y,A}[Z^r(y, A)] for integer-valued r, and then generalize it to any positive real number r. This technique is called the replica method [21] and has been widely adopted in the field of statistical physics [22].

In the analysis, we use the assumptions that Np → ∞ for all p = 1, ..., Lc and Mq → ∞ for all q = 1, ..., Lr, while keeping Mq/Np = αq,p fixed and finite. For convenience, we refer to this large-dimensional regime simply as N → ∞. Under the assumption of replica symmetry, the following results are obtained.

Proposition 1: As N → ∞, the free entropy is given by

    \mathcal{F}(\{\varsigma_{q,p}, \varepsilon_{q,p}\}) = \sum_{p=1}^{L_c} \gamma_p\, E_{y_p}\Big\{ \log E_{x_p}\big\{ e^{-\varsigma_p |y_p - x_p|^2} \big\} \Big\} + \sum_{q=1}^{L_r} \sum_{p=1}^{L_c} \gamma_p \varepsilon_{q,p} \varsigma_{q,p} + \sum_{q=1}^{L_r} G(\{\varepsilon_{q,p}\}) + (1 - \alpha),    (14)

where ςp ≜ Σ_{q=1}^{Lr} ςq,p, the outer expectation E_{yp}{·} is taken over the joint distribution

    p(y_p, x'_p) = \frac{\varsigma_p}{\pi}\, e^{-\varsigma_p |y_p - x'_p|^2}\, p(x'_p),    (15)

and

    G(\{\varepsilon_{q,p}\}) \triangleq \operatorname{Extr}_{\{\Lambda_{q,p}\}} \Bigg\{ -\sum_{p=1}^{L_c} \alpha_{q,p} \gamma_p \log\Big( 1 + \frac{\gamma_p J_{q,p}}{\sigma^2 \Lambda_{q,p}} \Big) + \sum_{p=1}^{L_c} \gamma_p \big( \Lambda_{q,p} \varepsilon_{q,p} - \log \Lambda_{q,p} \varepsilon_{q,p} - 1 \big) \Bigg\},    (16)

where Extr_X{···} denotes extremization with respect to X. The quantities {ςq,p, εq,p} are chosen to maximize (14).

Proof: The proposition can be obtained by applying the techniques in [17, 18] after additional manipulations.

Proposition 2: The asymptotic evolution of the MSE εp in each block p is given by

    \varepsilon_p^{(t)} = \mathrm{mmse}\Big( \sum_{q=1}^{L_r} \varsigma_{q,p}^{(t-1)} \Big),    (17)

where

    \Lambda_{q,p}^{(t)} = \frac{1}{\varepsilon_p^{(t)}} - \varsigma_{q,p}^{(t-1)},    (18a)

    \Delta_{q,p}^{(t)} \triangleq \frac{ \alpha_{q,p} \gamma_p J_{q,p} / \Lambda_{q,p}^{(t)} }{ \sigma^2 + \sum_{l=1}^{L_c} \gamma_l J_{q,l} / \Lambda_{q,l}^{(t)} },    (18b)

    \varsigma_{q,p}^{(t)} = \frac{ \Lambda_{q,p}^{(t)} \Delta_{q,p}^{(t)} }{ 1 - \Delta_{q,p}^{(t)} }.    (18c)

As t → ∞ (i.e., in the thermodynamic limit), {ςq,p, Λq,p, ∆q,p} converge to values which locally maximize the free entropy.

Proof: The state evolution of the Bayes-optimal reconstruction corresponds to the steepest ascent of the free entropy (14).

If the ensembles of {Aq,p} are Gaussian, the free entropy shares the same form as (14), but G should be replaced by

    G_{\mathrm{Gaussian}}(\{\varepsilon_{q,p}\}) \triangleq -\sum_{p=1}^{L_c} \alpha_{q,p} \gamma_p \log\Big( 1 + \frac{\gamma_p J_{q,p} \varepsilon_{q,p}}{\sigma^2} \Big).    (19)

In that case, the evolution of the MSE εp in each block p also follows the same form as (17), but ς_{q,p}^{(t)} should be [8]

    \varsigma_{q,p}^{(t)} = \frac{ \alpha_{q,p} \gamma_p J_{q,p} }{ \sigma^2 + \sum_{l=1}^{L_c} \gamma_l J_{q,l}\, \varepsilon_l^{(t)} }.    (20)

It is evident that using row-orthogonal matrices instead of random ones indeed alters the evolution of the MSE and hence the performance of the Bayes-optimal reconstruction.

IV. DISCUSSIONS

To better understand the relation between the free entropy and the MSE of the Bayes-optimal reconstruction, let us first consider the simplest case without spatial coupling, i.e., Lc = Lr = 1 and J1,1 = 1. For notational convenience, we write ς and ε for ς1,1 and ε1,1, respectively. In Fig. 2, we plot the free entropy F(ε) for a signal of density ρ = 0.4, noise variance σ² = 10⁻⁴, and different values of the measurement rate α. Maximizing F(ε) with respect to ε yields the Bayes-optimal MSE. In particular, the state evolution in Proposition 2 performs a steepest ascent on F(ε) and gets trapped at one of the local maxima according to the initial value of ε. Nonetheless, the only initialization that is algorithmically possible, e.g., for BP, is to start from a large value, e.g., ε⁽⁰⁾ = ρ (or larger). This means that BP may converge to a much higher MSE than the MSE corresponding to the global maximum of the free entropy. We observe a phenomenon similar to that described in [8] for Gaussian i.i.d. matrices. Following [8], we define three phase transitions as follows:

• αd is defined as the largest α for which F(ε) has two local maxima.
• αs is defined as the smallest α for which F(ε) has two local maxima.
• αc is defined as the value of α for which the two maxima of F(ε) are identical.

From Fig. 2, we see that αd > αc > αs. If α > αd, then the global maximum of F(ε) is the only maximum, and it is at a small value of MSE comparable to σ². This means that BP can achieve the MSE corresponding to the global maximum.
Fig. 2. The free entropy F(ε) for Lc = Lr = 1, density ρ = 0.4, and noise level σ² = 10⁻⁴ under different values of the measurement rate α (αd = 0.5145, α = 0.4900, αc = 0.4740, αs = 0.4410). Marks correspond to the free entropy which in principle can be achieved by the BP algorithm.

Fig. 3. The dependence of αd, αc, and αs on the noise variance σ² under the row-orthogonal and i.i.d. Gaussian ensembles.

From the above, we have seen three kinds of phase-transition behavior as a function of α for a given σ². The three phase transitions are expected to depend on σ². In Fig. 3, we plot the dependence of αd, αc, and αs on the noise variance under the row-orthogonal and i.i.d. Gaussian ensembles. For larger noise variance, both ensembles appear to have no such sharp thresholds αd, αc, and αs. We observe that at very low noise variance the thresholds of the two ensembles are nearly identical. Based on this observation and the known results for the noise-free case, we conclude that the reconstruction threshold seems universal only at very low noise variance. Nonetheless, at higher noise variance, we can see the superiority of the row-orthogonal ensemble over the i.i.d. Gaussian ensemble from the following three perspectives. First, the BP threshold αd of the row-orthogonal ensemble is lower than that of the i.i.d. Gaussian ensemble. Secondly, the sharp phase transition disappears for σ² ≳ 0.0013 under the i.i.d. Gaussian ensemble, whereas it persists up to σ² ≳ 0.0025 under the row-orthogonal ensemble. Finally, the αc transition line under the row-orthogonal ensemble is much lower than that under the i.i.d. Gaussian ensemble. This implies that, with a proper spatially-coupled matrix A, the BP algorithm under the row-orthogonal ensemble can achieve a good MSE down to a lower measurement rate in the noisy case.

In Fig. 4, we plot the MSE achieved by the BP algorithm as a function of the noise variance σ² when the measurement rate is the αd of Fig. 3. As can be observed, the row-orthogonal ensemble even achieves a lower MSE than the i.i.d. Gaussian ensemble. This, together with the results of Fig. 3, indicates that the row-orthogonal ensemble not only allows a lower measurement rate but also yields a lower MSE.
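The thresholds αd, αs, and αc above are statements about the number of local maxima of F(ε) as α varies. As a self-contained illustration of that bookkeeping, the sketch below bisects for the largest parameter value at which two maxima coexist; since evaluating the actual free entropy (14) is beyond a short snippet, a toy tilted double-well potential stands in for F(ε), with the tilt c playing the role of α.

```python
import numpy as np

def count_local_maxima(f):
    """Count strict interior local maxima of a sampled 1-D curve."""
    inner = f[1:-1]
    return int(np.sum((inner > f[:-2]) & (inner > f[2:])))

def largest_param_with_two_maxima(potential, lo, hi, grid, tol=1e-6):
    """Bisect for the largest parameter at which the sampled curve still has
    two local maxima (the alpha_d-style threshold)."""
    assert count_local_maxima(potential(grid, lo)) == 2
    assert count_local_maxima(potential(grid, hi)) == 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_local_maxima(potential(grid, mid)) >= 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy stand-in for F(epsilon): a tilted double well. Two maxima persist up
# to the tilt c* = 2 / (3*sqrt(3)) ~ 0.3849, where a maximum and the minimum
# between the wells merge and annihilate.
grid = np.linspace(-2.0, 2.0, 4001)
potential = lambda eps, c: -0.25 * (eps**2 - 1.0)**2 + c * eps
c_star = largest_param_with_two_maxima(potential, 0.0, 1.0, grid)
```

The same counting applied to the true F(ε; α) on a grid of ε would recover αd (and, run from the other side, αs), while αc requires comparing the heights of the two maxima.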
Fig. 4. The MSE achieved by the BP algorithm versus the noise variance σ². The measurement rate is at the BP threshold αd. The black solid line (lowermost), MSE = σ², marks the noise level, which is the re-constructibility limit.

Fig. 5. Evolution of the MSE in each block in the noisy case with σ² = 10⁻⁶, under the i.i.d. Gaussian and row-orthogonal ensembles. We use the seeding matrix with W = 2, L = 10, αseed = 0.70, αbulk = 0.49, and J = 0.5.

The development of the AMP algorithm that corresponds to the evolution of the MSE in Proposition 2 is under way.

V. CONCLUSION

We have derived the MSE of the optimal Bayes inference of sparse signals when the measurement matrix consists of spatially-coupled orthogonal matrices. The analysis provides a step towards understanding the behavior of CS with orthogonal matrices. In particular, the numerical results have revealed that spatially-coupled row-orthogonal matrices enjoy a faster convergence rate, a lower measurement rate, and a lower MSE. In addition, we remark that the way to design a spatially-coupled matrix with the row-orthogonal ensemble should be different from that with the Gaussian ensemble, especially in noisy cases. The derived results in this paper can serve as an efficient way to design spatially-coupled orthogonal matrices that have good performance.

REFERENCES

[1] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[2] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[3] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[4] D. Donoho and J. Tanner, "Counting faces of randomly projected polytopes when the projection radically lowers dimension," J. Amer. Math. Soc., vol. 22, no. 1, pp. 1–53, 2009.
[5] Y. Kabashima, T. Wadayama, and T. Tanaka, "A typical reconstruction limit for compressed sensing based on lp-norm minimization," J. Stat. Mech., no. 9, p. L09003, 2009.
[6] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," Proc. Nat. Acad. Sci., 2009.
[7] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," preprint, 2010. [Online]. Available: http://arxiv.org/abs/1010.5141v2.
[8] F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, "Probabilistic reconstruction in compressed sensing: Algorithms, phase diagrams, and threshold achieving matrices," J. Stat. Mech., p. P08009, 2012.
[9] ——, "Statistical physics-based reconstruction in compressed sensing," Phys. Rev. X, vol. 2, p. 021005, 2012.
[10] S. Kudekar and H. Pfister, "The effect of spatial coupling on compressive sensing," in Proc. Allerton Conf. Communication, Control, and Computing, 2010, pp. 347–353.
[11] D. L. Donoho, A. Javanmard, and A. Montanari, "Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing," in Proc. IEEE Int. Symp. Information Theory (ISIT), 2012.
[12] J. Barbier, F. Krzakala, and C. Schülke, "Compressed sensing and approximate message passing with spatially-coupled Fourier and Hadamard matrices," preprint, 2013. [Online]. Available: http://arxiv.org/abs/1312.1740.
[13] A. Javanmard and A. Montanari, "Subsampling at information theoretically optimal rates," in Proc. IEEE Int. Symp. Information Theory (ISIT), 2012, pp. 2431–2435.
[14] A. M. Tulino, G. Caire, S. Verdú, and S. Shamai, "Support recovery with sparsely sampled free random matrices."
[15] Y. Kabashima, M. Vehkaperä, and S. Chatterjee, "Typical l1-recovery limit of sparse vectors represented by concatenations of random orthogonal matrices," J. Stat. Mech., vol. 2012, no. 12, p. P12003, 2012.
[16] Y. Kabashima and M. Vehkaperä, "Signal recovery using expectation consistent approximation for linear observations," preprint, 2014. [Online]. Available: http://arxiv.org/abs/1401.5151.
[17] M. Vehkaperä, Y. Kabashima, and S. Chatterjee, "Analysis of regularized LS reconstruction and random matrix ensembles in compressed sensing," preprint, 2013. [Online]. Available: http://arxiv.org/abs/1312.0256.
[18] K. Takeda, S. Uda, and Y. Kabashima, "Analysis of CDMA systems that are characterized by eigenvalue spectrum," Europhys. Lett., vol. 76, pp. 1193–1199, 2006.
[19] A. Hatabu, K. Takeda, and Y. Kabashima, "Statistical mechanical analysis of the Kronecker channel model for multiple-input multiple-output wireless communication," Phys. Rev. E, vol. 80, p. 061124, 2009.
[20] H. V. Poor, An Introduction to Signal Detection and Estimation. New York: Springer-Verlag, 1994.
[21] S. F. Edwards and P. W. Anderson, "Theory of spin glasses," J. Phys. F: Metal Phys., vol. 5, pp. 965–974, 1975.
[22] H. Nishimori, Statistical Physics of Spin Glasses and Information Processing: An Introduction, ser. Int. Series of Monographs on Physics, no. 111. Oxford, U.K.: Oxford Univ. Press, 2001.