A Generic Proximal Algorithm For Convex Optimization - Application To Total Variation Minimization
IEEE Signal Processing Letters, vol. 21, no. 8, pp. 985–989, August 2014
beyond, it is not possible to manipulate, at every iteration of an algorithm, matrices of size N × N, like the Hessian of a function. So, proximal algorithms, which only exploit first-order information of the functions, are often the only viable way to solve (2). In this paper, we place ourselves in the case where it is not possible to apply, at every iteration of an algorithm, an inverse operator like (Id + L_m^* L_m)^{-1}, which amounts to solving a linear system, or operators like prox_{h_m ◦ L_m} or prox_{f+g}. Also, we exclude nested strategies, which consist in iteratively solving an optimization problem at every iteration, as this raises theoretical and practical convergence issues [14]. Thus, we propose two new proximal algorithms to solve the problem (2) by full splitting; that is, at every iteration, the only operations involved are evaluations of ∇f, prox_g, prox_{h_m}, L_m, or the adjoint operators L_m^*. Thus, it is required that these evaluations are "simple"; that is, that they can be performed in time like O(N) or O(N log(N)), where N is the dimension of the ambient space X or U_m.

The paper aims at making the optimization algorithms proposed by the author in [15] known to the signal and image processing community. Thus, the two algorithms are presented in Sect. II in a slightly simplified form, along with the corresponding convergence results, but the mathematical proofs are omitted. The relationship of the algorithms with other methods of the literature is discussed. In Sect. III, we detail, as a proof of concept, the application to inverse imaging problems regularized by the total variation.

II. PROPOSED ALGORITHMS AND CONVERGENCE ANALYSIS

The two proposed algorithms are given in Fig. 1. Note that we make use of h_m^* ∈ Γ_0(U_m), the Fenchel–Rockafellar conjugate of h_m, which satisfies the useful Moreau identity: for every u ∈ U_m and real σ > 0, prox_{σh_m^*}(u) = u − σ prox_{h_m/σ}(u/σ). Some classical expressions of the gradient and proximity operator are provided in Fig. 2. The two algorithms behave very similarly in practice; which one is the most appropriate for a particular problem depends on the way they are implemented, since the different placement of the extrapolation step 2◦̃^(i+1) − ◦^(i) can lead to different memory storage requirements.

Fig. 2. Expressions of the gradient and proximity operator of a convex function g : X → R ∪ {+∞}, in some classical cases. In the table, x is an arbitrary element of some real Hilbert space X, τ a positive real, y an arbitrary element of some real Hilbert space Y, A : X → Y a linear operator, A† its Moore–Penrose pseudoinverse, Ω a subset of X.

  g(x)                                  prox_{τg}(x)                                ∇g(x)
  0                                     x                                           0
  ı_Ω(x)                                P_Ω(x)
  ı_{(R_+)^N}(x)                        (max{x_n, 0})_{n=1}^N
  ‖x‖_1 = Σ_{n=1}^N |x_n|               (sgn(x_n) max{|x_n| − τ, 0})_{n=1}^N
  ı_{{x' : Ax' = y}}(x)                 x + A†(y − Ax)
  (1/2)‖Ax − y‖²                        (Id + τA*A)^{-1}(x + τA*y)                  A*(Ax − y)
  ⟨Ax, y⟩ = ⟨x, A*y⟩                    x − τA*y                                    A*y
  (1/2)⟨Ax, x⟩                          (Id + τA)^{-1} x                            Ax
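As a concrete illustration (ours, not part of the original paper), the following Python/NumPy sketch implements two entries of Fig. 2 and the Moreau identity recalled above; the function names and the small numerical example are our own.

    import numpy as np

    def prox_l1(x, tau):
        # prox of tau * ||.||_1 (Fig. 2): componentwise soft-thresholding.
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def prox_box(x, lo=0.0, hi=255.0):
        # prox of the indicator of Omega = [lo, hi]^N (Fig. 2): projection, i.e. clipping.
        return np.clip(x, lo, hi)

    def prox_conjugate(u, sigma, prox_h):
        # Moreau identity: prox_{sigma h*}(u) = u - sigma * prox_{h/sigma}(u / sigma),
        # where prox_h(v, t) must return prox_{t h}(v).
        return u - sigma * prox_h(u / sigma, 1.0 / sigma)

    u = np.array([3.0, -0.2, 0.5])
    # For h = ||.||_1, prox_{sigma h*} is the projection onto [-1, 1]^N, whatever sigma is.
    print(prox_conjugate(u, sigma=2.0, prox_h=prox_l1))   # [ 1.  -0.2  0.5]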
The convergence results, whose proofs can be found in [15], are the following:

Theorem 1. Suppose that the parameters in Algorithm 1 or Algorithm 2 satisfy:
  (i) τ (β/2 + σ ‖Σ_{m=1}^M L_m^* L_m‖) < 1, where the Lipschitz constant β is defined in (3) and ‖·‖ is the operator norm.
  (ii) ρ ∈ ]0, 1].
Then both sequences (x̃^(i))_{i∈N} and (x^(i))_{i∈N} generated by Algorithm 1 or Algorithm 2 converge (weakly if X has infinite dimension) to an element x̂ ∈ X solution of the problem (2).

Theorem 2. Suppose that f = 0, that the spaces X and U_m have finite dimension, and that the parameters in Algorithm 1 or Algorithm 2 satisfy:
  (i) τ σ ‖Σ_{m=1}^M L_m^* L_m‖ ≤ 1.
  (ii) ρ ∈ ]0, 2[.
Then both sequences (x̃^(i))_{i∈N} and (x^(i))_{i∈N} generated by Algorithm 1 or Algorithm 2 converge to an element x̂ ∈ X solution of the problem (2).
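For illustration (this helper is ours, not from the paper), condition (i) of Theorem 1 can be enforced by fixing σ and taking τ just below 1/(β/2 + σ‖Σ_m L_m^* L_m‖). With the values β = ‖A‖² = 1 and ‖D*D‖ ≤ 8 of the total variation problem of Sect. III, this yields τ = 0.99/(0.5 + 8σ), the value used in the experiment of Fig. 4.

    def step_sizes(beta, norm_LtL, sigma, margin=0.99):
        # Largest tau (up to the safety margin) satisfying condition (i) of Theorem 1:
        # tau * (beta / 2 + sigma * ||sum_m L_m^* L_m||) < 1.
        tau = margin / (beta / 2.0 + sigma * norm_LtL)
        assert tau * (beta / 2.0 + sigma * norm_LtL) < 1.0
        return tau

    # TV-regularized deconvolution of Sect. III: beta = ||A||^2 = 1 and ||D* D|| <= 8,
    # which gives tau = 0.99 / (0.5 + 8 * sigma), the value used in Fig. 4.
    print(step_sizes(beta=1.0, norm_LtL=8.0, sigma=0.03))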
We refer to the mathematical article [15] for a more detailed study of the convergence conditions, including error terms and variable relaxation parameters. Moreover, the proposed approach is primal-dual, in the sense that (u_1^(i), ..., u_M^(i)) converges to an element (û_1, ..., û_M) ∈ U_1 × ··· × U_M solution to the dual problem of (2).
Qualification constraints other than the ones given in Sect. I can be found in [16]. Also, the problem (2) and the algorithms can be generalized to include infimal convolutions, or to solve monotone inclusions instead of optimization problems [17]. Finally, the metric can be changed, with potential acceleration [18], [19].

A. Particular Cases and Related Work

The most classical splitting methods to minimize the sum of two functions are the forward-backward and Douglas–Rachford algorithms, see [13] and references therein. In our notations, they allow to minimize f(x) + g(x) and g(x) + h(x), respectively. Until recently, there was no convenient way of solving a problem like (2) with nontrivial linear operators L_m. A step forward regarding this issue was made in [20]: the Chambolle–Pock algorithm allows to minimize g(x) + h(Lx). Several other algorithms have been proposed recently for particular instances of the problem (2) [4], [16], [21]–[25], but only the algorithm in [16] is a rival to ours and allows to solve the problem (2) in whole generality.

In the case f = 0, the proposed algorithms revert to the ones of Chambolle–Pock, with additional relaxation. Indeed, according to Theorem 2, we allow a value ρ close to 2, instead of ρ = 1 in [20], which can significantly speed up the convergence. Moreover, the convergence is guaranteed with the choice στ‖Σ_m L_m^* L_m‖ = 1, which we recommend in practice, whereas the condition στ‖Σ_m L_m^* L_m‖ < 1 was given in [20]. This new condition is important, since it allows to recover the Douglas–Rachford algorithm as a particular case of ours, when f = 0, M = 1, L_1 = Id, by setting σ = 1/τ to let τ appear as the only parameter.

If M = 0 and one simply wants to minimize f(x) + g(x), the proposed algorithms revert to the forward-backward algorithm. We remark that there are often several ways to assign the functions of a given problem to the terms f, g, h_m ◦ L_m in (2). In particular, a function like (1/2)‖A· − y‖² in (4) can be assigned either to f or to a term h_m ◦ L_m with L_m = A. These two formulations yield different algorithms. Although it is hard to make general statements, assigning a function to f whenever one can is probably better for the convergence speed, because of the serial way the variables are updated: the step of gradient descent with respect to f updates and improves the variable x, and this updated version is used to update the dual variables u_m. By contrast, the variables u_m are updated independently and in parallel, with respect to the antagonist functions h_m, before being essentially averaged to form the new estimate of x. So, except if the algorithm is run on a parallel architecture, the higher M is, the slower the convergence is. For the same reason, one should make use of the function g instead of a term h_m ◦ Id, especially because every iterate x̃^(i), as well as x^(i) if ρ ≤ 1, belongs to dom(g); for instance, if g = ı_Ω, then x̃^(i) ∈ Ω, for every i ≥ 1.
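As an aside, a minimal Python sketch (ours, not the paper's Fig. 1) of the standard forward-backward iteration mentioned above; by Theorem 1 with M = 0, the step size only has to satisfy τβ/2 < 1.

    def forward_backward(x, grad_f, prox_g, beta, n_iter=200, rho=1.0):
        # Forward-backward splitting for min_x f(x) + g(x) (the case M = 0 above).
        # Theorem 1 with M = 0 only requires tau * beta / 2 < 1 and rho in ]0, 1].
        tau = 1.9 / beta
        for _ in range(n_iter):
            x_tilde = prox_g(x - tau * grad_f(x), tau)   # prox_g(v, t) returns prox_{t g}(v)
            x = rho * x_tilde + (1.0 - rho) * x
        return x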
III. APPLICATION TO INVERSE PROBLEMS REGULARIZED BY THE TOTAL VARIATION

A. Formulation of the Method

In this section, we consider inverse problems in imaging. So, we first place ourselves in the space X = R^{N_h × N_v} of grayscale images of size N_h columns times N_v rows, endowed with the usual Euclidean inner product. We want to restore or reconstruct an image x̂ by solving

  Find x̂ ∈ arg min_{x ∈ Ω} (1/2)‖Ax − y‖² + λ · TV(x),    (4)

where
  • y, which lives in a real Hilbert space Y, represents the available data.
  • A : X → Y is the linear operator modeling the acquisition process.
  • Ω is a closed and convex subset of X.
  • λ > 0 is a tradeoff parameter to tune, depending on the properties of A and the noise level.

The discrete total variation, denoted by TV in (4), is defined as follows. We define the discrete gradient operator D : X → X², which maps an image x to a pair of images (u_h, u_v) with, for every k_h = 1, ..., N_h, k_v = 1, ..., N_v,

  u_h[k_h, k_v] = x[k_h, k_v] − x[k_h − 1, k_v] if k_h ≥ 2, and 0 else,
  u_v[k_h, k_v] = x[k_h, k_v] − x[k_h, k_v − 1] if k_v ≥ 2, and 0 else.

Note that ‖D*D‖ ≤ 8 [20]. Then we have TV(x) = ‖Dx‖_{1,2}, where ‖(u_h, u_v)‖_{1,2} = Σ_{k_h=1}^{N_h} Σ_{k_v=1}^{N_v} √(u_h[k_h, k_v]² + u_v[k_h, k_v]²). Let us set h = λ‖·‖_{1,2}. For every σ > 0, we have

  prox_{σh*} : (u_h, u_v) ↦ (u_h, u_v) / max(√(u_h² + u_v²)/λ, 1),    (5)

which does not depend on σ, and for which the operations are to be understood as pixelwise.
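For concreteness, here is a short NumPy sketch (ours, not from the paper) of the discrete gradient D defined above, of its adjoint D* (a discrete negative divergence), and of the pixelwise operator (5); arrays are indexed as x[row, column], which only changes which axis is called horizontal.

    import numpy as np

    def D(x):
        # Discrete gradient: forward differences with a zero at the first index,
        # matching u[k] = x[k] - x[k-1] if k >= 2, and 0 else.
        uh = np.zeros_like(x); uh[:, 1:] = x[:, 1:] - x[:, :-1]   # horizontal differences
        uv = np.zeros_like(x); uv[1:, :] = x[1:, :] - x[:-1, :]   # vertical differences
        return uh, uv

    def D_adjoint(uh, uv):
        # Adjoint D*, so that <D x, u> = <x, D* u>.
        x = np.zeros_like(uh)
        x[:, 1:] += uh[:, 1:]; x[:, :-1] -= uh[:, 1:]
        x[1:, :] += uv[1:, :]; x[:-1, :] -= uv[1:, :]
        return x

    def prox_sigma_h_star(uh, uv, lam):
        # Operator (5): pixelwise projection of (uh, uv) onto the ball of radius lam.
        scale = np.maximum(np.sqrt(uh ** 2 + uv ** 2) / lam, 1.0)
        return uh / scale, uv / scale

    # Adjoint test: <D x, u> equals <x, D* u> up to rounding errors.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 48))
    uh, uv = rng.standard_normal((2, 64, 48))
    gh, gv = D(x)
    assert abs(np.sum(gh * uh) + np.sum(gv * uv) - np.sum(x * D_adjoint(uh, uv))) < 1e-8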
Hence, the problem (4) can be put under the form (2), with:
  • f(x) = (1/2)‖Ax − y‖², whose gradient, given in Fig. 2, has the Lipschitz constant β = ‖A‖².
  • g(x) = ı_Ω(x).
  • M = 1, so that we omit the index m for simplicity, U = X², λ · TV = h ◦ L with h = λ‖·‖_{1,2}, L = D.
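To show how the pieces fit together, here is a Python sketch of a primal–dual iteration of the type proposed in [15], specialized to (4). The exact update rules of Algorithms 1 and 2 are given in Fig. 1, which is not reproduced in this excerpt, so this listing is only a plausible instance under that caveat; it reuses D, D_adjoint and prox_sigma_h_star from the previous sketch, and A, A_adjoint stand for the acquisition operator and its adjoint.

    import numpy as np

    def tv_restore(y, A, A_adjoint, lam, sigma=0.03, n_iter=300, rho=1.0, lo=0.0, hi=255.0):
        # Primal-dual sketch for (4): min_{x in [lo, hi]^N} 1/2 ||A x - y||^2 + lam * TV(x).
        # Step sizes per Theorem 1, assuming beta = ||A||^2 = 1 and using ||D* D|| <= 8.
        tau = 0.99 / (0.5 + 8.0 * sigma)
        x = A_adjoint(y)                                   # initial estimate
        uh, uv = np.zeros_like(x), np.zeros_like(x)        # dual variable in U = X^2
        for _ in range(n_iter):
            # Primal update: gradient step on f, dual term through D*, then prox of g = clipping.
            x_tilde = np.clip(x - tau * A_adjoint(A(x) - y) - tau * D_adjoint(uh, uv), lo, hi)
            # Dual update at the extrapolated point 2 x_tilde - x, then operator (5).
            eh, ev = D(2.0 * x_tilde - x)
            uh_tilde, uv_tilde = prox_sigma_h_star(uh + sigma * eh, uv + sigma * ev, lam)
            # Relaxation (Theorem 1 requires rho in ]0, 1]).
            x = rho * x_tilde + (1.0 - rho) * x
            uh = rho * uh_tilde + (1.0 - rho) * uh
            uv = rho * uv_tilde + (1.0 - rho) * uv
        return x

    # Example with A = Id (pure TV denoising):
    # x_hat = tv_restore(y, A=lambda v: v, A_adjoint=lambda v: v, lam=10.0)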
Now we turn our attention to the case of color images. A color image x has three red (R), green (G), blue (B) channels x_R, x_G, x_B, which can be manipulated like grayscale images. Equivalently, the pixel value of x at location k = (k_h, k_v) is the vector x[k] = [x_R[k]  x_G[k]  x_B[k]]^T. It is well known that the R, G, B channels of natural images are strongly correlated. So, it is often better to work within a luminance-chrominance representation where, in first approximation, the information is decorrelated. So, we define the orthonormal change of basis which maps a vector in the R, G, B basis to a vector of luminance, green-red and yellow-blue opponent chrominance:

  [ x_L     ]   [  1/√3    1/√3    1/√3  ] [ x_R ]
  [ x_{G/R} ] = [ −1/√2    1/√2    0     ] [ x_G ]    (6)
  [ x_{Y/B} ]   [  1/√6    1/√6   −2/√6  ] [ x_B ]

Note that W^{-1} = W^T, where W is the 3 × 3 matrix in (6).
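A minimal NumPy sketch (ours) of the change of basis (6), applied pixelwise to a color image stored with shape (rows, columns, 3), with a check of the orthonormality W^{-1} = W^T:

    import numpy as np

    # Orthonormal RGB -> luminance / opponent-chrominance matrix W of (6).
    W = np.array([[ 1 / np.sqrt(3),  1 / np.sqrt(3),  1 / np.sqrt(3)],
                  [-1 / np.sqrt(2),  1 / np.sqrt(2),  0.0],
                  [ 1 / np.sqrt(6),  1 / np.sqrt(6), -2 / np.sqrt(6)]])
    assert np.allclose(W @ W.T, np.eye(3))                 # W^{-1} = W^T

    def rgb_to_lcc(x):
        # x has shape (rows, columns, 3); apply W to every pixel vector (x_R, x_G, x_B).
        return x @ W.T

    def lcc_to_rgb(z):
        # Inverse transform, using W^{-1} = W^T.
        return z @ W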
Then we introduce the color total variation as a regularizer of color images: TV(x) = Σ_{k ∈ N_h × N_v} √(‖u_h[k]‖² + ‖u_v[k]‖²), where (u_h, u_v) = Dx is the pair of color images such that (u_h^L, u_v^L) = µ Dx^L, (u_h^{G/R}, u_v^{G/R}) = Dx^{G/R}, (u_h^{Y/B}, u_v^{Y/B}) = Dx^{Y/B}. The real parameter µ > 0 in the
B. Experiments

We first consider the classical problem of deconvolution of a grayscale image, as illustrated in Fig. 3. We solve the problem (4) with A a lowpass convolution operator, with symmetric boundary conditions, so that ‖A‖ = 1. We set Ω = [0, 255]^{N_h × N_v}, so that prox_{τg} = P_Ω is clipping: the pixel values in the image larger than 255 or lower than 0 are set to 255 and 0, respectively. This choice is known to be better than Ω = X to limit the appearance of oscillation artifacts. Note that many optimization algorithms in the literature allow to perform deconvolution, but in most cases, artificial periodic boundary conditions are assumed, in order to use the FFT to invert convolutions in the Fourier domain. Since only the operators A and A* are called with the proposed algorithms, every type of boundary conditions can be used. The method is flexible and can be adapted without difficulty to spatially-varying blur [27], by changing the operator A, or to the presence of Poisson-Gaussian noise, by replacing the least-squares in f by the appropriate negative log-likelihood [28].

To compare the convergence speed with a well-known algorithm, we solve the same problem (4) without the constraint x ∈ Ω, i.e. g = 0, with the proposed Algorithm 1 and with the alternating direction method of multipliers (ADMM) [29], also known as split Bregman [30]–[32]. In our case, ADMM consists in iterating [32]:

  z^(i+1) := prox_{λ‖·‖_{1,2}/α}(Dx^(i) − p^(i)/α),
  x^(i+1) := (αD*D + A*A)^{-1}(A*y + αD*z^(i+1) + D*p^(i)),
  p^(i+1) := p^(i) + α(z^(i+1) − Dx^(i+1)).

At every iteration, the linear system is solved approximately with one Richardson iteration. Note that the guarantee of convergence is lost in that case. We consider the same conditions as in Fig. 3 and α = 1e-3. The number i of iterations to reach a RMSE ‖x̂ − x^(i)‖/√(N_h N_v) of 2 gray levels is 3481 and 3608 with the proposed Algorithm 1 and ADMM, respectively.
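For completeness, a sketch (ours) of this ADMM iteration, with the linear system replaced by a single Richardson step x ← x − ω((αD*D + A*A)x − b); the step size ω is our assumption, as the paper does not specify it, and D, D_adjoint, A, A_adjoint are as in the earlier sketches.

    import numpy as np

    def prox_l12(uh, uv, t):
        # prox of t * ||.||_{1,2}: pixelwise shrinkage of the gradient magnitude by t
        # (this is the prox of the norm itself, not of its conjugate as in (5)).
        norm = np.sqrt(uh ** 2 + uv ** 2)
        shrink = np.maximum(1.0 - t / np.maximum(norm, 1e-12), 0.0)
        return uh * shrink, uv * shrink

    def admm_tv(y, A, A_adjoint, lam, alpha=1e-3, n_iter=4000, omega=1.0):
        # ADMM / split Bregman iteration above; the x-update solves
        # (alpha D* D + A* A) x = b with a single Richardson step of (assumed) size omega.
        x = A_adjoint(y)
        ph, pv = np.zeros_like(x), np.zeros_like(x)
        for _ in range(n_iter):
            dh, dv = D(x)
            zh, zv = prox_l12(dh - ph / alpha, dv - pv / alpha, lam / alpha)
            b = A_adjoint(y) + alpha * D_adjoint(zh, zv) + D_adjoint(ph, pv)
            Cx = alpha * D_adjoint(*D(x)) + A_adjoint(A(x))
            x = x - omega * (Cx - b)                       # one Richardson iteration
            dh, dv = D(x)
            ph = ph + alpha * (zh - dh)
            pv = pv + alpha * (zv - dv)
        return x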
Fig. 4. Joint demosaicing-deconvolution of the image v depicted in (a). In (b), the reconstructed image solution to (4) with λ = 1.5, µ = 0.2, obtained with 300 iterations of Algorithm 1. The blurring filter is Gaussian (std. dev. 2), the noise is white and Gaussian (std. dev. 5), σ = 0.03, τ = 0.99/(0.5 + 8σ), ρ = 1. u_h^(0), u_v^(0) are set to zero and x^{(0),R} = x^{(0),G} = x^{(0),B} = y.

The second experiment consists in reconstructing a color image by joint deblurring-demosaicing-denoising, see [33], [34] for a presentation of the problem. We solve (4) with A = MB, where B is the same blurring operator as in the first experiment, applied on each R, G, B channel, and M is the Bayer mosaicing operator [33], [34]. We have ‖A‖ = 1. We set Ω = [0, 255]^{N_h × N_v × 3}. Matlab code implementing Algorithm 1 and generating the images in Figs. 3 and 4 is available on the webpage of the author.

IV. CONCLUSION

We proposed two algorithms to exactly solve a large class of convex optimization problems. The simplicity, universality and ease of implementation of the algorithms make them well suited to prototyping new methods, for instance to test various types of regularization for some inverse problem. However, the algorithms do not exploit any further structure of the problem at hand that may be present. So, our future work will focus on the theoretical study of the convergence rates, on the development of possible accelerations, and on the practical comparison, in terms of computation time and memory usage, with other algorithms, for several typical large-scale problems.
REFERENCES

[1] S. Becker, E. Candès, and M. Grant, "Templates for convex cone problems with applications to sparse signal recovery," Mathematical Programming Computation, vol. 3, no. 3, pp. 165–218, Sept. 2011.
[2] S. Anthoine, J. F. Aujol, C. Mélot, and Y. Boursier, "Some proximal methods for Poisson intensity CBCT and PET," Inverse Problems and Imaging, vol. 6, no. 4, Nov. 2012.
[3] E. Y. Sidky, J. H. Jørgensen, and X. Pan, "Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle–Pock algorithm," Physics in Medicine and Biology, vol. 57, no. 10, pp. 3065–3091, 2012.
[4] P. Chen, J. Huang, and X. Zhang, "A primal-dual fixed point algorithm for convex separable minimization with applications to image restoration," Inverse Problems, vol. 29, no. 2, 2013.
[5] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Optimization with sparsity-inducing penalties," Foundations and Trends in Machine Learning, vol. 4, no. 1, pp. 1–106, 2012.
[6] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of ACM, vol. 58, no. 1, pp. 1–37, 2009.
[7] D. Cremers, T. Pock, K. Kolev, and A. Chambolle, "Convex relaxation techniques for segmentation, stereo and multiview reconstruction," in Markov Random Fields for Vision and Image Processing. MIT Press, 2011.
[8] J. Lellmann, D. Breitenreicher, and C. Schnörr, "Fast and exact primal-dual iterations for variational problems in computer vision," in Proc. of ECCV: Part II, Heraklion, Crete, Greece, 2010, pp. 494–505.
[9] J.-L. Starck, F. Murtagh, and J. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
[10] A. Chambolle, V. Caselles, D. Cremers, M. Novaga, and T. Pock, "An introduction to total variation for image analysis," in Theoretical Foundations and Numerical Methods for Sparse Recovery, vol. 9. De Gruyter, Radon Series Comp. Appl. Math., 2010, pp. 263–340.
[11] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York: Springer, 2011.
[12] C. Chaux, P. L. Combettes, J.-C. Pesquet, and V. R. Wajs, "A variational formulation for frame based inverse problems," Inverse Problems, vol. 23, pp. 1495–1518, June 2007.
[13] P. L. Combettes and J.-C. Pesquet, "Proximal splitting methods in signal processing," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds. New York: Springer-Verlag, 2010.
[14] P. Machart, S. Anthoine, and L. Baldassarre, "Optimal computational trade-off of inexact proximal methods," 2012, preprint arXiv:1210.5034.
[15] L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," J. Optimization Theory and Applications, vol. 158, no. 2, pp. 460–479, 2013.
[16] P. L. Combettes and J.-C. Pesquet, "Primal–dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators," Set-Valued and Variational Analysis, vol. 20, no. 2, pp. 307–330, 2012.
[17] B. C. Vũ, "A splitting algorithm for dual monotone inclusions involving cocoercive operators," Advances in Computational Mathematics, vol. 38, no. 3, pp. 667–681, Apr. 2013.
[18] T. Pock and A. Chambolle, "Diagonal preconditioning for first order primal–dual algorithms in convex optimization," in Proc. of ICCV, Nov. 2011.
[19] P. L. Combettes and B. C. Vũ, "Variable metric forward–backward splitting with applications to monotone inclusions in duality," Optimization, 2013, to be published.
[20] A. Chambolle and T. Pock, "A first-order primal–dual algorithm for convex problems with applications to imaging," Journal of Mathematical Imaging and Vision, vol. 40, no. 1, pp. 120–145, 2011.
[21] L. M. Briceño-Arias and P. L. Combettes, "A monotone+skew splitting model for composite monotone inclusions in duality," SIAM J. Optim., vol. 21, no. 4, pp. 1230–1250, 2011.
[22] H. Raguet, J. Fadili, and G. Peyré, "Generalized forward–backward splitting," SIAM Journal on Imaging Sciences, vol. 6, no. 3, pp. 1199–1226, 2013.
[23] I. Loris and C. Verhoeven, "On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty," Inverse Problems, vol. 27, no. 12, 2011.
[24] J.-C. Pesquet and N. Pustelnik, "A parallel inertial proximal optimization method," Pacific Journal of Optimization, vol. 8, no. 2, pp. 273–305, Apr. 2012.
[25] L. M. Briceño-Arias, "Forward–Douglas–Rachford splitting and forward–partial inverse method for solving monotone inclusions," Optimization, 2013, to be published.
[26] X. Bresson and T. F. Chan, "Fast dual minimization of the vectorial total variation norm and applications to color image processing," Inverse Problems and Imaging, vol. 2, no. 4, pp. 455–484, Nov. 2008.
[27] F. Soulez, É. Thiébaut, and L. Denis, "Restoration of hyperspectral astronomical data with spectrally varying blur," EAS Publications Series, vol. 59, pp. 403–416, 2013.
[28] A. Jezierska, E. Chouzenoux, J.-C. Pesquet, and H. Talbot, "A convex approach for image restoration with exact Poisson-Gaussian likelihood," 2013, preprint hal-00922151, submitted to IEEE Trans. Image Proc.
[29] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[30] T. Goldstein and S. Osher, "The split Bregman method for L1-regularized problems," SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 323–343, 2009.
[31] S. Setzer, "Operator splittings, Bregman methods and frame shrinkage in image processing," International Journal of Computer Vision, vol. 92, no. 3, pp. 265–280, 2011.
[32] E. Esser, "Applications of Lagrangian-based alternating direction methods and connections to split Bregman," 2009, Tech. Rep. 09-31, UCLA.
[33] F. Soulez and É. Thiébaut, "Joint deconvolution and demosaicing," in Proc. of IEEE ICIP, Cairo, Egypt, Nov. 2009, pp. 145–148.
[34] L. Condat and S. Mosaddegh, "Joint demosaicking and denoising by total variation minimization," in Proc. of IEEE ICIP, Orlando, USA, Sept. 2012.