Generalized Nonconvex Nonsmooth Low-Rank Minimization
Abstract

As surrogate functions of the L0-norm, many nonconvex penalty functions have been proposed to enhance sparse vector recovery, and it is natural to extend these nonconvex penalty functions to the singular values of a matrix to enhance low-rank matrix recovery. However, different from convex optimization, solving the nonconvex low-rank minimization problem is much more challenging than the nonconvex sparse minimization problem. We observe that all the existing nonconvex penalty functions are concave and monotonically increasing on [0, ∞). Thus their gradients are decreasing functions. Based on this property, we propose an Iteratively Reweighted Nuclear Norm (IRNN) algorithm to solve the nonconvex nonsmooth low-rank minimization problem. IRNN iteratively solves a Weighted Singular Value Thresholding (WSVT) problem. By setting the weight vector as the gradient of the concave penalty function, the WSVT problem has a closed form solution. In theory, we prove that IRNN decreases the objective function value monotonically, and any limit point is a stationary point. Extensive experiments on both synthetic data and real images demonstrate that IRNN enhances the low-rank matrix recovery compared with state-of-the-art convex algorithms.

Figure 1: Plots of the penalty gλ(θ) and its supergradient ∂gλ(θ) for (a) Lp Penalty [11], (b) SCAD Penalty [10], (c) Logarithm Penalty [12], (d) MCP Penalty [23], (e) Capped L1 Penalty [24], (f) ETP Penalty [13], (g) Geman Penalty [15], (h) Laplace Penalty [21].
1. Introduction

In this work, we consider the following nonconvex nonsmooth low-rank minimization problem

    min_{X ∈ R^{m×n}} F(X) = Σ_{i=1}^m gλ(σi(X)) + f(X),    (1)

where σi(X) denotes the i-th singular value of X ∈ R^{m×n} (we assume m ≤ n in this work). The penalty function gλ and the loss function f satisfy the following assumptions:

A1 gλ : R → R+ is continuous, concave and monotonically increasing on [0, ∞). It is possibly nonsmooth.

A2 f : R^{m×n} → R+ is a smooth function of type C^{1,1}, i.e., its gradient is Lipschitz continuous,

    ||∇f(X) − ∇f(Y)||_F ≤ L(f)||X − Y||_F,    (2)

for any X, Y ∈ R^{m×n}, where L(f) > 0 is called the Lipschitz constant of ∇f. f(X) is possibly nonconvex.

A3 F(X) → ∞ iff ||X||_F → ∞.
Table 1: Popular nonconvex surrogate functions of ||θ||_0 and their supergradients.

Penalty | Formula gλ(θ), θ ≥ 0, λ > 0 | Supergradient ∂gλ(θ)
Lp [11] | λθ^p | ∞ if θ = 0; λpθ^{p−1} if θ > 0
SCAD [10] | λθ if θ ≤ λ; (−θ² + 2γλθ − λ²)/(2(γ−1)) if λ < θ ≤ γλ; λ²(γ+1)/2 if θ > γλ | λ if θ ≤ λ; (γλ − θ)/(γ−1) if λ < θ ≤ γλ; 0 if θ > γλ
Logarithm [12] | λ/log(γ+1) · log(γθ + 1) | γλ/((γθ + 1) log(γ+1))
MCP [23] | λθ − θ²/(2γ) if θ < γλ; γλ²/2 if θ ≥ γλ | λ − θ/γ if θ < γλ; 0 if θ ≥ γλ
Capped L1 [24] | λθ if θ < γ; λγ if θ ≥ γ | λ if θ < γ; [0, λ] if θ = γ; 0 if θ > γ
ETP [13] | λ/(1 − exp(−γ)) · (1 − exp(−γθ)) | λγ/(1 − exp(−γ)) · exp(−γθ)
Geman [15] | λθ/(θ + γ) | λγ/(θ + γ)²
Laplace [21] | λ(1 − exp(−θ/γ)) | (λ/γ) exp(−θ/γ)
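For concreteness, the following is a minimal NumPy sketch of a few of these penalties and their supergradients, written directly from the formulas in Table 1 (the function names are ours, not taken from the authors' released code):

```python
import numpy as np

def g_lp(theta, lam, p):
    """Lp penalty: g(theta) = lam * theta**p for theta >= 0."""
    return lam * np.asarray(theta, dtype=float) ** p

def supergrad_lp(theta, lam, p):
    """Supergradient of the Lp penalty: lam*p*theta**(p-1) for theta > 0, +inf at theta = 0."""
    theta = np.asarray(theta, dtype=float)
    out = np.full_like(theta, np.inf)
    pos = theta > 0
    out[pos] = lam * p * theta[pos] ** (p - 1.0)
    return out

def g_mcp(theta, lam, gamma):
    """MCP penalty: lam*theta - theta^2/(2*gamma) if theta < gamma*lam, else gamma*lam^2/2."""
    theta = np.asarray(theta, dtype=float)
    return np.where(theta < gamma * lam,
                    lam * theta - theta ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def supergrad_mcp(theta, lam, gamma):
    """Supergradient of MCP: lam - theta/gamma if theta < gamma*lam, else 0."""
    theta = np.asarray(theta, dtype=float)
    return np.where(theta < gamma * lam, lam - theta / gamma, 0.0)

def g_scad(theta, lam, gamma):
    """SCAD penalty (gamma > 2), piecewise as in Table 1."""
    theta = np.asarray(theta, dtype=float)
    mid = (-theta ** 2 + 2 * gamma * lam * theta - lam ** 2) / (2.0 * (gamma - 1))
    return np.where(theta <= lam, lam * theta,
                    np.where(theta <= gamma * lam, mid, lam ** 2 * (gamma + 1) / 2.0))

def supergrad_scad(theta, lam, gamma):
    """Supergradient of SCAD: lam, then (gamma*lam - theta)/(gamma - 1), then 0."""
    theta = np.asarray(theta, dtype=float)
    return np.where(theta <= lam, lam,
                    np.where(theta <= gamma * lam, (gamma * lam - theta) / (gamma - 1.0), 0.0))
```

The Logarithm, Capped L1, ETP, Geman, and Laplace rows translate in the same way.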
Many optimization problems in machine learning and computer vision fall into the formulation in (1). As for the choice of f, the squared loss f(X) = (1/2)||A(X) − b||²_F, with a linear mapping A, is widely used. In this case, the Lipschitz constant of ∇f is the spectral radius of A*A, i.e., L(f) = ρ(A*A), where A* is the adjoint operator of A. By choosing gλ(x) = λx, Σ_{i=1}^m gλ(σi(X)) is exactly the nuclear norm λ Σ_{i=1}^m σi(X) = λ||X||_*, and problem (1) reduces to the well-known nuclear norm regularized problem

    min_X λ||X||_* + f(X).    (3)

If f(X) is convex, this is the most widely used convex relaxation of the rank minimization problem

    min_X λ rank(X) + f(X).    (4)

The above low-rank minimization problem arises in many machine learning tasks such as multiple category classification [1], matrix completion [20], multi-task learning [2], and low-rank representation with squared loss for subspace segmentation [18]. However, solving problem (4) is usually difficult, or even NP-hard, so most previous works solve the convex problem (3) instead. It has been proved that, under certain incoherence assumptions on the singular values of the matrix, solving the convex nuclear norm regularized problem leads to a near-optimal low-rank solution [6]. However, such assumptions may be violated in real applications, and the solution obtained by using the nuclear norm may be suboptimal, since the nuclear norm is not a perfect approximation of the rank function. A similar phenomenon has been observed between the convex L1-norm and the nonconvex L0-norm for sparse vector recovery [7].

In order to achieve a better approximation of the L0-norm, many nonconvex surrogate functions of the L0-norm have been proposed, including the Lp-norm [11], Smoothly Clipped Absolute Deviation (SCAD) [10], Logarithm [12], Minimax Concave Penalty (MCP) [23], Capped L1 [24], Exponential-Type Penalty (ETP) [13], Geman [15], and Laplace [21]. Table 1 tabulates these penalty functions and Figure 1 visualizes them. One may refer to [14] for more properties of these penalty functions. Some of these nonconvex penalties have been extended to approximate the rank function, e.g., the Schatten-p norm [19]. Another nonconvex surrogate of the rank function is the truncated nuclear norm [16].

For nonconvex sparse minimization, several algorithms have been proposed to solve problems with a nonconvex regularizer. A common method is DC (Difference of Convex functions) programming [14]. It minimizes the nonconvex function f(x) − (−gλ(x)) based on the assumption that both f and −gλ are convex. In each iteration, DC programming linearizes −gλ(x) at x = x^k and minimizes the relaxed function

    x^{k+1} = arg min_x f(x) − (−gλ(x^k)) − ⟨v^k, x − x^k⟩,    (5)

where v^k is a subgradient of −gλ(x) at x = x^k. DC programming may not be very efficient, since it requires some other iterative algorithm to solve (5). Note that the updating rule (5) of DC programming cannot be extended to solve the low-rank problem (1): for concave gλ, −Σ_{i=1}^m gλ(σi(X)) is not guaranteed to be convex w.r.t. X. DC programming also fails when f is nonconvex in problem (1).

Another solver is the proximal gradient algorithm, which is originally designed for convex problems [3]. It requires computing the proximal operator of gλ,

    P_gλ(y) = arg min_x gλ(x) + (1/2)(x − y)²,    (6)

in each iteration. However, for nonconvex gλ, there may not exist a general solver for (6). Even if (6) is solvable, different from convex optimization, (P_gλ(y1) − P_gλ(y2))(y1 − y2) ≥ 0 does not always hold.
Thus we cannot perform P_gλ(·) on the singular values of Y directly for solving

    P_gλ(Y) = arg min_X Σ_{i=1}^m gλ(σi(X)) + ||X − Y||²_F.    (7)

The nonconvexity of gλ makes the nonconvex low-rank minimization problem much more challenging than the nonconvex sparse minimization problem.

Another related work is the Iteratively Reweighted Least Squares (IRLS) algorithm. It has recently been extended to handle the nonconvex Schatten-p norm penalty [19]. Actually it solves a relaxed smooth problem, which may require many iterations to achieve a low-rank solution, and it cannot solve the general nonsmooth problem (1). The alternating updating algorithm in [16] minimizes the truncated nuclear norm by using a special property of this penalty. It contains two loops, both of which require computing SVDs, so it is not very efficient. It cannot be extended to solve the general problem (1) either.

In this work, all the existing nonconvex surrogate functions of the L0-norm are extended to the singular values of a matrix to enhance low-rank recovery. In problem (1), gλ can be any existing nonconvex penalty function shown in Table 1, or any other function which satisfies the assumption (A1). We observe that all the existing nonconvex surrogate functions are concave and monotonically increasing on [0, ∞). Thus their gradients (or supergradients at the nonsmooth points) are nonnegative and monotonically decreasing. Based on this key fact, we propose an Iteratively Reweighted Nuclear Norm (IRNN) algorithm to solve problem (1). IRNN computes the proximal operator of the weighted nuclear norm, which has a closed form solution due to the nonnegative and monotonically decreasing supergradients. In theory, we prove that IRNN monotonically decreases the objective function value, and that any limit point is a stationary point. To the best of our knowledge, IRNN is the first work which is able to solve the general problem (1) with a convergence guarantee. Note that for nonconvex optimization, it is usually very difficult to prove that an algorithm converges to stationary points. At last, we test our algorithm with several nonconvex penalty functions on both synthetic data and real image data to show the effectiveness of the proposed algorithm.

2. Nonconvex Nonsmooth Low-Rank Minimization

In this section, we present a general algorithm to solve problem (1). To handle the case that gλ is nonsmooth, e.g., the Capped L1 penalty, we need the concept of the supergradient of a concave function.

2.1. Supergradient of a Concave Function

The subgradient of a convex function extends the gradient at a nonsmooth point. Similarly, the supergradient extends the gradient of a concave function at a nonsmooth point. If g(x) is concave and differentiable at x, it is known that

    g(x) + ⟨∇g(x), y − x⟩ ≥ g(y).    (8)

If g(x) is nonsmooth at x, the supergradient extends the gradient at x inspired by (8) [5].

Definition 1 Let g : R^n → R be concave. A vector v is a supergradient of g at the point x ∈ R^n if for every y ∈ R^n the following inequality holds:

    g(x) + ⟨v, y − x⟩ ≥ g(y).    (9)

All supergradients of g at x are called the superdifferential of g at x, denoted as ∂g(x). If g is differentiable at x, ∇g(x) is also a supergradient, i.e., ∂g(x) = {∇g(x)}. Figure 2 illustrates the supergradients of a concave function at both differentiable and nondifferentiable points.

Figure 2: Supergradients of a concave function. v1 is a supergradient at x1, and v2 and v3 are supergradients at x2.

For concave g, −g is convex, and vice versa. From this fact, we have the following relationship between the supergradient of g and the subgradient of −g.

Lemma 1 Let g(x) be concave and h(x) = −g(x). For any v ∈ ∂g(x), u = −v ∈ ∂h(x), and vice versa.

The relationship between the supergradient and the subgradient shown in Lemma 1 is useful for exploring some properties of the supergradient. It is known that the subdifferential of a convex function h is a monotone operator, i.e.,

    ⟨u − v, x − y⟩ ≥ 0,    (10)

for any u ∈ ∂h(x), v ∈ ∂h(y). The superdifferential of a concave function has a similar property, which we call an antimonotone operator in this work.

Lemma 2 The superdifferential of a concave function g is an antimonotone operator, i.e.,

    ⟨u − v, x − y⟩ ≤ 0,    (11)

for any u ∈ ∂g(x), v ∈ ∂g(y).
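The following short NumPy check illustrates Definition 1 and the antimonotone property of Lemma 2 for the Laplace penalty from Table 1 (a sketch; the variable names are ours):

```python
import numpy as np

lam, gamma = 1.0, 2.0
g  = lambda t: lam * (1.0 - np.exp(-t / gamma))    # Laplace penalty, concave on [0, inf)
dg = lambda t: (lam / gamma) * np.exp(-t / gamma)  # its supergradient (here an ordinary gradient)

x = np.linspace(0.0, 6.0, 61)
X, Y = np.meshgrid(x, x)

# Definition 1: g(x) + <v, y - x> >= g(y) with v = dg(x), for every pair (x, y) on the grid.
assert np.all(g(X) + dg(X) * (Y - X) >= g(Y) - 1e-12)

# Lemma 2: <u - v, x - y> <= 0 with u = dg(x), v = dg(y), i.e. the supergradient is decreasing.
assert np.all((dg(X) - dg(Y)) * (X - Y) <= 1e-12)

print("supergradient inequality (9) and antimonotonicity (11) hold on the grid")
```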
This can be easily proved by Lemma 1 and (10).

Lemma 2 is a key lemma in this work. Supposing that the assumption (A1) holds for g(x), (11) indicates that

    u ≥ v, for any u ∈ ∂g(x) and v ∈ ∂g(y),    (12)

when x ≤ y. That is to say, the supergradient of g is monotonically decreasing on [0, ∞). Table 1 lists these usual concave functions and their supergradients, and we visualize them in Figure 1. It can be seen that they all satisfy the assumption (A1). Note that for the Lp penalty, we further define ∂g(0) = {∞}. This does not affect our algorithm or the convergence analysis, as shown later. The Capped L1 penalty is nonsmooth at θ = γ, with the superdifferential ∂gλ(γ) = [0, λ].

2.2. Iteratively Reweighted Nuclear Norm

In this subsection, we show how to solve the general nonconvex and possibly nonsmooth problem (1) based on the assumptions (A1)-(A2). For simplicity of notation, we denote σi = σi(X) and σi^k = σi(X^k).

Since gλ is concave on [0, ∞), by the definition of the supergradient, we have

    gλ(σi) ≤ gλ(σi^k) + wi^k (σi − σi^k),    (13)

where

    wi^k ∈ ∂gλ(σi^k).    (14)

Since σ1^k ≥ σ2^k ≥ · · · ≥ σm^k ≥ 0, by the antimonotone property of the supergradient (12), we have

    0 ≤ w1^k ≤ w2^k ≤ · · · ≤ wm^k.    (15)

This property is important in our algorithm, as shown later. Inequality (13) motivates us to minimize its right hand side instead of gλ(σi). Thus we may solve the following relaxed problem

    X^{k+1} = arg min_X Σ_{i=1}^m [gλ(σi^k) + wi^k (σi − σi^k)] + f(X)
            = arg min_X Σ_{i=1}^m wi^k σi + f(X).    (16)

It seems that updating X^{k+1} by solving the above weighted nuclear norm problem (16) is an extension of the weighted L1-norm problem in the IRL1 algorithm [7] (IRL1 is a special DC programming algorithm). However, the weighted nuclear norm in (16) is nonconvex (it is convex if and only if w1^k ≥ w2^k ≥ · · · ≥ wm^k ≥ 0 [8]), while the weighted L1-norm is convex. Solving the nonconvex problem (16) is much more challenging than the convex weighted L1-norm problem. In fact, it is not easier than solving the original problem (1).

Algorithm 1 Solving problem (1) by IRNN
  Input: μ > L(f), where L(f) is a Lipschitz constant of ∇f(X).
  Initialize: k = 0, X^k, and wi^k, i = 1, · · · , m.
  Output: X^*.
  while not converged do
    1. Update X^{k+1} by solving problem (18).
    2. Update the weights wi^{k+1}, i = 1, · · · , m, by
           wi^{k+1} ∈ ∂gλ(σi(X^{k+1})).    (17)
  end while

Instead of updating X^{k+1} by solving (16), we linearize f(X) at X^k and add a proximal term:

    f(X) ≈ f(X^k) + ⟨∇f(X^k), X − X^k⟩ + (μ/2)||X − X^k||²_F,

where μ > L(f). Such a choice of μ guarantees the convergence of our algorithm, as shown later. Then we update X^{k+1} by solving

    X^{k+1} = arg min_X Σ_{i=1}^m wi^k σi + f(X^k) + ⟨∇f(X^k), X − X^k⟩ + (μ/2)||X − X^k||²_F
            = arg min_X Σ_{i=1}^m wi^k σi + (μ/2)||X − (X^k − (1/μ)∇f(X^k))||²_F.    (18)

Problem (18) is still nonconvex. Fortunately, it has a closed form solution due to (15).

Lemma 3 [8, Theorem 2.3] For any λ > 0, Y ∈ R^{m×n} and 0 ≤ w1 ≤ w2 ≤ · · · ≤ ws (s = min(m, n)), a globally optimal solution to the following problem

    min_X λ Σ_{i=1}^s wi σi(X) + (1/2)||X − Y||²_F    (19)

is given by the weighted singular value thresholding

    X^* = U S_{λw}(Σ) V^T,    (20)

where Y = U Σ V^T is the SVD of Y, and S_{λw}(Σ) = Diag{(Σii − λwi)_+}.

It is worth mentioning that for the Lp penalty, if σi^k = 0, then wi^k ∈ ∂gλ(σi^k) = {∞}. By the updating rule of X^{k+1} in (18), we then have σi^{k+1} = 0. This guarantees that the rank of the sequence {X^k} is nonincreasing.
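A minimal NumPy sketch of the weighted singular value thresholding step (20), which gives the closed form solution of (18) and (19), might look as follows (the function name wsvt is ours; λ is assumed to be absorbed into the weights):

```python
import numpy as np

def wsvt(Y, w):
    """Weighted singular value thresholding (Lemma 3).

    Returns the minimizer of sum_i w_i * sigma_i(X) + 0.5 * ||X - Y||_F^2
    for nonnegative, nondecreasing weights w_1 <= w_2 <= ... <= w_s.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)  # singular values in decreasing order
    s_shrunk = np.maximum(s - w, 0.0)                 # (sigma_i - w_i)_+ as in (20)
    return (U * s_shrunk) @ Vt                        # U @ diag(s_shrunk) @ Vt
```

Because the weights from (15) are ordered increasingly while the singular values are ordered decreasingly, Lemma 3 applies directly.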
Iteratively updating wi^k, i = 1, · · · , m, by (14) and X^{k+1} by (18) leads to the proposed Iteratively Reweighted Nuclear Norm (IRNN) algorithm. The whole procedure of IRNN is shown in Algorithm 1. If the Lipschitz constant L(f) is not known or not computable, a backtracking rule can be used to estimate μ in each iteration [3].
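Combining the weight update (17) with the weighted singular value thresholding solution of (18), one possible rendering of Algorithm 1 is sketched below (a sketch under stated assumptions, not the authors' released code; grad_f and supergrad are user-supplied callables, mu > L(f), and wsvt is the routine sketched above):

```python
import numpy as np

def irnn(X0, grad_f, supergrad, mu, n_iter=100, tol=1e-5):
    """Iteratively Reweighted Nuclear Norm (a sketch of Algorithm 1).

    X0        : initial m x n matrix.
    grad_f    : callable returning the gradient of the smooth loss f at X.
    supergrad : callable mapping singular values to supergradients of g_lambda,
                e.g. lambda s: supergrad_mcp(s, lam, gamma).
    mu        : proximal parameter with mu > L(f).
    """
    X = X0.copy()
    for _ in range(n_iter):
        # Weights from the supergradient of g_lambda at the current singular values
        # (equations (14)/(17)); they are nondecreasing by (15) because the singular
        # values come back sorted in decreasing order.
        w = supergrad(np.linalg.svd(X, compute_uv=False))

        # Solve (18): a gradient step on f followed by weighted singular value
        # thresholding, with the weights scaled by 1/mu.
        X_new = wsvt(X - grad_f(X) / mu, w / mu)

        if np.linalg.norm(X_new - X) <= tol * max(1.0, np.linalg.norm(X)):
            X = X_new
            break
        X = X_new
    return X
```

The relative-change stopping test is our own simplification of the "while not converged" condition in Algorithm 1.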
3. Convergence Analysis

Third, since wi^k ∈ ∂gλ(σi^k), by the definition of the supergradient, we have

    gλ(σi^k) − gλ(σi^{k+1}) ≥ wi^k (σi^k − σi^{k+1}).    (24)

Now, summing (22), (23) and (24) for i = 1, · · · , m together, we obtain
4. Extension to Other Problems

Our proposed IRNN algorithm can solve a more general low-rank minimization problem,

    min_X Σ_{i=1}^m gi(σi(X)) + f(X),

where each gi is concave on [0, ∞) and the supergradients satisfy 0 ≤ v1 ≤ v2 ≤ · · · ≤ vm for any vi ∈ ∂gi(σi(X)), i = 1, · · · , m. The truncated nuclear norm ||X||_r = Σ_{i=r+1}^m σi(X) [16] satisfies the above assumption. Indeed, ||X||_r = Σ_{i=1}^m gi(σi(X)) by letting

    gi(x) = 0 for i = 1, · · · , r, and gi(x) = x for i = r + 1, · · · , m.    (31)

Their supergradients are

    ∂gi(x) = 0 for i = 1, · · · , r, and ∂gi(x) = 1 for i = r + 1, · · · , m.    (32)

The convergence results in Theorems 1 and 2 also hold, since (24) holds for each gi. Compared with the alternating updating algorithm in [16], which requires double loops, our IRNN algorithm is more efficient and has a stronger convergence guarantee.
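For example, the weight rule (32) for the truncated nuclear norm can be plugged directly into the IRNN sketch above (a hypothetical helper of ours; r is the number of leading singular values left unpenalized):

```python
import numpy as np

def supergrad_truncated(sigma, r):
    """Weights (32) for the truncated nuclear norm ||X||_r: zero for the r largest
    singular values, one for the remaining ones (sigma is sorted decreasingly)."""
    w = np.ones_like(np.asarray(sigma, dtype=float))
    w[:r] = 0.0
    return w

# e.g. X = irnn(X0, grad_f, lambda s: supergrad_truncated(s, r=5), mu)
```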
More generally, IRNN can solve the following problem

    min_X Σ_{i=1}^m g(h(σi(X))) + f(X),    (33)

when g(y) is concave and the following problem

    min_X Σ_{i=1}^m wi h(σi(X)) + ||X − Y||²_F    (34)

can be cheaply solved. An interesting application of (33) is to extend group sparsity to the singular values. By dividing the singular values into k groups, i.e., G1 = {1, · · · , r1}, G2 = {r1 + 1, · · · , r1 + r2}, · · · , Gk = {Σ_{i=1}^{k−1} ri + 1, · · · , m}, where Σ_i ri = m, we can define the group sparsity on the singular values as ||X||_{2,g} = Σ_{i=1}^k g(||σ_{Gi}||_2). This is exactly the first term in (33) by letting h be the L2-norm of a vector. Here g can be a nonconvex function satisfying the assumption (A1) or, specially, the convex absolute value function.

5. Experiments

In this section, we present several experiments on both synthetic data and real images to validate the effectiveness of the IRNN algorithm. We test our algorithm on the matrix completion problem

    min_X Σ_{i=1}^m gλ(σi(X)) + (1/2)||PΩ(X − M)||²_F,    (35)

where Ω is the set of indices of the observed entries, and PΩ : R^{m×n} → R^{m×n} is a linear operator that keeps the entries in Ω unchanged and sets those outside Ω to zero. The gradient of the squared loss function in (35) is Lipschitz continuous with Lipschitz constant L(f) = 1. We set μ = 1.1 in Algorithm 1. For the choice of gλ, we test all the penalty functions listed in Table 1 except Capped L1 and Geman, since we find that their recovery performance is sensitive to the choices of γ and λ in different cases. For the choice of λ in IRNN, we use a continuation technique to enhance the low-rank matrix recovery: the initial value of λ is set to a larger value λ0 and dynamically decreased by λ = η^k λ0 with η < 1, until a predefined target λt is reached. X is initialized as a zero matrix. For the parameters (e.g., p and γ) of the nonconvex penalty functions, we search over a candidate set and use the value which obtains good performance in most cases.¹

¹ Code of IRNN: https://sites.google.com/site/canyilu/.

Figure 3: Comparison of matrix recovery on (a) random data without noise, and (b) random data with noise.
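A sketch of the ingredients of problem (35) as we use them (names are ours; Omega is a boolean mask of observed entries): the loss f(X) = (1/2)||PΩ(X − M)||²_F has gradient PΩ(X − M) and Lipschitz constant 1, so μ = 1.1 is a valid choice.

```python
import numpy as np

def P_Omega(X, Omega):
    """Keep the entries where Omega is True and zero out the rest."""
    return np.where(Omega, X, 0.0)

def make_grad_f(M, Omega):
    """Gradient of f(X) = 0.5 * ||P_Omega(X - M)||_F^2; here L(f) = 1."""
    return lambda X: P_Omega(X - M, Omega)

def lambda_schedule(lambda_0, lambda_t, eta):
    """Continuation on lambda: start at lambda_0 and shrink by eta < 1 until lambda_t."""
    lam = lambda_0
    while lam > lambda_t:
        yield lam
        lam *= eta
```

Each λ in the schedule parameterizes gλ (and hence the supergrad callable passed to irnn), with the solver warm-started from the previous solution.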
5.1. Low-Rank Matrix Recovery

We first compare our nonconvex IRNN algorithm with state-of-the-art convex algorithms on synthetic data. We conduct two experiments: one for an observed matrix M without noise, and one for M with noise.

For the noise-free case, we generate the rank-r matrix M as ML MR, where ML ∈ R^{150×r} and MR ∈ R^{r×150} are generated by the Matlab command randn. 50% of the elements of M are missing uniformly at random. We compare our algorithm with the Augmented Lagrange Multiplier (ALM)² method [17], which solves the noise-free problem

    min_X ||X||_*  s.t.  PΩ(X) = PΩ(M).    (36)

For this task, we set λ0 = ||PΩ(M)||_∞, λt = 10^{−5} λ0, and η = 0.7 in IRNN, and stop the algorithm when ||PΩ(X − M)||_F ≤ 10^{−5}. For ALM, we use the default parameters in the released code. We evaluate the recovery performance by the Relative Error, defined as ||X̂ − M||_F / ||M||_F, where X̂ is the recovered solution of a given algorithm. If the Relative Error is smaller than 10^{−3}, X̂ is regarded as a successful recovery of M.

² Code: http://perception.csl.illinois.edu/matrix-rank/sample_code.html.
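The noise-free protocol just described might be scripted as follows (a sketch reusing the hypothetical helpers irnn, wsvt, make_grad_f, supergrad_mcp, P_Omega, and lambda_schedule from the earlier sketches; the MCP parameter gamma is an illustrative choice and the stopping rule is simplified):

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 150
r = 25                                        # underlying rank (varied from 20 to 33 in the text)
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
Omega = rng.random((m, n)) < 0.5              # 50% of the entries are observed

lam0 = np.max(np.abs(P_Omega(M, Omega)))      # lambda_0 = ||P_Omega(M)||_inf
X = np.zeros((m, n))
for lam in lambda_schedule(lam0, 1e-5 * lam0, eta=0.7):
    X = irnn(X, make_grad_f(M, Omega),
             lambda s: supergrad_mcp(s, lam, gamma=10.0), mu=1.1, n_iter=50)

rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
print("Relative Error:", rel_err, "success:", rel_err < 1e-3)
```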
We repeat the experiments 100 times, with the underlying rank r varying from 20 to 33, for each algorithm. The frequency of success is plotted in Figure 3a. The legend IRNN-Lp in Figure 3a denotes the Lp penalty function used in problem (1) and solved by our proposed IRNN algorithm. It can be seen that IRNN with all the nonconvex penalty functions achieves much better recovery performance than the convex ALM algorithm. This is because the nonconvex penalty functions approximate the rank function better than the convex nuclear norm.

For the noisy case, the data are generated by PΩ(M) = PΩ(ML MR) + 0.1 × randn. We compare our algorithm with the convex Accelerated Proximal Gradient with Line search (APGL)³ [20], which solves the noisy problem

    min_X λ||X||_* + (1/2)||PΩ(X) − PΩ(M)||²_F.    (37)

For this task, we set λ0 = 10||PΩ(M)||_∞ and λt = 0.1λ0 in IRNN. All the chosen algorithms are run 100 times with the underlying rank r lying between 15 and 35. The relative errors vary from test to test, and the mean errors of the different methods are plotted in Figure 3b. It can be seen that IRNN with the nonconvex penalties outperforms the convex APGL in the noisy case. Note that we cannot conclude from Figure 3 that IRNN with the Lp, Logarithm and ETP penalty functions always performs better than with SCAD and MCP, since the obtained solutions are not globally optimal.

Figure 4: Comparison of image recovery by using different matrix completion algorithms. (a) Original image. (b) Image with Gaussian noise and text. (c)-(g) Recovered images by APGL, LMaFit, TNNR-ADMM, IRNN-Lp, and IRNN-SCAD, respectively. Best viewed in ×2 sized color pdf file.

5.2. Application to Image Recovery

In this section, we apply matrix completion to image recovery. As shown in Figure 4, a real image may be corrupted by different types of noise, e.g., Gaussian noise or unrelated text. Usually real images are not of low rank, but the top singular values dominate the main information [16]. Thus the corrupted image can be recovered by a low-rank approximation. For color images, which have three channels, we simply apply matrix completion to each channel independently. The well known Peak Signal-to-Noise Ratio (PSNR) is employed to evaluate the recovery performance. We compare IRNN with some other matrix completion algorithms which have been applied to this task, including APGL, Low-Rank Matrix Fitting (LMaFit)⁴ [22] and Truncated Nuclear Norm Regularization (TNNR) [16]. For TNNR we use the ADMM-based solver for its subproblem in the released code (denoted TNNR-ADMM)⁵. We tune the parameters of the chosen algorithms to be as good as possible and report the best results.

In our test, we consider two types of noise on the real images. The first one replaces 50% of the pixels with random values (sample image (1) in Figure 4(b)). The other one adds some unrelated text to the image (sample image (2) in Figure 4(b)). Figure 4(c)-(g) shows the images recovered by the different methods. It can be observed that our IRNN method with different penalty functions achieves much better recovery performance than APGL and LMaFit. Only the results of IRNN-Lp and IRNN-SCAD are shown due to the limit of space. We further test on more images and show the results in Figure 5. Figure 6 shows the PSNR values of the different methods on all the test images. IRNN with all the evaluated nonconvex functions achieves higher PSNR values, which verifies that the nonconvex penalty functions are effective in this situation. The nonconvex truncated nuclear norm is close to our methods, but its running time is 3∼5 times that of ours.

³ Code: http://www.math.nus.edu.sg/~mattohkc/NNLS.html.
⁴ Code: http://lmafit.blogs.rice.edu/.
⁵ Code: https://sites.google.com/site/zjuyaohu/.
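The per-channel completion and PSNR evaluation just described could be sketched as follows (names are ours; img is an H×W×3 array with values in [0, 255], Omega marks the uncorrupted pixels, and irnn, make_grad_f, and supergrad_mcp are the hypothetical helpers from the earlier sketches):

```python
import numpy as np

def complete_color_image(img, Omega, lam, gamma, mu=1.1):
    """Run low-rank matrix completion independently on each color channel."""
    out = np.empty(img.shape, dtype=float)
    for c in range(img.shape[2]):
        channel = img[:, :, c].astype(float)
        out[:, :, c] = irnn(np.zeros(channel.shape), make_grad_f(channel, Omega),
                            lambda s: supergrad_mcp(s, lam, gamma), mu)
    return out

def psnr(recovered, original, peak=255.0):
    """Peak Signal-to-Noise Ratio between the recovered and the original image."""
    mse = np.mean((np.asarray(recovered, float) - np.asarray(original, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```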
Figure 6: PSNR values of APGL, LMaFit, TNNR-ADMM, IRNN-Lp, IRNN-SCAD, IRNN-Logarithm, IRNN-MCP, and IRNN-ETP on Image (1) through Image (6).

Figure 5: Comparison of image recovery on more images. (a) Original images. (b) Images with noise. Recovered images by (c) APGL and (d) IRNN-Lp. Best viewed in ×2 sized color pdf file.

6. Conclusions and Future Work

In this work, the nonconvex surrogate functions of the L0-norm are extended to the singular values to approximate the rank function. It is observed that all the existing nonconvex surrogate functions are concave and monotonically increasing on [0, ∞). Then a general solver, IRNN, is proposed to solve problem (1) with such penalties. IRNN is the first algorithm which is able to solve the general nonconvex low-rank minimization problem (1) with a convergence guarantee. The nonconvex penalty can be nonsmooth, by using the supergradient at the nonsmooth points. In theory, we proved that any limit point is a stationary point. Experiments on both synthetic data and real images demonstrated that IRNN usually outperforms the state-of-the-art convex algorithms. An interesting future work is to solve the nonconvex low-rank minimization problem with affine constraints. A possible way is to combine IRNN with the Alternating Direction Method of Multipliers (ADMM).

Acknowledgements

This research is supported by the Singapore National Research Foundation under its International Research Centre @Singapore Funding Initiative and administered by the IDM Programme Office. Z. Lin is supported by NSF of China (Grant nos. 61272341, 61231002, and 61121002) and MSRA.

References

[1] Y. Amit, M. Fink, N. Srebro, and S. Ullman. Uncovering shared structures in multiclass classification. In ICML, 2007.
[2] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 2008.
[3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2009.
[4] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.
[5] K. Border. The supergradient of a concave function. http://www.hss.caltech.edu/~kcb/Notes/Supergrad.pdf, 2001. [Online].
[6] E. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 2010.
[7] E. Candès, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted L1 minimization. Journal of Fourier Analysis and Applications, 2008.
[8] K. Chen, H. Dong, and K. Chan. Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 2013.
[9] F. Clarke. Nonsmooth analysis and optimization. In Proceedings of the International Congress of Mathematicians, 1983.
[10] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 2001.
[11] L. Frank and J. Friedman. A statistical view of some chemometrics regression tools. Technometrics, 1993.
[12] J. Friedman. Fast sparse regression and classification. International Journal of Forecasting, 2012.
[13] C. Gao, N. Wang, Q. Yu, and Z. Zhang. A feasible nonconvex relaxation approach to feature selection. In AAAI, 2011.
[14] G. Gasso, A. Rakotomamonjy, and S. Canu. Recovering sparse signals with a certain family of nonconvex penalties and DC programming. IEEE Transactions on Signal Processing, 2009.
[15] D. Geman and C. Yang. Nonlinear image recovery with half-quadratic regularization. TIP, 1995.
[16] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He. Fast and accurate matrix completion via truncated nuclear norm regularization. TPAMI, 2013.
[17] Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215, 2009.
[18] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. TPAMI, 2013.
[19] K. Mohan and M. Fazel. Iterative reweighted algorithms for matrix rank minimization. JMLR, 2012.
[20] K. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization, 2010.
[21] J. Trzasko and A. Manduca. Highly undersampled magnetic resonance image reconstruction via homotopic L0-minimization. TMI, 2009.
[22] Z. Wen, W. Yin, and Y. Zhang. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Mathematical Programming Computation, 2012.
[23] C. Zhang. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 2010.
[24] T. Zhang. Analysis of multi-stage convex relaxation for sparse regularization. JMLR, 2010.