
CARPATHIAN J. MATH. Online version at https://www.carpathian.cunbm.utcluj.ro/
Volume 34 (2018), No. 3, Pages 449 - 457
Print Edition: ISSN 1584 - 2851; Online Edition: ISSN 1843 - 4401
DOI: https://doi.org/10.37193/CJM.2018.03.22

Dedicated to Professor Yeol Je Cho on the occasion of his retirement

A note on the accelerated proximal gradient method for nonconvex optimization

HUIJUAN WANG and HONG-KUN XU

ABSTRACT. We improve a recent accelerated proximal gradient (APG) method in [Li, Q., Zhou, Y., Liang, Y. and Varshney, P. K., Convergence analysis of proximal gradient with momentum for nonconvex optimization, in Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017] for nonconvex optimization by allowing variable stepsizes. We prove the convergence of the APG method for a composite nonconvex optimization problem under the assumption that the composite objective function satisfies the Kurdyka-Łojasiewicz property.

1. INTRODUCTION
In this paper we consider a composite optimization problem of the form
(1.1)    min_{x∈H} F(x) := f(x) + g(x),
where H = Rd is a Euclidean d-space, and f and g are proper, lower-semicontinuous functions from H to (−∞, ∞].
In the convex case (i.e., f and g are convex functions), the proximal gradient method [6] can well be used to solve (1.1); moreover, Nesterov's acceleration technique can be used to speed up the rate of convergence from O(1/k) to O(1/k²) [3]. However, such an acceleration remains an open problem for nonconvex optimization.
Very recently in [10], Li et al. proposed a new algorithm, known as the accelerated proximal gradient (APG) method, which constructs three sequences (xk), (yk) and (vk) in such a way that xk is produced from yk through the composition of the proximal mapping of g and the gradient ∇f of f (f is assumed to have a Lipschitz continuous gradient), and vk is simply a linear combination of xk and xk−1. Li et al. proved that, under certain conditions, their algorithm guarantees that the sequence (xk) is bounded, each cluster point of (xk) is a critical point of F, and F is constant on the set of cluster points of (xk). They also obtained error bounds on the residual F(xk) − inf F under the uniformized Kurdyka-Łojasiewicz property with desingularizing function φ(t) = ct^θ, where c is a constant and θ ∈ (0, 1].
We continue this line of work by improving the algorithm and results of Li et al. [10] in two ways. First, we allow the stepsizes to vary with the iteration steps and obtain the same convergence results as [10, Theorem 3]. Secondly, we prove convergence with finite length of our algorithm under the Kurdyka-Łojasiewicz property with a general desingularizing function, which is not discussed in Li et al. [10].

Received: 17.09.2017. In revised form: 13.06.2018. Accepted: 15.07.2018


2010 Mathematics Subject Classification. 90C26, 90C30, 90C46.
Key words and phrases. accelerated proximal gradient method, composite nonconvex optimization, Kurdyka-Łojasiewicz property, subdifferential, critical point.
Corresponding author: Hong-Kun Xu; [email protected]


2. PRELIMINARIES
Let d ≥ 1 be a given integer and consider the Euclidean d-space Rd with inner product ⟨·, ·⟩ and norm ∥·∥ (i.e., ∥·∥2). By Γ(Rd) we denote the family of all functions f : Rd → (−∞, ∞] =: R̄ which are proper and lower-semicontinuous (lsc).
2.1. Subdifferential of Nonconvex Functions.
Definition 2.1. Let f ∈ Γ(Rd) and x ∈ dom(f) be given. We say that x∗ ∈ Rd is a Fréchet derivative of f at x if
    lim inf_{z→x} [f(z) − f(x) − ⟨x∗, z − x⟩] / ∥z − x∥ ≥ 0.
The set of Fréchet derivatives of f at x, denoted ∂̂f(x), is said to be the Fréchet differential of f at x.
The (Mordukhovich) limiting-subdifferential (or simply, subdifferential) of f at x, denoted ∂f(x), is defined as
    ∂f(x) = {x∗ ∈ H : ∃ x∗k → x∗, x∗k ∈ ∂̂f(xk) with xk →f x}.
Here "→f" means f-convergence, that is, xk →f x if and only if xk → x and f(xk) → f(x).
Definition 2.2. We say that a point x is a critical point of f if 0 ∈ ∂f (x). The lazy slope of
f at a point x is defined as
|∂f (x)| := inf{∥z∥ : z ∈ ∂f (x)} = dist(0, ∂f (x)).
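For illustration (this example is added here and is not part of the original text), consider the nonconvex function f(x) = −|x| on R. For x ≠ 0, f is differentiable and ∂̂f(x) = ∂f(x) = {−sgn(x)}. At x = 0, for every candidate x∗ ∈ R,
    lim inf_{z→0} [f(z) − f(0) − x∗z] / |z| = lim inf_{z→0} (−1 − x∗ sgn(z)) = −1 − |x∗| < 0,
so ∂̂f(0) = ∅, while passing to sequences zk ↓ 0 and zk ↑ 0 in the limiting construction gives ∂f(0) = {−1, 1}. In particular, 0 ∉ ∂f(0): the local maximizer x = 0 is not a critical point of f, and its lazy slope is |∂f(0)| = dist(0, {−1, 1}) = 1.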
Proposition 2.1. [7] Let f, g ∈ Γ(Rd) and x ∈ Rd be given.
(i) We have ∂̂f(x) ⊂ ∂f(x). Moreover, ∂̂f(x) is convex and ∂f(x) is closed (not necessarily convex). If f is convex, then both sets reduce to the subdifferential in the sense of convex analysis.
(ii) If the sequences (xk) and (yk) are such that xk →f x, yk → y, and yk ∈ ∂f(xk) for all k, then y ∈ ∂f(x).
(iii) Fermat's rule remains true: if x is a local minimizer of f, then x is a critical point (or stationary point) of f, that is, 0 ∈ ∂f(x).
(iv) If g is continuously differentiable, then ∂(f + g)(x) = ∂f(x) + ∇g(x).
(v) We have that ∂f is closed in the sense that if {(xk, yk)} is a sequence in the graph of ∂f, G(∂f) := {(z, w) : z ∈ dom(∂f), w ∈ ∂f(z)}, such that xk →f x and yk → y, then (x, y) ∈ G(∂f).
(vi) If xk →f x and lim inf_{k→∞} |∂f(xk)| = 0, then x is a critical point of f.
2.2. Kurdyka-Łojasiewicz Property. The Kurdyka-Łojasiewicz property [8, 9] plays a central part in nonconvex optimization theory.
Definition 2.3. [2] We say that a function f ∈ Γ(Rd) satisfies the Kurdyka-Łojasiewicz property (KŁP) at x∗ ∈ dom(∂f) if there exist η ∈ (0, ∞], a neighborhood U of x∗, and a continuous concave function φ : [0, η) → R+ such that
(i) φ(0) = 0,
(ii) φ ∈ C¹(0, η),
(iii) φ′(t) > 0 for all t ∈ (0, η),
(iv) the Kurdyka-Łojasiewicz inequality holds:
(2.2)    φ′(f(x) − f(x∗)) |∂f(x)| ≥ 1
for all x ∈ U ∩ {x : f(x∗) < f(x) < f(x∗) + η}.

We say that f ∈ Γ(Rd) is a KŁ-function provided it satisfies KŁP at each point x∗ ∈ dom(∂f).
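As an illustration (added here), take the desingularizing function φ(t) = ct^θ with c > 0 and θ ∈ (0, 1], the special case recalled in the Introduction. Then φ′(t) = cθt^{θ−1}, and the KŁ inequality (2.2) reads
    cθ (f(x) − f(x∗))^{θ−1} |∂f(x)| ≥ 1,   equivalently   |∂f(x)| ≥ (1/(cθ)) (f(x) − f(x∗))^{1−θ},
which is the classical Łojasiewicz gradient inequality; for θ = 1 it reduces to the bound |∂f(x)| ≥ 1/c near x∗.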
The KŁ inequality (2.2) (at a single point) can, in some circumstances, be extended to a
compact set, as shown below.
Lemma 2.1. [5, Lemma 6] (Uniformized KŁ property) Let f ∈ Γ(Rd ) and let Ω ⊂ Rd be a
nonempty compact set. Assume that f is constant on Ω and satisfies the KŁ property at each point
of Ω. Then there exist ε > 0 and η > 0, and φ satisfying properties (i)-(iii) of Definition 2.3 such
that for all ū ∈ Ω and all u ∈ Rd with the property
(2.3)    u ∈ {u ∈ Rd : dist(u, Ω) < ε} ∩ {u : f(ū) < f(u) < f(ū) + η},
the following uniformized KŁ inequality holds:
(2.4)    φ′(f(u) − f(ū)) |∂f(u)| ≥ 1.
More discussion can be found in [1, 2, 4, 7].

2.3. Proximal Mappings.


Definition 2.4. Let f ∈ Γ(Rd) and let λ > 0. The proximal mapping of f (of index λ) is defined as
(2.5)    prox_{λf}(x) := arg min { f(y) + (1/(2λ))∥y − x∥² : y ∈ Rd },   x ∈ Rd.

Note that if f is, in addition, convex, then prox_{λf} is single-valued and well defined over the entire space Rd. However, in the general nonconvex case, prox_{λf} is set-valued and may only be defined on a subset of Rd (more details can be found in [12]).
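As a small concrete instance of (2.5) in the nonconvex case (the example and the helper name prox_l0 below are ours, not from the text), the ℓ0 penalty g(y) = ∥y∥0 (the number of nonzero entries) has a closed-form, generally set-valued proximal mapping: componentwise hard thresholding at √(2λ). A minimal Python sketch:

import numpy as np

# One selection from prox_{λ∥·∥_0}(x) in the sense of (2.5):
#   arg min_y { ∥y∥_0 + (1/(2λ))∥y − x∥² }
# splits componentwise: keep x_i when |x_i| > sqrt(2λ), set it to zero when
# |x_i| < sqrt(2λ); at |x_i| = sqrt(2λ) both choices are minimizers (the mapping
# is genuinely set-valued there), and this code keeps x_i in that tie case.
def prox_l0(x, lam):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) >= np.sqrt(2.0 * lam), x, 0.0)

# Example: prox_l0([2.0, 0.5, -1.5], 0.5) returns array([ 2. ,  0. , -1.5])  (threshold = 1).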
The following inequality (2.7) is widely used in optimization theory (see [11]). How-
ever, for the sake of completeness, we include a proof.
Lemma 2.2. Assume that f : Rd → R is continuously differentiable and its gradient ∇f is L-Lipschitz continuous for some constant L ≥ 0:
(2.6)    ∥∇f(y) − ∇f(x)∥ ≤ L∥y − x∥ for all x, y ∈ Rd.
Then we have
(2.7)    f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/2)∥y − x∥² for all x, y ∈ Rd.
Proof. Let x, y ∈ Rd and define a function φ by
    φ(t) := f(x + t(y − x)),   t ∈ R.
Then φ′(t) = ⟨∇f(x + t(y − x)), y − x⟩. It turns out that
    f(y) − f(x) = ∫₀¹ φ′(t) dt = ∫₀¹ ⟨∇f(x + t(y − x)), y − x⟩ dt
                = ⟨∇f(x), y − x⟩ + ∫₀¹ ⟨∇f(x + t(y − x)) − ∇f(x), y − x⟩ dt.
Using the Lipschitz condition (2.6), we get
    f(y) − f(x) ≤ ⟨∇f(x), y − x⟩ + L∥y − x∥² ∫₀¹ t dt,
and the desired inequality (2.7) follows immediately. □



Lemma 2.3. Let f, g ∈ Γ(Rd) and set F = f + g. Let λ > 0 be given. Assume that the gradient ∇f of f is L-Lipschitz continuous. Then, for any u ∈ Rd and setting
    û := prox_{λg}(u − λ∇f(u)),
we have
(2.8)    F(û) ≤ F(u) − (1/2)(1/λ − L)∥û − u∥².
Proof. We have
    û = arg min_z { g(z) + (1/(2λ))∥z − u + λ∇f(u)∥² }
      = arg min_z { g(z) + (1/(2λ))∥z − u∥² + ⟨z − u, ∇f(u)⟩ }.
Since û is a minimizer of the function
(2.9)    z ↦ ψ(z) := g(z) + (1/(2λ))∥z − u∥² + ⟨z − u, ∇f(u)⟩,
it turns out that
(2.10)    g(û) + (1/(2λ))∥û − u∥² + ⟨û − u, ∇f(u)⟩ ≤ g(u).
On the other hand, since ∇f is L-Lipschitz, we can use Lemma 2.2 to get the inequality:
(2.11)    f(û) ≤ f(u) + ⟨∇f(u), û − u⟩ + (L/2)∥û − u∥².
Adding up (2.10) and (2.11) immediately yields (2.8). □
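To see Lemma 2.3 in action, the following minimal numerical check (our sketch; the random toy data and the choice λ = 0.9/L are assumptions made only for this demonstration) verifies the sufficient-decrease bound (2.8) for the composite objective F(u) = (1/2)∥Au − b∥² + ∥u∥0, reusing the hypothetical hard-thresholding prox sketched after Definition 2.4:

import numpy as np

def prox_l0(x, lam):  # hard thresholding at sqrt(2*lam), as sketched after Definition 2.4
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) >= np.sqrt(2.0 * lam), x, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)

f = lambda u: 0.5 * np.sum((A @ u - b) ** 2)      # smooth part, gradient is L-Lipschitz
grad_f = lambda u: A.T @ (A @ u - b)
L = np.linalg.norm(A.T @ A, 2)                    # spectral norm = Lipschitz constant of grad_f
g = lambda u: float(np.count_nonzero(u))          # the l0 penalty, proper and lsc
F = lambda u: f(u) + g(u)

lam = 0.9 / L                                     # any 0 < lam < 1/L works in (2.8)
u = rng.standard_normal(10)
u_hat = prox_l0(u - lam * grad_f(u), lam)         # the point u_hat of Lemma 2.3
# sufficient decrease (2.8): F(u_hat) <= F(u) - (1/2)(1/lam - L) * ||u_hat - u||^2
assert F(u_hat) <= F(u) - 0.5 * (1.0 / lam - L) * np.sum((u_hat - u) ** 2) + 1e-10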
2.4. Convergence Lemma. We also need the following basic result regarding convergence of nonnegative series.
Lemma 2.4. Let {ak} be a sequence of nonnegative real numbers such that
(2.12)    ak+1 ≤ γak + bk,   k ≥ 0,
where γ ∈ [0, 1) and bk ≥ 0 with Σ_{k=0}^∞ bk < ∞. Then Σ_{k=0}^∞ ak < ∞.
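For completeness, a short argument (ours) runs as follows: summing (2.12) over k = 0, . . . , n gives Σ_{k=1}^{n+1} ak ≤ γ Σ_{k=0}^{n} ak + Σ_{k=0}^{n} bk, hence (1 − γ) Σ_{k=1}^{n} ak ≤ γa0 + Σ_{k=0}^∞ bk for every n; the partial sums of Σ ak are therefore bounded, and the series converges.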

3. MAIN RESULTS
Consider the following composite optimization problem
(3.13)    min_{x∈Rd} F(x) := f(x) + g(x),
where f, g ∈ Γ(Rd).
The following accelerated proximal gradient (APG) algorithm was introduced by Li et al. [10, Algorithm 3].

Algorithm 1 (APG for the nonconvex problem)

Input: y1 = x0, βk = k/(k+3), λ < 1/L
for k = 1, 2, · · · do
    xk = prox_{λg}(yk − λ∇f(yk)).
    vk = xk + βk(xk − xk−1).
    if F(xk) ≤ F(vk), then yk+1 = xk,
    else if F(xk) ≥ F(vk), then yk+1 = vk.
    end if
end for

The following result regarding Algorithm 1 is proved in [10].


Theorem 3.1. [10, Theorem 1] Let the following assumptions be satisfied:
(A1) f, g ∈ Γ0(Rd), inf_{x∈Rd} F(x) > −∞, and for each α ∈ R, the sublevel set {x ∈ Rd : F(x) ≤ α} is bounded;
(A2) f has a continuous gradient ∇f that is L-Lipschitz continuous, i.e.,
    ∥∇f(x) − ∇f(y)∥ ≤ L∥x − y∥,   x, y ∈ Rd.
Let {xk} be a sequence generated by Algorithm 1 with stepsize λ < 1/L. Then
(i) {xk} is a bounded sequence;
(ii) The set Ω of limit points of {xk} forms a compact set, on which the objective function F is constant;
(iii) All elements of Ω are critical points of F.
Below we slightly improve the above algorithm by allowing the stepsizes to depend on the iteration step, that is, we take λ := λk, where k is the iteration index. We also readjust the parameter βk (in our convergence proof we actually only require βk ≤ 1 − β for some β ∈ (0, 1)).

Algorithm 2 (APG for the nonconvex problem with variable stepsizes)

Input: y1 = x0, βk = 1/(k+1), λk < 1/L
for k = 1, 2, · · · do
    xk = prox_{λk g}(yk − λk∇f(yk)).
    vk = xk + βk(xk − xk−1).
    if F(xk) ≤ F(vk), then yk+1 = xk,
    else if F(xk) > F(vk), then yk+1 = vk.
    end if
end for
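The following Python sketch mirrors Algorithm 2 (our illustration; the function names, the stopping test, and the constant choice λk ≡ 0.9/L are assumptions, while βk = 1/(k + 1) is the choice made in Algorithm 2; any stepsizes with 0 < a ≤ λk ≤ b < 1/L fit the analysis below):

import numpy as np

def apg_nonconvex(x0, grad_f, prox_g, F, L, max_iter=500, tol=1e-8):
    # grad_f(x): gradient of the smooth part f (assumed L-Lipschitz),
    # prox_g(x, lam): a selection from prox_{lam*g}(x) in the sense of (2.5),
    # F(x): the full objective value f(x) + g(x), used in the monitor step.
    x_prev = np.asarray(x0, dtype=float)            # plays the role of x_{k-1}
    y = x_prev.copy()                               # y_1 = x_0
    for k in range(1, max_iter + 1):
        lam_k = 0.9 / L                             # 0 < a <= lam_k <= b < 1/L
        beta_k = 1.0 / (k + 1)                      # extrapolation parameter of Algorithm 2
        x_k = prox_g(y - lam_k * grad_f(y), lam_k)  # proximal gradient step
        if np.linalg.norm(x_k - y) <= tol:          # ||x_k - y_k|| -> 0, cf. (3.17)
            return x_k
        v_k = x_k + beta_k * (x_k - x_prev)         # momentum step
        y = x_k if F(x_k) <= F(v_k) else v_k        # monitor step: choose y_{k+1}
        x_prev = x_k
    return x_prev

With the toy problem and the prox_l0 helper from the check after Lemma 2.3, one may call, for instance, apg_nonconvex(np.zeros(10), grad_f, prox_l0, F, L).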

The main results in this paper show that the conclusions of Theorem 3.1 remain true
for variable stepsizes, and moreover, convergence of the trajectories is guaranteed if, in
addition, the composite function F satisfies the Kurdyka-Łojasiewicz property.
Theorem 3.2. Consider a sequence {xk} generated by Algorithm 2. Assume the conditions (A1) and (A2) of Theorem 3.1 hold, and in addition, the stepsize sequence {λk} satisfies 0 < a ≤ λk ≤ b < 1/L for all k. Then the following conclusions hold.
(i) {xk} is a bounded sequence;
(ii) The set Ω of limit points of {xk} forms a compact set, on which the objective function F is constant;
(iii) All elements of Ω are critical points of F.
Moreover, if, in addition, F = f + g is a KŁ function, then {xk} converges to a critical point of F with finite length, that is,
(3.14)    Σ_{k=0}^∞ ∥xk+1 − xk∥ < ∞.

Proof. Apply (2.8) to the case where λ := λk and u := yk to find that
(3.15)    F(xk) ≤ F(yk) − (1/2)(1/λk − L)∥xk − yk∥².
In particular, F(xk) ≤ F(yk) for λk < 1/L.

It is a straightforward observation from the definition of Algorithm 2 that F(yk+1) ≤ F(xk), which together with (3.15) immediately results in
(3.16)    F(yk+1) ≤ F(xk) ≤ F(yk) ≤ F(xk−1) ≤ · · · ≤ F(y1) ≤ F(x0).
Consequently, limk→∞ F(xk) = limk→∞ F(yk) exists. Moreover, by (A1), we know that {xk} and {yk} are bounded. Rewrite (3.15) as
    (1/2)(1/λk − L)∥xk − yk∥² ≤ F(yk) − F(xk) ≤ F(xk−1) − F(xk).
Since λk ≤ b < 1/L for all k, this implies that
    (1/2)(1/b − L) Σ_{k=1}^∞ ∥xk − yk∥² ≤ F(x0) − limk→∞ F(xk) < ∞.
In particular,
(3.17)    limk→∞ ∥xk − yk∥ = 0.

Now let Ω be the set of cluster points of {xk}, that is,
    Ω ≡ ω({xk}) := {x ∈ Rd : ∃ xki → x}.
The boundedness of {xk} ensures that Ω ≠ ∅, and due to (3.17), we also have Ω = ω({yk}).
Now let x̄ ∈ Ω and let {xki} be a subsequence of {xk} such that xki → x̄. By definition of the algorithm, we have
(3.18)    xk = arg min_{z∈Rd} { g(z) + (1/(2λk))∥z − (yk − λk∇f(yk))∥² }.
By the optimality condition, we obtain
    0 ∈ ∂g(xk) + (1/λk)(xk − yk) + ∇f(yk).
Equivalently,
(3.19)    (1/λk)(yk − xk) − ∇f(yk) ∈ ∂g(xk).
Applying (3.19) to the subsequence {ki} we get
(3.20)    (1/λki)(yki − xki) − ∇f(yki) ∈ ∂g(xki).
With no loss of generality (up to a further convergent subsequence of {λki} if necessary), we may assume λki → λ̄ ∈ [a, b].
Now since xki → x̄, yki → x̄, and (1/λki)(yki − xki) → 0, we may take the limit in (3.20) as i → ∞ and, by the closedness of the subdifferential ∂g of g, obtain
    −∇f(x̄) ∈ ∂g(x̄).
This can be rewritten as 0 ∈ ∂F(x̄) = ∇f(x̄) + ∂g(x̄). Hence, x̄ is a critical point of F.
We finally verify that F is constant on Ω; it suffices to show that
(3.21)    F(x̄) = limk→∞ F(xk).
Here x̄ ∈ Ω and xki → x̄ as above. Since F(x̄) = f(x̄) + g(x̄) and since f is continuous, all we need to prove is that
    limi→∞ g(xki) = g(x̄).

On the one hand, from (3.18) we immediately deduce that
    g(xki) ≤ g(x̄) + (1/(2λki))(∥x̄ − yki + λki∇f(yki)∥² − ∥xki − yki + λki∇f(yki)∥²)
(3.22)          = g(x̄) + (1/(2λki))(∥x̄ − yki∥² − ∥xki − yki∥²) + ⟨x̄ − xki, ∇f(yki)⟩.
Since xki → x̄, ∥xki − yki∥ → 0, and {λki} is bounded away from zero, it turns out from (3.22) that lim supi→∞ g(xki) ≤ g(x̄).
On the other hand, the lower semicontinuity of g implies that g(x̄) ≤ lim infi→∞ g(xki). Consequently, we have verified that limi→∞ g(xki) = g(x̄) exists. Furthermore, since f is continuous, we have limk→∞ F(xk) = limi→∞ F(xki) = limi→∞ (f(xki) + g(xki)) = f(x̄) + g(x̄) = F(x̄). This proves (3.21).
Finally, we prove (3.14) under the additional condition that F satisfies the KŁ property. Observe that the conclusions in part (ii) guarantee that
(3.23)    limk→∞ dist(xk, Ω) = 0.
As previously, assume xki → x̄; then we have proved that x̄ is a critical point of F.
We may assume xk ≠ yk for all k (since, if xk = yk for some k, then xk is a critical point of F and the iteration process terminates); hence F(xk) < F(yk), and furthermore, F(xk+1) < F(xk) by (3.16). Recall that F(x̄) = limk→∞ F(xk).
By (3.23) we can apply Lemma 2.1 to get
(3.24)    φ′(F(xk) − F(x̄)) |∂F(xk)| ≥ 1
for all k ≥ k0. Here k0 is big enough so that dist(xk, Ω) < ε for all k ≥ k0. Before proceeding further, we notice the following two facts:
• Fact 1: F(xk) ≤ F(yk) − c1∥xk − yk∥² ≤ F(xk−1) − c1∥xk − yk∥², where c1 = (1/2)(1/b − L) > 0. This follows from (3.15) and the fact that λk ≤ b.
• Fact 2: ∥wk∥ ≤ c2∥xk − yk∥, where wk = ∇f(xk) − ∇f(yk) + (1/λk)(yk − xk) ∈ ∂F(xk), and c2 = L + 1/a. This is due to (3.19) and the facts that ∥∇f(xk) − ∇f(yk)∥ ≤ L∥xk − yk∥ and λk ≥ a.
Applying (3.24) and Fact 2, we derive that, for k ≥ k0,
(3.25)    φ′(F(xk) − F(x̄)) ≥ 1/∥wk∥ ≥ 1/(c2∥xk − yk∥).
Since φ is concave, we have the inequality
    φ(x) − φ(y) ≥ φ′(x)(x − y),   x, y ∈ [0, η).
It follows that
    φ(F(xk) − F(x̄)) − φ(F(xk+1) − F(x̄)) ≥ φ′(F(xk) − F(x̄))(F(xk) − F(xk+1))
                                          ≥ φ′(F(xk) − F(x̄)) c1∥xk+1 − yk+1∥².
This, combined with (3.25), yields
    φ(F(xk) − F(x̄)) − φ(F(xk+1) − F(x̄)) ≥ (c1/c2) ∥xk+1 − yk+1∥²/∥xk − yk∥.
In other words,
(3.26)    ∥xk+1 − yk+1∥²/∥xk − yk∥ ≤ (c2/c1) [φ(F(xk) − F(x̄)) − φ(F(xk+1) − F(x̄))].

Fix γ ∈ (0, 1). Since 2√(st) ≤ γs + (1/γ)t for all s, t ≥ 0, it is then not hard to get from (3.26) that
(3.27)    ∥xk+1 − yk+1∥ ≤ γ∥xk − yk∥ + (1/γ)(c2/c1)[φ(F(xk) − F(x̄)) − φ(F(xk+1) − F(x̄))]
for all k ≥ k0. By Lemma 2.4, (3.27) guarantees that
    Σ_{k=1}^∞ ∥xk − yk∥ < ∞.

Note that yk+1 is either xk, if F(xk) ≤ F(vk), or vk = xk + βk(xk − xk−1), if F(xk) > F(vk). In the latter case we have ∥xk − xk+1∥ ≤ ∥yk+1 − xk+1∥ + βk∥xk − xk−1∥ (and the same bound holds trivially in the former case, since then yk+1 = xk), so we have the estimates on the partial sums:
    Σ_{i=1}^k ∥xi − xi+1∥ ≤ Σ_{i=1}^k ∥yi+1 − xi+1∥ + Σ_{i=1}^k βi∥xi − xi−1∥.
It turns out that
    Σ_{i=1}^{k−1} (1 − βi+1)∥xi − xi+1∥ ≤ Σ_{i=1}^k ∥yi+1 − xi+1∥ + β1∥x1 − x0∥.
Since βi+1 = 1/(i+2), we have 1 − βi+1 = (i+1)/(i+2) ≥ 2/3 for i ≥ 1. Consequently, we derive from the last inequality that
    Σ_{i=1}^∞ ∥xi − xi+1∥ ≤ (3/2) ( Σ_{i=1}^∞ ∥yi+1 − xi+1∥ + ∥x1 − x0∥ ) < ∞,
and (3.14) is proved. □

Acknowledgement. We are grateful to the reviewers for their helpful suggestions and comments, which improved the presentation of this manuscript.

REFERENCES
[1] Attouch, H., Bolte, J., Redont, P. and Soubeyran, A., Proximal alternating minimization and projection methods
for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Math. Operations Research,
35 (2010), No. 2, 438–457
[2] Attouch, H., Bolte, J. and Svaiter, B. F., Convergence of descent methods for semi-algebraic and tame problems:
proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program., 137
(2013), 91–129
[3] Beck, A. and Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J.
Imaging Sci., 2 (2009), No. 1, 183–202
[4] Bolte, J., Daniilidis, A. and Lewis, A., The Łojasiewicz inequality for nonsmooth sub-analytic functions with
applications to subgradient dynamical systems, SIAM J. Optim., 17 (2007), 1205–1223
[5] Bolte, J., Sabach, S. and Teboulle, M., Proximal alternating linearized minimization for nonconvex and nonsmooth
problems, Math. Program., Ser. A , 146 (2014), 459–494
[6] Combettes, P. L. and Wajs, R., Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul.,
4 (2005), No. 4, 1168–1200
[7] Frankel, P., Garrigos, G. and Peypouquet, J., Splitting methods with variable metric for Kurdyka-Łojasiewicz
functions and general convergence rates, J. Optim. Theory Appl., 165 (2015), No. 3, 874–900
[8] Kurdyka, K., On gradients of functions definable in o-minimal structures, Ann. Inst. Fourier (Grenoble), 48
(1998), No. 3, 769–783
[9] Łojasiewicz, S., Une propriété topologique des sous-ensembles analytiques réels, in: Les Équations aux Dérivées
Partielles, Éditions du Centre National de la Recherche Scientifique, Paris, 1963, pp. 87–89
[10] Li, Q., Zhou, Y., Liang, Y. and Varshney, P. K., Convergence analysis of proximal gradient with momentum for
nonconvex optimization, in Proceedings of the 34th International Conference on Machine Learning, Sydney,
Australia, PMLR 70, 2017

[11] Nesterov, Y. E., Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers,
Massachusetts, 2004
[12] Rockafellar, R. T. and Wets, R. J.-B., Variational Analysis, Grundlehren der Mathematischen Wissenschaften,
vol. 317, Springer, Berlin, 1998

DEPARTMENT OF MATHEMATICS
SCHOOL OF SCIENCE
HANGZHOU DIANZI UNIVERSITY
HANGZHOU 310018, CHINA
Email address: [email protected]
Email address: [email protected]
