0% found this document useful (0 votes)

1 views

A note on the accelerated proximal gradient method for nonconvex optimization

This document presents an improvement to the accelerated proximal gradient (APG) method for nonconvex optimization, allowing variable step sizes and proving convergence under the Kurdyka-Łojasiewicz property. The authors build on previous work by Li et al., enhancing the algorithm's results and establishing convergence with finite iterations. The paper includes definitions and lemmas related to nonconvex functions, proximal mappings, and convergence criteria.

Uploaded by

sach.co.quy

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

1 views

A note on the accelerated proximal gradient method for nonconvex optimization

Uploaded by

sach.co.quy

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

CARPATHIAN J. MATH. Online version at https://www.carpathian.cunbm.utcluj.

ro/
Volume 34 (2018), No. 3, Print Edition: ISSN 1584 - 2851; Online Edition: ISSN 1843 - 4401
Pages 449 - 457 DOI: https://doi.org/10.37193/CJM.2018.03.22

Dedicated to Professor Yeol Je Cho on the occasion of his retirement

A note on the accelerated proximal gradient method for

nonconvex optimization

H UIJUAN WANG and H ONG -K UN X U

A BSTRACT. We improve a recent accelerated proximal gradient (APG) method in [Li, Q., Zhou, Y., Liang,
Y. and Varshney, P. K., Convergence analysis of proximal gradient with momentum for nonconvex optimization, in
Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017]
for nonconvex optimization by allowing variable stepsizes. We prove the convergence of the APG method
for a composite nonconvex optimization problem under the assumption that the composite objective function
satisfies the Kurdyka-Łojasiewicz property.

1. I NTRODUCTION
In this paper we consider a composite optimization problem of the form
(1.1) min F (x) := f (x) + g(x),
x∈H

where H = Rd is a Euclidean d-space, and f and g are proper, lower-semicontinuous

functions from H to (−∞, ∞].
In the convex case (i.e., f and g are convex functions), the proximal gradient method [6]
can well be used to solve (1.1); moreover, Nesterov’s acceleration technique can be used
to speed up the rate of convergence from O( k1 ) to O( k12 ) [3]. However, this remains open
for nonconvex optimization.
Very recently in [10], Li, et al proposed a new algorithm which is known as an accel-
erated proximal gradient (APG) method which constructs three sequences (xk ), (yk ) and
(vk ) in such a way that xk is produced from yk through the composite of the proximal
mapping of g and the gradient ∇f of f (f is assumed to have Lipschitz continuous gra-
dient), and vk is simply a linear combination of xk and xk−1 . Li, et al proved that their
algorithm guarantees, under certain conditions, that the sequence (xk ) is bounded, each
cluster point of (xk ) is a critical point of F , and F is constant on the set of cluster points
of (xk ). They also obtained errors on the residual of F (xk ) − inf F under the uniformized
Kurdyka-Łojasiewicz property with desingularizing function φ(t) = ctθ , where c is a con-
stant and θ ∈ (0, 1].
We continue working in this line by improving the algorithm and results of Li, et al
[10] in twofold. First we allow the stepsizes to vary with the iteration steps and obtain
the same convergence results of [10, Theorem 3]. Secondly, we prove convergence with
finite length of our algorithm under the Kurdyka-Łojasiewicz property with a genaral
desingularizing function, which is not discussed in Li, et al [10].

Received: 17.09.2017. In revised form: 13.06.2018. Accepted: 15.07.2018

2010 Mathematics Subject Classification. 90C26, 90C30, 90C46.
Key words and phrases. accelerated proximal gradient method, composite nonconvex optimization, Kurdyka-
Łojasiewicz property, subdifferential, critical point.
Corresponding author: Hong-Kun Xu; [email protected]

449
450 H. Wang and H. K. Xu

2. P RELIMINARIES
Let d ≥ 1 be a given integer and consider the Euclidean d-space Rd with inner product
⟨·, ·⟩ and norm ∥ · ∥ (i.e., ∥ · ∥2 ). By Γ(Rd ) we denote the family of all functions f : Rd →
(−∞, ∞] =: R which are proper and lower-semicontinuous (lsc).
2.1. Subdifferential of Nonconvex Functions.
Definition 2.1. Let f ∈ Γ(Rd ) and x ∈ dom(f ) be given. We say that x∗ ∈ Rd is a Fréchet
derivative of f at x if
f (z) − f (x) − ⟨x∗ , z − x⟩
lim inf ≥ 0.
z→x ∥z − x∥
ˆ (x), is said to be the Fréchet differential
The set of Fréchet derivatives of f at x, denoted ∂f
of f at x.
The (Mordukhovich) limiting-subdifferential (or simply, subdifferential) of f at x, de-
noted ∂f (x), is defined as
f
ˆ (xk ) with xk −→
∂f (x) = {x∗ ∈ H : ∃ x∗k → x∗ , x∗k ∈ ∂f x}.
f f
Here “−→” means f -convergence, that is, xk −→ x if and only if xk → x and f (xk ) →
f (x).
Definition 2.2. We say that a point x is a critical point of f if 0 ∈ ∂f (x). The lazy slope of
f at a point x is defined as
|∂f (x)| := inf{∥z∥ : z ∈ ∂f (x)} = dist(0, ∂f (x)).
Proposition 2.1. [7] Let f, g ∈ Γ(Rd ) and x ∈ Rd be given.
b (x) ⊂ ∂f (x). Moreover, ∂f
(i) We have ∂f b (x) is convex and ∂f (x) is closed (not necessarily
convex). If f is convex, then both sets are reduced to the subdifferential in the sense of
convex analysis.
f
(ii) If the sequences (xk ) an (yk ) are such that xk −→ x, yk → y, and yk ∈ ∂f (xk ) for all k,
then y ∈ ∂f (x).
(iii) The Fermat’s rule remains true: if x is a local minimizer of f , then x is a critical point (or
stationary point) of f , that is, 0 ∈ ∂f (x).
(iv) If g is continuously differentiable, then ∂(f + g)(x) = ∂f (x) + ∇g(x).
(v) We have that ∂f is closed in the sense that if {(xk , yk )} is a sequence in the graph of ∂f ,
f
G(∂f ) := {(z, w) : z ∈ dom(∂f ), w ∈ ∂f (z)}, such that xk −→ x and yk → y, it
follows that (x, y) ∈ G(∂f ).
f
(vi) If xk −→ x and lim inf k→∞ |∂f (xk )| = 0, then x is a critical point of f .
2.2. Kurdyka-Łojasiewicz Property. The Kurdyka-Łojasiewicz property [8, 9] plays a cen-
tral part in the nonconvex optimization theory.
Definition 2.3. [2] We say that a function f ∈ Γ(Rd ) satisfies the Kurdyka-Łojasiewicz
property (KŁP) at x∗ ∈ dom(∂f ) if there exist η ∈ (0, ∞], a neighborhood U of x∗ , and a
continuous concave function φ : [0, η) → R+ such that
(i) φ(0) = 0,
(ii) φ ∈ C 1 (0, η),
(iii) φ′ (t) > 0 for all t ∈ (0, η),
(v) there holds the Kurdyka-Łojasiewicz inequality:
(2.2) φ′ (f (x) − f (x∗ ))|∂f (x)| ≥ 1
for all x ∈ U ∩ {x : f (x∗ ) < f (x) < f (x∗ ) + η}.
Accelerated proximal gradient method 451

We say that f ∈ Γ(Rd ) is a KŁ-function provided it satisfies KŁP at each point x∗ ∈

dom(∂f ).
The KŁ inequality (2.2) (at a single point) can, in some circumstances, be extended to a
compact set, as shown below.
Lemma 2.1. [5, Lemma 6] (Uniformized KŁ property) Let f ∈ Γ(Rd ) and let Ω ⊂ Rd be a
nonempty compact set. Assume that f is constant on Ω and satisfies the KŁ property at each point
of Ω. Then there exist ε > 0 and η > 0, and φ satisfying properties (i)-(iii) of Definition 2.3 such
that for all ū ∈ Ω and all u ∈ Rd with the property:
(2.3) u ∈ {u ∈ Rd : dist(u, Ω) < ε} ∩ {f (ū) < f (u) < f (ū) + η},
the following uniformized KŁ inequality holds:
(2.4) φ′ (f (u) − f (ū))|∂f (u)| ≥ 1.
More discussions on can be found in [1, 2, 7, 4]

2.3. Proximal Mappings.

Definition 2.4. Let f ∈ Γ(Rd ) and let λ > 0. The proximal mapping of f (of index λ) is
defined as

1
(2.5) proxλf (x) := arg min f (y) + ∥y − x∥2 : y ∈ Rd , x ∈ Rd .
2λ
Note that if f is, in addition, convex, then proxλf is single-valued and well defined
over the entire space Rd . However, in the general nonconvex case, proxλf is set-valued
and may be defined on a subset of Rd (more details can be found [12]).
The following inequality (2.7) is widely used in optimization theory (see [11]). How-
ever, for the sake of completeness, we include a proof.
Lemma 2.2. Assume that f : Rd → R is continuously differentiable and its gradient ∇f is
L-Lipschitz continuous for some constant L ≥ 0:
(2.6) ∥∇f (y) − ∇f (x)∥ ≤ L∥y − x∥ for all x, y ∈ R.
Then we have
L
(2.7) f (y) ≤ f (x) + ⟨∇f (x), y − x⟩ + ∥y − x∥2 for all x, y ∈ R.
2
Proof. Let x, y ∈ R and define a function φ by
φ(t) := f (x + t(y − x)), t ∈ R.
′
Then φ (t) = ⟨∇f (x + t(y − x)), y − x⟩. It turns out that
Z 1 Z 1
′
f (y) − f (x) = φ (t)dt = ⟨∇f (x + t(y − x)), y − x⟩dt
0 0
Z 1
= ⟨∇f (x), y − x⟩ + ⟨∇f (x + t(y − x)) − ∇f (x), y − x⟩dt.
0

Using the Lipschitz condition (2.6), we get

Z 1
2
f (y) − f (x) ≤ ⟨∇f (x), y − x⟩ + L∥y − x∥ tdt
0

and the desired inequality (2.7) follows immediately. □

452 H. Wang and H. K. Xu

Lemma 2.3. Let f, g ∈ Γ(Rd ) and set F = f + g. Let λ > 0 be given. Assume that the gradient
∇f of f is L-Lipschitz continuous. Then, for any u ∈ Rd and setting
û := proxλg (u − λ∇f (u)),
we have

1 1
(2.8) F (û) ≤ F (u) − − L ∥û − u∥2 .
2 λ
Proof. We have
1
û = arg min g(z) + ∥z − u + λ∇f (u)∥2
z 2λ
1
= arg min g(z) + ∥z − u∥2 + ⟨z − u, ∇f (u)⟩.
z 2λ
Since û is a minimizer of the function
1
(2.9) z 7→ ψ(z) := g(z) + ∥z − u∥2 + ⟨z − u, ∇f (u)⟩
2λ
it turns out that
1
(2.10) g(û) + ∥û − u∥2 + ⟨û − u, ∇f (u)⟩ ≤ g(u).
2λ
On the other hand, since ∇f is L-Lipschitz, we can use Lemma 2.2 to get the inequality:
L
(2.11) f (û) ≤ f (u) + ⟨∇f (u), û − u⟩ + ∥û − u∥2 .
2
Adding up (2.10) and (2.11) immediately yields (2.8). □
2.4. Convergence Lemma. We also need the following basic result regarding conver-
gence of nonnegative series.
Lemma 2.4. Let {an } be a sequence of real nonnegative numbers such that
(2.12) ak+1 ≤ γak + bk , k ≥ 0,
P∞ P∞
where γ ∈ [0, 1) and bk ≥ 0 such that k=0 bk < ∞. Then k=0 ak < ∞.

3. M AIN R ESULTS
Consider the following composite optimization problem
(3.13) min F (x) := f (x) + g(x),
x∈Rd

where f, g ∈ Γ(Rd ).
The following accelerated proximal gradient (APG) algorithm is introduced by Li, et al
[10, Algorithm 3].

Algorithm 1 (Algorithm APG nonconvex problem)

k
Input: y1 = x0 , βk = k+3 , λ < L1
for k = 1, 2, · · · do
xk = proxλg (yk − λ∇f (yk )).
vk = xk + βk (xk − xk−1 ).
if F (xk ) ≤ F (vk ), then yk+1 = xk ,
else if F (xk ) ≥ F (vk ), then yk+1 = vk .
end if
end for
Accelerated proximal gradient method 453

The following result regarding Algorithm 1 is proved in [10].

Theorem 3.1. [10, Theorem 1] Let the following assumptions be satisfied:
(A1) f, g ∈ Γ0 (Rd ), inf x∈Rd F (x) > −∞, and for each α ∈ R, the sublevel set {x ∈ Rd :
F (x) ≤ α} is bounded;
(A2) f has a continuous gradient ∇f that is L-Lipschitz continuous, i.e.,
∥∇f (x) − ∇f (y)∥ ≤ L∥x − y∥, x, y ∈ Rd .
Let {xk } be a generated by Algorithm 1 with stepsize λ < L1 . Then
(i) {xk } is a bounded sequence;
(ii) The set Ω of limit points of {xk } forms a compact set, on which the objective function F is
constant;
(iii) All elements of Ω are critical points of F .
Below we slightly improve the above algorithm by allowing the stepsizes to depend on
the steps, that is, we take λ := λk , where k is the number of iterations. We also readjust
the parameter βk (in our convergence proof we actually only require βk ≤ 1 − β for some
β ∈ (0, 1)).

Algorithm 2 (Algorithm APG nonconvex problem with variable stepsizes)

1
Input: y1 = x0 , βk = k+1 , λk < L1
for k = 1, 2, · · · do
xk = proxλk g (yk − λk ∇f (yk )).
vk = xk + βk (xk − xk−1 ).
if F (xk ) ≤ F (vk ), then yk+1 = xk ,
else if F (xk ) > F (vk ), then yk+1 = vk .
end if
end for

The main results in this paper show that the conclusions of Theorem 3.1 remain true
for variable stepsizes, and moreover, convergence of the trajectories is guaranteed if, in
addition, the composite function F satisfies the Kurdyka-Łojasiewicz property.
Theorem 3.2. Consider a sequence {xk } generated by Algorithm 2. Assume the conditions (A1)
and (A2) of Theorem 3.1 hold, and in addition, the stepsize sequence {λk } satisfies the property:
1
0 < a ≤ λk ≤ b < for all k. Then the following conclusions hold.
L
(i) {xk } is a bounded sequence;
(ii) The set Ω of limit points of {xk } forms a compact set, on which the objective function F is
constant;
(iii) All elements of Ω are critical points of F .
Moreover, if, in addition, F = f + g is a KL function, then {xk } converges to a critical point of F
with finite length, that is,
X∞
(3.14) ∥xk+1 − xk ∥ < ∞.
k=0

Proof. Apply (2.8) to the case where λ := λk and u := yk to find that

1 1
(3.15) F (xk ) ≤ F (yk ) − − L ∥xk − yk ∥2 .
2 λk
1
In particular, F (xk ) ≤ F (yk ) for λk < L.
454 H. Wang and H. K. Xu

It is a straightforward observation from the definition of Algorithm 2 that F (yk+1 ) ≤

F (xk ) which together with (3.15) immediately results in that
(3.16) F (yk+1 ) ≤ F (xk ) ≤ F (yk ) ≤ F (xk−1 ) ≤ · · · ≤ F (y1 ) ≤ F (x0 ).
Consequently, limk→∞ F (xk ) = limk→∞ F (yk ) exists. Moreover, by (A1), we know that
{xk } and {yk } are bounded. Rewrite (3.15) as

1 1
− L ∥xk − yk ∥2 ≤ F (yk ) − F (xk ) ≤ F (xk−1 ) − F (xk ).
2 λk
1
Since λk ≤ b < L for all k, this implies that
X ∞
1 1
−L ∥xk − yk ∥2 ≤ F (x0 ) − lim F (xk ) < ∞.
2 b k→∞
k=1

In particular,
(3.17) lim ∥xk − yk ∥ = 0.
k→∞

Now let Ω be the set of cluster points of {xk }, that is,

Ω ≡ ω({xk }) := {x ∈ Rd : ∃ xki → x}.
The boundedness of {xk } ensures that Ω ̸= ∅, and due to (3.17), we also have Ω = ω({yk }).
Now let x̄ ∈ Ω and let {xki } be a subsequence of {xk } such that xki → x̄. By definition
of the algorithm, we have
1
(3.18) xk = arg min g(z) + ∥z − (yk − λ∇f (yk ))∥2 .
z∈Rd 2λ
By the optimality condition, we obtain
1
0 ∈ ∂g(xk ) + (xk − yk ) + ∇f (yk ).
λk
Equivalently,
1
(3.19) (yk − xk ) − ∇f (yk ) ∈ ∂g(xk ).
λk
Applying (3.19) to the subsequence {ki } we get
1
(3.20) (yki − xki ) − ∇f (yki ) ∈ ∂g(xki ).
λki
With no loss of generality (up to a further convergent subsequence of {λki } if necessary),
we may assume λki → λ̄ ∈ [a, b].
1
Now since xki → x̄, yki → x̄, and (yki − xki ) → 0, we may take the limit in (3.20) as
λki
i → ∞ and by the closedness of the subdifferential ∂g of g to obtain
−∇f (x̄) ∈ ∂g(x̄).
This is rewritten as 0 ∈ ∂F (x̄) = ∇f (x̄) + ∂g(x̄). Hence, x̄ is a critical point of F .
We finally verify that F is constant on Ω; it suffices to show that
(3.21) F (x̄) = lim F (xk ).
k→∞

Here x̄ ∈ Ω and xki → x̄ as above. Since F (x̄) = f (x̄) + g(x̄) and since f is continuous, all
we need to prove is that
lim g(xki ) = g(x̄).
i→∞
Accelerated proximal gradient method 455

On the one hand, from (3.18) we immediately deduce that

1
g(xki ) ≤ g(x̄) + (∥x̄ − yki + λki ∇f (yki )∥2 − ∥xki − yki + λki ∇f (yki )∥2 )
2λki
1
(3.22) = g(x̄) + (∥x̄ − yki ∥2 − ∥xki − yki ∥2 ) + ⟨x̄ − xki , ∇f (yki )⟩.
2λki
Since xki → x̄, ∥xki − yki ∥ → 0, and {λki } is bounded away from 0 from below, it turns
out from (3.22) that lim supi→∞ g(xki ) ≤ g(x̄).
On the other hand, the lower semicontinuity of g implies that g(x̄) ≤ lim inf i→∞ g(xki ).
Consequently, we have verified that limi→∞ g(xki ) = g(x̄) exists. Furthermore, since f
is continuous, we have limk→∞ F (xk ) = limi→∞ F (xki ) = limi→∞ (f (xki ) + g(xki )) =
f (x̄) + g(x̄) = F (x̄). This proves (3.21).
Finally we prove (3.14) under the additional condition that F satisfies the KL property.
Observe that the conclusions in part (ii) guarantee that
(3.23) lim dist(xk , Ω) = 0.
k→∞
As previously, assume xki → x̄; then we have proved that x̄ is a critical point of F .
We may assume xk ̸= yk (since, if xk = yk for some k, xk is a critical point of F and the
iteration process is terminated); hence F (xk ) < F (yk ), and furthermore, F (xk+1 ) < F (xk )
by (3.16). Recall that we have F (x̄) = limk→∞ F (xk ).
By (3.23) we can apply Lemma 2.1 to get
(3.24) φ′ (F (xk ) − F (x̄))|∂F (xk )| ≥ 1
for all k ≥ k0 . Here k0 is big enough so that dist(xk , Ω) < ε for all k ≥ k0 . Before further
proceeding, we notice the following two facts:
2 2
• Fact k ) ≤ F (yk ) − c1 ∥xk − yk ∥ ≤ F (xk−1 ) − c1 ∥xk − yk ∥ , where c1 =
1: F (x
1 1
− L > 0. This follows from (3.15) and the fact that λk ≤ b.
2 b
1
• Fact 2: ∥wk ∥ ≤ c2 ∥xk − yk ∥, where wk = ∇f (xk ) − ∇f (yk ) + (yk − xk ) ∈ ∂F (xk ),
λk
1
and c2 = L + . This is due to (3.19) and the facts that ∥∇f (xk ) − ∇f (yk )∥ ≤
a
L∥xk − yk ∥ and λk ≥ a.
Applying (3.24) and Fact 1, we derive that, for k ≥ k0 ,
1 1 1
(3.25) φ′ (F (xk ) − F (x̄)) ≥ ≥ .
∥wk ∥ c2 ∥xk − yk ∥
Since φ is concave, we have the inequality:
φ(x) − φ(y) ≥ φ′ (x)(x − y), x, y ∈ R.
It follows that
φ(F (xk ) − F (x̄)) − φ(F (xk+1 ) − F (x̄)) ≥ φ′ (F (xk ) − F (x̄))(F (xk ) − F (xk+1 ))
≥ φ′ (F (xk ) − F (x̄))c1 ∥xk+1 − yk+1 ∥2 .
This, combining with (3.25), yields
c1 ∥xk+1 − yk+1 ∥2
φ(F (xk ) − F (x̄)) − φ(F (xk+1 ) − F (x̄)) ≥ .
c2 ∥xk − yk ∥
In other words,
∥xk+1 − yk+1 ∥2 c2
(3.26) ≤ [φ(F (xk ) − F (x̄)) − φ(F (xk+1 ) − F (x̄))].
∥xk − yk ∥ c1
456 H. Wang and H. K. Xu

Fix γ ∈ (0, 1). It is then not hard to get from (3.26)

1 c2
(3.27) ∥xk+1 − yk+1 ∥ ≤ γ∥xk − yk ∥ + [φ(F (xk ) − F (x̄)) − φ(F (xk+1 ) − F (x̄)].
γ c1
for all k ≥ 1. By Lemma 2.4, (3.27) guarantees that
∞
X
∥xk − yk ∥ < ∞.
k=1

Note that yk+1 is either xk if F (xk ) ≤ F (vk ) or vk = xk +βk (xk −xk−1 ) if F (xk ) > F (vk ).
In the latter case, we have ∥xk − xk+1 ∥ ≤ ∥yk+1 − xk+1 ∥ + βk ∥xk − xk−1 ∥ and we have
estimates on the partial sums:
k
X k
X k
X
∥xi − xi+1 ∥ ≤ ∥yi+1 − xi+1 ∥ + βi ∥xi − xi−1 ∥.
i=1 i=1 i=1
It turns out that
k−1
X k
X
(1 − βi+1 )∥xi − xi+1 ∥ ≤ ∥yi+1 − xi+1 ∥ + β1 ∥x1 − x0 ∥.
i=1 i=1
1 i+1 2
Since βi+1 = , 1 − βi+1 = ≥ for i ≥ 1. Consequently, we derive from the last
i+2 i+2 3
inequality that
∞ ∞
!
X 3 X
∥xi − xi+1 ∥ ≤ ∥yi+1 − xi+1 ∥ + ∥x1 − x0 ∥ < ∞
i=1
2 i=1
and (3.14) is proved. □

Acknowledgement. We were grateful to the reviewers for their helpful suggestions and
comments which improved the presentation of this manuscript.

R EFERENCES
[1] Attouch, H., Bolte, J., Redont, P. and Soubeyran, A., Proximal alternating minimization and projection methods
for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Math. Operations Research,
35 (2010), No. 2, 438–457
[2] Attouch, H., Bolte, J. and Svaiter, B. F., Convergence of descent methods for semi-algebraic and tame problems:
proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program., 137
(2013), 91–129
[3] Beck A. and Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J.
Imaging Sci., 2 (2009), No. 1, 183–202
[4] Bolte, J., Daniilidis, A. and Lewis, A., The Łojasiewicz inequality for nonsmooth sub-analytic functions with
applications to subgradient dynamical systems, SIAM J. Optim., 17 (2007), 1205–1223
[5] Bolte, J., Sabach, S. and Teboulle, M., Proximal alternating linearized minimization for nonconvex and nonsmooth
problems, Math. Program., Ser. A , 146 (2014), 459–494
[6] Combettes, P. L. and Wajs, R., Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul.,
4 (2005), No. 4, 1168–1200
[7] Frankel, P., Garrigos, G. and Peypouquet, J., Splitting methods with variable Metric for Kurdyka-Łojasiewicz
functions and general convergence rates, J. Optim. Theory Appl., 65 (2015), No. 3, 874–900
[8] Kurdyka, K., On gradients of functions definable in o-minimal structures, Ann. Inst. Fourier (Grenoble), 48
(1998), No. 3, 769–783
[9] Łojasiewicz, S., Une propriete topologique des sous-ensembles analytiques reels, in: Les Euations aux De-
rivees Partielles, Editions du centre National de la Recherche Scientifique, Paris, 1963, pp. 87–89
[10] Li, Q., Zhou, Y., Liang, Y. and Varshney, P. K., Convergence analysis of proximal gradient with momentum for
nonconvex optimization, in Proceedings of the 34th International Conference on Machine Learning, Sydney,
Australia, PMLR 70, 2017
Accelerated proximal gradient method 457

[11] Nesterov, Y. E., Introductory Lectures on Convex Optimization: a basic course Kluwer Academic Publishers,
Massachusetts, 2004
[12] Rockafellar, R. T. and Wets, R. J.-B., Variational Analysis, Grundlehren der Mathematischen Wissenschaften,
vol. 317, Springer, Berlin, 1998

D EPARTMENT OF M ATHEMATICS
S CHOOL OF S CIENCE
H ANGZHOU D IANZI U NIVERSITY
H ANGZHOU 310018 C HINA
Email address: [email protected]
Email address: [email protected]

ComfortStar CCI-CHI-2013
No ratings yet
ComfortStar CCI-CHI-2013
5 pages
Glider Report Team Anil Vu
No ratings yet
Glider Report Team Anil Vu
55 pages
Practical RF Circuit Design
100% (3)
Practical RF Circuit Design
47 pages
Howrah Bridge Study
85% (13)
Howrah Bridge Study
9 pages
MATH Grade 7 Quarter 1 Module 1 Week 1
100% (6)
MATH Grade 7 Quarter 1 Module 1 Week 1
17 pages
Partial Differential Equations Notes
No ratings yet
Partial Differential Equations Notes
13 pages
Lecture Notes On Differentiability
No ratings yet
Lecture Notes On Differentiability
14 pages
O4MD 02 Foundations
No ratings yet
O4MD 02 Foundations
8 pages
18.657: Mathematics of Machine Learning: S R LR LK K
No ratings yet
18.657: Mathematics of Machine Learning: S R LR LK K
9 pages
Lecture 7
No ratings yet
Lecture 7
4 pages
Convex Optimization
No ratings yet
Convex Optimization
108 pages
(Strong, Strict) Convexity (Princeton. Lecture 14 Pages. ORF523 - Lec7)
No ratings yet
(Strong, Strict) Convexity (Princeton. Lecture 14 Pages. ORF523 - Lec7)
14 pages
1 Theory of Convex Functions
No ratings yet
1 Theory of Convex Functions
14 pages
Ritter Analysis Handout 08
No ratings yet
Ritter Analysis Handout 08
4 pages
Basics of Wavelets: Isye8843A, Brani Vidakovic Handout 20
No ratings yet
Basics of Wavelets: Isye8843A, Brani Vidakovic Handout 20
27 pages
Inversefunction THM
No ratings yet
Inversefunction THM
3 pages
The Set F May Be Specified by Equations of The Form (1.1) And/or (1.2) - Alternatively, The Term Global Minimiser Can Be Used To Denote A Point at Which The Function F Attains Its Global Minimum
No ratings yet
The Set F May Be Specified by Equations of The Form (1.1) And/or (1.2) - Alternatively, The Term Global Minimiser Can Be Used To Denote A Point at Which The Function F Attains Its Global Minimum
4 pages
2021 Spring Nonlinear Techniques For Nonlinear Dispersive PDEs 4
No ratings yet
2021 Spring Nonlinear Techniques For Nonlinear Dispersive PDEs 4
10 pages
Convexity, Lipschitzness, Smoothness
No ratings yet
Convexity, Lipschitzness, Smoothness
5 pages
The Stone-Weierstrass Theorem
No ratings yet
The Stone-Weierstrass Theorem
6 pages
O4MD 03 Descent Methods
No ratings yet
O4MD 03 Descent Methods
18 pages
Learning Hessian Matrix PDF
No ratings yet
Learning Hessian Matrix PDF
100 pages
Homework 6
No ratings yet
Homework 6
3 pages
Lecture 6
No ratings yet
Lecture 6
3 pages
HW2 Sol
No ratings yet
HW2 Sol
4 pages
Caltech Vector Calculus 7
No ratings yet
Caltech Vector Calculus 7
8 pages
Homework_4_2024
No ratings yet
Homework_4_2024
3 pages
Chapter9-18
No ratings yet
Chapter9-18
23 pages
Some Special Class of Functions in Optimization: Convex, Lipschitz, Strongly Convex
No ratings yet
Some Special Class of Functions in Optimization: Convex, Lipschitz, Strongly Convex
17 pages
Distribution Theory Generalized Functions Notes: Ivan F Wilde
No ratings yet
Distribution Theory Generalized Functions Notes: Ivan F Wilde
66 pages
Ondiculas en Espacios de Hilbert
No ratings yet
Ondiculas en Espacios de Hilbert
35 pages
Derivatives
No ratings yet
Derivatives
20 pages
Problem Sheet 1
No ratings yet
Problem Sheet 1
5 pages
GP in The Polydisk - 5
No ratings yet
GP in The Polydisk - 5
14 pages
Calculus Notes T1 1112
No ratings yet
Calculus Notes T1 1112
51 pages
10.3934_naco.2024001
No ratings yet
10.3934_naco.2024001
11 pages
Notes On Subgradients
No ratings yet
Notes On Subgradients
13 pages
Lecture_3_taxonomy_taylor
No ratings yet
Lecture_3_taxonomy_taylor
4 pages
Gemini 5
No ratings yet
Gemini 5
2 pages
fa5_compact_subsets
No ratings yet
fa5_compact_subsets
5 pages
II-5 Continuity
No ratings yet
II-5 Continuity
9 pages
Inverse and Implicit Function Theorem
No ratings yet
Inverse and Implicit Function Theorem
4 pages
Taylor Series Notes
No ratings yet
Taylor Series Notes
5 pages
Anal21.Dvi-calculus On Banach Space
No ratings yet
Anal21.Dvi-calculus On Banach Space
21 pages
ch2 Revised
No ratings yet
ch2 Revised
17 pages
Homework 7
No ratings yet
Homework 7
3 pages
Directionally Lipschitzian
No ratings yet
Directionally Lipschitzian
12 pages
7-Convex Optimization
No ratings yet
7-Convex Optimization
34 pages
Ch1part1 2019
No ratings yet
Ch1part1 2019
29 pages
Math6338 hw8 PDF
No ratings yet
Math6338 hw8 PDF
6 pages
斯坦福大学机器学习数学基础 41-48
No ratings yet
斯坦福大学机器学习数学基础 41-48
8 pages
Ds 6
No ratings yet
Ds 6
10 pages
Math
No ratings yet
Math
14 pages
Div, Grad, and Curl
No ratings yet
Div, Grad, and Curl
6 pages
quals-F24-no-solutions
No ratings yet
quals-F24-no-solutions
7 pages
Tanvi Quaternions
No ratings yet
Tanvi Quaternions
12 pages
Exam Prep
No ratings yet
Exam Prep
29 pages
gadhi_2004 Sufficient Optimality Condition for Vector opt prob under D.C. data
No ratings yet
gadhi_2004 Sufficient Optimality Condition for Vector opt prob under D.C. data
12 pages
Llerena D
No ratings yet
Llerena D
14 pages
Epigrafo PDF
No ratings yet
Epigrafo PDF
12 pages
Gadhi - 2005 Optimality Conditions For D.C. Vector Optimization Prob Under Reverse Convex Constraints
No ratings yet
Gadhi - 2005 Optimality Conditions For D.C. Vector Optimization Prob Under Reverse Convex Constraints
14 pages
1190 543 PB
No ratings yet
1190 543 PB
17 pages
Subgradients: Subgradient Calculus Duality and Optimality Conditions Directional Derivative
No ratings yet
Subgradients: Subgradient Calculus Duality and Optimality Conditions Directional Derivative
39 pages
Elementary Calculus
From Everand
Elementary Calculus
George N. Frempong
No ratings yet
Geometric functions in computer aided geometric design
From Everand
Geometric functions in computer aided geometric design
Oscar Ruiz
No ratings yet
XC90 Heated Oxygen Sensors Replace.
No ratings yet
XC90 Heated Oxygen Sensors Replace.
6 pages
Jet and Rocket Propulsion PPT Cbit
No ratings yet
Jet and Rocket Propulsion PPT Cbit
44 pages
Introduction To Analog and Digital Communication: Chapter 10
No ratings yet
Introduction To Analog and Digital Communication: Chapter 10
77 pages
Patterns in Nature
No ratings yet
Patterns in Nature
3 pages
The Puzzle of The Unmarked Clock and The New Rational Reflection Principle
No ratings yet
The Puzzle of The Unmarked Clock and The New Rational Reflection Principle
13 pages
Algorithm Complexity
No ratings yet
Algorithm Complexity
35 pages
General Physiology Test Paper and answer key Batch 2024
No ratings yet
General Physiology Test Paper and answer key Batch 2024
2 pages
Sparcast LW 50
No ratings yet
Sparcast LW 50
1 page
Yang Et Al. - 2020 - Experimental Study On Single-Phase Hybrid Microcha
No ratings yet
Yang Et Al. - 2020 - Experimental Study On Single-Phase Hybrid Microcha
11 pages
9-Bis-New-for-VGT Latin Drums
No ratings yet
9-Bis-New-for-VGT Latin Drums
1 page
9 Chemistry For Engineers Soil
No ratings yet
9 Chemistry For Engineers Soil
21 pages
Gujarat Technological University
No ratings yet
Gujarat Technological University
2 pages
Math 215 Lecture Notes
No ratings yet
Math 215 Lecture Notes
90 pages
Ch-3 Mode Choise
No ratings yet
Ch-3 Mode Choise
24 pages
Oil Recovery by Imbibition Austin and Buds Water Flooding in The Formations
No ratings yet
Oil Recovery by Imbibition Austin and Buds Water Flooding in The Formations
7 pages
Enhancing IMG
No ratings yet
Enhancing IMG
14 pages
Grade 7 Science Review Quiz
No ratings yet
Grade 7 Science Review Quiz
3 pages
B1 RC18 User Guide
No ratings yet
B1 RC18 User Guide
20 pages
Grade Boundaries - Edexcel Maths A-Level PDF
No ratings yet
Grade Boundaries - Edexcel Maths A-Level PDF
11 pages
LT Motor Datasheet
100% (1)
LT Motor Datasheet
4 pages
Class 9 B
No ratings yet
Class 9 B
2 pages
Motherboard Enclosure Design Guide
No ratings yet
Motherboard Enclosure Design Guide
14 pages
Half-Diminished Seventh Chord - Wikipedia
No ratings yet
Half-Diminished Seventh Chord - Wikipedia
22 pages
Us - Army.music - Course Jazz - Harmony.ii - Mu3322
100% (1)
Us - Army.music - Course Jazz - Harmony.ii - Mu3322
202 pages

A note on the accelerated proximal gradient method for nonconvex optimization

Uploaded by

A note on the accelerated proximal gradient method for nonconvex optimization

Uploaded by

CARPATHIAN J. MATH. Online version at https://www.carpathian.cunbm.utcluj.

Dedicated to Professor Yeol Je Cho on the occasion of his retirement

A note on the accelerated proximal gradient method for

H UIJUAN WANG and H ONG -K UN X U

where H = Rd is a Euclidean d-space, and f and g are proper, lower-semicontinuous

Received: 17.09.2017. In revised form: 13.06.2018. Accepted: 15.07.2018

We say that f ∈ Γ(Rd ) is a KŁ-function provided it satisfies KŁP at each point x∗ ∈

2.3. Proximal Mappings.

Using the Lipschitz condition (2.6), we get

and the desired inequality (2.7) follows immediately. □

Algorithm 1 (Algorithm APG nonconvex problem)

The following result regarding Algorithm 1 is proved in [10].

Algorithm 2 (Algorithm APG nonconvex problem with variable stepsizes)

Proof. Apply (2.8) to the case where λ := λk and u := yk to find that

It is a straightforward observation from the definition of Algorithm 2 that F (yk+1 ) ≤

Now let Ω be the set of cluster points of {xk }, that is,

On the one hand, from (3.18) we immediately deduce that

Fix γ ∈ (0, 1). It is then not hard to get from (3.26)

You might also like