1 Introduction
The purpose of this paper is to study the convergence properties of a variant of the proximal
forward-backward splitting method for solving the following optimization problem:
min f (x) + g(x) s.t. x ∈ H, (1)
where H is a nontrivial real Hilbert space, and f : H → R̄ := R ∪ {+∞} and g : H → R̄
are two proper lower semicontinuous and convex functions. We are interested in the case
where both functions f and g are nondifferentiable, and when the domain of f contains
the domain of g. The solution set of this problem will be denoted by S∗ , which is a closed
and convex subset of the domain of g. Problem (1) has recently received much attention
from the optimization community due to its broad applications to several different areas
such as control, signal processing, system identification, machine learning and restoration
of images; see, for instance, [18, 19, 24, 32] and the references therein.
A special case of problem (1) is the nonsmooth constrained optimization problem, taking
g = δC where δC is the indicator function of a nonempty closed and convex set C in H,
defined by δC (y) := 0, if y ∈ C and +∞, otherwise. Then, problem (1) reduces to the
constrained minimization problem
min f (x) s.t. x ∈ C. (2)
Another important case of problem (1), which has attracted much interest in signal denoising
and data mining, is the following optimization problem with ℓ1-regularization

min f(x) + λ‖x‖_1  s.t. x ∈ H,  (3)

where λ > 0 and the norm ‖·‖_1 is used to induce sparsity in the solutions. Moreover,
problem (3) covers the important and well-studied ℓ1-regularized least squares minimization
problem, when H = R^n and f(x) = ‖Ax − b‖_2², where A ∈ R^{m×n}, m << n, and
b ∈ R^m, which is just a convex approximation of the very famous ℓ0-minimization problem;
see [12]. Recently, this problem has become popular in signal processing and statistical
inference; see, for instance, [23, 43].
We focus here on the so-called proximal forward-backward splitting iteration [32], which
contains a forward gradient step of f (an explicit step) followed by a backward proximal
step of g (an implicit step). The main idea of our approach consists of replacing, in the
forward step of the proximal forward-backward splitting iteration, the gradient of f by a
subgradient of f (note that here f is assumed nondifferentiable in general). In the particular
case that g is the indicator function, the proposed iteration reduces to the classical projected
subgradient iteration.
To describe and motivate our iteration, we first recall the definition of the so-called proximal
operator prox_g : H → H associated to a proper lower semicontinuous convex function g,
where prox_g(z), for z ∈ H, is the unique solution of the following strongly convex
optimization problem

min g(y) + (1/2)‖y − z‖²  s.t. y ∈ H.  (4)

Note that the norm ‖·‖ is induced by the inner product ⟨·, ·⟩ of H, i.e., ‖x‖ := √⟨x, x⟩
for all x ∈ H. The proximal operator prox_g is well-defined and has many attractive prop-
erties, e.g., it is continuous and firmly nonexpansive, i.e., for all x, y ∈ H, ‖prox_g(x) −
prox_g(y)‖² ≤ ‖x − y‖² − ‖[x − prox_g(x)] − [y − prox_g(y)]‖². This nice property can be
used to construct algorithms to solve optimization problems [39]; for other properties and
algebraic rules see [3, 18, 19]. If g = δ_C is the indicator function, the orthogonal projection
onto C, P_C(x) := {y ∈ C : ‖x − y‖ = dist(x, C)}, is the same as prox_{δ_C}(x) for all x ∈ H
[2]. For an exhaustive discussion about the evaluation of the proximity operator of a wide
variety of functions see Section 6 of [32]. Now, let us recall the definition of the subdifferential
operator ∂g : H ⇒ H, given by ∂g(x) := {w ∈ H : g(y) ≥ g(x) + ⟨w, y − x⟩, ∀ y ∈ H}.
We also recall the relation of the proximal operator prox_{αg} with the subdifferential oper-
ator ∂g, i.e., prox_{αg} = (Id + α∂g)^{−1}, and, as a direct consequence of the first-order
optimality condition of (4), we have the following useful inclusion:

(z − prox_{αg}(z))/α ∈ ∂g(prox_{αg}(z)),  (5)

for any z ∈ H and α > 0. The iteration proposed in this paper, called Proximal Subgradient
Splitting Method, is motivated by the well-known fact that x ∈ S∗ if and only if there exists
u ∈ ∂f(x) such that x = prox_{αg}(x − αu). Thus, the iteration generalizes the proximal
forward-backward splitting iteration for the differentiable case, as a fixed-point iteration of
the above equation, which is defined as follows: starting at x^0 belonging to the domain of
g, set

x^{k+1} = prox_{α_k g}(x^k − α_k u^k),  (6)

where u^k ∈ ∂f(x^k) and the stepsize α_k is positive for all k ∈ N. Iteration (6) recovers the
classical subgradient iteration [38], when g = 0, and the proximal point iteration [39], when
f = 0. Moreover, it covers important situations in which f is nondifferentiable, and it can
also be seen as a forward-backward Euler discretization of the subgradient flow differential
inclusion

ẋ(t) ∈ −∂[f(x(t)) + g(x(t))],

with variable x : R_+ → H; see [32]. Actually, if the derivative on the left side is replaced by
the divided difference (x^{k+1} − x^k)/α_k, then the discretization obtained is (x^k − x^{k+1})/α_k ∈
∂f(x^k) + ∂g(x^{k+1}), which is the proximal subgradient iteration (6).
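To make the forward-backward structure of (6) concrete, the following minimal Python sketch, written for the ℓ1-regularized model (3) with H = R^n, performs one iteration: an explicit subgradient step on f followed by the proximal (soft-thresholding) step on g = λ‖·‖_1. The soft-thresholding formula is the standard proximal operator of a multiple of ‖·‖_1; the data A, b, the parameter λ, the subgradient oracle and the stepsize are hypothetical placeholders, not objects from the paper.

```python
import numpy as np

def prox_l1(z, tau):
    """Proximal operator of tau*||.||_1 at z (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def pss_step(x, subgrad_f, lam, alpha):
    """One proximal subgradient splitting step (6) for min f(x) + lam*||x||_1:
    forward (explicit) subgradient step on f, then backward (implicit) prox step on g."""
    u = subgrad_f(x)                       # u^k in the subdifferential of f at x^k
    return prox_l1(x - alpha * u, alpha * lam)

# Illustrative usage with the nonsmooth choice f(x) = ||Ax - b||_1,
# for which A^T sign(Ax - b) is a subgradient.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 50)), rng.standard_normal(20)
subgrad_f = lambda x: A.T @ np.sign(A @ x - b)
x = np.zeros(50)
x = pss_step(x, subgrad_f, lam=0.1, alpha=0.01)
```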
The nondifferentiability of the function f has a direct impact on the computational effort,
and the importance of such problems, when f is nonsmooth, is underlined by the fact that
they occur frequently in applications. Nondifferentiability arises, for instance, in the problem of
minimizing the total variation of a signal over a convex set, in the problem of minimizing
the sum of two set-distance functions, in problems involving maxima of convex functions,
in Dantzig selector-type problems, in the non-Gaussian image denoising problem and in
Tikhonov regularization problems with L1 norms, among others; see, for instance, [4, 13, 17,
26]. The iteration of the proximal subgradient splitting method, proposed in (6), can be
applied in these important instances, extending the classical subgradient iteration to more
general problems such as (3). In problem (1), f is usually assumed to be differentiable, as in [35],
which is not necessarily the case in this work. Moreover, the convergence of iteration (6)
to a solution of (1) has been established in the literature when the gradient of f is globally
Lipschitz continuous and the stepsizes α_k, k ∈ N, are chosen very small, i.e., for
all k, α_k is less than some constant related to the Lipschitz constant of the gradient of
f; see, for instance, [19]. Recently, for the case where f is continuously differentiable but the Lipschitz
constant is not available, it has been shown that the steplengths can be chosen using backtracking procedures; see
[6, 10, 32, 35].
It is important to mention that the forward-backward iteration also finds applications in
solving more general problems, like the variational inequality and inclusion problems; see,
for instance, [9, 11, 14, 15, 42] and the references therein. On the other hand, the standard
convergence analysis of this iteration, for solving these general problems, requires at least
a co-coercivity assumption of the operator and the stepsizes to lie within a suitable inter-
val; see, for instance, Theorem 25.8 of [3]. Note that co-coercive operators are monotone
and Lipschitz continuous, but the converse does not hold in general; see [44]. However, for
gradients of lower semicontinuous, proper and convex functions, co-coercivity is equivalent
to the global Lipschitz continuity assumption. This nice and surprising fact, which
is strongly used in the convergence analysis of the proximal forward-backward method
for problem (1), when f is differentiable, is known as the Baillon-Haddad Theorem; see
Corollary 18.16 of [3].
The main aim of this work is to remove the differentiability assumption on f in the
forward-backward splitting method, extending the classical projected subgradient method
and containing, as a particular case, a new proximal subgradient iteration for more general
problems.
This work is organized as follows. The next subsection provides our notations and
assumptions, and some preliminaries results that will be used in the remainder of this
paper. The proximal subgradient splitting method and its weak convergence are analyzed by
choosing different stepsizes in Section 2. Finally, Section 3 gives some concluding remarks.
In this section, we present our assumptions, classical definitions and some results needed
for the convergence analysis of the proposed method.
We start by recalling some definitions and notation used in this paper, which are standard
and follow from [3, 32]. Throughout this paper, we write p := q to indicate that p is
defined to be equal to q. We write N for the nonnegative integers {0, 1, 2, . . .} and recall
that the extended-real number system is R̄ := R ∪ {+∞}. The closed ball centered at x ∈ H
with radius γ > 0 will be denoted by B[x; γ], i.e., B[x; γ] := {y ∈ H : ‖y − x‖ ≤ γ}. The
domain of any function h : H → R̄, denoted by dom(h), is defined as dom(h) := {x ∈ H :
h(x) < +∞}. The optimal value of problem (1) will be denoted by s∗ := inf{(f + g)(x) :
x ∈ H}, noting that when S∗ ≠ ∅, s∗ = min{(f + g)(x) : x ∈ H} = (f + g)(x∗) for any
x∗ ∈ S∗. Finally, ℓ¹(N) denotes the set of summable sequences in [0, +∞).
Throughout this paper we assume the following:
A1. ∂f is bounded on bounded subsets of the domain of g, i.e., there exists ζ = ζ(V) > 0
such that ∂f(x) ⊆ B[0; ζ] for all x ∈ V, where V is any bounded and closed subset of
dom(g).
A2. ∂g has bounded elements on the domain of g, i.e., there exists ρ ≥ 0 such that
∂g(x) ∩ B[0; ρ] ≠ ∅ for all x ∈ dom(g).
In connection with Assumption A1, we recall that ∂f is locally bounded on its open
domain. In finite-dimensional spaces, this result implies that A1 always holds when dom(f) is
open. A widely used sufficient condition for A1 is the Lipschitz continuity of f on dom(g).
Furthermore, the boundedness of the subgradients is crucial for the convergence analysis of
many classical subgradient methods in Hilbert spaces and has been widely considered in
the literature; see, for instance, [1, 8, 9, 38].
Regarding Assumption A2, we emphasize that it holds trivially for important instances
of problem (1), e.g., problems (2) and (3), because ∂δ_C(x) = N_C(x) and ∂‖x‖_1 = {u ∈ H :
‖u‖_∞ ≤ 1, ⟨u, x⟩ = ‖x‖_1}, respectively; it also holds when dom(g) is a bounded set or when
H is a finite-dimensional space. Note that Assumption A2 allows instances where ∂g(x) is an
unbounded set, as is the case when g is the indicator function. It is an existence
condition, which is in general weaker than A1.
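As a hedged illustration of Assumption A2 for problem (3) in the finite-dimensional case H = R^n (an assumption of this snippet only), the scaled sign vector is always a subgradient of g(x) = λ‖x‖_1 with norm at most λ√n, so ρ = λ√n works as the bound in A2:

```python
import numpy as np

def bounded_subgradient_l1(x, lam):
    """Return an element of the subdifferential of g(x) = lam*||x||_1 at x (H = R^n).

    Each coordinate of sign(x) lies in [-1, 1] and <sign(x), x> = ||x||_1,
    so lam*sign(x) is a subgradient whose Euclidean norm is at most lam*sqrt(n)."""
    return lam * np.sign(x)

x = np.array([1.5, 0.0, -2.0])
w = bounded_subgradient_l1(x, lam=0.5)
assert np.linalg.norm(w) <= 0.5 * np.sqrt(x.size)  # rho = lam*sqrt(n) suffices in A2
```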
Let us end the section by recalling the well-known concepts of quasi-Fejér and
Fejér convergence.
The definition originates in [22] and has been elaborated further in [16]. In the following we present two
well-known facts for quasi-Fejér convergent sequences.
Proof Item (a) follows from Proposition 3.3(i) of [16], and Item (b) follows from Theorem
3.8 of [16].
In this section we propose the proximal subgradient splitting method, extending the classical
subgradient iteration. We prove that the sequence of points generated by the proposed
method converges weakly to a solution of (1) using different strategies for choosing the
stepsizes. Moreover, we provide a complexity analysis for the generated sequence.
The method is formally stated as follows.

PSS Method. Take x^0 ∈ dom(g). Given x^k, choose u^k ∈ ∂f(x^k) and a stepsize α_k > 0, and compute x^{k+1} = prox_{α_k g}(x^k − α_k u^k); if x^{k+1} = x^k, stop.

If PSS Method stops at step k, then x^k = prox_{α_k g}(x^k − α_k u^k) with u^k ∈ ∂f(x^k), implying
that x^k is a solution of problem (1). From now on, we therefore assume that PSS Method gener-
ates an infinite sequence (x^k)_{k∈N}. Moreover, it follows directly from (6) that the sequence
(x^k)_{k∈N} belongs to dom(g).
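As an illustration of the statement above, here is a minimal Python sketch of the PSS loop; the oracles subgrad_f and prox_g, the stepsize rule and the tolerance-based stopping test are assumptions of this sketch, not part of the paper's formal description.

```python
import numpy as np

def pss_method(x0, subgrad_f, prox_g, stepsize, max_iter=1000, tol=1e-12):
    """Proximal subgradient splitting iteration (6): x^{k+1} = prox_{a_k g}(x^k - a_k u^k).

    subgrad_f(x): returns some u in the subdifferential of f at x.
    prox_g(z, a): returns prox_{a g}(z).
    stepsize(k, u): returns the positive stepsize a_k (constant, exogenous, Polyak-type, ...).
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        u = subgrad_f(x)
        a = stepsize(k, u)
        x_new = prox_g(x - a * u, a)
        if np.linalg.norm(x_new - x) <= tol:   # fixed point: x solves problem (1)
            return x_new
        x = x_new
    return x
```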
Before the formal analysis of the convergence properties of PSS Method, we discuss
the necessity of taking a (forward) subgradient step with respect to f instead of another
(backward) proximal step.
Remark 2.1 Evaluating the proximal operator of f requires solving a strongly convex
minimization problem of the form (4). Thus, in the context of problem (1), we assume that it is
hard to evaluate the proximal operator of f, ruling out the use of the standard
and very powerful iteration known as the Douglas-Rachford splitting method presented in [17].
Such situations appear mainly when f has a complicated algebraic expression, so that
it may be impossible to solve subproblem (4) explicitly or efficiently. Indeed, very often
in applications the proximity operator is not available in closed form,
and ad hoc algorithms or approximation procedures have to be used to compute prox_{αf}. This
happens, for instance, when applying proximal methods to image deblurring with total variation
[5], or to structured sparsity regularization problems in machine learning and inverse
problems [31]. A classical problem of the form (1), for which a subgradient of f is easily
available but prox_f has no explicit formula, is the dual formulation of the following
constrained convex problem:

min h_0(y) subject to h_i(y) ≤ 0 (i = 1, . . . , n),  (7)
This last problem is a particular case of (1), obtained by taking g = δ_{R^n_+} + λg_0. Note that if
dom(g_0) ⊆ R^n_+ then g = λg_0.
Thus, PSS Method uses the proximal operator of g and an explicit subgradient step on f
(i.e., the proximal operator of f is never evaluated), which is, in general, much easier
to implement than the proximal operator of f + g or of f, as required by the standard proximal
point iteration or the Douglas-Rachford algorithm, respectively, for solving nonsmooth
problems such as (1); see, for instance, [17]. Furthermore, note that in our case the subgradient
iteration for the sum f + g is not possible, because the domains of f and g are not the whole
space.
In the following we prove a crucial property of the iterates generated by PSS Method.
Lemma 2.1 Let (x^k)_{k∈N} and (u^k)_{k∈N} be the sequences generated by PSS Method. Then,
for all k ∈ N, x ∈ dom(g) and w^k ∈ ∂g(x^k),

‖x^{k+1} − x‖² ≤ ‖x^k − x‖² + 2α_k[(f + g)(x) − (f + g)(x^k)] + α_k²‖u^k + w^k‖².

Proof Since x^{k+1} = prox_{α_k g}(x^k − α_k u^k), inclusion (5) with z = x^k − α_k u^k gives
w̄^{k+1} := (x^k − x^{k+1})/α_k − u^k ∈ ∂g(x^{k+1}). Then, for any x ∈ dom(g),

2⟨x^k − x^{k+1}, x^k − x⟩ = 2α_k⟨u^k, x^k − x⟩ + 2α_k⟨(x^k − x^{k+1})/α_k − u^k, x^{k+1} − x⟩ + 2α_k⟨(x^k − x^{k+1})/α_k − u^k, x^k − x^{k+1}⟩
= 2α_k⟨u^k, x^k − x⟩ + 2α_k⟨w̄^{k+1}, x^{k+1} − x⟩ + 2α_k⟨u^k, x^{k+1} − x^k⟩ + 2‖x^k − x^{k+1}‖².

Now, using again that (x^k − x^{k+1})/α_k − u^k = w̄^{k+1} ∈ ∂g(x^{k+1}) and the convexity of g and f,
the above equality leads to

2⟨x^k − x^{k+1}, x^k − x⟩ ≥ 2α_k[f(x^k) − f(x) + g(x^{k+1}) − g(x) + ⟨u^k, x^{k+1} − x^k⟩] + 2‖x^k − x^{k+1}‖²
= 2α_k[(f + g)(x^k) − (f + g)(x) + g(x^{k+1}) − g(x^k) + ⟨u^k, x^{k+1} − x^k⟩] + 2α_k²‖u^k + w̄^{k+1}‖²
≥ 2α_k[(f + g)(x^k) − (f + g)(x) + ⟨w^k + u^k, x^{k+1} − x^k⟩] + 2α_k²‖u^k + w̄^{k+1}‖²
= 2α_k[(f + g)(x^k) − (f + g)(x)] − 2α_k²⟨u^k + w^k, u^k + w̄^{k+1}⟩ + 2α_k²‖u^k + w̄^{k+1}‖²,

using that g(x^{k+1}) − g(x^k) ≥ ⟨w^k, x^{k+1} − x^k⟩ for w^k ∈ ∂g(x^k) and that x^{k+1} − x^k = −α_k(u^k + w̄^{k+1}). Hence,

‖x^{k+1} − x‖² = ‖x^k − x‖² − 2⟨x^k − x^{k+1}, x^k − x⟩ + ‖x^k − x^{k+1}‖²
≤ ‖x^k − x‖² + 2α_k[(f + g)(x) − (f + g)(x^k)] + 2α_k²⟨u^k + w^k, u^k + w̄^{k+1}⟩ − α_k²‖u^k + w̄^{k+1}‖²
= ‖x^k − x‖² + 2α_k[(f + g)(x) − (f + g)(x^k)] + α_k²‖u^k + w^k‖² − α_k²‖w^k − w̄^{k+1}‖².

The result follows since the last term is nonpositive.
Since subgradient methods, like the method proposed here, are not descent methods, it is
common to keep track of the best point found so far, i.e., the one with minimum function
value among the iterates. At each step, we set it recursively as (f + g)^0_best := (f + g)(x^0)
and

(f + g)^k_best := min{(f + g)^{k−1}_best, (f + g)(x^k)},  (9)

for all k. Since ((f + g)^k_best)_{k∈N} is a decreasing sequence, it has a limit (which can be −∞).
When the function f is differentiable and its gradient Lipschitz continuous, it is possible to
prove the complexity of the iterates generated by PSS Method; see [35]. In our setting (f
is not necessarily differentiable) we expect, of course, slower convergence.
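As a small illustration, and assuming an objective oracle F for f + g is available (a hypothetical name), the recursion (9) amounts to the following bookkeeping in Python:

```python
def update_best(F_best, F_xk):
    """Recursion (9): (f+g)^k_best = min{(f+g)^{k-1}_best, (f+g)(x^k)}."""
    return min(F_best, F_xk)

# inside the iteration loop:
#   F_best = F(x0)
#   ... for each new iterate x_k:
#   F_best = update_best(F_best, F(x_k))
```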
Next we present a convergence rate result for the sequence of best functional values
((f + g)^k_best)_{k∈N} towards min{(f + g)(x) : x ∈ H}.
Lemma 2.2 Let ((f + g)^k_best)_{k∈N} be the sequence defined by (9). If S∗ ≠ ∅ then, for all
k ∈ N,

(f + g)^k_best − min_{x∈H}(f + g)(x) ≤ ([dist(x^0, S∗)]² + C_k Σ_{i=0}^k α_i²)/(2 Σ_{i=0}^k α_i),

where C_k := max{‖u^i + w^i‖² : 0 ≤ i ≤ k} with w^i ∈ ∂g(x^i) (i = 0, . . . , k) arbitrary.
Proof Define x∗ := P_{S∗}(x^0). Note that x∗ exists because S∗ is a nonempty closed and
convex subset of H. By applying Lemma 2.1 k + 1 times, for i ∈ {0, 1, . . . , k}, at x∗ ∈ S∗, we
get

‖x^{k+1} − x∗‖² ≤ ‖x^k − x∗‖² + 2α_k[(f + g)(x∗) − (f + g)(x^k)] + α_k²‖u^k + w^k‖²
≤ ‖x^0 − x∗‖² + 2 Σ_{i=0}^k α_i[(f + g)(x∗) − (f + g)(x^i)] + Σ_{i=0}^k α_i²‖u^i + w^i‖²
≤ [dist(x^0, S∗)]² + 2[min_{x∈H}(f + g)(x) − (f + g)^k_best] Σ_{i=0}^k α_i + C_k Σ_{i=0}^k α_i²,  (10)

where (f + g)^k_best is defined by (9) and the result follows after simple algebra.
Next we establish the rate of convergence of the objective values at the ergodic
sequence (x̄^k)_{k∈N} associated with (x^k)_{k∈N}, which is defined recursively as x̄^0 = x^0 and, given σ_0 = α_0
and σ_k = σ_{k−1} + α_k,

x̄^k = (1 − α_k/σ_k) x̄^{k−1} + (α_k/σ_k) x^k.

An easy induction shows that σ_k = Σ_{i=0}^k α_i and

x̄^k = (1/σ_k) Σ_{i=0}^k α_i x^i,  (11)

for all k ∈ N.
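A minimal Python sketch of the recursive ergodic update above, assuming the iterates and stepsizes are produced one at a time; the closing check confirms that the recursion reproduces the explicit weighted average (11) without storing past iterates.

```python
import numpy as np

class ErgodicAverage:
    """Maintains x_bar^k = (1/sigma_k) * sum_{i<=k} alpha_i * x^i via the recursion
    x_bar^k = (1 - alpha_k/sigma_k) * x_bar^{k-1} + (alpha_k/sigma_k) * x^k."""

    def __init__(self):
        self.sigma = 0.0
        self.x_bar = None

    def update(self, x_k, alpha_k):
        self.sigma += alpha_k
        t = alpha_k / self.sigma
        self.x_bar = x_k.copy() if self.x_bar is None else (1.0 - t) * self.x_bar + t * x_k
        return self.x_bar

# quick consistency check against the explicit weighted average (11)
rng = np.random.default_rng(1)
xs = [rng.standard_normal(3) for _ in range(5)]
alphas = [0.5, 0.4, 0.3, 0.2, 0.1]
avg = ErgodicAverage()
for x_k, a_k in zip(xs, alphas):
    x_bar = avg.update(x_k, a_k)
direct = sum(a * x for a, x in zip(alphas, xs)) / sum(alphas)
assert np.allclose(x_bar, direct)
```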
The following result is similar to Lemma 2.2, considering now the ergodic sequence
defined by (11).
Lemma 2.3 Let (x̄^k)_{k∈N} be the ergodic sequence defined by (11). If S∗ ≠ ∅, then

(f + g)(x̄^k) − min_{x∈H}(f + g)(x) ≤ ([dist(x^0, S∗)]² + C_k Σ_{i=0}^k α_i²)/(2 Σ_{i=0}^k α_i),

where C_k = max{‖u^i + w^i‖² : 0 ≤ i ≤ k} with w^i ∈ ∂g(x^i) (i = 0, . . . , k) arbitrary.
Proof Proceeding as in the proof of Lemma 2.2 until inequality (10), and dividing by
2 Σ_{i=0}^k α_i, we get

(1/σ_k) Σ_{i=0}^k α_i[(f + g)(x^i) − min_{x∈H}(f + g)(x)] ≤ (1/(2σ_k))([dist(x^0, S∗)]² − ‖x^{k+1} − x∗‖²) + (C_k/(2σ_k)) Σ_{i=0}^k α_i²
≤ (1/(2σ_k))([dist(x^0, S∗)]² + C_k Σ_{i=0}^k α_i²),  (12)

where σ_k := Σ_{i=0}^k α_i. Using the convexity of f + g, after noting that α_i/σ_k ∈ [0, 1] for all i ∈
{0, 1, . . . , k} and Σ_{i=0}^k α_i/σ_k = 1, together with (11) in the above inequality (12), the result follows.
Next we focus on constant stepsizes, a choice motivated by our interest
in quantifying the progress of the proposed method towards an approximate solution.
Proof If we consider constant stepsizes, i.e., α_k = α for all k ∈ N, then the optimal rate
is obtained when α = dist(x^0, S∗)/(√C_k · √(k + 1)), by minimizing the right-hand side of the bounds in Lemmas 2.2
and 2.3.
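For completeness, here is a short sketch of the one-dimensional minimization behind this choice, writing d := dist(x^0, S∗) and C := C_k in the bound of Lemma 2.2 with α_i ≡ α:

```latex
\[
\phi(\alpha) := \frac{d^{2} + C\,(k+1)\,\alpha^{2}}{2\,(k+1)\,\alpha}
             = \frac{d^{2}}{2(k+1)\alpha} + \frac{C\,\alpha}{2},
\qquad
\phi'(\alpha) = 0 \;\Longrightarrow\;
\alpha^{\ast} = \frac{d}{\sqrt{C}\,\sqrt{k+1}},
\qquad
\phi(\alpha^{\ast}) = \frac{d\,\sqrt{C}}{\sqrt{k+1}} = O\!\left((k+1)^{-1/2}\right).
\]
```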
Note that under Assumption A2, C_k ≤ (max_{1≤i≤k} ‖u^i‖ + ρ)². Hence, when ∂f is bounded
on dom(g) (which, under our assumptions, occurs when dom(g) is bounded), Assumption A1 implies that C_k ≤ (ζ + ρ)² for all k ∈ N. In this case, our analysis shows that
the expected error of the iterates generated by PSS Method with constant stepsizes after
k iterations is O((k + 1)^{−1/2}). Hence, we can find an ε-solution of problem (1) within
O(ε^{−2}) iterations. Of course, this is worse than the rate O(k^{−1}) and the O(ε^{−1}) iterations of
the proximal forward-backward iteration for differentiable convex f with Lipschitz
continuous gradient; see, for instance, [35]. However, as shown in Section 3.2.1, Theorem 3.2.1 of [37], the worst-case
error after k iterations of the classical subgradient iteration for general nonsmooth problems is indeed attained at the order O((k + 1)^{−1/2}).
In this subsection we analyze the convergence of PSS Method using exogenous stepsizes,
i.e., the positive exogenous sequence of stepsizes (α_k)_{k∈N} satisfies α_k = β_k/η_k, where
η_k := max{1, ‖u^k‖} for all k, and

Σ_{k=0}^∞ β_k² < +∞ and Σ_{k=0}^∞ β_k = +∞.  (13)
Proof The result follows by noting that η_k ≥ ‖u^k‖ and η_k ≥ 1 for all k ∈ N, and taking
w^k ∈ ∂g(x^k) such that ‖w^k‖ ≤ ρ for all k ∈ N, in view of Assumption A2. Then,

‖u^k + w^k‖²/η_k² ≤ ‖u^k‖²/η_k² + 2(‖u^k‖/η_k²)‖w^k‖ + ‖w^k‖²/η_k² ≤ 1 + 2ρ + ρ².

Now, Lemma 2.1 implies the desired result.
When the solution set of problem (1) is nonempty, S_lev(x^0) ≠ ∅ because S∗ ⊆ S_lev(x^0).
Next, we prove the two main results of this subsection.
Theorem 2.6 Let (x^k)_{k∈N} be the sequence generated by PSS Method with exogenous
stepsizes. If there exists x̄ ∈ S_lev(x^0), then:
(a) (x^k)_{k∈N} is quasi-Fejér convergent to L_{f+g}(x̄);
(b) lim_{k→∞}(f + g)(x^k) = (f + g)(x̄);
(c) (x^k)_{k∈N} converges weakly to some x̃ ∈ L_{f+g}(x̄).
Proof By assumption there exists x̄ ∈ S_lev(x^0), i.e., (f + g)(x̄) ≤ (f + g)(x^k) for all k ∈ N.
(a) To show that (x^k)_{k∈N} is quasi-Fejér convergent to L_{f+g}(x̄) (which is nonempty
because x̄ ∈ L_{f+g}(x̄)), we use Corollary 2.5: for any x ∈ L_{f+g}(x̄) ⊆ dom(g), it establishes
that ‖x^{k+1} − x‖² ≤ ‖x^k − x‖² + (1 + 2ρ + ρ²)β_k² for all k ∈ N. Thus, (x^k)_{k∈N}
is quasi-Fejér convergent to L_{f+g}(x̄).
(b) The sequence (x^k)_{k∈N} is bounded by Fact 1.1a, and hence it has accumulation
points in the sense of the weak topology. To prove that

lim_{k→∞}(f + g)(x^k) = (f + g)(x̄),  (15)

we first note that, for all m ∈ N,

Σ_{k=0}^m β_k[(f + g)(x^k) − (f + g)(x̄)] ≤ (1/2)(‖x^0 − x̄‖² − ‖x^{m+1} − x̄‖²) + (1/2)(1 + 2ρ + ρ²) Σ_{k=0}^m β_k².  (16)
Then, (16) together with (13) implies that there exists a subsequence
((f + g)(x^{i_k}))_{k∈N} of ((f + g)(x^k))_{k∈N} such that

lim_{k→∞}[(f + g)(x^{i_k}) − (f + g)(x̄)] = 0.  (17)

Indeed, if (17) does not hold, then there exist σ > 0 and k̃ ∈ N such that
(f + g)(x^k) − (f + g)(x̄) ≥ σ for all k ≥ k̃, and using (16), we get

+∞ > Σ_{k=k̃}^∞ β_k[(f + g)(x^k) − (f + g)(x̄)] ≥ σ Σ_{k=k̃}^∞ β_k,

in contradiction with (13).
Applying Corollary 2.5 with x = x^k, we have ‖x^k − x^{k+1}‖ ≤ √(1 + 2ρ + ρ²)·β_k, which together with (18) implies
that

ϕ_k − ϕ_{k+1} ≤ √(1 + 2ρ + ρ²)·(ζ + ρ)β_k =: ρ̄β_k  (19)
for all k ∈ N, where ϕ_k := (f + g)(x^k) − (f + g)(x̄). From (17), there exists a subsequence (ϕ_{i_k})_{k∈N} of (ϕ_k)_{k∈N} such that
lim_{k→∞} ϕ_{i_k} = 0. If the claim given in (15) does not hold, then there exist some δ > 0
and a subsequence (ϕ_{ℓ_k})_{k∈N} of (ϕ_k)_{k∈N} such that ϕ_{ℓ_k} ≥ δ for all k ∈ N. Thus, we can
construct a third subsequence (ϕ_{j_k})_{k∈N} of (ϕ_k)_{k∈N}, where the indices j_k are chosen in
the following way:

j_0 := min{m ≥ 0 | ϕ_m ≥ δ},
j_{2k+1} := min{m ≥ j_{2k} | ϕ_m ≤ δ/2},
j_{2k+2} := min{m ≥ j_{2k+1} | ϕ_m ≥ δ},

for each k. The existence of the subsequences (ϕ_{i_k})_{k∈N} and (ϕ_{ℓ_k})_{k∈N} of (ϕ_k)_{k∈N} guaran-
tees that the subsequence (ϕ_{j_k})_{k∈N} of (ϕ_k)_{k∈N} is well-defined for all k ≥ 0. It follows
from the definition of j_k that

ϕ_m ≥ δ for j_{2k} ≤ m ≤ j_{2k+1} − 1,  (20)
ϕ_m ≤ δ/2 for j_{2k+1} ≤ m ≤ j_{2k+2} − 1,

for all k, and hence

ϕ_{j_{2k}} − ϕ_{j_{2k+1}} ≥ δ/2,  (21)
for all k ∈ N. In view of (16), and recalling that ϕ_k = (f + g)(x^k) − (f + g)(x̄) ≥ 0
for all k ∈ N,

+∞ > Σ_{k=0}^∞ β_k ϕ_k ≥ Σ_{k=0}^∞ Σ_{m=j_{2k}}^{j_{2k+1}−1} β_m ϕ_m ≥ (δ/2) Σ_{k=0}^∞ Σ_{m=j_{2k}}^{j_{2k+1}−1} β_m
= (δ/(2ρ̄)) Σ_{k=0}^∞ Σ_{m=j_{2k}}^{j_{2k+1}−1} ρ̄β_m ≥ (δ/(2ρ̄)) Σ_{k=0}^∞ Σ_{m=j_{2k}}^{j_{2k+1}−1} (ϕ_m − ϕ_{m+1}) = (δ/(2ρ̄)) Σ_{k=0}^∞ (ϕ_{j_{2k}} − ϕ_{j_{2k+1}})
≥ (δ/(2ρ̄)) Σ_{k=0}^∞ δ/2 = +∞,

where we have used (20) in the second inequality, (19) in the third inequality and
(21) in the last one. Thus, lim_{k→∞}(f + g)(x^k) = (f + g)(x̄), establishing (b).
(c) Let x̃ be a weak accumulation point of (x^k)_{k∈N}, which exists by Item (a) and
Fact 1.1a. From now on, we use (x^{i_k})_{k∈N} to denote any subsequence of (x^k)_{k∈N} that
converges weakly to x̃. Since f + g is weakly lower semicontinuous, using (15), we
get

(f + g)(x̃) ≤ lim inf_{k→∞}(f + g)(x^{i_k}) = lim_{k→∞}(f + g)(x^k) = (f + g)(x̄),

implying that x̃ ∈ L_{f+g}(x̄). The result now follows from Fact 1.1b and Item (a).
Theorem 2.7 Let (x^k)_{k∈N} be the sequence generated by PSS Method with exogenous
stepsizes. Then:
(a) lim inf_{k→∞}(f + g)(x^k) = s∗;
(b) if S∗ ≠ ∅, then lim_{k→∞}(f + g)(x^k) = min_{x∈H}(f + g)(x) and (x^k)_{k∈N} converges weakly to some x̃ ∈ S∗;
(c) if (x^k)_{k∈N} is bounded, then S∗ ≠ ∅.
Proof (a) Since (x^k)_{k∈N} ⊂ dom(g), we get s∗ ≤ lim inf_{k→∞}(f + g)(x^k). Suppose that
s∗ < lim inf_{k→∞}(f + g)(x^k). Hence, there exists x̂ such that

(f + g)(x̂) < lim inf_{k→∞}(f + g)(x^k).  (22)

It follows from (22) that there exists k̄ ∈ N such that (f + g)(x̂) ≤ (f + g)(x^k) for all
k ≥ k̄. Since k̄ is finite, we can assume without loss of generality that (f + g)(x̂) ≤
(f + g)(x^k) for all k ∈ N. Using the definition of S_lev(x^0), given in (14), we have that
x̂ ∈ S_lev(x^0). By Theorem 2.6b, lim_{k→∞}(f + g)(x^k) = (f + g)(x̂), in contradiction
with (22).
(b) Since S∗ ≠ ∅, take x∗ ∈ S∗ and note that this implies L_{f+g}(x∗) = S∗. Since
(x^k)_{k∈N} ⊂ dom(g), we get (f + g)(x∗) ≤ (f + g)(x^k) for all k ∈ N, implying that
x∗ ∈ S_lev(x^0). By applying Items (b) and (c) of Theorem 2.6 at x̄ = x∗, we get that
lim_{k→∞}(f + g)(x^k) = (f + g)(x∗) and that (x^k)_{k∈N} converges weakly to some x̃ ∈ S∗,
respectively.
(c) Assume that S∗ is empty but the sequence (x^k)_{k∈N} is bounded. Let (x^{i_k})_{k∈N} be a sub-
sequence of (x^k)_{k∈N} such that lim_{k→∞}(f + g)(x^{i_k}) = lim inf_{k→∞}(f + g)(x^k). Since
(x^{i_k})_{k∈N} is bounded, without loss of generality (i.e., refining (x^{i_k})_{k∈N} if necessary),
we may assume that (x^{i_k})_{k∈N} converges weakly to some x̄ ∈ dom(g). By the weak
lower semicontinuity of f + g on dom(g) and Item (a),

(f + g)(x̄) ≤ lim inf_{k→∞}(f + g)(x^{i_k}) = lim inf_{k→∞}(f + g)(x^k) = s∗,

implying that x̄ ∈ S∗, which contradicts the assumption that S∗ is empty.
For exogenous stepsizes, Theorem 2.7a guarantees the convergence of the functional values
to the optimal value of problem (1) in the sense that lim inf_{k→∞}(f + g)(x^k) = s∗, implying the convergence
of ((f + g)^k_best)_{k∈N}, defined in (9), to s∗. It is important to mention that in the proof
of the above two crucial results we have used an idea similar to one recently presented in [7] for a
different setting.
In the following we present a direct consequence of Lemmas 2.2 and 2.3, when the
stepsizes satisfy (13).
Corollary 2.8 Let (x̄^k)_{k∈N} be the ergodic sequence defined by (11) and (β_k)_{k∈N} as in (13). If
S∗ ≠ ∅, then, for all k ∈ N,

(f + g)^k_best − min_{x∈H}(f + g)(x) ≤ ζ·([dist(x^0, S∗)]² + (1 + 2ρ + ρ²) Σ_{i=0}^k β_i²)/(2 Σ_{i=0}^k β_i)

and

(f + g)(x̄^k) − min_{x∈H}(f + g)(x) ≤ ζ·([dist(x^0, S∗)]² + (1 + 2ρ + ρ²) Σ_{i=0}^k β_i²)/(2 Σ_{i=0}^k β_i),

where ζ > 0 and ρ ≥ 0 are as in Assumptions A1 and A2, respectively.
The above corollary shows that, if we assume existence of solutions, the expected error of
the iterates generated by PSS Method with the exogenous stepsizes (13) after k iterations
is O((Σ_{i=0}^k β_i)^{−1}). Since (β_k)_{k∈N} satisfies (13), the best performance of the iteration (in
terms of functional values) is achieved, for example, by taking β_k ≅ 1/k^r with r greater than 1/2
but close to this value, for all k.
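A hedged Python sketch of this recommendation: β_k = 1/(k + 1)^r with 1/2 < r ≤ 1 satisfies (13) (square-summable but not summable), and the resulting rule can be passed to the generic loop sketched earlier; the exponent r and the subgradient argument u are placeholders.

```python
import numpy as np

def exogenous_stepsize(r=0.51):
    """Return a stepsize rule alpha(k, u) = beta_k / max(1, ||u||) with
    beta_k = 1/(k+1)**r, which satisfies (13) whenever 1/2 < r <= 1."""
    def alpha(k, u):
        beta_k = 1.0 / (k + 1) ** r
        return beta_k / max(1.0, float(np.linalg.norm(u)))
    return alpha
```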
In this subsection we analyze the convergence of PSS Method using Polyak stepsizes. Choose
any w^k ∈ ∂g(x^k) and denote ρ_k := ‖w^k‖ for all k ∈ N. Then define, for all k ∈ N,

α_k = γ_k·[(f + g)(x^k) − s_k]/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²),  (24)
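A minimal sketch, assuming the target value s_k and the relaxation parameter γ_k are supplied externally, of how the stepsize (24) would be computed; current_value stands for (f + g)(x^k), and u, w are the chosen subgradients u^k ∈ ∂f(x^k) and w^k ∈ ∂g(x^k).

```python
import numpy as np

def polyak_stepsize(current_value, s_k, u, w, gamma_k=1.0):
    """Stepsize (24): gamma_k * ((f+g)(x^k) - s_k) / (||u^k||^2 + 2*rho_k*||u^k|| + rho_k^2).

    Assumes u^k and w^k are not both zero, so the denominator is positive."""
    rho_k = float(np.linalg.norm(w))
    nu = float(np.linalg.norm(u))
    return gamma_k * (current_value - s_k) / (nu ** 2 + 2.0 * rho_k * nu + rho_k ** 2)
```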
Corollary 2.9 Suppose that lim_{k→∞} s_k = s̃ ≥ s∗ and let x ∈ L_{f+g}(s̃) be arbitrary. Then,

‖x^{k+1} − x‖² ≤ ‖x^k − x‖² − γ(2 − γ)·[s_k − (f + g)(x^k)]²/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²),

for all k ∈ N.
Proof Take x ∈ L_{f+g}(s̃) = {x ∈ dom(g) : (f + g)(x) ≤ s̃}. Since (s_k)_{k∈N} is a monotone
decreasing sequence convergent to s̃, which is less than the function values of the iterates,

(f + g)(x^k) ≥ s_k ≥ s̃ ≥ (f + g)(x), ∀x ∈ L_{f+g}(s̃),  (26)

for all k ∈ N. Then, applying Lemma 2.1 and using (26), we get, for all k ∈ N,

‖x^{k+1} − x‖² ≤ ‖x^k − x‖² − 2γ_k·[s_k − (f + g)(x^k)][(f + g)(x) − (f + g)(x^k)]/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²) + γ_k²·[s_k − (f + g)(x^k)]²/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²)
≤ ‖x^k − x‖² − γ_k(2 − γ_k)·[s_k − (f + g)(x^k)]²/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²)
≤ ‖x^k − x‖² − γ(2 − γ)·[s_k − (f + g)(x^k)]²/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²),  (27)

where we used that x ∈ L_{f+g}(s̃), (24) and (26) in the second inequality. The result follows
from (27).
Now we prove the first main result of this subsection in the following theorem.
Theorem 2.10 Let (x k )k∈N be the sequence generated by PSS Method with αk as in (24).
If limk→∞ sk = s̃ ≥ s∗ and Lf +g (s̃) = ∅, then
(a) (x k )k∈N is Fejér convergent to Lf +g (s̃).
(b) limk→∞ (f + g)(x k ) = s̃.
(c) (x k )k∈N is weakly convergent to some x̃ ∈ Lf +g (s̃).
where the last inequality follows from Assumptions A1 and A2 (‖u^k‖ ≤ ζ and
ρ_k = ‖w^k‖ ≤ ρ for all k ∈ N). Summing (29) over k = 0 to m, we obtain

γ(2 − γ) Σ_{k=0}^m [s_k − (f + g)(x^k)]² ≤ ρ̂(‖x^0 − x‖² − ‖x^{m+1} − x‖²) ≤ ρ̂‖x^0 − x‖².

Taking limits as m goes to ∞, we get the desired result.
(c) From Item (b), if s̃ = lim_{k→∞} s_k then lim_{k→∞}(f + g)(x^k) = s̃. Let x̃ be a weak
accumulation point of (x^k)_{k∈N}, which exists by the boundedness of (x^k)_{k∈N}, a direct
consequence of Item (a). From now on, we denote by (x^{i_k})_{k∈N} any subsequence of
(x^k)_{k∈N} which converges weakly to x̃. Since f + g is weakly lower semicontinuous,
we get (f + g)(x̃) ≤ lim inf_{k→∞}(f + g)(x^{i_k}) = lim_{k→∞}(f + g)(x^k) = s̃, implying
that (f + g)(x̃) ≤ s̃ and thus x̃ ∈ L_{f+g}(s̃). The result follows from Fact 1.1b and
Item (a).
Before the analysis of the inconsistent case, in which s̃ = lim_{k→∞} s_k is strictly less than
s∗ = inf{(f + g)(x) : x ∈ H}, we present a useful corollary, a direct consequence
of Theorem 2.10, that shall be used for the analysis of this case s̃ < s∗. In the next corollary,
we show the special case when the optimal value s∗ is known and finite and the stepsize α_k
is defined by (25), i.e., for all k ∈ N,

α_k = γ_k·[(f + g)(x^k) − s∗]/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²),

where 0 < γ ≤ γ_k ≤ 2 − γ.
Corollary 2.11 Let (x^k)_{k∈N} be the sequence generated by PSS Method with α_k given by
(25). Assume that S∗ ≠ ∅. Then,
(a) (x^k)_{k∈N} is Fejér convergent to S∗.
(b) lim_{k→∞}(f + g)(x^k) = min_{x∈H}(f + g)(x).
(c) (x^k)_{k∈N} is weakly convergent to some x̃ ∈ S∗.
(d) lim inf_{k→∞} √(k + 1)·[(f + g)(x^k) − min_{x∈H}(f + g)(x)] = 0.
Σ_{k=k̄}^∞ [(f + g)(x^k) − min_{x∈H}(f + g)(x)]² ≥ δ² Σ_{k=k̄}^∞ 1/(k + 1) = +∞.  (30)

On the other hand, by substituting the expression for the stepsize α_k given by (25) into (29)
(with s_k = min_{x∈H}(f + g)(x) for all k ∈ N), we get, summing over k ≥ k̄,

Σ_{k=k̄}^∞ [(f + g)(x^k) − min_{x∈H}(f + g)(x)]² < +∞,

in contradiction with (30).
Lemma 2.12 Let (x^k)_{k∈N} be the sequence generated by PSS Method with α_k given by
(24). If lim_{k→∞} s_k = s̃ ≥ s∗ and L_{f+g}(s̃) ≠ ∅, then, for all k ∈ N,

(f + g)^k_best − s̃ ≤ √(D_k/(γ(2 − γ))) · dist(x^0, L_{f+g}(s̃))/√(k + 1),

where D_k := max{‖u^i‖² + 2ρ_i‖u^i‖ + ρ_i² : 1 ≤ i ≤ k} with ρ_i := ‖w^i‖ and w^i ∈ ∂g(x^i)
(i = 0, . . . , k) arbitrary. Moreover,
Proof Repeating the proof of Theorem 2.10, with x̃ := P_{L_{f+g}(s̃)}(x^0) ∈ L_{f+g}(s̃), until (28),
we obtain

(k + 1)[(f + g)^k_best − s̃]² ≤ Σ_{i=0}^k [(f + g)(x^i) − s_k]² ≤ (D_k/(γ(2 − γ)))·[dist(x^0, L_{f+g}(s̃))]²,

where D_k := max{‖u^i‖² + 2ρ_i‖u^i‖ + ρ_i² : 1 ≤ i ≤ k} with ρ_i = ‖w^i‖ and w^i ∈ ∂g(x^i)
(i = 0, . . . , k) arbitrary. After simple algebra the result follows.
Our analysis proved that the expected error of the iterates generated by PSS Method
with the Polyak stepsizes (24) after k iterations is O((k + 1)^{−1/2}), if we assume s_k ≥ s∗ for
all k ∈ N and that (D_k)_{k∈N} is bounded.
Now we are ready to prove the last main result of this subsection.
Theorem 2.13 Let (x^k)_{k∈N} be the sequence generated by PSS Method with α_k given by
(24). If S∗ ≠ ∅ and lim_{k→∞} s_k = s̃ < min_{x∈H}(f + g)(x), then

lim_{k→∞}(f + g)^k_best = lim_{k→∞} min_{0≤i≤k}(f + g)(x^i) ≤ min_{x∈H}(f + g)(x) + ((2 − γ)/γ)·[min_{x∈H}(f + g)(x) − s̃].
Proof Suppose that (f + g)(x^k) > min_{x∈H}(f + g)(x) for all k ∈ N; otherwise the result holds trivially.
It is clear that, for all k ∈ N,

α_k = γ_k·[((f + g)(x^k) − s_k)/((f + g)(x^k) − min_{x∈H}(f + g)(x))]·[((f + g)(x^k) − min_{x∈H}(f + g)(x))/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²)]
=: γ̃_k·[((f + g)(x^k) − min_{x∈H}(f + g)(x))/(‖u^k‖² + 2ρ_k‖u^k‖ + ρ_k²)],

where

γ ≤ γ̃_k = γ_k·((f + g)(x^k) − s_k)/((f + g)(x^k) − min_{x∈H}(f + g)(x)),

which implies that γ̃_{k̄} is greater than 2 − γ for some k̄ ∈ N. Otherwise, if

γ̃_k ≤ 2 − γ  (31)

for all k ∈ N, we can apply Corollary 2.11b to get lim_{k→∞}(f + g)(x^k) = min_{x∈H}(f +
g)(x), implying that γ̃_k goes to +∞ (note that, for all sufficiently large k, s_k < min_{x∈H}(f +
g)(x) ≤ (f + g)(x^k), because s̃ < min_{x∈H}(f + g)(x)), which contradicts (31).
Thus, there exist k̄ and arbitrary δ > 0 such that

γ̃_{k̄} = γ_{k̄}·((f + g)(x^{k̄}) − s_{k̄})/((f + g)(x^{k̄}) − min_{x∈H}(f + g)(x)) > 2 − δ.

After simple algebra, and using that s_{k̄} ≥ s̃, we get that

(f + g)(x^{k̄}) < min_{x∈H}(f + g)(x) + (γ_{k̄}/(2 − δ − γ_{k̄}))·[min_{x∈H}(f + g)(x) − s̃]
≤ min_{x∈H}(f + g)(x) + ((2 − γ)/(γ − δ))·[min_{x∈H}(f + g)(x) − s̃];

since δ > 0 was arbitrary, the result follows.
Corollary 2.14 Let (x^k)_{k∈N} be the sequence generated by PSS Method with α_k given by
(24). If S∗ ≠ ∅ and lim_{k→∞} s_k = s̃, then

lim_{k→∞}(f + g)^k_best = lim_{k→∞}(f + g)(x^k) = s̃, if s̃ ≥ min_{x∈H}(f + g)(x), and
lim_{k→∞}(f + g)^k_best ≤ min_{x∈H}(f + g)(x) + ((2 − γ)/γ)·[min_{x∈H}(f + g)(x) − s̃], if s̃ < min_{x∈H}(f + g)(x).
3 Final Remarks
In this work we dealt with the weak convergence and the complexity analysis of a new
approach, called the Proximal Subgradient Splitting (PSS) Method, for minimizing the sum
of two nonsmooth convex functions under standard assumptions (namely, Assumptions
A1 and A2). It is worth mentioning that these kinds of boundedness assumptions are needed
even for the convergence analysis of the classical subgradient iteration, and hopefully their
relaxation will be addressed in future research. We add that, in the proposed
iteration, neither of the two functions needs to be differentiable or finite on H and, therefore, a broad
class of problems can be solved. PSS Method is very useful when the proximal operator of
f is complicated to evaluate while a (sub)gradient of f is simple to compute.
As future research, we will investigate variations of our scheme for solving structured
convex optimization problems with the aim of finding new methods, like the coordinate
gradient method, which has been proposed, for instance, in [36] only for the differentiable
case. We will also look at the incremental subgradient method [28, 33] for problem (1), when
f is the sum of a large number of nonsmooth convex functions. The idea is to perform
subgradient iterations incrementally, by sequentially taking steps along the subgradients of
the component functions, followed by proximal steps. On the other hand, it is important
to mention that the main drawback of subgradient iterations is their slow rate of conver-
gence. However, subgradient methods are distinguished by their applicability, simplicity
and efficient use of memory, which is very important for large-scale problems, especially
if the required accuracy for the solution is not too high; see, for instance, [34] and the
references therein. We also intend to study fast and variable metric versions of the prox-
imal subgradient splitting method proposed here to achieve better performance, as in the
differentiable case; see [20].
Finally, we hope that this study serves as a basis for future research on other, more
efficient variants of the proximal subgradient iteration, like cutting-plane methods, ε-
subgradient methods and the proximal bundle method and its variations; see [28, 29, 40]. Moreover, in
future work we will discuss useful modifications of the proximal subgradient iteration, adding
conditional, ergodic and deflected techniques and combining the ideas presented in [21, 30].
Acknowledgments The author was partially supported by CNPq grants 303492/2013-9, 474160/2013-0
and 202677/2013-3. This work was partially completed while the author was visiting the University of British
Columbia. The author is very grateful for the warm hospitality of the Irving K. Barber School of Arts and
Sciences, Mathematics at the University of British Columbia Okanagan and particularly to Professors Heinz
H. Bauschke and Shawn Wang for the generous hospitality. The author would like to thank the anonymous
referees and associate editors, whose suggestions helped to improve the presentation of this paper.
References
1. Alber, Y.I., Iusem, A.N., Solodov, M.V.: On the projected subgradient method for nonsmooth convex
optimization in a Hilbert space. Math Program 81, 23–37 (1998)
2. Bauschke, H.H., Borwein, J.: On projection algorithms for solving convex feasibility problems. SIAM
Rev 38, 367–426 (1996)
3. Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces.
Springer, New York (2011)
4. Bauschke, H.H., Koch, V.R., Phan, H.M.: Stadium norm and Douglas-Rachford splitting: a new approach
to road design optimization. Operations Research (2016) in press
5. Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising
and deblurring. IEEE Trans Image Process 18, 2419–2434 (2009)
6. Beck, A., Teboulle, M.: Gradient-Based Algorithms with Applications to Signal Recovery Problems. In:
Palomar, D., Eldar, Y. (eds.) Convex Optimization in Signal Processing and Communications, pp. 42–88.
Cambridge University Press, Cambridge (2010)
7. Bello Cruz, J.Y.: A subgradient method for vector optimization problems. SIAM J Optim 23, 2169–2182
(2013)
8. Bello Cruz, J.Y., Iusem, A.N.: A strongly convergent method for nonsmooth convex minimization in
Hilbert spaces. Numer Funct Anal Optim 32, 1009–1018 (2011)
9. Bello Cruz, J.Y., Iusem, A.N.: Convergence of direct methods for paramonotone variational inequalities.
Comput Optim Appl 46, 247–263 (2010)
10. Bello Cruz, J.Y., Nghia, T.T.A.: On the convergence of the proximal forward-backward splitting method
with linesearches. Technical report (2015). arXiv:1501.02501
11. Bot, R., Csetnek, E.R.: Forward-Backward and Tseng’s type penalty schemes for monotone inclusion
problems. Set-Valued and Variational Analysis 22, 313–331 (2014)
12. Candes, E.J., Tao, T.: Decoding by linear programming. IEEE Trans Inf Theory 51, 4203–4215 (2005)
13. Chavent, G., Kunisch, K.: Convergence of Tikhonov regularization for constrained ill-posed inverse
problems. Inverse Prob 10, 63–76 (1994)
14. Chen, G.H.-G., Rockafellar, R.T.: Convergence rates in forward-backward splitting. SIAM J Optim 7,
421–444 (1997)
15. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators.
Optimization 53, 475–504 (2004)
16. Combettes, P.L.: Quasi-Fejérian analysis of some optimization algorithms. In: Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications. Studies in Computational Mathematics 8, pp. 115–152. North-Holland, Amsterdam (2001)
17. Combettes, P.L., Pesquet, J.-C.: A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing 1, 564–574 (2007)
18. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications 49, pp. 185–212. Springer, New York (2011)
19. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model
Simul 4, 1168–1200 (2005)
20. Combettes, P.L., Vũ, B.C.: Variable metric forward-backward splitting with applications to monotone
inclusions in duality. Optimization 63, 1289–1318 (2014)
21. D’Antonio, G., Frangioni, A.: Convergence analysis of deflected conditional approximate subgradient
methods. SIAM J Optim 20, 357–386 (2009)
22. Ermoliev, Yu.M.: On the method of generalized stochastic gradients and quasi-Fejér sequences. Cybernetics 5, 208–220 (1969)
23. Figueiredo, M., Nowak, R., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J Sel Top Sign Proces 1, 586–597 (2007)
24. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images.
IEEE Trans Pattern Anal Mach Intell 6, 721–741 (1984)
25. Held, M., Wolfe, P., Crowder, H.: Validation of subgradient optimization. Math Program 6, 66–68 (1974)
26. James, G.M., Radchenko, P., Lv, J.: DASSO: connections between the Dantzig selector and lasso. J R
Stat Soc Ser B Stat Methodol 71, 127–142 (2009)
27. Kim, S., Ahn, H., Cho, S.-C.: Variable target value subgradient method. Math Program 49, 359–369
(1991)
28. Kiwiel, K.C.: Convergence of approximate and incremental subgradient methods for convex optimiza-
tion. SIAM J Optim 14, 807–840 (2006)
On Proximal Subgradient Splitting Method for Minimizing the sum...
29. Kiwiel, K.C.: The efficiency of subgradient projection methods for convex optimization, Part I: general level methods. SIAM J Control Optim 34, 660–676 (1996)
30. Larsson, T., Patriksson, M., Strömberg, A.-B.: Conditional subgradient optimization – theory and application. Eur J Oper Res 88, 382–403 (1996)
31. Mosci, S., Rosasco, L., Santoro, M., Verri, A., Villa, S., Sebag, M.: Solving structured sparsity reg-
ularization with proximal methods. In: Balczar, J., Bonchi, F., Gionis, A. (eds.) Machine Learning
and Knowledge Discovery in Databases, 6322 of Lecture Notes in Computer Science, Springer, 2010,
418–433
32. Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1, 127–239 (2014)
33. Nedic, A., Bertsekas, D.P.: Incremental subgradient methods for nondifferentiable optimization. SIAM
J Optim 12, 109–138 (2001)
34. Nesterov, Yu.: Subgradient methods for huge-scale optimization problems. Math Program 146, 275–297 (2014)
35. Nesterov, Yu.: Gradient methods for minimizing composite functions. Math Program 140, 125–161 (2013)
36. Nesterov, Yu.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J Optim 22, 341–362 (2012)
37. Nesterov, Yu.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Norwell (2004)
38. Polyak, B.T.: Minimization of unsmooth functionals. U.S.S.R. Comput Math Math Phys 9, 14–29 (1969)
39. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J Control Optim 14,
877–898 (1976)
40. Sagastizábal, C.: Composite proximal bundle method. Math Program 140, 189–233 (2013)
41. Sherali, H.D., Choi, G., Tuncbilek, C.H.: A variable target value method for nondifferentiable optimization, 1–8 (1997)
42. Svaiter, B.F.: A class of Fejér convergent algorithms, approximate resolvents and the hybrid Proximal-
Extragradient method. J Optim Theory Appl 162, 133–153 (2014)
43. Tropp, J.: Just relax: convex programming methods for identifying sparse signals. IEEE Trans Inf Theory
51, 1030–1051 (2006)
44. Zhu, D.L., Marcotte, P.: Co-coercivity and its role in the convergence of iterative schemes for solving
variational inequalities. SIAM J Optim 6, 714–726 (1996)