Constrained Optimization Solving Sample Average Approximation CCP
Abstract: Sample average approximation (SAA) is a tractable approach to chance constrained programming, a challenging class of problems in stochastic programming. The constraint is usually characterized by the 0/1 loss function, which creates enormous difficulties in designing numerical algorithms. Most existing methods are built on reformulations of the SAA problem, such as binary integer programming or its relaxations. However, no viable algorithms have been developed to tackle SAA directly, let alone with theoretical guarantees. In this paper, we investigate a novel 0/1 constrained optimization problem, which provides a new way to address
SAA. Specifically, by deriving the Bouligand tangent and Fréchet normal cones of the 0/1
constraint, we establish several optimality conditions, including one that can be equivalently
expressed by a system of equations, thereby allowing us to design a smoothing Newton type
method. We show that the proposed algorithm has a locally quadratic convergence rate and
high numerical performance.
1 Introduction
Chance constrained programming (CCP) is an efficient tool for decision making in uncertain envi-
ronments to hedge risk and has been extensively studied recently [1, 13, 9, 37, 30]. Applications of
CCP include supply chain management [20], optimization of chemical processes [16], surface water
quality management [38], to name just a few. A simple CCP problem takes the form of

  min_{x∈R^K} f(x),   s.t.   P{g(x, ξ) ≤ 0} ≥ 1 − α,     (CCP)

or equivalently,

  min_{x∈R^K} f(x),   s.t.   P{g(x, ξ) > 0} ≤ α.
1.1 Related work
Problem (CCP) is difficult to solve numerically in general for two reasons. The first is that the quantity P{g(x, ξ) ≤ 0} is hard to compute for a given x, as it requires multi-dimensional integration. The second is that the feasible set is generally nonconvex even if g(·, ξ) is convex. Therefore, to solve the problem, one common strategy is to make assumptions on the distribution of ξ. For example, in the case of a single chance constraint, the feasible set is convex provided
that α < 0.5 and ξ has a nondegenerate multivariate normal distribution [18] or (A(ξ); b(ξ)) has
a symmetric log-concave density [19]. It also can be expressed as a second-order cone constraint if
ξ has an elliptically symmetric distribution [15].
Without making assumptions on ξ, there are several sampling-based approaches to approximate the probabilistic constraint. These techniques reformulate problem (CCP) as an integer program or a nonlinear program. In [1], a single chance constrained problem was reformulated as a mixed-integer nonlinear program and the integer variables were relaxed into continuous ones. The sample approximation [23, 1] can be reformulated as a mixed-integer program [24], which can be solved by a branch-and-bound approach [4] or a branch-and-cut decomposition algorithm [22]. To avoid solving mixed-integer programs, some other approximation methods have been proposed, such as a difference-of-convex (DC) approximation of the indicator function [17], inner-outer approximation [13], and convex approximations [27]. In [17], a gradient-based Monte Carlo method was developed to solve a sequence of convex approximations, aimed at addressing the DC approximation of problem (CCP).
As pointed out before, it is hard to solve (CCP) numerically. A popular approach is the
sample average approximation (SAA). Specifically, let ξ^1, ..., ξ^N be N independent and identically distributed realizations of the random vector ξ. Then, based on the model in [23, 29], SAA takes the form of

  min_{x∈R^K} f(x),   s.t.   (1/N) Σ_{n=1}^{N} ℓ_{0/1}( max_{m=1,...,M} (g(x, ξ^n))_m ) ≤ α,     (SAA)
where ℓ_{0/1}(t) is the 0/1 loss function [12, 21, 14] defined as

  ℓ_{0/1}(t) := { 1, t > 0;   0, t ≤ 0 }.     (1.1)
The function is known as the (Heaviside) step function [28, 11, 41], named after Oliver Heaviside (1850-1925), an English mathematician and physicist. We would like to point out that (SAA) with α > 0 is always non-convex. Nevertheless, it has gained popularity because it requires relatively few assumptions on the structure of (CCP) or the distribution of ξ, although it only yields statistical bounds on solution feasibility and optimality and requires replication to do so [22].
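To make the role of the 0/1 loss concrete, the following sketch evaluates the left-hand side of the constraint in (SAA) for a given x: it averages ℓ_{0/1} over the samples, i.e., it counts the fraction of samples whose worst constraint value is positive, and compares the result with the risk level α. The linear map g(x, ξ) = ξx − b used at the end is only a hypothetical placeholder for illustration.

```python
import numpy as np

def saa_violation_rate(x, samples, g):
    """Empirical left-hand side of the (SAA) constraint:
    (1/N) * sum_n  l_{0/1}( max_m (g(x, xi^n))_m )."""
    vals = np.array([g(x, xi) for xi in samples])   # shape (N, M)
    worst = vals.max(axis=1)                        # max over m for each sample
    return np.mean(worst > 0)                       # average of the 0/1 losses

# Hypothetical illustration: g(x, xi) = xi @ x - b with M = 2 rows per sample.
rng = np.random.default_rng(0)
K, M, N, b, alpha = 5, 2, 1000, 1.0, 0.05
samples = rng.standard_normal((N, M, K))
g = lambda x, xi: xi @ x - b
x = 0.1 * np.ones(K)
print(saa_violation_rate(x, samples, g) <= alpha)   # feasibility test for (SAA)
```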
Therefore, there is an impressive body of work on developing numerical algorithms [29, 30, 35]
and establishing asymptotic convergence [23, 36, 37] for problem (SAA). One way to solve the
problem is to adopt the integer programming-based method [2]. When f and g are linear, a
branch-and-cut decomposition algorithm is proposed for problem (SAA) based on mixed-integer
linear optimization techniques [22]. Another approach is to recast (SAA) as a nonlinear program. For example, the authors in [30] rewrote the chance constraint as a quantile constraint and reformulated (CCP) through (SAA) as a nonlinear program. They then applied a trust-region method to tackle the problem with a joint chance constraint. In [9], (SAA) has been reformu-
lated as a cardinality-constrained nonlinear optimization problem which was solved by a sequential
method, where trial steps were computed through piecewise-quadratic penalty function models.
The proposed method has been proven to converge to a stationary point of the penalty function.
In summary, most of the aforementioned work focuses on surrogates of (SAA), rather than directly providing a thorough optimality analysis and efficient algorithms for (SAA) itself.
Z^max_{:n} := max_{m∈M} Z_{mn}.

• If G(x) = (g(x, ξ^1) ··· g(x, ξ^N)) and s = ⌈αN⌉, the smallest integer no less than αN, then model (SCO) reduces to (SAA).

• If G(·) is a linear mapping and M = 1, then model (SCO) enables us to deal with support vector machines [8, 40] and one-bit compressed sensing [5, 42].
1.3 Contributions
As far as we know, this is the first paper that treats (SAA) with 0/1 constraints directly and provides theoretical guarantees as well as a viable numerical algorithm. The main contributions of this paper are threefold.
  S^G := {x ∈ R^K : ‖G(x)‖_0^+ ≤ s},     (1.3)

and

  S := {Z ∈ R^{M×N} : ‖Z‖_0^+ ≤ s},     (1.4)
we derive the projection of a point onto S as well as the Bouligand tangent and Fréchet normal cones of S^G and S; see Propositions 2.1 and 2.3. In addition, the established properties of S^G allow us to conduct an extensive optimality analysis. Specifically, we introduce a KKT point, define a τ-stationary point of (SCO), and then reveal their relationships to local minimizers.
1.4 Organization
This paper is organized as follows. In the next section, we calculate the projection of a point onto the set S and derive the Bouligand tangent and Fréchet normal cones of S^G and S. In Section 3, we establish two kinds of optimality conditions for problem (SCO) based on the normal cone of S^G: KKT points and τ-stationary points. We then reveal their relationships to local minimizers. In Section 4, we equivalently rewrite the τ-stationary point conditions as a system of τ-stationary equations. We then develop a smoothing Newton type method, SNSCO, to solve these equations and establish its locally quadratic convergence. We implement SNSCO to solve problems with a single chance constraint and with joint chance constraints, and give some concluding remarks, in the last two sections.
1.5 Notation
We end this section by defining some notation employed throughout this paper. Denote two index sets as M := {1, 2, ..., M} and N := {1, 2, ..., N}. Given a subset T ⊆ N, its cardinality and complement are |T| and T̄ := N \ T, respectively. For a scalar a ∈ R, ⌈a⌉ represents the smallest integer no less than a. For two matrices A = (A_mn) ∈ R^{M×N} and B = (B_mn) ∈ R^{M×N}, we denote
The nth largest singular value of A ∈ RN ×N is denoted by σn (A), namely σ1 (A) ≥ σ2 (A) ≥ · · · ≥
σN (A). Particularly, we write
  Γ_+ := {n ∈ N : Z^max_{:n} > 0},
  Γ_0 := {n ∈ N : Z^max_{:n} = 0},
  Γ_− := {n ∈ N : Z^max_{:n} < 0},     (1.6)
  V_Γ := {(m, n) ∈ M × N : Z_{mn} = 0, ∀ n ∈ Γ},   Γ ⊆ N.

We point out that Γ_+, Γ_0, Γ_−, and V_Γ depend on Z, but we will drop their dependence if no additional explanations are provided. Recalling (1.2), the above definitions indicate that

  ‖Z‖_0^+ = |Γ_+|.     (1.7)
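As a small illustration of (1.6) and (1.7), the sketch below splits the columns of Z by the sign of Z^max_{:n} and counts the columns in Γ_+; membership of Z in the set S of (1.4) is then simply the test ‖Z‖_0^+ ≤ s. The matrix used below also appears later in (2.35).

```python
import numpy as np

def column_sign_sets(Z):
    """Index sets of (1.6): split columns of Z by the sign of Z^max_{:n}."""
    col_max = Z.max(axis=0)
    gamma_plus  = np.where(col_max > 0)[0]
    gamma_zero  = np.where(col_max == 0)[0]
    gamma_minus = np.where(col_max < 0)[0]
    return gamma_plus, gamma_zero, gamma_minus

def zero_one_norm(Z):
    """'0/1 norm' of (1.7): number of columns containing a positive entry."""
    return int(np.sum(Z.max(axis=0) > 0))

Z = np.array([[2.0, 2.0, 0.0, -1.0],
              [0.0, -1.0, -2.0, -3.0]])
print(column_sign_sets(Z))    # columns {1,2}, {3}, {4} in 1-based indexing
print(zero_one_norm(Z) <= 2)  # membership test for S with s = 2
```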
For G(x) : R^K → R^{M×N} and an index set J, we denote by G_{mn}(x) its (m, n)th element and by ∇_J G(x) ∈ R^{K×|J|} the matrix whose columns are ∇G_{mn}(x), (m, n) ∈ J. For W ∈ R^{M×N}, we define in particular

  ∇G(x) ◦ W := Σ_{(m,n)∈M×N} W_{mn} ∇G_{mn}(x).

One can observe that Γ_s contains |Γ_s| indices corresponding to the first |Γ_s| largest values in {‖Z^+_{:i}‖ : i ∈ Γ_+}. Moreover, the set defined above has the following properties.
• For any T ∈ T(Z; s) and any n ∈ T,

  ‖Z^+_{:n}‖ ∈ (0, Z^↓_s]  if n ∈ Γ_+ \ Γ_s,   ‖Z^+_{:n}‖ = 0 ≤ Z^↓_s  if n ∈ Γ_0,     (2.2)

and

  T̄ = N \ T = Γ_s ∪ Γ_−.     (2.3)

• If ‖Z‖_0^+ ≤ s, then Γ_s = Γ_+. Hence T(Z; s) = {Γ_0} and T̄ = Γ_+ ∪ Γ_−.

Since Γ_+ = {1, 2}, Γ_0 = {3}, and Γ_− = {4}, it is easy to check that ‖Z‖_0^+ = 2, T(Z; 3) = T(Z; 2) = {{3}}, and T(Z; 1) = {{1, 3}, {2, 3}}.
2.1 Projection
It is well known that the solution set of the problem on the right-hand side is a singleton when Ω is convex and may have multiple elements otherwise. The following property shows that the projection onto S has a closed form.
The proof is quite straightforward and thus is omitted here. We provide an example to illustrate (2.6). Again consider case (2.4). For s = 3 or 2, Π_S(Z) = {Z} due to T(Z; 3) = T(Z; 2) = {{3}}. For s = 1, T(Z; 1) = {{1, 3}, {2, 3}} and thus

  Π_S(Z) = { [0  2  0 −1; 0 −1 −2 −3],   [2  0  0 −1; 0 −1 −2 −3] }.
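The example suggests the following sketch of one element of Π_S(Z): keep at most s "positive" columns, namely those with the largest norms ‖Z^+_{:n}‖, and replace the positive entries of every other column by zero (the column-wise negative part). This is a reading of the closed form (2.6) inferred from the example rather than a restatement of Proposition 2.1; ties, as in the case s = 1 above, make the projection multi-valued, and the sketch simply returns one selection.

```python
import numpy as np

def project_S_one(Z, s):
    """Return one element of the projection of Z onto S = {Z : ||Z||_0^+ <= s}."""
    neg_part = np.minimum(Z, 0.0)                       # column-wise negative part
    pos_norm = np.linalg.norm(np.maximum(Z, 0.0), axis=0)
    positive_cols = np.where(Z.max(axis=0) > 0)[0]
    if positive_cols.size <= s:
        return Z.copy()                                 # Z already lies in S
    # keep the s positive columns with the largest ||Z^+_{:n}|| (one tie-break)
    keep = positive_cols[np.argsort(-pos_norm[positive_cols])[:s]]
    out = neg_part
    out[:, keep] = Z[:, keep]
    return out

Z = np.array([[2.0, 2.0, 0.0, -1.0],
              [0.0, -1.0, -2.0, -3.0]])
print(project_S_one(Z, 1))   # one of the two projections listed above
```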
With the help of the closed form of the projection Π_S(Z), we establish the following fixed-point inclusion.

Proposition 2.2. Given τ > 0, the inclusion

  Z ∈ Π_S(Z + τW)     (2.7)

holds if and only if one of the following two conditions is satisfied:

• ‖Z‖_0^+ < s and W = 0;

• ‖Z‖_0^+ = s and conditions (2.8)–(2.9) hold.
Proof. We first show the 'if' part. Relation (2.7) is clearly true for the case ‖Z‖_0^+ < s since W = 0. For ‖Z‖_0^+ = s, we have

  Z + τW = [Z_{:Γ_0} + τW_{:Γ_0}, Z_{:Γ_+} + τW_{:Γ_+}, Z_{:Γ_−} + τW_{:Γ_−}] = [Z_{:Γ_0} + τW_{:Γ_0}, Z_{:Γ_+}, Z_{:Γ_−}],     (2.10)

where the last equality is by (2.8). Moreover, by (2.8) and (2.9),

  ∀ n ∈ Γ_0,   ‖(Z_{:n} + τW_{:n})^+‖ = τ‖W_{:n}‖ ≤ Z^↓_s.     (2.12)

Since ‖Z‖_0^+ = s, we have |Γ_+| = s. This together with (2.10) and (2.12) yields Z^↓_s = (Z + τW)^↓_s, which by (2.11) and (2.12) implies Γ_0 ∈ T(Z + τW; s). Then it follows from Proposition 2.1 that

  Π_S(Z + τW) ∋ [(Z_{:Γ_0} + τW_{:Γ_0})_−, Z_{:Γ̄_0} + τW_{:Γ̄_0}] = [Z_{:Γ_0}, Z_{:Γ̄_0}] = Z,

where the equality uses (2.8) and (2.9).

Next, we prove the 'only if' part. We note that (2.7) implies Z ∈ S, namely ‖Z‖_0^+ ≤ s. Now we prove the conclusion by considering two cases.

Case 1: ‖Z‖_0^+ < s. Condition (2.7) implies ‖Z + τW‖_0^+ < s, leading to

  Z ∈ Π_S(Z + τW) = {Z + τW},
It follows from Proposition 2.1 that there is a set T ∈ T(Z + τW; s) such that

  Z = [Z_{:T}, Z_{:Γ'_s}, Z_{:Γ'_−}] = [Z_{:T}, Z_{:T̄}]     (2.13)

and, by (2.6) and (2.13),

  Z = [(Z_{:T} + τW_{:T})_−, Z_{:T̄} + τW_{:T̄}] = [(Z_{:T} + τW_{:T})_−, Z_{:Γ'_s} + τW_{:Γ'_s}, Z_{:Γ'_−} + τW_{:Γ'_−}].     (2.14)

Conditions (2.14) and the definitions of Γ'_s and Γ'_− enable us to obtain

  Z^max_{:n} ≤ 0, n ∈ T;   Z^max_{:n} > 0, n ∈ Γ'_s;   Z^max_{:n} < 0, n ∈ Γ'_−.     (2.16)

Recalling Γ_0 = {n ∈ N : Z^max_{:n} = 0}, it follows that

where '≥' is due to the definition of {Γ_0} = T(Z + τW; s) in (2.3). Finally, (2.18), (2.15), and (2.19) enable us to conclude (2.8).
Recall that for any nonempty closed set Ω ⊆ R^K, the Bouligand tangent cone T_Ω(x) and the corresponding Fréchet normal cone N̂_Ω(x) at a point x ∈ Ω are defined as [34]:

  T_Ω(x) := {d ∈ R^K : ∃ η_ℓ ≥ 0, x^ℓ →_Ω x such that η_ℓ(x^ℓ − x) → d},     (2.20)

  N̂_Ω(x) := {u ∈ R^K : ⟨u, d⟩ ≤ 0, ∀ d ∈ T_Ω(x)},     (2.21)

where x^ℓ → x means lim_{ℓ→∞} x^ℓ = x and x^ℓ →_Ω x means x^ℓ ∈ Ω for every ℓ and x^ℓ → x. Let ∪_{n∈N} Ω_n be the union of finitely many nonempty and closed subsets Ω_n. Then by [3, Proposition 3.1], for any x ∈ ∪_{n∈N} Ω_n with N(x) := {n ∈ N : x ∈ Ω_n}, we have

  T_{∪_{n∈N} Ω_n}(x) = ∪_{n∈N(x)} T_{Ω_n}(x).     (2.22)
and the corresponding index sets Γ_+, Γ_0, Γ_−, and V_Γ in (1.6) are defined for G(x). Therefore, we have

  Γ_+ = {n ∈ N : (G(x))^max_{:n} > 0},
  Γ_0 = {n ∈ N : (G(x))^max_{:n} = 0},     (2.24)
  Γ_− = {n ∈ N : (G(x))^max_{:n} < 0}.
Based on the set P(Γ_0, s − |Γ_+|) and the above notation, we express the Bouligand tangent cone and the corresponding Fréchet normal cone of S^G explicitly in the following result.

Proposition 2.3. Suppose ∇_{V_{Γ_0}} G(x) has full column rank. Then the Bouligand tangent cone T_{S^G}(x) and the Fréchet normal cone N̂_{S^G}(x) at x ∈ S^G are given by

  T_{S^G}(x) = ∪_{Γ∈P(Γ_0, s−|Γ_+|)} {d ∈ R^K : d^⊤ ∇_{V_{Γ_0\Γ}} G(x) ≤ 0},     (2.25)

  N̂_{S^G}(x) = {∇_{V_{Γ_0}} G(x) ◦ W_{V_{Γ_0}} : W_{V_{Γ_0}} ≥ 0}  if ‖G(x)‖_0^+ = s,   and   N̂_{S^G}(x) = {0}  if ‖G(x)‖_0^+ < s.     (2.26)
Proof. For notational simplicity, denote P := P(Γ_0, s − |Γ_+|). For any fixed x ∈ S^G, it follows from (2.24) that

  x ∈ S^G_Γ := {z ∈ R^K : (G(z))^max_{:n} > 0, n ∈ Γ_+;  (G(z))^max_{:n} < 0, n ∈ Γ_−;  (G(z))^max_{:n} ≤ 0, n ∈ Γ_0 \ Γ},   ∀ Γ ∈ P,

and hence

  x ∈ ∩_{Γ∈P} S^G_Γ.     (2.27)
It is easy to see that ∪_{Γ∈P} S^G_Γ ⊆ S^G, which by (2.27) and (2.22) yields

Since ∇_{V_{Γ_0}} G(x) has full column rank, so does ∇_{V_{Γ_0\Γ}} G(x). Then for any Γ ∈ P, the Bouligand tangent cone of S^G_Γ at x is

  T_{S^G_Γ}(x) = {d ∈ R^K : d^⊤ ∇_{V_{Γ_0\Γ}} G(x) ≤ 0}.     (2.29)

If ‖G(x)‖_0^+ = s, then by (2.25),

  T_{S^G}(x) = {d ∈ R^K : d^⊤ ∇_{V_{Γ_0}} G(x) ≤ 0}.

If |Γ_0| ≤ s − |Γ_+|, then N̂_{S^G}(x) = {0}. If |Γ_0| > s − |Γ_+|, then |Γ_0| ≥ 2 due to |Γ_+| = ‖G(x)‖_0^+ < s. Let

The above condition and the full column rank of ∇_{V_{Γ_0}} G(x) yield that,

for each ℓ. Now taking ℓ = 2, 3, ..., |Γ_0| enables us to show that W^1_{V_{Γ_0\Γ_1}} = 0, which by (2.31) proves u = 0, as desired.
Remark 2.1. In particular, the Bouligand tangent cone T_S(Z) and the corresponding Fréchet normal cone N̂_S(Z) at Z ∈ S are

  T_S(Z) = ∪_{Γ∈P(Γ_0, s−|Γ_+|)} {W ∈ R^{M×N} : W_{V_{Γ_0\Γ}} ≤ 0},     (2.32)

  N̂_S(Z) = {W ∈ R^{M×N} : W_{V_{Γ_0}} ≥ 0, W_{V̄_{Γ_0}} = 0}  if ‖Z‖_0^+ = s,   and   N̂_S(Z) = {0}  if ‖Z‖_0^+ < s,     (2.33)

where V̄_{Γ_0} = (M × N) \ V_{Γ_0}. It should be noted that if ∇_{V_{Γ_0}} G(x) has full column rank, then together with (2.26) and (2.33), the Fréchet normal cone of S^G at x ∈ S^G can be written as

  N̂_{S^G}(x) = {∇G(x) ◦ W : W ∈ N̂_S(Z), Z = G(x)}.     (2.34)
We end this section with an example illustrating the tangent and normal cones of S = {Z ∈ R^{2×4} : ‖Z‖_0^+ ≤ 2} at

  Z = [2  2  0 −1; 0 −1 −2 −3],   Z′ = [2  0  0 −1; 0 −1 −2 −3],   M = 2, N = 4.     (2.35)

  N̂_S(Z′) = {0}.
3 Optimality Analysis
In this section, we aim at establishing first-order necessary or sufficient optimality conditions for (SCO). Hereafter, we always assume its feasible set is non-empty if no additional information is provided and let

  Z^* = G(x^*).     (3.1)

Similar to (2.24), we always denote Γ^*_+, Γ^*_0, Γ^*_−, V_{Γ^*} for Z^* = G(x^*) and

  J_* := V_{Γ^*_0} = {(m, n) : Z^*_{mn} = 0, ∀ n ∈ Γ^*_0}.     (3.2)
We first build the relation between a KKT point and a local minimizer.
Theorem 3.1 (KKT points and local minimizers). The following relationships hold for (SCO).
a) A local minimizer x^* is a KKT point if ∇_{J_*} G(x^*) has full column rank. Furthermore, if ‖Z^*‖_0^+ < s, then W^* = 0 and ∇f(x^*) = 0.

b) A KKT point x^* is a local minimizer if the functions f and G_{mn} for all m ∈ M, n ∈ N are locally convex around x^*.

Proof. a) From [34, Theorem 6.12], a minimizer of problem (1.3) satisfies −∇f(x^*) ∈ N̂_{S^G}(x^*). Together with the expression of N̂_{S^G}(x^*) in (2.34), the results follow.
b) Let (x^*, W^*) be a KKT point satisfying (3.3). We prove the conclusion by considering two cases.

Case ‖Z^*‖_0^+ < s. Condition (3.3) and N̂_S(Z^*) = {0} from (2.33) imply W^* = 0 and ∇f(x^*) = 0, which by the local convexity of f around x^* yields

  Z_{mn} − Z^*_{mn} = G_{mn}(x) − G_{mn}(x^*) ≥ ⟨∇G_{mn}(x^*), x − x^*⟩.     (3.4)

• It follows from W^* ∈ N̂_S(Z^*) in (3.3), (3.2), and (2.33) that

  W^*_{J_*} ≥ 0,   W^*_{J̄_*} = 0,   ⟨W^*_{J_*}, Z^*_{J_*}⟩ = 0.     (3.5)

• Moreover,

  Γ_+ = {n ∈ N : Z^max_{:n} > 0} = {n ∈ N : (Z^*)^max_{:n} > 0} = Γ^*_+,

and

  Z_{J_*} ≤ 0.     (3.6)

Finally, these three facts and the convexity of f allow us to conclude that
3.2 τ-stationary points

Our next result is about the τ-stationary point of (SCO), defined as follows: a point x^* ∈ R^K is called a τ-stationary point of (SCO) for some τ > 0 if there is a W^* ∈ R^{M×N} such that

  ∇f(x^*) + ∇G(x^*) ◦ W^* = 0,
  Z^* ∈ Π_S(Z^* + τW^*).     (3.7)
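A candidate point can be screened against this definition numerically: the first condition in (3.7) is a plain residual check, and the inclusion Z^* ∈ Π_S(Z^* + τW^*) can be tested by comparing distances to Λ^* = Z^* + τW^*, since every element of the projection attains the minimal distance. The sketch below reuses the helpers zero_one_norm and project_S_one from the earlier sketches and takes user-supplied gradient callbacks; it is an illustration of the definition, not the algorithm of Section 4.

```python
import numpy as np

def is_tau_stationary(x, W, tau, s, grad_f, grad_G, G, tol=1e-8):
    """Numerically screen the two conditions in (3.7).

    grad_f(x): gradient of f, shape (K,).
    grad_G(x): all gradients of G_mn, shape (M, N, K).
    G(x):      the matrix G(x), shape (M, N).
    """
    # first condition: grad f(x) + sum_{m,n} W_mn * grad G_mn(x) = 0
    r1 = grad_f(x) + np.tensordot(W, grad_G(x), axes=([0, 1], [0, 1]))
    # second condition: Z in Pi_S(Z + tau*W), tested via the projection distance
    Z = G(x)
    Lam = Z + tau * W
    proj = project_S_one(Lam, s)                 # one minimizer of the distance
    feasible = zero_one_norm(Z) <= s
    same_dist = abs(np.linalg.norm(Z - Lam) - np.linalg.norm(proj - Lam)) <= tol
    return np.linalg.norm(r1) <= tol and feasible and same_dist
```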
The following result shows that a τ-stationary point also has a close relationship with the local minimizers of problem (SCO).
Theorem 3.2 (τ -stationary points and local minimizers). The following results hold for (SCO).
a) Suppose ∇_{J_*} G(x^*) has full column rank. A local minimizer x^* is also a τ-stationary point for any 0 < τ ≤ τ_*, where τ_* := (Z^*)^↓_s / r_* and

  r_* := max_{n∈Γ^*_0} { ‖W^*_{:n}‖ : ∇f(x^*) + ∇_{J_*} G(x^*) ◦ W^*_{J_*} = 0 }.     (3.8)

b) A τ-stationary point with τ > 0 is a local minimizer if the functions f and G_{mn} for all m ∈ M, n ∈ N are locally convex around x^*.
Proof. a) It follows from Theorem 3.1 that a local minimizer x^* is also a KKT point. Therefore, we have condition (3.3). So to prove the τ-stationarity in (3.7), we only need to show

  Z^* ∈ Π_S(Z^* + τW^*).     (3.9)

If ‖Z^*‖_0^+ < s, then (3.3) and N̂_S(Z^*) = {0} from (2.33) yield W^* = 0, resulting in (3.9) for any τ > 0. Now consider the case ‖Z^*‖_0^+ = s. Under such a case, conditions (3.5) hold, which by J_* = V_{Γ^*_0} ⊆ M × Γ^*_0 yields

  W^*_{:Γ̄^*_0} = 0,   0 ≥ Z^*_{:Γ^*_0} ⊥ W^*_{:Γ^*_0} ≥ 0.     (3.10)

Moreover, by (3.5),

  0 = ∇f(x^*) + ∇G(x^*) ◦ W^* = ∇f(x^*) + ∇_{J_*} G(x^*) ◦ W^*_{J_*},

and by (3.8),

  ∀ n ∈ Γ^*_0,   τ‖W^*_{:n}‖ ≤ τ max_{i∈Γ^*_0} ‖W^*_{:i}‖ ≤ τ_* r_* = (Z^*)^↓_s.

The above conditions and (3.10) show (3.9) by Proposition 2.2.
b) We only prove that a τ-stationary point is a KKT point, since Theorem 3.1 b) then yields the conclusion immediately. We note that a τ-stationary point satisfies (3.9), leading to ‖Z^*‖_0^+ ≤ s. Comparing (3.7) and (3.3), we only need to prove W^* ∈ N̂_S(Z^*). If ‖Z^*‖_0^+ < s, then Proposition 2.2 shows W^* = 0 ∈ N̂_S(Z^*) by (2.33). If ‖Z^*‖_0^+ = s, then Proposition 2.2 shows (3.10), which by the definition of V_{Γ^*_0} in (3.2) indicates

  W^*_{V_{Γ^*_0}} ≥ 0,   W^*_{V̄_{Γ^*_0}} = 0,

contributing to W^* ∈ N̂_S(Z^*) by (2.33).
4 Newton Method
In this section, we cast a Newton-type algorithm that aims to find a τ-stationary point of problem (SCO). Hereafter, for τ > 0, we always denote

  Z := G(x),   Λ := Z + τW,     (4.1)

where (x; y) = (x^⊤ y^⊤)^⊤ and vec(W) is the vector formed by stacking the columns of W. Similar definitions to (4.1) also apply to w^* := (x^*; vec(W^*)) and w^ℓ := (x^ℓ; vec(W^ℓ)), where the former is a τ-stationary point and the latter is the point generated by our proposed algorithm at the ℓth step. Moreover, we denote the system of equations
algorithm at the `th step. Moreover, we denote a system of equations as
∇f (x) + ∇J G(x) ◦ WJ
F(w; J ) := (4.2)
vec(ZJ )
vec(WJ )
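For a fixed index set J, (4.2) is simply a stacked residual. The sketch below assembles it with the same kind of gradient callbacks as before, reading J as a boolean mask over the entries of an M×N matrix; the ordering of the stacked blocks is an assumption made for illustration.

```python
import numpy as np

def F_residual(x, W, J_mask, grad_f, grad_G, G):
    """Stacked residual F(w; J) of (4.2).

    J_mask: boolean array of shape (M, N); True marks the entries in J.
    Blocks: grad f(x) + grad_J G(x) o W_J,  vec(Z_J),  vec(W on the complement of J).
    """
    WJ = np.where(J_mask, W, 0.0)                              # keep only entries in J
    top = grad_f(x) + np.tensordot(WJ, grad_G(x), axes=([0, 1], [0, 1]))
    Z = G(x)
    return np.concatenate([top, Z[J_mask], W[~J_mask]])
```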
To employ the Newton method, we need to convert a τ -stationary point satisfying (3.7) to a system
of equations, stated as the following theorem.
Theorem 4.1 (τ-stationary equations). A point x^* is a τ-stationary point with τ > 0 of (SCO) if and only if there is a W^* ∈ R^{M×N} such that

  T(Λ^*; s) ∋ Γ^*_0,   U_{Γ^*_0} = J_*,   F(w^*; J_*) = 0.     (4.5)

Proof. Necessity. By Proposition 2.2, a τ-stationary point satisfies

  W^*_{:Γ̄^*_0} = 0,   0 ≥ Z^*_{:Γ^*_0} ⊥ W^*_{:Γ^*_0} ≥ 0.     (4.7)

To show J_* = U_{Γ^*_0}, we only need to show that Z^*_{mn} = 0 ⇔ (Z^* + τW^*)_{mn} ≥ 0 for any n ∈ Γ^*_0. This is clearly true due to the second condition in (4.7). The conditions in (4.7) and (4.6) indicate W^*_{J_−} = 0, thereby resulting in

  W^*_{J̄_*} = W^*_{J_− ∪ (M×Γ̄^*_0)} = 0.

Moreover, by (3.7),

  0 = ∇f(x^*) + ∇G(x^*) ◦ W^* = ∇f(x^*) + ∇_{J_*} G(x^*) ◦ W^*_{J_*}.
Sufficiency. We aim to prove (3.7). Condition ∇f(x^*) + ∇G(x^*) ◦ W^* = 0 follows immediately from the first and third equations in (4.5). We next show Z^* ∈ Π_S(Λ^*) in (3.7). Condition J_* = U_{Γ^*_0} implies that

  Z^*_{J_*} = 0,   W^*_{J_*} ≥ 0.     (4.8)

We finally show that the left-hand side of (4.9) is Z^*. Since J_* ⊆ M × Γ^*_0, we have M × Γ̄^*_0 ⊆ J̄_*, thereby leading to W^*_{:Γ̄^*_0} = 0 from (4.5). Hence,

Condition (4.5) means that W^*_{J_−} = 0 due to J_− ⊆ J̄_*. As a result,

  Λ^*_{J_*} = (Z^* + τW^*)_{J_*} = τW^*_{J_*} ≥ 0,
  Λ^*_{J_−} = (Z^* + τW^*)_{J_−} = Z^*_{J_−} < 0,

by (4.6) and (4.8). Using the above conditions and (4.6) enables us to show that (Λ^*_{J_*})_− = Z^*_{J_*} and (Λ^*_{J_−})_− = Z^*_{J_−}, which, combined with J_− ∪ J_* = M × Γ^*_0 and conditions (4.9) and (4.10), proves Z^* ∈ Π_S(Λ^*).
We note that the equations F(w^*; J_*) = 0 in (4.5) involve an unknown set J_*. Therefore, to proceed with the Newton method, we have to find J_*, which will be adaptively updated using the current approximation of w^*. More precisely, let w^ℓ be the current point; we first select a T_ℓ ∈ T(Λ^ℓ; s), based on which we find the Newton direction d^ℓ ∈ R^{K+MN} by solving the following linear equations:

with ν ∈ (0, 1) and ρ > 0. The framework of our proposed method is presented in Algorithm 1.

i) One of the halting conditions makes use of ‖F(w^ℓ; U_{T_ℓ})‖. The reason is that if a point w^ℓ satisfies ‖F(w^ℓ; U_{T_ℓ})‖ = 0, then it is a τ-stationary point of (SCO) by Theorem 4.1.
ii) Recalling (4.5), we would ideally update d^ℓ by solving

instead of (4.11). However, the major concern is the existence of d^ℓ solving (4.14). To overcome this drawback, we add a smoothing term −µ_ℓ I_{|J|} to increase the likelihood that ∇F_{µ_ℓ}(w^ℓ; U_{T_ℓ}) is non-singular (a minimal illustration of this idea is sketched after this remark). Such an idea has been extensively used in the literature, e.g., [7, 43].
iii) When the algorithm obtains a direction d^ℓ, we use condition (4.13) to decide the step size. This condition allows the next point to be chosen in a somewhat larger region by setting γ > 0. However, it can also ensure that the next point does not step far away from the feasible region by setting a small value of γ (e.g., γ = 0.25). In this way, the algorithm performs relatively steadily. In addition, we will show in Theorem 4.2 that if the starting point is chosen close to a stationary point, condition (4.13) is always satisfied with π^{t_ℓ} = 1.
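The following minimal sketch illustrates the idea behind ii): when the Jacobian of F(·; U_{T_ℓ}) may be singular, a small shift makes the linear system for the Newton direction solvable. It assumes, purely for illustration, that the smoothed Jacobian is the plain Jacobian minus µ times the identity; the exact form used by SNSCO follows the definition of ∇F_{µ_ℓ}(w^ℓ; U_{T_ℓ}).

```python
import numpy as np

def smoothing_newton_direction(F_val, jac, mu):
    """Solve (J - mu * I) d = -F for the Newton direction.

    jac: Jacobian of F(.; U_T) at the current point (square array);
    mu:  small positive smoothing parameter guarding against singularity.
    """
    n = F_val.size
    return np.linalg.solve(jac - mu * np.eye(n), -F_val)

# Illustration on a deliberately singular Jacobian:
jac = np.array([[1.0, 1.0], [1.0, 1.0]])     # singular
F_val = np.array([1.0, -1.0])
print(smoothing_newton_direction(F_val, jac, mu=1e-2))
```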
To establish the locally quadratic convergence, we need the following assumptions for a given τ -
stationary point x∗ of (SCO).
Assumption 4.1. Let x^* be any τ-stationary point of (SCO). Suppose that ∇²f(·) and ∇²G(·) are locally Lipschitz continuous around x^*.
Assumption 4.2. Let x^* be any τ-stationary point of (SCO). Assume that the functions f and G are twice continuously differentiable on R^K, that ∇_{J_*} G(x^*) has full column rank, where J_* is given by (3.2), and that ∇²f(x^*) + ∇²_{J_*} G(x^*) ◦ W^*_{J_*} is positive definite, where W^*_{J_*} is uniquely determined by ∇_{J_*} G(x^*) ◦ W^*_{J_*} = −∇f(x^*).
We point out that the above two assumptions are related to the regularity conditions [33, 10] usually used to achieve convergence results for Newton-type methods. Moreover, establishing the quadratic convergence of the proposed smoothing Newton method, SNSCO, is not trivial because, differing from a standard system of equations, the τ-stationary equations (4.5) involve an unknown set U_{Γ^*_0}. If we knew this set in advance, there would not be much difficulty in establishing quadratic convergence. However, the set U_{T_ℓ} may change from one iteration to another. A different set leads to a different system of equations F(w; U_{T_ℓ}) = 0. Hence, in each step, the algorithm finds a Newton direction for a different system of equations instead of a fixed one. This is where the standard proof of quadratic convergence fails to apply to our case. Hence, we take a longer route to establish the locally quadratic convergence in the sequel.
Lemma 4.1. Let w^* be a τ-stationary point of (SCO) with 0 < τ < τ_*. Then there is an η_* > 0 such that for any w ∈ N(w^*, η_*) and any T ∈ T(Λ; s),

  F(w^*; U_T) = 0.     (4.15)
Proof. a) First of all, let Γ^*_+, Γ^*_−, and Γ^*_0 be defined for Z^*, while Γ_+, Γ_−, and Γ_0 are defined for Λ as

  Γ_+ = {n ∈ N : Λ^max_{:n} > 0},
  Γ_− = {n ∈ N : Λ^max_{:n} < 0},     (4.16)
  Γ_0 = {n ∈ N : Λ^max_{:n} = 0}.

Similar to (2.1), let Γ_s ⊆ Γ_+ collect s indices in Γ_+ that correspond to the first s largest elements in {‖Λ^+_{:n}‖ : n ∈ Γ_+}. Moreover, we define J_* as in (3.2) and J as
It follows from (4.5) in Theorem 4.1 that a τ-stationary point w^* satisfies Γ^*_0 ∈ T(Λ^*; s), U_{Γ^*_0} = J_*, and

Consider any w ∈ N(w^*, η_*) with a sufficiently small radius η_* > 0. For such a w, we define Z and Λ, which also define J = U_T as in (4.17). To show (4.15), we need to prove

  W^*_{J̄} = 0,   Z^*_J = 0,   ∇f(x^*) + ∇_J G(x^*) ◦ W^*_J = 0.     (4.19)

Then condition (4.18) immediately yields (4.19) due to (4.20) and

  ∇f(x^*) + ∇_J G(x^*) ◦ W^*_J = ∇f(x^*) + ∇G(x^*) ◦ W^* = 0.
Table 1:

  n ∈ Γ^*_0, (m, n) ∈ J_*:   Z^*_{mn} = 0,   W^*_{mn} ≥ 0,   Λ^*_{mn} ≥ 0;
  n ∈ Γ^*_0, (m, n) ∈ J_−:   Z^*_{mn} < 0,   W^*_{mn} = 0,   Λ^*_{mn} < 0;
  n ∈ Γ^*_+, m ∈ M:   (Z^*)^max_{:n} > 0,   W^*_{mn} = 0,   (Λ^*)^max_{:n} > 0;
  n ∈ Γ^*_−, m ∈ M:   (Z^*)^max_{:n} < 0,   W^*_{mn} = 0,   (Λ^*)^max_{:n} < 0.

In the table, we used the facts from Λ^* = Z^* + τW^*, definition (3.2), and Proposition 2.2 that

  W^*_{:Γ̄^*_0} = 0,   0 ≥ Z^*_{:Γ^*_0} ⊥ W^*_{:Γ^*_0} ≥ 0.
Since η_* > 0 can be set sufficiently small, w can be close to w^*, and so is Λ to Λ^*, which shows

  ∀ n ∈ Γ^*_+:   (Z^*)^max_{:n} = (Λ^*)^max_{:n} > 0  ⟹  Λ^max_{:n} > 0,
  ∀ n ∈ Γ^*_−:   (Z^*)^max_{:n} = (Λ^*)^max_{:n} < 0  ⟹  Λ^max_{:n} < 0.     (4.21)

The definition of T(Λ; s) in (2.1) implies T = Γ_0 ∪ (Γ_+ \ Γ_s) for any given T ∈ T(Λ; s). Condition (4.21) suffices to show Γ^*_+ ⊆ Γ_s ⊆ Γ_+ and Γ^*_− ⊆ Γ_−. Therefore, we must have T ⊆ Γ^*_0. Now combining this condition, Table 1, (4.17), (m_0, n_0) ∈ J = U_T, and (m_0, n_0) ∉ J_*, we can claim that (m_0, n_0) ∈ J_−, thereby resulting in Λ^*_{m_0 n_0} < 0. However, since Λ is close to Λ^*, we have Λ_{m_0 n_0} < 0, which contradicts (m_0, n_0) ∈ J in (4.17). So we prove J ⊆ J_*, the first condition in (4.20).

Finally, we prove W^*_{J̄} = 0. If ‖Z^*‖_0^+ < s, then W^* = 0 by Proposition 2.2 and the conclusion is clearly true. We focus on the case ‖Z^*‖_0^+ = s, which indicates |Γ^*_+| = ‖Z^*‖_0^+ = s. Again, by (4.21), we can derive
Lemma 4.2. If the Hessian ∇²ϕ(·) is locally Lipschitz continuous around w^*, then so are the gradient ∇ϕ(·) and the function ϕ(·).

Proof. Let L_ϕ be the Lipschitz constant of ∇²ϕ(·) around w^*. For any w ∈ N(w^*, δ_*), letting w^*_β := w^* + β(w − w^*) for β ∈ (0, 1), the mean value theorem yields

  ‖∇ϕ(w) − ∇ϕ(w^*)‖ = ‖∫_0^1 ∇²ϕ(w^*_β)(w − w^*) dβ‖
    ≤ ‖∫_0^1 [∇²ϕ(w^*_β) − ∇²ϕ(w^*)](w − w^*) dβ‖ + ‖∫_0^1 ∇²ϕ(w^*)(w − w^*) dβ‖
    ≤ L_ϕ‖w − w^*‖ ∫_0^1 ‖w^*_β − w^*‖ dβ + ‖∇²ϕ(w^*)‖‖w − w^*‖,

showing that the gradient ∇ϕ(·) is locally Lipschitz continuous around w^*. Similar reasoning allows us to check the local Lipschitz continuity of ϕ(·) around w^*.
Lemma 4.3. Under Assumption 4.1, for any w ∈ N(w^*, δ_*),

Proof. By Lemma 4.2, the local Lipschitz continuity of ∇²G(·) around x^* implies that ∇G(·) is also locally Lipschitz continuous around x^*. Let the Lipschitz constants of ∇²G(·), ∇G(·), and ∇²f(·) around x^* be L_2, L_1, and L_f, respectively. Then we have

and also

  max{‖x − x^*‖, ‖W − W^*‖_F} ≤ ‖w − w^*‖,

  ‖∇F(w; J) − ∇F(w^*; J)‖
Lemma 4.4. Let w^* be a τ-stationary point of problem (SCO) with 0 < τ < τ_*. If Assumptions 4.2 and 4.1 hold, then there always exist positive constants c_*, C_*, δ_*, and µ_* such that for any µ ∈ [0, µ_*], w ∈ N(w^*, δ_*), and T ∈ T_τ(Λ; s),

Proof. Consider any w ∈ N(w^*, δ_*) with a sufficiently small radius δ_* ∈ (0, η_*], where η_* is given in Lemma 4.1. Similarly, we define Γ_+, Γ_−, and Γ_0 for Λ as in (4.16) and J := U_T as in (4.17). Using the notation in (4.4), we have

  ∇F(w; J) = [ H(w; J)   0 ;   0   0 ],     (4.27)

where

  H(w; J) := [ ∇²f(x) + ∇²_J G(x) ◦ W_J    ∇_J G(x) ;   ∇_J G(x)^⊤    0 ].     (4.28)
Since w ∈ N(w^*, δ_*) for a sufficiently small δ_* ≤ η_*, it follows from (4.20) that J ⊆ J_* and W^*_{J̄} = 0. Therefore,

  H(w^*; J) = [ ∇²f(x^*) + ∇²_{J_*} G(x^*) ◦ W^*_{J_*}    ∇_J G(x^*) ;   ∇_J G(x^*)^⊤    0 ],     (4.29)

where we used the fact that ∇²_J G(x^*) ◦ W^*_J = ∇²_{J_*} G(x^*) ◦ W^*_{J_*} due to J ⊆ J_* and W^*_{J̄} = 0. Therefore, H(w^*; J) is a sub-matrix of H(w^*; J_*) owing to J ⊆ J_*. Recall the full column rank of ∇_{J_*} G(x^*) and the positive definiteness of ∇²f(x^*) + ∇²_{J_*} G(x^*) ◦ W^*_{J_*} in Assumption 4.2. We can conclude that H(w^*; J) is non-singular for any J ⊆ J_*, namely σ_min(H(w^*; J)) > 0. Then, since by [39, Theorem 1] the maximum singular value of a matrix is no less than the maximum singular value of its sub-matrices, we obtain

which contributes to

To show the lower bound on σ_min(∇F(w; J)), we need the following fact:

for any two matrices A and B, where the first inequality holds by [25, Remark (2), p. 76] and i_0 satisfies σ_{i_0}(B) = σ_min(B). Let µ_* := L_*δ_*. The above fact allows us to derive

  σ_min(∇F_µ(w; J)) ≥ σ_min(∇F(w; J)) − ‖∇F_µ(w; J) − ∇F(w; J)‖     (4.31)
Theorem 4.2 (Locally quadratic convergence). Let w^* be a τ-stationary point of problem (SCO) with 0 < τ < τ_* and let {w^ℓ} be the sequence generated by Algorithm 1. If Assumptions 4.2 and 4.1 hold, then there always exist positive constants c_*, C_*, ε_*, and µ_* ensuring the following results if we choose µ ∈ (0, µ_*], γ ≥ |Γ^*_0|/s, and w^0 ∈ N(w^*, ε_*).

  ℓ ≥ ⌈ log_2 √(3c_*³ C_*³ (L_* + ρC_*) ‖w^0 − w^*‖) − log_2(√tol) ⌉.     (4.32)
Proof. Let η_* be given in Lemma 4.1, let c_*, C_*, δ_*, µ_* be given in Lemma 4.4, and let L_* be defined in Lemma 4.3. For notational simplicity, for ℓ = 0, 1, 2, ..., let

  J_ℓ := U_{T_ℓ},     (4.33)

  ε_* ∈ ( 0, min{ δ_*, 1/(6c_*(L_* + ρC_*)), 1/(2ρC_*) } ],     (4.34)

a) We note that µ_0 ≤ µ ≤ µ_*. Then it follows from Lemma 4.1, Lemma 4.4, and w^0 ∈ N(w^*, ε_*) that for any T_0 ∈ T_τ(Λ^0; s),

  F(w^*; J_0) = 0,   ‖(∇F_{µ_0}(w^0; J_0))^{-1}‖ ≤ c_*,     (4.35)

by (4.15) and (4.26). Recalling (4.2), for a given J_0, F(·; J_0) is locally Lipschitz continuous around x^* by Lemma 4.2, due to the local Lipschitz continuity of ∇²f(·) and ∇²G(·). Then, by the first condition in (4.35), ‖d^0‖ is close to zero for a sufficiently small ε_*. This allows us to derive the following condition

(We emphasize that we can find a strictly positive bound ε > 0 such that the above conditions can be achieved for any ε_* ∈ (0, ε], due to the local Lipschitz continuity of G(·) around x^*. Therefore, such an ε_* can be bounded away from zero.) Based on the above conditions, we can obtain

Overall, we have shown that (4.11) is solvable and that (4.13) is satisfied with π^{t_ℓ} = 1. Hence, the full Newton step is taken, namely,

  w^1 = w^0 + d^0.     (4.37)

In addition, using the fact that µ_0 ≤ ρ‖F(w^0; J_0)‖, we have the following chain of inequalities:
This enables us to obtain

  ‖∇F_{µ_0}(w^0; J_0) − ∇F(w^0_β; J_0)‖ ≤ L_*(1 + β)‖w^0 − w^*‖ + ρC_*‖w^1 − w^0‖,     (4.38)

where (4.37) is used, due to both w^0 and w^0_β ∈ N(w^*, δ_*). Note that for a fixed J_0, the function F(·; J_0) is differentiable. So we have the following mean value expression

  F(w^0; J_0) = F(w^*; J_0) + ∫_0^1 ∇F(w^0_β; J_0)(w^0 − w^*) dβ = ∫_0^1 ∇F(w^0_β; J_0)(w^0 − w^*) dβ,     (4.39)

where the last equality is by (4.35). Consequently,

  ‖w^1 − w^*‖ = ‖w^0 + d^0 − w^*‖                                          (by (4.37))
    = ‖w^0 − w^* − (∇F_{µ_0}(w^0; J_0))^{-1} F(w^0; J_0)‖                   (by (4.36))
    ≤ c_* ‖∇F_{µ_0}(w^0; J_0)(w^0 − w^*) − F(w^0; J_0)‖                     (by (4.35))
    = c_* ‖∇F_{µ_0}(w^0; J_0)(w^0 − w^*) − ∫_0^1 ∇F(w^0_β; J_0)(w^0 − w^*) dβ‖   (by (4.39))
    ≤ c_* ∫_0^1 ‖∇F_{µ_0}(w^0; J_0) − ∇F(w^0_β; J_0)‖ · ‖w^0 − w^*‖ dβ
    ≤ c_* ∫_0^1 [ (L_*(1 + β) + ρC_*)‖w^0 − w^*‖ + ρC_*‖w^1 − w^*‖ ] ‖w^0 − w^*‖ dβ,   (by (4.38))

where the last inequality is from ρC_*‖w^0 − w^*‖ ≤ ρC_*ε_* ≤ 1/2 by (4.34). The above condition immediately results in

  ‖w^1 − w^*‖ ≤ 3c_*(L_* + ρC_*)‖w^0 − w^*‖² ≤ (1/2)‖w^0 − w^*‖ < ε_*,

where (4.34) is used.
This means w^1 ∈ N(w^*, ε_*). Replacing J_0 by J_1, the same reasoning allows us to show that for ℓ = 1, (i) (4.11) is solvable; (ii) the full Newton update is admitted; and (iii) ‖w^2 − w^*‖ ≤ 3c_*(L_* + ρC_*)‖w^1 − w^*‖² ≤ (1/2)‖w^1 − w^*‖. By induction, we can conclude that for any ℓ,

• w^ℓ ∈ N(w^*, ε_*);

• (4.11) is solvable;

• ‖w^{ℓ+1} − w^*‖ ≤ 3c_*(L_* + ρC_*)‖w^ℓ − w^*‖² ≤ (1/2)‖w^ℓ − w^*‖, where (4.34) is used.
By ∇F(w; J) = ∇F_0(w; J), w^ℓ_β ∈ N(w^*, ε_*), and ε_* ≤ δ_*, Lemma 4.4 allows us to derive

Again, for a fixed J_ℓ, the function F(·; J_ℓ) is differentiable. The mean value theorem indicates that there exists a β_0 ∈ (0, 1) satisfying

Therefore, one can easily verify that if (4.32) is satisfied, then the right-hand side of the above inequality is smaller than tol, namely ‖F(w^ℓ; J_ℓ)‖ < tol, showing the desired result.
Remark 4.2. Regarding the assumptions in Theorem 4.2, µ and γ can be set easily in the numerical experiments. For example, we could set a small value for µ (e.g., 10^{-4}) and let γ ∈ (0, 1), as |Γ^*_0| is usually quite small in the numerical experiments. Therefore, to achieve the quadratic convergence rate, it remains to choose a proper starting point w^0, which is apparently impractical since the final stationary point is unknown in advance.

Nevertheless, in our numerical experiments, we find that the quadratic convergence rate is always observed when solving some problems from different starting points, which indicates that the proposed algorithm does not seem to rely heavily on the initial points. For example, Fig. 1 presents the results of SNSCO solving the norm optimization problems (see Section 5.1) under three different starting points. From the left to the right figure, the three starting points are the point with each entry being 0, the point with each entry randomly generated from [0, 0.5], and the point with each entry being 0.5. It can be clearly seen that the error ‖F(w^ℓ, U_{T_ℓ})‖ declines dramatically once the iteration count exceeds a certain threshold.
Figure 1: Error ‖F(w^ℓ, U_{T_ℓ})‖ versus iteration for three different starting points.
Figure 2: Effect of τ .
5 Numerical Experiments
In this section, we conduct numerical experiments with SNSCO using MATLAB (R2020a) on a laptop with 32GB of memory and an Intel Core i9 CPU.
The starting point is always initialized as w^0 = 0. The parameters are set as follows: maxIt = 2000, tol = 10^{-9}·KMN, ρ = 10^{-2}, µ = 10^{-2}, γ = 0.25, ν = 0.999, and π = 0.25 if no additional information is provided. In addition, we always set s = ⌈αN⌉, where α is chosen from {0.01, 0.05, 0.1}.
We point out that our main theory about the τ-stationary point involves an important parameter τ. We therefore test SNSCO on the norm optimization problem described below under different choices of τ ∈ [10^{-4}, 1]. The lines presented in Fig. 2 indicate that the results are quite robust to τ. Therefore, in the following numerical experiments, we fix τ = 0.75.
We use the norm optimization problem described in [17, 1] to demonstrate the performance of SNSCO. The problem takes the form of

  min_{x∈R^K} −‖x‖_1,   s.t.   P{ Σ_{k=1}^{K} ξ²_{mk} x²_k ≤ b, m ∈ M } ≥ 1 − α,   x ≥ 0,     (5.1)

where ‖x‖_1 is the 1-norm of x. To fit this problem into our model (SCO) without additional constraints, we instead address the following penalized problem,

where λ_2 and λ_1 are two positive penalty parameters. Here, ‖x_−‖_1 is used to exactly penalize the constraint x ≥ 0. We add ‖x‖² to the objective function to guarantee the non-singularity of the coefficient matrix in (4.11) when updating the Newton direction. In the following numerical experiments, we
fix λ_2 = 0.5 and λ_1 = 0.5 for simplicity. It is known from [17] that if the ξ_{mk}, m ∈ M, k ∈ K, are independent and identically distributed standard normal random variables, then the optimal solution x̄ to problem (5.1) is

  x̄_1 = ··· = x̄_K = √( b / F^{-1}_{χ²_K}( (1 − α)^{1/M} ) ),     (5.3)

where F^{-1}_{χ²_K}(·) stands for the inverse distribution function of a chi-square distribution with K degrees of freedom.
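The closed-form solution (5.3) is straightforward to evaluate numerically; the sketch below computes it with SciPy's chi-square quantile function (assuming SciPy is available), so the benchmark objective −‖x̄‖_1 used for comparison in this section can be reproduced for any (K, M, b, α).

```python
import numpy as np
from scipy.stats import chi2

def norm_opt_solution(K, M, b, alpha):
    """Closed-form optimal solution (5.3) of the norm optimization problem (5.1)
    when the xi_mk are i.i.d. standard normal."""
    q = chi2.ppf((1.0 - alpha) ** (1.0 / M), df=K)   # inverse chi-square CDF, K dof
    x_bar = np.sqrt(b / q) * np.ones(K)
    return x_bar, -np.sum(np.abs(x_bar))             # solution and objective -||x||_1

x_bar, obj = norm_opt_solution(K=10, M=1, b=10.0, alpha=0.05)
print(obj)
```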
We first focus on the scenarios where model (5.2) has a single chance constraint, namely, M = 1.
We will compare SNSCO with two algorithms proposed in [1]: regularized algorithm (RegAlg) with a
convex start and relaxed algorithm (RelAlg). Both algorithms are adopted to solve problem (5.2).
We choose α ∈ {0.01, 0.05, 0.10}. For each fixed (α, K, N ), we run 100 trials and report the average
results of the objective function values and computational time in seconds.
a) Effect of K. To see this, we fix N = 100 and choose K ∈ {10, 20, ..., 50}. The average results are presented in Fig. 3, where we display the computational time in the log domain to make the differences evident. For the objective function values, SNSCO and RegAlg obtain similar ones, which are slightly better than those of RelAlg. However, SNSCO runs the fastest, taking less than e^{-3} ≈ 0.05 seconds. The slowest solver is RegAlg, which is not surprising since it often has to restart the method when the point does not satisfy the optimality conditions.
Figure 3: Effect of K.
b) Effect of N. To see this, we fix K = 10 and select N ∈ {100, 150, ..., 300}. We emphasize that SNSCO is able to solve instances with much larger N. However, we do not test on instances with larger N because it would take RegAlg a long time to solve the problem. The average results are given in Fig. 4. Once again, SNSCO produces objective function values similar to those of RegAlg and runs much faster than the other two algorithms. The advantage in computational speed becomes more pronounced as N grows. For example, when N = 300 and α = 0.1, RegAlg consumes more than e^6 ≈ 400 seconds while SNSCO only takes e^{-5} ≈ 0.007 seconds.
For solving problem (5.1) with joint constraints, we fix N = 100 while choosing K and M from {10, 20, ..., 50}. Average results over 100 trials are reported in Table 2, where we fix M = 10, and Table 3, where we fix K = 10. Since for problem (5.1) the optimal solution x̄ is known from (5.3), we compare the x generated by SNSCO with x̄ by calculating the objective function value of problem (5.1). It can be clearly seen that −‖x‖_1 is close to −‖x̄‖_1 but usually larger. This is because SNSCO solves (5.2), a relaxation of problem (5.1).
Figure 4: Effect of N .
Table 2: Effect of K.
Table 3: Effect of M.
6 Conclusion
The 0/1 loss function ideally characterizes the constraint of SAA. However, due to its discontinuous nature, it has impeded the development of numerical algorithms for solving SAA for a long time. In this paper, we managed to address a general 0/1 constrained optimization problem that includes SAA
as a special case. One key factor in this success was the derivation of the normal cone of the feasible set. Another crucial factor was the establishment of the τ-stationary equations, a type of optimality condition that allows us to exploit a smoothing Newton type method. We feel that these results could be extended to a more general setting where equality or inequality constraints are included in model (SCO), which deserves future investigation as the general model has more practical applications [17, 9].
References
[1] Adam, L., Branda, M.: Nonlinear chance constrained problems: optimality conditions, regu-
larization and solvers. J. Optim. Theory. Appl. 170(2), 419–436 (2016)
[2] Ahmed, S., Shapiro, A.: Solving chance-constrained stochastic programs via sampling and
integer programming. In: State-of-the-art decision-making tools in the information-intensive
age, pp. 261–269. Informs (2008)
[3] Ban, L., Mordukhovich, B.S., Song, W.: Lipschitzian stability of parametric variational in-
equalities over generalized polyhedra in Banach spaces. Nonlinear Anal. Theory Methods Appl.
74(2), 441–461 (2011)
[4] Beraldi, P., Bruni, M.E.: An exact approach for solving integer problems under probabilistic
constraints with random technology matrix. Ann. Oper. Res. 177(1), 127–137 (2010)
[5] Boufounos, P.T., Baraniuk, R.G.: 1-bit compressive sensing. In: 2008 42nd Annual Conference
on Information Sciences and Systems, pp. 16–21. IEEE (2008)
[6] Charnes, A., Cooper, W.W., Symonds, G.H.: Cost horizons and certainty equivalents: an
approach to stochastic programming of heating oil. Manage Sci. 4(3), 235–263 (1958)
[7] Chen, X., Qi, L., Sun, D.: Global and superlinear convergence of the smoothing newton method
and its application to general box constrained variational inequalities. Math. Comput. 67(222),
519–540 (1998)
[8] Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
[9] Curtis, F.E., Wachter, A., Zavala, V.M.: A sequential algorithm for solving nonlinear opti-
mization problems with chance constraints. SIAM J. Optim. 28(1), 930–958 (2018)
[10] Dontchev, A.L., Rockafellar, R.T.: Newton’s method for generalized equations: a sequential
implicit function theorem. Math. Program. 123(1), 139–159 (2010)
[11] Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines.
Adv. Comput. Math. 13(1), 1 (2000)
[12] Friedman, J.H.: On bias, variance, 0/1 loss, and the curse-of-dimensionality. Data Min. Knowl.
Discov. 1(1), 55–77 (1997)
[13] Geletu, A., Hoffmann, A., Kloppel, M., Li, P.: An inner-outer approximation approach to
chance constrained optimization. SIAM J. Optim. 27(3), 1834–1857 (2017)
[14] Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning: data mining,
inference, and prediction. Springer Science & Business Media (2009)
[15] Henrion, R.: Structural properties of linear probabilistic constraints. Optim. 56(4), 425–440
(2007)
[16] Henrion, R., Möller, A.: Optimization of a continuous distillation process under random inflow
rate. Comput. Math. with Appl. 45(1-3), 247–262 (2003)
[17] Hong, L.J., Yang, Y., Zhang, L.: Sequential convex approximations to joint chance constrained
programs: A Monte Carlo approach. Oper. Res. 59(3), 617–630 (2011)
[18] Kataoka, S.: A stochastic programming model. J. Econom. pp. 181–196 (1963)
[19] Lagoa, C.M., Li, X., Sznaier, M.: Probabilistically constrained linear programs and risk-
adjusted controller design. SIAM J. Optim. 15(3), 938–951 (2005)
[20] Lejeune, M.A., Ruszczyński, A.: An efficient trajectory method for probabilistic production-
inventory-distribution problems. Oper. Res. 55(2), 378–394 (2007)
[21] Li, L., Lin, H.T.: Optimizing 0/1 loss for perceptrons by random coordinate descent. In: 2007
International Joint Conference on Neural Networks, pp. 749–754. IEEE (2007)
[22] Luedtke, J.: A branch-and-cut decomposition algorithm for solving chance-constrained math-
ematical programs with finite support. Math. Program. 146(1), 219–244 (2014)
[23] Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic
constraints. SIAM J. Optim. 19(2), 674–699 (2008)
[24] Luedtke, J., Ahmed, S., Nemhauser, G.L.: An integer programming approach for linear pro-
grams with probabilistic constraints. Math. Program. 122(2), 247–272 (2010)
[26] Miller, B.L., Wagner, H.M.: Chance constrained programming with joint constraints. Oper.
Res. 13(6), 930–945 (1965)
[27] Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM
J. Optim. 17(4), 969–996 (2007)
[28] Osuna, E., Girosi, F.: Reducing the run-time complexity of support vector machines. In:
International Conference on Pattern Recognition (submitted) (1998)
[29] Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Sample average approximation method for chance
constrained programming: theory and applications. J. Optim. Theory. Appl. 142(2), 399–416
(2009)
[30] Peña-Ordieres, A., Luedtke, J.R., Wächter, A.: Solving chance-constrained problems via a
smooth sample-based nonlinear approximation. SIAM J. Optim. 30(3), 2221–2250 (2020)
[31] Prekopa, A.: Contributions to the theory of stochastic programming. Math. Program. 4(1),
202–221 (1973)
[32] Prékopa, A.: Stochastic programming, vol. 324. Springer Science & Business Media (2013)
[33] Robinson, S.M.: Strongly regular generalized equations. Math. Oper. Res. 5(1), 43–62 (1980)
[34] Rockafellar, R.T., Wets, R.J.B.: Variational analysis, vol. 317. Springer Science & Business
Media (2009)
[35] Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on stochastic programming: modeling
and theory. SIAM (2021)
[36] Sun, H., Xu, H., Wang, Y.: Asymptotic analysis of sample average approximation for stochastic
optimization problems with joint chance constraints via conditional value at risk and difference
of convex functions. J. Optim. Theory. Appl. 161(1), 257–284 (2014)
[37] Sun, H., Zhang, D., Chen, Y.: Convergence analysis and a DC approximation method for data-driven mathematical programs with distributionally robust chance constraints. http://www.optimization-online.org/DB_HTML/2019/11/7465.html (2019)
[38] Takyi, A.K., Lence, B.J.: Surface water quality management using a multiple-realization
chance constraint method. Water Resour. Res. 35(5), 1657–1670 (1999)
[39] Thompson, R.C.: Principal submatrices ix: Interlacing inequalities for singular values of sub-
matrices. Linear Algebra Appl. 5(1), 1–12 (1972)
[40] Wang, H., Shao, Y., Zhou, S., Zhang, C., Xiu, N.: Support vector machine classifier via L_{0/1} soft-margin loss. IEEE Trans. Pattern Anal. Mach. Intell. (2021)
[42] Zhou, S., Luo, Z., Xiu, N., Li, G.Y.: Computing one-bit compressive sensing via double-
sparsity constrained optimization. IEEE Trans. Signal Process 70, 1593–1608 (2022)
[43] Zhou, S., Pan, L., Xiu, N., Qi, H.D.: Quadratic convergence of smoothing newton’s method
for 0/1 loss optimization. SIAM J. Optim. 31(4), 3184–3211 (2021)