
Subgradients

Ryan Tibshirani
Convex Optimization 10-725
Last time: gradient descent
Consider the problem

min_x f(x)

for f convex and differentiable, dom(f) = R^n. Gradient descent:


choose initial x^(0) ∈ R^n, repeat

x^(k) = x^(k−1) − tk · ∇f(x^(k−1)), k = 1, 2, 3, . . .

Step sizes tk chosen to be fixed and small, or by backtracking line
search

If ∇f is Lipschitz, gradient descent has convergence rate O(1/ε), i.e., it needs O(1/ε) iterations to reach ε-suboptimality.
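For concreteness, a minimal sketch of this iteration in Python with a fixed step size (the names grad_f, x0, and the stopping rule are illustrative assumptions, not part of the slides):

import numpy as np

def gradient_descent(grad_f, x0, t=0.1, max_iter=1000, tol=1e-8):
    # repeat x^(k) = x^(k-1) - t * grad f(x^(k-1)) with a fixed step size t
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:      # stop once the gradient is (nearly) zero
            break
        x = x - t * g
    return x

# example: minimize f(x) = (1/2)||x||_2^2, whose gradient is x
x_min = gradient_descent(lambda x: x, x0=[3.0, -4.0], t=0.5)    # converges to [0, 0]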


Downsides:
• Requires f differentiable
• Can be slow to converge

2
Outline

Today: crucial mathematical underpinnings!


• Subgradients
• Examples
• Properties
• Optimality characterizations

3
Subgradients

Recall that for convex and differentiable f ,

f(y) ≥ f(x) + ∇f(x)^T (y − x) for all x, y

That is, linear approximation always underestimates f

A subgradient of a convex function f at x is any g ∈ R^n such that

f(y) ≥ f(x) + g^T (y − x) for all y

• Always exists¹
• If f differentiable at x, then g = ∇f (x) uniquely
• Same definition works for nonconvex f (however, subgradients
need not exist)

¹ On the relative interior of dom(f)
4
Examples of subgradients

Consider f : R → R, f (x) = |x|

[Plot: f(x) = |x| over x ∈ [−2, 2]]

• For x ≠ 0, unique subgradient g = sign(x)
• For x = 0, subgradient g is any element of [−1, 1]
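A quick numerical check of the subgradient inequality for f(x) = |x| at x = 0 (a sketch; the grid of test points and the choices of g are arbitrary):

import numpy as np

f = np.abs                        # f(x) = |x|
ys = np.linspace(-2.0, 2.0, 401)  # test points

# any g in [-1, 1] satisfies f(y) >= f(0) + g*(y - 0) for all y
for g in (-1.0, -0.3, 0.0, 0.7, 1.0):
    assert np.all(f(ys) >= f(0.0) + g * ys)

# a slope outside [-1, 1] violates the inequality somewhere
assert np.any(f(ys) < f(0.0) + 1.5 * ys)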

5
Consider f : R^n → R, f(x) = ‖x‖_2

[Plot: the surface f(x) = ‖x‖_2 over (x1, x2)]

• For x ≠ 0, unique subgradient g = x/‖x‖_2
• For x = 0, subgradient g is any element of {z : ‖z‖_2 ≤ 1}
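A small sketch of these two cases in code (the tolerance used to detect x = 0 and the random choice at 0 are arbitrary):

import numpy as np

def subgrad_l2_norm(x, rng=np.random.default_rng(0)):
    # returns one subgradient of f(x) = ||x||_2
    nrm = np.linalg.norm(x)
    if nrm > 1e-12:
        return x / nrm                    # unique subgradient away from 0
    z = rng.standard_normal(len(x))       # at 0: any point of the unit ball works
    return z / max(np.linalg.norm(z), 1.0)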

6
Consider f : R^n → R, f(x) = ‖x‖_1

[Plot: the surface f(x) = ‖x‖_1 over (x1, x2)]

• For xi ≠ 0, unique ith component gi = sign(xi)
• For xi = 0, ith component gi is any element of [−1, 1]
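A componentwise sketch in code; choosing 0 for the zero components is just one valid option in [−1, 1]:

import numpy as np

def subgrad_l1_norm(x):
    # one subgradient of f(x) = ||x||_1: sign(xi) where xi != 0,
    # and 0 (an arbitrary element of [-1, 1]) where xi == 0
    return np.sign(x)

g = subgrad_l1_norm(np.array([1.5, 0.0, -2.0]))   # -> [ 1.,  0., -1.]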

7
Consider f(x) = max{f1(x), f2(x)}, for f1, f2 : R^n → R convex,
differentiable

[Plot: f(x) = max{f1(x), f2(x)} in one variable, with f1 and f2 crossing]

• For f1(x) > f2(x), unique subgradient g = ∇f1(x)
• For f2(x) > f1(x), unique subgradient g = ∇f2(x)
• For f1(x) = f2(x), subgradient g is any point on line segment
  between ∇f1(x) and ∇f2(x)
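A sketch for two concrete smooth functions (f1, f2, and the tie-breaking weight t are illustrative choices, not from the slides):

import numpy as np

a = np.array([1.0, -2.0])
f1,  f2  = (lambda x: x @ x), (lambda x: a @ x + 1.0)     # f1(x) = ||x||_2^2, f2 affine
df1, df2 = (lambda x: 2.0 * x), (lambda x: a)

def subgrad_max(x, t=0.5):
    # subgradient of f(x) = max{f1(x), f2(x)}
    if f1(x) > f2(x):
        return df1(x)
    if f2(x) > f1(x):
        return df2(x)
    return t * df1(x) + (1.0 - t) * df2(x)    # any point on the segment at a tie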

8
Subdifferential

Set of all subgradients of convex f is called the subdifferential:

∂f(x) = {g ∈ R^n : g is a subgradient of f at x}

• Nonempty (only for convex f)
• ∂f(x) is closed and convex (even for nonconvex f)
• If f is differentiable at x, then ∂f(x) = {∇f(x)}
• If ∂f(x) = {g}, then f is differentiable at x and ∇f(x) = g

9
Connection to convex geometry

Convex set C ⊆ R^n, consider indicator function I_C : R^n → R,

I_C(x) = I{x ∈ C} = 0 if x ∈ C, and ∞ if x ∉ C

For x ∈ C, ∂I_C(x) = N_C(x), the normal cone of C at x; recall

N_C(x) = {g ∈ R^n : g^T x ≥ g^T y for any y ∈ C}

Why? By definition of subgradient g,

I_C(y) ≥ I_C(x) + g^T (y − x) for all y


• For y ∉ C, I_C(y) = ∞, so the inequality holds trivially
• For y ∈ C, this means 0 ≥ g^T (y − x), i.e., g^T x ≥ g^T y
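A numerical sketch of this for C the unit ℓ2 ball (a hypothetical choice of C): at a boundary point x, any nonnegative multiple of x lies in N_C(x), which we check against sampled members of C:

import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.6, 0.8])      # boundary point of C = {z : ||z||_2 <= 1}
g = 3.0 * x                   # candidate normal vector: a nonnegative multiple of x

# sample many y in C and verify g^T x >= g^T y
ys = rng.standard_normal((1000, 2))
ys /= np.maximum(np.linalg.norm(ys, axis=1, keepdims=True), 1.0)   # map into the ball
assert np.all(g @ x >= ys @ g - 1e-12)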

10

[Figure: normal cones at points on the boundary of a convex set]

11
Subgradient calculus

Basic rules for convex functions:


• Scaling: ∂(af ) = a · ∂f provided a > 0
• Addition: ∂(f1 + f2 ) = ∂f1 + ∂f2
• Affine composition: if g(x) = f(Ax + b), then

  ∂g(x) = A^T ∂f(Ax + b)

  (see the code sketch after the norm rule below)

• Finite pointwise maximum: if f(x) = max_{i=1,...,m} fi(x), then

  ∂f(x) = conv( ∪_{i : fi(x) = f(x)} ∂fi(x) )

  the convex hull of the union of subdifferentials of active functions at x

12
• General composition: if

  f(x) = h(g(x)) = h(g1(x), . . . , gk(x))

  where g : R^n → R^k, h : R^k → R, f : R^n → R, h is convex
  and nondecreasing in each argument, g is convex, then

  ∂f(x) ⊆ {p1 q1 + · · · + pk qk : p ∈ ∂h(g(x)), qi ∈ ∂gi(x), i = 1, . . . , k}

• General pointwise maximum: if f(x) = max_{s∈S} fs(x), then

  ∂f(x) ⊇ cl( conv( ∪_{s : fs(x) = f(x)} ∂fs(x) ) )

  Under some regularity conditions (on S, fs), we get equality

13
• Norms: important special case. To each norm ‖·‖, there is a
  dual norm ‖·‖_∗ such that

  ‖x‖ = max_{‖z‖_∗ ≤ 1} z^T x

  (For example, ‖·‖_p and ‖·‖_q are dual when 1/p + 1/q = 1.)

  In fact, for f(x) = ‖x‖ (and fz(x) = z^T x), we get equality:

  ∂f(x) = cl( conv( ∪_{z : fz(x) = f(x)} ∂fz(x) ) )

  Note that ∂fz(x) = {z}. And if z1, z2 each achieve the max at
  x, which means that z1^T x = z2^T x = ‖x‖, then by linearity, so
  will tz1 + (1 − t)z2 for any t ∈ [0, 1]. Thus

  ∂f(x) = argmax_{‖z‖_∗ ≤ 1} z^T x
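As a concrete use of these rules, here is a sketch that combines the affine composition rule above with the ℓ1 subdifferential to produce one subgradient of f(x) = ‖Ax + b‖_1, then verifies the subgradient inequality at random points (A, b, and the test data are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
f = lambda x: np.abs(A @ x + b).sum()     # f(x) = ||Ax + b||_1

def subgrad(x):
    # affine composition: A^T v is a subgradient of f at x
    # whenever v is a subgradient of ||.||_1 at Ax + b
    v = np.sign(A @ x + b)                # one valid choice of v
    return A.T @ v

x = rng.standard_normal(3)
g = subgrad(x)
for _ in range(100):
    y = rng.standard_normal(3)
    assert f(y) >= f(x) + g @ (y - x) - 1e-9   # f(y) >= f(x) + g^T (y - x)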

14
Optimality condition

For any f (convex or not),

f(x⋆) = min_x f(x) ⇐⇒ 0 ∈ ∂f(x⋆)

That is, x⋆ is a minimizer if and only if 0 is a subgradient of f at
x⋆. This is called the subgradient optimality condition

Why? Easy: g = 0 being a subgradient means that for all y

f(y) ≥ f(x⋆) + 0^T (y − x⋆) = f(x⋆)

Note the implication for a convex and differentiable function f,
with ∂f(x) = {∇f(x)}: the condition reduces to ∇f(x⋆) = 0

15
Derivation of first-order optimality

Example of the power of subgradients: we can use what we have
learned so far to derive the first-order optimality condition. Recall

min_x f(x) subject to x ∈ C

is solved at x, for f convex and differentiable, if and only if

∇f(x)^T (y − x) ≥ 0 for all y ∈ C

Intuitively: this says that f is nondecreasing, to first order, as we
move from x toward any other point of C

How to prove it? First recast the problem as

min_x f(x) + I_C(x)

Now apply subgradient optimality: 0 ∈ ∂(f(x) + I_C(x))

16
Observe

0 ∈ ∂(f(x) + I_C(x))
⇐⇒ 0 ∈ {∇f(x)} + N_C(x)
⇐⇒ −∇f(x) ∈ N_C(x)
⇐⇒ −∇f(x)^T x ≥ −∇f(x)^T y for all y ∈ C
⇐⇒ ∇f(x)^T (y − x) ≥ 0 for all y ∈ C

as desired

Note: the condition 0 ∈ ∂f(x) + N_C(x) is a fully general condition
for optimality in convex problems. But it's not always easy to work
with (KKT conditions, later, are easier)
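A small numerical sketch of this condition: minimize f(x) = (1/2)‖x − c‖_2^2 over the unit ℓ2 ball, whose constrained minimizer is the projection of c onto the ball, and check ∇f(x⋆)^T (y − x⋆) ≥ 0 on sampled feasible points (the data c is an arbitrary illustration):

import numpy as np

rng = np.random.default_rng(0)
c = np.array([2.0, 1.0])                   # f(x) = (1/2)||x - c||_2^2, and ||c||_2 > 1
grad_f = lambda x: x - c

# C = unit l2 ball; the constrained minimizer is the projection of c onto C
x_star = c / np.linalg.norm(c)

# first-order optimality: grad f(x*)^T (y - x*) >= 0 for all y in C
ys = rng.standard_normal((1000, 2))
ys /= np.maximum(np.linalg.norm(ys, axis=1, keepdims=True), 1.0)
assert np.all((ys - x_star) @ grad_f(x_star) >= -1e-9)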

17
Example: lasso optimality conditions
Given y ∈ R^n, X ∈ R^(n×p), lasso problem can be parametrized as

min_β (1/2) ‖y − Xβ‖_2^2 + λ‖β‖_1
where λ ≥ 0. Subgradient optimality:
0 ∈ ∂( (1/2) ‖y − Xβ‖_2^2 + λ‖β‖_1 )
⇐⇒ 0 ∈ −X^T (y − Xβ) + λ ∂‖β‖_1
⇐⇒ X^T (y − Xβ) = λv

for some v ∈ ∂‖β‖_1, i.e.,

vi ∈ {1}       if βi > 0
     {−1}      if βi < 0      for i = 1, . . . , p
     [−1, 1]   if βi = 0

18
Write X1, . . . , Xp for columns of X. Then our condition reads:

Xi^T (y − Xβ) = λ · sign(βi)    if βi ≠ 0
|Xi^T (y − Xβ)| ≤ λ             if βi = 0

Note: subgradient optimality conditions don't lead to a closed-form
expression for a lasso solution ... however they do provide a way to
check lasso optimality

They are also helpful in understanding the lasso estimator; e.g., if
|Xi^T (y − Xβ)| < λ, then βi = 0 (used by screening rules, later?)
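A sketch of such an optimality check in code, given a candidate solution beta_hat (X, y, lam, and the tolerance tol are placeholders supplied by the user):

import numpy as np

def check_lasso_optimality(X, y, beta_hat, lam, tol=1e-6):
    # Xi^T (y - X beta) = lam * sign(beta_i)  if beta_i != 0
    # |Xi^T (y - X beta)| <= lam              if beta_i == 0
    r = X.T @ (y - X @ beta_hat)
    active = beta_hat != 0
    ok_active = np.all(np.abs(r[active] - lam * np.sign(beta_hat[active])) <= tol)
    ok_inactive = np.all(np.abs(r[~active]) <= lam + tol)
    return bool(ok_active and ok_inactive)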

19
Example: soft-thresholding
Simplified lasso problem with X = I:

min_β (1/2) ‖y − β‖_2^2 + λ‖β‖_1

This we can solve directly using subgradient optimality. Solution is
β = S_λ(y), where S_λ is the soft-thresholding operator:

[S_λ(y)]i = yi − λ    if yi > λ
            0         if −λ ≤ yi ≤ λ      for i = 1, . . . , n
            yi + λ    if yi < −λ

Check: from last slide, subgradient optimality conditions are

yi − βi = λ · sign(βi)    if βi ≠ 0
|yi − βi| ≤ λ             if βi = 0

20
Now plug in β = Sλ (y) and check these are satisfied:
• When yi > λ, βi = yi − λ > 0, so yi − βi = λ = λ · 1
• When yi < −λ, argument is similar
• When |yi | ≤ λ, βi = 0, and |yi − βi | = |yi | ≤ λ
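A sketch of the operator together with a direct check of these conditions (the test vector y and λ are arbitrary):

import numpy as np

def soft_threshold(y, lam):
    # [S_lam(y)]_i = yi - lam if yi > lam, 0 if |yi| <= lam, yi + lam if yi < -lam
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, 0.4, -1.2, -0.1])
lam = 0.5
beta = soft_threshold(y, lam)

nz = beta != 0
assert np.allclose(y[nz] - beta[nz], lam * np.sign(beta[nz]))   # active components
assert np.all(np.abs(y[~nz] - beta[~nz]) <= lam)                # zero components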

Soft-thresholding in one variable: [plot of S_λ(y) against y on [−1, 1]]

21
Example: distance to a convex set

Recall the distance function to a closed, convex set C:

dist(x, C) = min_{y∈C} ‖y − x‖_2

This is a convex function. What are its subgradients?

Write dist(x, C) = ‖x − P_C(x)‖_2, where P_C(x) is the projection of
x onto C. It turns out that when dist(x, C) > 0,

∂dist(x, C) = { (x − P_C(x)) / ‖x − P_C(x)‖_2 }

The subdifferential has only one element, so in fact dist(x, C) is
differentiable and this is its gradient
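A sketch for C taken to be the unit ℓ2 ball, which has a closed-form projection (the choice of C and the test point are illustrative):

import numpy as np

def project_unit_ball(x):
    # projection onto C = {z : ||z||_2 <= 1}
    return x / max(np.linalg.norm(x), 1.0)

def dist_and_grad(x):
    # dist(x, C) and its gradient (x - P_C(x)) / ||x - P_C(x)||_2, for x outside C
    u = project_unit_ball(x)
    d = np.linalg.norm(x - u)
    return d, (x - u) / d

d, g = dist_and_grad(np.array([3.0, 4.0]))   # d = 4.0, g = [0.6, 0.8]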

22
We will only show one direction, i.e., that

(x − P_C(x)) / ‖x − P_C(x)‖_2 ∈ ∂dist(x, C)

Write u = P_C(x). Then by first-order optimality conditions for a
projection,

(x − u)^T (y − u) ≤ 0 for all y ∈ C

Hence

C ⊆ H = {y : (x − u)^T (y − u) ≤ 0}

Claim:

dist(y, C) ≥ (x − u)^T (y − u) / ‖x − u‖_2 for all y

Check: first, for y ∈ H, the right-hand side is ≤ 0, while dist(y, C) ≥ 0

23
Now for y ∉ H, we have (x − u)^T (y − u) = ‖x − u‖_2 ‖y − u‖_2 cos θ,
where θ is the angle between x − u and y − u. Thus

(x − u)^T (y − u) / ‖x − u‖_2 = ‖y − u‖_2 cos θ = dist(y, H) ≤ dist(y, C)

as desired

Using the claim, we have for any y

dist(y, C) ≥ (x − u)^T (y − x + x − u) / ‖x − u‖_2

           = ‖x − u‖_2 + ((x − u) / ‖x − u‖_2)^T (y − x)

Hence g = (x − u)/‖x − u‖_2 is a subgradient of dist(x, C) at x

24
References and further reading

• S. Boyd, Lecture notes for EE 264B, Stanford University,
Spring 2010-2011
• R. T. Rockafellar (1970), “Convex analysis”, Chapters 23–25
• L. Vandenberghe, Lecture notes for EE 236C, UCLA, Spring
2011-2012

25
