SGD
Manjesh K. Hanawal
SGD Introduction
w^(t+1) = arg min_w  (1/2) ∥w − w^(t)∥² + η ( f(w^(t)) + ⟨w − w^(t), ∇f(w^(t))⟩ ).
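Since f(w^(t)) does not depend on w, setting the gradient of this objective with respect to w to zero gives a quick check that the update is exactly a gradient step:

(w − w^(t)) + η ∇f(w^(t)) = 0  ⟹  w^(t+1) = w^(t) − η ∇f(w^(t)).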
For many practical uses we do not need to compute the whole set of
subgradients at a given point; a single member of this set suffices.
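For instance, for f(x) = |x| the subdifferential at x = 0 is the entire interval [−1, 1], yet an SGD step only ever needs one element of it. A minimal Python sketch (the function name is illustrative):

```python
def abs_subgradient(x: float) -> float:
    """Return one member of the subdifferential of f(x) = |x|.

    For x != 0 the subgradient is unique (the sign of x); at x = 0
    the subdifferential is the interval [-1, 1], and returning any
    single element of it, e.g. 0, is enough for an SGD step.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # any value in [-1, 1] would do
```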
Subgradient of a Maximum Function
Lipschitzness via subgradients: a convex function f is ρ-Lipschitz if and only if, for every w and every v ∈ ∂f(w),

∥v∥ ≤ ρ.

Strong convexity: if f is λ-strongly convex, then for every w, u and every v ∈ ∂f(w),

⟨w − u, v⟩ ≥ f(w) − f(u) + (λ/2) ∥w − u∥².
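A standard way to obtain such a subgradient for a maximum g(w) = max_i g_i(w) is to take the gradient of any g_i that attains the maximum at w. A minimal sketch for the hinge loss f(w) = max{0, 1 − y⟨w, x⟩}, which is a maximum of two functions (all names are illustrative):

```python
import numpy as np

def hinge_subgradient(w, x, y):
    """One subgradient of f(w) = max(0, 1 - y * <w, x>).

    f is the maximum of g1(w) = 0 and g2(w) = 1 - y * <w, x>;
    the gradient of whichever function attains the maximum at w
    is a valid subgradient of f at w.
    """
    if 1.0 - y * np.dot(w, x) > 0:   # g2 attains the max
        return -y * x
    return np.zeros_like(w)          # g1 attains the max (at a tie, either works)
```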
SGD algorithm for minimizing a λ-strongly convex function
Goal: Solve min_{w∈H} f(w)
Parameter: T
Initialize: w^(1) = 0
for t = 1, . . . , T do
▶ Choose a random vector v_t such that E[v_t | w^(t)] ∈ ∂f(w^(t))
▶ Set η_t = 1/(λt)
▶ Set w^(t+1/2) = w^(t) − η_t v_t
▶ Set w^(t+1) = arg min_{w∈H} ∥w − w^(t+1/2)∥²
Output: w̄ = (1/T) Σ_{t=1}^{T} w^(t)
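A minimal Python sketch of this loop, assuming H is a Euclidean ball of radius B (so the projection in the last step has a closed form) and a user-supplied stochastic subgradient oracle; all names are illustrative:

```python
import numpy as np

def sgd_strongly_convex(subgrad_oracle, dim, lam, T, B):
    """SGD for a lam-strongly convex f over H = {w : ||w|| <= B}.

    subgrad_oracle(w) must return a random vector v whose conditional
    expectation given w lies in the subdifferential of f at w.
    Returns the averaged iterate w_bar = (1/T) * sum_t w^(t).
    """
    w = np.zeros(dim)                  # w^(1) = 0
    w_sum = np.zeros(dim)
    for t in range(1, T + 1):
        w_sum += w                     # accumulate w^(t) for the average
        v = subgrad_oracle(w)          # E[v | w] in the subdifferential of f at w
        eta = 1.0 / (lam * t)          # eta_t = 1/(lambda * t)
        w_half = w - eta * v           # w^(t + 1/2)
        norm = np.linalg.norm(w_half)
        # Projection step: closest point of the ball to w^(t + 1/2)
        w = w_half if norm <= B else (B / norm) * w_half
    return w_sum / T
```

For a general closed convex H the projection line would be replaced by whatever projection oracle is available; the ball is used here only because its projection is explicit.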
Adding a Projection Step to SGD
Projection lemma: let H be a closed convex set, let v be the projection of w onto H, and let u ∈ H. Then

∥w − u∥² − ∥v − u∥² ≥ 0.
Using the lemma, we can easily adapt the analysis of SGD to the
case in which we add projection steps on a closed and convex set.
Simply note that for every t,
∥w^(t+1) − w⋆∥² − ∥w^(t) − w⋆∥²
  = ∥w^(t+1) − w⋆∥² − ∥w^(t+1/2) − w⋆∥² + ∥w^(t+1/2) − w⋆∥² − ∥w^(t) − w⋆∥²
  ≤ ∥w^(t+1/2) − w⋆∥² − ∥w^(t) − w⋆∥²,

where the inequality holds because w^(t+1) is the projection of w^(t+1/2) onto H and w⋆ ∈ H, so the projection lemma gives ∥w^(t+1) − w⋆∥² ≤ ∥w^(t+1/2) − w⋆∥².
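Expanding w^(t+1/2) = w^(t) − η_t v_t in the remaining difference shows that the projected update obeys the same per-step bound as the unprojected one, so the rest of the analysis carries over unchanged:

∥w^(t+1/2) − w⋆∥² − ∥w^(t) − w⋆∥² = −2η_t ⟨w^(t) − w⋆, v_t⟩ + η_t² ∥v_t∥².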
Convex-Lipschitz-Bounded Problems