
Primal/Dual Decomposition Methods

Daniel P. Palomar
The Hong Kong University of Science and Technology (HKUST)

ELEC5470/IEDA6100A - Convex Optimization


The Hong Kong University of Science and Technology (HKUST)
Fall 2020-21
Outline of Lecture

• Subgradients

• Subgradient methods

• Primal decomposition

• Dual decomposition

• Summary

(Acknowledgement to Stephen Boyd for material for this lecture.)

Daniel P. Palomar 1
Gradient and First-Order Approximation

• Recall the basic inequality for convex differentiable f:

  f(y) ≥ f(x) + ∇f(x)^T (y − x)  ∀y

• The first-order approximation of f at x is a global underestimator, a supporting hyperplane.

• What if f is not differentiable?

• The answer is given by the concept of subgradient.

Subgradient of a Function

• g is a subgradient of f (not necessarily convex) at x if

  f(y) ≥ f(x) + g^T (y − x)  ∀y

Subgradient of a Function

• g is a subgradient if and only if f(x) + g^T (y − x) is a global affine underestimator of f.

• If f is convex and differentiable, then ∇f(x) is a subgradient of f at x.

• Subgradients come up in several contexts:


– algorithms for nondifferentiable convex optimization
– convex analysis, e.g., optimality conditions and duality for nondif-
ferentiable problems.

Example of Subgradient
• Consider f = max {f1, f2} with f1 and f2 convex and differentiable

• Subgradient at point x:
– If f1 (x) > f2 (x), the subgradient is unique g = ∇f1 (x).
– If f2 (x) > f1 (x), the subgradient is unique g = ∇f2 (x).
– If f1 (x) = f2 (x), the subgradients form an interval
[∇f1 (x) , ∇f2 (x)].
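
This case rule can be sketched in code (a toy illustration, not from the lecture; the helper `subgrad_max` and the example functions are hypothetical):

```python
def subgrad_max(f1, g1, f2, g2, x):
    """Return one subgradient of f = max(f1, f2) at x (scalar case).

    Where f1 and f2 tie, any convex combination of the two gradients
    is a valid subgradient; the midpoint is an arbitrary choice.
    """
    v1, v2 = f1(x), f2(x)
    if v1 > v2:
        return g1(x)          # f1 active: the subgradient is unique
    if v2 > v1:
        return g2(x)          # f2 active
    return 0.5 * (g1(x) + g2(x))

# Toy example: f(x) = max(x^2, 2x)
f1 = lambda x: x**2;  g1 = lambda x: 2*x
f2 = lambda x: 2*x;   g2 = lambda x: 2.0

subgrad_max(f1, g1, f2, g2, 3.0)  # f1 active (9 > 6): returns 6.0
```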

Subdifferential

• The set of all subgradients of f at x is called the subdifferential of f at x and is denoted ∂f(x).

• ∂f (x) is a closed convex set (can be empty).

• If f is convex:
– ∂f (x) is nonempty for x ∈ relint dom f
– if f is differentiable at x, then ∂f (x) = {∇f (x)} (i.e., a
singleton).
– If ∂f (x) = {g}, then f is differentiable at x and g = ∇f (x).

Example of Subdifferential

• Consider f(x) = |x|: ∂f(x) = {1} for x > 0, ∂f(x) = {−1} for x < 0, and ∂f(0) = [−1, 1].

Subgradient Calculus
• Weak subgradient calculus: formulas for finding one subgradient g ∈ ∂f(x).

• Strong subgradient calculus: formulas for finding the whole subdifferential ∂f(x), i.e., all subgradients of f at x.

• Many algorithms for nondifferentiable convex optimization require only one subgradient, so weak calculus suffices.

• Some algorithms and optimality conditions need the whole subdifferential.

• In practice, if you can compute f(x), you can usually compute a subgradient g ∈ ∂f(x).

Some Basic Rules
(From now on, we will assume that f is convex and x ∈ relint dom f .)

• ∂f (x) = {∇f (x)} if f is differentiable at x

• Scaling: ∂(αf) = α ∂f (assuming α > 0)

• Addition: ∂(f1 + f2) = ∂f1 + ∂f2 (addition of sets)

• Affine transformation of variables: if g(x) = f(Ax + b), then ∂g(x) = A^T ∂f(Ax + b)

• Finite pointwise maximum: if f = max_i f_i, then

  ∂f(x) = Co ∪ {∂f_i(x) | f_i(x) = f(x)},

i.e., the convex hull of the union of subdifferentials of the active functions at x.

Optimality Conditions: Unconstrained Case

• Recall that for f convex and differentiable, x⋆ minimizes f(x) if and only if

  0 = ∇f(x⋆).

• The generalization to nondifferentiable convex f is

  0 ∈ ∂f(x⋆).

Proof. By definition:

  f(y) ≥ f(x⋆) + 0^T (y − x⋆)  ∀y.

Example: Piecewise Linear Maximization

• We want to minimize f(x) = max_i (a_i^T x + b_i).

• x⋆ minimizes f(x) ⟺ 0 ∈ ∂f(x⋆) = Co {a_i | a_i^T x⋆ + b_i = f(x⋆)} ⟺ there is a λ with

  λ ≥ 0,  1^T λ = 1,  Σ_i λ_i a_i = 0

where λ_i = 0 if a_i^T x⋆ + b_i < f(x⋆).

• Interestingly, these are exactly the KKT conditions for the problem
in epigraph form:

  minimize_{t,x}  t
  subject to  a_i^T x + b_i ≤ t, i = 1, ··· , m.
Optimality Conditions: Constrained Case

  minimize_x  f0(x)
  subject to  fi(x) ≤ 0, i = 1, ··· , m

where each fi is convex, defined on R^n (hence subdifferentiable), and strict feasibility holds (Slater's condition).

• The KKT necessary and sufficient conditions are

  fi(x⋆) ≤ 0,  λi⋆ ≥ 0

  0 ∈ ∂f0(x⋆) + Σ_{i=1}^m λi⋆ ∂fi(x⋆)

  λi⋆ fi(x⋆) = 0  (complementary slackness).
Numerical Methods for Nondifferentiable Problems
• In R, we can always use the bisection method for nondifferentiable
functions to reduce the interval (uncertainty) by half at each step.

• Can we generalize this to R^n? The problem is that R^n is not ordered, as opposed to R.

• The answer is: yes. These methods are called localization methods:
– cutting-plane methods (date back to the 1960s in the Russian
literature): the uncertainty set is a polyhedron
– ellipsoid method (goes back to the 1970s in the Russian litera-
ture): the uncertainty set is an ellipsoid.

• Another convenient method is the subgradient method.

Subgradient Method
• Subgradient method is a simple algorithm (looks similar to a gradient
method) to minimize a nondifferentiable convex function f :

  x^(k+1) = x^(k) − α_k g^(k)

where
– x^(k) is the kth iterate
– g^(k) is any subgradient of f at x^(k)
– α_k > 0 is the kth stepsize

• Note that it is not a descent method (unlike a gradient method), so we need to keep track of the best point so far:

  f_best^(k) = min_{i=1,··· ,k} f(x^(i)).
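
A minimal sketch of the method with best-point tracking (the helper names and the toy instance f(x) = |x1| + |x2| are illustrative, not from the lecture):

```python
import numpy as np

def subgradient_method(f, subgrad, x0, stepsize, iters=100):
    """Subgradient method: x <- x - alpha_k * g, with g any subgradient.

    Not a descent method, so we track the best point seen so far.
    stepsize(k) returns the k-th stepsize alpha_k > 0.
    """
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for k in range(1, iters + 1):
        x = x - stepsize(k) * subgrad(x)
        if f(x) < f_best:           # keep the best iterate, not the last
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best

# Toy instance: f(x) = |x1| + |x2|; sign(x) is one subgradient; minimum is 0
f = lambda x: np.abs(x).sum()
sg = lambda x: np.sign(x)
x_best, f_best = subgradient_method(f, sg, [1.0, -2.0], lambda k: 1.0 / k)
```

With the diminishing stepsize α_k = 1/k the best value approaches the optimum 0, though slowly and non-monotonically.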

Stepsize Rules

• Different stepsize rules:

– constant stepsize: α_k = α
– constant step length: α_k = γ/‖g^(k)‖_2 (so that ‖x^(k+1) − x^(k)‖_2 = γ)
– square summable but not summable: stepsizes satisfying

  Σ_{k=1}^∞ α_k² < ∞,  Σ_{k=1}^∞ α_k = ∞

– nonsummable diminishing: stepsizes satisfying

  lim_{k→∞} α_k = 0,  Σ_{k=1}^∞ α_k = ∞.
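
The four rules can be written as interchangeable stepsize functions of the iteration counter (illustrative constants; only the constant-step-length rule uses the current subgradient g):

```python
import numpy as np

# Illustrative constants for the four stepsize rules
alpha, gamma = 0.1, 0.05

constant_stepsize   = lambda k, g: alpha                      # alpha_k = alpha
constant_steplength = lambda k, g: gamma / np.linalg.norm(g)  # ||x(k+1) - x(k)||_2 = gamma
square_summable     = lambda k, g: 1.0 / k                    # sum alpha_k^2 < inf, sum alpha_k = inf
nonsummable_dimin   = lambda k, g: 0.1 / np.sqrt(k)           # alpha_k -> 0, sum alpha_k = inf
```

Any of these can be passed as the `stepsize` argument of a subgradient-method loop.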

Convergence Results

Under some technical conditions (boundedness of the optimal value, boundedness of the subgradients by G), the limiting value f̄ = lim_{k→∞} f_best^(k) of the subgradient method satisfies:

• constant stepsize: f̄ − f⋆ ≤ G²α/2, i.e., converges to G²α/2-suboptimal (converges to f⋆ if f is differentiable and α is small enough)

• constant step length: f̄ − f⋆ ≤ Gγ/2, i.e., converges to Gγ/2-suboptimal

• diminishing stepsize rule: f̄ = f⋆, i.e., converges.

Example: Piecewise Linear Minimization
• Consider the following nondifferentiable optimization problem:

  minimize f(x) = max_i (a_i^T x + b_i).

• To find a subgradient, simply choose an index j for which

  a_j^T x + b_j = max_i (a_i^T x + b_i)

and take g = a_j.

• The subgradient update is

  x^(k+1) = x^(k) − α_k a_j^(k),

where j^(k) denotes the index chosen at iteration k.
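
A sketch of this update on a toy instance (the helper name is hypothetical; the instance f(x) = max(x, −x) = |x| is illustrative):

```python
import numpy as np

def pwl_subgradient_step(A, b, x, alpha):
    """One subgradient step for f(x) = max_i (a_i^T x + b_i).

    Rows of A are the a_i; any index attaining the maximum
    gives a valid subgradient a_j.
    """
    j = np.argmax(A @ x + b)       # an active index
    return x - alpha * A[j]

# Toy instance in R: f(x) = max(x, -x) = |x|, minimized at 0
A = np.array([[1.0], [-1.0]])
b = np.array([0.0, 0.0])
x = np.array([2.0])
for k in range(1, 50):
    x = pwl_subgradient_step(A, b, x, 1.0 / k)   # diminishing stepsize
```

The iterates oscillate around the minimizer with amplitude on the order of the current stepsize.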

Example: Piecewise Linear Minimization (II)
• Problem instance with n = 20 variables, m = 100 terms, f⋆ ≈ 1.1

• Constant step length, first 100 iterations: f^(k) − f⋆ (plot omitted).

Example: Piecewise Linear Minimization (III)
• Constant step length: f_best^(k) − f⋆ (plot omitted).

Example: Piecewise Linear Minimization (IV)

• Diminishing stepsize rule (α_k = 0.1/√k) and square summable stepsize rule (α_k = 1/k) (plots omitted).

Projected Subgradient Method for
Constrained Optimization

• Consider the following constrained optimization problem:

  minimize_x  f(x)
  subject to  x ∈ X.

• The projected subgradient method guarantees that each update produces a feasible point via a projection:

  x^(k+1) = [x^(k) − α_k g^(k)]_X

where [·]_X denotes projection onto the convex set X.
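
A sketch, assuming the feasible set is simple enough that the projection has a closed form (here the toy set X = [0, 1], where projection is clipping; helper names are hypothetical):

```python
import numpy as np

def projected_subgradient(f, subgrad, project, x0, iters=200):
    """Projected subgradient method: each iterate is projected
    back onto the feasible convex set X via `project`."""
    x = project(np.asarray(x0, dtype=float))
    x_best, f_best = x.copy(), f(x)
    for k in range(1, iters + 1):
        x = project(x - (1.0 / k) * subgrad(x))
        if f(x) < f_best:
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best

# Toy instance: minimize |x - 3| over X = [0, 1]; optimum is x = 1
f = lambda x: abs(x[0] - 3.0)
sg = lambda x: np.sign(x - 3.0)
proj = lambda x: np.clip(x, 0.0, 1.0)
x_best, f_best = projected_subgradient(f, sg, proj, [0.5])
```

The iterates are pushed toward 3 by the subgradient step but clipped back to the boundary point x = 1, the constrained minimizer.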

Decomposition Methods

• The idea of a decomposition method is to solve a problem by solving smaller subproblems coordinated by a master problem.

• Different reasons to use a decomposition method:


– to allow the resolution of a problem otherwise unsolvable for mem-
ory reasons (useful in areas such as biology or image processing)
– to speed up the resolution of the problem via parallel computation
– to solve the problem in a distributed way (desirable for some
wireless networks)
– to derive nice, insightful, and efficient numerical algorithms as
alternative to the use of general-purpose interior-point methods.

Decomposition Methods (II)

• In some (few) fortunate cases, problems decouple naturally:

  minimize_x  f1(x1) + f2(x2)
  subject to  x1 ∈ X1, x2 ∈ X2.

• We can solve for x1 and x2 separately (in parallel).

• Even if they are solved sequentially, there is still the advantage of memory and of speed (if the computational effort is superlinear in the problem size).

Decomposition Methods (III)

• In general, however, problems do not decouple so easily, and that's when we can resort to decomposition methods.

• There are two main types of decomposition methods:


– primal decomposition:
∗ deals with complicating variables that couple the subproblems
∗ the primal master problem controls directly the resources
– dual decomposition:
∗ deals with complicating constraints that couple the subproblems
∗ the dual master problem controls the prices of the resources.

Primal Decomposition
• Consider the following problem with a coupling variable:
  minimize_{x,y}  f1(x1, y) + f2(x2, y)
  subject to  x1 ∈ X1, x2 ∈ X2
              y ∈ Y.

• y is the complicating variable or coupling variable.

• When y is fixed, the problem is separable in x1 and x2 and then decouples into two subproblems that can be solved independently.

• x1 and x2 are local or private variables; and y can be interpreted as a global or public variable that serves as an interface or boundary variable between the two subproblems.

Primal Decomposition (II)

• For a given fixed y we define the subproblems:

  subproblem 1: minimize_{x1∈X1} f1(x1, y)
  subproblem 2: minimize_{x2∈X2} f2(x2, y)

with optimal values f1⋆(y) and f2⋆(y).

• Since min_{x,y} f ≡ min_y min_x f, it follows that the original problem is equivalent to the master primal problem:

  minimize_{y∈Y}  f1⋆(y) + f2⋆(y).

Primal Decomposition (III)
Observations:

• subproblems can be solved independently for a given y

• we don't have a closed-form expression for each function fi⋆(y) and its corresponding gradient or subgradient (differentiable?)

• instead, to evaluate each function fi⋆(y) at some point y we need to solve an optimization problem

• interestingly, we can easily obtain a subgradient of fi⋆(y) "for free" when we evaluate the function.

Primal Decomposition: Solving the Master Problem
• If the original problem is convex, so is the master problem.
• To solve the master problem, we can use different methods such as
  – bisection (if y is scalar)
  – gradient or Newton method (if fi⋆ is differentiable)
  – subgradient, cutting-plane, or ellipsoid method.

• The subgradient method is very simple and lends itself to distributed implementation; however, its convergence is slow in practice.

• Projected subgradient method:

  y(k+1) = [y(k) − α_k (s1(k) + s2(k))]_Y

where si(k) is a subgradient of fi⋆ at y(k).

Primal Decomposition Algorithm

repeat
  1. Solve the subproblems:
     Find x1(k) ∈ X1 that minimizes f1(x1(k), y(k)), and a subgradient s1(k) ∈ ∂f1⋆(y(k)).
     Find x2(k) ∈ X2 that minimizes f2(x2(k), y(k)), and a subgradient s2(k) ∈ ∂f2⋆(y(k)).
  2. Update the complicating variable to minimize the primal master problem:

     y(k+1) = [y(k) − α_k (s1(k) + s2(k))]_Y.

  3. k = k + 1
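
A sketch of the algorithm on a hypothetical instance with closed-form subproblems (not from the lecture): minimize over x1, x2, y the sum Σ_i [(x_i − y)² + (x_i − c_i)²]. For fixed y, subproblem i gives x_i⋆(y) = (y + c_i)/2 with optimal value f_i⋆(y) = (y − c_i)²/2 and gradient s_i = y − c_i:

```python
def primal_decomposition(c, y0=0.0, iters=200):
    """Primal decomposition sketch: for fixed y, solve the two
    subproblems in closed form, then update y with a subgradient
    step on the master problem minimize_y f1*(y) + f2*(y)."""
    y = y0
    for k in range(1, iters + 1):
        x = [(y + ci) / 2 for ci in c]   # solve both subproblems for fixed y
        s = sum(y - ci for ci in c)      # s1 + s2: (sub)gradients of f_i*(y)
        y = y - (1.0 / k) * s            # master problem subgradient step
    return x, y

x, y = primal_decomposition([0.0, 4.0])
```

For c = (0, 4) the master objective is (y² + (y − 4)²)/2, minimized at y = 2, with x = (1, 3).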

Primal Decomposition: Subgradients
Lemma: Let f⋆(y) be the optimal value of the convex problem

  minimize_x  f0(x)
  subject to  hi(x) ≤ yi, i = 1, ··· , m.

A subgradient of f⋆(y) is −λ⋆(y).

Proof. The Lagrangian is L(x, λ) = f0(x) + λ^T (h(x) − y). Then,

  f⋆(y0) = f0(x⋆(y0))
         = g(λ⋆(y0))
         ≤ L(x, λ⋆(y0)) = f0(x) + λ⋆T (h(x) − y0)
         = f0(x) + λ⋆T (h(x) − y) + λ⋆T (y − y0)
         ≤ f0(x) + λ⋆T (y − y0)

where the last inequality holds for any x such that h(x) ≤ y. In particular,

  f⋆(y0) ≤ min_{h(x)≤y} f0(x) + λ⋆T (y − y0)
         = f⋆(y) + λ⋆T (y − y0)

or

  f⋆(y) ≥ f⋆(y0) − λ⋆T (y − y0).  □

Lemma: Let f⋆(y) be the optimal value of the convex problem

  minimize_x  f0(x, y)
  subject to  hi(x) ≤ yi, i = 1, ··· , m.

A subgradient of f⋆(y) is s0(x⋆(y), y) − λ⋆(y).

Dual Decomposition
• Consider the following problem with a coupling constraint:
  minimize_x  f1(x1) + f2(x2)
  subject to  x1 ∈ X1, x2 ∈ X2
              h1(x1) + h2(x2) ≤ h0.
• h1 (x1) + h2 (x2) ≤ h0 is the complicating or coupling constraint.

• If the coupling constraint is relaxed with a Lagrange multiplier λ, then the problem decouples into two subproblems that can be solved independently.

• x1 and x2 are local or private variables; and λ can be interpreted as a global or public price variable that serves as an interface or boundary variable between the two subproblems.
Dual Decomposition (II)

• The partial Lagrangian after relaxing the coupling constraint is:

  L(x, λ) = f1(x1) + f2(x2) + λ^T (h1(x1) + h2(x2) − h0).

• The dual function is

  g(λ) = inf_{x∈X} L(x, λ)
       = inf_{x1∈X1} {f1(x1) + λ^T h1(x1)} + inf_{x2∈X2} {f2(x2) + λ^T h2(x2)} − λ^T h0,

which clearly decouples.

Dual Decomposition (III)

• For a given fixed λ we define the subproblems:

  subproblem 1: minimize_{x1∈X1} f1(x1) + λ^T h1(x1)
  subproblem 2: minimize_{x2∈X2} f2(x2) + λ^T h2(x2)

with optimal values g1 (λ) and g2 (λ).

• From strong duality, the original problem is equivalent to the master dual problem:

  maximize_{λ≥0}  g1(λ) + g2(λ) − λ^T h0.

Dual Decomposition (IV)
Observations:

• subproblems can be solved independently for a given λ

• we don't have a closed-form expression for each function gi(λ) and its corresponding gradient or subgradient (differentiable?)

• instead, to evaluate each function gi(λ) at some point λ we need to solve an optimization problem

• as in the primal case, we can easily obtain a subgradient of gi(λ) "for free" when we evaluate the function.

Dual Decomposition: Solving the Master Problem
• The dual master problem is always convex regardless of the original
problem. However, we still need convexity to have strong duality
(under some constraint qualifications like Slater’s condition).
• To solve the master problem, we can use different methods such as
– bisection (if λ is scalar)
– gradient or Newton method (if gi differentiable)
– subgradient, cutting-plane, or ellipsoid method.

• Projected subgradient method:

  λ(k+1) = [λ(k) + α_k (s1(k) + s2(k) − h0)]^+

where si(k) is a subgradient of gi at λ(k).

Dual Decomposition Algorithm

repeat
  1. Solve the subproblems:
     Find x1(k) ∈ X1 that minimizes f1(x1(k)) + λ(k)^T h1(x1(k)), and a subgradient s1(k) ∈ ∂g1(λ(k)).
     Find x2(k) ∈ X2 that minimizes f2(x2(k)) + λ(k)^T h2(x2(k)), and a subgradient s2(k) ∈ ∂g2(λ(k)).
  2. Update the price to maximize the dual master problem:

     λ(k+1) = [λ(k) + α_k (s1(k) + s2(k) − h0)]^+.

  3. k = k + 1
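
A sketch on a hypothetical instance (not from the lecture): minimize (x1 − 2)² + (x2 − 2)² subject to x1 + x2 ≤ 2, so h_i(x_i) = x_i and h0 = 2. Relaxing the coupling constraint with price λ, each subproblem min_x (x − 2)² + λx has the closed-form solution x = 2 − λ/2:

```python
def dual_decomposition(iters=100):
    """Dual decomposition sketch: solve each priced subproblem in
    closed form, then update the price lambda with a projected
    subgradient ascent step on the dual master problem."""
    lam, h0 = 0.0, 2.0
    for k in range(1, iters + 1):
        x1 = 2.0 - lam / 2.0     # subproblem 1: argmin (x-2)^2 + lam*x
        x2 = 2.0 - lam / 2.0     # subproblem 2 (identical here)
        # master: price goes up if the coupling constraint is violated
        lam = max(0.0, lam + (1.0 / k) * (x1 + x2 - h0))
    return x1, x2, lam

x1, x2, lam = dual_decomposition()
```

Here the master converges to the optimal price λ⋆ = 2, at which both subproblems return x1 = x2 = 1, satisfying the coupling constraint with equality.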

Dual Decomposition: Subgradients
Lemma: Let g (λ) be the dual function corresponding to the problem
  minimize_x  f0(x)
  subject to  hi(x) ≤ 0, i = 1, ··· , m.

A subgradient of g(λ) is h(x⋆(λ)).


Proof. The Lagrangian is L(x, λ) = f0(x) + λ^T h(x) and the dual function is g(λ) = inf_x L(x, λ). Then,

  g(λ) = inf_x {f0(x) + λ^T h(x)}
       ≤ f0(x⋆(λ0)) + λ^T h(x⋆(λ0))
       = f0(x⋆(λ0)) + λ0^T h(x⋆(λ0)) + (λ − λ0)^T h(x⋆(λ0))
       = g(λ0) + (λ − λ0)^T h(x⋆(λ0)).  □

Summary

• We have described the concept of subgradient as a generalization of the gradient.

• We have considered subgradient methods, which are formally similar to gradient methods.

• Finally, we have derived the two basic decomposition techniques:

  – primal decomposition
  – dual decomposition.
