Primal-Dual Decomposition Methods
Daniel P. Palomar
The Hong Kong University of Science and Technology (HKUST)
• Subgradients
• Subgradient methods
• Primal decomposition
• Dual decomposition
• Summary
Gradient and First-Order Approximation
• For a differentiable convex f, the first-order approximation at x is a global underestimator:
f (y) ≥ f (x) + ∇f (x)ᵀ (y − x) ∀y.
• The notion of subgradient generalizes this to nondifferentiable functions.
Subgradient of a Function
• A vector g is a subgradient of f at x if
f (y) ≥ f (x) + gᵀ (y − x) ∀y.
Example of Subgradient
• Consider f = max {f1, f2} with f1 and f2 convex and differentiable
• Subgradient at point x:
– If f1 (x) > f2 (x), the subgradient is unique g = ∇f1 (x).
– If f2 (x) > f1 (x), the subgradient is unique g = ∇f2 (x).
– If f1 (x) = f2 (x), the subgradients form the line segment [∇f1 (x), ∇f2 (x)], i.e., all convex combinations θ ∇f1 (x) + (1 − θ) ∇f2 (x) with θ ∈ [0, 1] (see the sketch below).
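A quick numerical illustration of this rule (a minimal sketch; the particular f1, f2, and test point below are made-up examples, not from the slides):

```python
import numpy as np

# Made-up example: f1(x) = x^T x (gradient 2x), f2(x) = 1^T x (gradient 1).
f1, grad_f1 = lambda x: x @ x, lambda x: 2 * x
f2, grad_f2 = lambda x: np.sum(x), lambda x: np.ones_like(x)

def subgrad_max(x):
    """Return one subgradient of f = max{f1, f2} at x: the gradient of a
    function attaining the max (at ties, any convex combination of the
    two gradients is also a valid subgradient)."""
    return grad_f1(x) if f1(x) >= f2(x) else grad_f2(x)

print(subgrad_max(np.array([1.0, -0.5])))
```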
Subdifferential
• The set of all subgradients of f at x is called the subdifferential of f at x, denoted ∂f (x).
• If f is convex:
– ∂f (x) is nonempty for x ∈ relint dom f
– if f is differentiable at x, then ∂f (x) = {∇f (x)} (i.e., a singleton)
– If ∂f (x) = {g}, then f is differentiable at x and g = ∇f (x).
Example of Subdifferential
• For f (x) = |x|: ∂f (x) = {sign (x)} for x ≠ 0, whereas ∂f (0) = [−1, 1].
Subgradient Calculus
• Weak subgradient calculus: formulas for finding one subgradient g ∈ ∂f (x).
• Strong subgradient calculus: formulas for finding the whole subdifferential ∂f (x).
• Many algorithms for nondifferentiable convex optimization require only one subgradient at each step, so weak calculus suffices.
Some Basic Rules
(From now on, we will assume that f is convex and x ∈ relint dom f .)
• Nonnegative scaling: ∂ (αf) (x) = α ∂f (x) for α ≥ 0.
• Sum: ∂ (f1 + f2) (x) = ∂f1 (x) + ∂f2 (x).
• Affine composition: if g (x) = f (Ax + b), then ∂g (x) = Aᵀ ∂f (Ax + b).
• Pointwise maximum: if f = maxi fi, then ∂f (x) = Co ∪ {∂fi (x) | fi (x) = f (x)}.
Optimality Conditions: Unconstrained Case
• x⋆ minimizes f (x) if and only if
0 ∈ ∂f (x⋆) .
Proof. By definition, 0 ∈ ∂f (x⋆) means f (y) ≥ f (x⋆) + 0ᵀ (y − x⋆) = f (x⋆) ∀y. □
Example: Piecewise Linear Maximization
• We want to minimize f (x) = maxi (aiᵀx + bi).
• x⋆ minimizes f (x) ⇐⇒ 0 ∈ ∂f (x⋆) = Co {ai | aiᵀx⋆ + bi = f (x⋆)} ⇐⇒ there is a λ with
λ ⪰ 0, 1ᵀλ = 1, Σi λi ai = 0, and λi = 0 if aiᵀx⋆ + bi < f (x⋆).
• Interestingly, these are exactly the KKT conditions for the problem in epigraph form:
minimize t
x,t
subject to aiᵀx + bi ≤ t, i = 1, · · · , m.
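Since the epigraph form is a linear program, it can also be solved directly with an off-the-shelf LP solver. A minimal sketch on a random made-up instance using scipy.optimize.linprog (note the explicit free bounds, since linprog defaults to x ⪰ 0):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 100, 20                         # made-up instance sizes
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Variables z = (x, t); minimize t subject to a_i^T x - t <= -b_i.
c = np.r_[np.zeros(n), 1.0]
A_ub = np.c_[A, -np.ones(m)]
res = linprog(c, A_ub=A_ub, b_ub=-b, bounds=[(None, None)] * (n + 1))
print("f* =", res.fun)                 # optimal value of max_i (a_i^T x + b_i)
```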
Optimality Conditions: Constrained Case
minimize f0 (x)
x
subject to fi (x) ≤ 0, i = 1, · · · , m
• x⋆ is optimal if and only if there exists a λ⋆ such that
fi (x⋆) ≤ 0, λi⋆ ≥ 0, i = 1, · · · , m (feasibility)
0 ∈ ∂f0 (x⋆) + Σi λi⋆ ∂fi (x⋆)
λi⋆ fi (x⋆) = 0 (complementary slackness).
Numerical Methods for Nondifferentiable Problems
• In R, we can always use the bisection method for nondifferentiable functions to reduce the interval of uncertainty by half at each step.
• Can something similar be done in Rⁿ? The answer is yes. These methods are called localization methods:
– cutting-plane methods (dating back to the 1960s in the Russian literature): the uncertainty set is a polyhedron
– ellipsoid method (dating back to the 1970s in the Russian literature): the uncertainty set is an ellipsoid.
Subgradient Method
• The subgradient method is a simple algorithm (similar in form to a gradient method) for minimizing a nondifferentiable convex function f:
x(k + 1) = x(k) − αk g(k)
where
– x(k) is the kth iterate
– g(k) is any subgradient of f at x(k)
– αk > 0 is the kth stepsize.
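In code, this iteration is only a few lines. A minimal sketch (the oracle f_and_subgrad and the stepsize rule are placeholders to be supplied by the user); because the method is not a descent method, the best iterate found so far is tracked:

```python
import numpy as np

def subgradient_method(f_and_subgrad, x0, stepsize, iters=1000):
    """Minimize a nondifferentiable convex function f.
    f_and_subgrad(x) -> (f(x), any subgradient g of f at x)
    stepsize(k)      -> alpha_k > 0
    """
    x = x0.copy()
    f_best, x_best = np.inf, x0.copy()
    for k in range(1, iters + 1):
        f, g = f_and_subgrad(x)
        if f < f_best:                    # not a descent method:
            f_best, x_best = f, x.copy()  # keep the best point seen
        x = x - stepsize(k) * g           # x(k+1) = x(k) - alpha_k g(k)
    return x_best, f_best
```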
Stepsize Rules
• Constant stepsize: αk = α.
• Constant step length: αk = γ/‖g(k)‖2 (so that ‖x(k + 1) − x(k)‖2 = γ).
• Square summable but not summable: Σk αk² < ∞ and Σk αk = ∞ (e.g., αk = 1/k).
• Nonsummable diminishing: αk → 0 and Σk αk = ∞ (e.g., αk = 0.1/√k).
• Unlike gradient methods, the stepsizes are fixed in advance, not found by line search.
Convergence Results
• Assuming ‖g(k)‖2 ≤ G for all k and defining fbest(k) = mini≤k f (x(i)): the diminishing and square summable rules give fbest(k) → f⋆; a constant stepsize α gives convergence of fbest(k) to within G²α/2 of f⋆; a constant step length γ gives convergence to within Gγ/2 of f⋆.
Example: Piecewise Linear Minimization
• Consider the following nondifferentiable optimization problem:
minimize f (x) = maxi (aiᵀx + bi)
x
• To find a subgradient of f at x: choose any index j for which ajᵀx + bj = maxi (aiᵀx + bi), and take g = aj (see the sketch below).
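Combining this subgradient with the subgradient_method sketch above (both the random instance and the helper names are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def pwl_and_subgrad(x):
    """f(x) = max_i (a_i^T x + b_i); a subgradient is g = a_j, j active."""
    vals = A @ x + b
    j = np.argmax(vals)                  # any maximizing index works
    return vals[j], A[j]

x_best, f_best = subgradient_method(pwl_and_subgrad, np.zeros(n),
                                    stepsize=lambda k: 0.1 / np.sqrt(k),
                                    iters=3000)
print("best f:", f_best)
```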
Example: Piecewise Linear Minimization (II)
• Problem instance with n = 20 variables, m = 100 terms, f⋆ ≈ 1.1.
Example: Piecewise Linear Minimization (III)
• Constant step length: convergence of fbest(k) − f⋆.
Example: Piecewise Linear Minimization (IV)
• Diminishing stepsize rule (αk = 0.1/√k) and square summable stepsize rule (αk = 1/k): convergence of fbest(k) − f⋆.
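Reusing the subgradient_method and pwl_and_subgrad sketches above (both hypothetical helpers introduced earlier, not part of the original slides), the two rules differ only in the stepsize argument:

```python
import numpy as np

# Diminishing rule alpha_k = 0.1/sqrt(k) vs. square summable rule alpha_k = 1/k.
for rule in (lambda k: 0.1 / np.sqrt(k), lambda k: 1.0 / k):
    _, f_best = subgradient_method(pwl_and_subgrad, np.zeros(n), rule, iters=3000)
    print(f_best)
```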
Projected Subgradient Method for Constrained Optimization
minimize f (x)
x
subject to x ∈ X.
• The projected subgradient method is
x(k + 1) = PX (x(k) − αk g(k))
where PX denotes Euclidean projection onto the feasible set X (see the sketch below).
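A minimal sketch of the projected method (the box constraint below is a made-up example); the only change from the unconstrained version is the projection after each step:

```python
import numpy as np

def projected_subgradient(f_and_subgrad, x0, project, stepsize, iters=1000):
    """Minimize a convex f over X; project(z) computes P_X(z)."""
    x = project(x0)
    f_best, x_best = np.inf, x.copy()
    for k in range(1, iters + 1):
        f, g = f_and_subgrad(x)
        if f < f_best:
            f_best, x_best = f, x.copy()
        x = project(x - stepsize(k) * g)  # x(k+1) = P_X(x(k) - alpha_k g(k))
    return x_best, f_best

# Example: X = [-1, 1]^n, so the projection is coordinatewise clipping.
project_box = lambda x: np.clip(x, -1.0, 1.0)
```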
Decomposition Methods
• Idea: decompose a large problem into smaller subproblems that can be solved independently and are coordinated by a master problem.
• When the subproblems are coupled through shared variables we obtain primal decomposition; when they are coupled through shared constraints we obtain dual decomposition.
Primal Decomposition
• Consider the following problem with a coupling variable:
minimize f1 (x1, y) + f2 (x2, y)
x,y
subject to x1 ∈ X1, x2 ∈ X2
y ∈ Y.
• y is the complicating or coupling variable: for fixed y, the problem decouples into two independent subproblems in x1 and x2.
Primal Decomposition (II)
• For fixed y, the problem separates into the two subproblems
fi⋆ (y) = min over xi ∈ Xi of fi (xi, y), i = 1, 2.
• The primal master problem is then
minimize f1⋆ (y) + f2⋆ (y)
y
subject to y ∈ Y.
Primal Decomposition (III)
Observations:
• we don’t have a closed-form expression for each function fi⋆ (y) or for its gradient or subgradient; indeed, fi⋆ may not even be differentiable
• however, fi⋆ (y) and a subgradient can be evaluated by solving subproblem i (see the subgradient lemma below).
Primal Decomposition: Solving the Master Problem
• If the original problem is convex, so is the master problem.
• To solve the master problem, we can use different methods such as
– bisection (if y is scalar)
– gradient or Newton method (if fi? differentiable)
– subgradient, cutting-plane, or ellipsoid method.
Primal Decomposition Algorithm
repeat
1. Solve the subproblems:
Find x1(k) ∈ X1 that minimizes f1 (x1, y(k)), and
a subgradient s1(k) ∈ ∂f1⋆ (y(k)).
Find x2(k) ∈ X2 that minimizes f2 (x2, y(k)), and
a subgradient s2(k) ∈ ∂f2⋆ (y(k)).
2. Update the complicating variable with a subgradient step on the primal master problem:
y(k + 1) = y(k) − αk (s1(k) + s2(k)) .
3. k = k + 1
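A minimal numerical sketch of this loop (the quadratic subproblems below are a made-up instance chosen so that every step has a closed form; in general, each subproblem is itself an optimization problem):

```python
import numpy as np

# Made-up subproblems: f_i(x_i, y) = (x_i - c_i)^2 + (y - d_i)^2 with X_i = R,
# so f_i*(y) = (y - d_i)^2 and s_i = d f_i*(y)/dy = 2 (y - d_i).
c, d = np.array([1.0, -2.0]), np.array([3.0, -1.0])

y, alpha = 0.0, 0.1
for k in range(200):
    x = c.copy()              # step 1: subproblem minimizers (closed form)
    s = 2 * (y - d)           #         (sub)gradients of f_i* at y
    y = y - alpha * s.sum()   # step 2: master update (f_i* smooth here,
                              #         so a constant stepsize suffices)
print("y ≈", y)               # -> 1.0 = argmin (y-3)^2 + (y+1)^2
```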
Primal Decomposition: Subgradients
Lemma: Let f⋆ (y) be the optimal value of the convex problem
minimize f0 (x)
x
subject to hi (x) ≤ yi, i = 1, · · · , m.
Then −λ⋆ ∈ ∂f⋆ (y), where λ⋆ is an optimal dual variable (Lagrange multiplier) associated with the constraints hi (x) ≤ yi.
Proof. By strong duality, f⋆ (y0) = min over x of { f0 (x) + λ⋆ᵀ (h (x) − y0) }, so
f0 (x) ≥ f⋆ (y0) − λ⋆ᵀ (h (x) − y0) ≥ f⋆ (y0) − λ⋆ᵀ (y − y0) ,
where the last inequality holds for any x such that h (x) ≤ y. In particular, minimizing over all such x gives
f⋆ (y) ≥ f⋆ (y0) − λ⋆ᵀ (y − y0) ,
i.e., −λ⋆ ∈ ∂f⋆ (y0). □
• The same result applies when the variable y also enters the objective, i.e., to problems of the form
minimize f0 (x, y)
x
subject to hi (x) ≤ yi, i = 1, · · · , m.
Dual Decomposition
• Consider the following problem with a coupling constraint:
minimize f1 (x1) + f2 (x2)
x
subject to x1 ∈ X1, x2 ∈ X2
h1 (x1) + h2 (x2) ≤ h0
• h1 (x1) + h2 (x2) ≤ h0 is the complicating or coupling constraint.
Dual Decomposition (III)
• Relaxing the coupling constraint with a price vector λ ⪰ 0, the Lagrangian separates and the dual function splits as
g (λ) = g1 (λ) + g2 (λ) − λᵀh0, where gi (λ) = min over xi ∈ Xi of fi (xi) + λᵀhi (xi).
• The dual master problem is then to maximize g (λ) subject to λ ⪰ 0.
Dual Decomposition (IV)
Observations:
• the dual function g (λ) is concave even if the original problem is not convex, but it is in general nondifferentiable
• evaluating gi (λ) amounts to solving subproblem i, which also yields a subgradient (see the subgradient lemma below).
Dual Decomposition: Solving the Master Problem
• The dual master problem is always convex regardless of the original
problem. However, we still need convexity to have strong duality
(under some constraint qualifications like Slater’s condition).
• To solve the master problem, we can use different methods such as
– bisection (if λ is scalar)
– gradient or Newton method (if gi differentiable)
– subgradient, cutting-plane, or ellipsoid method.
Dual Decomposition Algorithm
repeat
1. Solve the subproblems:
Find x1(k) ∈ X1 that minimizes f1 (x1) + λ(k)ᵀ h1 (x1), and
a subgradient s1(k) = h1 (x1(k)) ∈ ∂g1 (λ(k)).
Find x2(k) ∈ X2 that minimizes f2 (x2) + λ(k)ᵀ h2 (x2), and
a subgradient s2(k) = h2 (x2(k)) ∈ ∂g2 (λ(k)).
2. Update the price vector with a projected subgradient step on the dual master problem:
λ(k + 1) = [λ(k) + αk (s1(k) + s2(k) − h0)]⁺ .
3. k = k + 1
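A minimal numerical sketch of this loop (the instance below is made up; each subproblem has a closed-form minimizer, and for this instance the price converges to λ⋆ = 2 and the allocation to x = (1, 2)):

```python
import numpy as np

# Made-up instance: minimize (x1-2)^2 + (x2-3)^2  s.t.  x1 + x2 <= 3.
c, h0 = np.array([2.0, 3.0]), 3.0

lam = 0.0
for k in range(1, 500):
    # Step 1: subproblems min_{x_i} (x_i - c_i)^2 + lam*x_i => x_i = c_i - lam/2;
    # s_i = h_i(x_i) = x_i is a subgradient of g_i at lam.
    x = c - lam / 2
    # Step 2: projected subgradient ascent on the dual master problem.
    lam = max(0.0, lam + (1.0 / k) * (x.sum() - h0))
print("lambda ≈", lam, "x ≈", x)
```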
Dual Decomposition: Subgradients
Lemma: Let g (λ) be the dual function corresponding to the problem
minimize f0 (x)
x
subject to hi (x) ≤ 0, i = 1, · · · , m.
Then h (x̄) is a subgradient of the concave function g at λ, i.e., g (λ̃) ≤ g (λ) + h (x̄)ᵀ (λ̃ − λ) ∀λ̃, where x̄ is any minimizer of the Lagrangian f0 (x) + λᵀh (x).
Summary
• Subgradients generalize gradients to nondifferentiable convex functions; subgradient methods minimize such functions with simple iterations, at the cost of slower convergence.
• Primal decomposition handles coupling variables: for fixed y the problem separates into subproblems, and a master problem updates y using subgradients of the fi⋆.
• Dual decomposition handles coupling constraints: for fixed prices λ the Lagrangian separates, and a dual master problem updates λ using subgradients of the gi.