Bryan and Shibberu - Penalty Functions and Constrained Optimization
$$
\phi(t) = \begin{cases} 0 & \text{for } t < 0, \\ k t^3 & \text{for } t \ge 0 \end{cases} \qquad (1)
$$
where $k$ is some positive constant. The function $\phi$ is a penalty function: from the point of view of minimization, it penalizes any number $t$ that is greater than zero. Here's a plot of $\phi$ for $k = 100$:
[Figure: plot of $\phi(t)$ against $t$ for $k = 100$.]
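Its first and second derivatives,
$$
\phi'(t) = \begin{cases} 0 & \text{for } t < 0, \\ 3kt^2 & \text{for } t \ge 0, \end{cases}
\qquad
\phi''(t) = \begin{cases} 0 & \text{for } t < 0, \\ 6kt & \text{for } t \ge 0, \end{cases}
$$
are both continuous at $t = 0$, so $\phi$ is twice continuously differentiable.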
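A short Python sketch of $\phi$ and the two derivatives above, using $k = 100$ as in the plot (the function names are our own). It compares the closed-form derivatives with central differences as a sanity check, including at $t = 0$ where the two pieces meet; there the second-derivative estimate carries an $O(kh)$ error because $\phi''$ has a corner at $t = 0$.

```python
K = 100.0  # the value of k used in the plot above


def phi(t, k=K):
    """Penalty function (1): zero for t < 0, k*t**3 for t >= 0."""
    return k * t ** 3 if t >= 0 else 0.0


def dphi(t, k=K):
    """First derivative of phi."""
    return 3 * k * t ** 2 if t >= 0 else 0.0


def d2phi(t, k=K):
    """Second derivative of phi."""
    return 6 * k * t if t >= 0 else 0.0


# Compare the formulas with central differences, including at the joint t = 0.
step = 1e-5
errors = []
for t in (-0.5, 0.0, 0.5, 2.0):
    fd1 = (phi(t + step) - phi(t - step)) / (2 * step)
    fd2 = (dphi(t + step) - dphi(t - step)) / (2 * step)
    errors.append((t, abs(fd1 - dphi(t)), abs(fd2 - d2phi(t))))

for t, e1, e2 in errors:
    print(f"t = {t:5.2f}   |phi' error| = {e1:.1e}   |phi'' error| = {e2:.1e}")
```

Note that $\phi(2) = 100 \cdot 2^3 = 800$, the top of the plotted range.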
If you run an unconstrained algorithm such as golden section on $\tilde f(x)$ in this case (with $k = 100$), you find that the minimum is at $x = 0.9012$; the penalty approach didn't exactly solve the problem, but it came reasonably close.
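The objective for this one-variable example is not restated in this section, so the sketch below ASSUMES, purely for illustration, $f(x) = x^4$ with the constraint $x \ge 1$, written as $g(x) = 1 - x \le 0$ and penalized as $\tilde f(x) = x^4 + \phi(1 - x)$. This choice appears consistent with the minimizer quoted above: with $k = 100$, a plain golden-section search on the bracket $[0, 2]$ should land near $x = 0.9012$. The helper names are hypothetical.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/golden ratio, about 0.618


def golden_section(f, a, b, tol=1e-10):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:            # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:                  # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def penalty(t, k):
    """Penalty function (1)."""
    return k * t ** 3 if t >= 0 else 0.0


def f_tilde(x, k=100.0):
    # ASSUMED example: f(x) = x**4 subject to x >= 1, i.e. g(x) = 1 - x <= 0.
    return x ** 4 + penalty(1.0 - x, k)


x_min = golden_section(f_tilde, 0.0, 2.0)
print(f"minimizer of the penalized objective: {x_min:.4f}")
```

Since $\tilde f$ is convex here, it is unimodal on the bracket, which is what golden section needs. The penalized minimizer sits slightly on the infeasible side ($x < 1$), as the text describes.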
In fact, a reasonable procedure would be to increase the constant $k$, say by a factor of 10, and then re-run the unconstrained algorithm on $\tilde f$ using 0.9012 as the initial guess. Increasing $k$ enforces the constraint more rigorously, while using the previous final iterate as an initial guess speeds up convergence (since we expect the minimum for the larger value of $k$ to be close to the minimum for the previous value of $k$). In this case increasing $k$ to $10^4$ moves the minimum to $x = 0.989$. We could then increase $k$ again, use $x = 0.989$ as an initial guess, and continue this process until we obtain a reasonable estimate of the minimizer.
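The continuation idea can be sketched with Newton's method in place of golden section: unlike golden section, Newton's method starts from a single initial guess, which is exactly what the warm start supplies, and $\phi$ is twice continuously differentiable, so $\tilde f''$ is available. This again ASSUMES the illustrative problem $f(x) = x^4$, $x \ge 1$ (not stated in this section); the starting point $x = 0$ and the sequence $k = 100, 10^3, 10^4$ are our own choices.

```python
def newton_penalized(x, k, tol=1e-12, max_iter=50):
    """Newton's method on the ASSUMED objective x**4 + phi(1 - x), phi(t) = k*t**3 for t >= 0.

    Returns the minimizer and the number of Newton steps taken.
    """
    for it in range(1, max_iter + 1):
        t = 1.0 - x                       # constraint value g(x) = 1 - x
        grad = 4 * x ** 3 - (3 * k * t ** 2 if t > 0 else 0.0)
        hess = 12 * x ** 2 + (6 * k * t if t > 0 else 0.0)
        step = grad / hess
        x -= step
        if abs(step) < tol:
            return x, it
    return x, max_iter


x = 0.0                                   # hypothetical initial guess
minimizers = {}
for k in (1e2, 1e3, 1e4):
    x, steps = newton_penalized(x, k)     # warm start from the previous minimizer
    minimizers[k] = x
    print(f"k = {k:8.0f}: x = {x:.4f} after {steps} Newton steps")
```

The minimizers move toward the constraint boundary $x = 1$ as $k$ grows, and each later solve starts close to its answer.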
The General Case:
In general we want to minimize a function $f(x)$ of $n$ variables subject to both equality and inequality constraints of the form
$$
g_i(x) \le 0, \quad i = 1, \ldots, m, \qquad (2)
$$
$$
h_i(x) = 0, \quad i = 1, \ldots, n. \qquad (3)
$$
You should convince yourself that any equality or inequality constraints can be cast into the above forms. The set of $x$ in $n$-dimensional space that satisfies the constraints is called the feasible set, although it may be empty if the constraints are mutually contradictory.
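For instance, writing $a(x)$ for any constraint function and $b$ for a constant (symbols introduced only for this illustration), each kind of constraint rearranges into form (2) or (3); the example later in these notes turns $x + 2y \ge 6$ into $6 - x - 2y \le 0$ in exactly this way.

$$
\begin{aligned}
a(x) \ge b \quad &\Longleftrightarrow \quad g(x) = b - a(x) \le 0, \\
a(x) \le b \quad &\Longleftrightarrow \quad g(x) = a(x) - b \le 0, \\
a(x) = b \quad &\Longleftrightarrow \quad h(x) = a(x) - b = 0.
\end{aligned}
$$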
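We will call $\phi(\gamma, t)$, for $\gamma \ge 0$ and $t \in \mathbb{R}$, a penalty function if
1. $\phi$ is continuous.
2. $\phi(\gamma, t) \ge 0$ for all $\gamma$ and $t$.
3. $\phi(\gamma, t) = 0$ for $t \le 0$, and $\phi$ is strictly increasing in both $\gamma$ and $t$ for $\gamma > 0$ and $t > 0$.
It's also desirable for $\phi$ to have at least one continuous derivative in $t$, preferably two.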
A typical example of a penalty function would be
$$
\phi(\gamma, t) = \begin{cases} 0 & \text{for } t < 0, \\ \gamma t^n & \text{for } t \ge 0. \end{cases} \qquad (4)
$$
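To minimize $f(x)$ subject to constraints (2) and (3), define a modified objective function by
$$
\tilde f(x) = f(x) + \sum_{i=1}^{m} \phi(\alpha_i, g_i(x)) + \sum_{i=1}^{n} \left[ \phi(\beta_i, h_i(x)) + \phi(\beta_i, -h_i(x)) \right],
$$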
where the $\alpha_i$ and $\beta_i$ are positive constants that control how strongly the constraints are enforced. The penalty functions in the first sum modify the original objective function so that if any inequality constraint is violated, a large penalty is incurred; if all constraints are satisfied, there is no penalty. Similarly, the second sum penalizes equality constraints that are not satisfied, by penalizing both $h_i(x) < 0$ and $h_i(x) > 0$. We minimize the function $\tilde f$ with no constraints and count on the penalty terms to keep the solution near the feasible set, although for finite penalty parameters the solution typically does not lie exactly in the feasible set. After having minimized $\tilde f$ with an unconstrained method for a given set of $\alpha_i$ and $\beta_i$, we may then increase the $\alpha_i$ and $\beta_i$, use the terminal iterate as the initial guess for a new minimization, and continue this process until we obtain a sufficiently accurate minimum.
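The construction of $\tilde f$ translates directly into code. The sketch below is a hypothetical helper, `make_penalized`, built on penalty function (4); the exponent default `n = 3` is an assumption, and nothing here is tied to a particular library. The usage lines plug in the example that follows.

```python
from typing import Callable, Sequence

Point = Sequence[float]
ScalarFn = Callable[[Point], float]


def phi(gamma: float, t: float, n: int = 3) -> float:
    """Penalty function (4): zero for t < 0, gamma * t**n for t >= 0 (n ASSUMED to be 3)."""
    return gamma * t ** n if t > 0 else 0.0


def make_penalized(f: ScalarFn, gs: Sequence[ScalarFn], hs: Sequence[ScalarFn],
                   alphas: Sequence[float], betas: Sequence[float], n: int = 3) -> ScalarFn:
    """Build the unconstrained objective f~ from f, constraints g_i <= 0 and h_i = 0."""
    def f_tilde(x: Point) -> float:
        total = f(x)
        # Inequalities: only g_i(x) > 0 is penalized.
        total += sum(phi(a, g(x), n) for a, g in zip(alphas, gs))
        # Equalities: both h_i(x) > 0 and h_i(x) < 0 are penalized.
        total += sum(phi(b, h(x), n) + phi(b, -h(x), n) for b, h in zip(betas, hs))
        return total
    return f_tilde


# Usage with the example below: f = x^2 + y^2, g1 = 6 - x - 2y, h1 = 3 - x + y.
f_tilde = make_penalized(
    f=lambda p: p[0] ** 2 + p[1] ** 2,
    gs=[lambda p: 6 - p[0] - 2 * p[1]],
    hs=[lambda p: 3 - p[0] + p[1]],
    alphas=[5.0],
    betas=[5.0],
)
print(f_tilde((4.0, 1.0)))   # feasible point: no penalty, so this is just f = 17.0
print(f_tilde((0.0, 0.0)))   # infeasible point: heavily penalized
```

The returned `f_tilde` can be handed to any unconstrained minimizer; increasing `alphas` and `betas` and re-minimizing from the last iterate gives the continuation procedure described above.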
Example: Let us minimize the function $f(x, y) = x^2 + y^2$ subject to the inequality constraint $x + 2y \ge 6$ and the equality constraint $x - y = 3$. In this case the constraints can be written as
$$
g_1(x, y) \le 0, \qquad h_1(x, y) = 0,
$$
where $g_1(x, y) = 6 - x - 2y$ and $h_1(x, y) = 3 - x + y$. We use the penalty function defined in equation (4) with $\alpha_1 = 5$ and $\beta_1 = 5$ to start. The modified objective function is
$$
\tilde f(x, y) = f(x, y) + \phi(5, g_1(x, y)) + \phi(5, h_1(x, y)) + \phi(5, -h_1(x, y)).
$$
Run any standard unconstrained algorithm on this, e.g., a BFGS quasi-Newton method; the minimum occurs at $x = 3.506$ and $y = 1.022$. The equality constraint is violated ($3 - x + y = 0.516$), as is the inequality constraint ($6 - x - 2y = 0.449 > 0$). To enforce the constraints more accurately, increase the penalty parameters. It is very helpful to use the final estimate from the more modest penalty parameters as the initial guess for the larger parameters. With $\alpha_1 = \beta_1 = 50$ we obtain $x = 3.836$ and $y = 1.008$; increasing to $\alpha_1 = \beta_1 = 500$ we obtain $x = 3.947$, $y = 1.003$. The actual answer is $x = 4$, $y = 1$.
Increasing the penalty parameters does improve the accuracy of the final answer, but it also slows down the unconstrained algorithm's convergence, because $\tilde f$ will then have a very large gradient and the algorithm will spend a lot of time hunting for an accurate minimum.
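A sketch of this example with continuation. It ASSUMES the exponent $n = 3$ in (4), chosen because it matches (1); this section does not say which exponent the example uses. It also substitutes a damped Newton method (Newton steps with Armijo backtracking) for BFGS, since the $2 \times 2$ Hessian of $\tilde f$ is easy to write down. The starting point $(0, 0)$ is our choice; each later run is warm-started from the previous minimizer. Compare the printed values with those quoted in the example.

```python
import math

N = 3  # ASSUMED exponent n in penalty function (4)


def phi(gamma, t):
    return gamma * t ** N if t > 0 else 0.0


def dphi(gamma, t):
    return gamma * N * t ** (N - 1) if t > 0 else 0.0


def d2phi(gamma, t):
    return gamma * N * (N - 1) * t ** (N - 2) if t > 0 else 0.0


GRAD_G = (-1.0, -2.0)  # gradient of g1(x, y) = 6 - x - 2y
GRAD_H = (-1.0, 1.0)   # gradient of h1(x, y) = 3 - x + y


def f_tilde(p, alpha, beta):
    x, y = p
    g, h = 6 - x - 2 * y, 3 - x + y
    return x * x + y * y + phi(alpha, g) + phi(beta, h) + phi(beta, -h)


def grad_hess(p, alpha, beta):
    x, y = p
    g, h = 6 - x - 2 * y, 3 - x + y
    cg = dphi(alpha, g)
    ch = dphi(beta, h) - dphi(beta, -h)    # derivative of phi(beta, h) + phi(beta, -h)
    sg = d2phi(alpha, g)
    sh = d2phi(beta, h) + d2phi(beta, -h)
    grad = [2 * p[i] + cg * GRAD_G[i] + ch * GRAD_H[i] for i in range(2)]
    hess = [[(2.0 if i == j else 0.0) + sg * GRAD_G[i] * GRAD_G[j] + sh * GRAD_H[i] * GRAD_H[j]
             for j in range(2)] for i in range(2)]
    return grad, hess


def damped_newton(p, alpha, beta, tol=1e-9, max_iter=100):
    for _ in range(max_iter):
        grad, H = grad_hess(p, alpha, beta)
        if math.hypot(*grad) < tol:
            break
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        dx = -(H[1][1] * grad[0] - H[0][1] * grad[1]) / det
        dy = -(H[0][0] * grad[1] - H[1][0] * grad[0]) / det
        slope = grad[0] * dx + grad[1] * dy    # < 0 because H is positive definite
        f0, t = f_tilde(p, alpha, beta), 1.0
        while f_tilde((p[0] + t * dx, p[1] + t * dy), alpha, beta) > f0 + 1e-4 * t * slope:
            t *= 0.5
            if t < 1e-12:                      # no further progress at round-off level
                return p
        p = (p[0] + t * dx, p[1] + t * dy)
    return p


p = (0.0, 0.0)                                 # hypothetical starting point
history = []
for param in (5.0, 50.0, 500.0):               # alpha_1 = beta_1 = param
    p = damped_newton(p, param, param)         # warm start from the previous minimizer
    history.append(p)
    print(f"alpha_1 = beta_1 = {param:5.0f}:  x = {p[0]:.3f}, y = {p[1]:.3f}")
```

The Hessian is $2I$ plus positive semidefinite penalty terms, so every Newton direction is a descent direction; the backtracking only guards against overly long steps far from the minimizer.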
Under appropriate assumptions one can prove that as the penalty parameters are increased without bound, any convergent subsequence of solutions to the unconstrained penalized problems must converge to a solution of the original constrained problem.
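A barrier method instead minimizes
$$
\tilde f(x) = f(x) - \sum_{i=1}^{m} r_i \ln(-g_i(x)),
$$
where the $r_i > 0$. Notice that $\tilde f(x)$ is undefined if any $g_i(x) \ge 0$, so we can only evaluate $\tilde f$ in the interior of the feasible region. However, even inside the feasible region the barrier term is nonzero (and it becomes an anti-penalty, i.e. negative, where $g_i(x) < -1$).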
Suppose we start with some choice of the $r_i$ and an initial point $x_0$ in the interior of the feasible region, and minimize $\tilde f$. The terminal point $x_k$ must be feasible, because the log terms in the definition of $\tilde f$ form a barrier of infinite height that prevents the optimization routine from leaving the interior of the feasible region.
Example: Here's a 1D example with objective function $f(t) = t^4$ and constraint $t \ge 1$.
[Figure: plot for this example, with $t$ on the horizontal axis.]
In general a barrier method works much like the penalty methods above. We start with some positive $r_i$ and a feasible point $x_0$ in the interior of the feasible region, and minimize $\tilde f$ using an unconstrained algorithm. We then decrease the $r_i$ and re-optimize, using the final iterate as the initial guess for the newly decreased $r_i$, and continue until an acceptable minimum is found.
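For the 1D example ($f(t) = t^4$ with $t \ge 1$, i.e. $g(t) = 1 - t \le 0$), the barrier-modified objective is $\tilde f(t) = t^4 - r \ln(t - 1)$. The sketch below uses golden-section search, which only ever evaluates strictly inside its bracket, so starting from the bracket $(1, 2]$ it never touches the infeasible side. The values $r = 1, 0.1, 0.01$ are hypothetical (this section gives none). Setting $\tilde f'(t) = 0$ gives $4t^3(t - 1) = r$, whose left side increases for $t > 1$, so the minimizer moves toward 1 as $r$ decreases; the previous minimizer therefore serves as the new upper end of the bracket.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/golden ratio, about 0.618


def golden_section(f, a, b, tol=1e-10):
    """Golden-section search on [a, b]; f is only evaluated strictly inside (a, b)."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def barrier_objective(t, r):
    # f(t) = t**4 with g(t) = 1 - t <= 0; barrier term is -r*ln(-g(t)) = -r*ln(t - 1).
    return t ** 4 - r * math.log(t - 1)


upper = 2.0                      # hypothetical: comfortably above the minimizer for r = 1
barrier_minimizers = {}
for r in (1.0, 0.1, 0.01):       # hypothetical decreasing barrier parameters
    t_star = golden_section(lambda t, r=r: barrier_objective(t, r), 1.0, upper)
    barrier_minimizers[r] = t_star
    upper = t_star               # the next (smaller-r) minimizer lies in (1, t_star)
    print(f"r = {r:5.2f}: t = {t_star:.6f}")
```

As $r \to 0$ the minimizers approach the constrained minimum $t = 1$ from inside the feasible region.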
One point on which we need to be careful is the line search: you don't want to evaluate $\tilde f$ at any point outside the feasible set (or at least you need to deal with this gracefully).
Example: Let $f(x, y) = x^2 + y^2$. We want to minimize $f$ subject to $6 - x - 2y \le 0$. If we take $r_1 = 5$ in the definition of $\tilde f$ (so $\tilde f(x, y) = x^2 + y^2 - 5\ln(x + 2y - 6)$) and start with the feasible point $(5, 5)$, we obtain a minimum at $(1.53, 3.05)$. Decreasing $r_1$ to 0.5 gives a minimum at $(1.24, 2.48)$, and decreasing $r_1$ to 0.05 gives a minimum at $(1.204, 2.408)$ (the true minimum is at $(1.2, 2.4)$).
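A sketch of this example, with damped Newton standing in for whichever unconstrained method produced the quoted numbers (the section does not say). The objective returns infinity outside the interior, so the backtracking line search rejects infeasible trial points automatically, which is one graceful way to handle the line-search issue just mentioned. Setting the gradient of $\tilde f$ to zero gives $y = 2x$ and $10x^2 - 12x - r_1 = 0$, so the exact minimizer for each $r_1$ is $x = (12 + \sqrt{144 + 40 r_1})/20$, $y = 2x$, which serves as a check.

```python
import math


def barrier_obj(p, r):
    x, y = p
    s = x + 2 * y - 6               # s = -g1(x, y); the interior is s > 0
    if s <= 0:
        return math.inf             # never evaluate the log outside the interior
    return x * x + y * y - r * math.log(s)


def damped_newton_barrier(p, r, tol=1e-9, max_iter=100):
    for _ in range(max_iter):
        x, y = p
        s = x + 2 * y - 6
        gx, gy = 2 * x - r / s, 2 * y - 2 * r / s
        if math.hypot(gx, gy) < tol:
            break
        c = r / (s * s)             # Hessian = 2*I + c * [[1, 2], [2, 4]]
        hxx, hxy, hyy = 2 + c, 2 * c, 2 + 4 * c
        det = hxx * hyy - hxy * hxy
        dx = -(hyy * gx - hxy * gy) / det
        dy = -(hxx * gy - hxy * gx) / det
        slope = gx * dx + gy * dy
        f0, t = barrier_obj(p, r), 1.0
        # Backtracking: infeasible trial points evaluate to inf and are rejected.
        while barrier_obj((x + t * dx, y + t * dy), r) > f0 + 1e-4 * t * slope:
            t *= 0.5
            if t < 1e-12:
                return p
        p = (x + t * dx, y + t * dy)
    return p


p = (5.0, 5.0)                      # feasible starting point from the text
barrier_path = []
for r in (5.0, 0.5, 0.05):
    p = damped_newton_barrier(p, r) # warm start from the previous minimizer
    barrier_path.append((r, p))
    print(f"r1 = {r:4.2f}: ({p[0]:.3f}, {p[1]:.3f})")
```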
One issue in using a barrier method is that of finding an initial feasible point in the interior of the feasible region. In many cases such a point will be obvious from considerations specific to the problem. If not, it can be rather difficult to find such a point (or to prove that the feasible region is in fact empty, if the constraints are mutually exclusive). One idea would be to use penalty functions, but on the constraints $g_i(x) < 0$ with $f \equiv 0$. If a solution $a$ to this problem can be found with $\tilde f(a) = 0$, then $a$ is a feasible point. Because $\phi(\gamma, 0) = 0$, this alone only guarantees $g_i(a) \le 0$; to land in the interior of the feasible region defined by $g_i(x) \le 0$, one can instead penalize $g_i(x) + \delta$ for some small margin $\delta > 0$, so that $\tilde f(a) = 0$ forces $g_i(a) \le -\delta < 0$.
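A minimal sketch of that idea, under stated assumptions: $f \equiv 0$, the cubic penalty (4) with $\gamma = 1$ and $n = 3$, a hypothetical margin $\delta = 0.1$, and plain gradient descent with backtracking. Because the penalized objective only approaches zero gradually, the search stops as soon as every $g_i$ is strictly negative rather than waiting for $\tilde f = 0$ exactly. The constraint used is the one from the barrier example above, $6 - x - 2y \le 0$; the function name `find_interior_point` and the starting point $(0, 0)$ are our own.

```python
def find_interior_point(gs, grads, x, delta=0.1, gamma=1.0, max_iter=1000):
    """Phase-I search: minimize sum_i gamma*max(0, g_i(x) + delta)**3 with f = 0.

    gs are constraint functions (g_i(x) <= 0 is feasible), grads their gradients.
    Returns a point with every g_i(x) < 0, or None if none is found.
    """
    def penalty(p):
        return sum(gamma * max(0.0, g(p) + delta) ** 3 for g in gs)

    for _ in range(max_iter):
        if all(g(x) < 0 for g in gs):
            return x                      # strictly inside the feasible region
        grad = [0.0] * len(x)
        for g, dg in zip(gs, grads):
            s = g(x) + delta
            if s > 0:                     # gradient of gamma*s**3 is 3*gamma*s**2 * grad g
                for j, gj in enumerate(dg(x)):
                    grad[j] += 3 * gamma * s * s * gj
        gnorm2 = sum(v * v for v in grad)
        p0, t = penalty(x), 1.0
        while True:                       # Armijo backtracking on the penalty
            trial = [xi - t * gi for xi, gi in zip(x, grad)]
            if penalty(trial) <= p0 - 1e-4 * t * gnorm2 or t < 1e-12:
                break
            t *= 0.5
        x = trial
    return None


point = find_interior_point(
    gs=[lambda p: 6 - p[0] - 2 * p[1]],
    grads=[lambda p: (-1.0, -2.0)],
    x=[0.0, 0.0],                         # hypothetical (infeasible) starting point
)
print(point)
```

Any point returned this way can serve as the initial interior point $x_0$ for the barrier method; a `None` result after many iterations is a hint, not a proof, that the feasible region may be empty.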