Kuhn-Tucker and Gradient Methods
GRADIENT-BASED NONLINEAR
OPTIMIZATION METHODS
While linear programming and its various related methods provide powerful tools for use in
water resources systems analysis, there are many water-related problems that cannot be ade-
quately represented only in terms of a linear objective function and linear constraints. When the
objective function and/or some or all of the constraints of a problem are nonlinear, other problem
solution methods must be used. A class of such solution techniques uses information about the
problem geometry obtained from the gradient of the objective function. These solution methods
are collectively categorized here as “gradient-based” methods.
The purpose of this section is to provide a simple introduction to gradient solution methods.
Those interested in greater detail regarding the many gradient-based methods and the mathematical theory upon which they are based should refer to Wagner (1975) or MacMillan (1975), among many others.
The geometry of nonlinear problems places certain requirements on the topology of the objective
function and constraint set before the solution found by certain gradient methods can be guaran-
teed to be an optimum solution. In Section 3 of these notes, the concepts of convex and concave
sets were introduced. The notions of convex and concave functions were illustrated in Section 8,
but now we offer more formal definitions. Any point

x = x(q) = q x1 + (1 - q) x2 ...[11.2]

lies on the line segment that connects the points x1 and x2 (0 ≤ q ≤ 1). A function f(x) is convex if, for every such pair of points, f(x(q)) ≤ q f(x1) + (1 - q) f(x2); if the inequality is reversed, f(x) is concave. This notion is illustrated in Figure 11.1 for a nonlinear function of two variables.
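This definition is easy to test numerically. The short sketch below (an addition to these notes, using f(x1, x2) = x1² + x2² as an assumed sample convex function) verifies the chord inequality at several values of q:

```python
# Illustrative sketch (an addition): verify the convexity inequality
# f(x(q)) <= q*f(x1) + (1-q)*f(x2) for the sample convex function
# f(x1, x2) = x1**2 + x2**2 along the segment from x2 to x1.

def f(x):
    return x[0] ** 2 + x[1] ** 2

x1 = (0.0, 0.0)
x2 = (4.0, 2.0)

for q in [0.0, 0.25, 0.5, 0.75, 1.0]:
    # Point on the segment connecting x1 and x2, Equation [11.2]
    x = (q * x1[0] + (1 - q) * x2[0], q * x1[1] + (1 - q) * x2[1])
    chord = q * f(x1) + (1 - q) * f(x2)
    assert f(x) <= chord + 1e-12      # function lies on or below the chord
```

For a concave function the same loop would check f(x) ≥ chord instead.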
[Figure 11.1: Illustration of convexity for a function f(x) = f(x1, x2) of two variables, showing the points x1 = (x11, x12) and x2 = (x21, x22), the function values f(x1) and f(x2), and the chord q f(x1) + (1 - q) f(x2).]
The Kuhn-Tucker conditions extend the concept of Lagrange multipliers to mathematical models with active and inactive inequality constraints. They supply a set of necessary conditions for a solution to a nonlinear optimization problem to be optimal and, when suitable convexity requirements are met, sufficient conditions for that solution to be a global optimum.
Consider a problem where the objective function and constraints might be nonlinear:

Max (or Min) f(x) ...[11.4]

s.t.: gi(x) ≤ bi (i = 1, 2, ...) ...[11.5]

gi(x) ≥ bi (i = 1, 2, ...) ...[11.6]

gi(x) = bi (i = 1, 2, ...) ...[11.7]

x ≥ 0 ...[11.8]

where x is the column vector (x1, x2, ..., xn)^T.
The approach in using the Kuhn-Tucker conditions to obtain a solution to a nonlinear optimiza-
tion problem is to introduce Lagrange multipliers to transform the problem into a single objective
function with no constraints. Applying this to Expressions [11.4] through [11.7] gives:
where λ, μ, and ν are the vectors of Lagrange multipliers associated with the "less-than", "greater-than", and "equal-to" constraints, respectively.
Necessary Conditions. The necessary conditions for a solution to [11.4] through [11.7] to be a
critical or stationary point are:
∇x L = 0 ...[11.10]

∇λ L = 0 ...[11.11]

∇μ L = 0 ...[11.12]

∇ν L = 0 ...[11.13]
Sufficient Conditions: These conditions (i.e., Equations [11.10] through [11.13]) are identical to
the requirements of Lagrange multipliers for equality constraints. The Kuhn-Tucker conditions,
however, specify additional requirements for a stationary point to be a global optimum. Further, the specifications differ, depending upon whether it is a global minimum or a global maximum that is sought for the objective function.
For a minimization problem, the Kuhn-Tucker approach becomes:

s.t.:

necessary conditions:

(1) xj ≥ 0 and ∂L/∂xj ≥ 0 (j = 1, 2, ..., n) ...[11.19]

sufficient conditions:
gm(x) = bm (m = 1, 2, ...) ...[11.26]

s.t.:

necessary conditions:

(1) xj ≥ 0 and ∂L/∂xj ≤ 0 (j = 1, 2, ..., n) ...[11.28]

sufficient conditions:
Note that neither the Kuhn-Tucker conditions nor the Lagrangian approach requires xi ≥ 0 (unlike the LP simplex method).
11.1.3 Example of the Kuhn-Tucker Method
Min f = 2 x1 + x1 x2 + 3 x2 ...[11.32]

s.t.: x1² + x2 ≥ 3 ...[11.33]

or, rewriting the constraint,

g = x1² + x2 - 3 ≥ 0 ...[11.34]

The Lagrangian is:

Min h = 2 x1 + x1 x2 + 3 x2 - λ (x1² + x2 - 3) ...[11.35]

and the Kuhn-Tucker conditions are:

(1) ∂h/∂x1 = 2 + x2 - 2 λ x1 = 0 ...[11.36]

(2) ∂h/∂x2 = x1 + 3 - λ = 0 ...[11.37]

(3) x1² + x2 - 3 ≥ 0 ...[11.38]

(4) λ (x1² + x2 - 3) = 0 ...[11.39]

(5) λ ≥ 0 ...[11.40]
The solution approach for resolving Expressions [11.36] through [11.40] is to assume the constraint is non-binding (i.e., assume λ = 0) and then obtain a solution for x1, x2, and g. This solution is checked to verify that the other Kuhn-Tucker conditions are met. If so, the solution is optimal. If not, the assumption must have been incorrect, and the constraint must be binding. In this case, a different solution for x1, x2, and λ can be found.

Setting λ = 0 and solving [11.36] and [11.37] yields: x1 = -3, x2 = -2, and g = 4 ≥ 0. Since g ≥ 0 and the remaining conditions [11.39] and [11.40] hold with λ = 0, the non-binding assumption was correct and this solution is optimal.
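This hand solution can be verified numerically; the sketch below (an addition to these notes) evaluates conditions [11.36] through [11.40] at the candidate point:

```python
# Numerical check of the example (an illustrative addition): evaluate
# conditions [11.36] through [11.40] at x1 = -3, x2 = -2, lambda = 0.

def kt_conditions(x1, x2, lam):
    """Return the values of the Kuhn-Tucker expressions [11.36]-[11.40]."""
    c1 = 2 + x2 - 2 * lam * x1        # [11.36], must equal 0
    c2 = x1 + 3 - lam                 # [11.37], must equal 0
    g = x1 ** 2 + x2 - 3              # [11.38], must be >= 0
    comp = lam * g                    # [11.39], must equal 0
    return c1, c2, g, comp, lam       # lam itself must be >= 0, [11.40]

c1, c2, g, comp, lam = kt_conditions(-3.0, -2.0, 0.0)
assert c1 == 0 and c2 == 0            # stationarity
assert g >= 0 and comp == 0 and lam >= 0
print("g =", g)                       # g = 4.0 (constraint non-binding)
```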
For a maximization or minimization problem subject to a single, equality constraint, the Lagran-
gian solution seeks a point where the slope of the objective function, f, is equal to the slope of
the constraint, g. As illustrated in Figure 11.2, this occurs at point A.
[Figure 11.2: Contours of the objective function, f = k1, k2, and k3, and the constraint, g; the Lagrangian solution occurs at the tangency point, A.]
The same is true of the Kuhn-Tucker inequalities, but instead of a one-point solution space, there
are an infinite number of possible solutions, as illustrated in Figure 11.4. However, the cone of
feasible vectors has the same meaning as in the Lagrangian condition for equality constraints.
[Figure 11.3: The gradient ∇f of the objective at a point A on a contour f = constant, together with the constraint gradients ∇g1 and ∇g2 and the angle q between them.]

[Figure 11.4: The same construction as Figure 11.3, with the feasible region bounded by constraints g1 and g2 shown.]
11.2 GRADIENT-BASED OPTIMIZATION
11.2.1 Overview
A number of gradient-based methods are available for solving constrained and unconstrained
nonlinear optimization problems. A common characteristic of all of these methods is that they
employ a numerical technique to calculate a direction in n-space in which to search for a better
estimate of the optimum solution to a nonlinear problem. This search direction relies on the
estimation of the value of the gradient of the objective function at a given point.
Some other techniques, such as dynamic programming (DP), do not yield the desired information about the problem geometry.
Gradient-based methods have the advantages that they are applicable to a broader class of prob-
lems than LP and they provide much more information about the problem geometry. They have
the disadvantages that they are computationally complex and costly, and they are much more
mathematically sophisticated and difficult than LP.
• steepest ascent
• conjugate gradient
• reduced gradient
Each of these methods can be found in commercially available mathematical programming soft-
ware. The method of steepest ascent is mathematically simple and easy to program, but con-
verges to an optimal solution only slowly. On the other end of the spectrum is the reduced
gradient method, which has a high rate of convergence, but is much more mathematically com-
plex and difficult to program.
Before presenting an example of the method of steepest feasible ascent, we must develop some
terminology. Define the gradient of a function, f(x), to be the vector of first partial derivatives of
the function:
∇f = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn)^T ...[11.41]
The value of ∇f at a point Po [where Po is an n-tuple equal to (x1o, x2o, ..., xno)] is a vector, VPo, with entries equal to

VPo = ∇f|Po = (∂f/∂x1|Po, ∂f/∂x2|Po, ..., ∂f/∂xn|Po)^T ...[11.42]
The vector, VPo, gives the direction in n-space of the steepest ascent from the point Po, i.e., the
direction in which the rate of increase in the objective function, f, is maximum.
The corresponding unit vector is:

vPo = VPo / ||VPo|| ...[11.43]

Consider the following example problem:

Max f = 7 x1 + 4 x2 + x1 x2 - x1² - x2² ...[11.44]

s.t.: (2/3) x1 + x2 ≤ 8 ...[11.45]
-(5/12) x1 + x2 ≤ 2 ...[11.46]

x2 ≤ 4 ...[11.47]

x1, x2 ≥ 0 ...[11.48]
The constraint set for this problem is illustrated in Figure 11.5. Note that the solution to the
unconstrained problem is x = (x1,x2) = (6,5).
The method of steepest feasible ascent will be used to solve the constrained problem. This
method is an iterative procedure whereby, with each iteration, information is garnered at a par-
ticular point in solution space to determine the direction and distance to go to find another point
which is feasible and at which an improvement in the objective function is found. This continues
until no further improvement in the objective function is possible.
The gradient of the objective function is:

∇f = (7 + x2 - 2 x1, 4 + x1 - 2 x2)^T ...[11.49]
Assume a starting point of Po = (8,2) (refer to Figure 11.5), and calculate the value of the gradi-
ent at this point:
VPo = ∇f|Po = (7 + 2 - 2(8), 4 + 8 - 2(2))^T = (-7, 8)^T ...[11.50]

vPo = VPo / ||VPo|| = (-0.659, 0.753)^T ...[11.51]
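These values are easy to reproduce numerically; the following sketch (an addition, coding the gradient of Expression [11.49]) checks [11.50] and [11.51]:

```python
import math

# Check of [11.50] and [11.51] (an addition), using the gradient of
# Expression [11.49].

def grad_f(x1, x2):
    return (7 + x2 - 2 * x1, 4 + x1 - 2 * x2)

V = grad_f(8.0, 2.0)                  # V_Po, Expression [11.50]
norm = math.hypot(V[0], V[1])
v = (V[0] / norm, V[1] / norm)        # unit vector v_Po, Expression [11.51]

print(V)                              # (-7.0, 8.0)
print(round(v[0], 3), round(v[1], 3))  # -0.659 0.753
```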
Note, as shown in Figure 11.5, moving from the point Po in the direction of vPo will violate a
constraint before the unconstrained optimum is reached.
If we assume the constrained optimum lies at a corner point of the constraint set, all we have to
do is project from Po along vPo to the nearest constraint. Thence we can move along that con-
straint to a corner point, and then to another corner, etc., until an optimum is found. If the opti-
mum is not at a corner point, then there will be an iteration in our search when the optimum
search direction (as projected along a constraint) will reverse itself. If this happens, we will
know that we have overshot the optimum, and that we must search for it between the most recent
two corner points we have visited.
[Figure 11.5: The constraint set of Expressions [11.45] through [11.48], showing the unconstrained solution (6,5), the starting point Po = (8,2), the direction vPo, the gradient projection GP1, and the search points P1, P2, and P3.]
The distance from the current point to a constraint, measured along a given direction, is:

αi = [ ci - (ai x1o + bi x2o) ] / ( ai v1 + bi v2 ) ...[11.52]

where:

αi is the distance from the current point, Po = (x1o, x2o), to constraint i, in the direction of vPo = (v1, v2)

ai and bi are the coefficients on x1 and x2 in constraint i, and ci is that constraint's right-hand side

Table 11.1 summarizes the values of αi for the five constraints of the problem.
To determine the next point in our search, we select that distance in Table 11.1 which is smallest, but non-negative. Clearly, in the direction of vPo, the first constraint is the closest, and to get to that constraint we must move a distance of approximately 2.1 units. The new point, P1, can be found from:

P1 = Po + α1 vPo = (8, 2) + 2.13 (-0.659, 0.753) = (6.6, 3.6) ...[11.53]
From P1, we wish to move along constraint 1 (i.e., the currently binding constraint) in the direction of a vector, G1, that results from the projection of the gradient (calculated at P1) onto the constraint. The direction of the "gradient projection" is the direction of "steepest feasible ascent". The projection is computed from:

K = (r bk - s ak) / (ak² + bk²) ...[11.54]

and

G = K (bk, -ak)^T ...[11.55]

where (r, s)^T = ∇f|P1, and ak and bk are the coefficients on x1 and x2, respectively, of the constraint along which G is to be projected. In this case:

(r, s)^T = ∇f|P1 = (-2.597, 3.397)^T ...[11.56]
K = [(-2.597)(1) - (3.397)(2/3)] / [(2/3)² + (1)²] = -3.366 ...[11.57]

G1 = -3.366 (1, -2/3)^T = (-3.366, 2.244)^T ...[11.58]
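The projection arithmetic of [11.57] and [11.58] can be checked with a short sketch (an addition to these notes):

```python
# Check of the gradient-projection arithmetic (an addition): K and G
# per Expressions [11.54] and [11.55], for constraint 1 (a = 2/3, b = 1).

def project_gradient(r, s, a, b):
    """Project the gradient (r, s) onto the line a*x1 + b*x2 = c."""
    K = (r * b - s * a) / (a * a + b * b)     # [11.54]
    return (K * b, K * -a)                    # G = K*(b, -a), [11.55]

r, s = -2.597, 3.397                  # gradient at P1, Expression [11.56]
G1 = project_gradient(r, s, 2/3, 1.0)
print(round(G1[0], 3), round(G1[1], 3))       # -3.366 2.244
```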
This establishes a new search direction from point P1 in which to travel to find a corner point
with a (hopefully) better objective function value. A unit vector in this direction is:
vP1 = G1 / ||G1|| = (-0.832, 0.555)^T ...[11.59]
We can now reproduce the calculations shown in Table 11.1, except that we are projecting from point P1 along a vector specified by vP1. The nearest constraint in that direction will be constraint 3, at a distance of α3 = 0.71. A new search point, P2, can now be established from:

P2 = P1 + α3 vP1 = (6.0, 4.0) ...[11.60]
A similar set of calculations will yield a point P3 (at the intersection of constraints 2 and 3).
However, evaluation of the gradient projection at point P3 indicates that the optimum solution
lies in a direction back toward point P2. This means that the optimum is not at a corner point, but
somewhere between points P2 and P3. We must invoke another search technique to look
between these points for the solution.
Any number of search methods could be used to evaluate the interval between P2 and P3 to find
the constrained optimum (e.g., bisection search, Golden Section search, etc.). One very efficient
method for this particular problem would be to parameterize the objective function in terms of
the position of a point along a line segment connecting P2 and P3, and then maximize the parameterized equation. It can be done as follows. Let P2 = (y1, y2) and P3 = (z1, z2), and define:

x1 = y1 + t (z1 - y1) ...[11.61]

and

x2 = y2 + t (z2 - y2) ...[11.62]
We can specify the coordinates of any arbitrary point between P2 and P3 simply by picking the
appropriate value of t, where 0 ≤ t ≤ 1. In particular, we can substitute Equations [11.61] and
[11.62] into the objective function for x1 and x2, turning the objective into a function of a single
variable, t. The maximum of the objective function, f, can now be found simply by differentiat-
ing it with respect to t, setting the result equal to zero, and solving for t.
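The parameterization just described can be sketched in code (an addition; the objective used here is inferred from its gradient, Expression [11.49], and the coordinates of P2 and P3 follow from the constraint intersections above):

```python
# Sketch of the parameterization (an addition). The objective below is
# inferred from its gradient, Expression [11.49]; P2 and P3 come from
# the search above.

def f(x1, x2):
    return 7 * x1 + 4 * x2 + x1 * x2 - x1 ** 2 - x2 ** 2

y = (6.0, 4.0)                        # P2
z = (4.8, 4.0)                        # P3, intersection of constraints 2 and 3

def x_of_t(t):
    """Point on the segment P2-P3 for 0 <= t <= 1, Equations [11.61]-[11.62]."""
    return (y[0] + t * (z[0] - y[0]), y[1] + t * (z[1] - y[1]))

# A fine scan over t stands in for setting df/dt = 0 analytically.
best_t = max((i / 1000 for i in range(1001)), key=lambda t: f(*x_of_t(t)))
print(x_of_t(best_t))                 # approximately (5.5, 4.0)
```

Differentiating f(x(t)) with respect to t and setting the result to zero gives the same interior optimum in closed form.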
11.3 PROBLEMS
1. Complete the solution to the constrained nonlinear problem of Section 11.2 by:
b. solving the parameterization problem at the end of the section (i.e., finding the opti-
mal solution which lies somewhere between P2 and P3)
2. A narrow, confined aquifer is penetrated by three wells (refer to Figure 11.6, below).
Assume the aquifer is isotropic and homogeneous, that the land surface is horizontal, and
that datum is at the land surface. Within the aquifer, the governing equation for porous
media flow is:
∂²h/∂x² + ∂²h/∂y² = Q̃ ...[11.63]

where Q̃ = Q/10,000, and Q is the pumping rate from a well located at point (x,y). Units of Q̃ are inverse length.
If the combined output of the three wells must be at least 2 gpm, determine the optimal
pumping rates for each of the wells in order to minimize total cost of pumping.
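In practice, Equation [11.63] is solved on a grid; a minimal finite-difference sketch (an addition, assuming a uniform grid and the standard five-point stencil; grid dimensions and boundary handling here are placeholder assumptions) is:

```python
# Finite-difference sketch for Equation [11.63] (an addition; grid size,
# spacing, and boundary handling here are placeholder assumptions).

def residual(h, i, j, dx, q_tilde):
    """Five-point discrete Laplacian of head h at node (i, j), minus Q~."""
    lap = (h[i + 1][j] + h[i - 1][j] + h[i][j + 1] + h[i][j - 1]
           - 4 * h[i][j]) / dx ** 2
    return lap - q_tilde

# A linear head field has zero Laplacian, so with q_tilde = 0 the
# residual at an interior node vanishes:
h = [[x + 2 * y for y in range(5)] for x in range(5)]
print(residual(h, 2, 2, 1.0, 0.0))    # 0.0
```

Setting q_tilde nonzero at the well nodes couples the head field to the pumping rates, which is what the optimization must manipulate.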
[Figure 11.6: Plan view of the aquifer (all distances in hundreds of feet; the grid runs from 0 to 7 in x and y), with wells located at (400,400), (500,200), and (500,400). Boundary conditions: ∂h/∂y = 0 along the top and bottom boundaries, ∂h/∂x = 0 along one side, and h = 0 ft along the remaining side.]