
11. GRADIENT-BASED NONLINEAR OPTIMIZATION METHODS

While linear programming and its various related methods provide powerful tools for use in
water resources systems analysis, there are many water-related problems that cannot be
adequately represented in terms of a linear objective function and linear constraints alone.
When the objective function and/or some or all of the constraints of a problem are nonlinear,
other solution methods must be used. One class of such solution techniques uses information
about the problem geometry obtained from the gradient of the objective function. These
solution methods are collectively categorized here as "gradient-based" methods.

The purpose of this section is to provide a simple introduction to gradient solution methods.
Those interested in greater detail regarding the many gradient-based methods and the
mathematical theory upon which they are based should refer to Wagner (1975) and MacMillan
(1975), among many others.

11.1 INTRODUCTION TO NONLINEAR PROBLEMS

11.1.1 Convex and Concave Functions

The geometry of nonlinear problems places certain requirements on the topology of the objective
function and constraint set before the solution found by certain gradient methods can be guaran-
teed to be an optimum solution. In Section 3 of these notes, the concepts of convex and concave
sets were introduced. The notions of convex and concave functions were illustrated in Section 8,
but now we offer more formal definitions.

A function is convex if it satisfies the following inequality:

f(x) ≤ q f(x1) + (1 - q) f(x2) ...[11.1]

where the point:

x = x(q) = q x1 + (1 - q) x2 ...[11.2]

lies on a line segment that connects the points x1 and x2 (0 ≤ q ≤ 1). This notion is illustrated in
Figure 11.1 for a nonlinear function of two variables.

A function, f(x), is said to be concave if:

f(x) = f(q x1 + (1 - q) x2) ≥ q f(x1) + (1 - q) f(x2) ...[11.3]

where q and the n-tuple x are as defined above.

[Figure 11.1: A Convex Function of Two Variables. The surface f(x) = f(x1,x2) is shown with the chord q f(x1) + (1 - q) f(x2) drawn between the points x1 = (x11,x12) and x2 = (x21,x22); the chord lies on or above the surface.]
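As a quick numerical illustration (a sketch added here, assuming Python with NumPy, and using f(x) = x1² + x2², a known convex function), the following code samples points x(q) per Equation [11.2] and checks that inequality [11.1] holds along the segment:

import numpy as np

def is_convex_along_segment(f, x1, x2, n_samples=100):
    """Check the convexity inequality [11.1] along the segment from x1 to x2:
    f(q*x1 + (1-q)*x2) <= q*f(x1) + (1-q)*f(x2) for 0 <= q <= 1."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    for q in np.linspace(0.0, 1.0, n_samples):
        x = q * x1 + (1.0 - q) * x2             # point on the segment, Eq. [11.2]
        chord = q * f(x1) + (1.0 - q) * f(x2)   # right-hand side of Eq. [11.1]
        if f(x) > chord + 1e-12:                # tolerance for round-off
            return False
    return True

f = lambda x: x[0]**2 + x[1]**2
print(is_convex_along_segment(f, [0.0, 0.0], [3.0, 4.0]))   # True

Note that this tests only one segment; convexity proper requires the inequality to hold for every pair of points in the domain.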

11.1.2 Kuhn-Tucker Conditions

The Kuhn-Tucker conditions extend the concept of Lagrange multipliers to mathematical models
with active and inactive inequality constraints. They supply a set of conditions that are
necessary for a solution to be a stationary point and, when the convexity requirements given
below are met, sufficient for it to be a global optimum.

Consider a problem where the objective function and constraints might be nonlinear:

Max/min Z = f(x) ...[11.4]

s.t.: gi(x) ≤ bi (i = 1, 2, ...) ...[11.5]

gk(x) ≥ bk (k = 1, 2, ...) ...[11.6]

gm(x) = bm (m = 1, 2, ...) ...[11.7]

x≥0 ...[11.8]

where x = (x1, x2, ..., xn) is the vector of decision variables.

The approach in using the Kuhn-Tucker conditions to obtain a solution to a nonlinear optimiza-
tion problem is to introduce Lagrange multipliers to transform the problem into a single objective
function with no constraints. Applying this to Expressions [11.4] through [11.7] gives:

L(x, λ, μ, ν) = f(x) + Σi λi (bi - gi(x)) + Σk μk (bk - gk(x)) + Σm νm (bm - gm(x))    ...[11.9]

where λ, μ, and ν are vectors of Lagrange multipliers associated with the "less-
than", "greater-than", and "equal-to" constraints, respectively.

Necessary Conditions. The necessary conditions for a solution to [11.4] through [11.7] to be a
critical or stationary point are:

∇x L = 0 ...[11.10]

∇λ L = 0 ...[11.11]

∇μ L = 0 ...[11.12]

∇ν L = 0 ...[11.13]

Sufficient Conditions: These conditions (i.e., Equations [11.10] through [11.13]) are identical to
the requirements of Lagrange multipliers for equality constraints. The Kuhn-Tucker conditions,
however, specify additional requirements for a stationary point to be a global optimum. Further,
the specifications differ, depending upon whether it is a global minimum or a global maximum
that is sought for the objective function.

For a minimization problem, such as:

Min Z = f(x) ...[11.14]

s.t.: gi(x) ≤ bi (i = 1, 2, ...) ...[11.15]

gk(x) ≥ bk (k = 1, 2, ...) ...[11.16]

gm(x) = bm (m = 1, 2, ...) ...[11.17]

the Kuhn-Tucker approach becomes:

Min L(x, λ, μ, ν) = f(x) + Σi λi (bi - gi(x)) + Σk μk (bk - gk(x)) + Σm νm (bm - gm(x))    ...[11.18]

s.t.:

necessary conditions:

(1) xj ≥ 0 and ∂L/∂xj ≥ 0    (j = 1, 2, ..., n) ...[11.19]

(2a) If λi = 0, then bi - gi(x) ≥ 0 (inactive constraint) ...[11.20a]

(2b) Else if bi - gi(x) = 0, then λi ≤ 0 (active constraint) ...[11.20b]

(3a) If μk = 0, then bk - gk(x) ≤ 0 (inactive constraint) ...[11.21a]

(3b) Else if bk - gk(x) = 0, then μk ≥ 0 (active constraint) ...[11.21b]

(4) νm is unrestricted in sign and bm - gm(x) = 0 ...[11.22]

sufficient conditions:

(5) f(x) is a convex function

(6a) gi(x) is a convex function

(6b) gk(x) is a concave function

(6c) gm(x) is a linear function

For a maximization problem, such as:

Max Z = f(x) ...[11.23]

s.t.: gi(x) ≤ bi (i = 1, 2, ...) ...[11.24]

gk(x) ≥ bk (k = 1, 2, ...) ...[11.25]

gm(x) = bm (m = 1, 2, ...) ...[11.26]

the Kuhn-Tucker formulation and conditions become:

Max L(x, λ, μ, ν) = f(x) + Σi λi (bi - gi(x)) + Σk μk (bk - gk(x)) + Σm νm (bm - gm(x))    ...[11.27]

s.t.:

necessary conditions:

(1) xj ≥ 0 and ∂L/∂xj ≤ 0    (j = 1, 2, ..., n) ...[11.28]

(2a) If λi = 0, then bi - gi(x) ≥ 0 (inactive constraint) ...[11.29a]

(2b) Else if bi - gi(x) = 0, then λi ≥ 0 (active constraint) ...[11.29b]

(3a) If μk = 0, then bk - gk(x) ≤ 0 (inactive constraint) ...[11.30a]

(3b) Else if bk - gk(x) = 0, then μk ≤ 0 (active constraint) ...[11.30b]

(4) νm is unrestricted in sign and bm - gm(x) = 0 ...[11.31]

sufficient conditions:

(5) f(x) is a concave function

(6a) gi(x) is a convex function

(6b) gk(x) is a concave function

(6c) gm(x) is a linear function

Note that, unlike the LP simplex method, neither the Kuhn-Tucker conditions nor the Lagrangian
approach inherently requires xi ≥ 0.
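As a minimal illustration of how the multiplier conditions can be checked in practice, the sketch below (Python; constraint slacks and multiplier values are assumed to be supplied by the analyst, and condition (1) and the equality conditions are omitted for brevity) tests feasibility, complementary slackness, and multiplier signs for the minimization case, per Expressions [11.20] and [11.21]:

def check_kkt_min(lams, lt_slacks, mus, gt_slacks, tol=1e-8):
    """Check multiplier signs and complementary slackness for a minimization
    problem, per conditions (2a)-(3b) above.
    lt_slacks[i] = b_i - g_i(x) for "less-than" constraints;
    gt_slacks[k] = b_k - g_k(x) for "greater-than" constraints."""
    for lam, s in zip(lams, lt_slacks):
        if s < -tol:                        # less-than constraint violated
            return False
        if s > tol and abs(lam) > tol:      # (2a) inactive => lam must be 0
            return False
        if abs(s) <= tol and lam > tol:     # (2b) active => lam <= 0
            return False
    for mu, s in zip(mus, gt_slacks):
        if s > tol:                         # greater-than constraint violated
            return False
        if s < -tol and abs(mu) > tol:      # (3a) inactive => mu must be 0
            return False
        if abs(s) <= tol and mu < -tol:     # (3b) active => mu >= 0
            return False
    return True

# For the example of Section 11.1.3 below: one greater-than constraint
# with slack b - g = -4 (inactive) and multiplier 0.
print(check_kkt_min([], [], [0.0], [-4.0]))   # True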

11.1.3 Example of the Kuhn-Tucker Method

Solve the following problem by applying the Kuhn-Tucker conditions:

Min f = 2 x1 + x1 x2 + 3 x2 ...[11.32]

s.t.: x1² + x2 ≥ 3 ...[11.33]

Solution: First, re-write the nonlinear constraint as:

g = x1² + x2 - 3 ≥ 0 ...[11.34]

and formulate the Kuhn-Tucker problem and conditions:

Min h = 2 x1 + x1 x2 + 3 x2 - λ (x1² + x2 - 3) ...[11.35]

(1) ∂h/∂x1 = 2 + x2 - 2 λ x1 = 0 ...[11.36]

(2) ∂h/∂x2 = x1 + 3 - λ = 0 ...[11.37]

(3) x1² + x2 - 3 ≥ 0 (primal feasibility) ...[11.38]

(4) λ (x1² + x2 - 3) = 0 ...[11.39]

(5) λ ≥ 0 ...[11.40]

The solution approach for resolving Expressions [11.36] through [11.40] is to assume the con-
straint is non-binding (i.e., assume λ = 0) and then obtain a solution for x1, x2, and g. This solu-
tion is checked to verify that the other Kuhn-Tucker conditions are met. If so, the solution is
optimal. If not, the assumption must have been incorrect, and the constraint must be binding. In
that case, a different solution for x1, x2, and λ must be found.

Setting λ = 0 and solving [11.36] and [11.37] yields x1 = -3 and x2 = -2, with
g = 9 - 2 - 3 = 4 ≥ 0. Conditions [11.38] through [11.40] are all satisfied, so the constraint is
indeed non-binding and the solution is (x1, x2) = (-3, -2).
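The same result can be reproduced symbolically. The following sketch (assuming Python with SymPy) forms the Lagrangian [11.35], sets λ = 0, solves the stationarity conditions [11.36] and [11.37], and confirms that the constraint slack is non-negative:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam')

f = 2*x1 + x1*x2 + 3*x2            # objective, Eq. [11.32]
g = x1**2 + x2 - 3                 # constraint written as g >= 0, Eq. [11.34]
h = f - lam*g                      # Lagrangian, Eq. [11.35]

# Assume the constraint is non-binding (lam = 0) and solve the
# stationarity conditions [11.36] and [11.37] for x1 and x2.
sol = sp.solve([sp.diff(h, x1).subs(lam, 0),
                sp.diff(h, x2).subs(lam, 0)], [x1, x2])
print(sol)                         # {x1: -3, x2: -2}

# Verify the remaining Kuhn-Tucker conditions [11.38]-[11.40]:
slack = g.subs({x1: sol[x1], x2: sol[x2]})
print(slack)                       # 4, which is >= 0, so lam = 0 was valid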

11.1.4 Geometric Interpretation of the Kuhn-Tucker Conditions

For a maximization or minimization problem subject to a single, equality constraint, the Lagran-
gian solution seeks a point where the slope of the objective function, f, is equal to the slope of
the constraint, g. As illustrated in Figure 11.2, this occurs at a point, A.

[Figure 11.2: Lagrangian Solution to an Optimization Problem with an Equality Constraint. Contours of the objective (f = k1, k2, k3) are shown with the constraint g tangent to a contour at point A.]

For a maximization or minimization problem subject to multiple equality constraints, the
Lagrangian seeks a point where the gradient of the objective function, ∇f, lies within the
"feasible cone" defined by the normals (gradients) to the equality constraints, i.e., the ∇g's, as
shown in Figure 11.3. That is, if ∇f lies within the cone, θ, then the Lagrangian will find point
A as the optimal solution.

The same is true of the Kuhn-Tucker inequalities, but instead of a one-point solution space, there
are an infinite number of possible solutions, as illustrated in Figure 11.4. However, the cone of
feasible vectors has the same meaning as in the Lagrangian condition for equality constraints.

[Figure 11.3: Lagrangian Solution of a Problem with Multiple Nonlinear Equality Constraints. At point A on a contour f = constant, ∇f lies within the cone θ spanned by ∇g1 and ∇g2, the normals to the constraints g1 and g2.]

[Figure 11.4: Kuhn-Tucker Solution of a Problem with Multiple Nonlinear Inequality Constraints. The same cone construction applies at point A, but the inequality constraints g1 and g2 now bound a feasible region rather than a single point.]
11.2 GRADIENT-BASED OPTIMIZATION

11.2.1 Overview

A number of gradient-based methods are available for solving constrained and unconstrained
nonlinear optimization problems. A common characteristic of all of these methods is that they
employ a numerical technique to calculate a direction in n-space in which to search for a better
estimate of the optimum solution to a nonlinear problem. This search direction relies on the
estimation of the value of the gradient of the objective function at a given point.

Gradient-based methods are used to solve nonlinear constrained or unconstrained problems
where other techniques:

• are not feasible (e.g., LP)

• do not yield desired information about the problem geometry (e.g., DP)

Gradient-based methods have the advantages that they are applicable to a broader class of prob-
lems than LP and they provide much more information about the problem geometry. They have
the disadvantages that they are computationally complex and costly, and they are much more
mathematically sophisticated and difficult than LP.

11.2.2 Common Gradient-Based Methods

The most commonly used gradient techniques are:

• steepest ascent (or, for constrained problems, steepest feasible ascent)

• conjugate gradient

• reduced gradient

Each of these methods can be found in commercially available mathematical programming soft-
ware. The method of steepest ascent is mathematically simple and easy to program, but con-
verges to an optimal solution only slowly. On the other end of the spectrum is the reduced
gradient method, which has a high rate of convergence, but is much more mathematically com-
plex and difficult to program.

11.2.3 An Example of the Method of Steepest Ascent

Before presenting an example of the method of steepest feasible ascent, we must develop some
terminology. Define the gradient of a function, f(x), to be the vector of first partial derivatives of
the function:

∇f = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn)    ...[11.41]

The value of ∇f at a point Po [where Po is an n-tuple equal to (x1o, x2o, ..., xno)] is a vector,
VPo, with entries equal to:

VPo = ∇f|Po = (∂f/∂x1|Po, ∂f/∂x2|Po, ..., ∂f/∂xn|Po)    ...[11.42]

The vector, VPo, gives the direction in n-space of the steepest ascent from the point Po, i.e., the
direction in which the rate of increase in the objective function, f, is maximum.

Define a unit vector in the direction of VPo to be vPo, calculated as:

vPo = VPo / ||VPo||    ...[11.43]

where ||VPo|| is the norm of the gradient vector.
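In code, the gradient and its unit vector are easy to approximate. The sketch below (assuming Python with NumPy; the central-difference step h = 1e-6 is a typical but arbitrary choice) implements [11.42] and [11.43] numerically:

import numpy as np

def gradient(f, p, h=1e-6):
    """Central-difference approximation of V_Po = grad f evaluated at
    point p, per Eq. [11.42]."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for j in range(p.size):
        e = np.zeros_like(p)
        e[j] = h
        g[j] = (f(p + e) - f(p - e)) / (2.0 * h)
    return g

def unit_vector(v):
    """Unit vector v_Po = V_Po / ||V_Po||, per Eq. [11.43]."""
    return v / np.linalg.norm(v)

# Demo with the objective of Eq. [11.44] below:
f = lambda x: 7*x[0] + 4*x[1] + x[0]*x[1] - x[0]**2 - x[1]**2
print(gradient(f, [8.0, 2.0]))                # ~[-7.  8.], cf. Eq. [11.50]
print(unit_vector(gradient(f, [8.0, 2.0])))   # ~[-0.659  0.753], cf. Eq. [11.51]

An analytic gradient, when available (as in Eq. [11.49] below), is both cheaper and more accurate than this finite-difference approximation.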

Consider the following optimization problem:

Max f(x1,x2) = 7 x1 + 4 x2 + x1 x2 - x1² - x2² ...[11.44]

s.t.: (2/3) x1 + x2 ≤ 8 ...[11.45]

-(5/12) x1 + x2 ≤ 2 ...[11.46]

x2 ≤ 4 ...[11.47]

x1, x2 ≥ 0 ...[11.48]

The constraint set for this problem is illustrated in Figure 11.5. Note that the solution to the
unconstrained problem is x = (x1,x2) = (6,5).

The method of steepest feasible ascent will be used to solve the constrained problem. This
method is an iterative procedure whereby, with each iteration, information is garnered at a par-
ticular point in solution space to determine the direction and distance to go to find another point
which is feasible and at which an improvement in the objective function is found. This continues
until no further improvement in the objective function is possible.

To begin, evaluate an expression for the gradient:

∇f = (7 + x2 - 2 x1, 4 + x1 - 2 x2)    ...[11.49]

Assume a starting point of Po = (8,2) (refer to Figure 11.5), and calculate the value of the gradi-
ent at this point:

VPo = ∇f|Po = (7 + 2 - 2(8), 4 + 8 - 2(2)) = (-7, 8)    ...[11.50]

and calculate a unit vector pointing in this direction:

vPo = VPo / ||VPo|| = (-7, 8) / √113 = (-0.659, 0.753)    ...[11.51]

Note, as shown in Figure 11.5, moving from the point Po in the direction of vPo will violate a
constraint before the unconstrained optimum is reached.

If we assume the constrained optimum lies at a corner point of the constraint set, all we have to
do is project from Po along vPo to the nearest constraint. Thence we can move along that con-
straint to a corner point, and then to another corner, etc., until an optimum is found. If the opti-
mum is not at a corner point, then there will be an iteration in our search when the optimum
search direction (as projected along a constraint) will reverse itself. If this happens, we will
know that we have overshot the optimum, and that we must search for it between the most recent
two corner points we have visited.

[Figure 11.5: Steepest Ascent Method Search. The feasible set is bounded by (2/3) x1 + x2 ≤ 8, -(5/12) x1 + x2 ≤ 2, x2 ≤ 4, and x1, x2 ≥ 0; the unconstrained solution (6,5) lies outside it. The starting point Po = (8,2), the direction vPo, the gradient projection GP1, and the search points P1, P2, and P3 are marked.]


We can determine how far to move by calculating the distance from our current point to the
nearest constraint in the direction of vPo. To do this, we must compute the following for each
constraint:

αi = [ci - (ai x1o + bi x2o)] / (ai v1 + bi v2)    ...[11.52]

where:

αi is the distance from the current point, Po, to constraint i, in the direction of vPo

(x1o, x2o) are the coordinates of the current point, Po

(ai, bi) are the coefficients on x1 and x2 in constraint i, whose boundary is written
as ai x1 + bi x2 = ci

(v1, v2) are the elements of the vector vPo

Table 11.1 summarizes the values of αi for the five constraints of the problem.

Table 11.1: Distance in the Direction of vPo from Point Po to Each Constraint

Constraint                    Coefficients (ai, bi)    Distance from Po to the constraint, αi
1: (2/3) x1 + x2 ≤ 8          (2/3, 1)                  2.126
2: -(5/12) x1 + x2 ≤ 2        (-5/12, 1)                3.246
3: x2 ≤ 4                     (0, 1)                    2.658
4: x1 ≥ 0                     (1, 0)                   12.15
5: x2 ≥ 0                     (0, 1)                   -2.658

To determine the next point in our search, we select the distance in Table 11.1 which is smallest
but non-negative. Clearly, in the direction of vPo, the first constraint is the closest; to reach it
we must move a distance of α1 = 2.126 units. The new point, P1, can be found from:

P1 = Po + α1 (v1, v2) = (8 + α1 (-0.659), 2 + α1 (0.753)) = (6.599, 3.601)    ...[11.53]

and is shown in Figure 11.5.
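These steps are easy to verify numerically. The sketch below (assuming Python with NumPy) encodes each constraint boundary from Table 11.1 as ai x1 + bi x2 = ci, evaluates Equation [11.52] for all five constraints, selects the smallest non-negative distance, and steps to P1:

import numpy as np

grad_f = lambda x: np.array([7 + x[1] - 2*x[0],    # Eq. [11.49]
                             4 + x[0] - 2*x[1]])

# Constraint boundaries a*x1 + b*x2 = c, as listed in Table 11.1.
constraints = [(2/3, 1.0, 8.0),     # (1)  (2/3) x1 + x2 <= 8
               (-5/12, 1.0, 2.0),   # (2) -(5/12) x1 + x2 <= 2
               (0.0, 1.0, 4.0),     # (3)  x2 <= 4
               (1.0, 0.0, 0.0),     # (4)  x1 >= 0
               (0.0, 1.0, 0.0)]     # (5)  x2 >= 0

Po = np.array([8.0, 2.0])
v = grad_f(Po) / np.linalg.norm(grad_f(Po))    # unit vector, Eq. [11.51]

# Distance to each constraint boundary in the direction v, Eq. [11.52].
alphas = [(c - (a*Po[0] + b*Po[1])) / (a*v[0] + b*v[1])
          for (a, b, c) in constraints]
print(np.round(alphas, 3))    # [ 2.126  3.246  2.658 12.149 -2.658]

# Step to the nearest constraint (smallest non-negative distance).
a1 = min(a for a in alphas if a >= 0)
P1 = Po + a1 * v
print(np.round(P1, 3))        # [6.6 3.6]; the text's (6.599, 3.601)
                              # reflects rounded intermediate values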

From P1, we wish to move along constraint 1 (i.e., the currently binding constraint) in the direc-
tion of a vector, G1, that results from the projection of the gradient (calculated at P1) onto the
constraint. The direction of the "gradient projection" is the direction of "steepest feasible
ascent".

To find G1 (projected from P1), we must make the following calculations:

K = (r bk - s ak) / (ak² + bk²)    ...[11.54]

and

G = K (bk, -ak)    ...[11.55]

where:

(r, s) = ∇f|P1

and ak and bk are the coefficients on x1 and x2, respectively, of the constraint along which G is
to be projected.

The gradient evaluated at P1 is:

(r, s) = ∇f|P1 = (-2.597, 3.397)    ...[11.56]

From this, K and G1 can be calculated as:

K = [(-2.597)(1) - (3.397)(2/3)] / [(2/3)² + 1²] = -3.366    ...[11.57]

G1 = -3.366 (1, -2/3) = (-3.366, 2.244)    ...[11.58]

This establishes a new search direction from point P1 in which to travel to find a corner point
with a (hopefully) better objective function value. A unit vector in this direction is:

vP1 = G1 / ||G1|| = (-0.832, 0.555)    ...[11.59]

We can now reproduce the calculations shown in Table 11.1, except that we are projecting from
point P1 along a vector specified by vP1. The nearest constraint in that direction will be con-
straint 3, at a distance of α3 = 0.71. A new search point, P2, can now be established from:

P2 = P1 + α3 (v1, v2) = (6.599 + 0.71 (-0.832), 3.601 + 0.71 (0.555)) = (6.0, 4.0)    ...[11.60]

As shown in Figure 11.5, point P2 is at the intersection between constraints 1 and 3.
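The projection step can be verified the same way. The following sketch (again assuming Python with NumPy) evaluates Equations [11.54] through [11.59] at P1 and then steps the distance α3 to constraint 3 to reach P2:

import numpy as np

grad_f = lambda x: np.array([7 + x[1] - 2*x[0], 4 + x[0] - 2*x[1]])

P1 = np.array([6.599, 3.601])
r, s = grad_f(P1)                       # Eq. [11.56]: (-2.597, 3.397)

ak, bk = 2/3, 1.0                       # coefficients of binding constraint 1
K = (r*bk - s*ak) / (ak**2 + bk**2)     # Eq. [11.54]: K = -3.366
G1 = K * np.array([bk, -ak])            # Eq. [11.55]: (-3.366, 2.244)
vP1 = G1 / np.linalg.norm(G1)           # Eq. [11.59]: (-0.832, 0.555)

alpha3 = (4.0 - P1[1]) / vP1[1]         # distance to constraint 3 (x2 <= 4)
P2 = P1 + alpha3 * vP1
print(np.round(P2, 2))                  # [6. 4.], matching Eq. [11.60]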

A similar set of calculations will yield a point P3 (at the intersection of constraints 2 and 3).
However, evaluation of the gradient projection at point P3 indicates that the optimum solution
lies in a direction back toward point P2. This means that the optimum is not at a corner point, but
somewhere between points P2 and P3. We must invoke another search technique to look
between these points for the solution.

Any number of search methods could be used to evaluate the interval between P2 and P3 to find
the constrained optimum (e.g., bisection search, Golden Section search, etc.). One very efficient
method for this particular problem would be to parameterize the objective function in terms of
the position of a point along a line segment connecting P2 and P3, and then maximize the param-
eterized equation. It can be done as follows:

Let: (y1,y2) be the coordinates of P2, and

(z1,z2) be the coordinates of P3.

Now, define:

x1 = y1 + t (z1 - y1) ...[11.61]

and

x2 = y2 + t (z2 - y2) ...[11.62]

We can specify the coordinates of any arbitrary point between P2 and P3 simply by picking the
appropriate value of t, where 0 ≤ t ≤ 1. In particular, we can substitute Equations [11.61] and
[11.62] into the objective function for x1 and x2, turning the objective into a function of a single
variable, t. The maximum of the objective function, f, can now be found simply by differentiat-
ing it with respect to t, setting the result equal to zero, and solving for t.
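The following sketch (assuming Python with SymPy) carries out this parameterization. P2 is taken from Equation [11.60]; for concreteness, P3 is taken as the intersection of constraints 2 and 3, i.e., (4.8, 4); deriving this point is part of Problem 1 below:

import sympy as sp

t = sp.symbols('t')

# P2 from Eq. [11.60]; P3 from x2 = 4 and -(5/12) x1 + x2 = 2.
y1, y2 = 6.0, 4.0
z1, z2 = sp.Rational(24, 5), 4

x1 = y1 + t*(z1 - y1)    # Eq. [11.61]
x2 = y2 + t*(z2 - y2)    # Eq. [11.62]

f = 7*x1 + 4*x2 + x1*x2 - x1**2 - x2**2    # objective, Eq. [11.44]

t_star = sp.solve(sp.diff(f, t), t)[0]     # set df/dt = 0 and solve
print(t_star)                              # t* = 5/12, about 0.417
print(x1.subs(t, t_star), x2.subs(t, t_star))   # (5.5, 4), the optimum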

11.3 PROBLEMS

1. Complete the solution to the constrained nonlinear problem of Section 11.2 by:

a. performing the calculations to find points P2 and P3

b. solving the parameterization problem at the end of the section (i.e., finding the opti-
mal solution which lies somewhere between P2 and P3)

2. A narrow, confined aquifer is penetrated by three wells (refer to Figure 11.6, below).
Assume the aquifer is isotropic and homogeneous, that the land surface is horizontal, and
that datum is at the land surface. Within the aquifer, the governing equation for porous
media flow is:

∂²h/∂x² + ∂²h/∂y² = Q̃    ...[11.63]

where Q̃ = Q/10,000, and Q is the pumping rate from a well located at point (x,y). Units of Q̃
are inverse length.

If the combined output of the three wells must be at least 2 gpm, determine the optimal
pumping rates for each of the wells in order to minimize total cost of pumping.

[Figure 11.6: A Simple Groundwater Optimization Problem. A rectangular aquifer domain (axes in hundreds of feet) with no-flow boundaries (∂h/∂y = 0 along the top and bottom, ∂h/∂x = 0 along one side), a constant-head boundary (h = 0 ft), and wells located at (400,400), (500,200), and (500,400).]
