
INEQUALITY CONSTRAINTS

Introduction of Slack Variables

• Consider the very general situation in which we have a nonlinear objective function, nonlinear equality constraints, and nonlinear inequality constraints.

• The simplest way to handle inequality constraints is to convert them to equality constraints using slack variables and then use the Lagrange theory.

• Consider the inequality constraints

hj(x) ≥ 0,  j = 1, 2, …, r

and define the real-valued slack variables θj such that

θj² = hj(x) ≥ 0,  j = 1, 2, …, r

but at the expense of introducing r new variables.

• If we now consider the general problem written as

minimize_x f(x)   (1)

subject to  hj(x) ≥ 0,  j = 1(1)r   (2)

• Introducing the slack variables:

hj(x) − θj² = 0,  j = 1(1)r

the Lagrangian is written as:

L(x, λ, θ) = f(x) + Σ_{j=1}^r λj (hj(x) − θj²)   (3)

• The necessary conditions for an optimum are:

∂L/∂xi = ∂f/∂xi + Σ_{j=1}^r λj ∂hj/∂xi = 0,  i = 1(1)n,  evaluated at x = x*, λ = λ*   (4)

∂L/∂λj = hj(x) − θj² = 0,  j = 1(1)r,  evaluated at x = x*, θ = θ*   (5)

∂L/∂θj = −2 λj* θj* = 0,  j = 1(1)r   (6)

• From the last expression (6), it is obvious that either λj* = 0 or θj* = 0, or both.

• Case 1: λj* = 0, θj* ≠ 0

In this case, the constraint hj(x) ≥ 0 is ignored since hj(x*) = (θj*)² > 0 (i.e. the constraint is not binding).

If all λj* = 0, then (4) implies that ∇f(x*) = 0, which means that the solution is the unconstrained minimum.


• Case 2: θj* = 0, λj* ≠ 0

In this case, we have hj(x*) = 0, which means that the optimal solution is on the boundary of the j-th constraint.

Since λj* ≠ 0, this implies that ∇f(x*) ≠ 0 and therefore we are not at the unconstrained minimum.

• Case 3: θj* = 0 and λj* = 0 for all j.

In this case, we have hj(x*) = 0 for all j and ∇f(x*) = 0.

Therefore, the boundary passes through the unconstrained optimum, which is also the constrained optimum.

Example: Now consider the problem


minimize_x f(x) = (x − a)² + b

subject to:  x ≥ c

[Figure: sketch of the constrained one-dimensional problem — the parabola f(x) = (x − a)² + b with its minimum value b at x = a, and the feasible region x ≥ c.]


• The location of the minimum depends on whether the unconstrained minimum lies inside the feasible region.

• If c > a, then the minimum lies at x = c, which is the boundary of the feasible region defined by x ≥ c.

• If c ≤ a, then the minimum lies at the unconstrained minimum, x = a.

• Introducing a single slack variable, θ² = x − c ≥ 0:

x − c − θ² = 0

and we can write the Lagrangian as

L(x, λ, θ) = (x − a)² + b + λ(x − c − θ²)

where λ is the Lagrange multiplier.

∂L/∂x = 2(x* − a) + λ* = 0   (7)

∂L/∂λ = x* − c − θ*² = 0   (8)

∂L/∂θ = −2 λ* θ* = 0   (9)

• In general, we need to know how c and a compare.


• Case 1:

From (9), assume λ* = 0, θ* ≠ 0.

Therefore, from (7), x* = a, and thus from (8), a − c − θ*² = 0, which gives θ*² = a − c; hence θ* is real only for c ≤ a.

Now since λ* = 0 we have

L(x*, λ*, θ*) = f(x*)  and  ∂f/∂x |_{x = x*} = 0

This tells us that the unconstrained minimum is the constrained minimum.

• Case 2: Now let us assume that λ* ≠ 0, θ* = 0.

From (8) we have x* = c, and from (7), λ* = 2(a − c).

Since λ* ≠ 0, and the previous case covered c ≤ a, here we have c > a.

• Case 3: For the case λ* = θ* = 0, (7) tells us that x* − a = 0 and therefore x* = a.

From (8) we have x* − c = 0 and therefore x* = a = c. The unconstrained minimum lies on the boundary since from (7)

∂L/∂x |_{x = x*} = ∂f/∂x |_{x = x*} = 0.
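The case analysis above can be condensed into a short numeric sketch. This is a minimal illustration, not from the text: the helper name `constrained_min` and the sample values of a and c are our own choices.

```python
# Hypothetical helper encoding the three KKT cases for
#   minimize (x - a)^2 + b   subject to   x >= c.
def constrained_min(a, c):
    """Return (x*, lambda*, theta*^2) from the case analysis above."""
    if c <= a:
        # Case 1 (Case 3 when c == a): constraint inactive, lambda* = 0
        return a, 0.0, a - c              # theta*^2 = x* - c = a - c >= 0
    # Case 2: constraint active, theta* = 0, lambda* = 2(a - c) != 0
    return c, 2.0 * (a - c), 0.0

# c > a: the minimum is pushed to the boundary x = c
x_star, lam, theta_sq = constrained_min(a=1.0, c=3.0)
assert x_star == 3.0 and lam == -4.0 and theta_sq == 0.0
# Stationarity (7) and complementary slackness (9) hold
assert 2 * (x_star - 1.0) + lam == 0.0
assert lam * theta_sq == 0.0

# c <= a: the unconstrained minimum x = a is feasible
x_star, lam, theta_sq = constrained_min(a=3.0, c=1.0)
assert x_star == 3.0 and lam == 0.0 and theta_sq == 2.0
```

Either branch satisfies all three necessary conditions (7)–(9); only one branch yields a real-valued slack variable for a given pair (a, c).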
Example: As a two dimensional example, consider
minimize_x f(x) = (x1 − 3)² + 2(x2 − 5)²

subject to:  g(x) = 2x1 + 3x2 − 5 ≤ 0

[Figure: contours and feasible region for the example problem.]

• Unless one draws a very accurate contour plot, it is hard to find the minimum by such a graphical method.

• It is obvious from the graph though, that the minimum will lie on the line
g(x) = 0 .

• We introduce a single slack variable, θ², and construct the Lagrangian as

L(x, λ, θ) = (x1 − 3)² + 2(x2 − 5)² + λ(2x1 + 3x2 − 5 + θ²).

• The inequality constraint was changed to the equality constraint g(x) + θ² = 0, using the slack variable θ² = −g(x) ≥ 0.
• The necessary conditions become

∂L/∂x1 = 2(x1* − 3) + 2λ* = 0   (10)

∂L/∂x2 = 4(x2* − 5) + 3λ* = 0   (11)

∂L/∂θ = 2 θ* λ* = 0   (12)

∂L/∂λ = 2x1* + 3x2* − 5 + θ*² = 0   (13)

From (10) and (11):

x1* = 3 − λ*

x2* = 5 − (3/4) λ*

Substituting these expressions in (13) we have:

2(3 − λ*) + 3(5 − (3/4) λ*) − 5 + θ*² = 0

16 − (17/4) λ* + θ*² = 0.

If λ* = 0, then θ* will be complex. If θ* = 0, then λ* = 64/17 and therefore

x1* = −13/17,  x2* = 37/17

θ* = 0 means there is no slack in the constraint, as expected from the plot.
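The closed-form solution can be checked numerically; a minimal sketch in plain Python (variable names are our own):

```python
# Verify the KKT solution of the 2D example:
#   f(x) = (x1 - 3)^2 + 2*(x2 - 5)^2,   g(x) = 2*x1 + 3*x2 - 5 <= 0
lam = 64 / 17
x1 = 3 - lam            # x1* = -13/17
x2 = 5 - 0.75 * lam     # x2* =  37/17

# Stationarity conditions (10) and (11)
assert abs(2 * (x1 - 3) + 2 * lam) < 1e-12
assert abs(4 * (x2 - 5) + 3 * lam) < 1e-12
# The constraint is tight (theta* = 0), as found above
assert abs(2 * x1 + 3 * x2 - 5) < 1e-12
```

All three checks pass, confirming the fractions −13/17 and 37/17 obtained by substitution.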


The Kuhn-Tucker Theorem

• The Kuhn-Tucker theorem gives the necessary conditions for an optimum of a nonlinear objective function constrained by a set of nonlinear inequality constraints.

• The general problem is written as


minimize_x f(x),  x ∈ ℝⁿ

subject to:  gi(x) ≤ 0,  i = 1, 2, …, r

If we had an equality constraint, we could introduce two inequality constraints in its place. For instance, if it was required that h(x) = 0, then we could just impose h(x) ≤ 0 and h(x) ≥ 0 (i.e. −h(x) ≤ 0).

• Now assume that f(x) and gi(x) are differentiable functions. The Lagrangian is:

L(x, λ) = f(x) + Σ_{i=1}^r λi gi(x)

The necessary conditions for x* to be the solution to the above problem are:

∂f/∂xj (x*) + Σ_{i=1}^r λi* ∂gi/∂xj (x*) = 0,  j = 1, 2, …, n   (14)

gi(x*) ≤ 0,  i = 1(1)r   (15)

λi* gi(x*) = 0,  i = 1(1)r   (16)

λi* ≥ 0,  i = 1(1)r   (17)
• These are known as the Kuhn-Tucker stationary conditions; written compactly as:

∇x L(x*, λ*) = 0   (18)

∇λ L(x*, λ*) = g(x*) ≤ 0   (19)

(λ*)ᵀ g(x*) = 0   (20)

λ* ≥ 0   (21)
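Conditions (14)–(17) translate directly into a small feasibility check. The sketch below is our own illustration: the function name and calling convention are assumptions, with all gradients supplied by the caller.

```python
# Hypothetical checker for the Kuhn-Tucker conditions (14)-(17).
def kkt_satisfied(grad_f, grads_g, g_vals, lam, tol=1e-8):
    """grad_f: list of df/dx_j at x*; grads_g: gradient of each g_i at x*;
    g_vals: values g_i(x*); lam: multipliers lambda_i*."""
    # (14) stationarity
    for j in range(len(grad_f)):
        s = grad_f[j] + sum(l * gg[j] for l, gg in zip(lam, grads_g))
        if abs(s) > tol:
            return False
    # (15) primal feasibility, (16) complementary slackness, (17) dual feasibility
    return (all(g <= tol for g in g_vals)
            and abs(sum(l * g for l, g in zip(lam, g_vals))) <= tol
            and all(l >= -tol for l in lam))

# The 2D example from earlier: f = (x1-3)^2 + 2(x2-5)^2, g = 2x1 + 3x2 - 5
x = (-13 / 17, 37 / 17)
lam = [64 / 17]
grad_f = [2 * (x[0] - 3), 4 * (x[1] - 5)]
grads_g = [(2.0, 3.0)]
g_vals = [2 * x[0] + 3 * x[1] - 5]
assert kkt_satisfied(grad_f, grads_g, g_vals, lam)
```

Note that such a check only verifies a candidate point; it does not find one.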

• If our problem is one of maximization instead of minimization, then

maximize_x f(x),  x ∈ ℝⁿ

subject to:  gi(x) ≤ 0,  i = 1, 2, …, r

we can replace f(x) by −f(x) in the first condition:

−∂f/∂xj (x*) + Σ_{i=1}^r λi* ∂gi/∂xj (x*) = 0,  j = 1, 2, …, n   (22)

∂f/∂xj (x*) + Σ_{i=1}^r (−λi*) ∂gi/∂xj (x*) = 0,  j = 1, 2, …, n.   (23)

• Thus the maximization problem amounts to changing the sign of λi*:

∇x L(x*, λ*) = 0   (24)

∇λ L(x*, λ*) = g(x*) ≤ 0   (25)

(λ*)ᵀ g(x*) = 0   (26)

λ* ≤ 0   (27)
Transformation via the Penalty Method

• The Kuhn-Tucker necessary conditions give us a theoretical framework for dealing with nonlinear optimization.

• From a practical computer-algorithm point of view, however, we are not much further than we were when we started.

• We require practical methods of solving problems of the form:

minimize_x f(x),  x ∈ ℝⁿ   (28)

subject to  gj(x) ≤ 0,  j = 1(1)J   (29)

hk(x) = 0,  k = 1(1)K   (30)

• We introduce a new objective function called the penalty function

P(x; R) = f(x) + Ω(R, g(x), h(x))

where the vector R contains the penalty parameters and Ω(R, g(x), h(x)) is the penalty term.

• The penalty term is a function of R and the constraint functions, g(x) and h(x).

• The purpose of adding this term to the objective function is to penalize the objective function whenever an infeasible set of decision variables x is chosen.
Use of a parabolic penalty term

• Consider the minimization of an objective function, f(x) with equality

constraints, h(x) .

• We create a penalty function by adding a positive coefficient times each


constraint, that is

minimize_x P(x; R) = f(x) + Σ_{k=1}^K Rk {hk(x)}²   (31)

As the penalty parameters Rk → ∞, more weight is attached to satisfying the k-th constraint.

If a specific parameter is chosen as zero, say Rk = 0, then the k-th equality constraint is ignored.

The user specifies the value of each Rk according to the importance of satisfying the corresponding equality constraint.

Example:
minimize_x x1² + x2²

subject to:  x2 = 1

We construct a penalty function as:

P(x; R) = x1² + x2² + R(x2 − 1)²

and we proceed to minimize P(x; R) for particular values of R.


• We proceed analytically; the first-order necessary conditions for a minimum are

∂P/∂x1 = 2x1* = 0  ⇒  x1* = 0

∂P/∂x2 = 2x2* + 2R(x2* − 1) = 0  ⇒  x2* = R / (1 + R)

If we now take the limit as R → ∞, we have

x2* = lim_{R→∞} R / (1 + R) = 1.

• In a numerical procedure, the value of R would be increased gradually and the numerical optimization would be performed several times.

[Figure: example of the use of a parabolic penalty function — contours of P(x; R) for R = 0 (unconstrained minimum), R = 1, R = 2, and R = ∞, with the minimizer approaching the constraint x2 = 1.]
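A minimal numerical sketch of this procedure, with the penalty parameter increased gradually: plain gradient descent on P(x; R), warm-starting each solve from the previous one. The step size, iteration count, and starting point are arbitrary choices of our own, not from the text.

```python
# Gradient of P(x; R) = x1^2 + x2^2 + R*(x2 - 1)^2
def grad_P(x1, x2, R):
    return 2 * x1, 2 * x2 + 2 * R * (x2 - 1)

x1, x2 = 0.5, 0.0                      # arbitrary starting point
for R in (1.0, 10.0, 100.0, 1000.0):   # gradually increased penalty
    step = 0.4 / (1 + R)               # keep the descent stable as R grows
    for _ in range(2000):              # crude fixed-iteration inner solve
        g1, g2 = grad_P(x1, x2, R)
        x1, x2 = x1 - step * g1, x2 - step * g2

# The minimizer approaches (0, R/(1+R)), i.e. (0, 1) as R grows
assert abs(x1) < 1e-9
assert abs(x2 - 1000 / 1001) < 1e-6
```

For each value of R the inner loop is an unconstrained minimization; only the sequence of solves, with R → ∞, drives x2 toward the constraint x2 = 1.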
Inequality constrained problems

• Consider the penalty method for inequality constrained problems.

• The general nonlinear objective function with J nonlinear inequality constraints


is written as

minimize_x f(x),  x ∈ ℝⁿ   (32)

subject to  gj(x) ≤ 0,  j = 1(1)J   (33)

A penalty function can be constructed as

P(x; R) = f(x) + Σ_{i=1}^J Ri [gi(x)]² u(gi)   (34)

where u(gi) is the step function defined by

u(gi) = { 0 if gi(x) ≤ 0
        { 1 if gi(x) > 0   (35)

and each penalty parameter is chosen as a positive number, Ri > 0.

• The term [gi(x)]² u(gi) is sometimes called the bracket operator and is denoted ⟨gi(x)⟩².

• The step function is used to ignore the constraint when it is satisfied by the decision variables and to treat it as a penalty term when it is not satisfied.
• When this type of penalty term is used, the method is referred to as an exterior
penalty method since points outside the feasible region are allowed, but are
penalized.

• As the penalty parameter increases, the feasibility region is “pushed in”.

[Figure: exterior penalty method — the penalty term Ω = Σ_{i=1}^J Ri [gi(x)]² u(gi) is zero where the constraint is satisfied (gi ≤ 0) and grows quadratically where it is violated (gi > 0).]
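The bracket-operator penalty (34)–(35) is a few lines of code; a sketch with our own helper names:

```python
# [g]^2 * u(g): zero when the constraint g(x) <= 0 is satisfied,
# g^2 when it is violated (the bracket operator, squared).
def bracket_sq(g):
    return g * g if g > 0 else 0.0

# P(x; R) = f(x) + sum_i R_i * [g_i(x)]^2 * u(g_i)
def exterior_penalty(f_val, g_vals, R_vals):
    return f_val + sum(R * bracket_sq(g) for R, g in zip(R_vals, g_vals))

# Feasible point: no penalty is added
assert exterior_penalty(3.0, [-1.0, -0.5], [10.0, 10.0]) == 3.0
# Infeasible point: penalized by R * g^2 = 10 * 0.25 = 2.5
assert exterior_penalty(3.0, [-1.0, 0.5], [10.0, 10.0]) == 5.5
```

Because infeasible points merely incur a finite penalty rather than being rejected, an optimizer applied to this P may wander outside the feasible region, which is exactly the "exterior" behavior described above.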

Inverse penalty term

• An alternative method which is commonly used is the inverse penalty method.

• If we have a nonlinear optimization problem written as

minimize_x f(x),  x ∈ ℝⁿ   (36)

subject to  gj(x) ≥ 0,  j = 1(1)J   (37)

then we can construct a penalty function as

P(x; R) = f(x) + R Σ_{i=1}^J 1 / gi(x)

where only one penalty parameter is used.
• The method is easy to visualize if we consider the case of only one constraint, J = 1. Then the penalty term is simply

Ω = R / g(x)

where g(x) is the single constraint.

[Figure: inverse penalty term Ω = R / g(x), which blows up as g(x) → 0⁺ from inside the feasible region.]

• As can be deduced from the figure, it is important to start only from feasible points; because of this, the method is classified as an interior method.

• With exterior penalties the parameter R is steadily increased with R → ∞ in the


limit so as to exclude infeasible points.

• With interior penalties, the parameter R is steadily decreased with R → 0 in the


limit. Otherwise, you may artificially exclude a minimum located on the
boundary.
Example:
Solve the following problem using both an interior and an exterior method.

minimize_x f(x) = (x1 − 4)² + (x2 − 4)²

subject to:  g(x) = 5 − x1 − x2 ≥ 0

First, solve using the "bracket operator":

P(x; R) = f(x) + R [g(x)]² u(−g(x))

we have

P(x; R) = (x1 − 4)² + (x2 − 4)² + R(5 − x1 − x2)² u(−g(x))

• Thus, when g(x) < 0, i.e. the decision variables are infeasible, a penalty of R(5 − x1 − x2)² is applied.

• Proceeding analytically to find the necessary conditions for a minimum, we have

∂P/∂x1 = 2(x1* − 4) + 2R(5 − x1* − x2*)(−1) = 0

∂P/∂x2 = 2(x2* − 4) + 2R(5 − x1* − x2*)(−1) = 0

Subtracting these two equations:

2(x1* − 4) − 2(x2* − 4) = 0  ⇒  x1* = x2*.

From the first equation above, we get

(x1 − 4) − R(5 − 2x1) = 0

and therefore

x1 = (5R + 4) / (2R + 1)

• Increasing the penalty parameter to ∞, we have

lim_{R→∞} x1 = 5/2

and the constrained minimum is:

x* = (5/2, 5/2)

Since g(x*) = 0, the constraint is tight.

The unconstrained minimum is at x = (4, 4).

• Now solve using the inverse penalty:

P(x; R) = f(x) + R [g(x)]⁻¹

we have

P(x; R) = (x1 − 4)² + (x2 − 4)² + R(5 − x1 − x2)⁻¹

Whether or not g(x) < 0, i.e. whether or not the decision variables are infeasible, a penalty of R(5 − x1 − x2)⁻¹ is applied.

• We must make sure that we remain feasible during the execution of any algorithm we may employ.

• Proceeding analytically to find the necessary conditions for a minimum, we have

∂P/∂x1 = 2(x1* − 4) + R / (5 − x1* − x2*)² = 0

∂P/∂x2 = 2(x2* − 4) + R / (5 − x1* − x2*)² = 0

Subtracting these two equations, we again get x1* = x2*, and we also have

4(x1*)³ − 36(x1*)² + 105 x1* − 100 + R/2 = 0

• This cubic can be solved for its roots, and the minimum of P(x; R) for particular values of R can be found.

Minimum for different values of R

R        x1* = x2*    f(x*)
100      0.5864       23.3053
10       1.7540       10.0890
1        2.2340       6.2375
0.1      2.4113       5.0479
0.01     2.4714       4.6732
0.001    2.4909       4.5548
0        2.5000       4.5000

• For each value of the penalty parameter, an unconstrained optimization problem


must be solved.
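The table can be reproduced numerically. In the sketch below, for each R the interior stationary point x1* = x2* = x (with x < 5/2 so the point stays feasible) is found as the root of the cubic derived above; the bisection bracket and iteration count are our own choices.

```python
# 4x^3 - 36x^2 + 105x - 100 + R/2, whose root gives x1* = x2* = x
def cubic(x, R):
    return 4 * x**3 - 36 * x**2 + 105 * x - 100 + R / 2

def interior_root(R, lo=0.0, hi=2.5, iters=80):
    # cubic(lo, R) < 0 for R < 200 and cubic(hi, R) = R/2 > 0, and the
    # cubic is increasing on [0, 2.5], so bisection finds the unique root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cubic(mid, R) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_obj(x):
    # objective value on the line x1 = x2 = x
    return 2 * (x - 4) ** 2

for R in (100, 10, 1, 0.1, 0.01, 0.001):
    x = interior_root(R)               # one table row per value of R

assert abs(interior_root(100) - 0.5864) < 1e-3
assert abs(interior_root(0.001) - 2.4909) < 1e-3
assert abs(f_obj(interior_root(0.1)) - 5.0479) < 1e-3
```

As R → 0 the root slides toward the constrained minimum x = 5/2, matching the last row of the table; each bisection here stands in for the full unconstrained minimization of P(x; R).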
