
Optimization

1. Introduction

Optimization is the act of obtaining the best result under given circumstances.

Optimization can be defined as the process of finding the conditions that give
the maximum or minimum of a function.

The optimum seeking methods are also known as mathematical programming techniques and are generally studied as a part of operations research.

Operations research is a branch of mathematics concerned with the application of scientific methods and techniques to decision-making problems and with establishing the best or optimal solutions.
1. Introduction

Historical development

• Isaac Newton (1642-1727)
  (The development of differential calculus methods of optimization)

• Joseph-Louis Lagrange (1736-1813)
  (Calculus of variations, minimization of functionals, method of optimization for constrained problems)

• Augustin-Louis Cauchy (1789-1857)
  (Solution by direct substitution, steepest descent method for unconstrained optimization)
1. Introduction

Historical development

• Leonhard Euler (1707-1783)
  (Calculus of variations, minimization of functionals)

• Gottfried Wilhelm Leibniz (1646-1716)
  (Differential calculus methods of optimization)
1. Introduction

Historical development

• George Bernard Dantzig (1914-2005)
  (Linear programming and the Simplex method (1947))

• Richard Bellman (1920-1984)
  (Principle of optimality in dynamic programming problems)

• Harold William Kuhn (1925-2014)
  (Necessary and sufficient conditions for the optimal solution of programming problems, game theory)
1. Introduction

Historical development

• Albert William Tucker (1905-1995)
  (Necessary and sufficient conditions for the optimal solution of programming problems, nonlinear programming, game theory; his PhD student was John Nash)

• John von Neumann (1903-1957)
  (Game theory)
1. Introduction

• Objective function
• Variables
• Constraints

Find values of the variables that minimize or maximize the objective function while satisfying the constraints.
1. Introduction

• Mathematical optimization problem:


    minimize    f_0(x)
    subject to  g_i(x) ≤ b_i,   i = 1, ..., m

• f_0 : Rⁿ → R: objective function
• x = (x_1, ..., x_n): design variables (the unknowns of the problem; they must be linearly independent)
• g_i : Rⁿ → R (i = 1, ..., m): inequality constraints
• The problem is a constrained optimization problem
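As a concrete illustration (my own sketch, not from the lecture), a toy instance of this constrained problem can be set up and solved numerically with SciPy; the objective, constraint, and starting point below are made-up placeholders.

# Hypothetical toy instance of: minimize f0(x) subject to g1(x) <= b1
import numpy as np
from scipy.optimize import minimize

def f0(x):
    # objective: a simple convex quadratic
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2

def g1(x):
    # constraint x1 + x2 <= 1, written as b - g(x) >= 0 as SciPy expects
    return 1.0 - (x[0] + x[1])

res = minimize(f0, x0=np.zeros(2), constraints=[{"type": "ineq", "fun": g1}])
print(res.x, res.fun)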
1. Introduction
• If a point x* corresponds to the minimum value of the function f (x), the
same point also corresponds to the maximum value of the negative of
the function, -f (x). Thus optimization can be taken to mean
minimization since the maximum of a function can be found by seeking
the minimum of the negative of the same function.

Examples

Example

Transportation Problem - LP Formulation

[Figure: a network of supply locations (origins) shipping to demand locations (destinations) at minimum cost.]
Example
Different Kinds of Optimization

Figure from: Optimization Technology Center


http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/
Solving Optimization Problems

General optimization problem


• Very difficult to solve
• Methods involve some compromise, e.g., very long computation time, or not
always finding the solution (which may not matter in practice)
Exceptions: certain problem classes can be solved efficiently and reliably
• Linear programming problems
• Least-squares problems
• Convex optimization problems
Linear programming
Least-squares problems
Convex optimization
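The three headings above refer to slides that originally showed the standard forms of these problem classes; as a reminder (reconstructed here from standard usage, not copied from the slides), they are commonly written as:

    Linear programming:    minimize cᵀx    subject to a_iᵀx ≤ b_i,  i = 1, ..., m
    Least-squares:         minimize ||Ax − b||²    (no constraints; closed-form solution x = (AᵀA)⁻¹Aᵀb)
    Convex optimization:   minimize f_0(x)  subject to f_i(x) ≤ b_i,  i = 1, ..., m,  where f_0, ..., f_m are convex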
2. Classical optimization techniques
Single variable optimization

• A function of one variable f (x) has a relative or local minimum at x = x* if f (x*) ≤ f (x* + h) for all sufficiently small positive and negative values of h.

• A point x* is called a relative or local maximum if f (x*) ≥ f (x* + h) for all values of h sufficiently close to zero.

[Figure: a curve with several local minima and a global minimum.]
2. Classical optimization techniques
Single variable optimization
• A function f (x) is said to have a global or absolute minimum at x* if f (x*) ≤ f (x)
for all x, and not just for all x close to x*, in the domain over which f (x) is defined.

• Similarly, a point x* will be a global maximum of f (x) if f (x*) ≥ f (x) for all x in
the domain.

Necessary condition

• If a function f (x) is defined in the interval a ≤ x ≤ b and has a relative minimum at x = x*, where a < x* < b, and if the derivative df (x)/dx = f'(x) exists as a finite number at x = x*, then f'(x*) = 0.

• The theorem does not say that the function will necessarily have a minimum or maximum at every point where the derivative is zero. For example, f'(x) = 0 at x = 0 for the function shown in the figure, yet this point is neither a minimum nor a maximum. In general, a point x* at which f'(x*) = 0 is called a stationary point.
Necessary condition
• The theorem does not say what happens if a
minimum or a maximum occurs at a point x* where
the derivative fails to exist. For example, in the
figure

    lim_{h→0} [ f (x* + h) − f (x*) ] / h  =  m+ (positive)  or  m− (negative)

depending on whether h approaches zero through positive or negative values, respectively. Unless the numbers m+ and m− are equal, the derivative f'(x*) does not exist. If f'(x*) does not exist, the theorem is not applicable.

Sufficient condition

• Let f'(x*) = f''(x*) = ... = f^(n−1)(x*) = 0, but f^(n)(x*) ≠ 0. Then f (x*) is
  • a minimum value of f (x) if f^(n)(x*) > 0 and n is even
  • a maximum value of f (x) if f^(n)(x*) < 0 and n is even
  • neither a minimum nor a maximum if n is odd
Example
Determine the maximum and minimum values of the function

    f (x) = 12x^5 − 45x^4 + 40x^3 + 5

Solution: Since f'(x) = 60(x^4 − 3x^3 + 2x^2) = 60x^2 (x − 1)(x − 2), f'(x) = 0 at x = 0, x = 1, and x = 2.

The second derivative is f''(x) = 60(4x^3 − 9x^2 + 4x).

At x = 1, f''(x) = −60, hence x = 1 is a relative maximum. Therefore, fmax = f (x = 1) = 12.

At x = 2, f''(x) = 240, hence x = 2 is a relative minimum. Therefore, fmin = f (x = 2) = −11.

At x = 0, f''(x) = 0, so we must investigate the next derivative:

    f'''(x) = 60(12x^2 − 18x + 4) = 240 at x = 0

Since f'''(x) ≠ 0 at x = 0 and n = 3 is odd, x = 0 is neither a maximum nor a minimum; it is an inflection point.
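A quick symbolic check of this example (my own sketch, not from the slides) using SymPy:

# Verify the stationary points of f(x) = 12x^5 - 45x^4 + 40x^3 + 5
import sympy as sp

x = sp.symbols('x')
f = 12*x**5 - 45*x**4 + 40*x**3 + 5

stationary = sp.solve(sp.diff(f, x), x)       # -> [0, 1, 2]
for xs in stationary:
    f2 = sp.diff(f, x, 2).subs(x, xs)         # second derivative at the stationary point
    print(xs, f2, f.subs(x, xs))              # x=0: inflection, x=1: max (f=12), x=2: min (f=-11)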
2. Classical optimization techniques
Multivariable optimization with no constraints
• Necessary condition
If f (X) has an extreme point (maximum or minimum) at X = X* and if the first partial derivatives of f (X) exist at X*, then

    ∂f/∂x1 (X*) = ∂f/∂x2 (X*) = ... = ∂f/∂xn (X*) = 0

• Sufficient condition
A sufficient condition for a stationary point X* to be an
extreme point is that the matrix of second partial
derivatives (Hessian matrix) of f (X*) evaluated at X* is
• Positive definite when X* is a relative minimum point
• Negative definite when X* is a relative maximum point

Note: Given a multivariable function f, we denote:

• The gradient ∇f is the vector of its first partial derivatives:  ∇f = ( ∂f/∂x1, ..., ∂f/∂xn )ᵀ

• The Hessian ∇²f is the matrix of its second partial derivatives:  ∇²f = [ ∂²f/∂xi∂xj ]  (an n×n matrix)
Positive definite matrix (Review)
Definitions:
1) An n×n symmetric real matrix A is said to be positive definite if xᵀAx > 0 for all non-zero x ∈ Rⁿ.

2) An n×n symmetric real matrix A is said to be positive semidefinite if xᵀAx ≥ 0 for all non-zero x ∈ Rⁿ.

3) An n×n symmetric real matrix A is said to be negative definite if xᵀAx < 0 for all non-zero x ∈ Rⁿ.

4) An n×n symmetric real matrix A is said to be negative semidefinite if xᵀAx ≤ 0 for all non-zero x ∈ Rⁿ.
[Examples 1 and 2: two specific matrices (shown on the slide) evaluated via xᵀAx; one is positive definite (xᵀAx > 0 whenever x1 ≠ 0 or x2 ≠ 0, i.e. x ≠ 0), the other is only positive semidefinite, since xᵀAx ≥ 0 but vanishes for some non-zero vectors.]
Positive definite matrix (Review)
Theorem: Let A be an n×n symmetric real matrix. All eigenvalues of A are real.
1) A is positive definite if and only if all of its eigenvalues are positive.
2) A is positive semidefinite if and only if all of its eigenvalues are non-negative.
3) A is negative definite if and only if all of its eigenvalues are negative.
4) A is negative semidefinite if and only if all of its eigenvalues are non-positive.
5) A is indefinite if and only if it has both positive and negative eigenvalues.
Review of mathematics
Positive definiteness
• Test 1: A matrix A will be positive definite if all its eigenvalues are positive; that is, all the values of λ that satisfy the determinantal equation

    |A − λI| = 0

should be positive. Similarly, the matrix A will be negative definite if all its eigenvalues are negative.
Review of mathematics

Negative definiteness

• Equivalently, a matrix is negative definite if all its eigenvalues are negative.

• It is positive semidefinite if all its eigenvalues are greater than or equal to zero.

• It is negative semidefinite if all its eigenvalues are less than or equal to zero.
[Examples 3 and 4: two specific matrices shown on the slide; one is positive definite, the other is not positive definite.]


Review of mathematics
Positive definiteness
• Test 2: Another test that can be used to find the positive definiteness of a matrix A of order n involves evaluation of the determinants of its leading principal submatrices:

    A1 = a11
    A2 = det [ a11 a12 ; a21 a22 ]
    A3 = det of the leading 3×3 submatrix of A
    ...
    An = det A

• The matrix A will be positive definite if and only if all the values A1, A2, A3, ..., An are positive.
• The matrix A will be negative definite if and only if the sign of Aj is (−1)^j for j = 1, 2, ..., n.
• If some of the Aj are positive and the remaining Aj are zero, the matrix A will be positive semidefinite.

(A small numerical check of both tests is sketched below.)
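As an illustration (my own sketch, not from the slides), both tests can be checked numerically with NumPy; the symmetric matrix below is an arbitrary placeholder.

# Check positive definiteness via eigenvalues (Test 1) and leading principal minors (Test 2)
import numpy as np

A = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])   # arbitrary symmetric example matrix

# Test 1: all eigenvalues positive?
eigvals = np.linalg.eigvalsh(A)   # eigvalsh assumes a symmetric matrix
print("eigenvalues:", eigvals, "positive definite:", bool(np.all(eigvals > 0)))

# Test 2: all leading principal minors positive?
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print("leading minors:", minors, "positive definite:", all(m > 0 for m in minors))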
Example 5: [determine the definiteness of the matrix A shown on the slide]

Example 6: [find the extreme point of the function f shown on the slide]

Example 7: [find the extreme point of the function f shown on the slide]


Convex and concave functions
• Convex and concave functions in one variable
[Figure: examples of convex and concave functions of a single variable.]

• Convex and concave functions in two variables
[Figure: a concave function of two variables.]
First and second order characterizations of convex functions

https://www.princeton.edu/~aaa/Public/Teaching/ORF523/S16/ORF523_S16_Lec7_gh.pdf
Convex functions
• A function f (x) is convex if for any two points x and y, we have

    f (y) ≥ f (x) + ∇f (x)ᵀ (y − x)

• A function f (X) is convex if the Hessian matrix

    H(X) = [ ∂²f (X)/∂xi∂xj ]

is positive semidefinite.

• Any local minimum of a convex function f (X) is a global minimum.
Example
Determine whether the following function is convex or concave.

    f (x1, x2) = 2x1³ − 6x2²

Solution:

    H(X) = [ ∂²f/∂x1²     ∂²f/∂x1∂x2 ]  =  [ 12x1    0  ]
           [ ∂²f/∂x2∂x1   ∂²f/∂x2²   ]     [  0    −12 ]

Here

    ∂²f/∂x1² = 12x1, which is ≥ 0 for x1 ≥ 0 and ≤ 0 for x1 ≤ 0,

    H2 = |H(X)| = −144x1, which is ≤ 0 for x1 ≥ 0 and ≥ 0 for x1 ≤ 0.

Hence H(X) is negative semidefinite for x1 ≤ 0, and f (X) is concave for x1 ≤ 0.
Example
1) Determine whether the following function is convex or concave.
2) Find the global minimum/maximum of the given f

    f (x1, x2, x3) = 4x1² + 3x2² + 5x3² + 6x1x2 + x1x3 − 3x1 − 2x2 + 15
Quadratic functions: f (x) = xᵀAx + bᵀx + c, with A symmetric.

Then f is convex if and only if A is positive semidefinite, independently of b and c.

Example

    f (x1, x2, x3) = 3x1² + 3x2² + 4x3² + 4x1x2 + 2x1x3 + 2x2x3 − x1 − 2x2 − 3x3

1) Determine whether the given function is convex or concave.
2) Find the global minimum/maximum of the given f.

Solution: Write f (x1, x2, x3) = xᵀAx + bᵀx with

    A = [ 3  2  1 ]        b = [ −1 ]
        [ 2  3  1 ]            [ −2 ]
        [ 1  1  4 ]            [ −3 ]

Then ∇f (x1, x2, x3) = 2Ax + b and ∇²f (x1, x2, x3) = 2A:

    ∇f (x1, x2, x3) = ( 6x1 + 4x2 + 2x3 − 1,  4x1 + 6x2 + 2x3 − 2,  2x1 + 2x2 + 8x3 − 3 )

Eigenvalues of A:

    |A − λI| = 0  ⟹  (λ − 1)(−λ² + 9λ − 18) = 0  ⟹  λ = 1, λ = 3, λ = 6

Because all the eigenvalues are positive, A is positive definite, so f (X) is a strictly convex function and f has a unique global minimum. The global minimum point is the solution of the equation system ∇f (x1, x2, x3) = 0:

    6x1 + 4x2 + 2x3 = 1,   4x1 + 6x2 + 2x3 = 2,   2x1 + 2x2 + 8x3 = 3
    ⟹  x1 = −1/6,  x2 = 1/3,  x3 = 1/3

    Min f = f (−1/6, 1/3, 1/3)
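A quick numerical check of this example (my own sketch, not part of the slides):

# Verify convexity and the global minimizer of f(x) = x^T A x + b^T x for the example above
import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
b = np.array([-1.0, -2.0, -3.0])

print(np.linalg.eigvalsh(A))      # -> [1. 3. 6.], all positive, so f is strictly convex

# Global minimum: solve grad f = 2Ax + b = 0
x_star = np.linalg.solve(2 * A, -b)
print(x_star)                     # -> [-1/6, 1/3, 1/3]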
Optimization in 1-D

• Look for analogies to bracketing in root-finding


• What does it mean to bracket a minimum? Three points

    xleft < xmid < xright   with   f (xmid) < f (xleft)   and   f (xmid) < f (xright)

[Figure: the three bracket points (xleft, f(xleft)), (xmid, f(xmid)), (xright, f(xright)) on the curve.]
Optimization in 1-D

• Once we have these properties, there is at


least one local minimum between xleft and xright
• Establishing bracket initially:
– Given xinitial, increment
– Evaluate f(xinitial), f(xinitial+increment)
– If decreasing, step until find an increase
– Else, step in opposite direction until find an increase
– Grow increment at each step
• For maximization: substitute –f for f
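A minimal sketch of this bracketing procedure (my own illustration; the test function and step size are placeholders):

# Establish an initial bracket (a, b, c) with f(b) below f(a) and f(c)
def bracket_minimum(f, x0, step=0.1, grow=2.0, max_iter=100):
    a, fa = x0, f(x0)
    b, fb = x0 + step, f(x0 + step)
    if fb > fa:                    # if f increases in this direction, search the other way
        a, b, fa, fb = b, a, fb, fa
        step = -step
    for _ in range(max_iter):      # keep stepping (with growing increments) until f increases
        step *= grow
        c, fc = b + step, f(b + step)
        if fc > fb:
            return (a, b, c) if a < c else (c, b, a)
        a, fa, b, fb = b, fb, c, fc
    raise RuntimeError("no bracket found")

print(bracket_minimum(lambda x: (x - 2.0) ** 2, x0=0.0))   # e.g. (0.7, 1.5, 3.1)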
Optimization in 1-D

• Strategy: evaluate function at some xnew

[Figure: the current bracket (xleft, xmid, xright) with the new trial point xnew inside it.]
Optimization in 1-D

• Strategy: evaluate function at some xnew


– Here, new “bracket” points are xnew, xmid, xright

[Figure: the case where the new bracket is formed by xnew, xmid, xright.]
Optimization in 1-D

• Strategy: evaluate function at some xnew


– Here, new “bracket” points are xleft, xnew, xmid

[Figure: the case where the new bracket is formed by xleft, xnew, xmid.]


Optimization in 1-D

• Unlike with root-finding, we can't always guarantee that the interval will be reduced by a factor of 2
• Let's find the optimal place for xmid, relative to xleft and xright, that guarantees the same factor of reduction regardless of the outcome
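This choice of interior point leads to the golden section search; a minimal sketch (my own, not from the slides), which shrinks the bracket by the same factor (about 0.618) at every step regardless of which side is kept:

# Golden section search for a 1-D minimum inside an initial interval [a, b]
import math

def golden_section(f, a, b, tol=1e-6):
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # ~0.618
    while (b - a) > tol:
        c = b - inv_phi * (b - a)            # interior points at the golden ratio
        d = a + inv_phi * (b - a)
        if f(c) < f(d):      # minimum lies in [a, d]
            b = d
        else:                # minimum lies in [c, b]
            a = c
    return 0.5 * (a + b)

print(golden_section(lambda x: (x - 2.0) ** 2, 0.0, 5.0))   # ~2.0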
Newton’s Method

• At each step:

f ( xk )
xk 1  xk 
f ( xk )

• Requires 1st and 2nd derivatives


• Quadratic convergence
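A minimal sketch of this iteration (my own illustration; the derivatives are supplied by hand as placeholders):

# Newton's method for 1-D minimization: x_{k+1} = x_k - f'(x_k) / f''(x_k)
def newton_1d(df, d2f, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: minimize f(x) = x^4 - 3x + 1, so f'(x) = 4x^3 - 3 and f''(x) = 12x^2
print(newton_1d(lambda x: 4*x**3 - 3, lambda x: 12*x**2, x0=1.0))   # ~0.9086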
Multi-Dimensional Optimization

• Important in many areas


– Fitting a model to measured data
– Finding best design in some parameter space
• Hard in general
– Weird shapes: multiple extrema, saddles,
curved or elongated valleys, etc.
– Can’t bracket
• In general, easier than rootfinding
– Can always walk “downhill”
Newton’s Method in
Multiple Dimensions

• Replace 1st derivative with gradient,


2nd derivative with Hessian
For f (x, y):

    ∇f = [ ∂f/∂x ]            H = ∇²f = [ ∂²f/∂x²     ∂²f/∂x∂y ]
         [ ∂f/∂y ]                       [ ∂²f/∂x∂y   ∂²f/∂y²  ]
Newton’s Method in
Multiple Dimensions

• Replace 1st derivative with gradient,


2nd derivative with Hessian
• So,

    x_{k+1} = x_k − H(x_k)⁻¹ ∇f (x_k)

• Tends to be extremely fragile unless function very smooth and


starting close to minimum
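A small sketch of the multi-dimensional update (my own illustration; the quadratic test function is a placeholder). In practice the linear system is solved rather than forming the inverse explicitly:

# Newton's method in several dimensions: solve H(x_k) * step = -grad f(x_k)
import numpy as np

def newton_nd(grad, hess, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), -grad(x))
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x

# Example: f(x, y) = (x - 1)^2 + 10*(y + 2)^2
grad = lambda v: np.array([2*(v[0] - 1), 20*(v[1] + 2)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_nd(grad, hess, [0.0, 0.0]))   # -> [1, -2] in a single step for a quadratic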
Important classification of methods

• Use function + gradient + Hessian (Newton)


• Use function + gradient (most descent methods)
• Use function values only (Nelder-Mead, called
also “simplex”, or “amoeba” method)
Steepest Descent Methods

• What if you can’t / don’t want to


use 2nd derivative?
• “Quasi-Newton” methods estimate Hessian
• Alternative: walk along (negative of) gradient…
– Perform 1-D minimization along line passing through current point in
the direction of the gradient
– Once done, re-compute gradient, iterate
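A minimal steepest-descent sketch along these lines (my own illustration; the 1-D line search reuses SciPy's scalar minimizer and the test function is a placeholder):

# Steepest descent: 1-D minimization along the negative gradient, then repeat
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        t = minimize_scalar(lambda t: f(x - t * g)).x   # line search along the negative gradient
        x = x - t * g
    return x

f = lambda v: (v[0] - 1)**2 + 10*(v[1] + 2)**2
grad = lambda v: np.array([2*(v[0] - 1), 20*(v[1] + 2)])
print(steepest_descent(f, grad, [0.0, 0.0]))   # ~ [1, -2], reached by many zig-zag steps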
Problem With Steepest Descent

[Figures: successive steepest-descent steps zig-zag across a narrow, elongated valley, repeatedly undoing progress made by earlier steps.]
Conjugate Gradient Methods

• Idea: avoid “undoing”


minimization that’s already been
done
• Walk along direction

    d_{k+1} = −g_{k+1} + β_k d_k

• Polak and Ribière formula:

    β_k = g_{k+1}ᵀ (g_{k+1} − g_k) / (g_kᵀ g_k)
Conjugate Gradient Methods

• Conjugate gradient implicitly obtains information about


Hessian
• For quadratic function in n dimensions, gets exact solution in
n steps (ignoring roundoff error)
• Works well in practice…
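A compact sketch of nonlinear conjugate gradient with the Polak-Ribière update (my own illustration, reusing a scalar line search; the test function is a placeholder):

# Nonlinear conjugate gradient with the Polak-Ribiere beta
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        t = minimize_scalar(lambda t: f(x + t * d)).x   # 1-D minimization along d
        x = x + t * d
        g_new = grad(x)
        beta = g_new @ (g_new - g) / (g @ g)            # Polak-Ribiere formula
        d = -g_new + beta * d
        g = g_new
    return x

f = lambda v: (v[0] - 1)**2 + 10*(v[1] + 2)**2
grad = lambda v: np.array([2*(v[0] - 1), 20*(v[1] + 2)])
print(conjugate_gradient(f, grad, [0.0, 0.0]))          # ~ [1, -2]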
Value-Only Methods in Multi-Dimensions

• If can’t evaluate gradients, life is hard


• Can use approximate (numerically evaluated) gradients:

    ∇f (x) ≈ ( f (x + δe_1) − f (x),  f (x + δe_2) − f (x),  f (x + δe_3) − f (x),  ... )ᵀ / δ

  where each e_i is a coordinate direction and δ is a small step.
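A small sketch of such a forward-difference gradient (my own illustration):

# Forward-difference approximation of the gradient: one extra function call per coordinate
import numpy as np

def approx_gradient(f, x, delta=1e-6):
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta            # step along the i-th coordinate direction
        g[i] = (f(x + e) - f0) / delta
    return g

print(approx_gradient(lambda v: v[0]**2 + 3*v[1], [1.0, 2.0]))   # ~ [2, 3]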
Generic Optimization Strategies

• Uniform sampling:
– Cost rises exponentially with # of dimensions
• Simulated annealing:
– Search in random directions
– Start with large steps, gradually decrease
– “Annealing schedule” – how fast to cool?
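A very small sketch in the spirit of simulated annealing (my own illustration; the step size, decay rate, and acceptance rule are arbitrary placeholder choices):

# Random search with a decaying step size and occasional acceptance of worse points
import numpy as np

def anneal(f, x0, step0=1.0, decay=0.995, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    step = step0
    for _ in range(n_iter):
        cand = x + step * rng.standard_normal(x.size)   # step in a random direction
        f_cand = f(cand)
        # accept improvements; occasionally accept worse points while the search is "hot"
        if f_cand < fx or rng.random() < np.exp(-(f_cand - fx) / max(step, 1e-12)):
            x, fx = cand, f_cand
        step *= decay                                    # annealing schedule: cool gradually
    return x

print(anneal(lambda v: (v[0] - 1)**2 + (v[1] + 2)**2, [5.0, 5.0]))   # should end up near [1, -2]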
Downhill Simplex Method (Nelder-Mead)

• Keep track of n+1 points in n dimensions


– Vertices of a simplex (a triangle in 2D, a tetrahedron in 3D, etc.)

• At each iteration: simplex can move,


expand, or contract
– Sometimes known as amoeba method:
simplex “oozes” along the function
Downhill Simplex Method (Nelder-Mead)

• Basic operation: reflection

[Figure: the worst point (highest function value) is reflected through the opposite face of the simplex; the reflected point is the location probed by the reflection step.]
Downhill Simplex Method (Nelder-Mead)

• If reflection resulted in best (lowest) value so far,


try an expansion

[Figure: the location probed by the expansion step, farther along the reflection direction.]

• Else, if reflection helped at all, keep it


Downhill Simplex Method (Nelder-Mead)

• If reflection didn’t help (reflected point still


worst) try a contraction

[Figure: the location probed by the contraction step, closer to the simplex.]
Downhill Simplex Method (Nelder-Mead)

• If all else fails shrink the simplex around


the best point
Downhill Simplex Method (Nelder-Mead)

• Method fairly efficient at each iteration


(typically 1-2 function evaluations)
• Can take lots of iterations
• Somewhat flaky – sometimes needs a restart after the simplex collapses on itself, etc.
• Benefits: simple to implement, doesn’t need derivative,
doesn’t care about function smoothness, etc.
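In practice the method is available off the shelf; a minimal usage sketch (my own, with a placeholder objective):

# Using SciPy's built-in Nelder-Mead (downhill simplex) implementation
import numpy as np
from scipy.optimize import minimize

f = lambda v: (v[0] - 1)**2 + 10*(v[1] + 2)**2   # derivative-free: only function values are used
res = minimize(f, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print(res.x, res.nfev)   # approximate minimizer and number of function evaluations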
