A Brief Overview of Optimization Problems

Steven G. Johnson
MIT course 18.335, Spring 2021
Why optimization?
• In some sense, all engineering design is
optimization: choosing design parameters to
improve some objective
• Much of data analysis is also optimization:
extracting some model parameters from data while
minimizing some error measure (e.g. fitting)
• Machine learning (e.g. deep neural nets) is based
on optimizing a “loss” function over many model
(NN) parameters
• Most business decisions = optimization: varying
some decision parameters to maximize profit (e.g.
investment portfolios, supply chains, etc.)
A general optimization problem
minimize an objective function f0 with respect to n design parameters x
(also called decision parameters, optimization variables, etc.):

    min_{x∈ℝⁿ} f0(x)

— note that maximizing g(x) corresponds to f0(x) = –g(x)

subject to m constraints:

    fi(x) ≤ 0,  i = 1, …, m

— note that an equality constraint h(x) = 0 yields two inequality constraints
fi(x) = h(x) and fi+1(x) = –h(x)
(although, in practical algorithms, equality constraints typically require special handling)

x is a feasible point if it satisfies all the constraints
feasible region = set of all feasible x
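As a concrete (not-from-the-slides) illustration of this formulation, here is a minimal Python sketch using scipy.optimize.minimize; the objective and constraint functions are arbitrary made-up examples.

# Minimal sketch: min_x f0(x) subject to f1(x) <= 0, via scipy.optimize.minimize.
# The objective and constraint below are illustrative choices only.
import numpy as np
from scipy.optimize import minimize

def f0(x):                      # objective: a simple quadratic
    return (x[0] - 1)**2 + (x[1] + 2)**2

def f1(x):                      # constraint f1(x) <= 0, i.e. x0 + x1 - 1 <= 0
    return x[0] + x[1] - 1

# SciPy expects inequality constraints in the form g(x) >= 0, so pass -f1.
cons = [{"type": "ineq", "fun": lambda x: -f1(x)}]

x0 = np.zeros(2)                # starting guess
result = minimize(f0, x0, constraints=cons)
print(result.x, result.fun)     # a (local) optimum and its objective value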
Important considerations
• Global versus local optimization
• Convex vs. non-convex optimization
• Unconstrained or box-constrained optimization, and
other special-case constraints
• Special classes of functions (linear, etc.)
• Differentiable vs. non-differentiable functions
• Gradient-based vs. derivative-free algorithms
• …
• Zillions of different algorithms, usually restricted to
various special cases, each with strengths/weaknesses
Global vs. Local Optimization
• For general nonlinear functions, most algorithms only
guarantee a local optimum
– that is, a feasible xo such that f0(xo) ≤ f0(x) for all feasible x
within some neighborhood ||x–xo|| < R (for some small R)
• A much harder problem is to find a global optimum: the
minimum of f0 for all feasible x
– exponentially increasing difficulty with increasing n, practically
impossible to guarantee that you have found global minimum
without knowing some special property of f0
– many available algorithms, problem-dependent efficiencies
• not just genetic algorithms or simulated annealing (which are popular,
easy to implement, and thought-provoking, but usually very slow!)
• for example, non-random systematic search algorithms (e.g. DIRECT),
partially randomized searches (e.g. CRS2), repeated local searches from
different starting points (“multistart” algorithms, e.g. MLSL), …
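A minimal multistart sketch along these lines (illustrative only, not a specific named algorithm such as MLSL): run a local optimizer from many random starting points and keep the best local optimum found; the multimodal objective is made up.

# Multistart sketch: repeated local searches from random starting points.
import numpy as np
from scipy.optimize import minimize

def f0(x):
    return np.sin(3 * x[0]) + (x[0] - 0.5)**2   # several local minima

rng = np.random.default_rng(0)
best = None
for _ in range(20):
    x_start = rng.uniform(-3, 3, size=1)        # random starting point
    res = minimize(f0, x_start)
    if best is None or res.fun < best.fun:
        best = res

print(best.x, best.fun)   # best local optimum found (still no global guarantee)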
Convex Optimization
[ good reference: Convex Optimization by Boyd and Vandenberghe,
free online at www.stanford.edu/~boyd/cvxbook ]

All the functions fi (i = 0…m) are convex:

    fi(αx + βy) ≤ α fi(x) + β fi(y)    for all α, β ≥ 0 with α + β = 1

[figure: a convex f(x) lies on or below the chord α f(x) + β f(y) between any two
points x and y; a non-convex f(x) rises above it somewhere]

For a convex problem (convex objective & constraints)
any local optimum must be a global optimum
⇒ efficient, robust solution methods available
Important Convex Problems
• LP (linear programming): the objective and
constraints are affine: fi(x) = aiᵀx + αi
• QP (quadratic programming): affine constraints +
convex quadratic objective xᵀAx + bᵀx
• SOCP (second-order cone program): LP + cone
constraints ||Ax+b||2 ≤ aᵀx + α
• SDP (semidefinite programming): constraints are that
ΣAkxk is positive-semidefinite
all of these have very efficient, specialized solution methods
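For instance, a tiny LP can be handed to scipy.optimize.linprog (which dispatches to a specialized LP solver); the data below are arbitrary illustrative numbers.

# Tiny LP sketch: minimize c^T x subject to A_ub x <= b_ub, x >= 0.
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])               # maximize x0 + 2*x1 by minimizing its negative
A_ub = np.array([[-1.0, 1.0],            # -x0 + x1 <= 1
                 [ 1.0, 1.0]])           #  x0 + x1 <= 4
b_ub = np.array([1.0, 4.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x, -res.fun)                   # expect x ≈ [1.5, 2.5], value 6.5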
Non-convex local optimization:
a typical generic outline
[ many, many variations in details !!! ]

1. At current x, construct an approximate model of fi
   — e.g. affine, quadratic, … often convex
2. Optimize the model problem ⇒ new x
   — use a trust region to prevent large steps
3. Evaluate new x:
   — if “acceptable,” go to 1
   — if bad step (or bad model), update
     trust region / model and go to 2
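A crude Python sketch of this loop (illustrative only, not a production trust-region method): here the "model" is just the linear model from a finite-difference gradient, minimized within a trust region, with accept/shrink decisions based on actual versus predicted decrease; the objective is made up.

# Sketch of the generic model-based loop above.
import numpy as np

def f0(x):
    return (x[0] - 2)**2 + 3 * (x[1] + 1)**2      # made-up smooth objective

def grad(f, x, h=1e-6):                           # finite-difference gradient
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x, radius = np.zeros(2), 1.0
for _ in range(50):
    g = grad(f0, x)
    step = -g                                     # step from the (linear) model
    if np.linalg.norm(step) > radius:             # restrict step to the trust region
        step *= radius / np.linalg.norm(step)
    predicted = -g @ step                         # decrease predicted by the model
    actual = f0(x) - f0(x + step)                 # actual decrease of f0
    if actual > 0.1 * predicted:                  # "acceptable" step: accept, grow region
        x, radius = x + step, radius * 1.5
    else:                                         # bad step: shrink trust region, retry
        radius *= 0.5

print(x, f0(x))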
Important special constraints
• Simplest case is the unconstrained optimization
problem: m=0
– e.g., line-search methods like steepest-descent,
nonlinear conjugate gradients, Newton methods …
• Next-simplest are box constraints (also called
bound constraints): xkmin ≤ xk ≤ xkmax
– easily incorporated into line-search methods and many
other algorithms
– many algorithms/software only handle box constraints
• …
• Linear equality constraints Ax = b
– for example, can be explicitly eliminated from the
problem by writing x = Ny + x0, where x0 is a particular solution
of Ax = b and N is a basis for the nullspace of A
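A small sketch of this elimination (assuming scipy is available; the constraint and objective are made up): compute a particular solution and a nullspace basis, then optimize freely over the reduced variables y.

# Eliminating Ax = b by writing x = N y + x_p and optimizing over y.
import numpy as np
from scipy.linalg import null_space, lstsq
from scipy.optimize import minimize

A = np.array([[1.0, 1.0, 1.0]])      # one equality constraint: x0 + x1 + x2 = 1
b = np.array([1.0])

x_p = lstsq(A, b)[0]                 # a particular solution of A x = b
N = null_space(A)                    # basis for the nullspace of A (3x2 here)

def f0(x):                           # made-up objective in the original variables
    return (x[0] - 1)**2 + x[1]**2 + (x[2] + 1)**2

res = minimize(lambda y: f0(N @ y + x_p), np.zeros(N.shape[1]))
x_opt = N @ res.x + x_p              # map back to the original variables
print(x_opt, A @ x_opt)              # A @ x_opt should equal b (≈ 1)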
Derivatives of fi
• Most-efficient algorithms typically require user to
supply the gradients ∇x fi of objective/constraints
– you should always compute these analytically
• rather than use finite-difference approximations, better to just
use a derivative-free optimization algorithm
• in principle, one can always compute ∇x fi with about the same
cost as fi, using adjoint methods, or equivalently “reverse-
mode” automatic differentiation (AD)
– gradient-based algorithms can find (local) optima of
problems with millions of design parameters!
• Derivative-free methods: only require fi values
– easier to use, can work with complicated “black-box”
functions where computing gradients is inconvenient
– may be only possibility for nondifferentiable problems
– need > n function evaluations, bad for large n
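As one way to supply such gradients (a sketch assuming the jax package is installed; the objective is arbitrary): obtain ∇x f0 by reverse-mode automatic differentiation and pass it to a gradient-based optimizer.

# Reverse-mode AD gradient supplied to a gradient-based optimizer.
import numpy as np
import jax
import jax.numpy as jnp
from scipy.optimize import minimize

jax.config.update("jax_enable_x64", True)     # double precision to match SciPy

def f0(x):                                    # smooth objective written with jax.numpy
    return jnp.sum((x - jnp.arange(x.size))**2) + jnp.sum(jnp.sin(x))

grad_f0 = jax.grad(f0)                        # gradient at roughly the cost of one f0

x0 = np.zeros(5)
res = minimize(lambda x: float(f0(jnp.asarray(x))),
               x0,
               jac=lambda x: np.asarray(grad_f0(jnp.asarray(x))),
               method="BFGS")
print(res.x)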
Removable non-differentiability
consider the non-differentiable unconstrained problem:

    min_{x∈ℝⁿ} |f0(x)|

[figure: |f0(x)| is the pointwise maximum of f0(x) and –f0(x), with a kink at the optimum]

equivalent to minimax problem:

    min_{x∈ℝⁿ} max{ f0(x), –f0(x) }

…still nondifferentiable…

…equivalent to constrained problem with a “temporary” variable t:

    min_{x∈ℝⁿ, t∈ℝ} t   subject to:  t ≥ f0(x),  t ≥ –f0(x)

    i.e.  f1(x,t) = f0(x) – t
          f2(x,t) = –f0(x) – t

…which is differentiable!
(also called “epigraph” reformulation)
Example: Chebyshev linear fitting
fit a line x1·a + x2 to N points (ai, bi):
find the fit that minimizes the maximum error:

    min_{x1,x2} maxᵢ | x1·ai + x2 – bi |  =  min_x ||Ax – b||∞

… nondifferentiable minimax problem

equivalent to a linear programming problem (LP):

    min_{x1,x2,t} t   subject to 2N constraints:
        t ≥  x1·ai + x2 – bi          (equivalently:  t ≥ | x1·ai + x2 – bi | )
        t ≥ –x1·ai – x2 + bi

(also called “epigraph” reformulation)
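A small sketch of this LP with synthetic data, using scipy.optimize.linprog over the variables z = (x1, x2, t):

# Chebyshev (minimax) line fit as an LP: minimize t subject to the 2N constraints above.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
a = np.linspace(0, 1, 20)                                 # synthetic abscissas
b = 2 * a + 0.5 + rng.uniform(-0.1, 0.1, a.size)          # noisy line b ≈ 2a + 0.5

c = np.array([0.0, 0.0, 1.0])                             # objective: just t
A_ub = np.vstack([np.column_stack([ a,  np.ones_like(a), -np.ones_like(a)]),
                  np.column_stack([-a, -np.ones_like(a), -np.ones_like(a)])])
b_ub = np.concatenate([b, -b])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
x1, x2, t = res.x
print(x1, x2, t)                                          # slope, intercept, max error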


Relaxations of Integer Programming
If x is integer-valued rather than real-valued (e.g. x ∈ {0,1}ⁿ),
the resulting integer programming or combinatorial optimization
problem becomes much harder in general.

However, useful results can often be obtained by a continuous
relaxation of the problem — e.g., going from x ∈ {0,1}ⁿ to x ∈ [0,1]ⁿ
… at the very least, this gives a lower bound on the optimum f0

“Penalty terms” or “projection filters” (SIMP, RAMP, etc.)
can be used to obtain x that is ≈ 0 or ≈ 1 almost everywhere.

[ See e.g. Sigmund & Maute, “Topology optimization approaches,” Struct.
Multidisc. Opt. 48, pp. 1031–1055 (2013). ]
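A toy illustration of such a relaxation (made-up data): relaxing a tiny 0/1 "knapsack"-style problem to x ∈ [0,1]ⁿ gives an LP whose optimum bounds the true combinatorial optimum.

# Continuous relaxation of a 0/1 problem as an LP.
import numpy as np
from scipy.optimize import linprog

value  = np.array([6.0, 5.0, 4.0])      # maximize total value ...
weight = np.array([4.0, 3.0, 2.0])      # ... subject to a weight budget
budget = 4.0

# Relaxed LP: minimize -value^T x, with weight^T x <= budget and 0 <= x <= 1.
res = linprog(-value, A_ub=weight[None, :], b_ub=[budget], bounds=[(0, 1)] * 3)
print(res.x, -res.fun)   # fractional x; -res.fun is an upper bound on the best
                         # value achievable with binary x (i.e. a lower bound on min f0)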
Example: Topology Optimization
design a structure to do something, made of material A or B…
let every pixel of discretized structure vary continuously from A to B
[ + tricks to impose minimum feature size and mostly “binary” A/B ]

[figure: ex. design a cantilever to support maximum weight with a fixed amount of
material; the density of each pixel varies continuously from 0 (air) to max, and the
optimized structure is shown deformed under the applied force/load]
[ Buhl et al, Struct. Multidisc. Optim. 19, 93–104 (2000) ]
Stochastic Optimization
    min_{x∈ℝⁿ} E[ f(x, ξ) ]

where E[⋯] is the expected value, averaging over random variables ξ,
computed by a Monte-Carlo approximation.

Deep-learning example:
Fitting (“learning”) to a huge “training set”
by sampling a random subset Ξ (different for each x):

    E[ f(x, ξ) ] ≈ (1/|Ξ|) Σ_{ξ∈Ξ} f(x, ξ)

∇x f often exists, but typically can’t use standard
gradient-based algorithms because of random “noise.”

A popular “stochastic gradient descent” algorithm: Adam [Kingma & Ba, 2014]
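A minimal stochastic-gradient sketch of this idea (illustrative, plain SGD rather than Adam; the least-squares "training set" is synthetic): sample a random minibatch Ξ at each step and take a gradient step on the minibatch average.

# Minibatch stochastic gradient descent on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10_000, 5))                       # toy "training set"
b = A @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=10_000)

x = np.zeros(5)
lr, batch = 0.05, 64
for step in range(2000):
    idx = rng.integers(0, A.shape[0], size=batch)      # random subset Xi of the data
    Ai, bi = A[idx], b[idx]
    grad = 2 * Ai.T @ (Ai @ x - bi) / batch            # minibatch gradient estimate
    x -= lr * grad                                     # plain SGD step (Adam adds
                                                       # per-coordinate scaling/momentum)
print(x)   # approaches the true coefficients despite the gradient noise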
Some Sources of Software
• NLopt: implements many nonlinear optimization algorithms
callable from many languages (C, Python, R, Matlab, …)
(global/local, constrained/unconstrained, derivative/no-derivative)
http://github.com/stevengj/nlopt

• Python: scipy.optimize, pyOpt, …; Julia: JuMP, Optim,…

• Decision tree for optimization software:


http://plato.asu.edu/guide.html
— lists many (somewhat older) packages for many problems

• CVX: general convex-optimization package http://cvxr.com


… also Python CVXOPT, R CVXR, Julia Convex
