Optimization Problems
Steven G. Johnson
MIT course 18.335, Spring 2021
Why optimization?
• In some sense, all engineering design is
optimization: choosing design parameters to
improve some objective
• Much of data analysis is also optimization:
extracting some model parameters from data while
minimizing some error measure (e.g. fitting)
• Machine learning (e.g. deep neural nets) is based
on optimizing a “loss” function over many model
(NN) parameters
• Most business decisions = optimization: varying
some decision parameters to maximize profit (e.g.
investment portfolios, supply chains, etc.)
A general optimization problem
minimize an objective function f0
min' () (+) with respect to n design parameters x
$∈ℝ
(also called decision parameters, optimization variables, etc.)
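As a concrete illustration (not from the slides), minimizing a smooth f0 over x ∈ ℝⁿ can be done numerically with e.g. SciPy; the Rosenbrock function below is just a stand-in objective:

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in objective f0(x): the Rosenbrock function (a common test problem)
def f0(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

# Minimize over the n = 2 design parameters from an arbitrary starting guess
result = minimize(f0, x0=np.array([-1.0, 2.0]))
print(result.x)  # should approach the known optimum [1, 1]
```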
[figure: convex vs. non-convex functions — f is convex if
f(αx + βy) ≤ αf(x) + βf(y) for all α, β ≥ 0 with α + β = 1]
For a convex problem (convex objective & constraints)
any local optimum must be a global optimum
⇒ efficient, robust solution methods available
Important Convex Problems
• LP (linear programming): the objective and
constraints are affine: fi(x) = aiTx + αi
• QP (quadratic programming): affine constraints +
convex quadratic objective xTAx + bTx
• SOCP (second-order cone program): LP + cone
constraints ||Ax+b||2 ≤ aTx + α
• SDP (semidefinite programming): constraints are that
∑k Akxk is positive-semidefinite
all of these have very efficient, specialized solution methods
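A tiny LP sketch with SciPy's `linprog` (the numbers are purely illustrative, chosen only to show the affine objective and constraints):

```python
from scipy.optimize import linprog

# minimize c^T x subject to A_ub x <= b_ub and x >= 0 (illustrative data)
c = [-1.0, -2.0]        # i.e. maximize x1 + 2*x2
A_ub = [[1.0, 1.0],     # x1 +   x2 <= 4
        [1.0, 3.0]]     # x1 + 3*x2 <= 6
b_ub = [4.0, 6.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)  # optimal vertex [3, 1]
```

Because the problem is convex, the solver's answer is the global optimum, not just a local one.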
Non-convex local optimization:
a typical generic outline
[ many, many variations in details !!! ]
1 Construct a simple model of the objective (and constraints)
around the current point x (e.g. a local quadratic approximation)
2 Optimize the model to propose a new x, typically restricted to a
“trust region” where the model is believed to be accurate
3 Evaluate new x:
— if “acceptable,” go to 1
— if bad step (or bad model), update
trust region / model and go to 2
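The generic loop can be sketched in a few lines of Python. All details here are assumptions for illustration (quadratic model from exact gradient/Hessian, Cauchy-point step along −g, textbook-style radius updates); real implementations vary in every one of these choices:

```python
import numpy as np

def trust_region_minimize(f, grad, hess, x, radius=1.0, tol=1e-8, max_iter=200):
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        gn = np.linalg.norm(g)
        if gn < tol:
            break
        # Steps 1-2: quadratic model m(p) = f(x) + g.p + 0.5 p.H.p, minimized
        # along -g subject to ||p|| <= radius (the "Cauchy point")
        gHg = g @ H @ g
        alpha = min(gn**2 / gHg, radius / gn) if gHg > 0 else radius / gn
        p = -alpha * g
        predicted = -(g @ p + 0.5 * p @ H @ p)
        # Step 3: evaluate the candidate; accept the step and/or adjust radius
        actual = f(x) - f(x + p)
        rho = actual / predicted if predicted > 0 else -1.0
        if rho > 0.1:            # "acceptable" step: move
            x = x + p
        if rho < 0.25:           # poor model: shrink the trust region
            radius *= 0.5
        elif rho > 0.75:         # very good model: cautiously grow it
            radius = min(2.0 * radius, 10.0)
    return x

# Usage on a smooth test problem: minimize ||x - 1||^2 from a rough guess
f = lambda x: np.sum((x - 1.0) ** 2)
g = lambda x: 2.0 * (x - 1.0)
H = lambda x: 2.0 * np.eye(len(x))
x_opt = trust_region_minimize(f, g, H, np.array([3.0, -2.0]))
print(x_opt)  # close to [1, 1]
```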
Important special constraints
• Simplest case is the unconstrained optimization
problem: m=0
– e.g., line-search methods like steepest-descent,
nonlinear conjugate gradients, Newton methods …
• Next-simplest are box constraints (also called
bound constraints): xkmin ≤ xk ≤ xkmax
– easily incorporated into line-search methods and many
other algorithms
– many algorithms/software only handle box constraints
• …
• Linear equality constraints Ax=b
– for example, can be explicitly eliminated from the
problem by writing x = Ny + ξ, where ξ is any particular solution of
Aξ = b and the columns of N are a basis for the nullspace of A
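This elimination can be sketched with NumPy/SciPy (the constraint data here is illustrative):

```python
import numpy as np
from scipy.linalg import lstsq, null_space

# Illustrative equality constraint Ax = b: 1 equation, 3 unknowns
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

xi, *_ = lstsq(A, b)     # one particular solution of A @ xi = b
N = null_space(A)        # columns span the nullspace of A

# Any x = N @ y + xi satisfies Ax = b, so one can optimize freely over y
y = np.array([0.3, -0.7])  # arbitrary reduced variables
x = N @ y + xi
print(A @ x)               # -> approximately [1.], for every choice of y
```

The optimization then runs over the unconstrained reduced variables y, with n minus rank(A) degrees of freedom.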
Derivatives of fi
• Most-efficient algorithms typically require user to
supply the gradients ∇xfi of objective/constraints
– you should always compute these analytically
• rather than use finite-difference approximations, better to just
use a derivative-free optimization algorithm
• in principle, one can always compute ∇xfi with about the same
cost as fi, using adjoint methods, or equivalently “reverse-mode”
automatic differentiation (AD)
– gradient-based algorithms can find (local) optima of
problems with millions of design parameters!
• Derivative-free methods: only require fi values
– easier to use, can work with complicated “black-box”
functions where computing gradients is inconvenient
– may be only possibility for nondifferentiable problems
– need > n function evaluations, bad for large n
Removable non-differentiability
consider the non-differentiable unconstrained problem:
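The slide's equation did not survive extraction; a standard example of this kind (an assumption on my part, not necessarily the one on the slide) is minimizing an absolute value, whose non-differentiability at zero is removed by adding a slack variable t:

```latex
\min_{x\in\mathbb{R}^n} |f(x)|
\quad\Longleftrightarrow\quad
\min_{x\in\mathbb{R}^n,\; t\in\mathbb{R}} t
\quad\text{subject to}\quad f(x) \le t,\;\; -f(x) \le t .
```

The reformulated problem has a smooth objective and smooth constraints, so standard gradient-based solvers apply.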
Deep-learning example:
Fitting (“learning”) to a huge “training set”
by sampling a random subset Ξ (different for each x):
f(x, D) ≈ (1/|Ξ|) ∑ξ∈Ξ f(x, ξ)
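A minimal sketch of this minibatch idea (stochastic gradient descent on a toy least-squares loss; all names, sizes, and step-size choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training set": N samples from a noisy linear model
N, n = 10_000, 3
true_w = np.array([1.0, -2.0, 0.5])
data_X = rng.normal(size=(N, n))
data_y = data_X @ true_w + 0.01 * rng.normal(size=N)

def minibatch_grad(w, batch_size=64):
    # Sample a random subset Xi (different on every call) and average the
    # per-sample squared-error gradients over that subset only
    idx = rng.integers(0, N, size=batch_size)
    X, y = data_X[idx], data_y[idx]
    return 2.0 / batch_size * X.T @ (X @ w - y)

w = np.zeros(n)
for step in range(2000):
    w -= 0.01 * minibatch_grad(w)  # plain SGD update
print(w)  # approaches true_w
```

Each step touches only |Ξ| = 64 of the 10,000 samples, which is why this scales to huge training sets.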