Tutorial: Computational Stochastic Optimization (INFORMS, November 15, 2014)
Warren B. Powell
Princeton University
Department of Operations Research
and Financial Engineering
\min \; \mathbb{E}\{cx\}
\text{s.t.} \quad Ax = b, \quad x \ge 0
[Figure: the mathematician's model is handed off to software, which must organize class libraries and set up communications and databases]
The biggest challenge when making decisions
under uncertainty is modeling.
Deterministic modeling
For deterministic problems, we speak the language
of mathematical programming
» For static problems:
\min_x \; cx \quad \text{s.t.} \quad Ax \le b, \; x \ge 0
» For time-staged problems:
\min \sum_{t=0}^{T} c_t x_t \quad \text{s.t.} \quad A_t x_t - B_{t-1} x_{t-1} = b_t, \quad D_t x_t \le u_t, \quad x_t \ge 0
Arguably Dantzig's biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.
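As a concrete illustration, a minimal sketch of the static standard form mapped onto an LP solver; the data c, A, b are invented:

```python
# A toy instance of the standard format: min cx s.t. Ax = b, x >= 0.
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])        # cost vector c
A_eq = np.array([[1.0, 1.0]])   # constraint matrix A
b_eq = np.array([4.0])          # right-hand side b

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 2)
print(res.x, res.fun)           # optimal decision and objective value
```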
Stochastic programming
Robust optimization
Approximate dynamic programming
Model predictive control
Optimal control
Online learning
Reinforcement learning
\min_{x_0 \in X_0} \; c_0 x_0 + \mathbb{E}\, Q(x_0, \omega_1)
where
Q(x_0, \omega_1(\omega)) = \min_{x_1(\omega) \in X_1(\omega)} c_1(\omega) x_1(\omega)
» This is the canonical form of stochastic programming, which might also be written over multiple periods:
\min \; c_0 x_0 + \sum_{\omega \in \Omega} p(\omega) \sum_{t=1}^{T} c_t(\omega) x_t(\omega)
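A minimal sketch of this two-stage structure on a scalar, newsvendor-style example; the costs, scenarios, and probabilities below are invented, and the expectation is simply the weighted sum over scenarios:

```python
# Two-stage form: min_{x0} c0*x0 + sum_w p(w) Q(x0, w), with Q solved as an LP.
import numpy as np
from scipy.optimize import linprog

scenarios = [(0.3, 5.0), (0.5, 8.0), (0.2, 12.0)]  # (p(w), demand in scenario w)
c0, c1 = 1.0, 4.0                                  # first- and second-stage unit costs

def Q(x0, demand):
    # Second stage: recourse purchase x1 >= demand - x0 at the higher cost c1.
    res = linprog([c1], A_ub=[[-1.0]], b_ub=[x0 - demand], bounds=[(0, None)])
    return res.fun

def total_cost(x0):
    return c0 * x0 + sum(p * Q(x0, d) for p, d in scenarios)

best_x0 = min(np.linspace(0.0, 15.0, 151), key=total_cost)
print(best_x0, total_cost(best_x0))
```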
Modeling as a stochastic program
An alternative strategy is to use the vocabulary of
“stochastic programming.”
\min_{x_t \in X_t} \; c_t x_t + \mathbb{E}\, Q(x_t, \omega_{t+1})
where
Q(x_t, \omega_{t+1}(\omega)) = \min_{x_{t+1}(\omega) \in X_{t+1}(\omega)} c_{t+1}(\omega) x_{t+1}(\omega)
» This is the canonical form of stochastic programming, which might also be written over multiple periods:
\min \; c_t x_t + \sum_{\omega \in \Omega_t} p(\omega) \sum_{t'=t+1}^{t+H} c_{tt'}(\omega) x_{tt'}(\omega)
Modeling using control theory
From “Optimal Control” by
Lewis, Vrabie and Syrmos
• State variables
• Decision variables
• Exogenous information
• Transition function
• Objective function
Control theory:
u_t = a low-dimensional continuous vector
Operations research:
x_t = usually a discrete or continuous, but high-dimensional, vector of decisions
At this point, we do not specify how to make a decision. Instead, we define the function X^\pi(s) (or A^\pi(s) or U^\pi(s)), where \pi specifies the type of policy. "\pi" carries information about the type of function f \in F, and any tunable parameters \theta \in \Theta^f.
Modeling dynamic problems
Exogenous information:
W_t = new information that first became known at time t
    = (\hat R_t, \hat D_t, \hat p_t, \hat E_t)
\hat R_t = equipment failures, delays, new arrivals; new drivers being hired to the network
\hat D_t = new customer demands
\hat p_t = changes in prices
\hat E_t = information about the environment (temperature, ...)
Note: Any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.
Below, we will let \omega represent a sequence of actual observations W_1, W_2, \ldots; W_t(\omega) refers to a sample realization of the random variable W_t.
Modeling dynamic problems
The transition function
S_{t+1} = S^M(S_t, x_t, W_{t+1})
For example:
R_{t+1} = R_t + x_t + \hat R_{t+1}   (inventories)
p_{t+1} = p_t + \hat p_{t+1}   (spot prices)
D_{t+1} = D_t + \hat D_{t+1}   (market demands)
Also known as the: "system model," "transfer function," "state transition model," "plant model," "transformation function," "law of motion," "plant equation," "transition law," or simply the "model."
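A transition function is simply executable code. A minimal sketch of S^M for the inventory/price/demand example above (names and state layout are illustrative):

```python
# S_{t+1} = S^M(S_t, x_t, W_{t+1}) for the three-equation example above.
def transition(S, x, W):
    R, p, D = S              # inventory, spot price, market demand at time t
    R_hat, p_hat, D_hat = W  # exogenous information arriving at time t+1
    return (R + x + R_hat,   # R_{t+1} = R_t + x_t + Rhat_{t+1}
            p + p_hat,       # p_{t+1} = p_t + phat_{t+1}
            D + D_hat)       # D_{t+1} = D_t + Dhat_{t+1}
```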
\min_\pi \mathbb{E} \sum_{t=0}^{T} \gamma^t C\big(S_t, X_t^\pi(S_t)\big)
Here the expectation is over all random outcomes, C is the cost function, S_t is the state variable, and X_t^\pi is the decision function (policy).
Finding the best policy
Given a system model (transition function)
S_{t+1} = S^M(S_t, x_t, W_{t+1})
we may pose problems such as (a sampling-based sketch of these objectives follows below):
» Expected cost: \min_x \; \mathbb{E}\, F(x, W)
» Risk measures, e.g.: \min_x \; \mathbb{E}\, F(x, W) + \theta\, \mathbb{E}\big[(F(x, W) - \bar f)^2\big]
» Worst case (robust): \min_x \max_w F(x, w)
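A minimal sketch of how such objectives can be estimated by Monte Carlo sampling; the function F, the noise model, and the risk weight theta are all invented for illustration:

```python
# Sample-based estimates of min_x E F(x,W) and a risk-adjusted variant.
import numpy as np

rng = np.random.default_rng(0)

def F(x, W):                          # an arbitrary cost for illustration
    return (x - W) ** 2

W_samples = rng.normal(5.0, 2.0, size=1000)   # samples of W

def expected_cost(x):
    return F(x, W_samples).mean()     # estimate of E F(x, W)

def risk_adjusted(x, theta=0.5):      # mean plus a dispersion penalty
    costs = F(x, W_samples)
    return costs.mean() + theta * costs.std()

xs = np.linspace(0.0, 10.0, 101)
print(min(xs, key=expected_cost), min(xs, key=risk_adjusted))
```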
» Deterministic formulation:
• Decision variables: x_0, \ldots, x_T
• Constraints at time t: A_t x_t \le R_t, \; x_t \ge 0
• Transition function: R_{t+1} = b_{t+1} + B_t x_t
» Stochastic formulation:
• Policy: X^\pi : S \to X
• Constraints at time t: x_t = X_t^\pi(S_t) \in X_t
• Transition function: S_{t+1} = S^M(S_t, x_t, W_{t+1})
• Exogenous information: (W_1, W_2, \ldots, W_T)
Stochastic optimization models
With deterministic problems, we want to find the best decision:
\min_{x_0, \ldots, x_T} \sum_{t=0}^{T} c_t x_t
With stochastic problems, the analogous form searches over random variables:
\min \; \mathbb{E} \sum_{t=0}^{T} c_t x_t, where x_t is \mathcal{F}_t-measurable.
A model without an algorithm is like cloud to cloud
lightning…
Instead, we might do one long simulation…
\hat F^\pi = \sum_{t=0}^{T} \gamma^t C\big(S_t(\omega), X_t^\pi(S_t(\omega))\big)
…or we might average across several simulations:
\bar F^\pi = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{T} \gamma^t C\big(S_t(\omega^n), X_t^\pi(S_t(\omega^n))\big)
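A minimal sketch of this averaged simulation; the policy, cost function, transition function, and noise model below are placeholders chosen only to make the loop concrete:

```python
# Estimate F^pi by averaging N simulated sample paths of length T.
import numpy as np

rng = np.random.default_rng(0)
T, N, gamma = 20, 100, 0.99

def policy(S):                 # X^pi(S_t): order up to a fixed level (illustrative)
    return max(0.0, 10.0 - S)

def cost(S, x):                # C(S_t, x_t): purchase plus holding cost
    return 2.0 * x + 1.0 * max(0.0, S)

def transition(S, x, W):       # S^M(S_t, x_t, W_{t+1}): inventory dynamics
    return max(0.0, S + x - W)

F_bar = 0.0
for n in range(N):             # outer sum over sample paths omega^n
    S, F_hat = 5.0, 0.0
    for t in range(T + 1):     # inner sum over time
        x = policy(S)
        F_hat += gamma ** t * cost(S, x)
        S = transition(S, x, rng.poisson(3.0))
    F_bar += F_hat / N
print(F_bar)
```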
Modeling
There are two ways to compute the objective
\min_\pi \mathbb{E} \sum_{t=0}^{T} \gamma^t C\big(S_t, X_t^\pi(S_t)\big):
» Offline learning: we use computer simulation to compute
\hat F^\pi = \sum_{t=0}^{T} \gamma^t C\big(S_t(\omega), X_t^\pi(S_t(\omega))\big)
» Online learning: we observe the performance of a policy in the field.
» Parametric models
• Linear models (“basis functions”)
• Nonlinear models
» Nonparametric models
• Kernel regression
• Nearest neighbor clustering
• Local polynomial regression
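For instance, a linear parametric model is fit by regressing observed values on the basis functions; the data and the quadratic basis below are invented:

```python
# Fit theta in V(s) ~ sum_f theta_f phi_f(s) by least squares.
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(0.0, 10.0, size=200)                        # sampled states
v = 3.0 + 2.0 * s - 0.15 * s**2 + rng.normal(0, 1.0, 200)   # noisy observations

Phi = np.column_stack([np.ones_like(s), s, s**2])  # basis functions phi_f(s)
theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
print(theta)                                       # fitted coefficients theta_f
```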
Policies
(Slightly arrogant) claim: the four classes of policies listed in the outline cover, possibly in hybrid form, every approach for solving
\min_\pi \mathbb{E} \sum_{t=0}^{T} \gamma^t C\big(S_t, X_t^\pi(S_t)\big)
S_{t+1} = S^M(S_t, x_t, W_{t+1})
[Figure: a stochastic graph on nodes 1-11 with sampled arc costs; the traveler is currently at node 6]
S_t = (N_t) = (6)?
The state variable
Illustrating state variables
» A stochastic graph
[Figure: the same graph; the traveler is at node 6, and the costs on the arcs out of node 6 have just been revealed]
S_t = ?
The state variable
Illustrating state variables
» A stochastic graph
[Figure: the same graph, highlighting the observed costs on the arcs out of node 6]
S_t = \big(N_t, (c_{t,N_t,j})_j\big) = \big(6, (12.7, 8.9, 13.5)\big)
The state variable
Illustrating state variables
» A stochastic graph with left turn penalties
[Figure: the same graph; one arc out of node 6 now carries a left-turn penalty, shown as 12.7 (.7)]
S_t = \big(\underbrace{N_t}_{R_t}, \underbrace{(c_{t,N_t,j})_j,\ N_{t-1}}_{I_t}\big) = \big(6, (12.7, 8.9, 13.5), 3\big)
The state variable
Illustrating state variables
» A stochastic graph with generalized learning
[Figure: the same graph; costs observed at node 6 also update beliefs about costs elsewhere in the network]
S_t = ?
The state variable
Illustrating state variables
» A stochastic graph with generalized learning
[Figure: the same graph with belief estimates maintained for every arc]
S_t = \big(\underbrace{N_t}_{R_t}, \underbrace{(c_{t,N_t,j})_j,\ N_{t-1}}_{I_t}, \underbrace{\text{beliefs about all arc costs}}_{K_t}\big)
The state variable
A proposed definition of a state variable:
» The state variable is the minimally dimensioned
function of history that is necessary and sufficient to
calculate the decision function, the cost/reward
function, and the transition function, from time t
onward.
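This definition can be made concrete in code. A minimal sketch of the state for the graph example above, separating the physical state R_t, the information state I_t, and the knowledge (belief) state K_t; the field names and the downstream-node labels are assumptions:

```python
# State variable for the stochastic graph example: R_t, I_t and K_t components.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class State:
    node: int                                    # R_t: the node we occupy
    observed_costs: Dict[int, float] = field(default_factory=dict)  # I_t: (c_{t,N_t,j})_j
    prev_node: Optional[int] = None              # I_t: needed for left-turn penalties
    cost_beliefs: Dict[Tuple[int, int], float] = field(default_factory=dict)  # K_t

# The slide's example: at node 6, arrived from node 3, with observed costs
# out of node 6 (the downstream node labels 8, 9, 10 are assumptions).
S_t = State(node=6, observed_costs={8: 12.7, 9: 8.9, 10: 13.5}, prev_node=3)
```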
[Figure: a charge/discharge policy for energy storage plotted over roughly 70 time periods]
Policy function approximations
A number of fields work on this problem under
different names:
» Stochastic search
» Stochastic programming (“two stage”)
» Simulation optimization
» Black box optimization
» Global optimization
» Control theory (“open loop control”)
» Sequential design of experiments
» Bandit problems (for online learning)
» Ranking and selection (for offline learning)
» Optimal learning
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Schneider National
Cost function approximations
[Figure: assignment network matching drivers to demands]
Cost function approximations
[Figure: assignment networks at times t, t+1, t+2]
The assignment of drivers to loads evolves over time, with new loads
being called in, along with updates to the status of a driver.
Cost function approximations
A purely myopic policy would solve this problem using an assignment problem of the form X_t(S_t) = \arg\min_{x_t \in X_t} c_t x_t.
» Need to accommodate
different sources of
uncertainty.
• Market behavior
• Transit times
• Supplier uncertainty
• Product quality
Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_{tp} be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve
X_t(S_t) = \arg\min_{x_t} \sum_{p \in P} c_p x_{tp}
subject to
\sum_{p \in P} x_{tp} = D_t
x_{tp} \le u_p
x_{tp} \ge 0
» This assumes our demand forecast D_t is accurate.
Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_{tp} be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve
X_t(S_t \mid \theta) = \arg\min_{x_t} \sum_{p \in P} c_p x_{tp}
subject to
\sum_{p \in P} x_{tp} = D_t + \theta   (buffer inventory)
x_{tp} \le u_p
x_{tp} \ge 0
» This is a "parametric cost function approximation" (see the sketch below).
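A minimal sketch of this parametric CFA as a small LP in SciPy; the supplier costs, capacities, and forecast are made up, and theta enters as the buffer on the demand constraint:

```python
# Parametric CFA: meet the forecast D_t plus a tunable buffer theta.
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0, 2.5])      # unit cost c_p for each supplier p
u = np.array([40.0, 60.0, 50.0])   # supplier capacity u_p
D_t = 80.0                         # forecasted demand

def X_cfa(theta):
    # min sum_p c_p x_p  s.t.  sum_p x_p >= D_t + theta,  0 <= x_p <= u_p
    res = linprog(c, A_ub=[-np.ones(3)], b_ub=[-(D_t + theta)],
                  bounds=list(zip(np.zeros(3), u)))
    return res.x

print(X_cfa(theta=0.0))    # the purely myopic policy
print(X_cfa(theta=15.0))   # the buffered policy; theta is then tuned by
                           # simulating min_theta E sum_t C(S_t, X_t(S_t|theta))
```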
Cost function approximations
A general way of creating CFAs:
» Define our policy:
X_t^\pi(\theta) = \arg\min_{x} \Big( C(S_t, x_t) + \sum_{f \in F} \theta_f \phi_f(S_t, x_t) \Big)
Sometimes mistaken as a value function approximation, it is really a cost correction term.
Cost function approximations
An even more general CFA model:
» Define our policy:
\arg\min_x \; cx \quad \text{s.t.} \quad Ax = b(\theta)   (parametrically modified constraints)
The locomotive assignment problem
[Figure: locomotives of 4400-6200 horsepower to be assigned to trains at cities such as Atlanta, Baltimore, and Charlotte]
The locomotive assignment problem
[Figure: the assignment network, now showing train rewards and "locomotive buckets" that capture the value of locomotives in the future]
The locomotive assignment problem
[Figure: the same network with a value function approximation V_t(R_t) plotted as a function of the locomotive resource state R_t]
Value function approximations
Objective function
[Figure: objective function value (roughly 1.2M to 1.9M) plotted against iterations 0-1000]
Model calibration
Deterministic vs. stochastic training
[Figure: value of locomotives at a yard as a function of the number of locomotives, under stochastic vs. deterministic training]
Laboratory testing
Train delay with uncertain transit times
» Stochastic training produces lower delays
Lookahead policies
The ultimate lookahead policy is optimal:
X_t^*(S_t) = \arg\min_{x_t} \Big( C(S_t, x_t) + \min_\pi \mathbb{E}\Big\{ \sum_{t'=t+1}^{T} C\big(S_{t'}, X_{t'}^\pi(S_{t'})\big) \,\Big|\, S_t, x_t \Big\} \Big)
Here the inner \min_\pi is a minimization that we cannot compute, and \mathbb{E} is an expectation that we cannot compute.
Scenario trees
Lookahead policies
Assume the base model has T time periods, but we solve a smaller lookahead model from t to t+H. Following the lookahead policy, the horizon rolls forward in time: (0, 0+H), then (1, 1+H), (2, 2+H), (3, 3+H), ..., (t, t+H).
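A minimal sketch of this rolling-horizon loop; the inner optimization is a toy order-up-to-forecast plan, and all data, forecasts, and costs are invented:

```python
# Rolling-horizon (deterministic lookahead) policy: plan over t..t+H,
# implement the first decision, observe new information, roll forward.
import numpy as np

rng = np.random.default_rng(0)
T, H = 12, 3

def solve_lookahead(S, forecasts):
    # Toy lookahead model: order just enough each period to cover the forecast,
    # drawing down current inventory first; returns the plan for t..t+H-1.
    plan, inv = [], S
    for f in forecasts:
        order = max(0.0, f - inv)
        plan.append(order)
        inv = inv + order - f
    return plan

S, total_cost = 10.0, 0.0
for t in range(T):
    forecasts = [3.0] * H                  # point forecasts for t..t+H-1
    plan = solve_lookahead(S, forecasts)   # optimize the lookahead model
    x = plan[0]                            # implement only the first decision
    W = rng.poisson(3.0)                   # the real process then reveals W_{t+1}
    total_cost += 2.0 * x + 5.0 * max(0.0, W - (S + x))   # order + shortage cost
    S = max(0.0, S + x - W)
print(total_cost)
```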
Lookahead policies
Lookahead policies peek into the future
» Optimize over a deterministic lookahead model
[Figure: at each of t, t+1, t+2, t+3, a deterministic lookahead model is built and solved on top of the real process, which keeps rolling forward]
Modeling stochastic wind
Actual vs. forecasted energy from wind
[Figure: actual vs. forecasted wind energy over the base-model timeline]
Lookahead policies
We can then simulate this lookahead policy over time:
[Figure: the lookahead model re-solved at t, t+1, t+2, t+3 as the base model rolls forward]
Stochastic lookahead policies
Stochastic lookahead: 1) schedule generation; 2) see wind; 3) schedule turbines
Stochastic lookahead policies
Some common misconceptions about stochastic
programming (for sequential problems):
f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
Approximating a function
Parametric vs. nonparametric
[Figure: noisy observations of a true function (total revenue), with a parametric fit and a nonparametric fit overlaid]
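A minimal sketch of the contrast: a quadratic (parametric) fit versus a Nadaraya-Watson kernel (nonparametric) fit, on invented noisy data:

```python
# Parametric vs. nonparametric approximation of a noisy function.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 60))
y = np.sin(x) + 0.1 * x**2 + rng.normal(0, 0.5, 60)   # noisy observations

theta = np.polyfit(x, y, 2)            # parametric: fit a quadratic
y_param = np.polyval(theta, x)

def kernel_fit(x0, h=0.5):             # nonparametric: Nadaraya-Watson smoother
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return (w * y).sum() / w.sum()

y_nonparam = np.array([kernel_fit(x0) for x0 in x])
print(np.abs(y - y_param).mean(), np.abs(y - y_nonparam).mean())
```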
Stochastic lookahead policies
Notes:
» It is nice to talk about simulating a stochastic lookahead model
using a multistage model, but multistage models are almost
impossible to solve (we are not aware of any testing of multistage
stochastic unit commitment).
» Even two-stage approximations of lookahead models are quite
difficult for many applications, so simulating these policies remains
quite difficult, and researchers typically do not even develop a
simulator to test their policies.
» In our experience, simulations of stochastic lookahead models
tend to consist of sampling scenarios from the lookahead model.
They should be tested on a full base model.
» In a real application such as stochastic unit commitment, a number
of approximations are required in the lookahead model that should
not be present in the base model.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
An energy storage problem
Consider a basic energy storage problem:
E_{t+1} = E_t + \hat E_{t+1}   (energy from wind)
p_{t+1} = \theta_0 p_t + \theta_1 p_{t-1} + \theta_2 p_{t-2} + \varepsilon_{t+1}^p   (electricity prices)
D_{t+1} = D_t + \hat D_{t+1}   (demand)
f_t^L provided exogenously (forecast of the load)
R_{t+1}^{battery} = R_t^{battery} + x_t   (energy in storage)
An energy storage problem
Objective function
C(S_t, x_t) = p_t \big( x_t^{GB} + x_t^{GL} \big)
\min_\pi \mathbb{E} \sum_{t=0}^{T} C\big(S_t, X_t^\pi(S_t)\big)
An energy storage problem
State variables
» S_t = \big( R_t, E_t, L_t, (p_t, p_{t-1}, p_{t-2}), f_t^L \big)
    p_t = price of electricity
» Decision function and constraints (a simulation sketch of the full model follows below)
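Putting the pieces together, a minimal simulation sketch of this storage model under a simple hand-coded buy-low/discharge-high policy; the price coefficients, noise models, rate limits, and policy thresholds are all assumptions:

```python
# Simulate the energy storage model: state (R, E, D, prices), transition, cost.
import numpy as np

rng = np.random.default_rng(0)
T, R_max, rate = 48, 50.0, 5.0
theta = (0.6, 0.3, 0.1)                       # price-model coefficients

R, E, D = 25.0, 10.0, 20.0                    # storage, wind energy, demand
p, p1, p2 = 30.0, 30.0, 30.0                  # p_t, p_{t-1}, p_{t-2}
total_cost = 0.0
for t in range(T):
    # Policy: charge from the grid when the price is low; discharge when high.
    discharge = min(R, rate) if p > 32.0 else 0.0
    xGB = min(R_max - R, rate) if p < 28.0 else 0.0   # grid -> battery
    xGL = max(0.0, D - E - discharge)                 # grid -> load
    total_cost += p * (xGB + xGL)                     # C(S_t,x_t) = p_t (xGB + xGL)
    # Transition function S_{t+1} = S^M(S_t, x_t, W_{t+1}):
    R = min(R_max, R + xGB - discharge)
    E = max(0.0, E + rng.normal(0.0, 2.0))            # E_{t+1} = E_t + Ehat_{t+1}
    p, p1, p2 = (theta[0]*p + theta[1]*p1 + theta[2]*p2
                 + rng.normal(0.0, 1.0)), p, p1
    D = max(0.0, D + rng.normal(0.0, 1.5))            # D_{t+1} = D_t + Dhat_{t+1}
print(total_cost)
```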
Each of these fields is, at its heart, using one (or a hybrid) of the four classes of policies, such as lookahead policies or policy function approximations:
Stochastic programming
Robust optimization
Approximate dynamic programming
Model predictive control
Optimal control
Online learning
Reinforcement learning
Markov decision processes
Thank you!
For a related tutorial, go to:
http://www.castlelab.princeton.edu
http://www.castlelab.princeton.edu/jungle.htm