Tutorial: Computational Stochastic Optimization (INFORMS, November 15, 2014)
Warren B. Powell
Princeton University
Department of Operations Research
and Financial Engineering
\min \; \mathbb{E}\{cx\}
\text{s.t.} \quad Ax = b, \quad x \ge 0
[Figure: the mathematician's model is handed off to software, which must organize class libraries and set up communications and databases]
The biggest challenge when making decisions
under uncertainty is modeling.
Deterministic modeling
For deterministic problems, we speak the language
of mathematical programming
» For static problems:
\min_x \; cx \quad \text{s.t.} \quad Ax \le b, \; x \ge 0
» For time-staged problems:
\min \sum_{t=0}^{T} c_t x_t \quad \text{s.t.} \quad A_t x_t - B_{t-1} x_{t-1} = b_t, \quad D_t x_t \le u_t, \quad x_t \ge 0
Arguably Dantzig's biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.
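As a concrete illustration, a minimal sketch of the static standard form mapped onto an LP solver; the data c, A, b are invented:

```python
# A toy instance of the standard format: min cx s.t. Ax = b, x >= 0.
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])        # cost vector c
A_eq = np.array([[1.0, 1.0]])   # constraint matrix A
b_eq = np.array([4.0])          # right-hand side b

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 2)
print(res.x, res.fun)           # optimal decision and objective value
```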
Stochastic programming
Robust optimization
Approximate dynamic programming
Model predictive control
Optimal control
Online learning
Reinforcement learning
\min_{x_0 \in X_0} \; c_0 x_0 + \mathbb{E}\, Q(x_0, \omega_1)
where
Q(x_0, \omega_1(\omega)) = \min_{x_1(\omega) \in X_1(\omega)} c_1(\omega) x_1(\omega)
» This is the canonical form of stochastic programming, which might also be written over multiple periods:
\min \; c_0 x_0 + \sum_{\omega \in \Omega} p(\omega) \sum_{t=1}^{T} c_t(\omega) x_t(\omega)
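A minimal sketch of this two-stage structure on a scalar, newsvendor-style example; the costs, scenarios, and probabilities below are invented, and the expectation is simply the weighted sum over scenarios:

```python
# Two-stage form: min_{x0} c0*x0 + sum_w p(w) Q(x0, w), with Q solved as an LP.
import numpy as np
from scipy.optimize import linprog

scenarios = [(0.3, 5.0), (0.5, 8.0), (0.2, 12.0)]  # (p(w), demand in scenario w)
c0, c1 = 1.0, 4.0                                  # first- and second-stage unit costs

def Q(x0, demand):
    # Second stage: recourse purchase x1 >= demand - x0 at the higher cost c1.
    res = linprog([c1], A_ub=[[-1.0]], b_ub=[x0 - demand], bounds=[(0, None)])
    return res.fun

def total_cost(x0):
    return c0 * x0 + sum(p * Q(x0, d) for p, d in scenarios)

best_x0 = min(np.linspace(0.0, 15.0, 151), key=total_cost)
print(best_x0, total_cost(best_x0))
```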
Modeling as a stochastic program
An alternative strategy is to use the vocabulary of
“stochastic programming.”
\min_{x_t \in X_t} \; c_t x_t + \mathbb{E}\, Q(x_t, \omega_{t+1})
where
Q(x_t, \omega_{t+1}(\omega)) = \min_{x_{t+1}(\omega) \in X_{t+1}(\omega)} c_{t+1}(\omega) x_{t+1}(\omega)
» This is the canonical form of stochastic programming, which might also be written over multiple periods:
\min \; c_t x_t + \sum_{\omega \in \Omega_t} p(\omega) \sum_{t'=t+1}^{t+H} c_{tt'}(\omega) x_{tt'}(\omega)
Modeling using control theory
From “Optimal Control” by
Lewis, Vrabie and Syrmos
• State variables
• Decision variables
• Exogenous information
• Transition function
• Objective function
Control theory:
u_t = a low-dimensional continuous vector
Operations research:
x_t = usually a discrete or continuous, but high-dimensional, vector of decisions
At this point, we do not specify how to make a decision. Instead, we define the function X^\pi(s) (or A^\pi(s) or U^\pi(s)), where \pi specifies the type of policy. "\pi" carries information about the type of function f \in F, and any tunable parameters \theta \in \Theta^f.
Modeling dynamic problems
Exogenous information:
W_t = new information that first became known at time t
    = (\hat R_t, \hat D_t, \hat p_t, \hat E_t)
\hat R_t = equipment failures, delays, new arrivals; new drivers being hired to the network
\hat D_t = new customer demands
\hat p_t = changes in prices
\hat E_t = information about the environment (temperature, ...)
Note: Any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.
Below, we will let \omega represent a sequence of actual observations W_1, W_2, \ldots; W_t(\omega) refers to a sample realization of the random variable W_t.
Modeling dynamic problems
The transition function
S_{t+1} = S^M(S_t, x_t, W_{t+1})
For example:
R_{t+1} = R_t + x_t + \hat R_{t+1}   (inventories)
p_{t+1} = p_t + \hat p_{t+1}   (spot prices)
D_{t+1} = D_t + \hat D_{t+1}   (market demands)
Also known as the: "system model," "transfer function," "state transition model," "plant model," "transformation function," "law of motion," "plant equation," "transition law," or simply the "model."
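A transition function is simply executable code. A minimal sketch of S^M for the inventory/price/demand example above (names and state layout are illustrative):

```python
# S_{t+1} = S^M(S_t, x_t, W_{t+1}) for the three-equation example above.
def transition(S, x, W):
    R, p, D = S              # inventory, spot price, market demand at time t
    R_hat, p_hat, D_hat = W  # exogenous information arriving at time t+1
    return (R + x + R_hat,   # R_{t+1} = R_t + x_t + Rhat_{t+1}
            p + p_hat,       # p_{t+1} = p_t + phat_{t+1}
            D + D_hat)       # D_{t+1} = D_t + Dhat_{t+1}
```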
\min_\pi \mathbb{E} \sum_{t=0}^{T} \gamma^t C\big(S_t, X_t^\pi(S_t)\big)
Here the expectation is over all random outcomes, C is the cost function, S_t is the state variable, and X_t^\pi is the decision function (policy).
Finding the best policy
Given a system model (transition function)
S_{t+1} = S^M(S_t, x_t, W_{t+1})
we may pose problems such as (a sampling-based sketch of these objectives follows below):
» Expected cost: \min_x \; \mathbb{E}\, F(x, W)
» Risk measures, e.g.: \min_x \; \mathbb{E}\, F(x, W) + \theta\, \mathbb{E}\big[(F(x, W) - \bar f)^2\big]
» Worst case (robust): \min_x \max_w F(x, w)
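A minimal sketch of how such objectives can be estimated by Monte Carlo sampling; the function F, the noise model, and the risk weight theta are all invented for illustration:

```python
# Sample-based estimates of min_x E F(x,W) and a risk-adjusted variant.
import numpy as np

rng = np.random.default_rng(0)

def F(x, W):                          # an arbitrary cost for illustration
    return (x - W) ** 2

W_samples = rng.normal(5.0, 2.0, size=1000)   # samples of W

def expected_cost(x):
    return F(x, W_samples).mean()     # estimate of E F(x, W)

def risk_adjusted(x, theta=0.5):      # mean plus a dispersion penalty
    costs = F(x, W_samples)
    return costs.mean() + theta * costs.std()

xs = np.linspace(0.0, 10.0, 101)
print(min(xs, key=expected_cost), min(xs, key=risk_adjusted))
```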
» Deterministic formulation:
• Decision variables: x_0, \ldots, x_T
• Constraints at time t: A_t x_t \le R_t, \; x_t \ge 0
• Transition function: R_{t+1} = b_{t+1} + B_t x_t
» Stochastic formulation:
• Policy: X^\pi : S \to X
• Constraints at time t: x_t = X_t^\pi(S_t) \in X_t
• Transition function: S_{t+1} = S^M(S_t, x_t, W_{t+1})
• Exogenous information: (W_1, W_2, \ldots, W_T)
Stochastic optimization models
With deterministic problems, we want to find the best decision:
\min_{x_0, \ldots, x_T} \sum_{t=0}^{T} c_t x_t
With stochastic problems, the analogous form searches over random variables:
\min \; \mathbb{E} \sum_{t=0}^{T} c_t x_t, where x_t is \mathcal{F}_t-measurable.
A model without an algorithm is like cloud to cloud
lightning…
Instead, we might do one long simulation…
\hat F^\pi = \sum_{t=0}^{T} \gamma^t C\big(S_t(\omega), X_t^\pi(S_t(\omega))\big)
…or we might average across several simulations:
\bar F^\pi = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=0}^{T} \gamma^t C\big(S_t(\omega^n), X_t^\pi(S_t(\omega^n))\big)
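A minimal sketch of this averaged simulation; the policy, cost function, transition function, and noise model below are placeholders chosen only to make the loop concrete:

```python
# Estimate F^pi by averaging N simulated sample paths of length T.
import numpy as np

rng = np.random.default_rng(0)
T, N, gamma = 20, 100, 0.99

def policy(S):                 # X^pi(S_t): order up to a fixed level (illustrative)
    return max(0.0, 10.0 - S)

def cost(S, x):                # C(S_t, x_t): purchase plus holding cost
    return 2.0 * x + 1.0 * max(0.0, S)

def transition(S, x, W):       # S^M(S_t, x_t, W_{t+1}): inventory dynamics
    return max(0.0, S + x - W)

F_bar = 0.0
for n in range(N):             # outer sum over sample paths omega^n
    S, F_hat = 5.0, 0.0
    for t in range(T + 1):     # inner sum over time
        x = policy(S)
        F_hat += gamma ** t * cost(S, x)
        S = transition(S, x, rng.poisson(3.0))
    F_bar += F_hat / N
print(F_bar)
```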
Modeling
There are two ways to compute the objective
\min_\pi \mathbb{E} \sum_{t=0}^{T} \gamma^t C\big(S_t, X_t^\pi(S_t)\big):
» Offline learning: we use computer simulation to compute
\hat F^\pi = \sum_{t=0}^{T} \gamma^t C\big(S_t(\omega), X_t^\pi(S_t(\omega))\big)
» Online learning: we observe the performance of a policy in the field.
» Parametric models
• Linear models (“basis functions”)
• Nonlinear models
» Nonparametric models
• Kernel regression
• Nearest neighbor clustering
• Local polynomial regression
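For instance, a linear parametric model is fit by regressing observed values on the basis functions; the data and the quadratic basis below are invented:

```python
# Fit theta in V(s) ~ sum_f theta_f phi_f(s) by least squares.
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(0.0, 10.0, size=200)                        # sampled states
v = 3.0 + 2.0 * s - 0.15 * s**2 + rng.normal(0, 1.0, 200)   # noisy observations

Phi = np.column_stack([np.ones_like(s), s, s**2])  # basis functions phi_f(s)
theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
print(theta)                                       # fitted coefficients theta_f
```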
Policies
(Slightly arrogant) claim: the four classes of policies listed in the outline cover, possibly in hybrid form, every approach for solving
\min_\pi \mathbb{E} \sum_{t=0}^{T} \gamma^t C\big(S_t, X_t^\pi(S_t)\big)
S_{t+1} = S^M(S_t, x_t, W_{t+1})
[Figure: a stochastic graph on nodes 1-11 with sampled arc costs; the traveler is currently at node 6]
S_t = (N_t) = (6)?
The state variable
Illustrating state variables
» A stochastic graph
[Figure: the same graph; the traveler is at node 6, and the costs on the arcs out of node 6 have just been revealed]
S_t = ?
The state variable
Illustrating state variables
» A stochastic graph
[Figure: the same graph, highlighting the observed costs on the arcs out of node 6]
S_t = \big(N_t, (c_{t,N_t,j})_j\big) = \big(6, (12.7, 8.9, 13.5)\big)
The state variable
Illustrating state variables
» A stochastic graph with left turn penalties
[Figure: the same graph; one arc out of node 6 now carries a left-turn penalty, shown as 12.7 (.7)]
S_t = \big(\underbrace{N_t}_{R_t}, \underbrace{(c_{t,N_t,j})_j,\ N_{t-1}}_{I_t}\big) = \big(6, (12.7, 8.9, 13.5), 3\big)
The state variable
Illustrating state variables
» A stochastic graph with generalized learning
[Figure: the same graph; costs observed at node 6 also update beliefs about costs elsewhere in the network]
S_t = ?
The state variable
Illustrating state variables
» A stochastic graph with generalized learning
[Figure: the same graph with belief estimates maintained for every arc]
S_t = \big(\underbrace{N_t}_{R_t}, \underbrace{(c_{t,N_t,j})_j,\ N_{t-1}}_{I_t}, \underbrace{\text{beliefs about all arc costs}}_{K_t}\big)
The state variable
A proposed definition of a state variable:
» The state variable is the minimally dimensioned
function of history that is necessary and sufficient to
calculate the decision function, the cost/reward
function, and the transition function, from time t
onward.
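This definition can be made concrete in code. A minimal sketch of the state for the graph example above, separating the physical state R_t, the information state I_t, and the knowledge (belief) state K_t; the field names and the downstream-node labels are assumptions:

```python
# State variable for the stochastic graph example: R_t, I_t and K_t components.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class State:
    node: int                                    # R_t: the node we occupy
    observed_costs: Dict[int, float] = field(default_factory=dict)  # I_t: (c_{t,N_t,j})_j
    prev_node: Optional[int] = None              # I_t: needed for left-turn penalties
    cost_beliefs: Dict[Tuple[int, int], float] = field(default_factory=dict)  # K_t

# The slide's example: at node 6, arrived from node 3, with observed costs
# out of node 6 (the downstream node labels 8, 9, 10 are assumptions).
S_t = State(node=6, observed_costs={8: 12.7, 9: 8.9, 10: 13.5}, prev_node=3)
```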
[Figure: a charge/discharge policy for energy storage plotted over roughly 70 time periods]
Policy function approximations
A number of fields work on this problem under
different names:
» Stochastic search
» Stochastic programming (“two stage”)
» Simulation optimization
» Black box optimization
» Global optimization
» Control theory (“open loop control”)
» Sequential design of experiments
» Bandit problems (for online learning)
» Ranking and selection (for offline learning)
» Optimal learning
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
Schneider National
Cost function approximations
[Figure: assignment network matching drivers to demands]
Cost function approximations
[Figure: assignment networks at times t, t+1, t+2]
The assignment of drivers to loads evolves over time, with new loads
being called in, along with updates to the status of a driver.
Cost function approximations
A purely myopic policy would solve this problem using an assignment problem of the form X_t(S_t) = \arg\min_{x_t \in X_t} c_t x_t.
» Need to accommodate
different sources of
uncertainty.
• Market behavior
• Transit times
• Supplier uncertainty
• Product quality
Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_{tp} be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve
X_t(S_t) = \arg\min_{x_t} \sum_{p \in P} c_p x_{tp}
subject to
\sum_{p \in P} x_{tp} = D_t
x_{tp} \le u_p
x_{tp} \ge 0
» This assumes our demand forecast D_t is accurate.
Robust cost function approximations
Imagine that we want to purchase parts from different suppliers. Let x_{tp} be the amount of product we purchase at time t from supplier p to meet forecasted demand D_t. We would solve
X_t(S_t \mid \theta) = \arg\min_{x_t} \sum_{p \in P} c_p x_{tp}
subject to
\sum_{p \in P} x_{tp} = D_t + \theta   (buffer inventory)
x_{tp} \le u_p
x_{tp} \ge 0
» This is a "parametric cost function approximation" (see the sketch below).
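A minimal sketch of this parametric CFA as a small LP in SciPy; the supplier costs, capacities, and forecast are made up, and theta enters as the buffer on the demand constraint:

```python
# Parametric CFA: meet the forecast D_t plus a tunable buffer theta.
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0, 2.5])      # unit cost c_p for each supplier p
u = np.array([40.0, 60.0, 50.0])   # supplier capacity u_p
D_t = 80.0                         # forecasted demand

def X_cfa(theta):
    # min sum_p c_p x_p  s.t.  sum_p x_p >= D_t + theta,  0 <= x_p <= u_p
    res = linprog(c, A_ub=[-np.ones(3)], b_ub=[-(D_t + theta)],
                  bounds=list(zip(np.zeros(3), u)))
    return res.x

print(X_cfa(theta=0.0))    # the purely myopic policy
print(X_cfa(theta=15.0))   # the buffered policy; theta is then tuned by
                           # simulating min_theta E sum_t C(S_t, X_t(S_t|theta))
```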
Cost function approximations
A general way of creating CFAs:
» Define our policy:
X_t^\pi(\theta) = \arg\min_{x} \Big( C(S_t, x_t) + \sum_{f \in F} \theta_f \phi_f(S_t, x_t) \Big)
Sometimes mistaken as a value function approximation, it is really a cost correction term.
Cost function approximations
An even more general CFA model:
» Define our policy:
\arg\min_x \; cx \quad \text{s.t.} \quad Ax = b(\theta)   (parametrically modified constraints)
The locomotive assignment problem
[Figure: locomotives of 4400-6200 horsepower to be assigned to trains at cities such as Atlanta, Baltimore, and Charlotte]
The locomotive assignment problem
[Figure: the assignment network, now showing train rewards and "locomotive buckets" that capture the value of locomotives in the future]
The locomotive assignment problem
[Figure: the same network with a value function approximation V_t(R_t) plotted as a function of the locomotive resource state R_t]
Value function approximations
Objective function
[Figure: objective function value (roughly 1.2M to 1.9M) plotted against iterations 0-1000]
Model calibration
Deterministic vs. stochastic training
[Figure: value of locomotives at a yard as a function of the number of locomotives, under stochastic vs. deterministic training]
Laboratory testing
Train delay with uncertain transit times
» Stochastic training produces lower delays
Lookahead policies
The ultimate lookahead policy is optimal:
X_t^*(S_t) = \arg\min_{x_t} \Big( C(S_t, x_t) + \min_\pi \mathbb{E}\Big\{ \sum_{t'=t+1}^{T} C\big(S_{t'}, X_{t'}^\pi(S_{t'})\big) \,\Big|\, S_t, x_t \Big\} \Big)
Here the inner \min_\pi is a minimization that we cannot compute, and \mathbb{E} is an expectation that we cannot compute.
Scenario trees
Lookahead policies
Assume the base model has T time periods, but we solve a smaller lookahead model from t to t+H. Following the lookahead policy, the horizon rolls forward in time: (0, 0+H), then (1, 1+H), (2, 2+H), (3, 3+H), ..., (t, t+H).
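A minimal sketch of this rolling-horizon loop; the inner optimization is a toy order-up-to-forecast plan, and all data, forecasts, and costs are invented:

```python
# Rolling-horizon (deterministic lookahead) policy: plan over t..t+H,
# implement the first decision, observe new information, roll forward.
import numpy as np

rng = np.random.default_rng(0)
T, H = 12, 3

def solve_lookahead(S, forecasts):
    # Toy lookahead model: order just enough each period to cover the forecast,
    # drawing down current inventory first; returns the plan for t..t+H-1.
    plan, inv = [], S
    for f in forecasts:
        order = max(0.0, f - inv)
        plan.append(order)
        inv = inv + order - f
    return plan

S, total_cost = 10.0, 0.0
for t in range(T):
    forecasts = [3.0] * H                  # point forecasts for t..t+H-1
    plan = solve_lookahead(S, forecasts)   # optimize the lookahead model
    x = plan[0]                            # implement only the first decision
    W = rng.poisson(3.0)                   # the real process then reveals W_{t+1}
    total_cost += 2.0 * x + 5.0 * max(0.0, W - (S + x))   # order + shortage cost
    S = max(0.0, S + x - W)
print(total_cost)
```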
Lookahead policies
Lookahead policies peek into the future
» Optimize over a deterministic lookahead model
[Figure: at each of t, t+1, t+2, t+3, a deterministic lookahead model is built and solved on top of the real process, which keeps rolling forward]
Modeling stochastic wind
Actual vs. forecasted energy from wind
[Figure: actual vs. forecasted wind energy over the base-model timeline]
Lookahead policies
We can then simulate this lookahead policy over time:
[Figure: the lookahead model re-solved at t, t+1, t+2, t+3 as the base model rolls forward]
Stochastic lookahead policies
Stochastic lookahead: 1) schedule generation; 2) see wind; 3) schedule turbines
Stochastic lookahead policies
Some common misconceptions about stochastic
programming (for sequential problems):
f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
Approximating a function
Parametric vs. nonparametric
[Figure: noisy observations of a true function (total revenue), with a parametric fit and a nonparametric fit overlaid]
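A minimal sketch of the contrast: a quadratic (parametric) fit versus a Nadaraya-Watson kernel (nonparametric) fit, on invented noisy data:

```python
# Parametric vs. nonparametric approximation of a noisy function.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 60))
y = np.sin(x) + 0.1 * x**2 + rng.normal(0, 0.5, 60)   # noisy observations

theta = np.polyfit(x, y, 2)            # parametric: fit a quadratic
y_param = np.polyval(theta, x)

def kernel_fit(x0, h=0.5):             # nonparametric: Nadaraya-Watson smoother
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return (w * y).sum() / w.sum()

y_nonparam = np.array([kernel_fit(x0) for x0 in x])
print(np.abs(y - y_param).mean(), np.abs(y - y_nonparam).mean())
```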
Stochastic lookahead policies
Notes:
» It is nice to talk about simulating a stochastic lookahead model
using a multistage model, but multistage models are almost
impossible to solve (we are not aware of any testing of multistage
stochastic unit commitment).
» Even two-stage approximations of lookahead models are quite
difficult for many applications, so simulating these policies remains
quite difficult, and researchers typically do not even develop a
simulator to test their policies.
» In our experience, simulations of stochastic lookahead models
tend to consist of sampling scenarios from the lookahead model.
They should be tested on a full base model.
» In a real application such as stochastic unit commitment, a number
of approximations are required in the lookahead model that should
not be present in the base model.
Outline
Four basic problems
Modeling sequential decision problems
State variables
The four classes of policies
» Policy function approximations (PFAs)
» Cost function approximations (CFAs)
» Value function approximations (VFAs)
» Lookahead policies
An energy storage illustration
Matching policies to problems
An energy storage problem
Consider a basic energy storage problem:
E_{t+1} = E_t + \hat E_{t+1}   (energy from wind)
p_{t+1} = \theta_0 p_t + \theta_1 p_{t-1} + \theta_2 p_{t-2} + \varepsilon_{t+1}^p   (electricity prices)
D_{t+1} = D_t + \hat D_{t+1}   (demand)
f_t^L provided exogenously (forecast of the load)
R_{t+1}^{battery} = R_t^{battery} + x_t   (energy in storage)
An energy storage problem
Objective function
C(S_t, x_t) = p_t \big( x_t^{GB} + x_t^{GL} \big)
\min_\pi \mathbb{E} \sum_{t=0}^{T} C\big(S_t, X_t^\pi(S_t)\big)
An energy storage problem
State variables
» S_t = \big( R_t, E_t, L_t, (p_t, p_{t-1}, p_{t-2}), f_t^L \big)
    p_t = price of electricity
» Decision function and constraints (a simulation sketch of the full model follows below)
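Putting the pieces together, a minimal simulation sketch of this storage model under a simple hand-coded buy-low/discharge-high policy; the price coefficients, noise models, rate limits, and policy thresholds are all assumptions:

```python
# Simulate the energy storage model: state (R, E, D, prices), transition, cost.
import numpy as np

rng = np.random.default_rng(0)
T, R_max, rate = 48, 50.0, 5.0
theta = (0.6, 0.3, 0.1)                       # price-model coefficients

R, E, D = 25.0, 10.0, 20.0                    # storage, wind energy, demand
p, p1, p2 = 30.0, 30.0, 30.0                  # p_t, p_{t-1}, p_{t-2}
total_cost = 0.0
for t in range(T):
    # Policy: charge from the grid when the price is low; discharge when high.
    discharge = min(R, rate) if p > 32.0 else 0.0
    xGB = min(R_max - R, rate) if p < 28.0 else 0.0   # grid -> battery
    xGL = max(0.0, D - E - discharge)                 # grid -> load
    total_cost += p * (xGB + xGL)                     # C(S_t,x_t) = p_t (xGB + xGL)
    # Transition function S_{t+1} = S^M(S_t, x_t, W_{t+1}):
    R = min(R_max, R + xGB - discharge)
    E = max(0.0, E + rng.normal(0.0, 2.0))            # E_{t+1} = E_t + Ehat_{t+1}
    p, p1, p2 = (theta[0]*p + theta[1]*p1 + theta[2]*p2
                 + rng.normal(0.0, 1.0)), p, p1
    D = max(0.0, D + rng.normal(0.0, 1.5))            # D_{t+1} = D_t + Dhat_{t+1}
print(total_cost)
```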
Each of these fields is, at its heart, using one (or a hybrid) of the four classes of policies, such as lookahead policies or policy function approximations:
Stochastic programming
Robust optimization
Approximate dynamic programming
Model predictive control
Optimal control
Online learning
Reinforcement learning
Markov decision processes
Thank you!
For a related tutorial, go to:
http://www.castlelab.princeton.edu
http://www.castlelab.princeton.edu/jungle.htm