
49th IEEE Conference on Decision and Control, December 15-17, 2010, Hilton Atlanta Hotel, Atlanta, GA, USA

Risk-constrained Markov Decision Processes


Vivek Borkar & Rahul Jain

Abstract— We propose a new constrained Markov decision process framework with risk-type constraints. The risk metric we use is Conditional Value-at-Risk (CVaR), which is gaining popularity in finance. It is a conditional expectation, but the conditioning is defined in terms of the level of the tail probability. We propose an iterative offline algorithm to find the risk-constrained optimal control policy. A stochastic-approximation-inspired learning variant is also sketched.

Index Terms— Constrained Markov decision processes; Risk measures; Stochastic approximations.

I. INTRODUCTION

The theory of single-stage stochastic programming is fairly well developed [6]. Equally well developed is the theory of (unconstrained) stochastic dynamic programming for sequential optimization [4]. Marrying the two theories to develop a theory of sequential stochastic optimization with general constraints has proved to be challenging [31]. Extending stochastic programming methods to multi-stage problems has remained challenging [32], while attempts to extend dynamic programming approaches to optimization with constraints have been successful only when the constraints have a special form [7], [2].

In the theory of constrained Markov decision processes developed so far [2], both the objective function and the constraints have the same form, usually an expectation of a sum, discounted or averaged. This allows for the introduction of an occupation measure. The sequential optimization problem can then be formulated as a convex program. When the objective and constraint functions have different forms, this technique does not work. Moreover, the occupation-measure technique cannot be generalized to probabilistic and general conditional expectation constraints (of the form we will shortly see). Algorithms for stochastic programs with probabilistic/chance or conditional expectation constraints have been developed [25]. However, as with other stochastic programs, it is very difficult to extend these to a multi-stage sequential optimization setting.

Our interest is in solving a sequential stochastic optimization problem with conditional expectation constraints. Our motivation comes from financial risk management. Thus, the constraint is on a risk measure (namely, Conditional Value-at-Risk) which has a conditional expectation form, but is a bit unusual in that the conditioning
Vivek Borkar is with the School of Technology and Computer Science, Tata Institute of Fundamental Research (TIFR) Mumbai. Email: [email protected]. Rahul Jain is with the EE & ISE Departments, University of Southern California, Los Angeles. Email: [email protected]. His research is supported by a James H. Zumberge Faculty Research and Innovation award, the NSF grant IIS-0917410 and the NSF CAREER award CNS0954116.

event is determined by a constraint on a probability (i.e., Value-at-Risk).

Value-at-Risk (VaR) is a popular risk metric in finance. For a real-valued random variable $Y$ on some probability space $(\Omega, \mathcal{F}, P)$, the VaR at level $\alpha$, $\nu = \nu_\alpha(Y)$, is defined to be $\arg\sup_\nu \{ P(Y > \nu) \geq \alpha \}$. Unfortunately, it has many shortcomings, including the fact that it is not subadditive, i.e., the VaR of a portfolio may be greater than the sum of the VaRs of the portfolio constituents. Thus, another measure called Conditional VaR (CVaR) was introduced by Artzner et al. [3], which is defined as $E[Y \mid Y > \nu]$ and depends on the VaR $\nu$ at level $\alpha$. It has been shown that CVaR is a coherent risk measure, i.e., it is convex, monotone, positively homogeneous and translation equivariant, and it is amenable to standard methods of stochastic programming for its optimization [26]. However, often the risk must be measured and optimized not just for the current portfolio, but over time in a sequential manner. Motivated by this, we introduce a constrained Markov decision model where the constraint is on a CVaR-type conditional expectation.

Consider a Markov decision process (MDP) defined on a state space $X$, a control space $U$, with a reward function $r(x,u)$ and a cost function $c(x,u)$ where $x \in X$ and $u \in U$, a transition kernel $P_u(dx'|x)$, and a finite horizon $T$. Let $Y(T+1)$ denote $\sum_{t=0}^{T} c(X_t, u_t)$. Then the CVaR at level $\alpha$ is given by $\rho_\alpha(Y(T+1)) = E[Y(T+1) \mid Y(T+1) > \nu_\alpha(Y(T+1))]$. We would like to maximize the expected reward over a finite time horizon subject to an upper bound on this conditional expectation, the CVaR:
$$\max_u \Big\{ E\Big[\sum_{t=0}^{T} r(X_t, u_t)\Big] : \rho_\alpha(Y(T+1)) \leq C_\alpha \Big\}.$$

This is useful when decision-making may involve multiple objectives, say a reward and a cost. The goal is to maximize the expected total reward over a certain time horizon while making sure that the conditional expectation of the total cost, given that the total cost exceeds some given level, remains bounded. An example of this is the re-insurance business, where re-insurance companies want to collect premiums (the rewards) by providing re-insurance coverage while ensuring that in the case of rare (probability less than $\alpha$) but catastrophic events (e.g., natural calamities such as devastating hurricanes or floods), the expected payouts (the costs) remain bounded.

Standard methods for solving constrained MDPs, such as the occupation-measure technique and the Lagrangian method [2], cannot deal with conditional expectation constraints of CVaR type. In this paper, we give an offline multiple time-scale iterative algorithm to solve this problem. We prove its convergence under certain assumptions. We then propose an online stochastic-approximation-based learning



algorithm.

Literature overview of CMDPs and stochastic programming for risk analysis: Constrained Markov decision process (CMDP) models were first introduced by Derman and Klein [14] in the 1960s. It had been noticed that such models were not generally amenable to solution by dynamic programming. Thus, linear programming based solutions using an occupation-measure approach (which had already been developed for dynamic programming formulations [13]) were proposed. This was extended by Kallenberg and others [18] to the discounted cost, total cost, and average cost criteria for MDPs with a unichain structure. Borkar [7], Altman and others [16], [2] further generalized this approach to average cost with a general multi-chain ergodic structure. A second method, based on a Lagrangian approach, was developed in [5] for MDPs with a single constraint, and extended to multiple constraints in [1]. A third method, based on a linear program mixing stationary deterministic policies, was developed in [1], [27], [15]. CMDP models with different discount factors for different constraints tend to be much more difficult, and usually optimal stationary policies do not exist, though it has been shown that the optimal policies are eventually stationary [16]. Sample-path formulations of MDPs were introduced in [28] and shown to satisfy Bellman's principle of optimality [17]. Alternative solution approaches based on stochastic approximations have also been proposed in [22]. However, almost all of this literature [2] is focused on expectation constraints.

In many applications, other kinds of constraints also appear, such as probabilistic constraints and, more generally, stochastic dominance constraints [12]. For example, it is said that Wall Street wants a lot of potential profit on the upside, and not much risk of losses on the downside, i.e., it wants to maximize profits subject to bounds on the risk of losses exceeding a certain amount. One popular measure of risk is the Value-at-Risk (VaR) metric, which is the smallest loss level that is exceeded with probability at most some given level $\alpha$ (typically 5%). However, the VaR metric has many undesirable properties, including lack of subadditivity (a portfolio of two assets may have a greater VaR than the sum of the individual VaRs), non-convexity and non-smoothness. Thus, a related convex measure, Conditional Value-at-Risk (CVaR), was introduced in [3]. This measure is convex, monotone, translation equivariant and positively homogeneous (and is the only such measure with these properties among popular risk measures, such as Markowitz's mean-variance (MV) risk measure [21], etc.) [29]. Optimization methods for computing single-stage CVaR have been proposed [26], [12], [29]. However, their extensions to multi-stage sequential problems suffer from computational difficulties similar to those of multi-stage stochastic programs [6]. Furthermore, all of this literature focuses on computing optimal strategies that minimize the VaR or CVaR. Our interest instead is in multi-objective sequential optimization problems, which can be formulated as a constrained Markov decision process where

one of the constraints is a CVaR-type risk constraint. We propose an offline iterative algorithm, as well as a stochastic-approximation-based online learning algorithm, to solve the problem.

The paper is organized as follows. In Section II, we introduce the problem formulation. Section III then presents preliminary results, while Section IV presents an offline iterative quasi-gradient, stochastic-approximation-inspired algorithm. In Section V, we present an online learning algorithm. These results are all for a finite horizon. Section VI discusses further work.

II. PROBLEM FORMULATION

Consider a compact metric state space $X$, a compact metric control space $U$, a continuous reward function $r(x,u)$, a continuous cost function $c(x,u)$ where $x \in X$ and $u \in U$, a controlled transition kernel $P(dx'|x,u)$ continuous in $(x,u)$, and a finite horizon $T$. Time is discrete and starts at 0. We will denote a policy by $u = u^T = (u_1, \ldots, u_T)$, where $u_t$ is the control applied at time $t$ according to this policy. We will denote by $P_u$ the probability measure on the finite-horizon process $X^T = (X_0, \ldots, X_T)$ under control policy $u$. Only noisy observations of the cost are available. Thus, given a zero-mean i.i.d. noise process $\{\xi_t\}$ with strictly positive density $\varphi$, we define the cumulative cost process $\{Y_t\}$ as

$$Y_0 = 0, \qquad Y_{t+1} = Y_t + c(X_t, u_t) + \xi_{t+1}. \tag{1}$$

We define the Value-at-Risk (VaR) function $\nu = \nu_\alpha(Y)$ for a random variable $Y$ as

$$\nu_\alpha(Y) = \arg\sup_\nu \{ P(Y > \nu) \geq \alpha \},$$

with $\alpha \in (0,1)$ typically close to 0, such as 0.1, 0.05, or 0.01. The Conditional Value-at-Risk (CVaR) function is defined as

$$\rho_\alpha(Y) = E[Y \mid Y > \nu_\alpha(Y)].$$

Then, our objective is to maximize the expected total reward over the finite horizon subject to the CVaR of the terminal cost being bounded by some constant $C_\alpha$:

$$\text{rMDP}: \qquad \max_u \ E\Big[\sum_{t=0}^{T} r(X_t, u_t)\Big] \tag{2}$$
$$\text{s.t.} \qquad \rho_\alpha(Y_{T+1}) \leq C_\alpha. \tag{3}$$
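To make the quantities in (1)-(3) concrete, here is a minimal illustrative sketch (not part of the original development) that simulates the noisy cumulative cost of a fixed, arbitrary policy on a small made-up finite MDP and estimates the VaR and CVaR of $Y_{T+1}$ by Monte Carlo. All model parameters (transition probabilities, costs, `noise_std`, and the bound `C_alpha`) are hypothetical; the snippet only checks whether the fixed policy satisfies constraint (3), it does not solve the rMDP.

```python
# Illustrative sketch: Monte Carlo estimate of VaR/CVaR of the noisy
# cumulative cost Y_{T+1} from (1), for a fixed policy on a toy finite MDP.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, T = 3, 2, 10
alpha = 0.05                                      # tail level for VaR/CVaR
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, u] = next-state distribution
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))          # c(x, u), made up
policy = rng.integers(n_actions, size=n_states)                   # a fixed stationary policy (illustration only)
noise_std = 0.1                                   # std of the zero-mean noise xi_t in (1)

def simulate_Y(n_paths=20_000):
    """Simulate the noisy cumulative cost Y_{T+1} under the fixed policy."""
    Y = np.zeros(n_paths)
    x = np.zeros(n_paths, dtype=int)              # every path starts in state 0
    for t in range(T + 1):                        # t = 0, ..., T
        u = policy[x]
        Y += cost[x, u] + noise_std * rng.standard_normal(n_paths)
        # sample next states x' ~ P(.|x, u)
        x = np.array([rng.choice(n_states, p=P[xi, ui]) for xi, ui in zip(x, u)])
    return Y

Y = simulate_Y()
var_alpha = np.quantile(Y, 1.0 - alpha)           # empirical VaR: P(Y > nu) ~ alpha
cvar_alpha = Y[Y > var_alpha].mean()              # empirical CVaR: E[Y | Y > VaR]
print(f"VaR_{alpha}: {var_alpha:.3f}, CVaR_{alpha}: {cvar_alpha:.3f}")
print("constraint (3) satisfied for hypothetical C_alpha = 5.0?", cvar_alpha <= 5.0)
```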

As discussed earlier, such a formulation is useful when the decision-maker wants to maximize the expected total reward (the first objective) over a finite horizon, while making sure that the conditional expectation of the total cost (the second objective), given that it lies in the $\alpha$-probability tail, does not exceed a deterministic bound $C_\alpha$.

Proposition 1: If $\{\xi_t\}$ is an i.i.d. noise process with strictly positive density $\varphi$, then a solution $u^*$ of rMDP exists.

Proof: Existence of an optimal solution follows by standard compactness-continuity arguments, as follows. $E[\sum_{t=0}^{T} r(X_t, u_t)]$ is clearly a continuous functional of the law $\mathcal{L}$ of $(X_t, Y_t, u_t)$, $t \geq 0$. Since $Y_{T+1}$ has a strictly


positive density by our hypotheses on $\{\xi_t\}$, its distribution function $F(\cdot)$ is continuous and strictly increasing. Thus $\nu = F^{-1}(1-\alpha)$. Now

$$\begin{aligned} \rho_\alpha(Y_{T+1}) &= E[Y_{T+1} \mid Y_{T+1} > \nu_\alpha(Y_{T+1})] \\ &= \frac{E[Y_{T+1}\, I\{Y_{T+1} > F^{-1}(1-\alpha)\}]}{P(Y_{T+1} > F^{-1}(1-\alpha))} \\ &= \frac{1}{\alpha}\, E[Y_{T+1}\, I\{Y_{T+1} > F^{-1}(1-\alpha)\}]. \end{aligned}$$

In view of the foregoing, this is a continuous functional of $\mathcal{L}$. Since $x_0$ is fixed, $(x,u) \mapsto P(dx'|x,u)$ is continuous, and $X$ and $U$ are compact, it follows that the set of attainable laws $\mathcal{L}$ is compact. Thus the constraint set is also compact, and hence the objective attains its maximum on it.
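As a quick illustrative aside (not from the paper), the identity $\rho_\alpha(Y) = \frac{1}{\alpha} E[Y\, I\{Y > F^{-1}(1-\alpha)\}]$ used above is easy to check by simulation for any continuous distribution; the distribution below is an arbitrary stand-in.

```python
# Numerical check (illustrative only) of the identity used in the proof:
# for continuous Y,  E[Y | Y > F^{-1}(1-alpha)] = (1/alpha) E[Y 1{Y > F^{-1}(1-alpha)}].
import numpy as np

rng = np.random.default_rng(3)
alpha = 0.05
Y = rng.normal(loc=10.0, scale=2.0, size=1_000_000)   # any continuous distribution works
nu = np.quantile(Y, 1.0 - alpha)                      # empirical F^{-1}(1 - alpha)

lhs = Y[Y > nu].mean()                                # conditional expectation E[Y | Y > nu]
rhs = (Y * (Y > nu)).mean() / alpha                   # (1/alpha) * E[Y 1{Y > nu}]
print(f"E[Y | Y > nu] = {lhs:.4f},  (1/alpha) E[Y 1{{Y > nu}}] = {rhs:.4f}")
```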

III. PRELIMINARIES

Consider a controlled Markov chain $Z_t = (X_t, Y_t, v_t)$, $t \geq 0$, with control process $u_t$, $t \geq 0$, where $v_t = u_{t-1}$. The combined evolution of this three-component controlled Markov chain is determined by the transition kernel $p(dx'|X_t, u_t)$, the evolution equation (1) for $Y_t$, and the equation $v_t = u_{t-1}$. We will assume $z_0 = (x_0, 0, v_0)$ to be deterministic. The combined transition kernel will be denoted by $p(dz'|z,u)$. We shall assume the cost function to be separable, i.e., $c(x,u) = c_1(x) + c_2(u)$. We will set $u_{-1} = u_T = \bar{u}$, where $\bar{u}$ is an additional element added to $U$ with $c_2(\bar{u}) = 0$. This does not alter the definition of $Y_{T+1}$, the terminal cost.

For a given $\nu$, define the state-value-at-risk function as

$$V_t^\nu(z) := P(Y_{T+1} > \nu \mid Z_t = z). \tag{4}$$

Then, we can write a backward recursion for $V_t^\nu(z)$ as

$$V_t^\nu(z) = \int p(dz'|z, v)\, V_{t+1}^\nu(z').$$

Denote $c(z_t) = c(x_t, v_t) = c_1(x_t) + c_2(v_t)$. Then, similarly, we can express the CVaR in terms of the state-value-at-risk function.

Lemma 1: If $V_0^\nu(z_0) > 0$, then

$$\rho_\alpha(Y_{T+1}) = \frac{1}{V_0^\nu(z_0)}\, E\Big[\sum_{t=0}^{T} c(Z_t)\, V_t^\nu(Z_t)\Big]. \tag{5}$$

Proof: By definition, we have

$$E[Y_{T+1} \mid Y_{T+1} > \nu] = \sum_{t=0}^{T} \sum_{z_t} P(Z_t = z_t \mid Y_{T+1} > \nu)\, c(x_t, v_t),$$

where we have used the separability of the cost function. Then,

$$\begin{aligned} \rho_\alpha(Y_{T+1}) &= E[Y_{T+1} \mid Y_{T+1} > \nu] \\ &= \sum_{t=0}^{T} \sum_{z_t} c(z_t)\, P(Z_t = z_t \mid Y_{T+1} > \nu) \\ &= \sum_{t=0}^{T} \sum_{z_t} c(z_t)\, P(Y_{T+1} > \nu \mid Z_t = z_t)\, \frac{P(Z_t = z_t)}{P(Y_{T+1} > \nu \mid Z_0 = z_0)} \\ &= \frac{1}{V_0^\nu(z_0)}\, E\Big[\sum_{t=0}^{T} c(Z_t)\, V_t^\nu(Z_t)\Big]. \end{aligned}$$

Note that $V_0^\nu(z_0) = P(Y_{T+1} > \nu \mid Z_0 = z_0) = \alpha$. Thus, the constraint in rMDP becomes

$$\frac{1}{\alpha}\, E\Big[\sum_{t=0}^{T} c(z_t)\, V_t^\nu(z_t)\Big] \leq C_\alpha.$$
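As an aside (not from the paper), the identity in Lemma 1 can be checked numerically on a toy example. The sketch below assumes a small finite chain with integer, state-only costs (so $c_2 \equiv 0$), a fixed stationary policy, and no observation noise, and takes the augmented state to be $Z_t = (X_t, Y_t)$; all numbers are made up. It computes the forward marginals of $Z_t$, the state-value-at-risk functions $V_t^\nu$ via the backward recursion (4), and compares the right-hand side of (5) with the conditional expectation computed directly from the terminal-cost distribution.

```python
# Sketch (simplifying assumptions: finite chain, integer state-only costs,
# fixed policy, no noise) checking Lemma 1: the CVaR-type conditional
# expectation of the terminal cost should equal
# (1/V_0^nu(z_0)) * E[ sum_t c(Z_t) V_t^nu(Z_t) ].
import numpy as np

rng = np.random.default_rng(1)
nS, T, nu = 3, 5, 7                          # states, horizon, threshold nu (hypothetical)
P = rng.dirichlet(np.ones(nS), size=nS)      # P[x, x'] under a fixed policy
c = np.array([1, 2, 3])                      # integer cost c(x) (separable, c2 = 0)
ymax = int(c.max()) * (T + 1)                # largest possible cumulative cost

# Forward marginals mu_t(x, y) = P(Z_t = (x, y)), with z_0 = (0, 0).
mu = np.zeros((T + 2, nS, ymax + 1))
mu[0, 0, 0] = 1.0
for t in range(T + 1):
    for x in range(nS):
        for y in range(ymax + 1 - c[x]):
            mu[t + 1, :, y + c[x]] += mu[t, x, y] * P[x]

# Backward recursion (4): V_t(x, y) = sum_x' P[x, x'] V_{t+1}(x', y + c(x)),
# with terminal condition V_{T+1}(x, y) = 1{y > nu}.
V = np.zeros((T + 2, nS, ymax + 1))
V[T + 1] = (np.arange(ymax + 1) > nu)[None, :]
for t in range(T, -1, -1):
    for x in range(nS):
        for y in range(ymax + 1 - c[x]):
            V[t, x, y] = P[x] @ V[t + 1, :, y + c[x]]

# Right-hand side of (5).
rhs = sum((mu[t] * c[:, None] * V[t]).sum() for t in range(T + 1)) / V[0, 0, 0]

# Direct E[Y_{T+1} | Y_{T+1} > nu] from the terminal-cost distribution.
pY = mu[T + 1].sum(axis=0)                   # P(Y_{T+1} = y)
ys = np.arange(ymax + 1)
tail = ys > nu
cvar = (ys[tail] * pY[tail]).sum() / pY[tail].sum()

print(f"Lemma 1 RHS: {rhs:.6f}   direct E[Y|Y>nu]: {cvar:.6f}")   # the two should agree
```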


IV. AN OFFLINE ITERATIVE ALGORITHM FOR THE rMDP PROBLEM

We now present a multiple time-scale iterative algorithm, iRMDP, to solve the rMDP problem; it is displayed as Algorithm 1 below. Let $\{\gamma_n\}$, $\{\eta_m\}$ be strictly positive stepsizes satisfying $\gamma_n, \eta_m \to 0$ and $\sum_n \gamma_n = \sum_m \eta_m = \infty$. We specify only the non-obvious initial conditions; other variables and functions are assumed to be initialized at the beginning of the algorithm.

Algorithm 1 iRMDP
For m = 1, 2, ..., till convergence:
  For n = 1, 2, ..., till convergence:
    For t = T, ..., 0:
      1) $J_t^{n,m}(x,y) = \max_u \Big[ r(x,u) + \int\!\!\int p(dx'|x,u)\, J_{t+1}^{n,m}(x', y + c(x,u) + s)\, \varphi(s)\, ds \Big]$,
         with $J_{T+1}^{n,m}(x,y) = \lambda_m \big( C_\alpha - y\, I(y > \nu_n^m)/\alpha \big)$
      2) $u_t^{n,m}(z) \in \arg\max_u \Big[ r(x,u) + \int\!\!\int p(dx'|x,u)\, J_{t+1}^{n,m}(x', y + c(x,u) + s)\, \varphi(s)\, ds \Big]$
      3) $V_t^{n,m}(z) = \int p(dz'|z, u_t^{n,m}(z))\, V_{t+1}^{n,m}(z')$,
         with $V_{T+1}^{n,m}(z) = I(y > \nu_n^m)$
      4) $Q_t^{m}(z) = c(z)\, V_t^{n,m}(z)/\alpha + \int p(dz'|z, u_t^{n,m}(z))\, Q_{t+1}^{m}(z')$,
         with $Q_{T+1}^{m}(z) = y\, I(y > \nu_n^m)/\alpha$
    End t
    $\nu_{n+1}^m = \nu_n^m - \gamma_n \big( \alpha - V_0^{n,m}(z_0) \big)$
  End n
  $\lambda_{m+1} = \big( \lambda_m - \eta_m ( C_\alpha - Q_0^m(z_0) ) \big)^+$
End m

Remarks. (1) The innermost loop over $t$ computes $V_0^{n,m}(x_0) = P(Y_{T+1} > \nu)$ and $J_0^{n,m}(x_0, 0) = \max_u \big\{ E[\sum_{t=0}^{T} r(X_t, u_t)] + \lambda (C_\alpha - E[Y_{T+1} \mid Y_{T+1} > \nu]) \big\}$ over all non-anticipative controls, for fixed $\nu = \nu_n^m$ and $\lambda = \lambda_m$. It also computes $Q_0^m(x_0, 0) = E[Y_{T+1} \mid Y_{T+1} > \nu]$ for fixed $\lambda = \lambda_m$, though it should be possible to do this less often by moving it outside the $t$ and $n$ loops. (2) The middle loop over $n$ adjusts $\nu$ until $V_0^{n,m}(z_0) = P(Y_{T+1} > \nu) = \alpha$. (3) The outer loop over $m$ adjusts $\lambda$ until $E[Y_{T+1} \mid Y_{T+1} > \nu] \leq C_\alpha$. (4) The $\nu$ and $\lambda$ iterations can also be done concurrently if we use $\eta_n = o(\gamma_n)$. The convergence analysis below then still holds using a two time-scale argument [10].

Convergence Analysis: All iterations in the innermost loop involve finitely many steps, so they do not need any convergence analysis. In practice, the iterations for $\{\nu_n^m, n \geq 0\}$ and $\{\lambda_m, m \geq 0\}$ will also be stopped after finitely many steps according to some stopping rule, but a convergence analysis is required nevertheless to justify such a procedure. Unfortunately, it does not seem possible to establish convergence of $\{\nu_n\}$ in general. (Note that we have suppressed the superscript $m$ for notational ease, taking advantage of the fact that it represents a slower time scale and its effect therefore is quasi-static; see [10], Section 6.1.) We shall consider a special case, viz., when the function $h(\nu) := P(Y_{T+1} > \nu)$ is Lipschitz continuous. The difficulty is that, in addition to the explicit dependence of this probability on $\nu$, there is also a hidden dependence via the underlying control policy, which is harder to decipher. Nevertheless, we assume this condition.

Lemma 2: Suppose the function $h(\nu) = P(Y_{T+1} > \nu)$ is Lipschitz continuous. Then, for fixed $m$, as $n \to \infty$, $V_0^{n,m}(z_0) \to \alpha$ for all $z_0$.

Proof: Observe that with Lipschitz continuity, the fact that $\gamma_n \to 0$ implies that the iterates will have the same asymptotic behavior as that of the o.d.e.

$$\dot{\nu}(t) = h(\nu(t)) - \alpha. \tag{6}$$

For $\nu \ll 0$, $h(\nu) > \alpha$, and for $\nu \gg 0$, $h(\nu) < \alpha$. Thus the trajectories of (6) remain bounded, and for a scalar o.d.e. this implies convergence to an equilibrium, i.e., a point where $h(\nu) = \alpha$. In view of the aforementioned properties of $h$ and its continuity, at least one such point exists. In fact, all such points that correspond to downward crossings of the level $\alpha$ by $h$ will be stable equilibria and the rest unstable, and the smallest and the largest equilibria will necessarily be stable.

In general, however, continuity of $h$ cannot be guaranteed. Hence, the iterates track not the o.d.e. (6), but the differential inclusion $\dot{\nu}(t) \in H(\nu(t)) - \alpha$, where

$$H(\nu) := \bigcap_{\epsilon > 0} \mathrm{cl}\big( \{ P(Y_{T+1} > \nu') : |\nu' - \nu| < \epsilon \} \big).$$

(See, e.g., [10], Chapter 4.) This, in fact, is one of the solution concepts for o.d.e.s with a discontinuous right-hand side, due to Krasovskii [19]. In some cases, this can yield useful information.

Convergence analysis for $\{\lambda_m\}$ is easier.

Lemma 3: Let $\{\eta_m\}$ be strictly positive step-sizes such that $\eta_m \to 0$ and $\sum_m \eta_m = \infty$. Then, as $m \to \infty$, we obtain $\lambda_m \to \lambda^*$ and $Q_0^m \to Q_0^*$ such that $\lambda^* (C_\alpha - Q_0^*(z_0)) = 0$ and $C_\alpha - Q_0^*(z_0) \geq 0$ for all $z_0$.

Proof: Let

$$G(\lambda) := \max \Big\{ E\Big[\sum_{t=0}^{T} r(X_t, u_t)\Big] + \lambda \big( C_\alpha - \rho_\alpha[Y_{T+1}] \big) \Big\},$$

where the maximum is over all admissible laws $\mathcal{L}$. Suppose the maximum is attained by $\mathcal{L}^*$, the law of $(X_t, Y_t, u_t^*)$, $t \geq 0$. $G$ is clearly convex and, by the Milgrom-Segal envelope theorem [23], $C_\alpha - \rho_\alpha[Y_{T+1}]$ is a valid subgradient thereof. Thus, the $\lambda$-iteration is simply an instance of classical subgradient descent, which is known to converge to a global minimum of $G$, in this case the desired Lagrange multiplier.
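To illustrate the two slow time-scale updates of Algorithm 1 in isolation (a sketch, not the full method), the snippet below replaces the inner dynamic-programming loop with a fixed, made-up simulator of $Y_{T+1}$, so the policy never adapts. The $\nu$-iteration then tracks the level-$\alpha$ tail quantile, i.e., a root of $h(\nu) - \alpha$, and the $\lambda$-iteration performs the projected update $\lambda \leftarrow (\lambda - \eta(C_\alpha - Q_0))^+$; the step sizes, the simulator distribution, and the bound $C_\alpha$ are all hypothetical.

```python
# Minimal sketch of the nu- and lambda-updates of Algorithm 1 (iRMDP), with
# the inner DP loop replaced by a fixed simulator of Y_{T+1}.
import numpy as np

rng = np.random.default_rng(2)
alpha, C_alpha = 0.05, 3.0                       # tail level and constraint bound (hypothetical)
sample_Y = lambda n: rng.gamma(2.0, 1.0, n)      # stand-in for simulating Y_{T+1} under the current policy

nu, lam = 0.0, 0.0
for m in range(1, 201):                          # outer (lambda) loop
    eta = 1.0 / m                                # eta_m -> 0, sum eta_m = infinity
    for n in range(1, 101):                      # middle (nu) loop
        gamma = 1.0 / n                          # gamma_n -> 0, sum gamma_n = infinity
        Y = sample_Y(1000)
        V0 = np.mean(Y > nu)                     # estimate of V_0^nu(z_0) = P(Y > nu)
        nu = nu - gamma * (alpha - V0)           # nu-update: drive P(Y > nu) toward alpha
    Y = sample_Y(20_000)
    Q0 = Y[Y > nu].mean()                        # estimate of E[Y | Y > nu] at the current nu
    lam = max(lam - eta * (C_alpha - Q0), 0.0)   # projected lambda-update

print(f"nu ~ {nu:.3f}  (target (1-alpha)-quantile ~ {np.quantile(sample_Y(10**6), 1 - alpha):.3f})")
print(f"lambda ~ {lam:.3f}  (grows while the CVaR exceeds C_alpha = {C_alpha})")
```

Because the simulator is fixed here, $\lambda$ simply keeps growing while the constraint is violated; in the actual algorithm the growing multiplier feeds back into the inner Bellman recursion and alters the policy.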


V. AN ONLINE LEARNING ALGORITHM FOR FINITE rMDPs

We now consider the state space $X$ and control space $U$ to be finite, which makes $Y$ finite as well. We present an online learning algorithm for this setting that finds the optimal control given a sequence of samples. A sample $k$ here is $(X^{k,T}, Y^{k,T}, u^{k,T})$, where $X^{k,T} = (X_0^k, \ldots, X_T^k)$ is the entire state trajectory over the horizon $T$, $Y^{k,T}$ is the entire cost trajectory, and $u^{k,T}$ is the control sequence that generates it. Besides being an online algorithm, the algorithm below differs from the iterative algorithm presented earlier in that it operates on multiple time-scales: along with $\gamma_k = o(a_k)$ and $\eta_k = o(\gamma_k)$, we need $\sum_k a_k^2$, $\sum_k \gamma_k^2$, $\sum_k \eta_k^2$ all finite, while $\sum_k a_k = \sum_k \gamma_k = \sum_k \eta_k = \infty$.

Algorithm 2 oRMDP
For k = 1, 2, ...
  For t = T, T-1, ..., 0:
    1) $J_t^k(x,y,u) = J_t^{k-1}(x,y,u) + a_k\, I\{X_t^k = x, Y_t^k = y, u_t^k = u\} \big[ r(x,u) + \max_{u'} J_{t+1}^k(X_{t+1}^k, Y_{t+1}^k, u') - J_t^{k-1}(x,y,u) \big]$,
       with $J_{T+1}^k(x,y) = I(Y_{T+1}^k = y)\, \lambda_k \big( C_\alpha - y\, I(y > \nu_k)/\alpha \big)$
    2) $u_t^k := v_{t+1}^k = \arg\max_{u} J_t^k(X_t^k, Y_t^k, u)$
    3) $V_t^k(z) = V_t^{k-1}(z) + a_k\, I\{Z_t^k = z\} \big[ V_{t+1}^k(Z_{t+1}^k) - V_t^{k-1}(z) \big]$,
       with $V_{T+1}^k(z) = I(Y_{T+1}^k = y,\, y > \nu_k)$
    4) $Q_t^k(z) = Q_t^{k-1}(z) + a_k\, I\{Z_t^k = z\} \big[ c(z)\, V_t^k(z)/\alpha + Q_{t+1}^k(Z_{t+1}^k) - Q_t^{k-1}(z) \big]$,
       with $Q_{T+1}^k(z) = y\, V_{T+1}^k(z)/\alpha$
  End t
  $\nu_k = \nu_{k-1} - \gamma_k \big( \alpha - V_0^k(z_0) \big)$
  $\lambda_k = \big( \lambda_{k-1} - \eta_k ( C_\alpha - Q_0^k(z_0) ) \big)^+$
End k

Convergence: The first equation is the Q-learning version of the deterministic DP recursion. In order that all $(z,u)$ pairs are sampled often enough, in practice one will use the above choice of $u_t^k$ with high probability $1-\epsilon$, and a uniformly random $u$ with probability $\epsilon$. The convergence analysis for the $\nu$ and $\lambda$ iterations for the offline scheme continues to apply to the online scheme as well. As for the Q-learning scheme, let $J^k$ denote $J_t^k(x,y,u)$ written as a vector as all of $t, x, y, u$ vary over their respective domains. Again, treat $\nu_k, \lambda_k$ as quasi-static by invoking the two time-scale argument. Now the iteration for $J^k$ is a special case of the Q-learning iterations analyzed in [30], and its almost sure convergence to the correct Q-value $J^*$ follows. Note, however, that our exploratory randomization of the action choice will yield a near-optimal rather than optimal control.

VI. DISCUSSION AND FURTHER WORK

In this paper, we have introduced a new class of constrained Markov decision processes. The constraints are on the conditional expectation of the terminal value of a total cost functional. The motivation comes from finance, in particular insurance, wherein an insurance company wants to maximize its revenue from premiums subject to a constraint on the conditional expectation of the claims it might have to pay. Our interest is in risk from catastrophic events such as floods, hurricanes, and market crashes, which have small probability but result in large claims when they happen. Thus, the conditioning in the expectation is on the tail probability (i.e., on events that have the largest claims but together have less than, say, 5% total probability mass).

The problem as formulated is of tremendous interest in finance and risk management. It is, however, not amenable to solution techniques available either in stochastic programming or in the theory of constrained Markov decision processes. We thus give an iterative/stochastic-approximation-based algorithm to solve it. We are currently in the process of acquiring relevant insurance data, and will test the algorithm on such data in future work. In future work, we will also extend the methodology to the infinite-horizon case, both discounted and average-cost. Our proof of convergence of the offline algorithm also needs an additional assumption. In future work, we will also try to derive a proof that does not need such an assumption. The assumption of separability of the cost function is also crucial; in the future, we shall also consider general cost structures. We hope our most immediate contribution is to re-stimulate research in the community on the further development of the theory of constrained MDPs, whereby more general constraints than the current framework allows can be handled.

ACKNOWLEDGEMENTS

The problem was introduced to the second author by Pu Huang and Dharmashankar Subramanian of the Risk Group in the Mathematical Sciences division of the IBM Watson Research Center. The second author is very grateful to them, and to Alan J. King and Jayant Kalagnanam, also of IBM Watson Research, and to Erim Kardes of USC for many helpful discussions.

REFERENCES

[1] E. Altman and F. Spieksma, "The linear program approach in Markov decision problems revisited," ZOR - Methods and Models in Operations Research, 42(2):169-188, 1995.
[2] E. Altman, Constrained Markov Decision Processes, Chapman & Hall, 1999.
[3] P. Artzner, F. Delbaen, J.-M. Eber and D. Heath, "Thinking coherently," Risk, 10:68-71, 1997.
[4] D. Bertsekas, Dynamic Programming and Optimal Control, third edition, Athena Scientific, 2005.
[5] F. J. Beutler and K. W. Ross, "Optimal policies for controlled Markov chains with a constraint," J. Math. Anal. Appl., 112:236-252, 1985.
[6] J. Birge and F. Louveaux, Introduction to Stochastic Programming, Springer-Verlag, 1997.
[7] V. Borkar, "A convex analytic approach to Markov decision processes," Probab. Theory Relat. Fields, 78:583-602, 1988.
[8] V. S. Borkar, "Stochastic approximation with two time scales," Systems & Control Letters, 29:291-294, 1997.
[9] V. Borkar, "An actor-critic algorithm for constrained Markov decision processes," Systems & Control Letters, 54:207-213, 2005.
[10] V. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge University Press, 2009.
[11] D. Dentcheva and A. Ruszczynski, "Portfolio optimization with stochastic dominance constraints," Journal of Banking and Finance, 30:433-451, 2006.
[12] D. Dentcheva and A. Ruszczynski, "Stochastic dynamic optimization with discounted stochastic dominance constraints," SIAM Journal on Control and Optimization, 7(5):2540-2556, 2008.
[13] C. Derman, Finite State Markovian Decision Processes, Academic Press, New York and London, 1970.
[14] C. Derman and M. Klein, "Some remarks on finite horizon Markovian decision models," Operations Research, 13:272-278, 1965.
[15] E. A. Feinberg, "Constrained semi-Markov decision processes with average rewards," ZOR - Methods and Models in Operations Research, 39:257-288, 1995.
[16] E. Feinberg and A. Shwartz, "Constrained discounted dynamic programming," Mathematics of Operations Research, 21:922-945, 1996.
[17] M. Haviv, "On constrained Markov decision processes," Operations Research Letters, 19(1):25-28, 1995.
[18] A. Hordijk and L. C. M. Kallenberg, "Linear programming and Markov decision chains," Management Science, 25:353-362, 1979.
[19] N. N. Krasovskii, Stability of Motion, Stanford University Press, 1963.
[20] J. Palmquist, S. Uryasev and P. Krokhmal, "Portfolio optimization with conditional value-at-risk objective and constraints," Journal of Risk, 4(2):11-27, 2002.
[21] H. M. Markowitz, "Portfolio selection," Journal of Finance, 7(1):77-91, 1952.
[22] D.-J. Ma, A. M. Makowski and A. Shwartz, "Stochastic approximations for finite state Markov chains," Stochastic Processes and Their Applications, 35:27-45, 1988.


[23] P. Milgrom and I. Segal, "Envelope theorems for arbitrary choice sets," Econometrica, 70:583-601, 2002.
[24] G. Pflug, "Some remarks on the value-at-risk and the conditional value-at-risk," in: S. Uryasev (Ed.), Probabilistic Constrained Optimization: Methodology and Applications, Kluwer Academic Publishers, 2000.
[25] A. Prekopa, Stochastic Programming, Kluwer, 1995.
[26] R. T. Rockafellar and S. Uryasev, "Optimization of conditional value-at-risk," Journal of Risk, 2:21-41, 2000.
[27] K. W. Ross, "Randomized and past-dependent policies for Markov decision processes with multiple constraints," Operations Research, 37:474-477, 1989.
[28] K. Ross and R. Varadarajan, "Multichain Markov decision processes with a sample path constraint: A decomposition approach," Mathematics of Operations Research, 16:195-207, 1991.
[29] A. Ruszczynski and A. Shapiro, "Optimization of risk measures," preprint, 2007.
[30] J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, 16(3):185-202, 1994.
[31] P. Varaiya and R. J.-B. Wets, "Stochastic dynamic optimization approaches and computation," in Mathematical Programming: Recent Developments and Applications, M. Iri and K. Tanabe (eds.), pp. 309-332, Kluwer Academic Publishers, 1989.
[32] R. J.-B. Wets, "Challenges in stochastic programming," working paper, IIASA, Austria, 1994.

