期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Sensitivity of constrained Markov decision processes

Eitan Altman Adam Shwartz 《Annals of Operations Research》1991,32(1):1-22

We consider the optimization of finite-state, finite-action Markov decision processes under constraints. Costs and constraints are of the discounted or average type, and possibly finite-horizon. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters. We relate several optimization problems to a generic linear program, through which we investigate sensitivity issues. We establish conditions for the continuity of the optimal value in the discount factor. In particular, the optimal value and optimal policy for the expected average cost are obtained as limits of the dicounted case, as the discount factor goes to one. This generalizes a well-known result for the unconstrained case. We also establish the continuity in the discount factor for certain non-stationary policies. We then discuss the sensitivity of optimal policies and optimal values to small changes in the transition matrix and in the instantaneous cost functions. The importance of the last two results is related to the performance of adaptive policies for constrained MDP under various cost criteria [3,5]. Finally, we establish the convergence of the optimal value for the discounted constrained finite horizon problem to the optimal value of the corresponding infinite horizon problem. 相似文献

2.

Constrained Markov decision processes with total cost criteria: Occupation measures and primal LP

Eitan Altman 《Mathematical Methods of Operations Research》1996,43(1):45-72

This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite Linear Program. 相似文献

3.

Convergence of the optimal values of constrained Markov control processes

Jorge Alvarez-Mena Onésimo Hernández-Lerma 《Mathematical Methods of Operations Research》2002,55(3):461-484

相似文献

4.

Risk averse asymptotics in a Black–Scholes market on a finite time horizon

Peter Grandits Stefan Thonhauser 《Mathematical Methods of Operations Research》2011,74(1):21-40

We consider the optimal investment and consumption problem in a Black–Scholes market, if the target functional is given by expected discounted utility of consumption plus expected discounted utility of terminal wealth. We investigate the behaviour of the optimal strategies, if the relative risk aversion tends to infinity. It turns out that the limiting strategies are: do not invest at all in the stock market and keep the rate of consumption constant! 相似文献

5.

Mean and variance optimization of non–linear systems and worst–case analysis

P. Parpas B. Rustem V. Wieland S. Žaković 《Computational Optimization and Applications》2009,43(2):235-259

In this paper, we consider expected value, variance and worst–case optimization of nonlinear models. We present algorithms for computing optimal expected value, and variance policies, based on iterative Taylor expansions. We establish convergence and consider the relative merits of policies based on expected value optimization and worst–case robustness. The latter is a minimax strategy and ensures optimal cover in view of the worst–case scenario(s) while the former is optimal expected performance in a stochastic setting. Both approaches are used with a small macroeconomic model to illustrate relative performance, robustness and trade-offs between the alternative policies. 相似文献

6.

Markov decision processes with exponentially representable discounting

Yair Carmon 《Operations Research Letters》2009,37(1):51-55

We generalize the geometric discount of finite discounted cost Markov Decision Processes to “exponentially representable”discount functions, prove existence of optimal policies which are stationary from some time N onward, and provide an algorithm for their computation. Outside this class, optimal “N-stationary” policies in general do not exist. 相似文献

7.

On the power and limitations of affine policies in two-stage adaptive optimization

Dimitris Bertsimas Vineet Goyal 《Mathematical Programming》2012,134(2):491-531

We consider a two-stage adaptive linear optimization problem under right hand side uncertainty with a min–max objective and give a sharp characterization of the power and limitations of affine policies (where the second stage solution is an affine function of the right hand side uncertainty). In particular, we show that the worst-case cost of an optimal affine policy can be times the worst-case cost of an optimal fully-adaptable solution for any δ > 0, where m is the number of linear constraints. We also show that the worst-case cost of the best affine policy is times the optimal cost when the first-stage constraint matrix has non-negative coefficients. Moreover, if there are only k ≤ m uncertain parameters, we generalize the performance bound for affine policies to , which is particularly useful if only a few parameters are uncertain. We also provide an -approximation algorithm for the general case without any restriction on the constraint matrix but the solution is not an affine function of the uncertain parameters. We also give a tight characterization of the conditions under which an affine policy is optimal for the above model. In particular, we show that if the uncertainty set, is a simplex, then an affine policy is optimal. However, an affine policy is suboptimal even if is a convex combination of only (m + 3) extreme points (only two more extreme points than a simplex) and the worst-case cost of an optimal affine policy can be a factor (2 − δ) worse than the worst-case cost of an optimal fully-adaptable solution for any δ > 0. 相似文献

8.

Optimal replacement policies with minimal repair and age-dependent costs

《European Journal of Operational Research》1997,98(1):75-84

In this paper, we study a modified minimal repair/replacement problem that is formulated as a Markov decision process. The operating cost is assumed to be a nondecreasing function of the system's age. The specific maintenance actions for a manufacturing system to be considered are whether to have replacement, minimal repair or keep it operating. It is shown that a control limit policy, or in particular a (t, T) policy, is optimal over the space of all possible policies under the discounted cost criterion. A computational algorithm for the optimal (t, T) policy is suggested based on the total expected discounted cost. 相似文献

9.

Optimal control in light traffic Markov decision processes

Ger Koole Olaf Passchier 《Mathematical Methods of Operations Research》1997,45(1):63-79

We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criterion. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network. 相似文献

10.

约束折扣半马氏决策规划

胡光华张升《应用数学学报》1997,20(2):187-195

本文研究约束折扣半马氏决策规划问题，即在一折扣期望费用约束下，使折扣期望报酬达最大的约束最优问题，假设状态集可数，行动集为紧的非空Ｂｏｒｅｌ集，本文给出了ｐ－约束最优策略的充要条件，证明了在适当的假设条件下必存在ｐ－约束最优策略。相似文献

11.

A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information

T.E.S. Raghavan Zamir Syed 《Mathematical Programming》2003,95(3):513-532

We give a policy-improvement type algorithm to locate an optimal pure stationary strategy for discounted stochastic games with perfect information. A graph theoretic motivation for our algorithm is presented as well. Received: January 1998 / Accepted: May 2002 Published online: February 14, 2003 Key words. stochastic games – MDP – perfect information – policy iteration Partially Funded by NSF Grant DMS 930-1052 and DMS 970-4951 相似文献

12.

Asymptotic properties of constrained Markov Decision Processes

Eitan Altman 《Mathematical Methods of Operations Research》1993,37(2):151-170

We present in this paper several asymptotic properties of constrained Markov Decision Processes (MDPs) with a countable state space. We treat both the discounted and the expected average cost, with unbounded cost. We are interested in (1) the convergence of finite horizon MDPs to the infinite horizon MDP, (2) convergence of MDPs with a truncated state space to the problem with infinite state space, (3) convergence of MDPs as the discount factor goes to a limit. In all these cases we establish the convergence of optimal values and policies. Moreover, based on the optimal policy for the limiting problem, we construct policies which are almost optimal for the other (approximating) problems. Based on the convergence of MDPs with a truncated state space to the problem with infinite state space, we show that an optimal stationary policy exists such that the number of randomisations it uses is less or equal to the number of constraints plus one. We finally apply the results to a dynamic scheduling problem.This work was partially supported by the Chateaubriand fellowship from the French embassy in Israel and by the European Grant BRA-QMIPS of CEC DG XIII 相似文献

13.

A comparison of average and discounted cost models for the dynamic lot size inventory problem

Suresh Chand Kwei Tang 《European Journal of Operational Research》1985,22(1):9-18

It is a common practice in the inventory literature to use average cost models as approximations to the theoretically correct discounted cost models. An average cost model minimizes the average undiscounted cost per period, while a discounted cost model minimizes the total discounted cost over the problem horizon. This paper attempts to answer an important question: How good are the results (the total discounted costs) for the average cost models compared to those for the discounted cost models? This question has been conclusively answered for the simplest inventory model where the demand rate and other parameters are assumed to remain constant in time. This paper addresses this issue for the first time for the case where demand rates are allowed to be nonstationary in time.A discounted cost model has been developed in the paper to carry out this comparison. It is shown that a simple dynamic programming algorithm can be used to find optimal order policies for the discounted cost model.The effect of the varying interest rates and other parameters on the relative performance of the average cost model has been studied by developing an insightful analysis and also by doing a computational study. The results show that, while the average cost model can cost as much as about 26% more than the discounted cost model in extreme cases, this increase is not significant for the parameter values in the range of the common interest. 相似文献

14.

Nonstationary Continuous-Time Markov Control Processes with Discounted Costs on Infinite Horizon

Onésimo Hernández-Lerma T. E. Govindan 《Acta Appl Math》2001,67(3):277-293

This paper concerns nonstationary continuous-time Markov control processes on Polish spaces, with the infinite-horizon discounted cost criterion. Necessary and sufficient conditions are given for a control policy to be optimal and asymptotically optimal. In addition, under suitable hypotheses, it is shown that the successive approximation procedure converges in the sense that the sequence of finite-horizon optimal cost functions and the corresponding optimal control policies both converge. 相似文献

15.

Optimal arrival rate and service rate control of multi-server queues

Nelson Lee Vidyadhar G. Kulkarni 《Queueing Systems》2014,76(1):37-50

We consider the problem of optimal control of a multi-server queue with controllable arrival and service rates. This study is motivated by its potential application to the design and control of data centers. The cost structure includes customer holding cost which is a non-decreasing convex function of the number of customers in the system, server operating cost which is a non-decreasing convex function of the chosen service rate, and system operating reward which is a non-decreasing concave function of the chosen arrival rate. We formulate the problem as a continuous-time Markov decision process and derive structural properties of the optimal control policies under both discounted cost and average cost criterions. 相似文献

16.

Lagrangian heuristics for scheduling new product development projects in the pharmaceutical industry

Vishal A. Varma Reha Uzsoy Joseph Pekny Gary Blau 《Journal of Heuristics》2007,13(5):403-433

To stay ahead of their competition, pharmaceutical firms must make effective use of their new product development (NPD) capabilities by efficiently allocating its analytical, clinical testing and manufacturing resources across various drug development projects. The resulting project scheduling problems involve coordinating hundreds of testing and manufacturing activities over a period of several quarters. Most conventional integer programming approaches are computationally impractical for problems of this size, while priority rule-driven heuristics seldom provide consistent solution quality. We propose a Lagrangian decomposition (LD) heuristic that exploits the special structure of these problems. Some resources (typically manpower) are shared across all on-going projects while others (typically equipment) are specific to individual project categories. Our objective function is a weighted discounted cost expressed in terms of activity completion times. The LD heuristics were subjected to a comprehensive experimental study based on typical operational instances. While the conventional “Reward–Risk” priority rule heuristic generates duality gaps between 47–58%, the best LD heuristic achieves duality gaps between 10–20%. The LD heuristics also yield makespan reductions of over 30% over the Reward–Risk priority rule. 相似文献

17.

Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach

José Niño-Mora 《Mathematical Programming》2002,93(3):361-413

This paper develops a polyhedral approach to the design, analysis, and computation of dynamic allocation indices for scheduling binary-action (engage/rest) Markovian stochastic projects which can change state when rested (restless bandits (RBs)), based on partial conservation laws (PCLs). This extends previous work by the author [J. Ni?o-Mora (2001): Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33, 76–98], where PCLs were shown to imply the optimality of index policies with a postulated structure in stochastic scheduling problems, under admissible linear objectives, and they were deployed to obtain simple sufficient conditions for the existence of Whittle's (1988) RB index (indexability), along with an adaptive-greedy index algorithm. The new contributions include: (i) we develop the polyhedral foundation of the PCL framework, based on the structural and algorithmic properties of a new polytope associated with an accessible set system -extended polymatroid}); (ii) we present new dynamic allocation indices for RBs, motivated by an admission control model, which extend Whittle's and have a significantly increased scope; (iii) we deploy PCLs to obtain both sufficient conditions for the existence of the new indices (PCL-indexability), and a new adaptive-greedy index algorithm; (iv) we interpret PCL-indexability as a form of the classic economics law of diminishing marginal returns, and characterize the index as an optimal marginal cost rate; we further solve a related optimal constrained control problem; (v) we carry out a PCL-indexability analysis of the motivating admission control model, under time-discounted and long-run average criteria; this gives, under mild conditions, a new index characterization of optimal threshold policies; and (vi) we apply the latter to present new heuristic index policies for two hard queueing control problems: admission control and routing to parallel queues; and scheduling a multiclass make-to-stock queue with lost sales, both under state-dependent holding cost rates and birth-death dynamics. Received: April 2000 / Accepted: October 2002 Published online: December 9, 2002 RID="★" ID="★" Work partly supported by the Spanish Ministry of Science and Technology (grant BEC2000-1027), NATO (Collaborative Linkage Grant PST.CLG.976568), and the Joint Spanish-US (Fulbright) Commission for Scientific and Technical Exchange (project 2000-20132) Key words. Markov decision process – restless bandits – polyhedral combinatorics – extended polymatroid – adaptive-greedy algorithm – dynamic allocation index – stochastic scheduling – threshold policy – index policy – Gittins index – Klimov index – Whittle index – control of queues – admission control – routing – make-to-stock – multiclass queue – finite buffers – conservation laws – achievable performance Mathematics Subject Classification (1991): (AMS 2000 Subject Classification): 90B36, 90B22, 90C40, 90C57, 90C08 相似文献

18.

An Inventory Model for Deteriorating Items with?Two?Levels of Trade Credit Taking Account of Time Discounting

Jui-Jung Liao Kuo-Nan Huang 《Acta Appl Math》2010,110(1):313-326

This paper presents an extended inventory model of Huang (J. Oper. Res. Soc. 54, 1011–1015, 2003), which investigated the retailer’s optimal inventory policy under two levels of trade credit. Herein, we consider the impact of a replenishment policy on the timing of the cash flows associated with payments to suppliers and revenue streams from customers. That is, the same cash amount will possess different money value at different future time. To see this, we adopt the more appropriate net present value (NPV) object instead of the average cost objective. In addition, the deteriorating effects will be incorporated in this inventory model, and the time to deterioration of each item follows an exponential distribution. The discounted cash flow (DCF) approach is used to derive the optimal solution in this study. Furthermore, we first show that the optimal solution not only exists bus also is unique. Then, we provide a theorem to locate the optimal ordering policy. Finally, a numerical example for illustration is provided. 相似文献

19.

A multi-item continuous review inventory system with compound Poisson demands

K. Ohno T. Ishigaki 《Mathematical Methods of Operations Research》2001,53(1):147-165

相似文献

20.

Optimal server assignment in a two-stage tandem queueing system

《Operations Research Letters》2020,48(1):71-77

We study Markovian queueing systems consisting of two stations in tandem. There is a dedicated server in each station and an additional server that can be assigned to any station. Assuming that linear holding costs are incurred by jobs in the system and two servers can collaborate to work on the same job, we determine structural properties of optimal server assignment policies under the discounted and the average cost criteria. 相似文献