期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Algorithms for aggregated limiting average Markov decision problems

Mohammed Abbad Cherki Daoui 《Mathematical Methods of Operations Research》2001,53(3):451-463

相似文献

2.

Semi-infinite discounted Markov decision processes: Policy improvement and singular perturbations

Mohammed Abbad Khalid Rahhali 《Mathematical Methods of Operations Research》2001,54(2):279-290

相似文献

3.

Optimal balanced control for call centers

Sandjai Bhulai Taoying Farenhorst-Yuan Bernd Heidergott Dinard van?der Laan 《Annals of Operations Research》2012,201(1):39-62

In this paper we study the optimal assignment of tasks to agents in a call center. For this type of problem, typically, no single deterministic and stationary (i.e., state independent and easily implementable) policy yields the optimal control, and mixed strategies are used. Other than finding the optimal mixed strategy, we propose to optimize the performance over the set of ??balanced?? deterministic periodic non-stationary policies. We provide a stochastic approximation algorithm that allows to find the optimal balanced policy by means of Monte Carlo simulation. As illustrated by numerical examples, the optimal balanced policy outperforms the optimal mixed strategy. 相似文献

4.

Blackwell optimality in the class of stationary policies in Markov decision chains with a Borel state space and unbounded rewards

Arie Hordijk Alexander A. Yushkevich 《Mathematical Methods of Operations Research》1999,49(1):1-39

This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove here the existence of deterministic stationary policies which are Blackwell optimal in the class of all, in general randomized, stationary policies. We establish also a lexicographical policy improvement algorithm leading to Blackwell optimal policies and the relation between such policies and the Blackwell optimality equation. Our technique is a combination of the weighted norms approach developed in Dekker and Hordijk (1988) for countable models with unbounded rewards and of the weak-strong topology approach used in Yushkevich (1997a) for Borel models with bounded rewards. 相似文献

5.

A counterexample on overtaking optimality

Andrzej S. Nowak Oscar Vega-Amaya 《Mathematical Methods of Operations Research》1999,49(3):435-439

相似文献

6.

Existence of optimal stationary policies in deterministic optimal control

Dimitri P Bertsekas Steven E Shreve 《Journal of Mathematical Analysis and Applications》1979,69(2):607-620

This paper considers deterministic discrete-time optimal control problems over an infinite horizon involving a stationary system and a nonpositive cost per stage. Various results are provided relating to existence of an ?-optimal stationary policy, and existence of an optimal stationary policy assuming an optimal policy exists. 相似文献

7.

MARKOVIAN DECISION PROGRAMMING WITH RECURSIVE VECTOR-REWARD

刘建庸刘克《应用数学学报(英文版)》1990,6(2):158-165

In this paper, we discuss Markovian decision programming with recursive vector-reward andgive an algorithm to find optimal policies. We prove that: (1) There is a Markovian optimal policy for the nonstationary case; (2) Thereis a stationary optimal policy for the stationary case. 相似文献

8.

Linear programming formulation of MDPs in countable state space: The multichain case

Arie Hordijk Jean B. Lasserre 《Mathematical Methods of Operations Research》1994,40(1):91-108

We present an Linear Programming formulation of MDPs with countable state and action spaces and no unichain assumption. This is an extension of the Hordijk and Kallenberg (1979) formulation in finite state and action spaces. We provide sufficient conditions for both existence of optimal solutions to the primal LP program and absence of duality gap. Then, existence of a (possibly randomized) average optimal policy is also guaranteed. Existence of a stationary average optimal deterministic policy is also investigated. 相似文献

9.

Optimal models with maximizing probability of first achieving target value in the preceding stages

林元烈伍从斌康波大《中国科学A辑(英文版)》2003,46(3):396-414

Decision makers often face the need of performance guarantee with some sufficiently high probability. Such problems can be modelled using a discrete time Markov decision process (MDP) with a probability criterion for the first achieving target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of e-optimal policy for finite state space. We give an example for the reliability of the satellite sy 相似文献

10.

马氏决策规划的加速逼近算法与最小方差问题

董泽清《数学学报》1978,21(2):135-150

我们涉及的折扣马氏决策规划(有些著者称为马氏决策过程),具有状态空问与每个状态可用的决策集均为可数无穷集、次随机转移律族、有界报酬函数.给出了一个求(ε_)最优平稳策略的加速收敛逐次逼近算法,比White的逐次逼近算法更快地收敛于(ε_)最优解,并配合有非最优策略的检验准则,使算法更加得益. 设β为折扣因子,一般说β(或(ε,β))_最优平稳策略,往往是非唯一的,甚至与平稳策略类包含的策略数一样多.我们自然希望在诸β(或(ε,β))_最优平稳策略中寻求方差齐次地(关于初始状态)达(ε_)最小的策略.我们证明了这种策略确实存在,并给出了获得这种策略的算法. 相似文献

11.

Policy iteration for robust nonstationary Markov decision processes

Saumya Sinha Archis Ghate 《Optimization Letters》2016,10(8):1613-1628

Policy iteration is a well-studied algorithm for solving stationary Markov decision processes (MDPs). It has also been extended to robust stationary MDPs. For robust nonstationary MDPs, however, an “as is” execution of this algorithm is not possible because it would call for an infinite amount of computation in each iteration. We therefore present a policy iteration algorithm for robust nonstationary MDPs, which performs finitely implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the sequence of cost-to-go functions produced by this algorithm monotonically converges pointwise to the optimal cost-to-go function; the policies generated converge subsequentially to an optimal policy. 相似文献

12.

Sensitivity analysis and optimal ultimately stationary deterministic policies in some constrained discounted cost models

Krishnamurthy Iyer Nandyala Hemachandra 《Mathematical Methods of Operations Research》2010,71(3):401-425

We consider a discrete time Markov Decision Process (MDP) under the discounted payoff criterion in the presence of additional discounted cost constraints. We study the sensitivity of optimal Stationary Randomized (SR) policies in this setting with respect to the upper bound on the discounted cost constraint functionals. We show that such sensitivity analysis leads to an improved version of the Feinberg–Shwartz algorithm (Math Oper Res 21(4):922–945, 1996) for finding optimal policies that are ultimately stationary and deterministic. 相似文献

13.

Optimal intervention policies for a multidimensional simple epidemic process

E.G. Kyriakidis A. Pavitsos 《Mathematical and Computer Modelling》2009,50(9-10):1318-1324

We consider a deterministic simple epidemic process in which the susceptibles are exposed to n+1 diseases. It is assumed that one disease is relatively harmless while the others cause serious symptoms. Policies for introducing infection by the harmless disease are considered and, under a suitable cost structure, the optimal policy that minimises the future cost for every initial state is found. For the corresponding stochastic model, the optimal policy is found by implementing a suitable dynamic programming algorithm, and is compared numerically with the optimal policy for the deterministic model. 相似文献

14.

On Markovian decision programming with recursive reward functions

Jianyong Liu Ke Liu 《Annals of Operations Research》1990,24(1):145-164

In this paper, the infinite horizon Markovian decision programming with recursive reward functions is discussed. We show that Bellman's optimal principle is applicable for our model. Then, a sufficient and necessary condition for a policy to be optimal is given. For the stationary case, an iteration algorithm for finding a stationary optimal policy is designed. The algorithm is a generalization of Howard's [7] and Iwamoto's [3] algorithms.This research was supported by the National Natural Science Foundation of China. 相似文献

15.

Blackwell Optimality in the Class of Markov Policies for Continuous-Time Controlled Markov Chains

Tomás Prieto-Rumeau 《Acta Appl Math》2006,92(1):77-96

This paper deals with Blackwell optimality for continuous-time controlled Markov chains with compact Borel action space, and possibly unbounded reward (or cost) rates and unbounded transition rates. We prove the existence of a deterministic stationary policy which is Blackwell optimal in the class of all admissible (nonstationary) Markov policies, thus extending previous results that analyzed Blackwell optimality in the class of stationary policies. We compare our assumptions to the corresponding ones for discrete-time Markov controlled processes. 相似文献

16.

Continuous time Markovian decision processes average return criterion

Prasadarao Kakumanu 《Journal of Mathematical Analysis and Applications》1975,52(1):173-188

Continuous time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion function. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy, the policy so obtained is shown optimal over all Markov policies. 相似文献

17.

Blackwell optimality in the class of all policies in Markov decision chains with a Borel state space and unbounded rewards

Arie Hordijk Alexander A. Yushkevich 《Mathematical Methods of Operations Research》1999,50(3):421-448

相似文献

18.

Dynamic admission and service rate control of a queue

Kranthi Mitra Adusumilli John J. Hasenbein 《Queueing Systems》2010,66(2):131-154

This paper investigates a queueing system in which the controller can perform admission and service rate control. In particular, we examine a single-server queueing system with Poisson arrivals and exponentially distributed services with adjustable rates. At each decision epoch the controller may adjust the service rate. Also, the controller can reject incoming customers as they arrive. The objective is to minimize long-run average costs which include: a holding cost, which is a non-decreasing function of the number of jobs in the system; a service rate cost c(x), representing the cost per unit time for servicing jobs at rate x; and a rejection cost κ for rejecting a single job. From basic principles, we derive a simple, efficient algorithm for computing the optimal policy. Our algorithm also provides an easily computable bound on the optimality gap at every step. Finally, we demonstrate that, in the class of stationary policies, deterministic stationary policies are optimal for this problem. 相似文献

19.

Computing a Stationary Base-Stock Policy for a Finite Horizon Stochastic Inventory Problem with Non-linear Shortage Costs

《随机分析与应用》2013,31(3):589-625

Abstract

We consider a periodic-review stochastic inventory problem in which demands for a single product in each of a finite number of periods are independent and identically distributed random variables. We analyze the case where shortages (stockouts) are penalized via fixed and proportional costs simultaneously. For this problem, due to the finiteness of the planning horizon and non-linearity of the shortage costs, computing the optimal inventory policy requires a substantial effort as noted in the previous literature. Hence, our paper is aimed at reducing this computational burden. As a resolution, we propose to compute “the best stationary policy.” To this end, we restrict our attention to the class of stationary base-stock policies, and show that the multi-period, stochastic, dynamic problem at hand can be reduced to a deterministic, static equivalent. Using this important result, we introduce a model for computing an optimal stationary base-stock policy for the finite horizon problem under consideration. Fundamental analytic conclusions, some numerical examples, and related research findings are also discussed. 相似文献

20.

Weighted discounted Markov decision processes with perturbation

刘克《应用数学学报(英文版)》1999,15(2):183-189

1.IntrodnctionTheweightedMarkovdecisionprocesses(MDP's)havebeenextensivelystudiedsince1980's,seeforinstance,[1-6]andsoon.ThetheoryofweightedMDP'swithperturbedtransitionprobabilitiesappearstohavebeenmentionedonlyin[7].Thispaperwilldiscussthemodelsofwe... 相似文献