1.

Average optimality inequality for continuous-time Markov decision processes in Polish spaces

Quanxin Zhu 《Mathematical Methods of Operations Research》2007,66(2):299-313

In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known “vanishing discount factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), we use the average optimality inequality, together with the Dynkin formula and the Tauberian theorem, to prove the existence of an average optimal stationary policy within the class of all randomized history-dependent policies. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy within the class of all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No: 10626021) and the Natural Science Foundation of Guangdong Province (Grant No: 06300957).

2.

Average optimality for continuous-time Markov decision processes with a policy iteration approach

Quanxin Zhu 《Journal of Mathematical Analysis and Applications》2008,339(1):691-704

This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of a solution to the average reward optimality equation and of an average optimal stationary policy. Under the same conditions we also ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies: we not only establish another average optimality equation for an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that this approach is rather different from both the usual “vanishing discount factor approach” and the “optimality inequality approach” widely used in the previous literature.

3.

Another set of conditions for Markov decision processes with average sample-path costs

Quanxin Zhu Xianping Guo 《Journal of Mathematical Analysis and Applications》2006,322(2):1199-1214

This paper deals with discrete-time Markov decision processes with average sample-path costs (ASPC) in Borel spaces. The costs may have neither upper nor lower bounds. We propose new conditions for the existence of ε-ASPC-optimal (deterministic) stationary policies in the class of all randomized history-dependent policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of ASPC-optimal stationary policies are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition is used in this paper for the first time to study the ASPC criterion. Also, the approach provided here is slightly different from the “optimality equation approach” widely used in the previous literature. In addition, under mild assumptions we show that average expected cost optimality and ASPC-optimality are equivalent. Finally, we use a controlled queueing system to illustrate our results.

4.

Unbounded cost Markov decision processes with limsup and liminf average criteria: new conditions

Quanxin Zhu Xianping Guo Yonglong Dai 《Mathematical Methods of Operations Research》2005,61(3):469-482

5.

Markov decision processes with state-dependent discount factors and unbounded rewards/costs

Qingda Wei Xianping Guo 《Operations Research Letters》2011,39(5):369-374

This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.

6.

Constrained continuous-time Markov decision processes with average criteria

Lanlan Zhang Xianping Guo 《Mathematical Methods of Operations Research》2008,67(2):323-340

In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.

7.

Markov Decision Processes with Variance Minimization: A New Condition and Approach

Quanxin Zhu Xianping Guo 《Stochastic Analysis and Applications》2013,31(3):577-592

This article deals with the limiting average variance criterion for discrete-time Markov decision processes in Borel spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions under which we prove the existence of a variance minimal policy in the class of average expected cost optimal stationary policies. Our conditions are weaker than those in the previous literature. Moreover, some sufficient conditions for the existence of a variance minimal policy are imposed on the primitive data of the model. In particular, the stochastic monotonicity condition is used in this paper for the first time to study the limiting average variance criterion. Also, the optimality inequality approach provided here is different from the “optimality equation approach” widely used in the previous literature. Finally, we use a controlled queueing system to illustrate our results.

8.

Risk-sensitive finite-horizon piecewise deterministic Markov decision processes

《Operations Research Letters》2020,48(1):96-103

This paper deals with risk-sensitive piecewise deterministic Markov decision processes, where the expected exponential utility of a finite-horizon reward is to be maximized. Both the transition rates and reward functions are allowed to be unbounded. The Feynman–Kac formula is developed in our setup; using it together with an approximation technique, we establish the associated Hamilton–Jacobi–Bellman equation and the existence of risk-sensitive optimal policies under suitable conditions.

9.

Bias optimality for multichain continuous-time Markov decision processes

Xianping Guo XinYuan Song Junyu Zhang 《Operations Research Letters》2009,37(5):317-321

This paper deals with the bias optimality of multichain models for finite continuous-time Markov decision processes. Based on new performance difference formulas developed here, we prove the convergence of a so-called bias-optimal policy iteration algorithm, which can be used to obtain bias-optimal policies in a finite number of iterations.

10.

A useful technique for piecewise deterministic Markov decision processes

《Operations Research Letters》2021,49(1):55-61

This note presents a technique that is useful for the study of piecewise deterministic Markov decision processes (PDMDPs) with general policies and unbounded transition intensities. This technique produces an auxiliary PDMDP from the original one. The auxiliary PDMDP possesses certain desired properties, which may not be possessed by the original PDMDP. We apply this technique to risk-sensitive PDMDPs with total cost criteria, and comment on its connection with the uniformization technique.

11.

On an extremal property of Markov chains and sufficiency of Markov strategies in Markov decision processes with the Dubins-Savage criterion

I. M. Sonin 《Annals of Operations Research》1991,29(1):417-426

An inequality regarding the minimum of P(lim inf(X_n ∈ D_n)) is proved for a class of random sequences. This result is related to the problem of sufficiency of Markov strategies for Markov decision processes with the Dubins-Savage criterion, the asymptotic behaviour of nonhomogeneous Markov chains, and some other problems.

12.

Average cost Markov decision processes under the hypothesis of Doeblin

Masami Kurano 《Annals of Operations Research》1991,29(1):375-385

Average cost Markov decision processes (MDPs) with compact state and action spaces and bounded lower semicontinuous cost functions are considered. Kurano [7] treated the general case in which, under the hypothesis of Doeblin, several ergodic classes and a transient set are permitted for the Markov process induced by any randomized stationary policy, and showed the existence of a minimum pair of state and policy. This paper considers the same case as that discussed in Kurano [7] and proves some new results which give an existence theorem for an optimal stationary policy under some reasonable conditions.

13.

Average sample-path optimality for continuous-time Markov decision processes in Polish spaces

Quan-xin Zhu 《Acta Mathematicae Applicatae Sinica (English Series)》2011,27(4):613-624

In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is a first attempt to study the ASPC criterion on continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of ε (ε ≥ 0)-ASPC optimal stationary policies based on two differe...

14.

Optimality equations and sensitive optimality in bounded Markov decision processes

《Optimization》2012,61(5):767-781

This paper considers Markov decision processes with a countable state space, compact action spaces and a bounded reward function. Under some recurrence and connectedness conditions, including the simultaneous Döblin condition, we prove the existence of bounded solutions of the optimality equations which arise for the multichain case in connection with the average reward criterion and sensitive optimality criteria, and we give a characterization of the sets of n-average optimal decision rules.

15.

A new strong optimality criterion for nonstationary Markov decision processes

Xianping Guo Peng Shi Weiping Zhu 《Mathematical Methods of Operations Research》2000,52(2):287-306

This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (together called the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which is an extension of the canonical policies of Hernández-Lerma and Lasserre [16] (p. 77) for the stationary Markov controlled processes. For the case of possibly non-uniformly bounded rewards and denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite horizon criteria with a sequence of additional functions as their terminal rewards (i.e. strong average-canonical optimal). Also, some properties of optimal policies and the convergence of optimal average values are discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and then a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given.

16.

First passage Markov decision processes with constraints and varying discount factors

Xiao WU Xiaolong ZOU Xianping GUO 《Frontiers of Mathematics in China》2015,10(4):1005

This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.

17.

Equivalence classes for optimizing risk models in Markov decision processes

Yoshio Ohtsubo Kenji Toyonaga 《Mathematical Methods of Operations Research》2004,60(2):239-250

18.

Weighted discounted Markov decision processes with perturbation

Ke Liu 《Acta Mathematicae Applicatae Sinica (English Series)》1999,15(2):183-189

1. Introduction. The weighted Markov decision processes (MDPs) have been extensively studied since the 1980s; see, for instance, [1-6] and so on. The theory of weighted MDPs with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

19.

Weighted Markov decision processes with perturbation

Ke Liu Jerzy A. Filar 《Mathematical Methods of Operations Research》2001,53(3):465-480

20.

Markov decision processes under observability constraints

Yasemin Serin Vidyadhar Kulkarni 《Mathematical Methods of Operations Research》2005,61(2):311-328
