首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 145 毫秒
1.
In this paper, we study the average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected costs. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish one average optimality inequality by using the well-known “vanishing discounting factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy in all randomized history dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy in all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No: 10626021) and the Natural Science Foundation of Guangdong Province (Grant No: 06300957).  相似文献   

2.
3.
This paper concerns countable state space Markov decision processes endowed with a (long-run expected)average reward criterion. For these models we summarize and, in some cases,extend some recent results on sufficient conditions to establish the existence of optimal stationary policies. The topics considered are the following: (i) the new assumptions introduced by Sennott in [20–23], (ii)necessary and sufficient conditions for the existence of a bounded solution to the optimality equation, and (iii) equivalence of average optimality criteria. Some problems are posed.This research was partially supported by the Third World Academy of Sciences (TWAS) under Grant No. TWAS RG MP 898-152.  相似文献   

4.
In this paper we study the average sample-path cost(ASPC) problem for continuous-time Markov decision processes in Polish spaces.To the best of our knowledge,this paper is a first attempt to study the ASPC criterion on continuous-time MDPs with Polish state and action spaces.The corresponding transition rates are allowed to be unbounded,and the cost rates may have neither upper nor lower bounds.Under some mild hypotheses,we prove the existence of ε(ε≥ 0)-ASPC optimal stationary policies based on two differe...  相似文献   

5.
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain we recommend use of this algorithm instead of unichain policy iteration.This research has been partially supported by NSERC Grant A-5527.  相似文献   

6.
7.
We consider a discrete time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered some decomposition of limiting average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs. Then, we construct some hierarchical decomposition algorithms for both discounted and weighted MDPs.  相似文献   

8.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data and under which we prove the existence of the average reward optimality equation and an average optimal stationary policy. Also, under our conditions we ensure the existence of ?-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our way is rather different from both the usually “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature.  相似文献   

9.
This paper studies denumerable state continuous-time controlled Markov chains with the discounted reward criterion and a Borel action space. The reward and transition rates are unbounded, and the reward rates are allowed to take positive or negative values. First, we present new conditions for a nonhomogeneous Q(t)-process to be regular. Then, using these conditions, we give a new set of mild hypotheses that ensure the existence of -optimal (0) stationary policies. We also present a martingale characterization of an optimal stationary policy. Our results are illustrated with controlled birth and death processes.  相似文献   

10.
We consider discrete-timeaverage reward Markov decision processes with denumerable state space andbounded reward function. Under structural restrictions on the model the existence of an optimal stationary policy is proved; both the lim inf and lim sup average criteria are considered. In contrast to the usual approach our results donot rely on the average regard optimality equation. Rather, the arguments are based on well-known facts fromRenewal Theory.This research was supported in part by the Consejo Nacional de Ciencia y Tecnologia (CONACYT) under Grants PCEXCNA 040640 and 050156, and by SEMAC under Grant 89-1/00ifn$.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号