Similar Documents
19 similar documents found.
1.
This paper considers the mean-variance optimization problem for discounted continuous-time Markov decision processes. The state and action spaces are assumed to be Polish spaces, and the transition rates and reward rates may be unbounded. The goal is to select, from the class of discount-optimal stationary policies, one whose variance is smallest, and the paper seeks conditions under which such a mean-variance optimal policy exists in this Polish-space setting. Using a first-passage decomposition method, it is shown that the mean-variance problem can be transformed into an "equivalent" expected-discount optimization problem, which yields an "optimality equation" for the mean-variance problem, the existence of a mean-variance optimal policy, and a characterization of such policies. Finally, several examples illustrate the non-uniqueness of discount-optimal policies and the existence of mean-variance optimal policies.
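For orientation, the criterion can be written in the form standard for discounted continuous-time models; the notation below is the usual one and is assumed here rather than quoted from the paper:

```latex
% Discounted mean and variance of a policy \pi (standard formulation;
% notation assumed, not quoted from the paper):
\[
  V_\pi(x) \;=\; \mathbb{E}_x^{\pi}\!\left[\int_0^{\infty} e^{-\alpha t}\, r(x_t,a_t)\,dt\right],
  \qquad
  \sigma_\pi^2(x) \;=\; \mathbb{E}_x^{\pi}\!\left[\left(\int_0^{\infty} e^{-\alpha t}\, r(x_t,a_t)\,dt\right)^{\!2}\right] \;-\; V_\pi(x)^2 .
\]
```

The problem is then to minimize \(\sigma_\pi^2\) over the policies \(\pi\) that attain \(\sup_\pi V_\pi\).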

2.
This paper studies nonstationary Markov decision processes with the average cost criterion on a general state space. The result that, in the stationary case, the average-cost optimality equation can be established via the optimality equation of an auxiliary discounted model is extended to the nonstationary case, and this extension is used to prove the existence of optimal policies.

3.
This paper discusses the discounted model of Markov decision programming (MDP) with a denumerable state space, a denumerable action space, a family of substochastic transition rates, and a bounded reward function, and gives a test criterion for non-ε-optimal policies.

4.
Discounted Semi-Markov Decision Processes with Nonnegative Costs
黄永辉  郭先平 《数学学报》2010,53(3):503-514
This paper considers discounted semi-Markov decision processes with denumerable states and nonnegative costs. A continuous-time semi-Markov decision process is first constructed from a given semi-Markov decision kernel and policy; then, via the minimal nonnegative solution approach, it is proved that the value function satisfies the optimality equation and that ε-optimal stationary policies exist, and conditions for the existence of optimal policies, together with some of their properties, are given. Finally, a value iteration algorithm and a numerical example are presented.
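As a concrete illustration of the value iteration mentioned above, here is a minimal sketch for a discrete-time discounted model with nonnegative costs; starting from V ≡ 0, the iterates increase monotonically toward the minimal nonnegative solution of the optimality equation. The array names and shapes are assumptions of this sketch, not the paper's notation.

```python
import numpy as np

def value_iteration(costs, P, beta, tol=1e-8, max_iter=10_000):
    """Value iteration for a discounted MDP with nonnegative costs.

    costs: (S, A) array, c[s, a] >= 0
    P:     (S, A, S) array of transition probabilities
    beta:  discount factor in (0, 1)

    Starting from V = 0, iterating the Bellman operator yields a
    nondecreasing sequence converging to the minimal nonnegative
    solution of the optimality equation.
    """
    S, A = costs.shape
    V = np.zeros(S)                      # start at the zero function
    for _ in range(max_iter):
        Q = costs + beta * (P @ V)       # Q[s,a] = c(s,a) + beta * sum_s' P(s'|s,a) V(s')
        V_new = Q.min(axis=1)            # minimize over actions
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmin(axis=1)            # greedy stationary policy
    return V, policy
```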

5.
For average-criterion Markov decision processes with a denumerable state set, nonempty action sets, and unbounded rewards, this paper proposes a new set of conditions under which (ε-)optimal stationary policies exist and under which the optimality inequality holds whenever the sum appearing in it is well defined.

6.
This paper studies the constrained discounted semi-Markov decision programming problem: maximize the discounted expected reward subject to a constraint on a discounted expected cost. Assuming a denumerable state set and compact nonempty Borel action sets, it gives a necessary and sufficient condition for p-constrained optimal policies and proves that, under suitable assumptions, a p-constrained optimal policy must exist.

7.
Continuous-Time Discounted Markov Decision Programming with Unbounded Rewards
This paper discusses continuous-time discounted Markov decision programming in which the reward function is unbounded, the family of transition rates is uniformly bounded, and both the state space and the action sets are denumerable. A new class of unbounded reward functions is introduced, and within a new class of Markov policies all the results known to hold under bounded rewards are proved. The structure of optimal policies is discussed, and a necessary and sufficient condition for a policy of this model to be optimal is obtained.

8.
This paper is the first to study continuous-time discounted Markov decision programming with a denumerable state space and denumerable action sets under the condition that neither the reward function nor the family of transition rates is uniformly bounded. A new class of unbounded reward functions is introduced and, within a new class of Markov policies, the existence and structure of optimal policies are discussed. Besides the main results known to hold for bounded rewards and uniformly bounded transition rates, some further important conclusions are obtained.

9.
This paper considers the average-reward criterion for nonstationary Markov decision processes (MDPs) with a denumerable state space. First, we point out and correct errors in Park et al. [1] and Alden et al. [2], and, under conditions weaker than those of Park et al. [1], we prove the convergence of the optimal average value and the existence of an average-optimal Markov policy by means of a newly established optimality equation. Second, we give a rolling-horizon algorithm for ε (>0)-average-optimal Markov policies.
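The rolling-horizon idea itself can be sketched generically: at each epoch, solve a finite-horizon subproblem by backward induction and keep only the first action. The sketch below is this generic scheme, not the paper's specific algorithm; the callbacks reward(k, s, a) and trans(k, s, a) and the horizon H are hypothetical.

```python
import numpy as np

def rolling_horizon_action(t, s, H, reward, trans, n_states, n_actions):
    """One step of a generic rolling-horizon scheme for a nonstationary MDP.

    reward(k, s, a) -> immediate reward at epoch k (hypothetical callback)
    trans(k, s, a)  -> length-n_states probability vector at epoch k

    Solves the finite-horizon problem over epochs t..t+H-1 by backward
    induction and returns the first action of the resulting policy.
    """
    V = np.zeros(n_states)                       # terminal value = 0
    first_action = np.zeros(n_states, dtype=int)
    for k in range(t + H - 1, t - 1, -1):        # backward induction
        Q = np.array([[reward(k, i, a) + trans(k, i, a) @ V
                       for a in range(n_actions)] for i in range(n_states)])
        V = Q.max(axis=1)
        first_action = Q.argmax(axis=1)          # final pass leaves the epoch-t choice
    return first_action[s]
```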

10.
For the general MDP model, this paper proves that for any convex combination of the strategic measures induced by a family of history-dependent randomized policies, there exists a strategic measure induced by a randomized Markov policy whose values under the average expected criterion, the discounted criterion, and the expected total reward criterion are respectively equal to those of the combination. This generalizes all the corresponding results of E. B. Dynkin and Yushkevich [1], M. Puterman [2], E. Feinberg and A. Shwartz [3], R. Strauch [4], and 董泽清 and 宋京生 [5]. It is further proved that, for each of these three criteria, optimal policies are either unique or infinite in number.

11.
This paper establishes a rather complete optimality theory for the average cost semi-Markov decision model with a denumerable state space, compact metric action sets and unbounded one-step costs for the case where the underlying Markov chains have a single ergodic set. Under a condition which, roughly speaking, requires the existence of a finite set such that the supremum over all stationary policies of the expected time and the total expected absolute cost incurred until the first return to this set are finite for any starting state, we verify the existence of a finite solution to the average cost optimality equation and the existence of an average cost optimal stationary policy.
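For reference, the average cost optimality equation for a semi-Markov decision model takes the following standard textbook form; the notation here is assumed, not quoted from the paper:

```latex
% Average cost optimality equation (ACOE) for a semi-Markov decision
% model, in standard form: c(i,a) is the one-step cost, tau(i,a) the
% expected sojourn time, p(j|i,a) the embedded transition law, and
% g* the optimal average cost (notation assumed, not the paper's).
\[
  h(i) \;=\; \min_{a \in A(i)} \Big[\, c(i,a) \;-\; g^{*}\,\tau(i,a)
             \;+\; \sum_{j} p(j \mid i, a)\, h(j) \Big].
\]
```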

12.
This paper deals with discrete-time Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Under general conditions, we develop an iteration algorithm for computing the optimal value function, and also prove the existence of optimal stationary policies. Furthermore, we illustrate our results with a cash-balance model.
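A minimal sketch of such an iteration algorithm for a finite model, assuming the discount factor depends only on the current state (the paper's conditions are more general, and all names and shapes here are illustrative):

```python
import numpy as np

def value_iteration_state_discount(rewards, P, alpha, tol=1e-8, max_iter=10_000):
    """Value iteration with state-dependent discount factors (a sketch).

    rewards: (S, A) array of one-step rewards
    P:       (S, A, S) array of transition probabilities
    alpha:   length-S array, alpha[s] in [0, 1), discount applied at state s

    With sup_s alpha[s] < 1 the operator is a contraction, so the
    iterates converge geometrically to the optimal value function.
    """
    S, A = rewards.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # discount the continuation value with the factor of the current state
        Q = rewards + alpha[:, None] * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
    return V, Q.argmax(axis=1)
```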

13.
We study Markov decision processes under the average value-at-risk criterion. The state space and the action space are Borel spaces, the costs may be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.

14.
In this paper we study discrete-time Markov decision processes with average expected costs (AEC) and discount-sensitive criteria in Borel state and action spaces. The costs may have neither upper nor lower bounds. We propose another set of conditions on the system's primitive data, under which we prove (1) that AEC optimality and strong −1-discount optimality are equivalent; (2) a condition equivalent to strong 0-discount optimality of stationary policies; and (3) the existence of strong n (n = −1, 0)-discount optimal stationary policies. Our conditions are weaker than those in the previous literature. In particular, the "stochastic monotonicity condition" in this paper is used here for the first time to study strong n (n = −1, 0)-discount optimality. Moreover, we provide a new approach, slightly different from those in the previous literature, to prove the existence of strong 0-discount optimal stationary policies. Finally, we apply our results to an inventory system and a controlled queueing system.

15.
16.
《Optimization》2012,61(4):773-800
In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first-passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give sufficient conditions for verifying the simultaneous Doeblin condition, use a controlled birth-and-death system to illustrate our conditions, and provide an example in which the risk-sensitive average optimality inequality is strict.

17.
In this paper, we study constrained continuous-time Markov decision processes with a denumerable state space and unbounded reward/cost and transition rates. The criterion to be maximized is the expected average reward, and a constraint is imposed on an expected average cost. We give suitable conditions that ensure the existence of a constrained-optimal policy. Moreover, we show that the constrained-optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our conditions. Supported by NSFC, NCET and RFDP.

18.
This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy with the minimum risk probability that the total loss incurred during a first passage time to some target set exceeds a loss level. First, we establish the optimality equation via a successive approximation technique, and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions, under which we prove the existence of optimal policies and develop an algorithm for computing ε-optimal policies. Finally, we apply our main results to a business system.
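The successive approximation technique mentioned above can be skeletonized as a generic fixed-point iteration; the operator T, the initial guess, and the tolerance below are placeholders, not the paper's specific first-passage operator.

```python
import numpy as np

def successive_approximation(T, V0, tol=1e-10, max_iter=10_000):
    """Generic successive approximation: iterate V <- T(V) to a fixed point.

    T  : the optimality operator (assumed monotone/contractive)
    V0 : initial value array
    Returns the approximate fixed point and the number of iterations used.
    """
    V = V0
    for n in range(max_iter):
        V_new = T(V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, n + 1
        V = V_new
    return V, max_iter
```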

19.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays important roles in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and characterizing the value function through the Bellman equation.
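For context, the classical Dynkin formula that the paper extends reads, in its standard form for a Markov process X with extended generator A and suitable functions f (this is the textbook statement, not the paper's generalized one):

```latex
% Classical Dynkin formula (textbook form): for a Markov process X
% with extended generator A and f in the domain of A,
\[
  \mathbb{E}_x\big[f(X_t)\big] \;-\; f(x)
  \;=\; \mathbb{E}_x\!\left[\int_0^{t} (A f)(X_s)\, ds\right].
\]
```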
