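Discounted Model for Continuous-Time Markov Decision Processes
Cite this article: GUO Xian Ping, DAI Yong Long. Discounted model for continuous-time Markov decision processes [J]. Acta Mathematica Sinica, 2002, 45(1): 171-182.
Authors: GUO Xian Ping  DAI Yong Long
Affiliation: Department of Statistics, Zhongshan University, Guangzhou, Guangdong 510275
Funding: National Natural Science Foundation of China (19361060); Natural Science Foundation of Guangdong Province; Foundation of the Zhongshan University Advanced Research Centre, Hong Kong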
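Abstract: This paper considers the discounted model for continuous-time Markov decision processes with an arbitrary family of transition rates and possibly unbounded cost rate functions. It drops traditional conditions such as uniqueness of the Q-process corresponding to each policy and, for the first time, treats the case in which the Q-process corresponding to each policy need not be unique, the family of transition rates need not be conservative, the cost rate function may be unbounded, and the action space may be any nonempty set. The paper is the first to replace the traditional α-discounted cost optimality equation with an "α-discounted cost optimality inequality". Using this optimality inequality and new methods, it not only proves the main traditional result, the existence of optimal stationary policies, but also investigates the existence of (ε>0)-optimal stationary policies, of optimal stationary policies with monotonicity properties, and of (ε≥0)-optimal decision processes, obtaining some meaningful new results. Finally, an example of a birth-death system with controlled migration rates is given; it satisfies all the conditions of this paper, whereas the traditional assumptions (see references [1-14]) all fail to hold.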

Keywords: non-unique Q-process  unbounded cost  (ε≥0)-optimal stationary policy  (ε≥0)-optimal decision process  monotone optimal policy
Article ID: 0583-1431(2002)01-0171-12
Revised: 20 October 1999

The Unbounded Cost Discounted Model for Continuous Time Markov Decision
GUO Xian Ping, DAI Yong Long. The Unbounded Cost Discounted Model for Continuous Time Markov Decision [J]. Acta Mathematica Sinica, 2002, 45(1): 171-182.
Authors:GUO Xian Ping  DAI Yong Long
Institution: Department of Mathematics, Zhongshan University, Guangzhou 510275, P. R. China (Fax: (020)84037978; E-mail: mcsgxp@zsu.edu.cn; stsdaiy@zdu.edu.cn)
Abstract: This paper deals with the discounted cost model for continuous time Markov decision processes with arbitrary transition rates and possibly unbounded costs. Dropping the traditional condition that the Q-process induced by a given policy be unique, we consider, for the first time, the case in which the Q-processes need not be unique, the costs may be unbounded, and the action space is arbitrary. By using, for the first time, the "α-discounted cost optimality inequality" instead of the traditional optimality equation, we not only prove the existence of optimal stationary policies but also establish the existence of (ε>0)-optimal stationary policies, of monotone optimal stationary policies, and of (ε≥0)-optimal decision processes. Moreover, some new results are obtained. Finally, an applied example of birth and death processes with controlled migration rates is used to show that our assumptions are satisfied, whereas the conditions in [1-14] fail to hold.
Keywords: Q-processes being not unique  Unbounded costs  (ε>0)-optimal stationary policy  (ε≥0)-optimal decision process  Monotone optimal policy
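
For orientation, the "traditional α-discounted cost optimality equation" mentioned in the abstract is usually written as follows in the general literature on continuous-time Markov decision processes. The notation here (state space S, admissible action sets A(i), cost rate c(i,a), transition rates q(j|i,a), state and action processes x(t) and a(t), optimal value V*_α) is assumed for illustration and is not taken from the paper. The abstract does not state the precise form of the optimality inequality that replaces this equation, so it is not reproduced.

```latex
% Assumed notation, not from the paper:
%   S        state space            A(i)      actions admissible in state i
%   c(i,a)   cost rate              q(j|i,a)  transition rates
%   alpha>0  discount rate          pi        policy
\begin{align}
  % Expected alpha-discounted cost of policy pi from initial state i
  V_\alpha(i,\pi) &= \mathbb{E}_i^{\pi}\!\left[\int_0^\infty
      e^{-\alpha t}\, c\bigl(x(t),a(t)\bigr)\,dt\right],
  &
  V_\alpha^*(i) &= \inf_{\pi} V_\alpha(i,\pi), \\
  % Standard alpha-discounted cost optimality equation
  \alpha\, V_\alpha^*(i) &= \inf_{a\in A(i)}\Bigl\{\, c(i,a)
      + \sum_{j\in S} q(j\mid i,a)\, V_\alpha^*(j) \Bigr\},
  & i &\in S.
\end{align}
```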
This article is indexed in CNKI, Wanfang Data, and other databases.