An Accelerated Approximation Algorithm for Markovian Decision Programming and the Least-Variance Problem
Citation: Dong Ze-qing. An Accelerated Approximation Algorithm for Markovian Decision Programming and the Least-Variance Problem [J]. Acta Mathematica Sinica (数学学报), 1978, 21(2): 135-150.
Author: Dong Ze-qing
Affiliation: Institute of Mathematics, Academia Sinica
Abstract: The discounted Markovian decision programming considered here (called a Markov decision process by some authors) has a countably infinite state space, a countably infinite set of admissible decisions at every state, a substochastic family of transition laws, and a bounded reward function. We give an accelerated successive approximation algorithm for computing an (ε-)optimal stationary policy; it converges to the (ε-)optimal solution faster than White's successive approximation algorithm, and it is combined with a test criterion for non-optimal policies, which makes the algorithm still more effective. Let β be the discount factor. In general a β- (or (ε,β)-)optimal stationary policy is not unique; there may even be as many such policies as the stationary policy class contains. It is therefore natural to look, among the β- (or (ε,β)-)optimal stationary policies, for one whose variance is uniformly (in the initial state) (ε-)minimal. We prove that such a policy exists and give an algorithm for obtaining it.

Received: 1976-12-11

AN ACCELERATED SUCCESSIVE APPROXIMATION METHOD OF DISCOUNTED MARKOVIAN DECISION PROGRAMMING AND THE LEAST VARIANCE PROBLEM IN OPTIMAL POLICIES
Author and institution: Dong Ze-qing (Institute of Mathematics, Academia Sinica)
Abstract: The discounted Markovian decision programming of our concern consists of a state space and an action set corresponding to every state, both of which are denumerably infinite sets, a substochastic transition law family, and a bounded reward function. We have given a successive approximation method with accelerated convergence for an (ε-)optimal stationary policy. This algorithm converges to the optimal solution more quickly than White's successive approximation method. It is also furnished with a testing criterion for non-optimal policies, making the algorithm more efficient. Let β be the discounting factor. Generally the β- (or (ε,β)-)optimal stationary policy is not unique, and there may even be as many such policies as the stationary policy class contains. It is natural to hope that a policy with homogeneously (ε-)minimized variance (with respect to the initial states) can be found among the β- (or (ε,β)-)optimal stationary policies. We have proved that a policy of this kind does exist, and have given an algorithm for obtaining it.
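To make the setting of the abstract concrete, the following is a minimal Python sketch of plain successive approximation (value iteration) for a discounted Markovian decision model, together with the textbook stopping rule that certifies an ε-optimal greedy stationary policy and a MacQueen-type test that discards provably non-optimal actions. It is a finite-state illustration only, not the paper's accelerated algorithm: the paper works with countably infinite state and action sets and a substochastic transition law, whereas the sketch assumes finitely many states and actions and transition rows summing to 1 (a substochastic model can be made stochastic by adding an absorbing zero-reward state). All function and variable names are illustrative.

    import numpy as np

    def value_iteration_with_elimination(P, r, beta, eps, max_iter=10_000):
        """Successive approximation for a finite discounted MDP.

        P[a][s, s'] : probability of moving from s to s' under action a
                      (rows assumed to sum to 1)
        r[a][s]     : bounded one-step reward for action a in state s
        beta        : discount factor, 0 < beta < 1
        eps         : the returned greedy policy is guaranteed eps-optimal

        Returns (v, policy, active), where active[s] is the set of actions
        at s that survived the suboptimality test.
        """
        n_actions, n_states = len(P), P[0].shape[0]
        v = np.zeros(n_states)
        active = [set(range(n_actions)) for _ in range(n_states)]
        q = np.full((n_states, n_actions), -np.inf)

        for _ in range(max_iter):
            # One Bellman backup, restricted to actions not yet eliminated.
            q.fill(-np.inf)
            for a in range(n_actions):
                for s in range(n_states):
                    if a in active[s]:
                        q[s, a] = r[a][s] + beta * P[a][s] @ v
            v_new = q.max(axis=1)

            # Standard componentwise bounds: v_new + c_min <= v* <= v_new + c_max.
            delta = v_new - v
            c_min = beta / (1.0 - beta) * delta.min()
            c_max = beta / (1.0 - beta) * delta.max()

            # MacQueen-type elimination: action a cannot be optimal at s if its
            # optimistic value stays below the pessimistic bound on v*(s).
            for s in range(n_states):
                for a in list(active[s]):
                    if q[s, a] + beta * c_max < v_new[s] + c_min:
                        active[s].discard(a)

            # Stop once a greedy policy w.r.t. v_new is provably eps-optimal.
            converged = np.abs(delta).max() <= eps * (1.0 - beta) / (2.0 * beta)
            v = v_new
            if converged:
                break

        # Greedy policy with respect to the final value estimate.
        q_final = np.array([[r[a][s] + beta * P[a][s] @ v if a in active[s] else -np.inf
                             for a in range(n_actions)] for s in range(n_states)])
        return v, q_final.argmax(axis=1), active

    # Toy illustration (numbers are arbitrary): two states, two actions.
    P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
         np.array([[0.5, 0.5], [0.1, 0.9]])]   # action 1
    r = [np.array([1.0, 0.0]), np.array([2.0, 0.5])]
    v, policy, active = value_iteration_with_elimination(P, r, beta=0.9, eps=1e-3)
    print(v, policy, active)

The elimination step mirrors the abstract's "testing criterion for non-optimal policies" only in spirit: once an action is discarded at some state it is never backed up again, so later iterations maximize over fewer alternatives. The least-variance selection among the β-optimal stationary policies discussed in the second half of the abstract is not sketched here.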