An Accelerated Approximation Algorithm for Markovian Decision Programming and the Least-Variance Problem
Citation: Dong Ze-qing. An Accelerated Approximation Algorithm for Markovian Decision Programming and the Least-Variance Problem [J]. Acta Mathematica Sinica (数学学报), 1978, 21(2): 135-150.
Author: Dong Ze-qing
Affiliation: Institute of Mathematics, Academia Sinica
Abstract: The discounted Markovian decision programming considered here (called a Markov decision process by some authors) has a countably infinite state space, a countably infinite set of admissible decisions at every state, a substochastic family of transition laws, and a bounded reward function. We give an accelerated successive approximation algorithm for computing an (ε-)optimal stationary policy; it converges to the (ε-)optimal solution faster than White's successive approximation algorithm, and it is combined with a test criterion for non-optimal policies, which makes the algorithm still more effective. Let β be the discount factor. In general a β- (or (ε,β)-)optimal stationary policy is not unique; there may even be as many such policies as the stationary policy class contains. It is therefore natural to look, among the β- (or (ε,β)-)optimal stationary policies, for one whose variance is uniformly (in the initial state) (ε-)minimal. We prove that such a policy exists and give an algorithm for obtaining it.

Received: 1976-12-11

AN ACCELERATED SUCCESSIVE APPROXIMATION METHOD OF DISCOUNTED MARKOVIAN DECISION PROGRAMMING AND THE LEAST VARIANCE PROBLEM IN OPTIMAL POLICIES
Author and institution: Dong Ze-qing (Institute of Mathematics, Academia Sinica)
Abstract: The discounted Markovian decision programming of our concern consists of a state space and an action set corresponding to every state, both of which are denumerably infinite sets, a substochastic transition law family, and a bounded reward function. We have given a successive approximation method with accelerated convergence for an (ε-)optimal stationary policy. This algorithm converges to the optimal solution more quickly than White's successive approximation method. It is also furnished with a testing criterion for non-optimal policies, making the algorithm more efficient. Let β be the discounting factor. Generally the β- (or (ε,β)-)optimal stationary policy is not unique, and there may even be as many such policies as the stationary policy class contains. It is natural to hope that a policy with homogeneously (ε-)minimized variance (with respect to the initial states) can be found among the β- (or (ε,β)-)optimal stationary policies. We have proved that a policy of this kind does exist, and have given an algorithm for obtaining it.
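To make the setting of the abstract concrete, the following is a minimal Python sketch of plain successive approximation (value iteration) for a discounted Markovian decision model, together with the textbook stopping rule that certifies an ε-optimal greedy stationary policy and a MacQueen-type test that discards provably non-optimal actions. It is a finite-state illustration only, not the paper's accelerated algorithm: the paper works with countably infinite state and action sets and a substochastic transition law, whereas the sketch assumes finitely many states and actions and transition rows summing to 1 (a substochastic model can be made stochastic by adding an absorbing zero-reward state). All function and variable names are illustrative.

    import numpy as np

    def value_iteration_with_elimination(P, r, beta, eps, max_iter=10_000):
        """Successive approximation for a finite discounted MDP.

        P[a][s, s'] : probability of moving from s to s' under action a
                      (rows assumed to sum to 1)
        r[a][s]     : bounded one-step reward for action a in state s
        beta        : discount factor, 0 < beta < 1
        eps         : the returned greedy policy is guaranteed eps-optimal

        Returns (v, policy, active), where active[s] is the set of actions
        at s that survived the suboptimality test.
        """
        n_actions, n_states = len(P), P[0].shape[0]
        v = np.zeros(n_states)
        active = [set(range(n_actions)) for _ in range(n_states)]
        q = np.full((n_states, n_actions), -np.inf)

        for _ in range(max_iter):
            # One Bellman backup, restricted to actions not yet eliminated.
            q.fill(-np.inf)
            for a in range(n_actions):
                for s in range(n_states):
                    if a in active[s]:
                        q[s, a] = r[a][s] + beta * P[a][s] @ v
            v_new = q.max(axis=1)

            # Standard componentwise bounds: v_new + c_min <= v* <= v_new + c_max.
            delta = v_new - v
            c_min = beta / (1.0 - beta) * delta.min()
            c_max = beta / (1.0 - beta) * delta.max()

            # MacQueen-type elimination: action a cannot be optimal at s if its
            # optimistic value stays below the pessimistic bound on v*(s).
            for s in range(n_states):
                for a in list(active[s]):
                    if q[s, a] + beta * c_max < v_new[s] + c_min:
                        active[s].discard(a)

            # Stop once a greedy policy w.r.t. v_new is provably eps-optimal.
            converged = np.abs(delta).max() <= eps * (1.0 - beta) / (2.0 * beta)
            v = v_new
            if converged:
                break

        # Greedy policy with respect to the final value estimate.
        q_final = np.array([[r[a][s] + beta * P[a][s] @ v if a in active[s] else -np.inf
                             for a in range(n_actions)] for s in range(n_states)])
        return v, q_final.argmax(axis=1), active

    # Toy illustration (numbers are arbitrary): two states, two actions.
    P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
         np.array([[0.5, 0.5], [0.1, 0.9]])]   # action 1
    r = [np.array([1.0, 0.0]), np.array([2.0, 0.5])]
    v, policy, active = value_iteration_with_elimination(P, r, beta=0.9, eps=1e-3)
    print(v, policy, active)

The elimination step mirrors the abstract's "testing criterion for non-optimal policies" only in spirit: once an action is discarded at some state it is never backed up again, so later iterations maximize over fewer alternatives. The least-variance selection among the β-optimal stationary policies discussed in the second half of the abstract is not sketched here.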