An Accelerated Approximation Algorithm for Markovian Decision Programming and the Least Variance Problem |
| |
Cite this article: | 董泽清 (Dong Ze-qing). An Accelerated Approximation Algorithm for Markovian Decision Programming and the Least Variance Problem [J]. 数学学报 (Acta Mathematica Sinica), 1978, 21(2): 135-150. DOI: cnki:ISSN:0583-1431.0.1978-02-003
| |
Author: | 董泽清 (Dong Ze-qing)
| |
Affiliation: | Institute of Mathematics, Academia Sinica
| |
Abstract: | The discounted Markovian decision programming we consider (called a Markov decision process by some authors) has a countably infinite state space, a countably infinite set of admissible actions at each state, a family of substochastic transition laws, and a bounded reward function. We give an accelerated-convergence successive approximation algorithm for finding an (ε-)optimal stationary policy; it converges to the (ε-)optimal solution faster than White's successive approximation algorithm, and it is paired with a test criterion for non-optimal policies that makes the algorithm still more effective. Let β be the discount factor. In general a β- (or (ε,β)-) optimal stationary policy is not unique; there may even be as many such policies as the stationary policy class contains. It is therefore natural to seek, among the β- (or (ε,β)-) optimal stationary policies, one whose variance attains the (ε-)minimum uniformly in the initial state. We prove that such a policy indeed exists and give an algorithm for obtaining it.
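
To make the kind of scheme described above concrete, here is a minimal sketch of successive approximation (value iteration) for a finite discounted MDP, accelerated with MacQueen-style extrapolation bounds and a test that discards provably non-optimal actions. It is not the paper's algorithm (the paper treats denumerable state and action sets with substochastic transitions); the function name, the array layout, and the finite stochastic model are assumptions made for this example.

```python
import numpy as np

def value_iteration_accelerated(P, r, beta, eps=1e-6, max_iter=10_000):
    """Sketch: successive approximation for a finite discounted MDP with
    MacQueen extrapolation bounds and elimination of non-optimal actions.

    P    -- transitions, shape (A, S, S): P[a, s, t] = p(t | s, a)
    r    -- rewards, shape (A, S): r[a, s] = r(s, a), assumed bounded
    beta -- discount factor, 0 < beta < 1
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    active = np.ones((A, S), dtype=bool)       # actions not yet ruled out

    for _ in range(max_iter):
        q = r + beta * (P @ v)                 # one-step backups Q(s, a)
        q = np.where(active, q, -np.inf)       # skip eliminated actions
        v_new = q.max(axis=0)
        d = v_new - v
        lo = beta * d.min() / (1 - beta)       # v* >= v_new + lo (pointwise)
        hi = beta * d.max() / (1 - beta)       # v* <= v_new + hi

        # Test criterion: action a is non-optimal at s when even its upper
        # bound q(s, a) + hi lies below the lower bound v_new(s) + lo on v*(s).
        active &= q + hi >= v_new + lo

        if hi - lo < eps:                      # bounds agree to within eps
            return v_new + (hi + lo) / 2, q.argmax(axis=0)
        v = v_new

    return v, q.argmax(axis=0)                 # greedy policy at termination
```

On the paper's denumerable model these finite arrays would have to be replaced by suitable truncations; the bounds used here are the standard ones for the finite stochastic case.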
|
Received: | 1976-12-11
AN ACCELERATED SUCCESSIVE APPROXIMATION METHOD OF DISCOUNTED MARKOVIAN DECISION PROGRAMMING AND THE LEAST VARIANCE PROBLEM IN OPTIMAL POLICIES |
| |
Affiliation: | Dong Ze-qing (Institute of Mathematics, Academia Sinica)
| |
Abstract: | The discounted Markovian decision programming of our concern consists of a state space and an action set corresponding to every state, both of which are denumerably infinite sets, a substochastic transition law family, and a bounded reward function. We give a successive approximation method of accelerated convergence for an (ε-)optimal stationary policy. This algorithm converges to the optimal solution more quickly than White's successive approximation method. It is also furnished with a testing criterion for non-optimal policies, making the algorithm more efficient. Let β be the discounting factor. Generally the β- (or (ε,β)-) optimal stationary policy is not unique, and there may even be as many such policies as are contained in the stationary policy class. It is natural to hope that a policy with homogeneously (ε-)minimized variance (with respect to the initial states) can be found among the β- (or (ε,β)-) optimal stationary policies. We prove that a policy of this kind does exist, and give an algorithm for obtaining it.
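
The least-variance problem can likewise be illustrated in a finite model. For a fixed stationary policy, the second moment W of the discounted return satisfies the linear identity W = r^2 + 2β·r·(Pv) + β^2·PW, so once the optimal value v* is known, minimizing the variance over the policies that attain v* reduces to a second discounted problem (discount β^2) restricted to the value-conserving actions. The sketch below exploits that identity; it is a hedged reconstruction of the idea, not the paper's algorithm, and the function name, the slack parameter, and the finiteness assumption are all introduced for this example.

```python
import numpy as np

def min_variance_optimal_policy(P, r, beta, v_star, slack=1e-10, tol=1e-9):
    """Sketch: among stationary policies attaining the optimal value v_star,
    pick one whose return variance is minimal at every initial state.

    Uses the second-moment identity for a fixed policy pi:
        W = r_pi**2 + 2*beta*r_pi*(P_pi v) + beta**2 * P_pi W,
    so with v fixed at v_star, minimizing W over value-conserving actions
    is itself a discounted minimization problem with discount beta**2.
    """
    A, S, _ = P.shape
    q = r + beta * (P @ v_star)
    conserving = q >= v_star - slack           # actions attaining the optimum
    c = r**2 + 2 * beta * r * (P @ v_star)     # per-step cost in the W-recursion
    c = np.where(conserving, c, np.inf)        # forbid non-conserving actions

    W = np.zeros(S)
    while True:                                # value iteration, discount beta^2
        W_new = (c + beta**2 * (P @ W)).min(axis=0)
        if np.max(np.abs(W_new - W)) < tol:
            break
        W = W_new

    pi = (c + beta**2 * (P @ W)).argmin(axis=0)
    return pi, W - v_star**2                   # Var = E[R^2] - (E[R])^2
```

Since every policy in the restricted class has the same mean return v*, minimizing the second moment W is equivalent to minimizing the variance, which is what justifies the reduction under the stated finiteness assumption.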
| |
Keywords: | |