Adaptive policy-iteration and policy-value-iteration for discounted Markov decision processes
Authors: Prof. Dr. G. Hübner, Prof. Dr. M. Schäl
Affiliations: (1) Institut für Mathematische Stochastik, Universität Hamburg, Bundesstr. 55, 2000 Hamburg 13; (2) Institut für Angew. Mathematik und Informatik, Universität Bonn, Wegelerstr. 6, 5300 Bonn
Abstract: The paper is concerned with a discounted Markov decision process with an unknown parameter which is estimated anew at each stage. Algorithms are proposed which are intermediate between, and include, the "classical" (but time-consuming) principle of estimation and control and the simpler nonstationary value iteration, which converges more slowly. These algorithms perform one single policy-improvement step after each estimation, and the policy thus obtained is then evaluated either completely (policy-iteration) or incompletely (policy-value-iteration). The results show that, in particular, both of these methods lead to asymptotically discount-optimal policies. In addition, these results are generalized to cases where the systematic errors do not vanish as the number of stages increases.

Dedicated to Prof. Dr. K. Hinderer on the occasion of his 60th birthday.
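The adaptive scheme described in the abstract can be illustrated with a minimal sketch. The MDP below, its rewards, and the converging sequence of parameter estimates are all hypothetical stand-ins (the paper itself gives no concrete example); the sketch only shows the structure of the policy-iteration variant: at each stage, re-estimate the unknown parameter, take a single policy-improvement step against the current value estimate, then evaluate the new policy completely.

```python
import numpy as np

def build_P(p):
    """Transition law of a toy 2-state, 2-action MDP depending on an unknown parameter p (hypothetical example)."""
    return np.array([[[1 - p, p], [p, 1 - p]],
                     [[1 - p, p], [0.5, 0.5]]])

def evaluate(P, r, policy, beta):
    """Complete policy evaluation: solve (I - beta * P_pi) v = r_pi."""
    n = P.shape[0]
    P_pi = P[np.arange(n), policy]
    r_pi = r[np.arange(n), policy]
    return np.linalg.solve(np.eye(n) - beta * P_pi, r_pi)

def improve(P, r, v, beta):
    """One single policy-improvement step against the value estimate v."""
    q = r + beta * P @ v  # q[s, a] = r[s, a] + beta * sum_t P[s, a, t] * v[t]
    return q.argmax(axis=1)

beta, p_true = 0.9, 0.3
r = np.array([[1.0, 0.0], [0.0, 2.0]])

# Adaptive policy-iteration: at stage n, use the current estimate p_hat,
# do ONE improvement step, then evaluate the resulting policy completely.
v = np.zeros(2)
for n in range(25):
    p_hat = p_true + 0.4 * 0.5 ** n  # stand-in for a consistent estimator
    P_hat = build_P(p_hat)
    policy = improve(P_hat, r, v, beta)
    v = evaluate(P_hat, r, policy, beta)

# For comparison: exact policy iteration under the true parameter.
P = build_P(p_true)
pi_star = np.zeros(2, dtype=int)
while True:
    v_star = evaluate(P, r, pi_star, beta)
    new = improve(P, r, v_star, beta)
    if np.array_equal(new, pi_star):
        break
    pi_star = new

print(policy.tolist(), pi_star.tolist())
```

The policy-value-iteration variant of the paper would replace the complete `evaluate` call by a fixed (incomplete) number of value-iteration sweeps for the current policy; the paper shows both variants are asymptotically discount optimal as the estimates converge.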
This article is indexed in SpringerLink and other databases.