Finding optimal memoryless policies of POMDPs under the expected average reward criterion
Authors: Yanjie Li, Baoqun Yin
Institution: (a) Department of Automation, University of Science and Technology of China, Hefei, Anhui 230026, China; (b) Division of Control and Mechatronics Engineering, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
Abstract: In this paper, partially observable Markov decision processes (POMDPs) with discrete state and action spaces under the average reward criterion are considered from a recently developed sensitivity point of view. By analyzing the average-reward performance difference formula, we propose a policy iteration algorithm with step sizes that obtains an optimal or locally optimal memoryless policy. The algorithm improves the policy along the same direction as standard policy iteration, and suitable step sizes guarantee its convergence. Moreover, the algorithm can be applied to Markov decision processes (MDPs) with correlated actions. Two numerical examples illustrate the applicability of the algorithm.
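To make the approach concrete, below is a minimal sketch of one way a policy-iteration update with step sizes could look for a memoryless POMDP policy under the average-reward criterion. It is not the authors' implementation: the performance-difference analysis is approximated here by aggregating state-level Q-values to the observation level with stationary-distribution weights, and all names (P, R, O, theta, gamma) and the fixed step size are illustrative assumptions.

```python
import numpy as np

def stationary_dist(P_pi):
    # Stationary distribution mu of the chain induced by the current policy:
    # solve mu @ P_pi = mu together with sum(mu) = 1 in the least-squares sense.
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def policy_iteration_with_step_sizes(P, R, O, n_obs, n_actions,
                                     steps=200, gamma=0.1):
    """P[a][s, s']: transition probabilities under action a;
    R[s, a]: one-step rewards; O[s, y]: observation probabilities;
    gamma in (0, 1]: step size (fixed here for simplicity)."""
    n_states = R.shape[0]
    # Memoryless policy: a distribution over actions for each observation.
    theta = np.full((n_obs, n_actions), 1.0 / n_actions)
    for _ in range(steps):
        # State-level action probabilities induced by the memoryless policy.
        pi = O @ theta                                  # (n_states, n_actions)
        P_pi = sum(pi[:, a, None] * P[a] for a in range(n_actions))
        r_pi = (pi * R).sum(axis=1)
        mu = stationary_dist(P_pi)
        eta = mu @ r_pi                                 # average reward
        # Bias (relative value) g from the Poisson equation:
        # (I - P_pi) g = r_pi - eta (least-squares solution of the
        # singular system suffices for this sketch).
        g = np.linalg.lstsq(np.eye(n_states) - P_pi, r_pi - eta,
                            rcond=None)[0]
        # Q-values, then their observation-level aggregation weighted by
        # the stationary probability of each state-observation pair.
        Q = np.stack([R[:, a] + P[a] @ g for a in range(n_actions)], axis=1)
        w = mu[:, None] * O                             # (n_states, n_obs)
        Q_obs = w.T @ Q                                 # (n_obs, n_actions)
        # Greedy improvement direction, as in policy iteration; the update
        # moves a step of size gamma toward it. A diminishing step-size
        # schedule would typically be used for a convergence guarantee.
        greedy = np.eye(n_actions)[Q_obs.argmax(axis=1)]
        theta = (1 - gamma) * theta + gamma * greedy
    return theta, eta

# Usage on a small random POMDP (shapes only; the instance is arbitrary):
rng = np.random.default_rng(0)
nS, nA, nY = 4, 2, 3
P = [rng.dirichlet(np.ones(nS), size=nS) for _ in range(nA)]
R = rng.random((nS, nA))
O = rng.dirichlet(np.ones(nY), size=nS)
theta, eta = policy_iteration_with_step_sizes(P, R, O, nY, nA)
```

With gamma = 1 the update reduces to a pure policy-iteration step at the observation level; smaller step sizes interpolate toward the greedy direction, which is the role the step sizes play in the convergence argument sketched in the abstract.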
Keywords: POMDPs; Performance difference; Policy iteration with step sizes; Correlated actions; Memoryless policy
This article is indexed in ScienceDirect and other databases.