Approximate receding horizon approach for Markov decision processes: average reward case
Authors: Hyeong Soo Chang
Institutions: a) Department of Computer Science and Engineering, Sogang University, Seoul, South Korea; b) Department of Computer and Electrical Engineering, University of Maryland, College Park, MD 20742, USA
Abstract: We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call "approximate receding horizon control." We first analyze the performance of approximate receding horizon control for infinite-horizon average reward under an ergodicity assumption; this analysis also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of approximate receding horizon control based on lower bounds on the exact solution of the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second is based on a generalization of the single-policy improvement to multiple policies. In the course of this study, we also provide a simple alternative proof of policy improvement for countable state spaces. Finally, we discuss practical implementations of these schemes via simulation.
Keywords: Markov decision process; Receding horizon control; Infinite-horizon average reward; Policy improvement; Rollout; Ergodicity
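
As a rough illustration of the receding horizon idea summarized in the abstract, the sketch below solves an H-step finite-horizon sub-MDP by backward induction at every state of a small finite MDP and keeps the greedy first action, which defines a stationary policy. This is a minimal sketch only, under assumptions not taken from the paper: the transition array P, reward array R, and horizon H are illustrative, the state space is finite rather than countable, and the sub-MDP is solved exactly rather than approximately (e.g., by simulation-based rollout of a base policy) as the paper considers.

```python
import numpy as np

def receding_horizon_policy(P, R, H):
    """Stationary policy induced by H-step lookahead on a finite MDP.
    P[a, s, s2]: transition probability; R[s, a]: expected one-step reward."""
    A, S, _ = P.shape
    V = np.zeros(S)                               # terminal value of the sub-MDP
    for _ in range(H - 1):                        # backward induction over stages H-1, ..., 1
        Q = R + np.einsum("aij,j->ai", P, V).T    # Q[s, a] = R[s, a] + sum_s' P[a, s, s'] V[s']
        V = Q.max(axis=1)
    Q = R + np.einsum("aij,j->ai", P, V).T        # lookahead at the current stage
    return Q.argmax(axis=1)                       # greedy first action, used as a stationary policy

# Toy usage: a random 5-state, 2-action MDP and a 10-step horizon.
rng = np.random.default_rng(0)
S, A, H = 5, 2, 10
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)                 # normalize each row into a probability distribution
R = rng.random((S, A))
print(receding_horizon_policy(P, R, H))
```

In the schemes studied in the paper, the exact finite-horizon value used above would be replaced by a lower bound, for instance the finite-horizon value of a fixed base policy (rollout) or of several base policies.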