An improved algorithm for solving communicating average reward Markov decision processes
Authors:Moshe Haviv  Martin L. Puterman
Affiliation:(1) Faculty of Commerce and Business Administration, The University of British Columbia, 2053 Main Mall, Vancouver, B.C. V6T 1Y8, Canada
Abstract:
This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with the average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently, the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain, we recommend using this algorithm instead of unichain policy iteration.
This research has been partially supported by NSERC Grant A-5527.
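For context, the sketch below shows standard unichain average-reward policy iteration in Python, i.e. the baseline the abstract refers to, not the paper's modified improvement step for communicating MDPs. The array layout, the reference-state normalization h[ref] = 0, and the tie-breaking tolerance are assumptions made for this illustration.

```python
import numpy as np

def evaluate_policy(P, r, policy, ref=0):
    """Solve the unichain evaluation equations
       g + h[s] = r[s, pi(s)] + sum_s' P[s, pi(s), s'] * h[s']
    with the normalization h[ref] = 0 (assumed convention for this sketch)."""
    n = P.shape[0]
    P_pi = P[np.arange(n), policy]        # transition matrix under the policy
    r_pi = r[np.arange(n), policy]        # one-step rewards under the policy
    # Unknowns: gain g and bias h, with h[ref] fixed to 0.
    A = np.zeros((n, n + 1))
    A[:, 0] = 1.0                         # coefficient of g
    A[:, 1:] = np.eye(n) - P_pi           # coefficients of h
    A = np.delete(A, 1 + ref, axis=1)     # drop h[ref] as a variable
    x = np.linalg.solve(A, r_pi)
    g = x[0]
    h = np.insert(x[1:], ref, 0.0)
    return g, h

def policy_iteration(P, r, max_iter=100):
    """Average-reward policy iteration for a unichain MDP.
    P: (n_states, n_actions, n_states) transition probabilities
    r: (n_states, n_actions) expected one-step rewards
    Assumes every policy visited is unichain; the paper's modified
    improvement step is what guarantees this for communicating MDPs."""
    n, _, _ = P.shape
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate_policy(P, r, policy)
        # Improvement step: maximize r(s, a) + sum_s' P(s, a, s') h(s').
        q = r + P @ h                     # shape (n_states, n_actions)
        better = q.max(axis=1) > q[np.arange(n), policy] + 1e-9
        new_policy = np.where(better, q.argmax(axis=1), policy)  # keep current action on ties
        if np.array_equal(new_policy, policy):
            return policy, g, h
        policy = new_policy
    return policy, g, h
```

Keeping the incumbent action on ties is the usual device that makes the iteration terminate; the paper's contribution is to further restrict the improvement step so that only unichain policies are selected, avoiding the nested multichain optimality equations.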
Keywords:Markov decision processes  policy iteration  communicating classes  unichain policies  multichain policies
This article has been indexed in SpringerLink and other databases.