

A basic formula for performance gradient estimation of semi-Markov decision processes
Authors: Yanjie Li, Fang Cao
Institutions: 1. Harbin Institute of Technology, Shenzhen Graduate School, 518055 Shenzhen, China; 2. State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, 100044 Beijing, China
Abstract: This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. Using this formula, we develop three sample-path-based gradient estimation algorithms, each of which requires only a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the existing algorithm in the literature.
Keywords: Markov processes; Semi-Markov decision processes; Sample-path-based gradient estimation; Perturbation analysis
This article is indexed in ScienceDirect and other databases.