A basic formula for performance gradient estimation of semi-Markov decision processes |
| |
Authors: | Yanjie Li, Fang Cao |
| |
Institution: | 1. Harbin Institute of Technology, Shenzhen Graduate School, 518055 Shenzhen, China; 2. State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, 100044 Beijing, China |
| |
Abstract: | This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms that use a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the existing algorithm in the literature. |
| |
Keywords: | Markov processes; Semi-Markov decision processes; Sample-path-based gradient estimation; Perturbation analysis |
This article is indexed in ScienceDirect and other databases. |
|