A basic formula for performance gradient estimation of semi-Markov decision processes

Authors:	Yanjie Li, Fang Cao

Institution:	1. Harbin Institute of Technology, Shenzhen Graduate School, 518055 Shenzhen, China; 2. State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, 100044 Beijing, China

Abstract:	This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. Using this formula, we develop three sample-path-based gradient estimation algorithms, each of which requires only a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the existing algorithm in the literature.

Keywords:	Markov processes; Semi-Markov decision processes; Sample-path-based gradient estimation; Perturbation analysis
This article is indexed in ScienceDirect and other databases.