A survey of recent results on continuous-time Markov decision processes |
| |
Authors: | Xianping Guo, Onésimo Hernández-Lerma, Tomás Prieto-Rumeau, Xi-Ren Cao, Junyu Zhang, Qiying Hu, Mark E. Lewis, Ricardo Vélez |
| |
Affiliation: | (1) Zhongshan University, P.R. China; (2) CINVESTAV-IPN, Mexico; (3) Universidad Nacional de Educación a Distancia, Spain; (4) Hong Kong University of Science and Technology, Hong Kong; (5) Shanghai University, China; (6) Cornell University, USA; (7) Universidad Nacional de Educación a Distancia, Spain |
| |
Abstract: | This paper surveys recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates and with reward rates that may be unbounded from above and from below. These results concern the discounted and average reward optimality criteria, which are the most commonly used criteria. They also cover more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games. Research partially supported by NSFC, DRFP and NCET grants. Research partially supported by CONACyT (Mexico) Grant 45693-F. |
| |
Keywords: | Continuous-time Markov decision processes (also known as controlled Markov chains); unbounded reward and transition rates; discounted reward; average reward; bias optimality; sensitive discount criteria |
This article is indexed in SpringerLink and other databases.
|
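The sketch below illustrates the discounted reward criterion named in the abstract. It is not the survey's method. It assumes a finite state space with bounded transition rates, whereas the survey treats unbounded rates, where this technique does not directly apply.

The model is given by transition rates q(j|i,a), which are conservative (off-diagonal entries ≥ 0 and each row sums to 0), and by reward rates r(i,a). With discount rate α > 0, the discounted optimality equation is α V(i) = max_a [ r(i,a) + Σ_j q(j|i,a) V(j) ]. The code solves it by uniformization, a standard technique swapped in here for illustration. It picks Λ ≥ max exit rate and sets p(j|i,a) = δ_ij + q(j|i,a)/Λ. The equation then becomes V(i) = max_a [ r(i,a)/(α+Λ) + (Λ/(α+Λ)) Σ_j p(j|i,a) V(j) ], a contraction that value iteration can solve. The function name and the list-based model encoding are hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding of a finite CTMDP (bounded rates assumed):
# rates[i][a][j] = q(j|i,a), conservative (off-diagonal >= 0, rows sum to 0)
# rewards[i][a]  = r(i,a), the reward rate in state i under action a
Rates = List[List[List[float]]]
Rewards = List[List[float]]


def discounted_value_iteration(
    rates: Rates,
    rewards: Rewards,
    alpha: float,
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> Tuple[List[float], List[int]]:
    """Approximately solve alpha*V(i) = max_a [r(i,a) + sum_j q(j|i,a) V(j)]."""
    n = len(rates)
    # Uniformization constant: the largest exit rate -q(i|i,a) over all (i, a).
    lam = max(-rates[i][a][i] for i in range(n) for a in range(len(rates[i])))
    if lam <= 0.0:
        lam = 1.0  # no transitions at all: any positive constant is valid
    beta = lam / (alpha + lam)  # contraction modulus of the uniformized operator

    def q_value(i: int, a: int, v: List[float]) -> float:
        # Uniformized jump probabilities p(j|i,a) = delta_ij + q(j|i,a) / lam.
        expected = sum(
            ((1.0 if j == i else 0.0) + rates[i][a][j] / lam) * v[j]
            for j in range(n)
        )
        return rewards[i][a] / (alpha + lam) + beta * expected

    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [max(q_value(i, a, v) for a in range(len(rates[i]))) for i in range(n)]
        converged = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break

    # Greedy (deterministic stationary) policy with respect to the final values.
    policy = [max(range(len(rates[i])), key=lambda a: q_value(i, a, v)) for i in range(n)]
    return v, policy
```

As a usage example, take two states with α = 1. In state 0, action 0 earns reward rate 1 and never leaves, while action 1 earns nothing but jumps to state 1 at rate 2. State 1 has a single absorbing action with reward rate 5. The iteration should approach V(1) = 5 and V(0) = 10/3, with action 1 optimal in state 0. The average reward and bias criteria discussed in the survey require different optimality equations and are not covered by this sketch.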