Constrained denumerable state non-stationary MDPs with expected total reward criterion |
| |
Authors: | Guo Xianping |
| |
Institution: | (1) Department of Mathematics, Zhongshan University, 510275 Guangzhou, China |
| |
Abstract: | In this paper, we consider constrained denumerable-state non-stationary Markov decision processes (MDPs) with the expected total reward criterion. By introducing a Lagrange multiplier and using methods of probability and analysis, we prove the existence of constrained optimal policies. Moreover, we prove that a constrained optimal policy may be a Markov policy, or a randomized Markov policy that randomizes between two Markov policies that differ in only one state. |
| |
Keywords: | Non-stationary MDPs; expected total reward criterion; constrained optimal policies |
This article is indexed in CNKI, SpringerLink, and other databases. |
|