Convergence of Markov decision processes with constraints and state-action dependent discount factors |
| |
Authors: | Wu Xiao; Guo Xianping |
| |
Institution: | 1. School of Mathematics and Statistics, Zhaoqing University, Zhaoqing, 526061, China; 2. School of Mathematics, Sun Yat-sen University, Guangzhou, 510275, China |
| |
Abstract: | This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation. |
| |
Keywords: | discrete-time Markov decision processes; state-action dependent discount factors; unbounded costs; convergence |
This article is indexed in CNKI, VIP (维普), SpringerLink, and other databases.
|