Nonparametric estimation and adaptive control in a class of finite Markov decision chains |
| |
Authors: | Rolando Cavazos-Cadena |
| |
Affiliation: | (1) Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, 25315 Saltillo, COAH, México |
| |
Abstract: | We consider a class of Markov decision processes with finite state and action spaces which, essentially, is determined by the following condition: the state space is irreducible under the action of any stationary policy. However, except for this restriction, the transition law is completely unknown to the controller. In this context, we find a set of policies under which the frequency estimators of the transition law are strongly consistent, and this result is then applied to construct adaptive asymptotically discount-optimal policies. Dedicated to Professor Truman O. Lewis, on the occasion of his sixtieth birthday. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152, and in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grant A128CCOEO550 (MT-2). |
| |
Keywords: | Finite Markov decision chains; unknown transition law; frequency estimators; asymptotic discount optimality; principle of estimation and control; nonstationary value iteration |
This article is indexed in SpringerLink and other databases. |
|