Nonparametric estimation and adaptive control in a class of finite Markov decision chains
Authors: Rolando Cavazos-Cadena
Institution: (1) Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, 25315 Saltillo, COAH, México
Abstract: We consider a class of Markov decision processes with finite state and action spaces which is essentially determined by the following condition: the state space is irreducible under the action of any stationary policy. Apart from this restriction, the transition law is completely unknown to the controller. In this context, we find a set of policies under which the frequency estimators of the transition law are strongly consistent, and this result is then applied to construct adaptive asymptotically discount-optimal policies.
Dedicated to Professor Truman O. Lewis, on the occasion of his sixtieth birthday.
This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152, and in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grant A128CCOEO550 (MT-2).
Keywords: Finite Markov decision chains; unknown transition law; frequency estimators; asymptotic discount optimality; principle of estimation and control; nonstationary value iteration
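
The abstract does not spell out how the frequency estimators are defined. As a rough illustration only, the sketch below computes the standard empirical-frequency estimate of a transition law from an observed trajectory: for each state-action pair, the fraction of visits that were followed by each next state. The function name `frequency_estimator`, the trajectory format, and the sample data are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def frequency_estimator(trajectory, num_states):
    """Empirical-frequency estimate of a transition law (illustrative sketch).

    trajectory: iterable of (state, action, next_state) triples, with states
                numbered 0 .. num_states - 1 (a hypothetical encoding).
    Returns a dict mapping (state, action) to a list of estimated
    probabilities over next states. Pairs never visited are absent.
    """
    # counts[(x, a)][y] = number of observed transitions x --a--> y
    counts = defaultdict(lambda: [0] * num_states)
    for x, a, y in trajectory:
        counts[(x, a)][y] += 1

    estimates = {}
    for key, row in counts.items():
        total = sum(row)  # number of visits to this (state, action) pair
        estimates[key] = [c / total for c in row]
    return estimates


# Made-up trajectory on two states and two actions, for illustration only.
trajectory = [(0, 0, 1), (1, 0, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 1)]
p_hat = frequency_estimator(trajectory, num_states=2)
print(p_hat[(0, 0)])
```

An estimate of this kind can only converge for state-action pairs that are visited infinitely often, which is why the choice of policies matters for consistency.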
This article is indexed in SpringerLink and other databases.