Nonparametric estimation and adaptive control in a class of finite Markov decision chains

Authors:	Rolando Cavazos-Cadena

Institution:	(1) Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, 25315 Saltillo, COAH, México

Abstract:	We consider a class of Markov decision processes with finite state and action spaces that is essentially determined by the following condition: the state space is irreducible under the action of any stationary policy. Apart from this restriction, however, the transition law is completely unknown to the controller. In this context, we find a set of policies under which the frequency estimators of the transition law are strongly consistent; this result is then applied to construct adaptive asymptotically discount-optimal policies.

Dedicated to Professor Truman O. Lewis, on the occasion of his sixtieth birthday.

This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152, and in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grant A128CCOEO550 (MT-2).
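
The frequency estimators mentioned in the abstract can be illustrated with a minimal sketch. After observing transitions, the estimate of p(y | x, a) is the ratio N(x, a, y) / N(x, a) of transition counts to visit counts. Consider a finite chain that is irreducible under a stationary policy giving positive probability to every action. Then each pair (x, a) is visited infinitely often, and the strong law of large numbers makes these ratios converge almost surely. Several choices below are illustrative assumptions, not the specific class of policies identified in the paper: the uniformly random action rule, the uniform fallback for unvisited pairs, and the names `FrequencyEstimator` and `simulate`.

```python
import random
from collections import defaultdict


class FrequencyEstimator:
    """Frequency (empirical) estimator of the transition law p(y | x, a).

    Hypothetical helper for illustration: p_hat(y | x, a) = N(x, a, y) / N(x, a).
    For a pair (x, a) not visited yet it returns the uniform distribution
    (an assumption made here, not taken from the paper).
    """

    def __init__(self, n_states):
        self.n_states = n_states
        self.pair_counts = defaultdict(int)   # N(x, a): visits to the pair (x, a)
        self.trans_counts = defaultdict(int)  # N(x, a, y): moves x -> y under a

    def update(self, x, a, y):
        """Record one observed transition from x to y under action a."""
        self.pair_counts[(x, a)] += 1
        self.trans_counts[(x, a, y)] += 1

    def prob(self, y, x, a):
        """Current estimate of p(y | x, a)."""
        n = self.pair_counts.get((x, a), 0)
        if n == 0:
            return 1.0 / self.n_states
        return self.trans_counts.get((x, a, y), 0) / n


def simulate(P, steps, seed=0):
    """Drive the chain with uniformly random actions (an illustrative
    stationary randomized policy) and feed each transition to the estimator.

    P[x][a][y] is the true law; it is used only to generate data and is
    never shown to the estimator.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    est = FrequencyEstimator(n_states)
    x = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)
        y = rng.choices(range(n_states), weights=P[x][a])[0]
        est.update(x, a, y)
        x = y
    return est
```

With a law whose entries are all positive, such as `P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.7, 0.3]]]`, the chain is irreducible under every stationary policy. The estimates from `simulate(P, 20000)` then settle close to the true probabilities.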

Keywords:	Finite Markov decision chains; unknown transition law; frequency estimators; asymptotic discount optimality; principle of estimation and control; nonstationary value iteration
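
The keywords "principle of estimation and control" and "nonstationary value iteration" refer to a scheme that can be sketched generically. At each step the controller does three things:

1. It re-estimates the transition law from observed frequencies.
2. It applies one value-iteration step with the current estimate.
3. It acts greedily with respect to the result.

This is a generic sketch under stated assumptions, not the paper's construction. The rewards r(x, a) are taken as known, and beta is the discount factor. The decaying exploration probability 1/sqrt(t+1) and the names `nvi_step` and `adaptive_control` are hypothetical choices made for illustration.

```python
import random


def nvi_step(V, P_hat, r, beta):
    """One nonstationary value-iteration step under the current estimate.

    V[y]           : previous value approximation
    P_hat[x][a][y] : estimated transition law at the current time
    r[x][a]        : one-step reward (assumed known to the controller)
    Returns the updated values and a greedy action for each state.
    """
    n_states, n_actions = len(r), len(r[0])
    new_V, greedy = [], []
    for x in range(n_states):
        q = [r[x][a] + beta * sum(P_hat[x][a][y] * V[y] for y in range(n_states))
             for a in range(n_actions)]
        best = max(range(n_actions), key=lambda a: q[a])
        new_V.append(q[best])
        greedy.append(best)
    return new_V, greedy


def adaptive_control(P, r, beta, steps, seed=0):
    """Hypothetical adaptive controller: estimate, then control.

    P[x][a][y] is the true law; it is used only to simulate the system.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(r), len(r[0])
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    V = [0.0] * n_states
    greedy = [0] * n_states
    x = 0
    for t in range(steps):
        # Frequency estimate of the law; uniform guess for unvisited pairs.
        P_hat = [[[c / sum(row) if sum(row) else 1.0 / n_states for c in row]
                  for row in counts[s]] for s in range(n_states)]
        V, greedy = nvi_step(V, P_hat, r, beta)
        # Decaying forced exploration (an assumption) keeps every pair visited.
        if rng.random() < (t + 1) ** -0.5:
            a = rng.randrange(n_actions)
        else:
            a = greedy[x]
        y = rng.choices(range(n_states), weights=P[x][a])[0]
        counts[x][a][y] += 1
        x = y
    return V, greedy
```

The exploration probability tends to zero, so greedy actions eventually dominate. Its sum nonetheless diverges, the usual heuristic for visiting every state-action pair infinitely often. Whether this particular rule meets the paper's sufficient conditions for asymptotic discount optimality is not claimed here.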
Indexed in SpringerLink and other databases.