Adaptive control of discounted Markov decision chains |
| |
Authors: | O. Hernández-Lerma, S. I. Marcus |
| |
Affiliation: | (1) Departamento de Matemáticas, Centro de Investigación del IPN, Mexico City, DF, Mexico; (2) Department of Electrical Engineering, University of Texas at Austin, Austin, Texas |
| |
Abstract: | In this paper, we consider discounted-reward finite-state Markov decision processes which depend on unknown parameters. An adaptive policy inspired by the nonstationary value iteration scheme of Federgruen and Schweitzer (Ref. 1) is proposed. This policy is briefly compared with the principle of estimation and control recently obtained by Schäl (Ref. 4). This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grant No. PCCBBNA-005008, in part by a grant from the IBM Corporation, in part by the Air Force Office of Scientific Research under Grant No. AFOSR-79-0025, in part by the National Science Foundation under Grant No. ECS-0822033, and in part by the Joint Services Electronics Program under Contract No. F49620-77-C-0101. |
| |
Keywords: | Discounted Markov decision processes with unknown parameters; nonstationary value iteration; parameter estimation; adaptive control; naïve feedback controller |
This article is indexed in SpringerLink and other databases. |
|