Risk-sensitive reinforcement learning algorithms with generalized average criterion
Cite this article:Chang-ming Yin, Wang Han-xing, Zhao Fei. Risk-sensitive reinforcement learning algorithms with generalized average criterion[J]. Applied Mathematics and Mechanics (English Edition), 2007, 28(3): 405-405. DOI: 10.1007/s10483-007-0313-x
Authors:Chang-ming Yin  Wang Han-xing  Zhao Fei
Affiliation:1. College of Computer and Communicational Engineering, Changsha University of Science and Technology, Changsha 410076, P.R. China; College of Sciences, Shanghai University, Shanghai 200444, P.R. China
2. College of Sciences, Shanghai University, Shanghai 200444, P.R. China
Abstract:A new algorithm is proposed that potentially sacrifices the optimality of control policies in order to obtain robust solutions. Robustness can become a very important property for a learning system when the theoretical model does not match the practical physical system, when the practical system is not static, or when the availability of a control action changes over time. The main contribution is a set of approximation algorithms together with their convergence results. A generalized average operator, used in place of the usual optimality operator max (or min), is applied to an important class of learning algorithms, dynamic programming algorithms, and their convergence is discussed from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically.
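The abstract does not define the generalized average operator, so the following is only an illustrative sketch and not the authors' algorithm. It assumes one common choice, the log-mean-exp average M_beta(q) = (1/beta) log(mean(exp(beta q))), which tends to max as beta grows and reduces to the arithmetic mean at beta = 0, and it substitutes this operator for max in ordinary value iteration. The names `generalized_average`, `value_iteration`, `P`, `R`, `gamma`, and `beta` are hypothetical and chosen for illustration.

```python
import math


def generalized_average(values, beta):
    """Log-mean-exp average (an assumed form of 'generalized average').

    beta = 0 gives the arithmetic mean; large positive beta approaches max,
    large negative beta approaches min.
    """
    if beta == 0.0:
        return sum(values) / len(values)
    # Shift by the dominant value so every exponent is <= 0 (numerical stability).
    m = max(values) if beta > 0 else min(values)
    s = sum(math.exp(beta * (v - m)) for v in values) / len(values)
    return m + math.log(s) / beta


def value_iteration(P, R, gamma, beta, tol=1e-10, max_iter=10_000):
    """Value iteration with max over actions replaced by generalized_average.

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = []
        for s in range(len(P)):
            # Action values under the current value estimate.
            q = [
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in range(len(P[s]))
            ]
            V_new.append(generalized_average(q, beta))
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


if __name__ == "__main__":
    # Hypothetical two-state, two-action deterministic MDP.
    P = [[[(0, 1.0)], [(1, 1.0)]], [[(0, 1.0)], [(1, 1.0)]]]
    R = [[0.0, 1.0], [0.0, 2.0]]
    for beta in (0.0, 1.0, 200.0):
        print(beta, value_iteration(P, R, gamma=0.9, beta=beta))
```

Because the log-mean-exp average is monotone and commutes with adding a constant, it is a non-expansion in the sup norm, so with a discount factor below 1 the iteration in this sketch is still a contraction. Smaller values of `beta` give lower weight to the single best action, which is one way to read the trade-off between optimality and robustness that the abstract describes.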

Keywords:reinforcement learning algorithm  risk-sensitive  generalized average criterion  convergence
Received:2006-03-21
Revised:2006-12-07

This article is indexed in CNKI, VIP, Wanfang Data, SpringerLink, and other databases.