Non-stationary MDP Average Model - The Existence of Persistently Optimal (G, B)-Generated Policies


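Cite this article:	Guo Xianping. Non-stationary MDP Average Model - The Existence of Persistently Optimal (G, B)-Generated Policies [J]. Acta Mathematica Sinica, 2000, 43(2): 269-274.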

Author:	Guo Xianping

Affiliation:	Department of Mathematics, Zhongshan University, Guangzhou, Guangdong 510275

Funding:	Supported by the National Natural Science Foundation of China and the Natural Science Foundation of Guangdong Province

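Abstract:	This paper considers the non-stationary MDP average model with a countable state space and an arbitrary action space. Drawing on the ideas of Feinberg E. A. (1994), it introduces the concept of (G, B)-generated policies, which is more general than both Markov policies and the (f, B)-generated policies of Feinberg E. A. Under a weak ergodicity condition, the existence of persistently optimal (G, B)-generated policies is proved by probabilistic analysis. This extends the main results of Feinberg E. A. (1994) to the non-stationary case with a countable state space.
Keywords:	Markov decision programming (MDP); non-stationary; average criterion; (G, B)-generated policy
Article ID:	0583-1431(2000)02-0269-06
Revised:	November 11, 1997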
Non-stationary MDP Average Model - The Existence of Persistently Optimal (G, B)-Generated Policies

GUO Xian-ping. Non-stationary MDP Average Model - The Existence of Persistently Optimal (G, B)-Generated Policies [J]. Acta Mathematica Sinica, 2000, 43(2): 269-274.

Authors:	GUO Xian-ping

Institution:	GUO Xian-ping (Department of Mathematics, Zhongshan University, Guangzhou 510275, P. R. China)

Abstract:	In this paper, we consider the non-stationary MDP average model with countable state space and arbitrary action space. Drawing on the (f, B)-generated policies of Feinberg E. A., we put forward (G, B)-generated policies, which generalize both Markov policies and the (f, B)-generated policies of Feinberg E. A. Using probabilistic and analytic methods, we prove the existence of persistently optimal (G, B)-generated policies under weak ergodicity conditions.

Keywords:	Markov decision programming (MDP); Non-stationary; Persistent optimality; (G, B)-generated policy
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.