One-armed bandit models with continuous and delayed responses

Authors:	Xikui Wang, Mikelis G. Bickis

Institution:	(1) Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada, R3T 2N2;(2) Mathematical Sciences Group, Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, S7N 5E6

Abstract:	One-armed bandit processes with continuous delayed responses are formulated as controlled stochastic processes following the Bayesian approach. It is shown that, under certain regularity conditions, a Gittins-like index exists which is the limit of a monotonic sequence of break-even values characterizing optimal initial selections of arms for finite-horizon bandit processes. Furthermore, there is an optimal stopping solution when all observations on the unknown arm are complete. Results are illustrated with a bandit model having exponentially distributed responses, in which case the controlled stochastic process becomes a Markov decision process, the Gittins-like index is the Gittins index, and the Gittins index strategy is optimal.

Acknowledgement:	We thank an anonymous referee for constructive and insightful comments, especially those related to the notion of the Gittins index. Both authors are funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Keywords:	Bandit processes; Controlled stochastic processes; Delayed responses; Gittins index; Markov decision processes
This article is indexed in SpringerLink and other databases.
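The abstract characterizes the Gittins-like index as the limit of a monotonic sequence of finite-horizon break-even values. The minimal sketch below illustrates only that break-even idea, in a deliberately simpler setting than the paper's: it assumes Bernoulli (not continuous) responses with a Beta prior on the unknown arm, immediate (not delayed) responses, geometric discounting with a hypothetical factor `BETA = 0.9`, and a known arm paying a constant `lam` per pull. It also uses the standard one-armed bandit property that once the known arm is chosen, it stays optimal. The function names are hypothetical, and this is not the authors' method or their exponential-response model.

```python
from functools import lru_cache

BETA = 0.9  # hypothetical discount factor


def annuity(n: int, beta: float = BETA) -> float:
    """Discounted number of pulls in n stages: 1 + beta + ... + beta**(n - 1)."""
    return (1.0 - beta**n) / (1.0 - beta)


def unknown_first(n: int, a: float, b: float, lam: float, beta: float = BETA) -> float:
    """Expected discounted reward over n stages when the unknown arm is pulled
    first and play is optimal afterwards. The unknown arm pays 1 on success,
    with a Beta(a, b) prior on its success probability; the known arm pays lam."""

    @lru_cache(maxsize=None)
    def v(k: int, a: float, b: float) -> float:
        # Optimal value with k stages left: either switch to the known arm for
        # the rest of the horizon, or pull the unknown arm once more.
        if k == 0:
            return 0.0
        return max(lam * annuity(k, beta), pull(k, a, b))

    def pull(k: int, a: float, b: float) -> float:
        p = a / (a + b)  # posterior predictive probability of a success
        # Immediate expected reward plus the discounted value of the updated posterior.
        return p + beta * (p * v(k - 1, a + 1, b) + (1 - p) * v(k - 1, a, b + 1))

    return pull(n, a, b)


def break_even(n: int, a: float = 1.0, b: float = 1.0,
               beta: float = BETA, tol: float = 1e-10) -> float:
    """Known-arm payoff lam at which pulling either arm first is equally good
    for horizon n, found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if unknown_first(n, a, b, mid, beta) > mid * annuity(n, beta):
            lo = mid  # unknown arm still strictly better: break-even lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(break_even(1))  # ≈ 0.5, the prior mean a / (a + b)
    print(break_even(2))  # ≈ 0.5517, i.e. 0.8 / 1.45
    # The finite-horizon break-even values should be non-decreasing in n.
    for n in (5, 10, 20, 40):
        print(n, round(break_even(n), 4))
```

Bisection is valid here because the gap between unknown-first and known-first values is strictly decreasing in `lam`, positive at `lam = 0` and negative at `lam = 1`. In this simplified model the limit of `break_even(n)` as `n` grows plays the role of the (Gittins) index for the prior `Beta(a, b)`.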