

One-armed bandit models with continuous and delayed responses
Authors: Xikui Wang, Mikelis G. Bickis
Institution: (1) Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada, R3T 2N2; (2) Mathematical Sciences Group, Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, S7N 5E6
Abstract: One-armed bandit processes with continuous delayed responses are formulated as controlled stochastic processes following the Bayesian approach. It is shown that, under some regularity conditions, a Gittins-like index exists which is the limit of a monotonic sequence of break-even values characterizing optimal initial selections of arms for finite horizon bandit processes. Furthermore, there is an optimal stopping solution when all observations on the unknown arm are complete. Results are illustrated with a bandit model having exponentially distributed responses, in which case the controlled stochastic process becomes a Markov decision process, the Gittins-like index is the Gittins index, and the Gittins index strategy is optimal.
Acknowledgement: We thank an anonymous referee for constructive and insightful comments, especially those related to the notion of the Gittins index. Both authors are funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada.
Keywords: Bandit processes; Controlled stochastic processes; Delayed responses; Gittins index; Markov decision processes
This article is indexed in SpringerLink and other databases.