Maximizing the length of a success run for many-armed bandits
Authors: Donald A. Berry, Bert Fristedt
Institution: Department of Theoretical Statistics, University of Minnesota, Vincent Hall, Minneapolis, MN 55455, U.S.A.; School of Mathematics, University of Minnesota, Vincent Hall, Minneapolis, MN 55455, U.S.A.
Abstract: One of a number of Bernoulli processes is selected at each of a number of stages. A success at stage i is worth α_i, and the problem is to maximize the expected payoff before the first failure. Results of Berry and Viscusi (1981) are generalized. In particular, we show that there is always an optimal strategy that uses a single process exclusively and indefinitely whenever the arms are independent and the discount sequence (α_1, α_2, …) is superregular. There is not always a similar reduction in the number of strategies when the discount sequence is not superregular.
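As a minimal worked example of the payoff structure described in the abstract (assuming, for illustration only, a single Bernoulli process with a known success probability p; in the paper the arms have unknown parameters, so this quantity would be averaged over the prior on p): using that process exclusively, the success at stage i contributes α_i and accrues only if the first i trials all succeed, so the expected payoff is

\[
E_p[\text{payoff}] \;=\; \sum_{i=1}^{\infty} \alpha_i \,\Pr(\text{first } i \text{ trials all succeed}) \;=\; \sum_{i=1}^{\infty} \alpha_i \, p^{\,i}.
\]

The paper's result says that when the arms are independent and the discount sequence (α_1, α_2, …) is superregular, some single-arm strategy of this form is optimal among all sequential strategies.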
Keywords: Many-armed bandits; sequential decisions; gambling with discounting; Bernoulli processes; single-arm strategies; stay-on-a-winner rule
This article is indexed in ScienceDirect and other databases.