Maximizing the length of a success run for many-armed bandits |
| |
Authors: | Donald A. Berry; Bert Fristedt |
| |
Institution: | Department of Theoretical Statistics, University of Minnesota, Vincent Hall, Minneapolis, MN 55455, U.S.A.; School of Mathematics, University of Minnesota, Vincent Hall, Minneapolis, MN 55455, U.S.A. |
| |
Abstract: | One of a number of Bernoulli processes is selected at each of a number of stages. A success at stage i is worth α_i and the problem is to maximize the expected payoff before the first failure. Results of Berry and Viscusi (1981) are generalized. In particular, we show that there is always an optimal strategy that uses a single process exclusively and indefinitely whenever the arms are independent and the discount sequence (α_1, α_2, …) is superregular. There is not always a similar reduction in the number of strategies when the discount sequence is not superregular. |
| |
Keywords: | Many-armed bandits; sequential decisions; gambling with discounting; Bernoulli processes; single-arm strategies; stay-on-a-winner rule |
This article is indexed in ScienceDirect and other databases. |
|
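As an illustration of the objective described in the abstract, the expected payoff before the first failure of a single-arm strategy can be computed directly: if one arm with unknown success probability θ is used at every stage, the payoff α_i is collected only when the first i pulls all succeed, so the expected payoff is the sum of α_i E[θ^i]. The sketch below is not taken from the paper. It assumes a Beta(a, b) prior on θ with positive integer parameters, and the function name `single_arm_payoff` is hypothetical. The discount sequence passed in is arbitrary, and the code does not check whether it is superregular.

```python
from fractions import Fraction


def single_arm_payoff(alphas, a, b):
    """Expected payoff before the first failure when one arm is used at every stage.

    Assumptions (not from the source): theta ~ Beta(a, b) with positive
    integer a and b, and `alphas` is a finite (possibly truncated) discount
    sequence (alpha_1, alpha_2, ...) given as ints or Fractions.
    """
    total = Fraction(0)
    run_prob = Fraction(1)  # probability that all pulls so far were successes
    for k, alpha in enumerate(alphas):
        # For a Beta(a, b) prior, E[theta^(k+1)] = E[theta^k] * (a + k) / (a + b + k).
        run_prob *= Fraction(a + k, a + b + k)
        # alpha_(k+1) is earned only if the first k+1 pulls all succeed.
        total += Fraction(alpha) * run_prob
    return total


if __name__ == "__main__":
    # Uniform prior (a = b = 1) and three undiscounted stages.
    print(single_arm_payoff([1, 1, 1], 1, 1))
```

With a uniform prior, E[θ^i] = 1/(i + 1), so the three-stage undiscounted example sums 1/2 + 1/3 + 1/4. Exact rational arithmetic is used so that payoffs of different arms can be compared without rounding error.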