Maximizing the length of a success run for many-armed bandits

Authors:	Donald A. Berry, Bert Fristedt

Institution:	Department of Theoretical Statistics, University of Minnesota, Vincent Hall, Minneapolis, MN 55455, U.S.A.; School of Mathematics, University of Minnesota, Vincent Hall, Minneapolis, MN 55455, U.S.A.

Abstract:	One of a number of Bernoulli processes is selected at each of a number of stages. A success at stage i is worth α_i and the problem is to maximize the expected payoff before the first failure. Results of Berry and Viscusi (1981) are generalized. In particular, we show that there is always an optimal strategy that uses a single process exclusively and indefinitely whenever the arms are independent and the discount sequence (α₁, α₂,…) is superregular. There is not always a similar reduction in the number of strategies when the discount sequence is not superregular.

Keywords:	Many-armed bandits; sequential decisions; gambling with discounting; Bernoulli processes; single-arm strategies; stay-on-a-winner rule
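
The payoff described in the abstract can be made concrete for the simplest kind of strategy, one that uses a single process at every stage. If that process succeeds with probability p, the first i stages all succeed with probability p^i, so the expected payoff before the first failure is the sum of α_i · E[p^i] over the stages. The sketch below is only an illustration: the Beta(a, b) prior on p, the truncation to a finite list of α values, and the function name `single_arm_expected_payoff` are assumptions made here, not taken from the paper.

```python
from fractions import Fraction


def single_arm_expected_payoff(alphas, a, b):
    """Expected payoff before the first failure when one arm is used throughout.

    alphas: finite list of stage values (alpha_1, alpha_2, ...).
    a, b:   positive integer parameters of an ASSUMED Beta(a, b) prior on the
            arm's unknown success probability (not specified in the abstract).
    """
    total = Fraction(0)
    prob_all_success = Fraction(1)  # P(no failure so far), before any pull
    for i, alpha in enumerate(alphas):
        # After i straight successes, the predictive success probability
        # under a Beta(a, b) prior is (a + i) / (a + b + i).
        prob_all_success *= Fraction(a + i, a + b + i)
        # alpha is collected only if this stage and all earlier ones succeed.
        total += alpha * prob_all_success
    return total


if __name__ == "__main__":
    # Uniform prior, three equally weighted stages: 1/2 + 1/3 + 1/4.
    print(single_arm_expected_payoff([1, 1, 1], 1, 1))  # 13/12
```

Evaluating this quantity for each arm and taking the largest gives the best single-arm strategy under these assumptions. According to the abstract, restricting attention to such strategies loses nothing when the arms are independent and the discount sequence is superregular.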
This article is indexed in ScienceDirect and other databases.