One-armed bandit models with continuous and delayed responses |
| |
Authors: | Xikui Wang, Mikelis G. Bickis |
| |
Institution: | (1) Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada, R3T 2N2; (2) Mathematical Sciences Group, Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, S7N 5E6 |
| |
Abstract: | One-armed bandit processes with continuous delayed responses are formulated as controlled stochastic processes following the Bayesian approach. It is shown that under some regularity conditions, a Gittins-like index exists which is the limit of a monotonic sequence of break-even values characterizing optimal initial selections of arms for finite horizon bandit processes. Furthermore, there is an optimal stopping solution when all observations on the unknown arm are complete. Results are illustrated with a bandit model having exponentially distributed responses, in which case the controlled stochastic process becomes a Markov decision process, the Gittins-like index is the Gittins index and the Gittins index strategy is optimal.
Acknowledgement. We thank an anonymous referee for constructive and insightful comments, especially those related to the notion of the Gittins index. Both authors are funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada. |
| |
Keywords: | Bandit processes; Controlled stochastic processes; Delayed responses; Gittins index; Markov decision processes |
This article is indexed in SpringerLink and other databases. |
|