A Bayesian two‐armed bandit model
Authors: Xikui Wang, You Liang, Lysa Porth
Abstract: A two‐armed bandit model using a Bayesian approach is formulated and investigated in this paper, with the goal of maximizing a given criterion of optimality. The bandit model illustrates the trade‐off between exploration and exploitation: exploration means acquiring scientific knowledge for better‐informed decisions at later stages (i.e., maximizing long‐term benefit), while exploitation means applying current knowledge for the best possible outcome at the current stage (i.e., maximizing the immediate expected payoff). When one arm has known characteristics, stochastic dynamic programming is applied to characterize the optimal strategy and provide the foundation for its calculation. The results show that the celebrated Gittins index can be approximated by a monotone sequence of break‐even values. When both arms are unknown, we establish the optimality of the myopic strategy in a special case.
Keywords: bandit processes; Bayesian method; Gittins index; Markov decision processes; optimal strategy
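The abstract does not spell out the model's specifics, so the following is only a minimal illustrative sketch: it assumes Bernoulli arms, a conjugate Beta(a, b) prior on the unknown arm, a finite horizon, and no discounting (all assumptions of this sketch, not details taken from the paper). It computes the value function by backward induction, the stochastic dynamic programming step mentioned in the abstract, and then locates the break‐even payoff rate of the known arm by bisection; run over increasing horizons, these break‐even values form a monotone sequence of the kind the abstract describes as approximating the Gittins index.

```python
from functools import lru_cache

def value(a, b, n, lam):
    """Optimal expected total reward over n remaining pulls, with a
    Beta(a, b) posterior on the unknown Bernoulli arm and a known arm
    paying lam in expectation per pull (backward induction)."""
    @lru_cache(maxsize=None)
    def V(a, b, n):
        if n == 0:
            return 0.0
        p = a / (a + b)  # posterior mean of the unknown arm
        # The known arm yields no information, so the state (a, b) is
        # unchanged; the unknown arm updates the posterior to
        # Beta(a+1, b) on success or Beta(a, b+1) on failure.
        pull_known = lam + V(a, b, n - 1)
        pull_unknown = (p * (1.0 + V(a + 1, b, n - 1))
                        + (1.0 - p) * V(a, b + 1, n - 1))
        return max(pull_known, pull_unknown)
    return V(a, b, n)

def break_even(a, b, n, tol=1e-6):
    """Known-arm rate lam* at which the decision maker is indifferent
    between exploring the unknown arm first and retiring to the known
    arm for all n pulls (worth n * lam). Found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        p = a / (a + b)
        pull_unknown = (p * (1.0 + value(a + 1, b, n - 1, lam))
                        + (1.0 - p) * value(a, b + 1, n - 1, lam))
        if pull_unknown > n * lam:
            lo = lam  # exploring first is still better: raise the bar
        else:
            hi = lam
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Break-even values under a uniform Beta(1, 1) prior: the sequence
    # increases with the horizon (0.5 at n = 1, then upward),
    # illustrating the monotone approximation described in the abstract.
    for n in (1, 2, 5, 10, 20):
        print(f"horizon {n:2d}: break-even value {break_even(1, 1, n):.4f}")
```

For the two-unknown-arm setting, the myopic strategy mentioned in the abstract simply pulls the arm with the higher posterior mean at each stage; per the abstract, its optimality is established only in a special case, not in general.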