On the Bernoulli two-armed bandit problem
Abstract: The paper first considers monotonicity properties of the posterior success probabilities when the prior success probabilities are distributed according to an arbitrary joint distribution function (Bayesian approach). Next, a dynamic programming model is proposed, and monotonicity properties of the optimal expected cumulative discounted reward are proved. Finally, optimality properties are given for the case in which one prior success probability is known.