On the Bernoulli two-armed bandit problem
Abstract: The paper first studies monotonicity properties of the posterior success probabilities when the prior success probabilities are distributed according to an arbitrary joint distribution function (Bayesian approach). Next, a dynamic programming model is proposed, and monotonicity properties of the optimal expected cumulative discounted reward are proved. Finally, optimality properties are given for the case in which one of the success probabilities is known.
Keywords:
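
The dynamic programming recursion mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes a Beta(a, b) prior on the unknown arm (the paper allows an arbitrary joint prior), treats the second arm's success probability `known_p` as known, and truncates the problem to a finite number of pulls. The function name `optimal_value` and all parameter names are hypothetical.

```python
from functools import lru_cache


def optimal_value(a, b, known_p, discount, horizon):
    """Optimal expected discounted reward with `horizon` pulls remaining.

    Assumptions (not from the paper): the unknown arm has a Beta(a, b)
    posterior with integer a, b >= 1; the other arm succeeds with the
    known probability `known_p`; the horizon is finite.
    """

    @lru_cache(maxsize=None)
    def value(a, b, n):
        if n == 0:
            return 0.0
        # Posterior mean of the unknown arm's success probability.
        mean = a / (a + b)
        # Pull the unknown arm: a success or failure updates the posterior.
        pull_unknown = (
            mean * (1.0 + discount * value(a + 1, b, n - 1))
            + (1.0 - mean) * discount * value(a, b + 1, n - 1)
        )
        # Pull the known arm: the posterior on the unknown arm is unchanged.
        pull_known = known_p + discount * value(a, b, n - 1)
        return max(pull_unknown, pull_known)

    return value(a, b, horizon)
```

Under these assumptions, calling `optimal_value(3, 1, 0.5, 0.9, 8)` and `optimal_value(1, 3, 0.5, 0.9, 8)` lets one compare the value after more observed successes with the value after more observed failures, which is the kind of monotonicity the abstract refers to.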