

Modifications in the discount sequence for bandit processes
Authors: Martin L. Jones, Reginald Koo
Institution: 1. Mathematics Department, University of Charleston, SC, Charleston, SC 29424; 2. Mathematics Department, University of South Carolina Aiken, Aiken, SC 29801
Abstract: For discrete-time K-armed bandit processes with discounting, the value V(G,A) represents an observer's optimal expected gain using discount sequence A and prior distribution G on the distributions governing the K arms or processes being observed. In this paper we address the question of which types of modifications in the discount sequence will change the value in a predictable manner. Both positive and negative results are obtained. For example, it is shown that for two-armed bandits with one arm known, permutations of a positive term with a zero term in the discount sequence will change the value in a predictable way, but that this result cannot be extended to general two-armed bandits. We also show that there does not exist a “best information” arm in the sense that if the first term in each of two different discount sequences is zero then under G the same arm will be selected for each discount sequence.
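As an illustration of the quantity V(G,A), the following is a minimal sketch and not the authors' method. It assumes a special case that the abstract does not specify: a finite discount sequence, one unknown Bernoulli arm whose success probability has a Beta(a, b) prior with integer parameters, and one known Bernoulli arm with success probability lam. The function name `bandit_value` and all parameter names are hypothetical. The value is computed by backward induction over the posterior counts.

```python
from fractions import Fraction
from functools import lru_cache


def bandit_value(discount, a, b, lam):
    """Optimal expected discounted gain for a two-armed Bernoulli bandit
    with one arm known (illustrative assumptions, see the text above).

    discount : finite discount sequence (alpha_1, ..., alpha_n)
    a, b     : positive integer Beta prior parameters for the unknown arm
    lam      : known success probability of the known arm
    """
    discount = tuple(Fraction(x) for x in discount)
    lam = Fraction(lam)
    n = len(discount)

    @lru_cache(maxsize=None)
    def v(s, f, i):
        # s, f: current Beta posterior parameters; i: index of the next stage.
        if i == n:
            return Fraction(0)
        p = Fraction(s, s + f)  # posterior mean of the unknown arm
        # Pull the unknown arm: collect alpha_i * p in expectation, then update.
        pull_unknown = (discount[i] * p
                        + p * v(s + 1, f, i + 1)
                        + (1 - p) * v(s, f + 1, i + 1))
        # Pull the known arm: collect alpha_i * lam, posterior is unchanged.
        pull_known = discount[i] * lam + v(s, f, i + 1)
        return max(pull_unknown, pull_known)

    return v(a, b, 0)


half = Fraction(1, 2)
print(bandit_value((1, 0), 1, 1, half))  # 1/2
print(bandit_value((0, 1), 1, 1, half))  # 7/12
```

In this toy instance (uniform prior, known arm at 1/2), swapping the positive term with the zero term changes the value from 1/2 to 7/12: a zero first term makes the first observation free, so the information it yields can be used at the second stage. This is a single numerical example under the stated assumptions, not a statement of the paper's theorems.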
Keywords: Bayesian Decision Making; Sequential Allocation of Experiments
