Modifications in the discount sequence for bandit processes |
| |
Authors: | Martin L. Jones; Reginald Koo |
| |
Institution: | 1. Mathematics Department, University of Charleston, SC Charleston, SC, 29424; 2. Mathematics Department, University of South Carolina Aiken, Aiken, SC, 29801 |
| |
Abstract: | For discrete-time K-armed bandit processes with discounting, the value V(G,A) represents an observer's optimal expected gain using discount sequence A and prior distribution G on the distributions governing the K arms or processes being observed. In this paper we address the question of which types of modifications in the discount sequence will change the value in a predictable manner. Both positive and negative results are obtained. For example, it is shown that for two-armed bandits with one arm known, permutations of a positive term with a zero term in the discount sequence will change the value in a predictable way, but that this result cannot be extended to general two-armed bandits. We also show that there does not exist a “best information” arm in the sense that if the first term in each of two different discount sequences is zero then under G the same arm will be selected for each discount sequence. |
| |
Keywords: | Bayesian Decision Making; Sequential Allocation of Experiments |
|
|