Modifications in the discount sequence for bandit processes

Authors:	Martin L. Jones; Reginald Koo

Institution:	1. Mathematics Department, University of Charleston, SC, Charleston, SC 29424; 2. Mathematics Department, University of South Carolina Aiken, Aiken, SC 29801

Abstract:	For discrete-time K-armed bandit processes with discounting, the value V(G,A) represents an observer's optimal expected gain using discount sequence A and prior distribution G on the distributions governing the K arms or processes being observed. In this paper we address the question of which types of modifications in the discount sequence will change the value in a predictable manner. Both positive and negative results are obtained. For example, it is shown that for two-armed bandits with one arm known, permutations of a positive term with a zero term in the discount sequence will change the value in a predictable way, but that this result cannot be extended to general two-armed bandits. We also show that there does not exist a “best information” arm in the sense that if the first term in each of two different discount sequences is zero, then under G the same arm will be selected for each discount sequence.

Keywords:	Bayesian decision making; sequential allocation of experiments