1.
We consider the optimal service control of a multiclass M/G/1 queueing system in which customers are served nonpreemptively and the system cost rate is additive across classes and increasing convex in the numbers present in each class. Following Whittle's approach to a class of restless bandit problems, we develop a Lagrangian relaxation of the service control problem which serves to motivate the development of a class of index heuristics. The index for a particular customer class is characterised as a fair charge for service of that class. The paper develops these indices and reports an extensive numerical investigation which exhibits strong performance of the index heuristics for both discounted and average costs.
2.
Damien Lamberton, Stochastic Analysis and Applications, 2013, 31(3): 603-623
In this article we investigate the rate of convergence of the so-called two-armed bandit algorithm. The behavior of the algorithm turns out to be highly nonstandard: there is no central limit theorem, and two different rates of convergence can occur, each with positive probability.
3.
Operations Research Letters, 2021, 49(5): 728-733
We consider a general class of multi-armed bandit (MAB) problems with sub-exponential rewards. This is primarily motivated by service systems with exponential inter-arrival and service distributions. It is well known that the celebrated Upper Confidence Bound (UCB) algorithm achieves a tight regret bound for MAB under sub-Gaussian rewards. Subsequent work by Bubeck et al. (2013) [4] extended this tightness result to any reward distribution with finite variance by leveraging robust mean estimators. In this paper, we present three alternative UCB-based algorithms, termed UCB-Rad, UCB-Warm, and UCB-Hybrid, specifically for MAB with sub-exponential rewards. While not the first to achieve tight regret bounds, these algorithms are conceptually simpler and admit a more explicit analysis for this problem. Moreover, we present a rental-bike revenue management application and conduct numerical experiments. We find that UCB-Warm and UCB-Hybrid outperform UCB-Rad in our computational experiments.
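For orientation, the UCB index underlying this line of work can be sketched as follows. This is an illustrative classical UCB implementation, not the paper's UCB-Rad/Warm/Hybrid variants; the exploration radius below is the standard sub-Gaussian one, which those variants would widen for sub-exponential tails, and the arm means and horizon are made-up values.

```python
import math
import random

def ucb(means, horizon, seed=0):
    """Minimal UCB sketch with exponentially distributed rewards,
    matching the service-system motivation (exponential service times).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    totals = [0.0] * k    # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialise
        else:
            # index = empirical mean + exploration bonus
            arm = max(range(k), key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        # expovariate takes a rate, so rate = 1 / mean
        reward = rng.expovariate(1.0 / means[arm])
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = ucb([1.0, 2.0], horizon=500)
```

With sub-exponential rewards the empirical mean concentrates more slowly in the tail, which is precisely why the paper's algorithms adjust the confidence radius rather than the overall index structure.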
4.
Svend-Holger Friis, Ulrich Rieder, Jürgen Weishaupt, Mathematical Methods of Operations Research, 1993, 37(2): 187-205
A general single-server queueing network model is considered. It is well known that an optimal policy is given by a largest-index policy: there is an index for each queue, and the server is allocated to a queue with the largest current index. Using discounted dynamic programming, we give a new and short proof of this result and derive some characterizations and bounds for the indices. Moreover, it is shown that an approximate largest-index policy yields an approximately optimal policy. These results lead to efficient methods for computing the indices. In particular, we present a general largest-remaining-index method.
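The server-allocation rule described in this abstract can be sketched in a few lines. This is a minimal illustration assuming static index values (e.g. a classical cμ-type index); in the paper the indices may depend on the current state, and the queue names and numbers below are hypothetical.

```python
def largest_index_policy(queues, indices):
    """Serve the nonempty queue with the largest current index.

    `queues` maps queue id -> number of customers waiting;
    `indices` maps queue id -> index value.
    Returns the queue to serve, or None if all queues are empty.
    """
    nonempty = [q for q, n in queues.items() if n > 0]
    if not nonempty:
        return None
    return max(nonempty, key=lambda q: indices[q])

# usage: queue 'b' carries the larger index, so it is served first
assert largest_index_policy({'a': 3, 'b': 1}, {'a': 0.5, 'b': 2.0}) == 'b'
```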
5.
6.
K. D. Glazebrook, Mathematical Methods of Operations Research, 2003, 58(1): 1-28
We introduce a family of undiscounted branching bandits on parallel servers under controls which can impose priorities between customer classes. This family can be used to model a wide range of multi-class queueing scheduling problems, with the capacity to incorporate problem features such as machine breakdowns, complex patterns of preemption/non-preemption and semi-Markov extensions. An index policy (which we call Klimov's rule) is developed which is optimal in the particular case of a single server. An expression for its cost suboptimality is given for parallel servers. Under additional conditions on the nature of the stochastic evolution of the systems concerned, the policy is shown to be asymptotically optimal in a heavy traffic limit. These general results are utilised to develop an analysis of the index policy for a parallel server version of Klimov's classical M/GI/1 system with Bernoulli feedback.
This work was supported by the Engineering and Physical Sciences Research Council through the award of grant GR/M09308. The author would also like to express his appreciation to Professor I. C. Paschalidis for helpful discussions on Klimov's problem and to Professors J. Niño-Mora and G. Weiss for many discussions and much encouragement.
7.
Bandits are a finite collection of random variables. Bandit problems are Markov decision problems in which, at each decision time, the decision maker selects a random variable (referred to as a bandit arm) and observes an outcome. The selection is based on the observation history. The objective is to sequentially choose arms so as to minimize the growth rate (with decision time) of the number of suboptimal selections. The appellation bandit refers to mechanical gambling machines, and the tradition stems from the question of allocating competing treatments to a sequence of patients having the same disease. Our motivation is machine learning, in which a game-playing or assembly-line-adjusting computer is faced with a sequence of statistically similar decision problems and, as a resource, has access to an expanding data base relevant to these problems. The setting for the present study is nonparametric and infinite horizon. The central aim is to relate a methodology which postulates finite moments or, alternatively, bounded bandit arms. Under these circumstances, the strategies proposed are shown to be asymptotically optimal and to converge at guaranteed rates. In the bounded-arm case, the rate is optimal. We extend the theory to the case in which the bandit population is infinite, and share some computational experience.
8.
A two-armed bandit model using a Bayesian approach is formulated and investigated in this paper with the goal of maximizing the value of a certain criterion of optimality. The bandit model illustrates the trade-off between exploration and exploitation, where exploration means acquiring scientific knowledge for better-informed decisions at later stages (i.e., maximizing long-term benefit), and exploitation means applying the current knowledge for the best possible outcome at the current stage (i.e., maximizing the immediate expected payoff). When one arm has known characteristics, stochastic dynamic programming is applied to characterize the optimal strategy and provide the foundation for its calculation. The results show that the celebrated Gittins index can be approximated by a monotonic sequence of break-even values. When both arms are unknown, we derive a special case of optimality of the myopic strategy.
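The break-even characterization mentioned here can be made concrete: the Gittins index of an unknown arm is the known-arm payoff rate at which the decision maker is indifferent between playing and retiring. The sketch below approximates it for a Bernoulli arm with a Beta prior by bisecting on that payoff; the discount factor, horizon truncation, and tolerance are illustrative choices, not taken from the paper.

```python
def gittins_index(a, b, beta=0.9, horizon=40, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with Beta(a, b) prior.

    Compares optimally playing the unknown arm (with the option to retire
    later) against retiring immediately on a known arm paying `lam` per
    step, and bisects on `lam` for the break-even value.
    """
    def value(a, b, n, lam):
        # optimal expected discounted value with n steps remaining
        if n == 0:
            return 0.0
        key = (a, b, n)
        if key in cache:
            return cache[key]
        retire = lam * (1 - beta ** n) / (1 - beta)
        p = a / (a + b)  # posterior mean of the success probability
        play = (p * (1 + beta * value(a + 1, b, n - 1, lam))
                + (1 - p) * beta * value(a, b + 1, n - 1, lam))
        cache[key] = max(retire, play)
        return cache[key]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        cache = {}  # cache is per-lam, so reset each bisection step
        retire_now = lam * (1 - beta ** horizon) / (1 - beta)
        if value(a, b, horizon, lam) > retire_now + 1e-12:
            lo = lam   # playing beats retiring: the index is larger
        else:
            hi = lam
    return (lo + hi) / 2
```

Because the recursion retains the option to retire after bad outcomes, the resulting index exceeds the posterior mean, which is exactly the exploration bonus the abstract's exploration/exploitation trade-off refers to.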
9.
Stochastics: An International Journal of Probability and Stochastic Processes, 2013, 85(3-4): 139-160
We approximate the solution of nonlinear stochastic equations driven by a Gaussian white noise by solutions of similar equations, where the Gaussian noise is replaced by a weighted Poissonian point process.
10.
Sébastien Gadat, Sofiane Saadane, Stochastics: An International Journal of Probability and Stochastic Processes, 2018, 90(6): 886-926
Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s. NS algorithms have been studied in depth in the infinite-horizon setting, but few non-asymptotic results exist for this type of bandit algorithm. In this paper, we focus on a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain uniform explicit bounds for the regret of (over-)penalized NS algorithms. We also extend to the multi-armed case some convergence properties of penalized NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish some new sharp mixing bounds for these processes.
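To fix ideas, a basic two-armed NS-type scheme looks as follows. This is the classical unpenalized linear reward-inaction update, not the penalized/over-penalized variants the paper analyzes, and the constant step size, success probabilities, and horizon are illustrative assumptions.

```python
import random

def narendra_shapiro(p_success, steps, gamma=0.01, seed=0):
    """Two-armed Narendra-Shapiro-type sketch.

    `x` is the probability of playing arm 0.  On a success, the played
    arm's probability is reinforced by a fraction `gamma`; on a failure,
    nothing changes (reward-inaction).  The paper's penalized variants
    also shrink `x` on failures, with a decreasing step size.
    """
    rng = random.Random(seed)
    x = 0.5
    for _ in range(steps):
        arm = 0 if rng.random() < x else 1
        success = rng.random() < p_success[arm]
        if success:  # reinforce the arm that just paid off
            x = x + gamma * (1 - x) if arm == 0 else x - gamma * x
    return x

x = narendra_shapiro([0.8, 0.2], steps=5000)
```

The update keeps `x` strictly inside (0, 1), yet with a constant step the scheme can still concentrate on the wrong arm with positive probability, which is what motivates the penalization studied in the paper.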