Full text (paid access): 26
Free: 0
By subject: Chemistry 1 · Mathematics 24 · Physics 1
By year: 2022 (1) · 2021 (2) · 2019 (1) · 2018 (1) · 2017 (1) · 2013 (4) · 2012 (1) · 2011 (1) · 2010 (1) · 2005 (3) · 2003 (3) · 2002 (1) · 2001 (1) · 1997 (1) · 1993 (1) · 1991 (1) · 1982 (1) · 1978 (1)
Sort order: 26 results in total; search time 15 ms
1.
Glazebrook, K.D., Lumley, R.R., Ansell, P.S. Queueing Systems, 2003, 45(2): 81–111
We consider the optimal service control of a multiclass M/G/1 queueing system in which customers are served nonpreemptively and the system cost rate is additive across classes and increasing convex in the number of customers present in each class. Following Whittle's approach to a class of restless bandit problems, we develop a Lagrangian relaxation of the service control problem which serves to motivate the development of a class of index heuristics. The index for a particular customer class is characterised as a fair charge for service of that class. The paper develops these indices and reports an extensive numerical investigation which exhibits strong performance of the index heuristics for both discounted and average costs.
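For intuition, a minimal Python sketch of the simplest member of this index family: with linear (rather than convex) holding costs, a fair-charge index of this kind reduces to the classical cμ rule, which serves the nonempty class whose cost-rate × service-rate product is largest. The `classes` structure and all names are illustrative assumptions, not the paper's notation; the paper's indices for convex costs are state-dependent generalisations.

```python
def c_mu_rule(classes):
    """Serve the nonempty class with the largest c * mu product.

    classes: illustrative mapping id -> (c, mu, n) with holding
    cost rate c, service rate mu, and n customers present.
    """
    candidates = [(c * mu, k) for k, (c, mu, n) in classes.items() if n > 0]
    if not candidates:
        return None  # no customers anywhere: the server idles
    return max(candidates, key=lambda t: t[0])[1]

# Example: class "b" wins, since 2.0 * 1.5 > 1.0 * 2.0.
print(c_mu_rule({"a": (1.0, 2.0, 3), "b": (2.0, 1.5, 1)}))
```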
2.
In this article we investigate the rate of convergence of the so-called two-armed bandit algorithm. The behavior of the algorithm turns out to be highly nonstandard: there is no central limit theorem, and two different rates of convergence can each occur with positive probability.
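For reference, a minimal sketch of the classical two-armed bandit algorithm in the reward-inaction form usually studied under this name: the probability of playing the first arm is reinforced when that play pays off and left untouched on failure. The step-size choice and all names below are illustrative assumptions, not taken from the article.

```python
import random

def two_armed_bandit(p_a, p_b, steps, gamma=lambda n: 1.0 / n):
    """Reward-inaction two-armed bandit algorithm (a sketch).

    x is the probability of playing arm A; a success on the played
    arm moves x toward that arm, a failure changes nothing.
    p_a, p_b: unknown success probabilities of the two arms.
    gamma: step-size sequence (1/n is an illustrative choice).
    """
    x = 0.5  # start with no preference between the arms
    for n in range(1, steps + 1):
        g = gamma(n)
        if random.random() < x:           # play arm A
            if random.random() < p_a:     # success: reinforce A
                x += g * (1.0 - x)
        else:                             # play arm B
            if random.random() < p_b:     # success: reinforce B
                x -= g * x
    return x  # close to 1 when arm A is the better arm

print(two_armed_bandit(0.7, 0.4, 100_000))
```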
3.
We consider a general class of multi-armed bandit (MAB) problems with sub-exponential rewards, motivated primarily by service systems with exponential inter-arrival and service distributions. It is well known that the celebrated Upper Confidence Bound (UCB) algorithm achieves a tight regret bound for MAB under sub-Gaussian rewards, and subsequent work by Bubeck et al. (2013) [4] extended this tightness result to any reward distribution with finite variance by leveraging robust mean estimators. In this paper, we present three alternative UCB-based algorithms, termed UCB-Rad, UCB-Warm, and UCB-Hybrid, specifically for MAB with sub-exponential rewards. While not the first to achieve tight regret bounds, these algorithms are conceptually simpler and admit a more explicit analysis for this problem. Moreover, we present a rental-bike revenue management application and conduct numerical experiments, in which UCB-Warm and UCB-Hybrid outperform UCB-Rad.
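For orientation, a minimal sketch of the classical UCB index policy that such variants modify; the sub-exponential algorithms in the paper differ mainly in how the confidence radius is built. The bonus scale `c`, the `pull` interface, and the demo arms are illustrative assumptions, not the paper's UCB-Rad/-Warm/-Hybrid.

```python
import math
import random

def ucb1(pull, n_arms, horizon, c=2.0):
    """Classic UCB: play the arm maximising empirical mean plus a
    confidence radius. pull(arm) returns one stochastic reward."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for a in range(n_arms):          # play each arm once to start
        counts[a], means[a] = 1, pull(a)
    for t in range(n_arms, horizon):
        # The radius below is the sub-Gaussian choice; the paper's
        # variants replace it with sub-exponential-aware radii.
        idx = [means[a] + math.sqrt(c * math.log(t + 1) / counts[a])
               for a in range(n_arms)]
        a = max(range(n_arms), key=lambda i: idx[i])
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return means, counts

arms = [0.3, 0.5, 0.7]  # illustrative Bernoulli success rates
_, counts = ucb1(lambda a: float(random.random() < arms[a]), 3, 10_000)
print(counts)  # the 0.7 arm should dominate the pull counts
```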
4.
A general single-server queueing network model is considered. It is well known that an optimal policy is given by the largest-index policy: each queue carries an index, and the server is allocated to a queue with the largest current index. Using discounted dynamic programming we give a new and short proof of this result and derive some characterizations and bounds for the indices. Moreover, it is shown that an approximate largest-index policy yields an approximately optimal policy. These results lead to efficient methods for computing the indices; in particular, we present a general largest-remaining-index method.
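The decision rule itself is simple to state in code; the hard part, which the abstract addresses, is computing the indices. A minimal sketch, with `index_fn` standing in for whatever index the model supplies (an illustrative placeholder, not the paper's construction):

```python
def largest_index_policy(queues, index_fn):
    """Serve a nonempty queue with the largest current index.

    queues: illustrative mapping id -> state dict with a "length";
    index_fn(queue_id, state): the model-supplied index.
    """
    nonempty = [q for q, s in queues.items() if s["length"] > 0]
    if not nonempty:
        return None  # system empty: the server idles
    return max(nonempty, key=lambda q: index_fn(q, queues[q]))
```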
5.
6.
We introduce a family of undiscounted branching bandits on parallel servers under controls which can impose priorities between customer classes. This family can be used to model a wide range of multi-class queueing scheduling problems, with the capacity to incorporate problem features such as machine breakdowns, complex patterns of preemption/non-preemption, and semi-Markov extensions. An index policy (which we call Klimov's rule) is developed which is optimal in the particular case of a single server, and an expression for its cost suboptimality is given for parallel servers. Under additional conditions on the nature of the stochastic evolution of the systems concerned, the policy is shown to be asymptotically optimal in a heavy-traffic limit. These general results are utilised to develop an analysis of the index policy for a parallel-server version of Klimov's classical M/GI/1 system with Bernoulli feedback. This work was supported by the Engineering and Physical Sciences Research Council through the award of grant GR/M09308. The author would also like to express his appreciation to Professor I. C. Paschalidis for helpful discussions on Klimov's problem and to Professors J. Niño-Mora and G. Weiss for many discussions and much encouragement.
7.
Bandits are a finite collection of random variables. Bandit problems are Markov decision problems in which, at each decision time, the decision maker selects a random variable (referred to as a bandit arm) and observes an outcome. The selection is based on the observation history. The objective is to sequentially choose arms so as to minimize the growth rate (with decision time) of the number of suboptimal selections.

The appellation "bandit" refers to mechanical gambling machines, and the tradition stems from the question of allocating competing treatments to a sequence of patients having the same disease. Our motivation is machine learning, in which a game-playing or assembly-line-adjusting computer is faced with a sequence of statistically similar decision problems and, as a resource, has access to an expanding database relevant to these problems.

The setting for the present study is nonparametric and infinite-horizon. The central aim is to present a methodology which postulates finite moments or, alternatively, bounded bandit arms. Under these circumstances, the strategies proposed are shown to be asymptotically optimal and to converge at guaranteed rates; in the bounded-arm case, the rate is optimal.

We extend the theory to the case in which the bandit population is infinite, and share some computational experience.
8.
A two-armed bandit model using a Bayesian approach is formulated and investigated in this paper with the goal of maximizing the value of a certain criterion of optimality. The bandit model illustrates the trade-off between exploration and exploitation, where exploration means acquiring scientific knowledge for better-informed decisions at later stages (i.e., maximizing long-term benefit), and exploitation means applying the current knowledge for the best possible outcome at the current stage (i.e., maximizing the immediate expected payoff). When one arm has known characteristics, stochastic dynamic programming is applied to characterize the optimal strategy and provide the foundation for its calculation. The results show that the celebrated Gittins index can be approximated by a monotonic sequence of break-even values. When both arms are unknown, we derive a special case of optimality of the myopic strategy.
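The break-even characterisation suggests a direct numerical scheme: calibrate against a known arm and bisect on the payoff at which retiring and exploring are equally attractive. Below is a minimal sketch for a Bernoulli arm with a Beta posterior; the discount factor, horizon truncation, and tolerance are illustrative assumptions, not values from the paper.

```python
def gittins_break_even(a0, b0, beta=0.9, depth=50, tol=1e-4):
    """Approximate the Gittins index of a Bernoulli arm with a
    Beta(a0, b0) posterior by a break-even value: the payoff lam of
    a known arm at which retiring to it and sampling the unknown
    arm are equally attractive. Truncating the horizon at `depth`
    gives a monotone approximation of the kind the abstract cites."""

    def value(a, b, d, lam, memo):
        # Optimal expected discounted reward with d stages left,
        # when a known arm paying lam per stage is available.
        if d == 0:
            return 0.0
        key = (a, b, d)
        if key not in memo:
            p = a / (a + b)  # posterior mean of the unknown arm
            retire = lam * (1 - beta ** d) / (1 - beta)
            explore = (p * (1 + beta * value(a + 1, b, d - 1, lam, memo))
                       + (1 - p) * beta * value(a, b + 1, d - 1, lam, memo))
            memo[key] = max(retire, explore)
        return memo[key]

    lo, hi = 0.0, 1.0  # Bernoulli payoffs live in [0, 1]
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        retire = lam * (1 - beta ** depth) / (1 - beta)
        if value(a0, b0, depth, lam, {}) > retire + 1e-12:
            lo = lam  # exploring beats retiring: index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

# Example: index of an arm with a uniform Beta(1, 1) prior.
print(gittins_break_even(1, 1))
```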
9.
We approximate the solution of nonlinear stochastic equations driven by a Gaussian white noise by solutions of similar equations, where the Gaussian noise is replaced by a weighted Poissonian point process.
10.
Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s. NS algorithms have been studied in depth in the infinite-horizon regime, but few non-asymptotic results exist for this type of bandit algorithm. In this paper, we focus on a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain uniform explicit bounds for the regret of (over-)penalized NS algorithms. We also extend to the multi-armed case some convergence properties of penalized NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish some new sharp mixing bounds for these processes.
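To make the "penalized" qualifier concrete, here is one plausible two-armed sketch: on top of the plain reward-inaction NS update, a failure now pushes probability mass away from the played arm, scaled by a penalty sequence. The specific step and penalty sequences, and the over-penalized scaling analysed in the paper, are not reproduced here; everything below is an illustrative assumption.

```python
import random

def penalized_ns(p_a, p_b, steps,
                 gamma=lambda n: 1.0 / n,    # step sizes (illustrative)
                 rho=lambda n: n ** -0.5):   # penalty scale (illustrative)
    """Penalized Narendra-Shapiro sketch for two arms.

    x is the probability of playing arm A. Successes reinforce the
    played arm, as in the plain reward-inaction scheme; failures
    are now penalized by moving mass toward the other arm.
    """
    x = 0.5
    for n in range(1, steps + 1):
        g, r = gamma(n), rho(n)
        if random.random() < x:               # play arm A
            if random.random() < p_a:
                x += g * (1.0 - x)            # success: reinforce A
            else:
                x -= g * r * x                # failure: penalize A
        else:                                 # play arm B
            if random.random() < p_b:
                x -= g * x                    # success: reinforce B
            else:
                x += g * r * (1.0 - x)        # failure: penalize B
    return x

print(penalized_ns(0.7, 0.4, 100_000))
```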