首页 | 本学科首页   官方微博 | 高级检索  
     


Regret bounds for Narendra-Shapiro bandit algorithms
Authors:Sébastien Gadat  Sofiane Saadane
Affiliation:1. Toulouse School of Economics, UMR 5604, Université Toulouse I Capitole , Toulouse, France.;2. Institut de Mathématiques de Toulouse, UMR 5219 , Toulouse Cedex 9, France.
Abstract:Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s. NS-algorithms have been deeply studied in infinite horizon but little non-asymptotic results exist for this type of bandit algorithms. In this paper, we focus on a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain some uniform explicit bounds for the regret of (over)-penalized-NS algorithms. We also extend to the multi-armed case some convergence properties of penalized-NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish some new sharp mixing bounds for these processes.
Keywords:Regret  stochastic bandit algorithms  piecewise deterministic Markov processes
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号