Author(s)

Lin Fan

Peter Glynn

Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that algorithms that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of trials at a rate specified by the Lai–Robbins lower bound. In this paper, we show that when one uses such optimized algorithms, the resulting regret distribution necessarily has a very heavy tail, specifically that of a truncated Cauchy distribution. Furthermore, for p > 1, the pth moment of the regret distribution grows much faster than polylogarithmically, in particular as a power of the total number of trials. We show that optimized upper-confidence bound (UCB) algorithms are also fragile in an additional sense; namely, when the problem is even slightly misspecified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is suboptimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to misspecification. In doing so, we also show a sharp trade-off between the amount of UCB exploration and the heaviness of the resulting regret distribution tail.
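The fragility mechanism described above can be illustrated with a small simulation. The following is a minimal, illustrative sketch only, not the paper's optimized algorithm or analysis: it runs a standard UCB1-style index policy on a hypothetical two-armed Gaussian bandit, with an assumed exploration multiplier that loosely mirrors the exploration/tail-heaviness trade-off discussed in the abstract. All names, parameters, and the reward model are assumptions chosen for illustration.

```python
# Illustrative sketch (assumptions: two-armed unit-variance Gaussian bandit,
# UCB1-style index with a tunable exploration multiplier). When the optimal
# arm's first few rewards happen to be unluckily low, its index stays small
# and regret can be far above average, producing a heavy upper tail.
import numpy as np

def run_ucb(means, horizon, rng, exploration=2.0):
    """Run a UCB1-style policy and return cumulative regret at the horizon.

    exploration: multiplier on the confidence radius; smaller values mean
    less exploration, which (per the trade-off described in the abstract)
    tends to produce a heavier regret-distribution tail.
    """
    n_arms = len(means)
    counts = np.zeros(n_arms)          # number of plays per arm
    sums = np.zeros(n_arms)            # cumulative reward per arm
    regret = 0.0
    best_mean = max(means)

    for t in range(1, horizon + 1):
        if t <= n_arms:                # play each arm once to initialize
            arm = t - 1
        else:
            radius = np.sqrt(exploration * np.log(t) / counts)
            arm = int(np.argmax(sums / counts + radius))
        reward = rng.normal(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]
    return regret

rng = np.random.default_rng(0)
means = [1.0, 0.0]                     # arm 0 is optimal by a gap of 1
regrets = [run_ucb(means, horizon=10_000, rng=rng) for _ in range(200)]
print("mean regret:", np.mean(regrets))
print("99th percentile regret:", np.percentile(regrets, 99))  # heavy upper tail
```

Comparing the mean regret with a high quantile of the simulated regret distribution gives a rough sense of the heavy tail; lowering the exploration multiplier typically makes this gap more pronounced in such experiments.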
Date Published: 2025
Citation: Fan, Lin, and Peter Glynn. 2025. The Fragility of Optimized Bandit Algorithms. Operations Research.