Sandeep Juneja: Stochastic Multi Armed Bandits and Heavy Tails
Time: Mon 2022-05-30 15.15 - 16.15
Location: KTH, Lindstedtsvägen 25, Room 3721 and Zoom
Video link: Zoom meeting ID: 621 4469 8204
Lecturer: Sandeep Juneja (TIFR Mumbai)
In this talk we consider two classic and popular problems in the stochastic multi armed bandit settings. 1) Regret minimization, where given a finite set of unknown reward distributions (or arms) that can be sequentially sampled, the aim is to sample in a manner that maximizes the expected reward, or equivalently, minimizes the expected regret, over a large sampling horizon. 2) Best arm identification, where in a similar sequential setting the aim is to select the arm with the largest mean using a minimum number of samples on average while keeping the probability of false selection to a pre-specified and small value. Both these problems with a myriad of variations have been well studied in the literature. The analysis techniques for the two problems are typically different and typically the arm distributions are restricted to a small class such as a single parameter exponential family or distributions that are bounded with known bounds. In practice, such restrictions often do not hold and the arm distributions may even be heavy-tailed. In this talk we discuss how to optimally solve both the above problems when minimal restrictions are imposed on the arm distributions. Further, we highlight the commonalities of techniques in solving the two problems.