A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

Archive ouverte

Boursier, Etienne | Kaufmann, Emilie | Mehrabian, Abbas | Perchet, Vianney

Edité par CCSD -

International audience. We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm,a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new and efficient algorithm that combines the idea of leveraging forced collisions for implicitcommunication and that of performing matching eliminations. We present a finite-time analysis of our algorithm, giving the first sublinear minimaxregret bound for this problem, and prove that if the optimal assignment of players to arms is unique, our algorithm attains the optimal O(ln(T )) regret,solving an open question raised at NeurIPS 2018 by Bistritz and Leshem (2018).

Suggestions

Du même auteur

SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits

Archive ouverte | Boursier, Etienne | CCSD

International audience. Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed bandit problem, where several players pull arms simultaneously and collisions occur if one of them is...

Constant or logarithmic regret in asynchronous multiplayer bandits

Archive ouverte | Richard, Hugo | CCSD

International audience. Multiplayer bandits have recently been extensively studied because of their application to cognitive radio networks. While the literature mostly considers synchronous players, radio networks ...

First-order ANIL provably learns representations despite overparametrization

Archive ouverte | Yuksel, Oguz | CCSD

International audience. Due to its empirical success on few shot classification and reinforcement learning, meta-learning recently received a lot of interest. Meta-learning leverages data from previous tasks to quic...

Chargement des enrichissements...