
Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing alternatives ("arms") so as to maximize expected gain, when each arm's reward properties are only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying under-explored arms against repeatedly playing the arm that currently looks best.

(Image credit: Microsoft Research)
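To make the exploration/exploitation trade-off concrete, here is a minimal sketch of the epsilon-greedy strategy on simulated Bernoulli arms. It is an illustration only: the arm means, epsilon, and horizon below are assumptions, not values from any paper listed here.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=1000, seed=0):
    """With probability epsilon explore a random arm; otherwise exploit
    the arm with the highest estimated mean reward."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # pulls per arm
    values = [0.0] * k    # running mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=values.__getitem__)   # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]     # incremental mean
        total += reward
    return total

# Hypothetical three-armed instance; the best arm pays off 70% of the time.
print(epsilon_greedy([0.2, 0.5, 0.7]))
```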

Papers

Showing 351–375 of 1262 papers

Best Arm Identification in Restless Markov Multi-Armed Bandits
Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits
Best arm identification in multi-armed bandits with delayed feedback
Best Arm Identification in Linked Bandits
Best-Arm Identification in Correlated Multi-Armed Bandits
An Efficient Algorithm for Deep Stochastic Contextual Bandits
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds
Active Reinforcement Learning: Observing Rewards at a Cost
Diffusion Models Meet Contextual Bandits with Large Action Spaces
Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme
Diffusion Approximations for Thompson Sampling
Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?
Efficient Prompt Optimization Through the Lens of Best Arm Identification
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives
Differentially Private Multi-Armed Bandits in the Shuffle Model
Diminishing Exploration: A Minimalist Approach to Piecewise Stationary Multi-Armed Bandits
Differentially Private Kernelized Contextual Bandits
Discrete Choice Multi-Armed Bandits
Disentangling Exploration from Exploitation
Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication
Be Greedy in Multi-Armed Bandits
Distributed Differential Privacy in Multi-Armed Bandits
Distributed Exploration in Multi-Armed Bandits
Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards
Meta-Learning Bandit Policies by Gradient Ascent

Benchmark Results

#  Model                          Metric             Claimed  Verified  Status
1  NeuralLinear FullPosterior-MR  Cumulative regret  1.92     n/a       Unverified
2  Linear FullPosterior-MR        Cumulative regret  1.82     n/a       Unverified
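Both benchmarked models appear to be posterior-sampling (Thompson-sampling-style) approaches evaluated by cumulative regret: the running sum of the gap between the best arm's expected reward and the expected reward of the arm actually played. As a rough illustration of how that metric is computed, here is a minimal Beta-Bernoulli Thompson sampling simulation; the arm means and horizon are illustrative assumptions, not the benchmark's actual setup.

```python
import random

def thompson_regret(arm_means, horizon=1000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the cumulative-regret trace."""
    rng = random.Random(seed)
    k = len(arm_means)
    alpha = [1.0] * k   # Beta(1, 1) uniform prior: successes + 1
    beta = [1.0] * k    # failures + 1
    best = max(arm_means)
    regret, trace = 0.0, []
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior, play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += best - arm_means[arm]   # expected regret of this pull
        trace.append(regret)
    return trace

# Hypothetical instance: final cumulative regret after 1000 pulls.
print(thompson_regret([0.2, 0.5, 0.7])[-1])
```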