SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's payoff is only partially known at allocation time. These problems typically involve an exploration/exploitation trade-off: gathering more information about uncertain arms versus playing the arm that currently looks best.

(Image credit: Microsoft Research)
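The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch on a Bernoulli bandit. This is a generic illustration, not the method of any listed paper; the arm means and epsilon value are arbitrary assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=10_000, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit; return its mean reward."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                       # explore: random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return total_reward / steps

# Three arms with hidden success probabilities; the agent must discover arm 2.
rate = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With 10% forced exploration, the achievable reward rate sits a little below the best arm's mean of 0.8, which is the cost paid for continuing to explore.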

Papers

Showing 426–450 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| An Empirical Evaluation of Federated Contextual Bandit Algorithms | | 0 |
| On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits | | 0 |
| Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling | | 0 |
| Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback | | 0 |
| Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks | Code | 0 |
| Queue Scheduling with Adversarial Bandit Learning | | 0 |
| Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits | Code | 0 |
| Fairness for Workers Who Pull the Arms: An Index Based Policy for Allocation of Restless Bandit Tasks | | 0 |
| Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards | | 0 |
| Approximately Stationary Bandits with Knapsacks | | 0 |
| The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0 |
| Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms | | 0 |
| On Differentially Private Federated Linear Contextual Bandits | | 0 |
| Kernel Conditional Moment Constraints for Confounding Robust Inference | Code | 0 |
| Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits | | 0 |
| Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments | | 0 |
| Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency | | 0 |
| A Blackbox Approach to Best of Both Worlds in Bandits and Beyond | | 0 |
| Estimating Optimal Policy Value in General Linear Contextual Bandits | | 0 |
| Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits | | 0 |
| Improving Fairness in Adaptive Social Exergames via Shapley Bandits | | 0 |
| Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond | | 0 |
| Practical Contextual Bandits with Feedback Graphs | | 0 |
| Infinite Action Contextual Bandits with Reusable Data Exhaust | Code | 0 |
| Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation | | 0 |
Page 18 of 51

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
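The cumulative-regret metric reported above measures the total expected shortfall of an algorithm's pulls versus always playing the best arm. A minimal sketch of how it is computed, using Beta-Bernoulli Thompson sampling as a stand-in algorithm (the arm means, horizon, and priors here are illustrative assumptions, not the benchmark's actual setup):

```python
import random

def thompson_sampling_regret(true_means, steps=5000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns cumulative regret vs. the best arm."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k   # Beta posterior: 1 + observed successes per arm
    beta = [1.0] * k    # Beta posterior: 1 + observed failures per arm
    best = max(true_means)
    regret = 0.0
    for _ in range(steps):
        # Sample a plausible mean for each arm from its posterior, play the argmax.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += best - true_means[arm]   # expected regret of this pull
    return regret

reg = thompson_sampling_regret([0.2, 0.5, 0.8])
```

Because Thompson sampling concentrates its posterior on the best arm, cumulative regret grows roughly logarithmically in the horizon rather than linearly.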