SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed, limited budget of resources must be allocated among competing choices ("arms") so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: balancing pulls of arms that look best so far against pulls that improve the estimates of less-tried arms.

(Image credit: Microsoft Research)
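The exploration/exploitation trade-off described above can be illustrated with a minimal UCB1 sketch. This is a generic textbook algorithm, not tied to any specific paper listed below; the arm probabilities in the usage example are made up for illustration.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1: pull each arm once, then repeatedly pick the arm maximizing
    empirical mean reward plus an exploration bonus sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms      # pulls per arm
    sums = [0.0] * n_arms      # total reward per arm
    for t in range(horizon):
        if t < n_arms:
            arm = t  # initial exploration: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Usage: three hypothetical Bernoulli arms; UCB1 should concentrate
# most of its pulls on the best arm (index 2, success rate 0.8).
random.seed(0)
probs = [0.2, 0.5, 0.8]
counts = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0,
              n_arms=3, horizon=2000)
```

The exploration bonus shrinks as an arm accumulates pulls, so under-sampled arms keep getting revisited until the evidence against them is strong.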

Papers

Showing 401–450 of 1262 papers

Title | Status | Hype
Federated Linear Contextual Bandits with User-level Differential Privacy | — | 0
Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits | Code | 0
Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards | — | 0
Representation-Driven Reinforcement Learning | — | 0
Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits | — | 0
Contextual Bandits with Budgeted Information Reveal | — | 0
Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness | — | 0
Meta-in-context learning in large language models | Code | 0
Sequential Best-Arm Identification with Application to Brain-Computer Interface | — | 0
Efficient Training of Multi-task Combinatorial Neural Solver with Multi-armed Bandits | — | 0
Reward Teaching for Federated Multi-armed Bandits | — | 0
Stochastic Contextual Bandits with Graph-based Contexts | — | 0
First- and Second-Order Bounds for Adversarial Linear Contextual Bandits | — | 0
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0
Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning | Code | 0
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards | — | 0
Optimal Activation of Halting Multi-Armed Bandit Models | — | 0
A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits | Code | 0
Learning Personalized Decision Support Policies | — | 0
SmartChoices: Augmenting Software with Learned Implementations | — | 0
BanditQ: Fair Bandits with Guaranteed Rewards | — | 0
Full Gradient Deep Reinforcement Learning for Average-Reward Criterion | — | 0
Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to Analysis of Bayesian Algorithms | — | 0
Federated Learning for Heterogeneous Bandits with Unobserved Contexts | — | 0
Adaptive Endpointing with Deep Contextual Multi-armed Bandits | — | 0
An Empirical Evaluation of Federated Contextual Bandit Algorithms | — | 0
On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits | — | 0
Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling | — | 0
Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback | — | 0
Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks | Code | 0
Queue Scheduling with Adversarial Bandit Learning | — | 0
Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits | Code | 0
Fairness for Workers Who Pull the Arms: An Index Based Policy for Allocation of Restless Bandit Tasks | — | 0
Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards | — | 0
Approximately Stationary Bandits with Knapsacks | — | 0
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | — | 0
Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms | — | 0
On Differentially Private Federated Linear Contextual Bandits | — | 0
Kernel Conditional Moment Constraints for Confounding Robust Inference | Code | 0
Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits | — | 0
Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments | — | 0
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency | — | 0
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond | — | 0
Estimating Optimal Policy Value in General Linear Contextual Bandits | — | 0
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits | — | 0
Improving Fairness in Adaptive Social Exergames via Shapley Bandits | — | 0
Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond | — | 0
Practical Contextual Bandits with Feedback Graphs | — | 0
Infinite Action Contextual Bandits with Reusable Data Exhaust | Code | 0
Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation | — | 0
Page 9 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
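The benchmark above scores posterior-sampling models by cumulative regret. A minimal Beta-Bernoulli Thompson sampling sketch shows how cumulative (pseudo-)regret is typically accumulated; it does not reproduce the listed NeuralLinear/Linear models or their metric values, and the arm probabilities are hypothetical.

```python
import random

def thompson_bernoulli(probs, horizon, rng):
    """Beta-Bernoulli Thompson sampling: sample a success rate from each
    arm's Beta posterior, pull the argmax, and accumulate pseudo-regret
    (expected gap to the best arm) over the horizon."""
    n = len(probs)
    alpha = [1] * n  # Beta(1, 1) uniform priors
    beta = [1] * n
    best = max(probs)
    regret = 0.0
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < probs[arm] else 0
        alpha[arm] += reward       # posterior update on success
        beta[arm] += 1 - reward    # posterior update on failure
        regret += best - probs[arm]  # pseudo-regret: gap in expected reward
    return regret

# Usage: three hypothetical Bernoulli arms over 1000 rounds; regret
# grows slowly because suboptimal arms are pulled ever more rarely.
rng = random.Random(1)
r = thompson_bernoulli([0.3, 0.5, 0.7], horizon=1000, rng=rng)
```

Pseudo-regret (summing expected gaps rather than realized rewards) is a common reporting choice because it removes reward noise from the metric.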