SOTAVerified

Multi-Armed Bandits

The multi-armed bandit is a problem in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms with uncertain payoffs against repeatedly pulling the arm that currently looks best.
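The exploration/exploitation trade-off can be sketched with a minimal epsilon-greedy strategy on a Bernoulli bandit (the arm means, epsilon, and horizon below are illustrative values, not taken from any of the papers listed):

```python
import random

def epsilon_greedy(means, epsilon=0.1, horizon=1000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit; returns cumulative expected regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore: random arm
        else:
            arm = max(range(k), key=lambda i: estimates[i])   # exploit: best estimate
        reward = 1.0 if rng.random() < means[arm] else 0.0    # Bernoulli payoff
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        regret += best - means[arm]   # expected regret of this pull
    return regret
```

With a small epsilon the policy mostly exploits its current estimates while still sampling every arm occasionally, so the per-round regret shrinks as the estimates sharpen.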

(Image credit: Microsoft Research)

Papers

Showing 401–450 of 1262 papers

Title | Status | Hype
On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits |  | 0
Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling |  | 0
Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback |  | 0
Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks | Code | 0
Queue Scheduling with Adversarial Bandit Learning |  | 0
Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits | Code | 0
Fairness for Workers Who Pull the Arms: An Index Based Policy for Allocation of Restless Bandit Tasks |  | 0
Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards |  | 0
Approximately Stationary Bandits with Knapsacks |  | 0
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models |  | 0
Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms |  | 0
On Differentially Private Federated Linear Contextual Bandits |  | 0
Kernel Conditional Moment Constraints for Confounding Robust Inference | Code | 0
Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits |  | 0
Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments |  | 0
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency |  | 0
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond |  | 0
Estimating Optimal Policy Value in General Linear Contextual Bandits |  | 0
Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond |  | 0
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits |  | 0
Improving Fairness in Adaptive Social Exergames via Shapley Bandits |  | 0
Practical Contextual Bandits with Feedback Graphs |  | 0
Infinite Action Contextual Bandits with Reusable Data Exhaust | Code | 0
Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation |  | 0
Bandit Social Learning: Exploration under Myopic Behavior |  | 0
Adversarial Rewards in Universal Learning for Contextual Bandits |  | 0
Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing | Code | 0
Leveraging User-Triggered Supervision in Contextual Bandits |  | 0
On Private and Robust Bandits |  | 0
Multiplier Bootstrap-based Exploration |  | 0
Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback |  | 0
Stochastic Contextual Bandits with Long Horizon Rewards |  | 0
Quantum contextual bandits and recommender systems for quantum data |  | 0
Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback |  | 0
Adversarial Attacks on Adversarial Bandits |  | 0
A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback |  | 0
Contextual Causal Bayesian Optimisation |  | 0
Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits |  | 0
Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning |  | 0
Quantum Heavy-tailed Bandits |  | 0
Multi-Armed Bandits and Quantum Channel Oracles |  | 0
Multi-armed Bandit Learning for TDMA Transmission Slot Scheduling and Defragmentation for Improved Bandwidth Usage |  | 0
Best Arm Identification in Stochastic Bandits: Beyond β-optimality |  | 0
Local Differential Privacy for Sequential Decision Making in a Changing Environment |  | 0
Contextual Bandits and Optimistically Universal Learning |  | 0
Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent |  | 0
On the Complexity of Representation Learning in Contextual Linear Bandits |  | 0
MABSplit: Faster Forest Training Using Multi-Armed Bandits | Code | 0
Faster Maximum Inner Product Search in High Dimensions |  | 0
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes |  | 0
Page 9 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 |  | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 |  | Unverified
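The "cumulative regret" metric reported above can be illustrated with a minimal Beta-Bernoulli Thompson sampling sketch. This is only a toy illustration: the arm means and horizon are made up, and the benchmarked FullPosterior variants are more elaborate (linear or neural-linear posteriors rather than independent Beta posteriors):

```python
import random

def thompson_bernoulli(means, horizon=1000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns cumulative expected regret."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k   # Beta posterior: 1 + successes per arm
    beta = [1.0] * k    # Beta posterior: 1 + failures per arm
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible mean from each arm's posterior, pull the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += best - means[arm]   # gap between best arm and the chosen one
    return regret
```

Posterior sampling explores automatically: arms with wide posteriors occasionally produce large samples and get pulled, while clearly inferior arms are sampled less and less, so cumulative regret grows sublinearly.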