SOTAVerified

Multi-Armed Bandits

Multi-armed bandits are problems in which a fixed budget of resources must be allocated among competing alternatives so as to maximize expected gain, when each alternative's payoff is only partially known at allocation time. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain arms versus playing the arm that currently looks best.

( Image credit: Microsoft Research )
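To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy sketch for a Bernoulli bandit. This is an illustrative example, not code from any paper listed below; the arm means and parameters are made up for the demo.

```python
import random

def epsilon_greedy(true_means, n_rounds=10_000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    With probability `epsilon` we explore (pull a uniformly random arm);
    otherwise we exploit the arm with the highest empirical mean so far.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean for this arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, values, total_reward

counts, values, total = epsilon_greedy([0.2, 0.5, 0.8])
print(counts)  # the 0.8 arm should receive most of the pulls
```

With a small epsilon the policy spends most rounds on the empirically best arm while still sampling the others often enough to correct early mistakes; many of the papers below study more refined versions of this same tension (UCB, Thompson sampling, best-arm identification).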

Papers

Showing 701–750 of 1262 papers

Online Meta-Learning in Adversarial Multi-Armed Bandits
Online Posterior Sampling with a Diffusion Prior
Online Residential Demand Response via Contextual Multi-Armed Bandits
Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints
Online Semi-Supervised Learning with Bandit Feedback
Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent
Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling
On Minimax Optimal Offline Policy Evaluation
On No-Sensing Adversarial Multi-player Multi-armed Bandits with Collision Communications
Towards Tractable Optimism in Model-Based Reinforcement Learning
On Penalization in Stochastic Multi-armed Bandits
On Private and Robust Bandits
On Quantum Natural Policy Gradients
On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits
On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits
On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits
On Speeding Up Language Model Evaluation
On Submodular Contextual Bandits
On the bias, risk and consistency of sample means in multi-armed bandits
On the Complexity of Representation Learning in Contextual Linear Bandits
On the Identification and Mitigation of Weaknesses in the Knowledge Gradient Policy for Multi-Armed Bandits
On the Importance of Uncertainty in Decision-Making with Large Language Models
On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits
Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits
On the Problem of Best Arm Retention
Contextual Decision-Making with Knapsacks Beyond the Worst Case
On The Statistical Complexity of Offline Decision-Making
On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs
On Universally Optimal Algorithms for A/B Testing
Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture
Open Problem: Model Selection for Contextual Bandits
Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli Rewards
Optimal Activation of Halting Multi-Armed Bandit Models
Optimal Algorithms for Range Searching over Multi-Armed Bandits
Optimal Algorithms for Stochastic Contextual Preference Bandits
Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Optimal Best-Arm Identification under Fixed Confidence with Multiple Optima
Optimal cross-learning for contextual bandits with unknown context distributions
Optimal Multitask Linear Regression and Contextual Bandits under Sparse Heterogeneity
Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks
Towards Costless Model Selection in Contextual Bandits: A Bias-Variance Perspective
Optimal Multi-Objective Best Arm Identification with Fixed Confidence
Optimal No-regret Learning in Repeated First-price Auctions
Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits
Optimal Streaming Algorithms for Multi-Armed Bandits
Optimistic Information Directed Sampling
Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits
Optimizing Online Advertising with Multi-Armed Bandits: Mitigating the Cold Start Problem under Auction Dynamics
Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits
Page 15 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
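The benchmark metric above is cumulative regret. As a point of reference (the exact evaluation protocol behind these numbers is not stated on this page), cumulative pseudo-regret is typically computed as the sum, over rounds, of the gap between the best arm's mean and the mean of the arm actually played. A minimal sketch with made-up arm means:

```python
def cumulative_regret(true_means, arms_played):
    """Cumulative pseudo-regret: for each round, the gap between the
    best arm's mean reward and the mean of the arm actually played."""
    best = max(true_means)
    return sum(best - true_means[a] for a in arms_played)

# Toy example: best arm has mean 0.8; playing arm 0 (mean 0.2) twice
# and arm 2 (mean 0.8) once incurs regret (0.8 - 0.2) * 2 + 0.
print(cumulative_regret([0.2, 0.5, 0.8], [0, 0, 2]))  # ≈ 1.2
```

A policy whose cumulative regret grows sublinearly in the number of rounds is learning: its per-round regret vanishes, which is the usual yardstick in the papers listed above.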