SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each arm's reward distribution is only partially known and becomes better understood as it is played. These problems typically involve an exploration/exploitation trade-off: balancing pulls of arms that look best so far against pulls that reduce uncertainty about the others.

(Image credit: Microsoft Research)
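The exploration/exploitation trade-off described above can be made concrete with the simplest bandit policy, epsilon-greedy: with probability ε pull a random arm (explore), otherwise pull the arm with the highest empirical mean reward (exploit). A minimal sketch on a Bernoulli bandit; the arm means, ε, and horizon are illustrative and not taken from any paper listed below:

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=1000, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit.

    arm_means: true success probability of each arm (unknown to the agent).
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return total_reward, counts
```

Over a long horizon the best arm dominates the pull counts, while the ε fraction of random pulls keeps every arm's estimate from going stale.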

Papers

Showing 1–50 of 1262 papers

Title | Status | Hype
Hypothesis Generation with Large Language Models | Code | 2
Off-Policy Evaluation for Large Action Spaces via Embeddings | Code | 2
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model | Code | 2
Discovering Minimal Reinforcement Learning Environments | Code | 1
Multi-agent Dynamic Algorithm Configuration | Code | 1
Efficient Contextual Bandits with Continuous Actions | Code | 1
Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization | Code | 1
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks | Code | 1
Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits | Code | 1
Neural Thompson Sampling | Code | 1
Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL | Code | 1
Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits | Code | 1
In-Context Reinforcement Learning for Variable Action Spaces | Code | 1
Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health | Code | 1
BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits | Code | 1
An empirical evaluation of active inference in multi-armed bandits | Code | 1
Carousel Personalization in Music Streaming Apps with Contextual Bandits | Code | 1
Multiplayer Multi-armed Bandits for Optimal Assignment in Heterogeneous Networks | Code | 1
Federated Multi-Armed Bandits | Code | 1
Generalized Linear Bandits with Local Differential Privacy | Code | 1
Langevin Monte Carlo for Contextual Bandits | Code | 1
Neural Exploitation and Exploration of Contextual Bandits | Code | 1
Pervasive Machine Learning for Smart Radio Environments Enabled by Reconfigurable Intelligent Surfaces | Code | 1
Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling | Code | 1
SplitPlace: AI Augmented Splitting and Placement of Large-Scale Neural Networks in Mobile Edge Environments | Code | 1
Performance-bounded Online Ensemble Learning Method Based on Multi-armed bandits and Its Applications in Real-time Safety Assessment | Code | 1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Code | 1
Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits | Code | 1
Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation | Code | 1
A Modern Introduction to Online Learning | Code | 1
Anytime-valid off-policy inference for contextual bandits | Code | 1
A unifying framework for generalised Bayesian online learning in non-stationary environments | Code | 1
Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem | Code | 1
EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits | Code | 1
Competing for Shareable Arms in Multi-Player Multi-Armed Bandits | Code | 1
Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling | Code | 1
Adapting to Delays and Data in Adversarial Multi-Armed Bandits | - | 0
A Classification View on Meta Learning Bandits | - | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | - | 0
Adapting Bandit Algorithms for Settings with Sequentially Available Arms | - | 0
AdaptEx: A Self-Service Contextual Bandit Platform | - | 0
Achieving User-Side Fairness in Contextual Bandits | - | 0
α-Fair Contextual Bandits | - | 0
AdaLinUCB: Opportunistic Learning for Contextual Bandits | - | 0
Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits | - | 0
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | - | 0
Active Search for Sparse Signals with Region Sensing | - | 0
A Batch Sequential Halving Algorithm without Performance Degradation | - | 0
Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling | - | 0
Page 1 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | - | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | - | Unverified
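Cumulative regret, the metric reported in the table above, is the gap between the expected reward of always playing the best arm and the expected reward of the arms actually chosen, summed over all rounds. A minimal sketch of how it is computed; the arm means and pull sequences here are illustrative, not the benchmark's setup:

```python
def cumulative_regret(arm_means, chosen_arms):
    """Expected cumulative regret of a sequence of arm pulls.

    arm_means: true mean reward of each arm.
    chosen_arms: indices of the arms pulled, one per round.
    Each round contributes (best arm's mean - chosen arm's mean).
    """
    best = max(arm_means)
    return sum(best - arm_means[arm] for arm in chosen_arms)
```

A good bandit algorithm keeps this sum growing sublinearly in the horizon: the per-round gap shrinks toward zero as the policy identifies the best arm.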