SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed budget of resources must be allocated among competing alternatives ("arms") so as to maximize expected gain, when each arm's reward distribution is only partially known. These problems typically involve an exploration/exploitation trade-off: balancing the acquisition of information about uncertain arms against repeatedly playing the arm that currently looks best.

(Image credit: Microsoft Research)
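The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy policy. This is a generic sketch, not a method from any paper listed below; the `pull(arm)` reward interface is an assumed placeholder for whatever environment supplies rewards.

```python
import random

def epsilon_greedy_bandit(pull, n_arms, n_rounds=1000, epsilon=0.1):
    """Play an n_arms bandit for n_rounds with an epsilon-greedy policy.

    pull(arm) -> float is an assumed callback returning a stochastic reward.
    Returns (total_reward, per_arm_mean_estimates).
    """
    counts = [0] * n_arms        # number of pulls per arm
    values = [0.0] * n_arms      # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])   # exploit
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the arm's mean reward estimate.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, values
```

For example, with three Bernoulli arms of success probabilities 0.2, 0.5, and 0.8, the policy's mean estimates concentrate on the best arm while still sampling the others occasionally.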

Papers

Showing 601–625 of 1262 papers

Title | Status | Hype
Approximate Function Evaluation via Multi-Armed Bandits | - | 0
Reinforced Meta Active Learning | - | 0
Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits | - | 0
PAC-Bayesian Lifelong Learning For Multi-Armed Bandits | - | 0
Restless Multi-Armed Bandits under Exogenous Global Markov Process | - | 0
Federated Online Sparse Decision Making | - | 0
Truncated LinUCB for Stochastic Linear Bandits | Code | 0
The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication | - | 0
Cost-Efficient Distributed Learning via Combinatorial Multi-Armed Bandits | - | 0
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences | - | 0
Shuffle Private Linear Contextual Bandits | - | 0
Efficient Kernel UCB for Contextual Bandits | Code | 0
Remote Contextual Bandits | - | 0
Settling the Communication Complexity for Distributed Offline Reinforcement Learning | - | 0
Smoothed Online Learning is as Easy as Statistical Learning | - | 0
Budgeted Combinatorial Multi-Armed Bandits | - | 0
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | - | 0
Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts | - | 0
Adaptive Experimentation with Delayed Binary Feedback | Code | 0
Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health | - | 0
Multi-armed Bandits for Link Configuration in Millimeter-wave Networks | - | 0
Context Uncertainty in Contextual Bandits with Applications to Recommender Systems | - | 0
Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0
Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0
Neural Collaborative Filtering Bandits via Meta Learning | - | 0
Page 25 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | - | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | - | Unverified