SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed, limited set of resources must be allocated among competing alternatives so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
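The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch. This is a generic illustration, not taken from any paper listed below; the Bernoulli arm means and parameters are made up for the example:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on Bernoulli arms: explore with prob. epsilon, else exploit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0    # Bernoulli draw
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return counts, estimates, total_reward
```

With enough steps, the agent concentrates its pulls on the arm with the highest estimated mean while the occasional random pull keeps the estimates of the other arms from going stale.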

Papers

Showing 601–650 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Approximate Function Evaluation via Multi-Armed Bandits | | 0 |
| Reinforced Meta Active Learning | | 0 |
| Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits | | 0 |
| PAC-Bayesian Lifelong Learning For Multi-Armed Bandits | | 0 |
| Restless Multi-Armed Bandits under Exogenous Global Markov Process | | 0 |
| Federated Online Sparse Decision Making | | 0 |
| Truncated LinUCB for Stochastic Linear Bandits | Code | 0 |
| The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication | | 0 |
| Cost-Efficient Distributed Learning via Combinatorial Multi-Armed Bandits | | 0 |
| Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences | | 0 |
| Efficient Kernel UCB for Contextual Bandits | Code | 0 |
| Shuffle Private Linear Contextual Bandits | | 0 |
| Settling the Communication Complexity for Distributed Offline Reinforcement Learning | | 0 |
| Remote Contextual Bandits | | 0 |
| Smoothed Online Learning is as Easy as Statistical Learning | | 0 |
| Budgeted Combinatorial Multi-Armed Bandits | | 0 |
| Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | | 0 |
| Multi-armed Bandits for Link Configuration in Millimeter-wave Networks | | 0 |
| Adaptive Experimentation with Delayed Binary Feedback | Code | 0 |
| Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts | | 0 |
| Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health | | 0 |
| Context Uncertainty in Contextual Bandits with Applications to Recommender Systems | | 0 |
| Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0 |
| Neural Collaborative Filtering Bandits via Meta Learning | | 0 |
| Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0 |
| Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms | | 0 |
| Top-K Ranking Deep Contextual Bandits for Information Selection Systems | | 0 |
| Networked Restless Multi-Armed Bandits for Mobile Interventions | | 0 |
| Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits | | 0 |
| Learning Neural Contextual Bandits Through Perturbed Rewards | | 0 |
| Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search | | 0 |
| Semantic Parsing for Planning Goals as Constrained Combinatorial Contextual Bandits | | 0 |
| Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version) | | 0 |
| Modelling Cournot Games as Multi-agent Multi-armed Bandits | | 0 |
| Off-Policy Evaluation Using Information Borrowing and Context-Based Switching | Code | 0 |
| Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits | | 0 |
| Safe Linear Leveling Bandits | | 0 |
| Privacy Amplification via Shuffling for Linear Contextual Bandits | | 0 |
| Efficient Action Poisoning Attacks on Linear Contextual Bandits | | 0 |
| Best Arm Identification under Additive Transfer Bandits | | 0 |
| Contextual Bandit Applications in Customer Support Bot | | 0 |
| On Submodular Contextual Bandits | | 0 |
| Bandits with Knapsacks beyond the Worst Case | | 0 |
| Identification of the Generalized Condorcet Winner in Multi-dueling Bandits | Code | 0 |
| Optimal Algorithms for Stochastic Contextual Preference Bandits | | 0 |
| Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning | Code | 0 |
| Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization | | 0 |
| Asymptotically Best Causal Effect Identification with Multi-Armed Bandits | | 0 |
| Online Fair Revenue Maximizing Cake Division with Non-Contiguous Pieces in Adversarial Bandits | | 0 |
| Decentralized Upper Confidence Bound Algorithms for Homogeneous Multi-Agent Multi-Armed Bandits | | 0 |
Page 13 of 26

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
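Cumulative regret, the metric reported above, is commonly defined as the sum over rounds of the gap between the best arm's expected reward and the expected reward of the arm actually chosen. A minimal sketch of that definition (the arm means and choice sequence below are illustrative, not from the benchmark):

```python
def cumulative_regret(true_means, chosen_arms):
    """Sum of per-round gaps between the best arm's mean and the chosen arm's mean."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)

# Illustrative: two arms with means 0.2 and 0.8; pulling arm 0 once costs 0.6 regret.
print(round(cumulative_regret([0.2, 0.8], [0, 1, 1]), 6))  # → 0.6
```

Lower is better: an algorithm that quickly identifies and sticks to the best arm accumulates regret slowly.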