SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each arm's payoff is only partially known at decision time. These problems typically involve an exploration/exploitation trade-off: trying arms to learn their payoffs versus pulling the arm that currently looks best.
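The exploration/exploitation trade-off can be illustrated with the classic epsilon-greedy strategy: with small probability explore a random arm, otherwise exploit the arm with the best estimated payoff. This is a minimal sketch with Bernoulli arms; the function name and parameters are illustrative, not from any paper listed below.

```python
import random

def epsilon_greedy(true_means, n_steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms: explore with prob. epsilon,
    otherwise exploit the arm with the highest estimated mean."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the sample mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward
```

With enough steps, the better arm accumulates most of the pulls while the epsilon fraction of exploration keeps every arm's estimate from going stale.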

(Image credit: Microsoft Research)

Papers

Showing 1026–1050 of 1262 papers

Title
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
Provable General Function Class Representation Learning in Multitask Bandits and MDPs
Provably and Practically Efficient Neural Contextual Bandits
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks
Transfer Learning with Partially Observable Offline Data via Causal Bounds
Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback
Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits
Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Pure Exploration in Asynchronous Federated Bandits
Pure exploration in multi-armed bandits with low rank structure using oblivious sampler
Combinatorial Pure Exploration of Causal Bandits
Pure Exploration under Mediators' Feedback
QoS-Aware Multi-Armed Bandits
Quantile Multi-Armed Bandits with 1-bit Feedback
Quantum contextual bandits and recommender systems for quantum data
Quantum Heavy-tailed Bandits
Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets
Query-Efficient Correlation Clustering with Noisy Oracle
Queue Scheduling with Adversarial Bandit Learning
Quick-Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms
Raising Student Completion Rates with Adaptive Curriculum and Contextual Bandits
Random Effect Bandits
Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards
Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback
Page 42 of 51

Benchmark Results

#  Model                          Metric             Claimed  Verified  Status
1  NeuralLinear FullPosterior-MR  Cumulative regret  1.92     -         Unverified
2  Linear FullPosterior-MR        Cumulative regret  1.82     -         Unverified
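The benchmark's metric is cumulative regret. A minimal sketch of how cumulative (pseudo-)regret is typically computed against the best arm's true mean; the function name and toy values are illustrative, not taken from the benchmark above:

```python
def cumulative_regret(true_means, chosen_arms):
    """Cumulative pseudo-regret: total expected-reward gap between
    always pulling the best arm and the arms actually chosen."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)
```

Lower is better: a policy that converges on the best arm accrues regret only during its early exploratory pulls.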