SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed, limited amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's reward is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off.
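The exploration/exploitation trade-off can be illustrated with the classic epsilon-greedy strategy: with a small probability the agent tries a random arm (exploration), and otherwise it pulls the arm with the best empirical mean so far (exploitation). A minimal sketch, assuming Bernoulli-reward arms with made-up means; the function name and parameters are illustrative, not from any specific paper listed below:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Simulate an epsilon-greedy agent on a stochastic multi-armed bandit.

    With probability `epsilon` the agent explores (random arm); otherwise it
    exploits the arm with the highest empirical mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0   # Bernoulli reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward

    return estimates, total_reward

estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough steps, the estimate for the best arm dominates and the agent concentrates its pulls there, which is exactly the behavior the regret bounds in the papers below quantify.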

(Image credit: Microsoft Research)

Papers

Showing 776–800 of 1262 papers

Title | Status | Hype
Preference-centric Bandits: Optimality of Mixtures and Regret-efficient Algorithms | — | 0
Privacy Amplification via Shuffling for Linear Contextual Bandits | — | 0
Privacy-Preserving Communication-Efficient Federated Multi-Armed Bandits | — | 0
Privacy-Preserving Multi-Party Contextual Bandits | — | 0
Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs | — | 0
Productization Challenges of Contextual Multi-Armed Bandits | — | 0
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization | — | 0
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems | — | 0
Provable General Function Class Representation Learning in Multitask Bandits and MDPs | — | 0
Provably and Practically Efficient Neural Contextual Bandits | — | 0
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks | — | 0
Transfer Learning with Partially Observable Offline Data via Causal Bounds | — | 0
Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback | — | 0
Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits | — | 0
Provably Optimal Algorithms for Generalized Linear Contextual Bandits | — | 0
Pure Exploration in Asynchronous Federated Bandits | — | 0
Pure exploration in multi-armed bandits with low rank structure using oblivious sampler | — | 0
Combinatorial Pure Exploration of Causal Bandits | — | 0
Pure Exploration under Mediators' Feedback | — | 0
QoS-Aware Multi-Armed Bandits | — | 0
Quantile Multi-Armed Bandits with 1-bit Feedback | — | 0
Quantum contextual bandits and recommender systems for quantum data | — | 0
Quantum Heavy-tailed Bandits | — | 0
Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets | — | 0
Query-Efficient Correlation Clustering with Noisy Oracle | — | 0
Page 32 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
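Cumulative regret, the metric reported above, is the gap between the reward an oracle that always plays the best arm would collect and the reward the learner actually collects. A minimal sketch of how it is computed; the arm means and pull sequence here are hypothetical, not taken from the benchmark:

```python
def cumulative_regret(true_means, pulls):
    """Regret after T rounds: sum over rounds of (best arm's mean - pulled arm's mean)."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in pulls)

# Hypothetical run: arm means 0.2 / 0.5 / 0.8; the learner pulls arms 0, 2, 2, 1.
# Regret = (0.8-0.2) + 0 + 0 + (0.8-0.5) = 0.9
regret = cumulative_regret([0.2, 0.5, 0.8], [0, 2, 2, 1])
```

A good bandit algorithm keeps this sum growing sublinearly in the number of rounds, which is why lower values in the table indicate better performance.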