
Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed, limited amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
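To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy sketch for Bernoulli-reward arms; the arm means, the epsilon value, and the function name are illustrative assumptions, not code from any paper listed below.

```python
import random

def epsilon_greedy(arm_means, n_rounds=10_000, epsilon=0.1, seed=0):
    """With probability epsilon pull a random arm (explore);
    otherwise pull the arm with the best empirical mean (exploit)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of pulls per arm
    totals = [0.0] * n_arms  # summed rewards per arm
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            # exploit; unpulled arms score +inf so each arm gets tried once
            arm = max(range(n_arms),
                      key=lambda a: totals[a] / counts[a] if counts[a] else float("inf"))
        # Bernoulli reward drawn from the arm's (unknown to the learner) mean
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals

counts, _ = epsilon_greedy([0.2, 0.5, 0.8])
print(counts)  # the 0.8 arm should dominate the pull counts
```

With a fixed epsilon the learner keeps spending roughly that fraction of pulls on exploration forever; schedules that decay epsilon over time trade that steady exploration cost for faster convergence to the best arm.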


Papers

Showing 1001-1050 of 1262 papers

Title | Status | Hype
PAC Reinforcement Learning with Rich Observations | | 0
Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy | | 0
Parallel Contextual Bandits in Wireless Handover Optimization | | 0
Parallelizing Contextual Bandits | | 0
Parameterized Exploration | | 0
Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback | | 0
Partially Observable Contextual Bandits with Linear Payoffs | | 0
Personalization Paradox in Behavior Change Apps: Lessons from a Social Comparison-Based Personalized App for Physical Activity | | 0
Personalized Course Sequence Recommendations | | 0
Perturbed-History Exploration in Stochastic Multi-Armed Bandits | | 0
Pessimism for Offline Linear Contextual Bandits using ℓ_p Confidence Sets | | 0
PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits | | 0
Phasic Diversity Optimization for Population-Based Reinforcement Learning | | 0
Non-Stationary Off-Policy Optimization | | 0
Player Modeling via Multi-Armed Bandits | | 0
Policy Gradients for Contextual Recommendations | | 0
Practical Algorithms for Best-K Identification in Multi-Armed Bandits | | 0
Practical Contextual Bandits with Regression Oracles | | 0
Preference-based Online Learning with Dueling Bandits: A Survey | | 0
Preference-centric Bandits: Optimality of Mixtures and Regret-efficient Algorithms | | 0
Privacy Amplification via Shuffling for Linear Contextual Bandits | | 0
Privacy-Preserving Communication-Efficient Federated Multi-Armed Bandits | | 0
Privacy-Preserving Multi-Party Contextual Bandits | | 0
Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs | | 0
Productization Challenges of Contextual Multi-Armed Bandits | | 0
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization | | 0
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems | | 0
Provable General Function Class Representation Learning in Multitask Bandits and MDPs | | 0
Provably and Practically Efficient Neural Contextual Bandits | | 0
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks | | 0
Transfer Learning with Partially Observable Offline Data via Causal Bounds | | 0
Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback | | 0
Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits | | 0
Provably Optimal Algorithms for Generalized Linear Contextual Bandits | | 0
Pure Exploration in Asynchronous Federated Bandits | | 0
Pure exploration in multi-armed bandits with low rank structure using oblivious sampler | | 0
Combinatorial Pure Exploration of Causal Bandits | | 0
Pure Exploration under Mediators' Feedback | | 0
QoS-Aware Multi-Armed Bandits | | 0
Quantile Multi-Armed Bandits with 1-bit Feedback | | 0
Quantum contextual bandits and recommender systems for quantum data | | 0
Quantum Heavy-tailed Bandits | | 0
Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets | | 0
Query-Efficient Correlation Clustering with Noisy Oracle | | 0
Queue Scheduling with Adversarial Bandit Learning | | 0
Quick-Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms | | 0
Raising Student Completion Rates with Adaptive Curriculum and Contextual Bandits | | 0
Random Effect Bandits | | 0
Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards | | 0
Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback | | 0
Page 21 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
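For context on the metric above: cumulative regret is the gap between the reward an oracle that always plays the best arm would collect and the reward the algorithm actually collects, summed over rounds. Below is a minimal sketch of that computation, assuming known arm means and a hypothetical pull sequence (neither is taken from the models in the table).

```python
def cumulative_regret(arm_means, pulls):
    """Sum over rounds of (best arm's mean minus the pulled arm's mean)."""
    best = max(arm_means)
    return sum(best - arm_means[arm] for arm in pulls)

# Hypothetical trace: arm 2 (mean 0.8) is optimal; exploratory pulls of
# arms 0 and 1 incur per-round regret of 0.6 and 0.3 respectively.
print(cumulative_regret([0.2, 0.5, 0.8], [0, 1, 2, 2, 1, 2]))  # prints 1.2
```

Lower cumulative regret is better, so of the two claimed values above the 1.82 entry is the stronger result.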