
Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed, limited amount of resources among competing choices so as to maximize expected gain, when each choice's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off.
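
A minimal sketch of that trade-off, assuming a Bernoulli bandit with illustrative arm means (the `epsilon_greedy` helper and all parameter values below are hypothetical, not taken from any paper listed on this page): with probability epsilon the agent explores a random arm; otherwise it exploits the arm with the highest empirical mean reward.

```python
import random

def epsilon_greedy(true_means, steps=1000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit.

    true_means: per-arm success probabilities (illustrative values only).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the arm's empirical mean
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, values

if __name__ == "__main__":
    reward, estimates = epsilon_greedy([0.2, 0.5, 0.7])
    print(f"total reward: {reward:.0f}, estimates: {estimates}")
```

With a small epsilon the agent spends most pulls on the empirically best arm while still sampling the others often enough to correct early misestimates; more refined strategies (UCB, Thompson sampling) appear throughout the papers below.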

(Image credit: Microsoft Research)

Papers

Showing 951–1000 of 1262 papers

Title | Status | Hype
Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems | | 0
Triply Robust Off-Policy Evaluation | | 0
Incentivized Exploration for Multi-Armed Bandits under Reward Drift | | 0
Neural Contextual Bandits with UCB-based Exploration | Code | 0
Confidence Intervals for Policy Evaluation in Adaptive Experiments | Code | 0
Multi-Armed Bandits with Correlated Arms | Code | 0
Persistency of Excitation for Robustness of Neural Networks | Code | 0
Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs | | 0
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | | 0
Thompson Sampling via Local Uncertainty | Code | 0
Trend-responsive User Segmentation Enabling Traceable Publishing Insights. A Case Study of a Real-world Large-scale News Recommendation System | | 0
BanditRank: Learning to Rank Using Contextual Bandits | | 0
Smoothness-Adaptive Contextual Bandits | Code | 0
Multi-User MABs with User Dependent Rewards for Uncoordinated Spectrum Access | | 0
Decentralized Heterogeneous Multi-Player Multi-Armed Bandits with Non-Zero Rewards on Collisions | | 0
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes | Code | 0
Adaptive Exploration in Linear Contextual Bandit | | 0
An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays | | 0
Regret Bounds for Batched Bandits | | 0
Privacy-Preserving Multi-Party Contextual Bandits | | 0
Social Learning in Multi Agent Multi Armed Bandits | | 0
Decision Automation for Electric Power Network Recovery | | 0
An Optimal Algorithm for Multiplayer Multi-Armed Bandits | | 0
NeuralUCB: Contextual Bandits with Neural Network-Based Exploration | | 0
Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching | Code | 0
Learning Effective Exploration Strategies For Contextual Bandits | | 0
Practical Calculation of Gittins Indices for Multi-armed Bandits | Code | 0
AutoML for Contextual Bandits | | 0
Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes | Code | 0
Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback | Code | 0
A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits | | 0
Nonparametric Contextual Bandits in an Unknown Metric Space | | 0
Doubly-Robust Lasso Bandit | Code | 0
Scaling Multi-Armed Bandit Algorithms | | 0
Doubly robust off-policy evaluation with shrinkage | | 0
Parameterized Exploration | | 0
Productization Challenges of Contextual Multi-Armed Bandits | | 0
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits | | 0
Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits | | 0
Multi-Armed Bandits with Fairness Constraints for Distributing Resources to Human Teammates | | 0
Bayesian Optimisation over Multiple Continuous and Categorical Inputs | Code | 0
Learning in Restless Multi-Armed Bandits via Adaptive Arm Sequencing Rules | | 0
Online Allocation and Pricing: Constant Regret via Bellman Inequalities | | 0
Competing Bandits in Matching Markets | | 0
Bootstrapping Upper Confidence Bound | | 0
Beam Learning -- Using Machine Learning for Finding Beam Directions | | 0
Stochastic Neural Network with Kronecker Flow | | 0
Balanced off-policy evaluation in general action spaces | | 0
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning | Code | 0
Empirical Likelihood for Contextual Bandits | Code | 0
Page 20 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
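
For reference, cumulative regret, the metric reported above, is conventionally the shortfall between the reward of always playing the best arm and the reward actually collected. A standard form for a stochastic bandit (this is the textbook definition, not necessarily the exact quantity the benchmark computes):

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Cumulative regret after T rounds: \mu^* is the mean of the optimal
% arm and a_t is the arm the algorithm pulls at round t.
\[
  R(T) \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right]
\]
\end{document}
```

Lower values are better, so the linear model's 1.82 would beat the neural-linear model's 1.92 if the claimed numbers were verified.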