SOTAVerified

Multi-Armed Bandits

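The multi-armed bandit problem is a task in which a fixed, limited set of resources must be allocated among competing choices (the "arms") so as to maximize expected gain. The payoff of each choice is only partially known at the time of allocation, so these problems typically involve a trade-off between exploration (trying arms to learn their payoffs) and exploitation (playing the arm that currently looks best).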
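The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy sketch. This is only one simple strategy, chosen here for brevity; it is not taken from any paper listed on this page. The function name `run_epsilon_greedy`, the Bernoulli reward model, and all parameter values are assumptions made for illustration.

```python
import random


def run_epsilon_greedy(arm_means, epsilon=0.1, steps=1000, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (hypothetical helper).

    arm_means: true success probability of each arm (unknown to the agent).
    Returns (pull counts, estimated means, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    best_mean = max(arm_means)   # used only to measure regret
    regret = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate so far
            # (ties resolve to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Expected regret: gap between the best arm and the chosen arm.
        regret += best_mean - arm_means[arm]

    return counts, estimates, regret


if __name__ == "__main__":
    counts, estimates, regret = run_epsilon_greedy([0.2, 0.5, 0.7])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("cumulative expected regret:", round(regret, 2))
```

A larger `epsilon` spends more pulls on exploration and learns the arm means faster, at the cost of pulling worse arms more often. A smaller `epsilon` exploits sooner but risks locking onto a suboptimal arm. The cumulative regret tracked here is the expected gap to the best arm, which may be defined or normalized differently from the "Cumulative regret" metric in the benchmark table on this page.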


Papers

Showing 951-975 of 1262 papers

| Title | Status | Hype |
|---|---|---|
| Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems | | 0 |
| Triply Robust Off-Policy Evaluation | | 0 |
| Incentivized Exploration for Multi-Armed Bandits under Reward Drift | | 0 |
| Neural Contextual Bandits with UCB-based Exploration | Code | 0 |
| Confidence Intervals for Policy Evaluation in Adaptive Experiments | Code | 0 |
| Multi-Armed Bandits with Correlated Arms | Code | 0 |
| Persistency of Excitation for Robustness of Neural Networks | Code | 0 |
| Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs | | 0 |
| Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | | 0 |
| Thompson Sampling via Local Uncertainty | Code | 0 |
| Trend-responsive User Segmentation Enabling Traceable Publishing Insights. A Case Study of a Real-world Large-scale News Recommendation System | | 0 |
| BanditRank: Learning to Rank Using Contextual Bandits | | 0 |
| Smoothness-Adaptive Contextual Bandits | Code | 0 |
| Multi-User MABs with User Dependent Rewards for Uncoordinated Spectrum Access | | 0 |
| Decentralized Heterogeneous Multi-Player Multi-Armed Bandits with Non-Zero Rewards on Collisions | | 0 |
| Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes | Code | 0 |
| Adaptive Exploration in Linear Contextual Bandit | | 0 |
| An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays | | 0 |
| Regret Bounds for Batched Bandits | | 0 |
| Privacy-Preserving Multi-Party Contextual Bandits | | 0 |
| Social Learning in Multi Agent Multi Armed Bandits | | 0 |
| Decision Automation for Electric Power Network Recovery | | 0 |
| An Optimal Algorithm for Multiplayer Multi-Armed Bandits | | 0 |
| NeuralUCB: Contextual Bandits with Neural Network-Based Exploration | | 0 |
| Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |