SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to tasks in which a fixed amount of resources must be allocated between competing choices in a way that maximizes expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
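To make the exploration/exploitation trade-off concrete, here is a minimal illustrative sketch of ε-greedy action selection on a Bernoulli bandit. It is not taken from any paper listed below; the function name, parameters, and arm means are our own illustrative choices.

```python
import random

def epsilon_greedy(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit; return total reward.

    true_means: per-arm success probabilities (unknown to the agent).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                        # explore: random arm
            arm = rng.randrange(n_arms)
        else:                                             # exploit: best estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total

print(epsilon_greedy([0.2, 0.5, 0.7]))
```

With a small ε the agent mostly exploits its current estimates but still samples every arm occasionally, which is the basic mechanism most of the papers below refine.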

Papers

Showing 601–625 of 1262 papers

Title | Status | Hype
Instance-optimal PAC Algorithms for Contextual Bandits | | 0
Concentrated Differential Privacy for Bandits | | 0
Contextual Bandits with Stage-wise Constraints | | 0
A General Theory of the Stochastic Linear Bandit and Its Applications | | 0
Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems | | 0
Contextual Bandits with Sparse Data in Web setting | | 0
Incentivized Exploration via Filtered Posterior Sampling | | 0
Incentivized Exploration for Multi-Armed Bandits under Reward Drift | | 0
BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System | | 0
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement | | 0
Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? | | 0
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon | | 0
Joint Representation Training in Sequential Tasks with Shared Structure | | 0
Incentivising Exploration and Recommendations for Contextual Bandits with Payments | | 0
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | | 0
Contextual Bandits with Similarity Information | | 0
Kernel ε-Greedy for Multi-Armed Bandits with Covariates | | 0
Kernel Methods for Cooperative Multi-Agent Contextual Bandits | | 0
KL-regularization Itself is Differentially Private in Bandits and RLHF | | 0
Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits | | 0
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | | 0
Lagrangian Index Policy for Restless Bandits with Average Reward | | 0
Improving Offline Contextual Bandits with Distributional Robustness | | 0
Contextual Bandits with Side-Observations | | 0
Exploration Through Reward Biasing: Reward-Biased Maximum Likelihood Estimation for Stochastic Multi-Armed Bandits | | 0
Page 25 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
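The metric reported above, cumulative regret, sums the per-round gap between the best arm's mean reward and the mean reward of the arm actually pulled. A minimal sketch of the standard pseudo-regret computation follows; this is our illustrative code, not SOTAVerified's verification pipeline.

```python
def cumulative_regret(chosen_arms, true_means):
    """Cumulative pseudo-regret: sum over rounds of (best arm mean - pulled arm mean)."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)

# Example: the optimal arm is index 2; pulling arm 0 costs 0.5 per round in expectation.
print(cumulative_regret([0, 2, 2, 1], [0.2, 0.5, 0.7]))  # 0.5 + 0 + 0 + 0.2 = 0.7
```

Lower is better: an algorithm whose regret grows sublinearly in the number of rounds is, on average, converging to the optimal arm.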