
Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed budget of resources must be allocated among competing alternatives (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms to learn their payoffs against repeatedly playing the arm that currently looks best.

(Image credit: Microsoft Research)
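The exploration/exploitation trade-off described above can be sketched with an epsilon-greedy policy on Bernoulli arms. This is a generic illustration, not code from any paper listed below; the arm means and parameters are made up for the example.

```python
import random

def epsilon_greedy_bandit(true_means, n_rounds=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy sketch: with probability epsilon explore a random arm,
    otherwise exploit the arm with the highest running mean-reward estimate."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                               # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])      # exploit
        # Bernoulli reward drawn from the (unknown to the learner) arm mean
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward
```

With enough rounds the best arm accumulates most of the pulls, while the epsilon fraction of random pulls keeps the estimates of the other arms from going stale.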

Papers

Showing 1076–1100 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization | | 0 |
| A KL-LUCB algorithm for Large-Scale Crowdsourcing | | 0 |
| Algorithms for Differentially Private Multi-Armed Bandits | | 0 |
| Algorithms for multi-armed bandit problems | | 0 |
| Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits | | 0 |
| Almost Boltzmann Exploration | | 0 |
| Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits | | 0 |
| A Model Selection Approach for Corruption Robust Reinforcement Learning | | 0 |
| An Adaptive Method for Contextual Stochastic Multi-armed Bandits with Rewards Generated by a Linear Dynamical System | | 0 |
| Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0 |
| An Analysis of Reinforcement Learning for Malaria Control | | 0 |
| An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits | | 0 |
| A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits | | 0 |
| An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives | | 0 |
| An Efficient Algorithm for Deep Stochastic Contextual Bandits | | 0 |
| An Empirical Evaluation of Federated Contextual Bandit Algorithms | | 0 |
| An Empirical Evaluation of Thompson Sampling | | 0 |
| A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free | | 0 |
| A New Benchmark for Online Learning with Budget-Balancing Constraints | | 0 |
| An Exploration-free Method for a Linear Stochastic Bandit Driven by a Linear Gaussian Dynamical System | | 0 |
| An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits | | 0 |
| An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit | | 0 |
| An Instrumental Value for Data Production and its Application to Data Pricing | | 0 |
| An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays | | 0 |
| Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
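Cumulative regret, the metric reported above, is the gap between the reward of always playing the best arm and the reward the policy actually earned. A minimal sketch of the standard pseudo-regret computation (the arm means and play sequence here are illustrative, not from the benchmark):

```python
def cumulative_regret(true_means, arms_played):
    """Cumulative pseudo-regret: sum over rounds of
    (mean of the best arm) - (mean of the arm actually played)."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in arms_played)
```

A policy that always plays the best arm has zero regret; every suboptimal pull adds the gap between the best mean and the played arm's mean.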