SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a sequential decision-making task in which a fixed budget of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's payoff is only partially known at allocation time. These problems typically involve an exploration/exploitation trade-off: gathering more information about uncertain arms versus playing the arm that currently looks best.
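As a minimal illustration of the exploration/exploitation trade-off, the sketch below implements epsilon-greedy on a Bernoulli bandit (a standard baseline strategy, not a method from any specific paper listed here; the arm probabilities are hypothetical):

```python
import random

def run_epsilon_greedy(true_means, epsilon=0.1, steps=10000, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit.

    true_means: hypothetical per-arm reward probabilities (illustrative).
    With probability `epsilon` the agent explores a uniformly random arm;
    otherwise it exploits the arm with the highest empirical mean so far.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean update: avoids storing per-arm reward histories
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, counts, total_reward
```

With enough steps the best arm accumulates most of the pulls, while the epsilon fraction of random plays keeps the estimates of the other arms from going stale.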

(Image credit: Microsoft Research)

Papers

Showing 1191–1200 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits | Code | 0 |
| Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints | Code | 0 |
| Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning | Code | 0 |
| Bayesian Design Principles for Frequentist Sequential Learning | Code | 0 |
| On Private Online Convex Optimization: Optimal Algorithms in ℓ_p-Geometry and High Dimensional Contextual Bandits | Code | 0 |
| Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing | Code | 0 |
| Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity | Code | 0 |
| Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0 |
| Sequential Learning of the Pareto Front for Multi-objective Bandits | Code | 0 |
| Medoids in almost linear time via multi-armed bandits | Code | 0 |
Page 120 of 127

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |