SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed amount of resources must be allocated among competing alternatives (the "arms") so as to maximize expected gain, when each alternative's reward properties are only partially known. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
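The exploration/exploitation trade-off described above can be sketched with a minimal epsilon-greedy strategy. This is an illustrative, self-contained example, not taken from any paper listed below; the arm reward probabilities and parameter values are assumptions chosen for demonstration.

```python
import random

def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy policy.

    true_means: hypothetical success probability of each arm (unknown
    to the agent, used only to simulate rewards).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: pull a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulate a Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward
```

With enough pulls, the arm with the highest true mean accumulates most of the budget, while the epsilon fraction of random pulls keeps the estimates of the other arms from going stale.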

Papers

Showing 151–175 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Bandit Regret Scaling with the Effective Loss Range | | 0 |
| Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits | | 0 |
| Bandits for Learning to Explain from Explanations | | 0 |
| Bandits meet Computer Architecture: Designing a Smartly-allocated Cache | | 0 |
| Bandit Social Learning: Exploration under Myopic Behavior | | 0 |
| Bandits Warm-up Cold Recommender Systems | | 0 |
| Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms | | 0 |
| Bandits with Knapsacks beyond the Worst Case | | 0 |
| Bandits with Partially Observable Confounded Data | | 0 |
| Bandits with Temporal Stochastic Constraints | | 0 |
| Banker Online Mirror Descent | | 0 |
| Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning | | 0 |
| Batched Bandits with Crowd Externalities | | 0 |
| Batched Coarse Ranking in Multi-Armed Bandits | | 0 |
| Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits | | 0 |
| Regret Bounds for Batched Bandits | | 0 |
| Batched Nonparametric Bandits via k-Nearest Neighbor UCB | | 0 |
| A Gang of Bandits | | 0 |
| Batched Online Contextual Sparse Bandits with Sequential Inclusion of Features | | 0 |
| Batched Thompson Sampling | | 0 |
| Batched Thompson Sampling for Multi-Armed Bandits | | 0 |
| Batch Ensemble for Variance Dependent Regret in Stochastic Bandits | | 0 |
| Towards Bayesian Data Selection | | 0 |
| Balanced off-policy evaluation in general action spaces | | 0 |
Page 7 of 51

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |