
Multi-Armed Bandits

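Multi-armed bandits refer to a class of problems in which a fixed, limited amount of resources must be allocated among competing choices ("arms") so as to maximize expected gain, even though the payoff of each choice is only partially known. These problems typically involve an exploration/exploitation trade-off: exploring arms to learn more about their payoffs versus exploiting the arm that currently looks best.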
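To make the trade-off concrete, here is a minimal sketch of UCB1, one standard bandit strategy. It is not the method of any particular paper listed on this page. The sketch assumes Bernoulli (0/1) rewards, and the arm success probabilities and horizon below are hypothetical values used only to simulate the environment. Each round, UCB1 picks the arm with the highest empirical mean plus an exploration bonus. That bonus shrinks as an arm is pulled more often.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms and return pull counts per arm.

    arm_means are hypothetical success probabilities, used only to simulate rewards;
    the algorithm itself never reads them directly.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm has been pulled
    totals = [0.0] * n_arms  # summed rewards observed for each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once so each has an initial estimate
        else:
            # Exploitation term (empirical mean) plus exploration bonus
            # sqrt(2 ln t / n_a), which favours rarely tried arms.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print(pulls)  # most pulls are expected to go to the 0.8 arm
```

Early on, the bonus term dominates and every arm gets tried. As counts grow, the empirical means dominate and play concentrates on the best arm.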

(Image credit: Microsoft Research)

Papers

Showing 1126-1150 of 1262 papers

Title | Status | Hype
A KL-LUCB algorithm for Large-Scale Crowdsourcing | | 0
Online Learning via the Differential Privacy Lens | | 0
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models | | 0
Estimation Considerations in Contextual Bandits | | 0
Budget-Constrained Multi-Armed Bandits with Multiple Plays | | 0
Skyline Identification in Multi-Armed Bandits | | 0
Small-loss bounds for online learning with partial information | | 0
Multi-Player Bandits Revisited | | 0
Sparsity, variance and curvature in multi-armed bandits | | 0
Medoids in almost linear time via multi-armed bandits | Code | 0
Multi-Armed Bandits with Metric Movement Costs | | 0
Combinatorial Multi-armed Bandits for Real-Time Strategy Games | | 0
An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits | | 0
Trend Detection based Regret Minimization for Bandit Problems | | 0
Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks | | 0
Variational inference for the multi-armed contextual bandit | Code | 0
Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads | | 0
Efficient Contextual Bandits in Non-stationary Worlds | | 0
Reinforcement learning techniques for Outer Loop Link Adaptation in 4G/5G systems | | 0
Safety-Aware Algorithms for Adversarial Contextual Bandit | | 0
A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity | | 0
Nonlinear Sequential Accepts and Rejects for Identification of Top Arms in Stochastic Bandits | | 0
Efficient Reinforcement Learning via Initial Pure Exploration | | 0
Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration | | 0
Boltzmann Exploration Done Right | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
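The benchmark metric above is cumulative regret. In its generic form, this is the total shortfall in expected reward from not always playing the best arm. This page does not state how the benchmark computes or normalizes its regret values (for example, whether scores are scaled relative to a baseline). The following is therefore only a sketch of the textbook definition, not the benchmark's protocol. The arm means and the sequence of chosen arms are hypothetical.

```python
def cumulative_regret(arm_means, chosen_arms):
    """Generic (pseudo-)regret: sum over rounds of (best mean - mean of chosen arm).

    arm_means and chosen_arms are hypothetical inputs for illustration;
    this is not the normalization used by any specific benchmark.
    """
    best = max(arm_means)
    regret = 0.0
    for arm in chosen_arms:
        regret += best - arm_means[arm]  # zero whenever the best arm is chosen
    return regret


# Hypothetical 3-arm problem and a short sequence of choices.
means = [0.2, 0.5, 0.8]
print(cumulative_regret(means, [0, 1, 2, 2]))  # gaps 0.6 + 0.3 + 0 + 0, about 0.9
```

Lower cumulative regret means the policy wasted fewer rounds on suboptimal arms. A good exploration strategy keeps regret growing sublinearly in the number of rounds.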