
Multi-Armed Bandits

The multi-armed bandit problem is the task of allocating a fixed, limited amount of resources among competing choices (arms) so as to maximize expected gain, when each choice's reward distribution is only partially known. These problems typically involve an exploration/exploitation trade-off: pulling arms that look best so far versus sampling less-known arms that might turn out better.

(Image credit: Microsoft Research)
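The trade-off described above can be made concrete with a minimal sketch: an epsilon-greedy agent on a Bernoulli bandit, which explores a random arm with probability epsilon and otherwise exploits the arm with the best empirical mean. All names and parameter values here are illustrative and not drawn from any of the papers listed below.

```python
# Minimal epsilon-greedy sketch for a Bernoulli multi-armed bandit.
# Illustrative only; parameters (epsilon, n_steps, arm means) are assumptions.
import random

def epsilon_greedy(true_means, n_steps=10_000, epsilon=0.1, seed=0):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total_reward += reward

    return estimates, total_reward

if __name__ == "__main__":
    est, reward = epsilon_greedy([0.3, 0.5, 0.7])
    print("estimated means:", [round(e, 3) for e in est])
    print("total reward:", reward)
```

With epsilon = 0, the agent can lock onto a suboptimal arm forever; with epsilon = 1, it never exploits what it has learned. Methods in the papers below (Thompson sampling, UCB, Boltzmann exploration) replace this fixed random exploration with more principled schemes.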

Papers

Showing 1101–1150 of 1262 papers

Title | Status | Hype
PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits | - | 0
Delegating via Quitting Games | - | 0
Combining Difficulty Ranking with Multi-Armed Bandits to Sequence Educational Content | - | 0
Best arm identification in multi-armed bandits with delayed feedback | - | 0
What Doubling Tricks Can and Can't Do for Multi-Armed Bandits | - | 0
Semiparametric Contextual Bandits | Code | 0
Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback | - | 0
Online learning over a finite action set with limited switching | - | 0
Practical Contextual Bandits with Regression Oracles | - | 0
The K-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates | - | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0
Contextual Bandits with Stochastic Experts | Code | 0
Regional Multi-Armed Bandits | - | 0
Online Learning with an Unknown Fairness Metric | - | 0
Policy Gradients for Contextual Recommendations | - | 0
Multi-Armed Bandits on Partially Revealed Unit Interval Graphs | - | 0
More Robust Doubly Robust Off-policy Evaluation | - | 0
Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits | - | 0
Nonparametric Stochastic Contextual Bandits | - | 0
Residual Loss Prediction: Reinforcement Learning With No Incremental Feedback | Code | 0
Contextual memory bandit for pro-active dialog engagement | - | 0
Learning Structural Weight Uncertainty for Sequential Decision-Making | Code | 0
Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling | - | 0
Stochastic Multi-armed Bandits in Constant Space | - | 0
Gaussian Process bandits with adaptive discretization | - | 0
A KL-LUCB algorithm for Large-Scale Crowdsourcing | - | 0
Online Learning via the Differential Privacy Lens | - | 0
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models | - | 0
Estimation Considerations in Contextual Bandits | - | 0
Budget-Constrained Multi-Armed Bandits with Multiple Plays | - | 0
Skyline Identification in Multi-Armed Bandits | - | 0
Small-loss bounds for online learning with partial information | - | 0
Multi-Player Bandits Revisited | - | 0
Sparsity, variance and curvature in multi-armed bandits | - | 0
Medoids in almost linear time via multi-armed bandits | Code | 0
Multi-Armed Bandits with Metric Movement Costs | - | 0
Combinatorial Multi-armed Bandits for Real-Time Strategy Games | - | 0
An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits | - | 0
Trend Detection based Regret Minimization for Bandit Problems | - | 0
Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks | - | 0
Variational inference for the multi-armed contextual bandit | Code | 0
Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads | - | 0
Efficient Contextual Bandits in Non-stationary Worlds | - | 0
Reinforcement learning techniques for Outer Loop Link Adaptation in 4G/5G systems | - | 0
Safety-Aware Algorithms for Adversarial Contextual Bandit | - | 0
A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity | - | 0
Nonlinear Sequential Accepts and Rejects for Identification of Top Arms in Stochastic Bandits | - | 0
Efficient Reinforcement Learning via Initial Pure Exploration | - | 0
Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration | - | 0
Boltzmann Exploration Done Right | - | 0
Page 23 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | - | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | - | Unverified
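For readers unfamiliar with the benchmark metric, the sketch below shows how cumulative regret is conventionally computed: the summed gap between the best arm's expected reward and the expected reward of each arm actually chosen. The setup is illustrative only; this page does not specify the benchmark's exact environment or normalization.

```python
# Hedged sketch of cumulative regret, the metric in the table above.
# The arm means and chosen-arm sequence are made-up examples.
def cumulative_regret(true_means, chosen_arms):
    """Sum of per-step gaps between the optimal arm's mean and each chosen arm's mean."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)

# Always pulling arm 0 of [0.3, 0.5, 0.7] for 100 steps:
print(cumulative_regret([0.3, 0.5, 0.7], [0] * 100))  # 40.0
```

Lower is better: an agent that quickly identifies and exploits the best arm accumulates regret slowly, which is why cumulative regret is the standard yardstick for the Thompson sampling variants compared above.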