SOTAVerified

Multi-Armed Bandits

Multi-armed bandits are a class of sequential decision problems in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain, when each choice's properties are only partially known at allocation time. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
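The exploration/exploitation trade-off described above can be illustrated with a minimal sketch. The function below is a hypothetical example (not from any listed paper): an epsilon-greedy agent on a Bernoulli bandit, which explores a random arm with probability `epsilon` and otherwise exploits the arm with the best estimated mean.

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit.

    Returns the per-arm estimated means and the total collected reward.
    `true_means` are the (unknown to the agent) success probabilities.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, total

est, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With a small `epsilon`, the agent concentrates most pulls on the best arm while still sampling the others often enough to keep its estimates honest.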

Papers

Showing 1176–1200 of 1262 papers

| Title | Status | Hype |
|---|---|---|
| Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits | Code | 0 |
| Networked Restless Bandits with Positive Externalities | Code | 0 |
| Locally Differentially Private (Contextual) Bandits Learning | Code | 0 |
| RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0 |
| Locally Private Nonparametric Contextual Multi-armed Bandits | Code | 0 |
| Decentralized Cooperative Stochastic Bandits | Code | 0 |
| Gaussian Gated Linear Networks | Code | 0 |
| Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions | Code | 0 |
| (Almost) Free Incentivized Exploration from Decentralized Learning Agents | Code | 0 |
| Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery | Code | 0 |
| MABSplit: Faster Forest Training Using Multi-Armed Bandits | Code | 0 |
| Risk-Aware Continuous Control with Neural Contextual Bandits | Code | 0 |
| Thompson Sampling for Linearly Constrained Bandits | Code | 0 |
| Bayesian Optimisation over Multiple Continuous and Categorical Inputs | Code | 0 |
| Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0 |
| Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits | Code | 0 |
| Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints | Code | 0 |
| Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning | Code | 0 |
| Bayesian Design Principles for Frequentist Sequential Learning | Code | 0 |
| On Private Online Convex Optimization: Optimal Algorithms in ℓ_p-Geometry and High Dimensional Contextual Bandits | Code | 0 |
| Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing | Code | 0 |
| Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity | Code | 0 |
| Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0 |
| Sequential Learning of the Pareto Front for Multi-objective Bandits | Code | 0 |
| Medoids in almost linear time via multi-armed bandits | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
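The cumulative-regret metric above measures the total expected reward lost to not always playing the best arm. As a hedged, self-contained illustration (a toy Beta-Bernoulli model, not the neural-linear posteriors benchmarked above), the sketch below runs Thompson sampling and accumulates pseudo-regret; all names and parameters are illustrative.

```python
import random

def thompson_bernoulli(true_means, steps=5000, seed=1):
    """Beta-Bernoulli Thompson sampling; returns cumulative pseudo-regret.

    Each arm keeps a Beta(alpha, beta) posterior over its success
    probability; on every step one posterior sample per arm is drawn
    and the arm with the largest sample is played.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta(1, 1) uniform priors
    beta = [1.0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(steps):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        alpha[arm] += reward        # posterior update on success
        beta[arm] += 1.0 - reward   # posterior update on failure
        regret += best - true_means[arm]  # expected loss vs. best arm
    return regret

r = thompson_bernoulli([0.1, 0.5, 0.9])
```

Because Thompson sampling plays suboptimal arms only while their posteriors remain plausible, cumulative regret grows sub-linearly, far below the linear regret of uniform random play.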