SOTAVerified

Multi-Armed Bandits

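Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize the expected gain, when each arm's payoff is only partially known and is revealed as resources are allocated to it. Typically, these problems involve an exploration/exploitation trade-off: gathering more information about uncertain arms versus exploiting the arm that currently looks best.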
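To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy sketch. It assumes Bernoulli-reward arms with hypothetical success probabilities and a hypothetical exploration rate `epsilon`. Epsilon-greedy is only one simple strategy, chosen for illustration, and is not the method of any paper listed on this page.

```python
import random

def epsilon_greedy(arm_probs, steps, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms (hypothetical setup).

    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms         # how many times each arm was pulled
    estimates = [0.0] * n_arms    # running mean reward of each arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pull a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the highest estimated mean so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Bernoulli reward: 1 with probability arm_probs[arm], else 0.
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward

counts, total = epsilon_greedy([0.2, 0.5, 0.8], steps=10_000)
print(counts, total)
```

A larger `epsilon` spends more pulls exploring and lowers the risk of locking onto a suboptimal arm early; a smaller one exploits more but learns the arm values more slowly.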

(Image credit: Microsoft Research)

Papers

Showing 1151–1200 of 1262 papers

Title | Status | Hype
Finding All ε-Good Arms in Stochastic Bandits | Code | 0
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback | Code | 0
Let's Get It Started: Fostering the Discoverability of New Releases on Deezer | Code | 0
Ranking In Generalized Linear Bandits | Code | 0
Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0
Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits | Code | 0
Online Limited Memory Neural-Linear Bandits with Likelihood Matching | Code | 0
Online Matching: A Real-time Bandit System for Large-scale Recommendations | Code | 0
Thompson Sampling for Contextual Bandits with Linear Payoffs | Code | 0
Semiparametric Contextual Bandits | Code | 0
Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design | Code | 0
Active Feature Selection for the Mutual Information Criterion | Code | 0
Corralling a Band of Bandit Algorithms | Code | 0
Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward | Code | 0
Correlated Multi-armed Bandits with a Latent Random Source | Code | 0
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context | Code | 0
Linear Contextual Bandits with Hybrid Payoff: Revisited | Code | 0
Persistency of Excitation for Robustness of Neural Networks | Code | 0
Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms | Code | 0
Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits | Code | 0
A Convex Framework for Confounding Robust Inference | Code | 0
From Restless to Contextual: A Thresholding Bandit Approach to Improve Finite-horizon Performance | Code | 0
From Theory to Practice with RAVEN-UCB: Addressing Non-Stationarity in Multi-Armed Bandits through Variance Adaptation | Code | 0
Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits | Code | 0
Networked Restless Bandits with Positive Externalities | Code | 0
Locally Differentially Private (Contextual) Bandits Learning | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Locally Private Nonparametric Contextual Multi-armed Bandits | Code | 0
Decentralized Cooperative Stochastic Bandits | Code | 0
Gaussian Gated Linear Networks | Code | 0
Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions | Code | 0
(Almost) Free Incentivized Exploration from Decentralized Learning Agents | Code | 0
Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery | Code | 0
MABSplit: Faster Forest Training Using Multi-Armed Bandits | Code | 0
Risk-Aware Continuous Control with Neural Contextual Bandits | Code | 0
Thompson Sampling for Linearly Constrained Bandits | Code | 0
Bayesian Optimisation over Multiple Continuous and Categorical Inputs | Code | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits | Code | 0
Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints | Code | 0
Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning | Code | 0
Bayesian Design Principles for Frequentist Sequential Learning | Code | 0
On Private Online Convex Optimization: Optimal Algorithms in ℓ_p-Geometry and High Dimensional Contextual Bandits | Code | 0
Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing | Code | 0
Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity | Code | 0
Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0
Sequential Learning of the Pareto Front for Multi-objective Bandits | Code | 0
Medoids in almost linear time via multi-armed bandits | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
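The benchmark reports cumulative regret. One common definition (pseudo-regret) sums, over rounds t = 1..T, the gap between the best arm's mean reward μ* and the mean reward of the arm actually chosen: R_T = Σ_t (μ* − μ_{a_t}). The sketch below assumes the arm means are known (hypothetical values, as in a simulation). The page does not state how the reported values were computed or normalized, so this only illustrates the general metric.

```python
def cumulative_regret(arm_means, chosen_arms):
    """Running pseudo-regret: for each round, add (best mean - mean of the chosen arm).

    arm_means: true mean reward of each arm (assumed known, e.g. in a simulation).
    chosen_arms: index of the arm pulled at each round.
    Returns the cumulative regret after each round.
    """
    best = max(arm_means)
    regret = 0.0
    history = []
    for arm in chosen_arms:
        regret += best - arm_means[arm]  # per-round gap to the optimal arm
        history.append(regret)
    return history

# Hypothetical run: two suboptimal pulls, then the best arm twice.
print(cumulative_regret([0.2, 0.5, 0.8], [0, 1, 2, 2]))
```

Regret stops growing once the policy consistently pulls the best arm, so under this definition a lower final value indicates a better policy.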