
Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed, limited amount of resources must be allocated among competing choices in a way that maximizes expected gain, even though each choice's reward properties are only partially known. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
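To illustrate the exploration/exploitation trade-off mentioned above, here is a minimal epsilon-greedy bandit sketch. The per-arm reward probabilities, exploration rate, and horizon are illustrative assumptions, not values taken from any paper or benchmark listed on this page.

```python
import random

# Minimal epsilon-greedy bandit: with probability EPSILON pick a random arm
# (explore), otherwise pick the arm with the best empirical mean (exploit).
# The per-arm success probabilities and EPSILON below are illustrative
# assumptions only.
TRUE_MEANS = [0.2, 0.5, 0.75]   # hypothetical Bernoulli reward probabilities
EPSILON = 0.1                   # assumed exploration rate
HORIZON = 10_000                # number of rounds

counts = [0] * len(TRUE_MEANS)    # pulls per arm
values = [0.0] * len(TRUE_MEANS)  # empirical mean reward per arm

for _ in range(HORIZON):
    if random.random() < EPSILON:
        arm = random.randrange(len(TRUE_MEANS))                     # explore
    else:
        arm = max(range(len(TRUE_MEANS)), key=lambda a: values[a])  # exploit
    reward = 1.0 if random.random() < TRUE_MEANS[arm] else 0.0      # Bernoulli draw
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]             # running mean

print("pulls per arm:", counts)
print("estimated means:", [round(v, 3) for v in values])
```

With enough rounds, the empirical means approach the true arm probabilities while most pulls concentrate on the best arm; lowering EPSILON reduces exploration cost but slows discovery of better arms.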

Papers

Showing 151–175 of 1262 papers

Title | Status | Hype
Individual Regret in Cooperative Stochastic Multi-Armed Bandits | | 0
Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits | Code | 0
Multi-armed Bandits with Missing Outcome | Code | 0
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF | | 0
Structure Matters: Dynamic Policy Gradient | | 0
Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset | | 0
Rising Rested Bandits: Lower Bounds and Efficient Algorithms | | 0
PageRank Bandits for Link Prediction | Code | 0
MBExplainer: Multilevel bandit-based explanations for downstream models with augmented graph embeddings | | 0
Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0
FedMABA: Towards Fair Federated Learning through Multi-Armed Bandits Allocation | | 0
Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints | | 0
Optimal Streaming Algorithms for Multi-Armed Bandits | | 0
Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits | | 0
Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? | | 0
Contextual Bandits with Arm Request Costs and Delays | | 0
Online Learning for Function Placement in Serverless Computing | Code | 0
How Does Variance Shape the Regret in Contextual Bandits? | | 0
Comparative Performance of Collaborative Bandit Algorithms: Effect of Sparsity and Exploration Intensity | | 0
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing | | 0
Contextual Bandits with Non-Stationary Correlated Rewards for User Association in MmWave Vehicular Networks | | 0
EVOLvE: Evaluating and Optimizing LLMs For Exploration | | 0
Diminishing Exploration: A Minimalist Approach to Piecewise Stationary Multi-Armed Bandits | | 0
Stochastic Bandits for Egalitarian Assignment | | 0
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
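For context on the metric above: cumulative regret is conventionally the sum, over rounds, of the gap between the expected reward of the best arm and that of the arm actually played. The sketch below uses hypothetical arm means and a hypothetical pull sequence purely to illustrate that computation; it does not reproduce this benchmark's evaluation protocol.

```python
# Cumulative regret = sum over rounds of (best expected reward minus the
# expected reward of the arm actually pulled). All values are illustrative.
true_means = [0.2, 0.5, 0.75]        # assumed per-arm expected rewards
chosen_arms = [0, 1, 2, 2, 1, 2, 2]  # hypothetical sequence of pulls

best = max(true_means)
cumulative_regret = sum(best - true_means[arm] for arm in chosen_arms)
print("cumulative regret:", round(cumulative_regret, 2))
```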