SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to the problem of allocating a fixed amount of resources among competing alternatives so as to maximize expected gain, when each alternative's payoff is only partially known at the outset and is learned through repeated play. These problems typically involve an exploration/exploitation trade-off: choosing between gathering more information about uncertain options and committing to the option that currently looks best.
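As a minimal illustration of the exploration/exploitation trade-off, here is an epsilon-greedy agent on a Bernoulli bandit. This is a sketch, not any particular paper's method; the arm probabilities in `true_means` are hypothetical and would be unknown to the agent in practice.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=10_000, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit.

    true_means: hypothetical per-arm reward probabilities (hidden from the agent).
    Returns the agent's empirical mean-reward estimates and pull counts per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # times each arm has been pulled
    estimates = [0.0] * k   # running mean reward per arm

    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                          # explore: random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a]) # exploit: best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts
```

Over a long enough horizon, the agent concentrates its pulls on the arm with the highest true mean while occasional exploration keeps the other estimates from going stale.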

(Image credit: Microsoft Research)

Papers

Showing 151–200 of 1,262 papers

Title | Status | Hype
Individual Regret in Cooperative Stochastic Multi-Armed Bandits | — | 0
Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits | Code | 0
Multi-armed Bandits with Missing Outcome | Code | 0
Structure Matters: Dynamic Policy Gradient | — | 0
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF | — | 0
Rising Rested Bandits: Lower Bounds and Efficient Algorithms | — | 0
Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset | — | 0
PageRank Bandits for Link Prediction | Code | 0
MBExplainer: Multilevel bandit-based explanations for downstream models with augmented graph embeddings | — | 0
Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0
FedMABA: Towards Fair Federated Learning through Multi-Armed Bandits Allocation | — | 0
Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints | — | 0
Optimal Streaming Algorithms for Multi-Armed Bandits | — | 0
Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits | — | 0
Online Learning for Function Placement in Serverless Computing | Code | 0
Contextual Bandits with Arm Request Costs and Delays | — | 0
Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? | — | 0
How Does Variance Shape the Regret in Contextual Bandits? | — | 0
Comparative Performance of Collaborative Bandit Algorithms: Effect of Sparsity and Exploration Intensity | — | 0
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing | — | 0
Contextual Bandits with Non-Stationary Correlated Rewards for User Association in MmWave Vehicular Networks | — | 0
Stochastic Bandits for Egalitarian Assignment | — | 0
Diminishing Exploration: A Minimalist Approach to Piecewise Stationary Multi-Armed Bandits | — | 0
EVOLvE: Evaluating and Optimizing LLMs For Exploration | — | 0
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback | — | 0
High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions | — | 0
Minimax-optimal trust-aware multi-armed bandits | — | 0
Online Posterior Sampling with a Diffusion Prior | — | 0
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs | — | 0
On Lai's Upper Confidence Bound in Multi-Armed Bandits | — | 0
Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits | — | 0
Stabilizing the Kumaraswamy Distribution | — | 0
Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits | — | 0
Linear Contextual Bandits with Interference | — | 0
Second Order Bounds for Contextual Bandits with Function Approximation | — | 0
Designing an Interpretable Interface for Contextual Bandits | — | 0
Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System | — | 0
Partially Observable Contextual Bandits with Linear Payoffs | — | 0
Batch Ensemble for Variance Dependent Regret in Stochastic Bandits | — | 0
Batched Online Contextual Sparse Bandits with Sequential Inclusion of Features | — | 0
A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization | — | 0
Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis | — | 0
Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | — | 0
Faster Q-Learning Algorithms for Restless Bandits | — | 0
Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design | Code | 0
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | — | 0
Representative Arm Identification: A fixed confidence approach to identify cluster representatives | — | 0
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications | — | 0
Online Fair Division with Contextual Bandits | — | 0
Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce | — | 0
Page 4 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified