
Multi-Armed Bandits

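Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying less-known arms to learn their payoffs against pulling the arm that currently looks best.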

(Image credit: Microsoft Research)
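To make the exploration/exploitation trade-off concrete, here is a minimal Python sketch of UCB1, one classic bandit algorithm. It is not taken from any of the papers listed on this page. The Bernoulli reward model, the arm means `[0.2, 0.5, 0.8]`, the horizon, and the function name `ucb1` are all illustrative assumptions. Each round adds an exploration bonus to every arm's empirical mean, so rarely pulled arms still get tried. The sketch also tracks a simple pseudo-regret (the expected reward lost by not always pulling the best arm). This may differ from how a given benchmark defines "cumulative regret".

```python
import math
import random


def ucb1(true_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit (illustrative setup).

    Returns per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k   # how many times each arm has been pulled
    sums = [0.0] * k   # total observed reward per arm
    best_mean = max(true_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Exploitation term (empirical mean) plus an exploration bonus
            # that shrinks as an arm accumulates pulls.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Simulated Bernoulli reward (assumption for this sketch).
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: expected loss versus always pulling the best arm.
        regret += best_mean - true_means[arm]

    return counts, regret


counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, round(regret, 1))
```

Over a run like this, most pulls should concentrate on the best arm. The regret should grow far more slowly than it would under uniformly random play.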

Papers

Showing 176–200 of 1262 papers

Title | Status | Hype
High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions | - | 0
Minimax-optimal trust-aware multi-armed bandits | - | 0
Online Posterior Sampling with a Diffusion Prior | - | 0
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs | - | 0
On Lai's Upper Confidence Bound in Multi-Armed Bandits | - | 0
Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits | - | 0
Stabilizing the Kumaraswamy Distribution | - | 0
Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits | - | 0
Linear Contextual Bandits with Interference | - | 0
Second Order Bounds for Contextual Bandits with Function Approximation | - | 0
Designing an Interpretable Interface for Contextual Bandits | - | 0
Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System | - | 0
Partially Observable Contextual Bandits with Linear Payoffs | - | 0
Batch Ensemble for Variance Dependent Regret in Stochastic Bandits | - | 0
Batched Online Contextual Sparse Bandits with Sequential Inclusion of Features | - | 0
A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization | - | 0
Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis | - | 0
Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | - | 0
Faster Q-Learning Algorithms for Restless Bandits | - | 0
Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design | Code | 0
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | - | 0
Representative Arm Identification: A fixed confidence approach to identify cluster representatives | - | 0
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications | - | 0
Online Fair Division with Contextual Bandits | - | 0
Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | - | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | - | Unverified