SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing alternatives (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance pulling the arm that currently looks best against sampling other arms to improve its estimates.
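The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch (one of the simplest bandit strategies; the Bernoulli arm probabilities and hyperparameters below are illustrative assumptions, not from any paper on this page):

```python
import random

def epsilon_greedy(arm_probs, steps=10_000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms.

    With probability `epsilon` a random arm is explored; otherwise the arm
    with the highest running mean reward is exploited. Returns per-arm pull
    counts and reward estimates.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # incremental update of the sample mean
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = epsilon_greedy([0.2, 0.5, 0.8])
```

After enough rounds the best arm (probability 0.8 here) accumulates the most pulls, while the fixed epsilon keeps a small stream of exploratory pulls going to the other arms.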

(Image credit: Microsoft Research)

Papers

Showing 1101–1110 of 1262 papers

Title | Status | Hype
Multi-Armed Bandits with Correlated Arms | Code | 0
Jump Starting Bandits with LLM-Generated Prior Knowledge | Code | 0
Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes | Code | 0
Kernel Conditional Moment Constraints for Confounding Robust Inference | Code | 0
Q-Learning Lagrange Policies for Multi-Action Restless Bandits | Code | 0
Constrained regret minimization for multi-criterion multi-armed bandits | Code | 0
Top-k eXtreme Contextual Bandits with Arm Hierarchy | Code | 0
Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0
Generalized Linear Bandits with Limited Adaptivity | Code | 0
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0
Page 111 of 127

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
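The metric reported above, cumulative regret, is typically computed as the sum over rounds of the gap between the best arm's mean reward and the mean reward of the arm actually chosen. A minimal sketch, assuming known (Bernoulli) arm means and a recorded sequence of chosen arm indices:

```python
def cumulative_regret(arm_means, chosen_arms):
    """Cumulative (pseudo-)regret: sum over rounds of
    (best arm's mean reward) - (chosen arm's mean reward)."""
    best = max(arm_means)
    return sum(best - arm_means[arm] for arm in chosen_arms)

# Pulling the suboptimal arms once each, then the best arm:
# gaps are 0.6, 0.3, and 0.0, so total regret is ~0.9.
total = cumulative_regret([0.2, 0.5, 0.8], [0, 1, 2])
```

A policy that always plays the best arm incurs zero regret, so lower values in the table indicate better policies.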