SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed budget of resources must be allocated among competing alternatives ("arms") so as to maximize expected gain, when each arm's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off.
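The exploration/exploitation trade-off can be illustrated with a minimal sketch of the classic epsilon-greedy strategy on a Bernoulli bandit (the arm means, step count, and epsilon below are illustrative values, not from any paper on this page):

```python
import random

def epsilon_greedy(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: explore a random arm with
    probability epsilon, otherwise exploit the best empirical arm."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: values[a])      # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
        total_reward += reward
    return values, counts, total_reward
```

With enough steps, the arm with the highest true mean accumulates the most pulls, while the epsilon fraction of random pulls keeps estimates of the other arms from going stale.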

(Image credit: Microsoft Research)

Papers

Showing 741–750 of 1262 papers

Title | Status | Hype
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon | | 0
Joint Representation Training in Sequential Tasks with Shared Structure | | 0
Kernel-based Multi-Task Contextual Bandits in Cellular Network Configuration | | 0
Kernel ε-Greedy for Multi-Armed Bandits with Covariates | | 0
Kernel Methods for Cooperative Multi-Agent Contextual Bandits | | 0
KL-regularization Itself is Differentially Private in Bandits and RLHF | | 0
Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits | | 0
Lagrangian Index Policy for Restless Bandits with Average Reward | | 0
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | | 0
Latent Contextual Bandits and their Application to Personalized Recommendations for New Users | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
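Cumulative regret, the metric in the table above, is the total gap between the reward that would have been earned by always playing the best arm and the reward the algorithm actually earned. A minimal sketch of the standard definition (the arm means and play sequence below are illustrative, not from the benchmark):

```python
def cumulative_regret(true_means, arms_played):
    """Sum over steps of (best arm's mean) minus (played arm's mean)."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in arms_played)

# e.g. two arms with means 0.2 and 0.8; playing [0, 1, 1]
# incurs one suboptimal pull worth 0.6 regret.
```

Lower is better: an algorithm whose regret grows sublinearly in the number of steps is learning to favor the best arm.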