SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed budget of resources must be allocated among competing choices so as to maximize expected gain, when each choice's payoff is only partially known. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain arms versus playing the arm that currently looks best.
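The exploration/exploitation trade-off described above is commonly illustrated with an epsilon-greedy agent: with probability epsilon it explores a random arm, otherwise it exploits the arm with the best estimated reward. The sketch below is a minimal, self-contained simulation on a Bernoulli bandit; the arm means and parameters are illustrative assumptions, not taken from any paper listed on this page.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    true_means: hypothetical per-arm success probabilities (illustrative only).
    Returns the estimated arm values, pull counts, and total collected reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # number of pulls per arm
    values = [0.0] * n_arms     # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, counts, total_reward
```

With enough steps the agent concentrates its pulls on the highest-mean arm while still spending roughly an epsilon fraction of rounds exploring.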

(Image credit: Microsoft Research)

Papers

Showing 101–110 of 1262 papers

| Title | Status | Hype |
| Near-Optimal Private Learning in Linear Contextual Bandits | | 0 |
| Model selection for behavioral learning data and applications to contextual bandits | | 0 |
| Contextual Linear Bandits with Delay as Payoff | | 0 |
| Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing | | 0 |
| Contextual bandits with entropy-based human feedback | Code | 0 |
| Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits | | 0 |
| Heterogeneous Multi-agent Multi-armed Bandits on Stochastic Block Models | | 0 |
| Quantile Multi-Armed Bandits with 1-bit Feedback | | 0 |
| Towards a Sharp Analysis of Offline Policy Learning for f-Divergence-Regularized Contextual Bandits | | 0 |
| Nearly Tight Bounds for Cross-Learning Contextual Bandits with Graphical Feedback | | 0 |
Page 11 of 127

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
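The benchmark metric above, cumulative regret, measures how much expected reward an agent loses by not always playing the best arm: the sum over rounds of the gap between the best arm's mean and the mean of the arm actually played. A minimal sketch with made-up numbers (not the benchmark's values):

```python
def cumulative_regret(played_means, best_mean):
    """Expected cumulative regret: per-round gap to the optimal arm, summed.

    played_means: true mean reward of the arm the agent chose in each round
                  (illustrative values, not from the benchmark table).
    best_mean: true mean reward of the optimal arm.
    """
    return sum(best_mean - m for m in played_means)

# Over 4 rounds the agent played arms with means 0.5, 0.7, 0.9, 0.9
# while the best arm had mean 0.9, giving regret of about 0.6.
regret = cumulative_regret([0.5, 0.7, 0.9, 0.9], best_mean=0.9)
```

Lower is better, which is why the table ranks the 1.82 entry below the 1.92 one; sublinear growth of this quantity in the number of rounds is the usual success criterion for bandit algorithms.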