SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed budget of resources among competing choices so as to maximize expected gain, when each choice's reward distribution is only partially known. These problems typically involve an exploration/exploitation trade-off: the learner must balance gathering information about uncertain arms against repeatedly playing the arm that currently looks best.
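As a concrete illustration of the exploration/exploitation trade-off, here is a minimal epsilon-greedy bandit sketch (not taken from any paper on this page; the Bernoulli arm means are made up for the example). With probability epsilon the learner explores a random arm; otherwise it exploits the arm with the highest estimated mean reward.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, steps=1000, seed=0):
    """Minimal epsilon-greedy bandit on Bernoulli arms.

    arm_means: true success probability of each arm (unknown to the learner).
    Returns total reward collected and the pull count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward estimate per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        # incremental update of the running mean for the chosen arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts

total, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

Over enough steps the estimate for the best arm rises above the others, so exploitation concentrates pulls on it while the epsilon fraction of steps keeps sampling the rest.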

( Image credit: Microsoft Research )

Papers

Showing 591-600 of 1262 papers

Title | Status | Hype
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences | - | 0
Off-Policy Evaluation for Large Action Spaces via Embeddings | Code | 2
Shuffle Private Linear Contextual Bandits | - | 0
Efficient Kernel UCB for Contextual Bandits | Code | 0
Remote Contextual Bandits | - | 0
Settling the Communication Complexity for Distributed Offline Reinforcement Learning | - | 0
Smoothed Online Learning is as Easy as Statistical Learning | - | 0
Budgeted Combinatorial Multi-Armed Bandits | - | 0
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | - | 0
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model | Code | 2
Page 60 of 127

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | - | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | - | Unverified
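The benchmark metric above, cumulative regret, is the expected shortfall of the pulled arms relative to always playing the best arm. A minimal sketch of how it is computed (the arm means and pull sequence here are illustrative, not from the benchmark):

```python
def cumulative_regret(arm_means, pulls):
    """Cumulative pseudo-regret of a sequence of arm pulls.

    arm_means: true expected reward of each arm.
    pulls: sequence of arm indices chosen by the policy.
    """
    best = max(arm_means)
    return sum(best - arm_means[a] for a in pulls)

# Three pulls of a suboptimal arm (mean 0.2) and one of the best arm (mean 0.8):
# each suboptimal pull costs 0.6 in expectation, so the regret is 1.8.
regret = cumulative_regret([0.2, 0.5, 0.8], [0, 0, 0, 2])
```

Lower is better, which is why the leaderboard ranks the 1.82 entry above 1.92.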