SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to tasks in which a fixed amount of resources must be allocated among competing alternatives so as to maximize expected gain, when each alternative's payoff is only partially known. These problems typically involve an exploration/exploitation trade-off.
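As a minimal illustration of the exploration/exploitation trade-off, the sketch below implements the classic epsilon-greedy strategy on a Bernoulli bandit (this is an illustrative example, not drawn from any specific paper listed here; the arm means and parameter names are assumed):

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit.

    true_means: hypothetical success probability of each arm.
    Returns the learned value estimates and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best arm so far
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean update avoids storing reward history
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward
```

With a small epsilon, the agent mostly exploits its current best estimate but still samples other arms often enough to correct early mistakes.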

(Image credit: Microsoft Research)

Papers

Showing 441–450 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments | | 0 |
| Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency | | 0 |
| A Blackbox Approach to Best of Both Worlds in Bandits and Beyond | | 0 |
| Estimating Optimal Policy Value in General Linear Contextual Bandits | | 0 |
| Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits | | 0 |
| Improving Fairness in Adaptive Social Exergames via Shapley Bandits | | 0 |
| Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond | | 0 |
| Practical Contextual Bandits with Feedback Graphs | | 0 |
| Infinite Action Contextual Bandits with Reusable Data Exhaust | Code | 0 |
| Genetic multi-armed bandits: a reinforcement learning approach for discrete optimization via simulation | | 0 |
Page 45 of 127

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
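The benchmark metric above, cumulative regret, is the total expected reward lost by not always pulling the best arm. A minimal sketch of how it can be computed, assuming access to the true arm means and the sequence of arms an agent chose (both hypothetical here):

```python
def cumulative_regret(true_means, chosen_arms):
    """Sum, over all rounds, of the gap between the best arm's
    expected reward and the chosen arm's expected reward."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)
```

Lower is better: an agent that quickly identifies and sticks with the best arm accumulates regret that grows sublinearly in the number of rounds.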