SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a family of problems in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
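The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy sketch on a Bernoulli bandit. This is a hypothetical example, not the method of any listed paper; the arm means, `epsilon`, and `horizon` are illustrative assumptions.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds with epsilon-greedy selection.

    With probability epsilon an arm is chosen uniformly at random (explore);
    otherwise the arm with the highest empirical mean reward is pulled (exploit).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a]) # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the empirical mean for the pulled arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts
```

Cumulative regret, the metric in the benchmark table below, is the gap between the reward of always pulling the best arm and the reward actually collected; epsilon-greedy pays a constant exploration cost per round, which is why more adaptive strategies (e.g. Thompson sampling) are often preferred.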

Papers

Showing 511–520 of 1262 papers

Title | Hype
Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments | 0
α-Fair Contextual Bandits | 0
Full Gradient Deep Reinforcement Learning for Average-Reward Criterion | 0
Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles | 0
Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation | 0
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | 0
Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information | 0
A Closer Look at Small-loss Bounds for Bandits with Graph Feedback | 0
Fully Gap-Dependent Bounds for Multinomial Logit Bandit | 0
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits | 0
Page 52 of 127

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified