SOTAVerified

Multi-Armed Bandits

A multi-armed bandit is a problem in which a fixed, limited set of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms whose payoffs are still uncertain against repeatedly playing the arm that currently looks best.

(Image credit: Microsoft Research)
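The exploration/exploitation trade-off described above can be sketched with a minimal epsilon-greedy strategy. This is an illustrative implementation, not the method of any paper listed below; the arm means, the Gaussian reward noise, and the parameter values are assumptions chosen for the example.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Minimal epsilon-greedy bandit sketch: with probability `epsilon`
    explore a uniformly random arm, otherwise exploit the arm with the
    highest current reward estimate."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental mean update avoids storing the reward history.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward
```

With enough steps, the running estimates concentrate around the true arm means and the greedy choice settles on the best arm, while the epsilon fraction of random pulls keeps every arm's estimate from going stale.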

Papers

Showing 1181-1190 of 1262 papers

Title | Status | Hype
Making Contextual Decisions with Low Technical Debt | | 0
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits | | 0
Contextual Bandits with Latent Confounders: An NMF Approach | | 0
Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture | | 0
Fairness in Learning: Classic and Contextual Bandits | | 0
Graph Clustering Bandits for Recommendation | | 0
Stochastic Contextual Bandits with Known Reward Functions | | 0
Latent Contextual Bandits and their Application to Personalized Recommendations for New Users | | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
PAC Reinforcement Learning with Rich Observations | | 0
Page 119 of 127

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
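The metric in the table above, cumulative regret, is the total gap between the reward of always playing the best arm and the rewards of the arms actually played. The benchmark's own evaluation protocol is not specified here; the sketch below only illustrates the standard pseudo-regret definition on assumed arm means and an assumed play sequence.

```python
def cumulative_regret(true_means, arms_played):
    """Cumulative (pseudo-)regret: sum over rounds of the difference
    between the best arm's mean reward and the mean reward of the arm
    actually played in that round."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in arms_played)

# Hypothetical two-arm example: playing the suboptimal arm once
# contributes its full mean-reward gap to the regret.
regret = cumulative_regret([0.1, 0.9], [0, 1, 1])
```

Lower values are better; an algorithm that quickly identifies and sticks to the best arm accumulates regret only during its early exploratory rounds.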