SOTAVerified

Multi-Armed Bandits

Multi-armed bandit problems are tasks in which a fixed amount of resources must be allocated among competing alternatives (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: sampling arms to learn their payoffs versus repeatedly playing the arm that currently looks best.
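The trade-off above can be sketched with the classic epsilon-greedy strategy; this is an illustrative toy (the arm means, noise model, and parameter values here are arbitrary choices, not tied to any paper in the list below):

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, steps=10_000, seed=0):
    """Play a stochastic bandit: with probability epsilon explore a random
    arm, otherwise exploit the arm with the highest estimated mean reward."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n          # pulls per arm
    estimates = [0.0] * n     # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                           # explore
        else:
            arm = max(range(n), key=estimates.__getitem__)   # exploit
        reward = rng.gauss(arm_means[arm], 1.0)  # noisy reward draw
        counts[arm] += 1
        # incremental update of the running mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

counts, estimates = epsilon_greedy([0.1, 0.5, 0.9])
```

After enough steps the best arm (mean 0.9) accumulates the large majority of pulls, while the epsilon fraction of random play keeps the estimates of the other arms from going stale.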

(Image credit: Microsoft Research)

Papers

Showing 1001–1010 of 1262 papers (page 101 of 127)

Title | Status | Hype
Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits | | 0
Value Directed Exploration in Multi-Armed Bandits with Structured Priors | | 0
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency | | 0
Variance-Dependent Regret Lower Bounds for Contextual Bandits | | 0
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | | 0
Variational Inference for Model-Free and Model-Based Reinforcement Learning | | 0
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences | | 0
Vertical Federated Linear Contextual Bandits | | 0
Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits | | 0
Bandit algorithms to emulate human decision making using probabilistic distortions | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
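Cumulative regret, the metric in the table above, is the total reward lost relative to always playing the best arm. A minimal sketch of the standard pseudo-regret computation (the arm means and pull sequence below are made-up illustrative values, not from the benchmark):

```python
def cumulative_regret(arm_means, chosen_arms):
    """Pseudo-regret: sum over rounds of the gap between the best arm's
    mean reward and the mean reward of the arm actually chosen."""
    best = max(arm_means)
    return sum(best - arm_means[arm] for arm in chosen_arms)

# Two arms with means 0.2 and 0.8; one pull of the suboptimal
# arm 0 costs 0.8 - 0.2 = 0.6 in regret.
r = cumulative_regret([0.2, 0.8], [1, 0, 1, 1])  # -> 0.6
```

Lower is better, which is why the table ranks the 1.82 entry above having a smaller claimed regret than 1.92 would imply for the other model.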