SOTAVerified

Multi-Armed Bandits

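The multi-armed bandit problem is a task in which a fixed, limited set of resources must be allocated among competing choices (the "arms") so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: pulling uncertain arms to learn more about them versus repeatedly pulling the arm that currently looks best.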
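To make the exploration/exploitation trade-off concrete, here is a minimal sketch of one simple strategy, epsilon-greedy, on simulated Bernoulli arms. It is only an illustration and is not taken from any of the papers listed on this page. The function name `run_epsilon_greedy`, the arm means, and the default parameters are assumptions chosen for the example.

```python
import random


def run_epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms.

    true_means is a hypothetical list of success probabilities, one per arm.
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Expected regret of this pull relative to the best arm.
        regret += best_mean - true_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = run_epsilon_greedy([0.2, 0.5, 0.7])
    print("pulls per arm:", counts)
    print("cumulative regret:", round(regret, 2))
```

A larger `epsilon` spends more pulls exploring, which lowers the risk of locking onto a poor arm but adds regret in the long run. A smaller `epsilon` does the opposite.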


Papers

Showing 926-950 of 1262 papers

| Title | Status | Hype |
|---|---|---|
| A General Theory of the Stochastic Linear Bandit and Its Applications | | 0 |
| Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles | | 0 |
| Adversarial Attacks on Linear Contextual Bandits | | 0 |
| Inference for Batched Bandits | | 0 |
| Selfish Robustness and Equilibria in Multi-Player Bandits | | 0 |
| The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity | | 0 |
| A Closer Look at Small-loss Bounds for Bandits with Graph Feedback | | 0 |
| Safe Exploration for Optimizing Contextual Bandits | Code | 0 |
| Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits | | 0 |
| Bandits with Knapsacks beyond the Worst-Case | | 0 |
| Ballooning Multi-Armed Bandits | | 0 |
| Incentivising Exploration and Recommendations for Contextual Bandits with Payments | | 0 |
| Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits | | 0 |
| Gradient-free Online Learning in Continuous Games with Delayed Rewards | | 0 |
| Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits | | 0 |
| Fair Contextual Multi-Armed Bandits: Theory and Experiments | | 0 |
| Sublinear Optimal Policy Value Estimation in Contextual Bandits | | 0 |
| Nonparametric Contextual Bandits in Metric Spaces with Unknown Metric | | 0 |
| Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits | | 0 |
| Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0 |
| Offline Contextual Bandits with High Probability Fairness Guarantees | Code | 0 |
| Learning in Generalized Linear Contextual Bandits with Stochastic Delays | | 0 |
| Surrogate Objectives for Batch Policy Optimization in One-step Decision Making | | 0 |
| Contextual Combinatorial Conservative Bandits | | 0 |
| Automatic Ensemble Learning for Online Influence Maximization | | 0 |
Page 38 of 51

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
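
For reference, the "Cumulative regret" metric is commonly defined as the total expected reward lost relative to always playing the best arm. The formula below is the standard textbook form. The listing does not say whether the reported values use it directly or a normalized variant, so treat it as an assumption. The symbols are introduced here for illustration: $T$ is the number of rounds, $a_t$ the arm chosen at round $t$, $\mu_a$ the mean reward of arm $a$, and $\mu^{*}$ the mean reward of the best arm.

```latex
R_T \;=\; \sum_{t=1}^{T} \left( \mu^{*} - \mu_{a_t} \right),
\qquad
\mu^{*} \;=\; \max_{a} \mu_a
```

Lower values are better under this definition, since each term is the non-negative gap between the best arm and the arm actually played.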