SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed amount of resources must be allocated among competing alternatives (the "arms") so as to maximize expected gain, when each alternative's reward properties are only partially known. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
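The exploration/exploitation trade-off described above can be sketched with a minimal epsilon-greedy strategy. This is an illustrative, self-contained example, not taken from any paper listed below; the arm reward probabilities and parameter values are assumptions chosen for demonstration.

```python
import random

def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy policy.

    true_means: hypothetical success probability of each arm (unknown
    to the agent, used only to simulate rewards).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: pull a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulate a Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward
```

With enough pulls, the arm with the highest true mean accumulates most of the budget, while the epsilon fraction of random pulls keeps the estimates of the other arms from going stale.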

Papers

Showing 151–175 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Bandit Regret Scaling with the Effective Loss Range | | 0 |
| Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits | | 0 |
| Bandits for Learning to Explain from Explanations | | 0 |
| Bandits meet Computer Architecture: Designing a Smartly-allocated Cache | | 0 |
| Bandit Social Learning: Exploration under Myopic Behavior | | 0 |
| Bandits Warm-up Cold Recommender Systems | | 0 |
| Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms | | 0 |
| Bandits with Knapsacks beyond the Worst Case | | 0 |
| Bandits with Partially Observable Confounded Data | | 0 |
| Bandits with Temporal Stochastic Constraints | | 0 |
| Banker Online Mirror Descent | | 0 |
| Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning | | 0 |
| Batched Bandits with Crowd Externalities | | 0 |
| Batched Coarse Ranking in Multi-Armed Bandits | | 0 |
| Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits | | 0 |
| Regret Bounds for Batched Bandits | | 0 |
| Batched Nonparametric Bandits via k-Nearest Neighbor UCB | | 0 |
| A Gang of Bandits | | 0 |
| Batched Online Contextual Sparse Bandits with Sequential Inclusion of Features | | 0 |
| Batched Thompson Sampling | | 0 |
| Batched Thompson Sampling for Multi-Armed Bandits | | 0 |
| Batch Ensemble for Variance Dependent Regret in Stochastic Bandits | | 0 |
| Towards Bayesian Data Selection | | 0 |
| Balanced off-policy evaluation in general action spaces | | 0 |
Page 7 of 51

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |