SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed amount of resources must be allocated among competing alternatives so as to maximize expected gain, when each alternative's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
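To make the trade-off concrete, below is a minimal sketch of an epsilon-greedy policy on a simulated Bernoulli bandit. The arm probabilities, epsilon, horizon, and function name are illustrative assumptions, not taken from any paper listed on this page; the sketch also tracks cumulative regret, the metric reported in the benchmark table below.

```python
# Minimal epsilon-greedy sketch on a simulated Bernoulli bandit.
# All parameters below (arm_probs, epsilon, horizon) are illustrative
# assumptions, not values from any paper on this page.
import random

def epsilon_greedy(arm_probs, epsilon=0.1, horizon=1000, seed=0):
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    # Cumulative regret: expected reward of always pulling the best arm
    # minus the reward actually collected.
    regret = horizon * max(arm_probs) - total_reward
    return values, regret

if __name__ == "__main__":
    estimates, regret = epsilon_greedy([0.2, 0.5, 0.7])
    print("estimated arm values:", estimates)
    print("cumulative regret:", regret)
```

Larger epsilon means more exploration (better value estimates, more suboptimal pulls); smaller epsilon means more exploitation, which risks locking onto a suboptimal arm early.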

Papers

Showing 1–10 of 1262 papers (page 1 of 127)

| Title | Status | Hype |
| --- | --- | --- |
| Hypothesis Generation with Large Language Models | Code | 2 |
| Off-Policy Evaluation for Large Action Spaces via Embeddings | Code | 2 |
| Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model | Code | 2 |
| Performance-bounded Online Ensemble Learning Method Based on Multi-armed bandits and Its Applications in Real-time Safety Assessment | Code | 1 |
| Balans: Multi-Armed Bandits-based Adaptive Large Neighborhood Search for Mixed-Integer Programming Problem | Code | 1 |
| A unifying framework for generalised Bayesian online learning in non-stationary environments | Code | 1 |
| LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Code | 1 |
| Discovering Minimal Reinforcement Learning Environments | Code | 1 |
| In-Context Reinforcement Learning for Variable Action Spaces | Code | 1 |
| Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
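The "FullPosterior" model names above suggest posterior-sampling (Thompson sampling) style baselines. For reference only, here is a minimal Beta-Bernoulli Thompson sampling sketch; it is an illustrative assumption, not the benchmarked implementation, and the arm probabilities and horizon are made up. It returns the cumulative regret, the metric used in the table.

```python
# Minimal Beta-Bernoulli Thompson sampling sketch, shown for reference.
# This is NOT the benchmarked "FullPosterior" implementation; all
# parameters are illustrative assumptions.
import random

def thompson_sampling(arm_probs, horizon=1000, seed=0):
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    alpha = [1.0] * n_arms  # Beta posterior: 1 + observed successes
    beta = [1.0] * n_arms   # Beta posterior: 1 + observed failures
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior,
        # then pull the arm with the highest sample.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total_reward += reward
    # Cumulative regret relative to always pulling the best arm.
    return horizon * max(arm_probs) - total_reward

if __name__ == "__main__":
    print("cumulative regret:", thompson_sampling([0.2, 0.5, 0.7]))
```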