
Multi-Armed Bandits

Multi-armed bandits refer to a task where a fixed amount of resources must be allocated among competing choices in a way that maximizes expected gain, when each choice's reward properties are only partially known. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
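The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy bandit: with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the best empirical mean. The sketch below is an illustrative example only, not taken from any listed paper; the arm probabilities, epsilon value, and step count are arbitrary assumptions.

```python
import random

def epsilon_greedy_bandit(true_probs, epsilon=0.1, steps=10_000):
    """Epsilon-greedy: explore a random arm with probability epsilon,
    otherwise exploit the arm with the best empirical mean reward."""
    n_arms = len(true_probs)
    counts = [0] * n_arms     # pulls per arm
    values = [0.0] * n_arms   # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])   # exploit
        # Bernoulli arm: reward 1 with the arm's (hidden) probability
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]     # incremental mean
        total_reward += reward

    return total_reward, counts

# Hypothetical arm probabilities, chosen only for illustration.
reward, pulls = epsilon_greedy_bandit([0.2, 0.5, 0.7])
print(reward, pulls)  # most pulls should concentrate on the 0.7 arm
```

Smaller epsilon values exploit more aggressively but risk locking onto a suboptimal arm early; larger values keep exploring at the cost of short-term reward.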

Papers

Showing 1091–1100 of 1262 papers

Title | Status | Hype
Estimation of Warfarin Dosage with Reinforcement Learning | Code | 0
Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0
Model selection for contextual bandits | Code | 0
Best Arm Identification with Fixed Budget: A Large Deviation Perspective | Code | 0
Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0
Optimal Learning for Structured Bandits | Code | 0
Conditionally Risk-Averse Contextual Bandits | Code | 0
Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits | Code | 0
Confidence Intervals for Policy Evaluation in Adaptive Experiments | Code | 0
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting | Code | 0
Page 110 of 127

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
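For reference, cumulative regret (the metric reported above) is the gap between the reward an oracle would collect by always playing the best arm and the reward actually collected. A minimal sketch of how it could be computed, assuming Bernoulli arms whose true means are known for evaluation purposes; the function name and example values are hypothetical:

```python
def cumulative_regret(true_probs, arms_played):
    """Expected cumulative regret: sum over rounds of the gap between
    the best arm's mean reward and the mean reward of the arm played."""
    best = max(true_probs)
    return sum(best - true_probs[a] for a in arms_played)

# e.g., always playing arm 0 for 3 rounds when arm 2 (mean 0.7) is best
print(cumulative_regret([0.2, 0.5, 0.7], [0, 0, 0]))  # 1.5
```

Lower cumulative regret is better, so the linear model's 1.82 would outperform the neural-linear variant's 1.92 if the claimed numbers held up under verification.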