
Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed, limited set of resources between competing choices so as to maximize expected gain. These problems typically involve a trade-off between exploration (trying arms to learn their payoffs) and exploitation (playing the arm that currently looks best).

(Image credit: Microsoft Research)
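As an illustration of the exploration/exploitation trade-off, here is a minimal epsilon-greedy sketch. The arm count, reward probabilities, and epsilon value are illustrative only and not tied to any paper listed below.

```python
import random

def epsilon_greedy(pull, n_arms, n_rounds=1000, epsilon=0.1):
    """Epsilon-greedy bandit: explore a uniformly random arm with
    probability epsilon, otherwise exploit the arm with the best
    running mean reward."""
    counts = [0] * n_arms    # number of pulls per arm
    means = [0.0] * n_arms   # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])   # exploit
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]      # incremental mean
        total += reward
    return means, total

# Toy usage: three Bernoulli arms with hidden success probabilities.
probs = [0.2, 0.5, 0.7]
means, total = epsilon_greedy(
    lambda a: 1.0 if random.random() < probs[a] else 0.0,
    n_arms=len(probs))
print(means, total)
```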

Papers

Showing 1131-1140 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning | Code | 0 |
| Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs | Code | 0 |
| Offline Contextual Bandits with Overparameterized Models | Code | 0 |
| Solving Inverse Problem for Multi-armed Bandits via Convex Optimization | Code | 0 |
| Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces | Code | 0 |
| Learning Structural Weight Uncertainty for Sequential Decision-Making | Code | 0 |
| Nonstationary Continuum-Armed Bandit Strategies for Automated Trading in a Simulated Financial Market | Code | 0 |
| Contextual Bandits with Stochastic Experts | Code | 0 |
| Empirical analysis of representation learning and exploration in neural kernel bandits | Code | 0 |
| Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
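The metric above, cumulative regret, measures the gap between the reward of always playing the optimal arm and the reward actually collected over the rounds played. A minimal sketch of the computation, with the optimal arm's mean reward and the observed rewards assumed known for illustration:

```python
def cumulative_regret(best_mean, rewards):
    """Cumulative regret after T rounds: the sum over rounds of the
    optimal arm's expected reward minus the reward received."""
    return sum(best_mean - r for r in rewards)

# Toy example: the best arm pays 0.7 on average; three observed pulls.
print(cumulative_regret(0.7, [0.0, 1.0, 1.0]))  # ~0.1
```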