SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of tasks in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
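The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch (not taken from any listed paper; the Bernoulli arms and parameter values are illustrative assumptions): with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the best empirical mean.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=1000, seed=0):
    """Epsilon-greedy on Bernoulli arms (illustrative sketch).

    arm_means: true success probability of each arm (unknown to the agent).
    Returns (total reward collected, pull counts per arm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # pulls per arm
    values = [0.0] * n_arms      # empirical mean reward per arm
    total = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)   # exploit
        reward = 1 if rng.random() < arm_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the empirical mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, counts
```

With a small epsilon, most pulls concentrate on the empirically best arm while a fixed fraction of rounds is still spent gathering information about the others.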

(Image credit: Microsoft Research)

Papers

Showing 781–790 of 1262 papers

Title | Status | Hype
Productization Challenges of Contextual Multi-Armed Bandits | | 0
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization | | 0
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems | | 0
Provable General Function Class Representation Learning in Multitask Bandits and MDPs | | 0
Provably and Practically Efficient Neural Contextual Bandits | | 0
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks | | 0
Transfer Learning with Partially Observable Offline Data via Causal Bounds | | 0
Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback | | 0
Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits | | 0
Provably Optimal Algorithms for Generalized Linear Contextual Bandits | | 0
Page 79 of 127

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified