SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying under-explored arms against repeatedly pulling the arm currently believed to be best.

(Image credit: Microsoft Research)
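The exploration/exploitation trade-off above can be illustrated with a minimal ε-greedy sketch for a Bernoulli bandit. The arm probabilities, step count, and ε below are illustrative assumptions, not taken from any paper listed on this page:

```python
import random

def epsilon_greedy_bandit(reward_probs, steps=1000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy policy on a Bernoulli bandit.

    With probability epsilon we explore (pull a uniformly random arm);
    otherwise we exploit (pull the arm with the highest estimated value).
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)   # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update keeps the estimate without storing history.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, total_reward

# Three arms with hidden success probabilities; over enough steps the
# policy should concentrate its pulls on the best arm (index 2).
values, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

Larger ε explores more and converges more slowly; ε = 0 can lock onto a suboptimal arm forever, which is the trade-off in miniature.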

Papers

Showing 1021–1030 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Privacy Amplification via Shuffling for Linear Contextual Bandits | | 0 |
| Privacy-Preserving Communication-Efficient Federated Multi-Armed Bandits | | 0 |
| Privacy-Preserving Multi-Party Contextual Bandits | | 0 |
| Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs | | 0 |
| Productization Challenges of Contextual Multi-Armed Bandits | | 0 |
| Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization | | 0 |
| Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems | | 0 |
| Provable General Function Class Representation Learning in Multitask Bandits and MDPs | | 0 |
| Provably and Practically Efficient Neural Contextual Bandits | | 0 |
| Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |