
Multi-Armed Bandits

Multi-armed bandits refer to the problem of allocating a fixed, limited amount of resources among competing choices (arms) so as to maximize expected gain, when each arm's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
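
As a concrete illustration of the exploration/exploitation trade-off, here is a minimal sketch of Thompson sampling for Bernoulli-reward arms, a classic strategy that appears in several of the papers listed below. The arm payoff probabilities and horizon are illustrative assumptions, not values taken from any paper on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Bernoulli bandit: true arm payoff probabilities,
# unknown to the agent (assumed values for this sketch).
true_probs = np.array([0.3, 0.5, 0.7])
n_arms, horizon = len(true_probs), 10_000

# Beta(1, 1) priors over each arm's payoff probability.
successes = np.ones(n_arms)
failures = np.ones(n_arms)

regret = 0.0
for t in range(horizon):
    # Exploration and exploitation happen implicitly: sample a payoff
    # estimate from each arm's posterior and pull the arm whose sample
    # is largest. Uncertain arms occasionally win the draw and get explored.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))

    # Observe a Bernoulli reward and update that arm's posterior.
    reward = rng.random() < true_probs[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

    # Track regret against always playing the best arm.
    regret += true_probs.max() - true_probs[arm]

print(f"cumulative regret after {horizon} pulls: {regret:.1f}")
```

Over time the posteriors concentrate on the true payoff probabilities, so the algorithm pulls the best arm almost exclusively and the per-round regret shrinks.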

Papers

Showing 951–960 of 1262 papers

| Title | Status | Hype |
|---|---|---|
| Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems | | 0 |
| Triply Robust Off-Policy Evaluation | | 0 |
| Incentivized Exploration for Multi-Armed Bandits under Reward Drift | | 0 |
| Neural Contextual Bandits with UCB-based Exploration | Code | 0 |
| Confidence Intervals for Policy Evaluation in Adaptive Experiments | Code | 0 |
| Multi-Armed Bandits with Correlated Arms | Code | 0 |
| Persistency of Excitation for Robustness of Neural Networks | Code | 0 |
| Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs | | 0 |
| Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | | 0 |
| Thompson Sampling via Local Uncertainty | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
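
For reference, assuming this leaderboard follows the standard convention, cumulative regret is the expected gap between always playing the best arm and the rewards actually collected over a horizon of $T$ rounds (lower is better):

$$R_T = T\,\mu^{*} - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right], \qquad \mu^{*} = \max_a \mu_a,$$

where $\mu_a$ is the mean reward of arm $a$ and $a_t$ is the arm pulled in round $t$.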