SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed, limited amount of resources must be allocated among competing alternatives (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
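The exploration/exploitation trade-off can be illustrated with a minimal ε-greedy sketch. The Bernoulli arm probabilities and parameter values below are hypothetical, chosen for illustration only:

```python
import random

def epsilon_greedy(true_means, n_steps=10_000, epsilon=0.1, seed=0):
    """Pull one of len(true_means) Bernoulli arms per step.

    With probability epsilon, explore a random arm; otherwise exploit
    the arm with the highest running mean-reward estimate.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        # Incremental mean update: new_mean = old_mean + (x - old_mean) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

# Hypothetical three-armed bandit; arm 2 is best (mean 0.8).
est, total = epsilon_greedy([0.2, 0.5, 0.8])
```

After enough steps the estimate for the best arm dominates, so the exploit branch pulls it most of the time while the ε fraction of random pulls keeps refining the other estimates.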

(Image credit: Microsoft Research)

Papers

Showing 1176–1200 of 1262 papers

Title | Status | Hype
Risk-Aware Algorithms for Adversarial Contextual Bandits | — | 0
Exploration Potential | — | 0
On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits | — | 0
On the Identification and Mitigation of Weaknesses in the Knowledge Gradient Policy for Multi-Armed Bandits | — | 0
An optimal learning method for developing personalized treatment regimes | — | 0
Making Contextual Decisions with Low Technical Debt | — | 0
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits | — | 0
Contextual Bandits with Latent Confounders: An NMF Approach | — | 0
Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture | — | 0
Fairness in Learning: Classic and Contextual Bandits | — | 0
Graph Clustering Bandits for Recommendation | — | 0
Stochastic Contextual Bandits with Known Reward Functions | — | 0
Latent Contextual Bandits and their Application to Personalized Recommendations for New Users | — | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
PAC Reinforcement Learning with Rich Observations | — | 0
BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits | — | 0
Bandits meet Computer Architecture: Designing a Smartly-allocated Cache | — | 0
Personalized Course Sequence Recommendations | — | 0
On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs | — | 0
Algorithms for Differentially Private Multi-Armed Bandits | — | 0
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits | — | 0
Context-Aware Bandits | — | 0
A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit | Code | 0
Multi-armed Bandits with Application to 5G Small Cells | — | 0
Sequential Design for Ranking Response Surfaces | — | 0
Page 48 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
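The benchmark metric above, cumulative regret, is the sum over time steps of the gap between the best arm's expected reward and the expected reward of the arm actually pulled. A minimal sketch (the arm means and pull sequence below are hypothetical, not the benchmark's data):

```python
def cumulative_regret(true_means, chosen_arms):
    """Sum over steps of (best arm's expected reward - chosen arm's expected reward)."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)

# Hypothetical example: three arms, four pulls.
# Gaps per pull: 0.6, 0.0, 0.0, 0.3 -> total ~0.9 (up to float rounding)
regret = cumulative_regret([0.2, 0.5, 0.8], [0, 2, 2, 1])
```

Note this uses the arms' *expected* rewards rather than realized draws, so regret measures the quality of the allocation policy, not sampling luck.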