
Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed, limited set of resources among competing alternatives so as to maximize expected gain, when the payoff of each alternative is only partially known and is learned through repeated trials. These problems typically involve an exploration/exploitation trade-off: trying less-known arms to gather information versus pulling the arm that currently looks best.

(Image credit: Microsoft Research)
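To make the exploration/exploitation trade-off concrete, here is a minimal, self-contained sketch of the classic UCB1 strategy on simulated Bernoulli arms. The arm means, horizon, and function name are illustrative assumptions only; this sketch is not taken from any paper or benchmark entry listed on this page.

```python
# Minimal sketch of the exploration/exploitation trade-off using UCB1 on
# simulated Bernoulli arms. Arm means and horizon are illustrative
# assumptions, not values from any paper or benchmark on this page.
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 and return the total reward collected over `horizon` pulls."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm was pulled
    values = [0.0] * n_arms  # running mean reward of each arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # exploitation term (estimated mean) plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    print(ucb1(arm_means=[0.2, 0.5, 0.7], horizon=10_000))
```

Comparing the returned total reward with `horizon * max(arm_means)` gives the cumulative regret of the run, which is the metric reported in the benchmark table further down this page.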

Papers

Showing 1151–1200 of 1262 papers

Title (entries marked [Code] have a linked implementation; the hype score is 0 for all entries shown)

Combinatorial Multi-Armed Bandits with Filtered Feedback
Boundary Crossing Probabilities for General Exponential Families
Multi-Task Learning for Contextual Bandits
Combinatorial Semi-Bandits with Knapsacks
Practical Algorithms for Best-K Identification in Multi-Armed Bandits
Bandit Regret Scaling with the Effective Loss Range
Mostly Exploration-Free Algorithms for Contextual Bandits [Code]
Value Directed Exploration in Multi-Armed Bandits with Structured Priors
On Kernelized Multi-armed Bandits
Efficient Benchmarking of NLP APIs using Multi-armed Bandits
Selective Harvesting over Networks
Horde of Bandits using Gaussian Markov Random Fields
Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [Code]
QoS-Aware Multi-Armed Bandits
Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Rotting Bandits
Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits
Learning to Use Learners' Advice
The Price of Differential Privacy For Online Learning
Corralling a Band of Bandit Algorithms [Code]
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Active Search for Sparse Signals with Region Sensing
Multi-armed Bandits: Competing with Optimal Sequences
Bandit algorithms to emulate human decision making using probabilistic distortions
Fair Algorithms for Infinite and Contextual Bandits
Risk-Aware Algorithms for Adversarial Contextual Bandits
Exploration Potential
On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits
On the Identification and Mitigation of Weaknesses in the Knowledge Gradient Policy for Multi-Armed Bandits
An optimal learning method for developing personalized treatment regimes
Making Contextual Decisions with Low Technical Debt
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
Contextual Bandits with Latent Confounders: An NMF Approach
Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture
Fairness in Learning: Classic and Contextual Bandits
Graph Clustering Bandits for Recommendation
Stochastic Contextual Bandits with Known Reward Functions
Latent Contextual Bandits and their Application to Personalized Recommendations for New Users
Cascading Bandits for Large-Scale Recommendation Problems [Code]
PAC Reinforcement Learning with Rich Observations
BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits
Bandits meet Computer Architecture: Designing a Smartly-allocated Cache
Personalized Course Sequence Recommendations
On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs
Algorithms for Differentially Private Multi-Armed Bandits
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
Context-Aware Bandits
Multi-armed Bandits with Application to 5G Small Cells
A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit [Code]
Sequential Design for Ranking Response Surfaces

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | - | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | - | Unverified
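The metric in this table, cumulative regret, is the gap between the expected reward of the best fixed arm and the reward actually collected, summed over the run (lower is better). The helper below is a hedged sketch of that computation on a hypothetical reward log; it is not the benchmark's own evaluation code, and the example numbers are made up.

```python
# Hedged sketch: cumulative regret for a logged bandit run. The reward log
# and the optimal per-step expected reward are invented examples; the actual
# benchmark's evaluation pipeline is not reproduced here.
def cumulative_regret(rewards, optimal_mean):
    """Sum over steps of (expected reward of the best arm - reward received)."""
    return sum(optimal_mean - r for r in rewards)

# Example: three logged pulls against a best arm with expected reward 0.7.
print(cumulative_regret(rewards=[0.0, 1.0, 0.0], optimal_mean=0.7))  # approx. 1.1
```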