SOTAVerified

Multi-Armed Bandits

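Multi-armed bandits refer to a task where a fixed amount of resources must be allocated among competing choices in a way that maximizes expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying lesser-known choices to learn their payoffs against repeatedly picking the choice that currently looks best.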
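As a rough illustration of the exploration/exploitation trade-off, here is a minimal sketch of the epsilon-greedy strategy on a simulated Bernoulli bandit. It is not drawn from any paper listed on this page; the function name `epsilon_greedy`, the arm means, and the parameter defaults are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a simulated Bernoulli bandit.

    true_means: hypothetical success probability of each arm (unknown to the learner).
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward of each arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulate a 0/1 reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 2) for e in estimates])
    print("total reward:", total)
```

With `epsilon=0`, the learner never explores and can lock onto whichever arm paid out first; with `epsilon=1`, it never uses what it has learned. Values in between trade one against the other.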

( Image credit: Microsoft Research )

Papers

Showing 901-950 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Convex Hull Monte-Carlo Tree Search | | 0 |
| Online Residential Demand Response via Contextual Multi-Armed Bandits | | 0 |
| A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option | | 0 |
| Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits | | 0 |
| Stochastic Linear Contextual Bandits with Diverse Contexts | | 0 |
| Robustness Guarantees for Mode Estimation with an Application to Bandits | | 0 |
| Taking a hint: How to leverage loss predictors in contextual bandits? | | 0 |
| Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits | | 0 |
| Model Selection in Contextual Stochastic Bandit Problems | | 0 |
| Bounded Regret for Finitely Parameterized Multi-Armed Bandits | | 0 |
| Decentralized Multi-player Multi-armed Bandits with No Collision Information | | 0 |
| Designing Truthful Contextual Multi-Armed Bandits based Sponsored Search Auctions | | 0 |
| Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis | | 0 |
| Bandit Learning with Delayed Impact of Actions | | 0 |
| The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms | Code | 0 |
| Survey Bandits with Regret Guarantees | | 0 |
| Online Learning in Contextual Bandits using Gated Linear Networks | | 0 |
| Residual Bootstrap Exploration for Bandit Algorithms | | 0 |
| On conditional versus marginal bias in multi-armed bandits | | 0 |
| Adaptive Estimator Selection for Off-Policy Evaluation | Code | 0 |
| Coordination without communication: optimal regret in two players multi-armed bandits | | 0 |
| Tight Lower Bounds for Combinatorial Multi-Armed Bandits | | 0 |
| A General Theory of the Stochastic Linear Bandit and Its Applications | | 0 |
| Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles | | 0 |
| Adversarial Attacks on Linear Contextual Bandits | | 0 |
| Inference for Batched Bandits | | 0 |
| Selfish Robustness and Equilibria in Multi-Player Bandits | | 0 |
| The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity | | 0 |
| Safe Exploration for Optimizing Contextual Bandits | Code | 0 |
| A Closer Look at Small-loss Bounds for Bandits with Graph Feedback | | 0 |
| Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits | | 0 |
| Bandits with Knapsacks beyond the Worst-Case | | 0 |
| Ballooning Multi-Armed Bandits | | 0 |
| Incentivising Exploration and Recommendations for Contextual Bandits with Payments | | 0 |
| Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits | | 0 |
| Gradient-free Online Learning in Continuous Games with Delayed Rewards | | 0 |
| Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits | | 0 |
| A Modern Introduction to Online Learning | Code | 1 |
| Fair Contextual Multi-Armed Bandits: Theory and Experiments | | 0 |
| Sublinear Optimal Policy Value Estimation in Contextual Bandits | | 0 |
| Surrogate Objectives for Batch Policy Optimization in One-step Decision Making | | 0 |
| Offline Contextual Bandits with High Probability Fairness Guarantees | Code | 0 |
| Learning in Generalized Linear Contextual Bandits with Stochastic Delays | | 0 |
| Nonparametric Contextual Bandits in Metric Spaces with Unknown Metric | | 0 |
| Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits | | 0 |
| Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0 |
| Contextual Combinatorial Conservative Bandits | | 0 |
| Automatic Ensemble Learning for Online Influence Maximization | | 0 |
| Corruption-robust exploration in episodic reinforcement learning | | 0 |
| Contextual Bandits Evolving Over Finite Time | | 0 |
Page 19 of 26

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |