SOTAVerified

Multi-Armed Bandits

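The multi-armed bandit problem is a task in which a fixed, limited amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: trying lesser-known choices to learn their payoffs versus committing to the choice that currently looks best.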
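
As a rough illustration of the exploration/exploitation trade-off, here is a minimal sketch of an epsilon-greedy strategy on simulated Bernoulli arms. It is one common baseline and is not taken from any paper listed on this page. The function name `epsilon_greedy`, the arm means `[0.2, 0.5, 0.8]`, and the parameters `steps`, `epsilon`, and `seed` are all illustrative assumptions.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (hypothetical helper).

    Returns (pull counts, estimated means, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated Bernoulli reward; true_means is unknown to the learner.
        reward = 1 if rng.random() < true_means[arm] else 0

        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


# Assumed example: three arms with made-up success probabilities.
counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 3) for e in estimates], total_reward)
```

A larger `epsilon` spends more pulls learning about every arm, while a smaller one commits sooner to the arm that currently looks best; the budget of `steps` pulls is the fixed resource being allocated.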


Papers

Showing 901-950 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs | | 0 |
| Small-loss bounds for online learning with partial information | | 0 |
| Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness | | 0 |
| SmartChoices: Augmenting Software with Learned Implementations | | 0 |
| Smoothed Online Learning is as Easy as Statistical Learning | | 0 |
| Smooth Sequential Optimisation with Delayed Feedback | | 0 |
| Social Learning in Multi Agent Multi Armed Bandits | | 0 |
| Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-making with High-dimensional Covariates | | 0 |
| Sparse Nonparametric Contextual Bandits | | 0 |
| Sparsity, variance and curvature in multi-armed bandits | | 0 |
| SPRT-based Efficient Best Arm Identification in Stochastic Bandits | | 0 |
| Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits | | 0 |
| Stability Enforced Bandit Algorithms for Channel Selection in Remote State Estimation of Gauss-Markov Processes | | 0 |
| Stabilizing the Kumaraswamy Distribution | | 0 |
| Stateful Offline Contextual Policy Evaluation and Learning | | 0 |
| Statistical Inference with M-Estimators on Adaptively Collected Data | | 0 |
| Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits | | 0 |
| Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits | | 0 |
| Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond | | 0 |
| Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling | | 0 |
| Stochastic Bandits for Egalitarian Assignment | | 0 |
| Stochastic Bandits with Linear Constraints | | 0 |
| Stochastic Bandits with Vector Losses: Minimizing ℓ^∞-Norm of Relative Losses | | 0 |
| Stochastic Contextual Bandits with Graph-based Contexts | | 0 |
| Stochastic contextual bandits with graph feedback: from independence number to MAS number | | 0 |
| Stochastic Contextual Bandits with Known Reward Functions | | 0 |
| Stochastic Contextual Bandits with Long Horizon Rewards | | 0 |
| Stochastic differential equations for limiting description of UCB rule for Gaussian multi-armed bandits | | 0 |
| Stochastic Graph Bandit Learning with Side-Observations | | 0 |
| Stochastic Linear Contextual Bandits with Diverse Contexts | | 0 |
| Stochastic Multi-armed Bandits in Constant Space | | 0 |
| Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions | | 0 |
| Achieving Fairness in Stochastic Multi-armed Bandit Problem | | 0 |
| Stochastic Multi-Armed Bandits with Control Variates | | 0 |
| Stochastic Multi-armed Bandits with Non-stationary Rewards Generated by a Linear Dynamical System | | 0 |
| Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm | | 0 |
| Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach | | 0 |
| Stochastic Neural Network with Kronecker Flow | | 0 |
| Strategic Linear Contextual Bandits | | 0 |
| Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk | | 0 |
| Streaming Algorithms for Stochastic Multi-armed Bandits | | 0 |
| Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis | | 0 |
| Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks | | 0 |
| Structure Matters: Dynamic Policy Gradient | | 0 |
| Sublinear Optimal Policy Value Estimation in Contextual Bandits | | 0 |
| Surrogate Objectives for Batch Policy Optimization in One-step Decision Making | | 0 |
| Survey Bandits with Regret Guarantees | | 0 |
| Taking a hint: How to leverage loss predictors in contextual bandits? | | 0 |
| Target Tracking for Contextual Bandits: Application to Demand Side Management | | 0 |
| Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |